From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: John Harrison <John.C.Harrison@Intel.com>,
	Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [PATCH v5 06/35] drm/i915: Start of GPU scheduler
Date: Fri, 26 Feb 2016 11:13:30 +0200
Message-ID: <1456478010.7316.1.camel@linux.intel.com>
In-Reply-To: <56C74AE9.90009@Intel.com>

Hi,

The answers below are reasonable, so v6 should be the version.

Regards, Joonas

On Fri, 2016-02-19 at 17:03 +0000, John Harrison wrote:
> On 19/02/2016 13:03, Joonas Lahtinen wrote:
> > 
> > Hi,
> > 
> > Now the code is in reviewable chunks, excellent!
> > 
> > I've added my comments below. A few repeats from last round, but
> > now
> > with more questions about the logic itself.
> > 
> > On Thu, 2016-02-18 at 14:26 +0000, John.C.Harrison@Intel.com wrote:
> > > 
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > Initial creation of scheduler source files. Note that this patch
> > > implements most of the scheduler functionality but does not hook
> > > it in
> > > to the driver yet. It also leaves the scheduler code in 'pass
> > > through'
> > > mode so that even when it is hooked in, it will not actually do
> > > very
> > > much. This allows the hooks to be added one at a time in bite
> > > size
> > > chunks and only when the scheduler is finally enabled at the end
> > > does
> > > anything start happening.
> > > 
> > > The general theory of operation is that when batch buffers are
> > > submitted to the driver, the execbuffer() code packages up all
> > > the
> > > information required to execute the batch buffer at a later time.
> > > This
> > > package is given over to the scheduler which adds it to an
> > > internal
> > > node list. The scheduler also scans the list of objects
> > > associated
> > > with the batch buffer and compares them against the objects
> > > already in
> > > use by other buffers in the node list. If matches are found then
> > > the
> > > new batch buffer node is marked as being dependent upon the
> > > matching
> > > node. The same is done for the context object. The scheduler also
> > > bumps up the priority of such matching nodes on the grounds that
> > > the
> > > more dependencies a given batch buffer has the more important it
> > > is
> > > likely to be.
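The object-overlap rule described above, including the read-read optimisation noted later in the v3 changelog (a buffer shared read-only by two batches creates no ordering requirement), can be sketched as a small standalone model. This is an illustrative userspace sketch, not the driver's code; the struct and function names here are invented:

```c
/* Toy model of the dependency scan: two batch buffers must be ordered
 * when they touch the same object and at least one of them writes it.
 * Shared read-only use is not a dependency. */
#include <stdbool.h>
#include <stddef.h>

struct obj_entry {
	int obj_id;        /* stand-in for a GEM object pointer */
	bool read_only;
};

/* Return true if a batch using objects 'a' must wait for one using 'b'. */
static bool batches_conflict(const struct obj_entry *a, size_t na,
			     const struct obj_entry *b, size_t nb)
{
	for (size_t i = 0; i < na; i++) {
		for (size_t j = 0; j < nb; j++) {
			if (a[i].obj_id != b[j].obj_id)
				continue;
			/* Only writes create an ordering requirement. */
			if (a[i].read_only && b[j].read_only)
				continue;
			return true;
		}
	}
	return false;
}
```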
> > > 
> > > The scheduler aims to have a given (tuneable) number of batch
> > > buffers
> > > in flight on the hardware at any given time. If fewer than this
> > > are
> > > currently executing when a new node is queued, then the node is
> > > passed
> > > straight through to the submit function. Otherwise it is simply
> > > added
> > > to the queue and the driver returns back to user land.
> > > 
> > > The scheduler is notified when each batch buffer completes and
> > > updates
> > > its internal tracking accordingly. At the end of the completion
> > > interrupt processing, if any scheduler tracked batches were
> > > processed,
> > > the scheduler's deferred worker thread is woken up. This can do
> > > more
> > > involved processing such as actually removing completed nodes
> > > from the
> > > queue and freeing up the resources associated with them (internal
> > > memory allocations, DRM object references, context reference,
> > > etc.).
> > > The work handler also checks the in flight count and calls the
> > > submission code if a new slot has appeared.
> > > 
> > > When the scheduler's submit code is called, it scans the queued
> > > node
> > > list for the highest priority node that has no unmet
> > > dependencies.
> > > Note that the dependency calculation is complex as it must take
> > > inter-ring dependencies and potential preemptions into account.
> > > Note
> > > also that in the future this will be extended to include external
> > > dependencies such as the Android Native Sync file descriptors
> > > and/or
> > > the Linux dma-buf synchronisation scheme.
> > > 
> > > If a suitable node is found then it is sent to execbuf_final()
> > > for
> > > submission to the hardware. The in flight count is then re-
> > > checked and
> > > a new node popped from the list if appropriate. All nodes that
> > > are not
> > > submitted have their priority bumped. This ensures that low
> > > priority
> > > tasks do not get starved out by busy higher priority ones -
> > > everything
> > > will eventually get its turn to run.
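To illustrate the anti-starvation arithmetic with the default tuning values set up in i915_scheduler_init() (a bump of 50, clamped to a maximum of 1023): a node that repeatedly misses submission gains the bump on every pass, so even a minimum-priority (-1023) node saturates at the ceiling within 41 passes. A toy userspace model, not driver code:

```c
/* Toy model of the anti-starvation bump: after each submit pass, every
 * node still queued gains PRIORITY_BUMP, clamped to PRIORITY_MAX.
 * Values mirror the defaults in i915_scheduler_init(). */
#include <stdint.h>

#define PRIORITY_MAX  1023
#define PRIORITY_BUMP 50

static int32_t priority_after_passes(int32_t prio, int passes)
{
	while (passes--) {
		int32_t next = prio + PRIORITY_BUMP;

		/* Saturate rather than wrap past the ceiling. */
		prio = next > PRIORITY_MAX ? PRIORITY_MAX : next;
	}
	return prio;
}
```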
> > > 
> > > Note that this patch does not implement pre-emptive scheduling.
> > > Only
> > > basic scheduling by re-ordering batch buffer submission is
> > > currently
> > > implemented. Pre-emption of actively executing batch buffers
> > > comes in
> > > the next patch series.
> > > 
> > > v2: Changed priority levels to +/-1023 due to feedback from Chris
> > > Wilson.
> > > 
> > > Removed redundant index from scheduler node.
> > > 
> > > Changed time stamps to use jiffies instead of raw monotonic. This
> > > provides lower resolution but improved compatibility with other
> > > i915
> > > code.
> > > 
> > > Major re-write of completion tracking code due to struct fence
> > > conversion. The scheduler no longer has its own private IRQ
> > > handler
> > > but just lets the existing request code handle completion events.
> > > Instead, the scheduler now hooks into the request notify code to
> > > be
> > > told when a request has completed.
> > > 
> > > Reduced driver mutex locking scope. Removal of scheduler nodes no
> > > longer grabs the mutex lock.
> > > 
> > > v3: Refactor of dependency generation to make the code more
> > > readable.
> > > Also added in read-read optimisation support - i.e., don't treat
> > > a
> > > shared read-only buffer as being a dependency.
> > > 
> > > Allowed the killing of queued nodes rather than only flying ones.
> > > 
> > > v4: Updated the commit message to better reflect the current
> > > state of
> > > the code. Downgraded some BUG_ONs to WARN_ONs. Used the correct
> > > array
> > > memory allocator function (kmalloc_array instead of kmalloc).
> > > Corrected the format of some comments. Wrapped some lines
> > > differently
> > > to keep the style checker happy.
> > > 
> > > Fixed a WARN_ON when killing nodes. The dependency removal code
> > > checks
> > > that nodes being destroyed do not have any outstanding
> > > dependencies
> > > (which would imply they should not have been executed yet). In
> > > the
> > > case of nodes being destroyed, e.g. due to context banning, then
> > > this
> > > might well be the case - they have not been executed and do
> > > indeed
> > > have outstanding dependencies.
> > > 
> > > Re-instated the code to disable interrupts when not in use. The
> > > underlying problem causing broken IRQ reference counts seems to
> > > have
> > > been fixed now.
> > > 
> > > v5: Shuffled various functions around to remove forward
> > > declarations
> > > as apparently these are frowned upon. Removed lots of white space
> > > as
> > > apparently having easy to read code is also frowned upon. Split
> > > the
> > > direct submission scheduler bypass code out into a separate
> > > function.
> > > Squashed down the i915_scheduler.c sections of various patches
> > > into
> > > this patch. Thus the later patches simply hook in existing code
> > > into
> > > various parts of the driver rather than adding the code as well.
> > > Added
> > > documentation to various functions. Re-worked the submit function
> > > in
> > > terms of mutex locking, error handling and exit paths. Split the
> > > delayed work handler function in half. Made use of the kernel
> > > 'clamp'
> > > macro. [Joonas Lahtinen]
> > > 
> > > Added runtime PM calls as these must be done at the top level
> > > before
> > > acquiring the driver mutex lock. [Chris Wilson]
> > > 
> > > Removed some obsolete debug code that had been forgotten about.
> > > 
> > > Moved more clean up code into the
> > > 'i915_gem_scheduler_clean_node()'
> > > function rather than replicating it in multiple places.
> > > 
> > > Used lighter weight spinlocks.
> > > 
> > > For: VIZ-1587
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/Makefile         |   1 +
> > >   drivers/gpu/drm/i915/i915_drv.h       |   6 +
> > >   drivers/gpu/drm/i915/i915_gem.c       |   5 +
> > >   drivers/gpu/drm/i915/i915_scheduler.c | 874 ++++++++++++++++++++++++++++++++++
> > >   drivers/gpu/drm/i915/i915_scheduler.h |  95 ++++
> > >   5 files changed, 981 insertions(+)
> > >   create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
> > >   create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Makefile
> > > b/drivers/gpu/drm/i915/Makefile
> > > index 15398c5..79cb38b 100644
> > > --- a/drivers/gpu/drm/i915/Makefile
> > > +++ b/drivers/gpu/drm/i915/Makefile
> > > @@ -10,6 +10,7 @@ ccflags-y := -Werror
> > >   i915-y := i915_drv.o \
> > >   	  i915_irq.o \
> > >   	  i915_params.o \
> > > +	  i915_scheduler.o \
> > >             i915_suspend.o \
> > >   	  i915_sysfs.o \
> > >   	  intel_csr.o \
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > > b/drivers/gpu/drm/i915/i915_drv.h
> > > index f4487b9..03add1a 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -1703,6 +1703,8 @@ struct i915_execbuffer_params {
> > >   	struct drm_i915_gem_request     *request;
> > >   };
> > >   
> > > +struct i915_scheduler;
> > > +
> > >   /* used in computing the new watermarks state */
> > >   struct intel_wm_config {
> > >   	unsigned int num_pipes_active;
> > > @@ -1968,6 +1970,8 @@ struct drm_i915_private {
> > >   
> > >   	struct i915_runtime_pm pm;
> > >   
> > > +	struct i915_scheduler *scheduler;
> > > +
> > I think we should have i915.enable_scheduler parameter (just like
> > i915.enable_execlists) and make this a data member, not pointer.
> > 
> > Then we can go forward and have i915.enable_scheduler_preempt and
> > so on
> > to control things at boot time.
> There are. The enable_scheduler comes in as patch #21. Including it
> from 
> the start means modifying i915_params.[ch] in this patch as well.
> And 
> then you are getting into one big patch that contains everything and
> is 
> unreviewable due to its sheer size. The enable_preemption parameter 
> comes in with the pre-emption code. No point in adding that until
> there 
> is something for it to control.
> 
> 
> > 
> > 
> > > 
> > >   	/* Abstract the submission mechanism (legacy ringbuffer
> > > or execlists) away */
> > >   	struct {
> > >   		int (*execbuf_submit)(struct
> > > i915_execbuffer_params *params,
> > > @@ -2290,6 +2294,8 @@ struct drm_i915_gem_request {
> > >   	/** process identifier submitting this request */
> > >   	struct pid *pid;
> > >   
> > > +	struct i915_scheduler_queue_entry	*scheduler_qe;
> > > +
> > Funny whitespace.
> > 
> > > 
> > >   	/**
> > >   	 * The ELSP only accepts two elements at a time, so we
> > > queue
> > >   	 * context/tail pairs on a given queue (ring->execlist_queue) until the
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > > b/drivers/gpu/drm/i915/i915_gem.c
> > > index dfe43ea..7d9aa24 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -32,6 +32,7 @@
> > >   #include "i915_vgpu.h"
> > >   #include "i915_trace.h"
> > >   #include "intel_drv.h"
> > > +#include "i915_scheduler.h"
> > >   #include
> > >   #include
> > >   #include
> > > @@ -5315,6 +5316,10 @@ int i915_gem_init(struct drm_device *dev)
> > >   	 */
> > >   	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> > >   
> > > +	ret = i915_scheduler_init(dev);
> > > +	if (ret)
> > > +		goto out_unlock;
> > > +
> > >   	ret = i915_gem_init_userptr(dev);
> > >   	if (ret)
> > >   		goto out_unlock;
> > > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c
> > > b/drivers/gpu/drm/i915/i915_scheduler.c
> > > new file mode 100644
> > > index 0000000..fc23ee7
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > > @@ -0,0 +1,874 @@
> > > +/*
> > > + * Copyright (c) 2014 Intel Corporation
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person
> > > obtaining a
> > > + * copy of this software and associated documentation files (the
> > > "Software"),
> > > + * to deal in the Software without restriction, including
> > > without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute,
> > > sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to
> > > whom the
> > > + * Software is furnished to do so, subject to the following
> > > conditions:
> > > + *
> > > + * The above copyright notice and this permission notice
> > > (including the next
> > > + * paragraph) shall be included in all copies or substantial
> > > portions of the
> > > + * Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> > > KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > > MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
> > > EVENT SHALL
> > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
> > > DAMAGES OR OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> > > OTHERWISE, ARISING
> > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > > OTHER DEALINGS
> > > + * IN THE SOFTWARE.
> > > + *
> > > + */
> > > +
> > > +#include "i915_drv.h"
> > > +#include "intel_drv.h"
> > > +#include "i915_scheduler.h"
> > > +
> > > +/**
> > > + * i915_scheduler_is_enabled - Returns true if the scheduler is
> > > enabled.
> > > + * @dev: DRM device
> > > + */
> > > +bool i915_scheduler_is_enabled(struct drm_device *dev)
> > > +{
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +
> > > +	return dev_priv->scheduler != NULL;
> > > +}
> > i915.enable_scheduler would be used instead.
> It is when it arrives.
> 
> > 
> > 
> > > 
> > > +
> > > +/**
> > > + * i915_scheduler_init - Initialise the scheduler.
> > > + * @dev: DRM device
> > > + * Returns zero on success or -ENOMEM if memory allocations
> > > fail.
> > > + */
> > > +int i915_scheduler_init(struct drm_device *dev)
> > > +{
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	int r;
> > > +
> > This protection -->
> > 
> > > 
> > > +	if (scheduler)
> > > +		return 0;
> > > +
> > > +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> > > +	if (!scheduler)
> > > +		return -ENOMEM;
> > > +
> > Is not needed if we have data member and i915.enable_scheduler.
> Making the scheduler structure an object rather than a pointer
> basically 
> means moving the whole of i915_scheduler.h into i915_drv.h. There is
> too 
> much interconnectedness between the various structures. I would say
> that 
> keeping the code nicely partitioned and readable is worth the one 
> dynamic allocation at driver load time.
> 
> > 
> > 
> > > 
> > > +	spin_lock_init(&scheduler->lock);
> > > +
> > > +	for (r = 0; r < I915_NUM_RINGS; r++)
> > > +		INIT_LIST_HEAD(&scheduler->node_queue[r]);
> > > +
> > > +	/* Default tuning values: */
> > > +	scheduler->priority_level_min     = -1023;
> > > +	scheduler->priority_level_max     = 1023;
> > > +	scheduler->priority_level_bump    = 50;
> > > +	scheduler->priority_level_preempt = 900;
> > > +	scheduler->min_flying             = 2;
> > > +
> > > +	dev_priv->scheduler = scheduler;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * Add a popped node back in to the queue. For example, because
> > > the ring was
> > > + * hung when execfinal() was called and thus the ring submission
> > > needs to be
> > > + * retried later.
> > > + */
> > > +static void i915_scheduler_node_requeue(struct
> > > i915_scheduler_queue_entry *node)
> > > +{
> > > +	WARN_ON(!node);
> > Rather remove this completely as it is an internal function, or use
> > BUG_ON because next line will straight lead to OOPS after warning.
> > I'll
> > pass on mentioning the rest of the obvious WARN_ON vs BUG_ON's.
> Yeah, most of the WARN/BUG_ONs are probably unnecessary now. They
> were 
> scattered liberally around during initial development when strange
> and 
> unexpected things could happen. Now that the code paths are pretty
> well 
> established they aren't really much use. I'll reduce them down to
> just 
> the ones that still make sense.
> 
> 
> > 
> > 
> > > 
> > > +	WARN_ON(!I915_SQS_IS_FLYING(node));
> > > +
> > > +	/* Seqno will be reassigned on relaunch */
> > > +	node->params.request->seqno = 0;
> > > +	node->status = i915_sqs_queued;
> > > +}
> > > +
> > > +/*
> > > + * Give up on a node completely. For example, because it is
> > > causing the
> > > + * ring to hang or is using some resource that no longer exists.
> > > + */
> > > +static void i915_scheduler_node_kill(struct
> > > i915_scheduler_queue_entry *node)
> > > +{
> > > +	WARN_ON(!node);
> > > +	WARN_ON(I915_SQS_IS_COMPLETE(node));
> > > +
> > > +	node->status = i915_sqs_dead;
> > > +}
> > > +
> > > +/* Mark a node as in flight on the hardware. */
> > As for documentation, from below, I only assume scheduler lock must
> > be
> > held and nodes can only be manipulated when scheduler lock is
> > held?
> > It's good to add a comment describing the expected locking. Or
> > maybe
> > add assert_scheduler_lock_held() helper and sprinkle it around the
> > functions similar to assert_rpm_wakelock_held(), which is self-
> > documenting?
> > 
> > > 
> > > +static int i915_scheduler_node_fly(struct
> > > i915_scheduler_queue_entry *node)
> > > +{
> > > +	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
> > If node is NULL, this would already OOPS, I do not think it's
> > unreasonable to expect node not to be NULL. The below WARN_ON check
> > is
> > never reached.
> > 
> > > 
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct intel_engine_cs *ring;
> > > +
> > > +	WARN_ON(!scheduler);
> > > +	WARN_ON(!node);
> > > +	WARN_ON(node->status != i915_sqs_popped);
> > > +
> > > +	ring = node->params.ring;
> > > +
> > This and similar assignments can be squashed to declaration line,
> > as
> > this is basically an alias.
> > 
> > > 
> > > +	/*
> > > +	 * Add the node (which should currently be in state
> > > popped) to the
> > > +	 * front of the queue. This ensures that flying nodes are
> > > always held
> > > +	 * in hardware submission order.
> > > +	 */
> > > +	list_add(&node->link, &scheduler->node_queue[ring->id]);
> > > +
> > > +	node->status = i915_sqs_flying;
> > > +
> > > +	if (!(scheduler->flags[ring->id] &
> > > i915_sf_interrupts_enabled)) {
> > > +		bool success = true;
> > > +
> > > +		success = ring->irq_get(ring);
> > > +		if (success)
> > > +			scheduler->flags[ring->id] |=
> > > i915_sf_interrupts_enabled;
> > > +		else
> > > +			return -EINVAL;
> > Shouldn't the node be cleaned up from node_queue in case of an
> > error?
> Not really. The batch buffer will still be executed regardless of 
> whether interrupts could be enabled or not. The return code is
> actually 
> ignored so this should really be a void function, I guess. Without 
> interrupts then completion might take a while to be noticed but
> stuff 
> should still basically work.
> 
> 
> > 
> > 
> > Also, I think the above could be written in much more compact form
> > (because it
> > looks like something where more logic will be added later). It
> > makes it
> > easier to write and visualize the error paths when reading if there
> > are
> > no nested if's.
> > 
> > I won't mention the error paths of functions below; I expect the
> > following guideline to be adhered to:
> > 
> > 	if (scheduler->flags[ring->id] & I915_SF_INT_ENABLED)
> > 		goto out;
> > 
> > 	if (!ring->irq_get(ring))
> > 		goto err_irq;
> > 
> > 	if (!...)
> > 		goto err_foo;
> > 
> > 	scheduler->flags[ring->id] |= I915_SF_INT_ENABLED;
> > out:
> > 	return 0;
> > err_foo:
> > 	foobar();
> > err_irq:
> > 	list_remove()
> > 	return -EINVAL;
> > 
> > > 
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static uint32_t i915_scheduler_count_flying(struct
> > > i915_scheduler *scheduler,
> > > +					    struct
> > > intel_engine_cs *ring)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	uint32_t flying = 0;
> > > +
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
> > +1 for the for_each_scheduler_node(...)
> > 
> > > 
> > > +		if (I915_SQS_IS_FLYING(node))
> > > +			flying++;
> > > +
> > > +	return flying;
> > > +}
> > > +
> > > +static void i915_scheduler_priority_bump_clear(struct
> > > i915_scheduler *scheduler)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int i;
> > > +
> > > +	/*
> > > +	 * Ensure circular dependencies don't cause problems and
> > > that a bump
> > > +	 * by object usage only bumps each using buffer once:
> > > +	 */
> > > +	for (i = 0; i < I915_NUM_RINGS; i++) {
> > > +		list_for_each_entry(node, &scheduler->node_queue[i], link)
> > > +			node->bumped = false;
> > > +	}
> > > +}
> > > +
> > > +static int i915_scheduler_priority_bump(struct i915_scheduler
> > > *scheduler,
> > > +				struct
> > > i915_scheduler_queue_entry *target,
> > Is there a specific reason why struct i915_scheduler_queue_entry is
> > not
> > referred to just as "node" but "qe" and here something else, do
> > "node"
> > and "qe" have a special semantic meaning?
> Not sure what you mean. The use of 'node' is because almost
> everywhere 
> an i915_scheduler_queue_entry object is being used it is as a node in
> a 
> list that is being iterated over. The only 'qe' variable is in 
> 'i915_scheduler_queue_execbuffer[_bypass]' and there it is because
> the 
> qe is being passed in from outside and is copied to a local 'node'
> that 
> is then added to the list. I think the use of 'target' here is
> largely 
> historical. The subsequent array search was originally a list search
> and 
> hence had a 'node' that was used as the iterator. Thus the qe passed
> in 
> was called something else to avoid conflicts and target seemed
> appropriate.
> 
> > 
> > > 
> > > +				uint32_t bump)
> > > +{
> > > +	uint32_t new_priority;
> > > +	int i, count;
> > > +
> > > +	if (target->priority >= scheduler->priority_level_max)
> > > +		return 1;
> > So if one node reaches maximum priority the dependencies are
> > expected
> > to be maxed out already?
> Yes. Dependencies should always be of equal or higher priority to
> the 
> dependee. When a new dependency is created, the priority of the
> dependee 
> is added on to ensure that something of high priority can never be
> stuck 
> waiting on something of low priority. Also, it means that a batch
> buffer 
> that lots of other buffers are waiting on will be bumped to higher
> and 
> higher priority the more waiters there are and thus is more likely to
> be 
> executed sooner and free them all up.
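A toy model of the propagation described here (an illustrative userspace sketch; the real logic is i915_scheduler_priority_bump() in the patch): bumping a node recursively bumps everything on its dependency list, a 'bumped' flag prevents circular chains from recursing forever, and a node already at maximum priority is assumed to have maxed-out dependencies:

```c
/* Toy model of recursive priority propagation: a dependency never sits
 * at a lower priority than the node waiting on it. Returns the number
 * of nodes actually bumped. */
#include <stdbool.h>
#include <stddef.h>

#define PRIO_MAX 1023

struct toy_node {
	int priority;
	bool bumped;                 /* guards against circular chains */
	struct toy_node *deps[4];
	size_t num_deps;
};

static int toy_bump(struct toy_node *target, int bump)
{
	int count;

	/* At the ceiling: dependencies are assumed maxed out already. */
	if (target->priority >= PRIO_MAX)
		return 1;
	if (target->bumped)
		return 0;

	target->priority += bump;
	if (target->priority > PRIO_MAX)
		target->priority = PRIO_MAX;
	target->bumped = true;
	count = 1;

	for (size_t i = 0; i < target->num_deps; i++)
		if (target->deps[i] && !target->deps[i]->bumped)
			count += toy_bump(target->deps[i], bump);

	return count;
}
```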
> 
> 
> > 
> > > 
> > > +
> > > +	if (target->bumped)
> > > +		return 0;
> > > +
> > > +	new_priority = target->priority + bump;
> > > +	if ((new_priority <= target->priority) ||
> > > +	    (new_priority > scheduler->priority_level_max))
> > > +		target->priority = scheduler->priority_level_max;
> > > +	else
> > > +		target->priority = new_priority;
> > > +
> > > +	count = 1;
> > > +	target->bumped = true;
> > > +
> > > +	for (i = 0; i < target->num_deps; i++) {
> > > +		if (!target->dep_list[i])
> > > +			continue;
> > > +
> > > +		if (target->dep_list[i]->bumped)
> > > +			continue;
> > > +
> > > +		count += i915_scheduler_priority_bump(scheduler,
> > > +						      target->dep_list[i],
> > > +						      bump);
> > > +	}
> > > +
> > > +	return count;
> > > +}
> > > +
> > > +/*
> > > + * Nodes are considered valid dependencies if they are queued on
> > > any ring or
> > > + * if they are in flight on a different ring. In flight on the
> > > same ring is no
> > > + * longer interesting for non-preemptive nodes as the ring
> > > serialises execution.
> > > + * For pre-empting nodes, all in flight dependencies are valid
> > > as they must not
> > > + * be jumped by the act of pre-empting.
> > > + *
> > > + * Anything that is neither queued nor flying is uninteresting.
> > > + */
> > > +static inline bool i915_scheduler_is_dependency_valid(
> > > +			struct i915_scheduler_queue_entry *node,
> > > uint32_t idx)
> > > +{
> > > +	struct i915_scheduler_queue_entry *dep;
> > > +
> > > +	dep = node->dep_list[idx];
> > > +	if (!dep)
> > > +		return false;
> > > +
> > > +	if (I915_SQS_IS_QUEUED(dep))
> > > +		return true;
> > > +
> > > +	if (I915_SQS_IS_FLYING(dep)) {
> > > +		if (node->params.ring != dep->params.ring)
> > > +			return true;
> > > +	}
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static int i915_scheduler_pop_from_queue_locked(struct
> > > intel_engine_cs *ring,
> > > +				struct
> > > i915_scheduler_queue_entry **pop_node)
> > > +{
> > > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *best = NULL;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int ret;
> > > +	int i;
> > > +	bool any_queued = false;
> > > +	bool has_local, has_remote, only_remote = false;
> > > +
> > > +	*pop_node = NULL;
> > > +	ret = -ENODATA;
> > > +
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +		if (!I915_SQS_IS_QUEUED(node))
> > > +			continue;
> > > +		any_queued = true;
> > > +
> > > +		has_local  = false;
> > > +		has_remote = false;
> > > +		for (i = 0; i < node->num_deps; i++) {
> > > +			if
> > > (!i915_scheduler_is_dependency_valid(node, i))
> > > +				continue;
> > > +
> > > +			if (node->dep_list[i]->params.ring ==
> > > node->params.ring)
> > > +				has_local = true;
> > > +			else
> > > +				has_remote = true;
> > > +		}
> > > +
> > > +		if (has_remote && !has_local)
> > > +			only_remote = true;
> > > +
> > > +		if (!has_local && !has_remote) {
> > > +			if (!best ||
> > > +			    (node->priority > best->priority))
> > > +				best = node;
> > > +		}
> > > +	}
> > > +
> > > +	if (best) {
> > > +		list_del(&best->link);
> > > +
> > > +		INIT_LIST_HEAD(&best->link);
> > > +		best->status  = i915_sqs_popped;
> > > +
> > > +		ret = 0;
> > > +	} else {
> > > +		/* Can only get here if:
> > > +		 * (a) there are no buffers in the queue
> > > +		 * (b) all queued buffers are dependent on other
> > > buffers
> > > +		 *     e.g. on a buffer that is in flight on a
> > > different ring
> > > +		 */
> > > +		if (only_remote) {
> > > +			/* The only dependent buffers are on
> > > another ring. */
> > > +			ret = -EAGAIN;
> > > +		} else if (any_queued) {
> > > +			/* It seems that something has gone
> > > horribly wrong! */
> > > +			DRM_ERROR("Broken dependency tracking on
> > > ring %d!\n",
> > > +				  (int) ring->id);
> > Would this condition be worthy of WARN_ONCE(), because it will keep
> > repeating as the situation will persist?
> > 
> > > 
> > > +		}
> > > +	}
> > > +
> > > +	*pop_node = best;
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * NB: The driver mutex lock must be held before calling this
> > > function. It is
> > > + * only really required during the actual back end submission
> > > call. However,
> > > + * attempting to acquire a mutex while holding a spin lock is a
> > > Bad Idea.
> > > + * And releasing the one before acquiring the other leads to
> > > other code
> > > + * being run and interfering.
> > > + */
> > > +static int i915_scheduler_submit(struct intel_engine_cs *ring)
> > > +{
> > > +	struct drm_device *dev = ring->dev;
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int ret, count = 0, flying;
> > > +
> > > +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> > > +
> > > +	spin_lock_irq(&scheduler->lock);
> > > +
> > > +	WARN_ON(scheduler->flags[ring->id] &
> > > i915_sf_submitting);
> > > +	scheduler->flags[ring->id] |= i915_sf_submitting;
> > > +
> > > +	/* First time around, complain if anything unexpected
> > > occurs: */
> > > +	ret = i915_scheduler_pop_from_queue_locked(ring, &node);
> > > +	if (ret)
> > > +		goto error;
> > > +
> > > +	do {
> > > +		WARN_ON(!node);
> > > +		WARN_ON(node->params.ring != ring);
> > > +		WARN_ON(node->status != i915_sqs_popped);
> > > +		count++;
> > > +
> > > +		/*
> > > +		 * The call to pop above will have removed the
> > > node from the
> > > +		 * list. So add it back in and mark it as in
> > > flight.
> > > +		 */
> > > +		i915_scheduler_node_fly(node);
> > > +
> > > +		spin_unlock_irq(&scheduler->lock);
> > > +		ret = dev_priv->gt.execbuf_final(&node->params);
> > > +		spin_lock_irq(&scheduler->lock);
> > > +
> > > +		/*
> > > +		 * Handle failed submission but first check that
> > > the
> > > +		 * watchdog/reset code has not nuked the node
> > > while we
> > > +		 * weren't looking:
> > > +		 */
> > > +		if (ret && (node->status != i915_sqs_dead)) {
> > > +			bool requeue = true;
> > > +
> > > +			/*
> > > +			 * Oh dear! Either the node is broken or
> > > the ring is
> > > +			 * busy. So need to kill the node or
> > > requeue it and try
> > > +			 * again later as appropriate.
> > > +			 */
> > > +
> > > +			switch (-ret) {
> > > +			case ENODEV:
> > > +			case ENOENT:
> > > +				/* Fatal errors. Kill the node.
> > > */
> > > +				requeue = false;
> > > +				i915_scheduler_node_kill(node);
> > > +			break;
> > These break indents still.
> > 
> > > 
> > > +
> > > +			case EAGAIN:
> > > +			case EBUSY:
> > > +			case EIO:
> > > +			case ENOMEM:
> > > +			case ERESTARTSYS:
> > > +			case EINTR:
> > > +				/* Supposedly recoverable
> > > errors. */
> > > +			break;
> > > +
> > > +			default:
> > > +				/*
> > > +				 * Assume the error is
> > > recoverable and hope
> > > +				 * for the best.
> > > +				 */
> > > +				MISSING_CASE(-ret);
> > > +			break;
> > > +			}
> > > +
> > > +			if (requeue) {
> > > +				i915_scheduler_node_requeue(node);
> > > +				/*
> > > +				 * No point spinning if the ring
> > > is currently
> > > +				 * unavailable so just give up
> > > and come back
> > > +				 * later.
> > > +				 */
> > > +				break;
> > > +			}
> > > +		}
> > > +
> > > +		/* Keep launching until the sky is sufficiently
> > > full. */
> > > +		flying = i915_scheduler_count_flying(scheduler,
> > > ring);
> > > +		if (flying >= scheduler->min_flying)
> > > +			break;
> > > +
> > > +		/* Grab another node and go round again... */
> > > +		ret = i915_scheduler_pop_from_queue_locked(ring,
> > > &node);
> > > +	} while (ret == 0);
> > > +
> > > +	/* Don't complain about not being able to submit extra
> > > entries */
> > > +	if (ret == -ENODATA)
> > > +		ret = 0;
> > > +
> > > +	/*
> > > +	 * Bump the priority of everything that was not
> > > submitted to prevent
> > > +	 * starvation of low priority tasks by a spamming high
> > > priority task.
> > > +	 */
> > This, this calls for an I-G-T test to make sure we're effective.
> Yeah, IGTs are under development.
> 
> > 
> > > 
> > > +	i915_scheduler_priority_bump_clear(scheduler);
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +		if (!I915_SQS_IS_QUEUED(node))
> > > +			continue;
> > > +
> > > +		i915_scheduler_priority_bump(scheduler, node,
> > > +					     scheduler->priority_level_bump);
> > > +	}
> > > +
> > > +	/* On success, return the number of buffers submitted.
> > > */
> > > +	if (ret == 0)
> > > +		ret = count;
> > > +
> > > +error:
> > > +	scheduler->flags[ring->id] &= ~i915_sf_submitting;
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +	return ret;
> > > +}
> > > +
> > > +static void i915_generate_dependencies(struct i915_scheduler *scheduler,
> > > +				       struct i915_scheduler_queue_entry *node,
> > > +				       uint32_t ring)
> > > +{
> > > +	struct i915_scheduler_obj_entry *this, *that;
> > > +	struct i915_scheduler_queue_entry *test;
> > > +	int i, j;
> > > +	bool found;
> > > +
> > > +	list_for_each_entry(test, &scheduler->node_queue[ring], link) {
> > > +		if (I915_SQS_IS_COMPLETE(test))
> > > +			continue;
> > > +
> > > +		/*
> > > +		 * Batches on the same ring for the same
> > > +		 * context must be kept in order.
> > > +		 */
> > > +		found = (node->params.ctx == test->params.ctx) &&
> > > +			(node->params.ring == test->params.ring);
> > > +
> > > +		/*
> > > +		 * Batches working on the same objects must
> > > +		 * be kept in order.
> > > +		 */
> > > +		for (i = 0; (i < node->num_objs) && !found; i++) {
> > > +			this = node->saved_objects + i;
> > node->objs and node->num_objs would be a better naming match.
> > num_objs and saved_objects makes me think we have a bug here,
> > indexing the wrong array.
> > 
> > > 
> > > +
> > > +			for (j = 0; j < test->num_objs; j++) {
> > > +				that = test->saved_objects + j;
> > > +
> > > +				if (this->obj != that->obj)
> > > +					continue;
> > > +
> > > +				/* Only need to worry about writes */
> > > +				if (this->read_only && that->read_only)
> > > +					continue;
> > > +
> > > +				found = true;
> > > +				break;
> > > +			}
> > > +		}
> > > +
> > > +		if (found) {
> > > +			node->dep_list[node->num_deps] = test;
> > > +			node->num_deps++;
> > > +		}
> > > +	}
> > > +}
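[Editor's illustration: the ordering rule i915_generate_dependencies() applies — two batches need ordering only if they touch the same object and at least one of them writes it — can be modeled standalone. The struct and names below are illustrative stand-ins, not the driver's types.]

```c
#include <assert.h>
#include <stdbool.h>

struct obj_ref {
	int obj_id;        /* stands in for the GEM object pointer */
	bool read_only;
};

/*
 * Two batches must be kept in order if they reference the same
 * object and at least one of them writes it. Read-read sharing
 * needs no dependency.
 */
static bool needs_dependency(const struct obj_ref *a, const struct obj_ref *b)
{
	if (a->obj_id != b->obj_id)
		return false;
	return !(a->read_only && b->read_only);
}
```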
> > > +
> > > +static int i915_scheduler_queue_execbuffer_bypass(struct i915_scheduler_queue_entry *qe)
> > > +{
> > > +	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	int ret;
> > > +
> > > +	scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> > > +	ret = dev_priv->gt.execbuf_final(&qe->params);
> > > +	scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
> > > +
> > > +
> > The above code makes me think of locking again, maybe document at
> > top
> > of this function.
> No need to acquire the spinlock at this point. If the scheduler is in
> bypass mode then there is no internal state to protect. The driver
> mutex lock will be held because there is only one code path to get
> here, which is the execbuff IOCTL. The whole point of bypass mode is
> that it is a single contiguous execution path from IOCTL entry all the
> way through to hardware submission. A mutex check could be added but
> it seems unnecessary given the limited calling scope.
> 
> > 
> > 
> > > 
> > > +	/*
> > > +	 * Don't do any clean up on failure because the caller will
> > > +	 * do it all anyway.
> > > +	 */
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	/* Need to release any resources held by the node: */
> > > +	i915_scheduler_clean_node(qe);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_queue_execbuffer - Submit a batch buffer request to the
> > > + * scheduler.
> > > + * @qe: The batch buffer request to be queued.
> > > + * The expectation is the qe passed in is a local stack variable. This
> > > + * function will copy its contents into a freshly allocated list node. The
> > > + * new node takes ownership of said contents so the original qe should simply
> > > + * be discarded and not cleaned up (i.e. don't free memory it points to or
> > > + * dereference objects it holds). The node is added to the scheduler's queue
> > > + * and the batch buffer will be submitted to the hardware at some future
> > > + * point in time (which may be immediately, before returning, or may be quite
> > > + * a lot later).
> > > + */
> > > +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
> > > +{
> > > +	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct intel_engine_cs *ring = qe->params.ring;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	struct i915_scheduler_queue_entry *test;
> > > +	bool not_flying;
> > > +	int i, r;
> > > +	int incomplete = 0;
> > > +
> > > +	WARN_ON(!scheduler);
> > > +
> > > +	if (1/*!i915.enable_scheduler*/)
> > > +		return i915_scheduler_queue_execbuffer_bypass(qe);
> > > +
> > I'm thinking, should we branch already before calling a function
> > named i915_scheduler_queue_execbuffer if the scheduler is disabled?
> That would require putting if(scheduler) logic in random points
> throughout the driver. Doing it this way, the driver as a whole has a
> single execution path which is via the scheduler. It is then up to the
> scheduler itself to decide whether to run in bypass mode, simple
> re-ordering mode, full pre-emption mode or any other mode we feel a
> need to add.
> 
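[Editor's illustration: this single-entry-point design can be sketched in a few lines of standalone C. All names and types below are illustrative stand-ins for the driver's, and the `enabled` flag models the i915.enable_scheduler parameter.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the driver's types. */
struct sched {
	bool enabled;   /* models i915.enable_scheduler */
	int queued;     /* nodes waiting in the internal queue */
	int submitted;  /* batches handed straight to "hardware" */
};

/* Bypass mode: execute immediately; no internal state to protect. */
static int sched_bypass(struct sched *s)
{
	s->submitted++;
	return 0;
}

/*
 * Single entry point: callers never test whether the scheduler is
 * enabled; the scheduler itself chooses bypass vs. queueing mode.
 */
static int sched_queue_execbuffer(struct sched *s)
{
	if (!s->enabled)
		return sched_bypass(s);
	s->queued++;
	return 0;
}
```

The benefit is exactly what the reply describes: no `if (scheduler)` checks scattered through callers, only mode decisions inside the scheduler.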
> 
> > 
> > > 
> > > +	node = kmalloc(sizeof(*node), GFP_KERNEL);
> > > +	if (!node)
> > > +		return -ENOMEM;
> > > +
> > > +	*node = *qe;
> > Now I read the added documentation for the function, maybe we
> > should at least zero out qe to avoid future problems?
> The execbuff success path hands the qe contents over and pretty much
> immediately drops all the way back out of the IOCTL. The failure path
> does the full cleanup of de-allocating all the qe stuff it had
> previously allocated. IMO there doesn't seem to be much need to zero
> out the structure at this point.
> 
> > 
> > 
> > > 
> > > +	INIT_LIST_HEAD(&node->link);
> > > +	node->status = i915_sqs_queued;
> > > +	node->stamp  = jiffies;
> > > +	i915_gem_request_reference(node->params.request);
> > > +
> > > +	WARN_ON(node->params.request->scheduler_qe);
> > > +	node->params.request->scheduler_qe = node;
> > > +
> > > +	/* Need to determine the number of incomplete entries in the list as
> > > +	 * that will be the maximum size of the dependency list.
> > > +	 *
> > > +	 * Note that the allocation must not be made with the spinlock acquired
> > > +	 * as kmalloc can sleep. However, the unlock/relock is safe because no
> > > +	 * new entries can be queued up during the unlock as the i915 driver
> > > +	 * mutex is still held. Entries could be removed from the list but that
> > > +	 * just means the dep_list will be over-allocated which is fine.
> > > +	 */
> > > +	spin_lock_irq(&scheduler->lock);
> > -->
> > 
> > > 
> > > +	for (r = 0; r < I915_NUM_RINGS; r++) {
> > > +		list_for_each_entry(test, &scheduler->node_queue[r], link) {
> > > +			if (I915_SQS_IS_COMPLETE(test))
> > > +				continue;
> > > +
> > > +			incomplete++;
> > > +		}
> > > +	}
> > > +
> > <-- This counting structure, I still think it would be a good idea
> > to make it a static helper.
> > 
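[Editor's illustration: the static helper suggested here is easy to model standalone. The fixed-size arrays below stand in for the driver's per-ring linked lists; names are illustrative. Note an over-count is harmless since the result only sizes the dep_list allocation.]

```c
#include <assert.h>

#define NUM_RINGS 4
#define QUEUE_MAX 8

enum status { SQS_QUEUED, SQS_FLYING, SQS_COMPLETE, SQS_DEAD };
#define IS_COMPLETE(s) ((s) == SQS_COMPLETE || (s) == SQS_DEAD)

struct queues {
	enum status node[NUM_RINGS][QUEUE_MAX];
	int count[NUM_RINGS];   /* nodes present on each ring's queue */
};

/*
 * Count not-yet-complete nodes across every ring's queue: the
 * maximum possible size of a new node's dependency list.
 */
static int sched_count_incomplete(const struct queues *q)
{
	int r, i, incomplete = 0;

	for (r = 0; r < NUM_RINGS; r++)
		for (i = 0; i < q->count[r]; i++)
			if (!IS_COMPLETE(q->node[r][i]))
				incomplete++;
	return incomplete;
}
```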
> > > 
> > > +	/* Temporarily unlock to allocate memory: */
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +	if (incomplete) {
> > > +		node->dep_list = kmalloc_array(incomplete,
> > > +					       sizeof(*node->dep_list),
> > > +					       GFP_KERNEL);
> > > +		if (!node->dep_list) {
> > > +			kfree(node);
> > > +			return -ENOMEM;
> > > +		}
> > > +	} else
> > > +		node->dep_list = NULL;
> > > +
> > > +	spin_lock_irq(&scheduler->lock);
> > > +	node->num_deps = 0;
> > > +
> > > +	if (node->dep_list) {
> > > +		for (r = 0; r < I915_NUM_RINGS; r++)
> > > +			i915_generate_dependencies(scheduler, node, r);
> > > +
> > > +		WARN_ON(node->num_deps > incomplete);
> > > +	}
> > > +
> > > +	node->priority = clamp(node->priority,
> > > +			       scheduler->priority_level_min,
> > > +			       scheduler->priority_level_max);
> > > +
> > > +	if ((node->priority > 0) && node->num_deps) {
> > > +		i915_scheduler_priority_bump_clear(scheduler);
> > > +
> > > +		for (i = 0; i < node->num_deps; i++)
> > > +			i915_scheduler_priority_bump(scheduler,
> > > +					node->dep_list[i], node->priority);
> > If node is posted with maximum priority to begin with, wouldn't the
> > priority propagation stop after first level dependencies due to the
> > check in the beginning of priority_bump function?
> The recursion will stop whenever it encounters a node already at
> maximum priority. However, it is not starting from the newly submitted
> node but from that node's dependencies. If they are already at max
> priority then their dependencies must also be, and so on down the
> chain, and no further processing is required. On the other hand, if
> they are not then they will be bumped and their dependencies checked
> and so on down the chain.
> 
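[Editor's illustration: the stop-at-max recursion described above, modeled standalone. Types and names are simplified stand-ins; the real function also clamps against scheduler->priority_level_max and uses a `bumped` flag cleared beforehand by i915_scheduler_priority_bump_clear().]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPS 4
#define PRIO_MAX 1023

struct node {
	int priority;
	bool bumped;              /* cleared before each bump pass */
	int num_deps;
	struct node *dep[MAX_DEPS];
};

/*
 * Bump a node and, recursively, everything it depends on. A node
 * already at max priority must (by the invariant John describes)
 * already have max-priority dependencies, so recursion stops there.
 * Returns the number of nodes bumped.
 */
static int prio_bump(struct node *n, int bump)
{
	int i, count;

	if (n->bumped || n->priority >= PRIO_MAX)
		return 0;

	n->priority += bump;
	if (n->priority > PRIO_MAX)
		n->priority = PRIO_MAX;
	n->bumped = true;
	count = 1;

	for (i = 0; i < n->num_deps; i++)
		count += prio_bump(n->dep[i], bump);
	return count;
}
```

A three-node chain all gets bumped, while a chain headed by a max-priority node is left untouched, illustrating why the early-out is safe.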
> > 
> > > 
> > > +
> > > +	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
> > > +
> > > +	not_flying = i915_scheduler_count_flying(scheduler, ring) <
> > > +						 scheduler->min_flying;
> > > +
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +
> > > +	if (not_flying)
> > > +		i915_scheduler_submit(ring);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_notify_request - Notify the scheduler that the given
> > > + * request has completed on the hardware.
> > > + * @req: Request structure which has completed
> > > + * @preempt: Did it complete pre-emptively?
> > > + * A sequence number has popped out of the hardware and the request handling
> > > + * code has mapped it back to a request and will mark that request complete.
> > > + * It also calls this function to notify the scheduler about the completion
> > > + * so the scheduler's node can be updated appropriately.
> > > + * Returns true if the request is scheduler managed, false if not. The return
> > > + * value is combined for all freshly completed requests and if any were true
> > > + * then i915_scheduler_wakeup() is called so the scheduler can do further
> > > + * processing (submit more work) at the end.
> > > + */
> > > +bool i915_scheduler_notify_request(struct drm_i915_gem_request *req)
> > > +{
> > > +	struct drm_i915_private *dev_priv = to_i915(req->ring->dev);
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *node = req->scheduler_qe;
> > > +	unsigned long flags;
> > > +
> > > +	if (!node)
> > > +		return false;
> > > +
> > > +	spin_lock_irqsave(&scheduler->lock, flags);
> > > +
> > > +	WARN_ON(!I915_SQS_IS_FLYING(node));
> > > +
> > > +	/* Node was in flight so mark it as complete. */
> > > +	if (req->cancelled)
> > > +		node->status = i915_sqs_dead;
> > > +	else
> > > +		node->status = i915_sqs_complete;
> > > +
> > > +	spin_unlock_irqrestore(&scheduler->lock, flags);
> > > +
> > > +	return true;
> > > +}
> > > +
> > > +static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
> > > +				struct i915_scheduler_queue_entry *remove)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	int i, r;
> > > +	int count = 0;
> > > +
> > > +	/*
> > > +	 * Ensure that a node is not being removed which is still dependent
> > > +	 * upon other (not completed) work. If that happens, it implies
> > > +	 * something has gone very wrong with the dependency tracking! Note
> > > +	 * that there is no need to worry if this node has been explicitly
> > > +	 * killed for some reason - it might be being killed before it got
> > > +	 * sent to the hardware.
> > > +	 */
> > > +	if (remove->status != i915_sqs_dead) {
> > > +		for (i = 0; i < remove->num_deps; i++)
> > > +			if ((remove->dep_list[i]) &&
> > > +			    (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
> > > +				count++;
> > > +		WARN_ON(count);
> > > +	}
> > > +
> > > +	/*
> > > +	 * Remove this node from the dependency lists of any other node which
> > > +	 * might be waiting on it.
> > > +	 */
> > > +	for (r = 0; r < I915_NUM_RINGS; r++) {
> > > +		list_for_each_entry(node, &scheduler->node_queue[r], link) {
> > > +			for (i = 0; i < node->num_deps; i++) {
> > > +				if (node->dep_list[i] != remove)
> > > +					continue;
> > > +
> > > +				node->dep_list[i] = NULL;
> > > +			}
> > > +		}
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_wakeup - wake the scheduler's worker thread
> > > + * @dev: DRM device
> > > + * Called at the end of seqno interrupt processing if any request has
> > > + * completed that corresponds to a scheduler node.
> > > + */
> > > +void i915_scheduler_wakeup(struct drm_device *dev)
> > > +{
> > > +	/* XXX: Need to call i915_scheduler_remove() via work handler. */
> > > +}
> > > +
> > > +/**
> > > + * i915_scheduler_clean_node - free up any allocations/references
> > > + * associated with the given scheduler queue entry.
> > > + * @node: Queue entry structure which is complete
> > > + * After a given batch buffer completes on the hardware, all the information
> > > + * required to resubmit it is no longer required. However, the node entry
> > > + * itself might still be required for tracking purposes for a while longer.
> > > + * This function should be called as soon as the node is known to be complete
> > > + * so that these resources may be freed even though the node itself might
> > > + * hang around.
> > > + */
> > > +void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node)
> > > +{
> > > +	if (!I915_SQS_IS_COMPLETE(node)) {
> > > +		WARN(!node->params.request->cancelled,
> > > +		     "Cleaning active node: %d!\n", node->status);
> > > +		return;
> > > +	}
> > > +
> > > +	if (node->params.batch_obj) {
> > > +		/*
> > > +		 * The batch buffer must be unpinned before it is unreferenced
> > > +		 * otherwise the unpin fails with a missing vma!?
> > > +		 */
> > > +		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
> > > +			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
> > > +
> > > +
> > This code seems little bit out of place.
> Not sure where else you could put it. The code itself is just a copy
> of the existing clean up code that was already in the execbuff code
> path. The cleanup can no longer be done within the execbuff call due
> to the batch buffer execution being arbitrarily delayed by the
> scheduler. Thus it is down to the scheduler to do all the necessary
> cleanup when it knows that the batch buffer is no longer required.
> 
> > 
> > 
> > > 
> > > +		node->params.batch_obj = NULL;
> > > +	}
> > > +
> > > +	/* And anything else owned by the node: */
> > > +	if (node->params.cliprects) {
> > > +		kfree(node->params.cliprects);
> > > +		node->params.cliprects = NULL;
> > > +	}
> > > +}
> > > +
> > The below function is quite hacky, I understood this will be
> > mitigated by per-ctx sequence numbering in the future?
> To be honest, I think it is already obsolete due to the previous work
> of converting from seqno tests everywhere to testing request
> structures. The point of the code below was to cope with seqno values
> popping out of the hardware in random order due to the re-ordering of
> batch buffers prior to execution. Whereas, the struct request
> conversion allowed the seqno to be late allocated and thus kept in
> order.
> 
> Unfortunately, pre-emption confuses matters again. Although the
> intention is that seqno values will still be kept in order, a given
> request might have multiple different seqno values assigned to it
> throughout its life. The pre-emption code hasn't quite settled down
> yet though, especially with interesting new hardware planned for
> future chips. Thus I don't really want to rip out all of this code
> quite yet, just in case we do still need it.
> 
> As you say, per-ctx seqnos would definitely have an effect here as
> well. So yes, long term the intention is to remove all this nastiness.
> Short term, it seems best to keep it around until proven unnecessary.
> 
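[Editor's illustration: the wrap-safe seqno comparison that the code below leans on. The signed-subtraction trick is the standard idiom behind i915_seqno_passed(); the helper name here is a stand-in.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * "Has seq1 passed seq2?" via signed wrap-safe subtraction, so the
 * ordering test survives the 32-bit seqno counter wrapping around.
 */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}
```

This is why completed-but-higher seqnos must be remembered: the test only reports "passed" correctly while comparisons stay within half the counter's range of each other.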
> > 
> > 
> > > 
> > > +static bool i915_scheduler_remove(struct i915_scheduler *scheduler,
> > > +				  struct intel_engine_cs *ring,
> > > +				  struct list_head *remove)
> > > +{
> > > +	struct i915_scheduler_queue_entry *node, *node_next;
> > > +	int flying = 0, queued = 0;
> > > +	bool do_submit;
> > > +	uint32_t min_seqno;
> > > +
> > > +	spin_lock_irq(&scheduler->lock);
> > > +
> > > +	/*
> > > +	 * In the case where the system is idle, starting 'min_seqno' from a big
> > > +	 * number will cause all nodes to be removed as they are now back to
> > > +	 * being in-order. However, this will be a problem if the last one to
> > > +	 * complete was actually out-of-order as the ring seqno value will be
> > > +	 * lower than one or more completed buffers. Thus code looking for the
> > > +	 * completion of said buffers will wait forever.
> > > +	 * Instead, use the hardware seqno as the starting point. This means
> > > +	 * that some buffers might be kept around even in a completely idle
> > > +	 * system but it should guarantee that no-one ever gets confused when
> > > +	 * waiting for buffer completion.
> > > +	 */
> > > +	min_seqno = ring->get_seqno(ring, true);
> > > +
> > > +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> > > +		if (I915_SQS_IS_QUEUED(node))
> > > +			queued++;
> > > +		else if (I915_SQS_IS_FLYING(node))
> > > +			flying++;
> > > +		else if (I915_SQS_IS_COMPLETE(node))
> > > +			continue;
> > > +
> > > +		if (node->params.request->seqno == 0)
> > > +			continue;
> > > +
> > > +		if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
> > > +			min_seqno = node->params.request->seqno;
> > > +	}
> > > +
> > > +	INIT_LIST_HEAD(remove);
> > > +	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
> > > +		/*
> > > +		 * Only remove completed nodes which have a lower seqno than
> > > +		 * all pending nodes. While there is the possibility of the
> > > +		 * ring's seqno counting backwards, all higher buffers must
> > > +		 * be remembered so that the 'i915_seqno_passed()' test can
> > > +		 * report that they have in fact passed.
> > > +		 *
> > > +		 * NB: This is not true for 'dead' nodes. The GPU reset causes
> > > +		 * the software seqno to restart from its initial value. Thus
> > > +		 * the dead nodes must be removed even though their seqno values
> > > +		 * are potentially vastly greater than the current ring seqno.
> > > +		 */
> > > +		if (!I915_SQS_IS_COMPLETE(node))
> > > +			continue;
> > > +
> > > +		if (node->status != i915_sqs_dead) {
> > > +			if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
> > > +			    (node->params.request->seqno != min_seqno))
> > > +				continue;
> > > +		}
> > > +
> > > +		list_del(&node->link);
> > > +		list_add(&node->link, remove);
> > > +
> > > +		/* Strip the dependency info while the mutex is still locked */
> > > +		i915_scheduler_remove_dependent(scheduler, node);
> > > +
> > > +		continue;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Release the interrupt reference count if there are no longer any
> > > +	 * nodes to worry about.
> > > +	 */
> > > +	if (!flying && !queued &&
> > > +	    (scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
> > > +		ring->irq_put(ring);
> > > +		scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
> > Maybe have this piece of code as a helper? To enable interrupts and
> > disable them, maybe even a reference count?
> Not sure how much you could abstract out into a helper other than the
> two lines of irq_get|put + flag set|clear. The flying|queued test is
> particular to this one instance. Also, there is no need to reference
> count as the irq code does that already.
> 
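[Editor's illustration: the two-line helper pair under discussion, modeled standalone. Names and the counter standing in for ring->irq_get()/irq_put() are hypothetical; the point is simply keeping the flag and the irq reference in lock-step.]

```c
#include <assert.h>
#include <stdbool.h>

#define SF_INTERRUPTS_ENABLED (1u << 0)

struct ring_model {
	int irq_refcount;    /* the real irq code refcounts internally */
	unsigned int flags;
};

/* Take the irq reference and set the flag together, idempotently. */
static void sched_irq_enable(struct ring_model *ring)
{
	if (ring->flags & SF_INTERRUPTS_ENABLED)
		return;
	ring->irq_refcount++;          /* stands in for ring->irq_get() */
	ring->flags |= SF_INTERRUPTS_ENABLED;
}

/* Drop the irq reference and clear the flag together, idempotently. */
static void sched_irq_disable(struct ring_model *ring)
{
	if (!(ring->flags & SF_INTERRUPTS_ENABLED))
		return;
	ring->irq_refcount--;          /* stands in for ring->irq_put() */
	ring->flags &= ~SF_INTERRUPTS_ENABLED;
}
```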
> > 
> > 
> > > 
> > > +	}
> > > +
> > > +	/* Launch more packets now? */
> > > +	do_submit = (queued > 0) && (flying < scheduler->min_flying);
> > > +
> > > +	spin_unlock_irq(&scheduler->lock);
> > > +
> > > +	return do_submit;
> > > +}
> > > +
> > > +void i915_scheduler_process_work(struct intel_engine_cs *ring)
> > > +{
> > > +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> > > +	struct i915_scheduler *scheduler = dev_priv->scheduler;
> > > +	struct i915_scheduler_queue_entry *node;
> > > +	bool do_submit;
> > > +	struct list_head remove;
> > > +
> > > +	if (list_empty(&scheduler->node_queue[ring->id]))
> > > +		return;
> > > +
> > > +	/* Remove completed nodes. */
> > > +	do_submit = i915_scheduler_remove(scheduler, ring, &remove);
> > > +
> > > +	if (!do_submit && list_empty(&remove))
> > > +		return;
> > > +
> > > +	/* Need to grab the pm lock outside of the mutex lock */
> > > +	if (do_submit)
> > > +		intel_runtime_pm_get(dev_priv);
> > > +
> > > +	mutex_lock(&ring->dev->struct_mutex);
> > > +
> > > +	if (do_submit)
> > > +		i915_scheduler_submit(ring);
> > > +
> > > +	while (!list_empty(&remove)) {
> > > +		node = list_first_entry(&remove, typeof(*node), link);
> > > +		list_del(&node->link);
> > > +
> > > +		/* Free up all the DRM references */
> > > +		i915_scheduler_clean_node(node);
> > > +
> > > +		/* And anything else owned by the node: */
> > > +		node->params.request->scheduler_qe = NULL;
> > > +		i915_gem_request_unreference(node->params.request);
> > > +		kfree(node->dep_list);
> > > +		kfree(node);
> > Maybe have a separate helper for freeing a node, would make this
> > much
> > cleaner.
> There is a helper for freeing up the contents of the node - see
> i915_scheduler_clean_node() above. However, that function is called
> from multiple places and this is the only instance where the node
> itself is actually freed. The other callers either need to keep the
> node around for a while longer (i.e. until this piece of code can run)
> or don't own the node because it is stack allocated (in the bypass
> case). So the 'anything else' section cannot be moved into the
> existing clean function, and any secondary helper would be single
> usage and only four lines long. It doesn't seem worth it.
> 
> > 
> > 
> > > 
> > > +	}
> > > +
> > > +	mutex_unlock(&ring->dev->struct_mutex);
> > > +
> > > +	if (do_submit)
> > > +		intel_runtime_pm_put(dev_priv);
> > > +}
> > > diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> > > new file mode 100644
> > > index 0000000..415fec8
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> > > @@ -0,0 +1,95 @@
> > > +/*
> > > + * Copyright (c) 2014 Intel Corporation
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice (including the next
> > > + * paragraph) shall be included in all copies or substantial portions of the
> > > + * Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > > + * IN THE SOFTWARE.
> > > + *
> > > + */
> > > +
> > > +#ifndef _I915_SCHEDULER_H_
> > > +#define _I915_SCHEDULER_H_
> > > +
> > > +enum i915_scheduler_queue_status {
> > > +	/* Limbo: */
> > > +	i915_sqs_none = 0,
> > > +	/* Not yet submitted to hardware: */
> > > +	i915_sqs_queued,
> > > +	/* Popped from queue, ready to fly: */
> > > +	i915_sqs_popped,
> > > +	/* Sent to hardware for processing: */
> > > +	i915_sqs_flying,
> > > +	/* Finished processing on the hardware: */
> > > +	i915_sqs_complete,
> > > +	/* Killed by watchdog or catastrophic submission failure: */
> > > +	i915_sqs_dead,
> > > +	/* Limit value for use with arrays/loops */
> > > +	i915_sqs_MAX
> > > +};
> > > +
> > To uppercase.
> Grrr. Linux kernel style guide is daft.
> 
> > 
> > 
> > > 
> > > +#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
> > > +#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
> > > +#define I915_SQS_IS_COMPLETE(node)	(((node)->status == i915_sqs_complete) || \
> > > +					 ((node)->status == i915_sqs_dead))
> > > +
> > > +struct i915_scheduler_obj_entry {
> > > +	struct drm_i915_gem_object          *obj;
> > > +	bool                                read_only;
> > > +};
> > > +
> > > +struct i915_scheduler_queue_entry {
> > > +	struct i915_execbuffer_params       params;
> > > +	/* -1023 = lowest priority, 0 = default, 1023 = highest */
> > > +	int32_t                             priority;
> > > +	struct i915_scheduler_obj_entry     *saved_objects;
> > > +	int                                 num_objs;
> > > +	bool                                bumped;
> > > +	struct i915_scheduler_queue_entry   **dep_list;
> > > +	int                                 num_deps;
> > > +	enum i915_scheduler_queue_status    status;
> > > +	unsigned long                       stamp;
> > > +	struct list_head                    link;
> > Above whitespace is still incorrect and I could really use
> > comments.
> > 
> > > 
> > > +};
> > > +
> > > +struct i915_scheduler {
> > > +	struct list_head    node_queue[I915_NUM_RINGS];
> > > +	uint32_t            flags[I915_NUM_RINGS];
> > > +	spinlock_t          lock;
> > > +
> > > +	/* Tuning parameters: */
> > > +	int32_t             priority_level_min;
> > > +	int32_t             priority_level_max;
> > > +	int32_t             priority_level_bump;
> > > +	int32_t             priority_level_preempt;
> > > +	uint32_t            min_flying;
> > > +};
> > > +
> > > +/* Flag bits for i915_scheduler::flags */
> > > +enum {
> > > +	i915_sf_interrupts_enabled  = (1 << 0),
> > > +	i915_sf_submitting          = (1 << 1),
> > > +};
> > These must be I915_UPPERCASE_SOMETHING by convention.
> > 
> > > 
> > > +
> > > +bool i915_scheduler_is_enabled(struct drm_device *dev);
> > > +int i915_scheduler_init(struct drm_device *dev);
> > > +void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
> > > +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
> > > +bool i915_scheduler_notify_request(struct drm_i915_gem_request *req);
> > > +void i915_scheduler_wakeup(struct drm_device *dev);
> > > +
> > > +#endif  /* _I915_SCHEDULER_H_ */
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Thread overview: 82+ messages
2016-02-18 14:26 [PATCH v5 00/35] GPU scheduler for i915 driver John.C.Harrison
2016-02-18 14:26 ` [PATCH v5 01/35] drm/i915: Add total count to context status debugfs output John.C.Harrison
2016-02-18 14:26 ` [PATCH v5 02/35] drm/i915: Prelude to splitting i915_gem_do_execbuffer in two John.C.Harrison
2016-02-18 14:26 ` [PATCH v5 03/35] drm/i915: Split i915_dem_do_execbuffer() in half John.C.Harrison
2016-02-18 14:26 ` [PATCH v5 04/35] drm/i915: Cache request pointer in *_submission_final() John.C.Harrison
2016-02-18 14:26 ` [PATCH v5 05/35] drm/i915: Re-instate request->uniq because it is extremely useful John.C.Harrison
2016-02-18 14:26 ` [PATCH v5 06/35] drm/i915: Start of GPU scheduler John.C.Harrison
2016-02-19 13:03   ` Joonas Lahtinen
2016-02-19 17:03     ` John Harrison
2016-02-26  9:13       ` Joonas Lahtinen [this message]
2016-02-26 14:18         ` John Harrison
2016-02-18 14:26 ` [PATCH v5 07/35] drm/i915: Prepare retire_requests to handle out-of-order seqnos John.C.Harrison
2016-02-19 19:23   ` Jesse Barnes
2016-02-18 14:26 ` [PATCH v5 08/35] drm/i915: Disable hardware semaphores when GPU scheduler is enabled John.C.Harrison
2016-02-19 19:27   ` Jesse Barnes
2016-02-18 14:26 ` [PATCH v5 09/35] drm/i915: Force MMIO flips when scheduler enabled John.C.Harrison
2016-02-19 19:28   ` Jesse Barnes
2016-02-19 19:53     ` Ville Syrjälä
2016-02-19 20:01       ` Jesse Barnes
2016-02-22  9:41         ` Lankhorst, Maarten
2016-02-22 12:53           ` John Harrison
2016-02-20  9:22     ` Chris Wilson
2016-02-22 20:42       ` Jesse Barnes
2016-02-23 11:16         ` Chris Wilson
2016-02-18 14:26 ` [PATCH v5 10/35] drm/i915: Added scheduler hook when closing DRM file handles John.C.Harrison
2016-03-01  8:59   ` Joonas Lahtinen
2016-03-01 14:52     ` John Harrison
2016-02-18 14:26 ` [PATCH v5 11/35] drm/i915: Added scheduler hook into i915_gem_request_notify() John.C.Harrison
2016-03-01  9:10   ` Joonas Lahtinen
2016-02-18 14:27 ` [PATCH v5 12/35] drm/i915: Added deferred work handler for scheduler John.C.Harrison
2016-03-01  9:16   ` Joonas Lahtinen
2016-03-01 15:12     ` John Harrison
2016-02-18 14:27 ` [PATCH v5 13/35] drm/i915: Redirect execbuffer_final() via scheduler John.C.Harrison
2016-02-19 19:33   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 14/35] drm/i915: Keep the reserved space mechanism happy John.C.Harrison
2016-02-19 19:36   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 15/35] drm/i915: Added tracking/locking of batch buffer objects John.C.Harrison
2016-02-19 19:42   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 16/35] drm/i915: Hook scheduler node clean up into retire requests John.C.Harrison
2016-02-19 19:44   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 17/35] drm/i915: Added scheduler support to __wait_request() calls John.C.Harrison
2016-03-01 10:02   ` Joonas Lahtinen
2016-03-11 11:47     ` John Harrison
2016-02-18 14:27 ` [PATCH v5 18/35] drm/i915: Added scheduler support to page fault handler John.C.Harrison
2016-02-19 19:45   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 19/35] drm/i915: Added scheduler flush calls to ring throttle and idle functions John.C.Harrison
2016-03-07 11:31   ` Joonas Lahtinen
2016-03-11 16:22     ` John Harrison
2016-02-18 14:27 ` [PATCH v5 20/35] drm/i915: Add scheduler hook to GPU reset John.C.Harrison
2016-02-23 20:27   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 21/35] drm/i915: Added a module parameter to allow the scheduler to be disabled John.C.Harrison
2016-02-23 20:29   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 22/35] drm/i915: Support for 'unflushed' ring idle John.C.Harrison
2016-02-23 20:35   ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 23/35] drm/i915: Defer seqno allocation until actual hardware submission time John.C.Harrison
2016-03-07 12:16   ` Joonas Lahtinen
2016-02-18 14:27 ` [PATCH v5 24/35] drm/i915: Added trace points to scheduler John.C.Harrison
2016-02-23 20:42   ` Jesse Barnes
2016-02-23 20:42   ` Jesse Barnes
2016-02-26 15:55     ` John Harrison
2016-02-26 17:12       ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 25/35] drm/i915: Added scheduler queue throttling by DRM file handle John.C.Harrison
2016-02-23 21:02   ` Jesse Barnes
2016-03-01 15:52     ` John Harrison
2016-02-18 14:27 ` [PATCH v5 26/35] drm/i915: Added debugfs interface to scheduler tuning parameters John.C.Harrison
2016-02-23 21:06   ` Jesse Barnes
2016-03-11 16:28     ` John Harrison
2016-03-11 17:25       ` Jesse Barnes
2016-02-18 14:27 ` [PATCH v5 27/35] drm/i915: Added debug state dump facilities to scheduler John.C.Harrison
2016-03-07 12:31   ` Joonas Lahtinen
2016-03-11 16:38     ` John Harrison
2016-03-15 10:53       ` Joonas Lahtinen
2016-02-18 14:27 ` [PATCH v5 28/35] drm/i915: Add early exit to execbuff_final() if insufficient ring space John.C.Harrison
2016-02-18 14:27 ` [PATCH v5 29/35] drm/i915: Added scheduler statistic reporting to debugfs John.C.Harrison
2016-02-18 14:27 ` [PATCH v5 30/35] drm/i915: Add scheduler support functions for TDR John.C.Harrison
2016-02-18 14:27 ` [PATCH v5 31/35] drm/i915: Scheduler state dump via debugfs John.C.Harrison
2016-02-18 14:27 ` [PATCH v5 32/35] drm/i915: Enable GPU scheduler by default John.C.Harrison
2016-02-18 14:27 ` [PATCH v5 33/35] drm/i915: Add scheduling priority to per-context parameters John.C.Harrison
2016-02-18 14:27 ` [PATCH v5 34/35] drm/i915: Add support for retro-actively banning batch buffers John.C.Harrison
2016-02-18 14:27 ` [PATCH v5 35/35] drm/i915: Allow scheduler to manage inter-ring object synchronisation John.C.Harrison
2016-02-18 14:27 ` [PATCH 01/20] igt/gem_ctx_param_basic: Updated to support scheduler priority interface John.C.Harrison
2016-02-18 15:30 ` ✗ Fi.CI.BAT: failure for GPU scheduler for i915 driver Patchwork
