Re: [PATCH v4 07/38] drm/i915: Start of GPU scheduler

From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: John.C.Harrison@Intel.com, Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [PATCH v4 07/38] drm/i915: Start of GPU scheduler
Date: Wed, 20 Jan 2016 15:18:33 +0200
Message-ID: <1453295913.7637.55.camel@linux.intel.com>
In-Reply-To: <1452537788-32136-8-git-send-email-John.C.Harrison@Intel.com>

Hi,

Comments are inline below this preamble.

Many of the comments are related to the indent and style of the code.
That stuff is important to fix for future maintainability. In order for
the future review to be more effective, I'd like to next see a v5 of
the series where the code quality concerns have been addressed, the
patches squashed into actually reviewable chunks, and appropriate
kerneldoc added.

To give an idea of proper slicing of patches, first produce a no-op
scheduler, adding the extra function calls where needed and still
keeping the scheduling completely linear. Second patch could introduce
out of order submitting, third one priority bumping, fourth pre-empting 
and so on. That way, each patch extends the functionality and is itself
already mergeable. That way I will be able to go through and understand
the existing code, and can actually review (rather than just nag about
indent and coding style) whether the changes are appropriate for
bringing in the desired functionality.

In the current split, for me or anyone else who did not participate in
writing the code, it is too confusing to try to guess which future
changes will make each piece of code make sense, and which pieces will
end up redundant. There is no value in splitting code into chunks that
are not themselves functional.

Regards, Joonas

On Mon, 2016-01-11 at 18:42 +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Initial creation of scheduler source files. Note that this patch
> implements most of the scheduler functionality but does not hook it
> in
> to the driver yet. It also leaves the scheduler code in 'pass
> through'
> mode so that even when it is hooked in, it will not actually do very
> much. This allows the hooks to be added one at a time in bite size
> chunks and only when the scheduler is finally enabled at the end does
> anything start happening.
> 
> The general theory of operation is that when batch buffers are
> submitted to the driver, the execbuffer() code packages up all the
> information required to execute the batch buffer at a later time.
> This
> package is given over to the scheduler which adds it to an internal
> node list. The scheduler also scans the list of objects associated
> with the batch buffer and compares them against the objects already
> in
> use by other buffers in the node list. If matches are found then the
> new batch buffer node is marked as being dependent upon the matching
> node. The same is done for the context object. The scheduler also
> bumps up the priority of such matching nodes on the grounds that the
> more dependencies a given batch buffer has the more important it is
> likely to be.
> 
> The scheduler aims to have a given (tuneable) number of batch buffers
> in flight on the hardware at any given time. If fewer than this are
> currently executing when a new node is queued, then the node is
> passed
> straight through to the submit function. Otherwise it is simply added
> to the queue and the driver returns back to user land.
> 
> The scheduler is notified when each batch buffer completes and
> updates
> its internal tracking accordingly. At the end of the completion
> interrupt processing, if any scheduler tracked batches were
> processed,
> the scheduler's deferred worker thread is woken up. This can do more
> involved processing such as actually removing completed nodes from
> the
> queue and freeing up the resources associated with them (internal
> memory allocations, DRM object references, context reference, etc.).
> The work handler also checks the in flight count and calls the
> submission code if a new slot has appeared.
> 
> When the scheduler's submit code is called, it scans the queued node
> list for the highest priority node that has no unmet dependencies.
> Note that the dependency calculation is complex as it must take
> inter-ring dependencies and potential preemptions into account. Note
> also that in the future this will be extended to include external
> dependencies such as the Android Native Sync file descriptors and/or
> the linux dma-buff synchronisation scheme.
> 
> If a suitable node is found then it is sent to execbuff_final() for
> submission to the hardware. The in flight count is then re-checked
> and
> a new node popped from the list if appropriate.
> 
> Note that this patch does not implement pre-emptive scheduling. Only
> basic scheduling by re-ordering batch buffer submission is currently
> implemented. Pre-emption of actively executing batch buffers comes in
> the next patch series.
> 
> v2: Changed priority levels to +/-1023 due to feedback from Chris
> Wilson.
> 
> Removed redundant index from scheduler node.
> 
> Changed time stamps to use jiffies instead of raw monotonic. This
> provides lower resolution but improved compatibility with other i915
> code.
> 
> Major re-write of completion tracking code due to struct fence
> conversion. The scheduler no longer has its own private IRQ handler
> but just lets the existing request code handle completion events.
> Instead, the scheduler now hooks into the request notify code to be
> told when a request has completed.
> 
> Reduced driver mutex locking scope. Removal of scheduler nodes no
> longer grabs the mutex lock.
> 
> v3: Refactor of dependency generation to make the code more readable.
> Also added in read-read optimisation support - i.e., don't treat a
> shared read-only buffer as being a dependency.
> 
> Allowed the killing of queued nodes rather than only flying ones.
> 
> v4: Updated the commit message to better reflect the current state of
> the code. Downgraded some BUG_ONs to WARN_ONs. Used the correct array
> memory allocator function (kmalloc_array instead of kmalloc).
> Corrected the format of some comments. Wrapped some lines differently
> to keep the style checker happy.
> 
> Fixed a WARN_ON when killing nodes. The dependency removal code
> checks
> that nodes being destroyed do not have any outstanding dependencies
> (which would imply they should not have been executed yet). In the
> case of nodes being destroyed, e.g. due to context banning, then this
> might well be the case - they have not been executed and do indeed
> have outstanding dependencies.
> 
> Re-instated the code to disable interrupts when not in use. The
> underlying problem causing broken IRQ reference counts seems to have
> been fixed now.
> 
> For: VIZ-1587
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile         |   1 +
>  drivers/gpu/drm/i915/i915_drv.h       |   4 +
>  drivers/gpu/drm/i915/i915_gem.c       |   5 +
>  drivers/gpu/drm/i915/i915_scheduler.c | 797
> ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_scheduler.h |  91 ++++
>  5 files changed, 898 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
>  create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 15398c5..79cb38b 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -10,6 +10,7 @@ ccflags-y := -Werror
>  i915-y := i915_drv.o \
>  	  i915_irq.o \
>  	  i915_params.o \
> +	  i915_scheduler.o \
>            i915_suspend.o \
>  	  i915_sysfs.o \
>  	  intel_csr.o \
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 10dffdd..38f423b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1695,6 +1695,8 @@ struct i915_execbuffer_params {
>  	struct drm_i915_gem_request     *request;
>  };
>  
> +struct i915_scheduler;
> +

Rather, add an "i915_scheduler.h" include at the top and eliminate
circular include dependencies. This is needed for the next comment.

>  /* used in computing the new watermarks state */
>  struct intel_wm_config {
>  	unsigned int num_pipes_active;
> @@ -1947,6 +1949,8 @@ struct drm_i915_private {
>  
>  	struct i915_runtime_pm pm;
>  
> +	struct i915_scheduler *scheduler;
> +

As the scheduler is going to be enabled on all platforms to an extent,
there is no point in making it a pointer. Making it a member like "pm"
is best; this also requires the above include change.
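
A minimal userspace sketch of the embedded-member layout suggested here.
The struct and function names are hypothetical stand-ins, not the
driver's real types; only the default values are taken from the patch.
The point it illustrates is that init no longer allocates, so it has no
-ENOMEM path and no NULL check is needed at the users.

```c
/* Hypothetical, cut-down stand-in for struct i915_scheduler. */
struct sched_sketch {
	int priority_level_min;
	int priority_level_max;
	unsigned int min_flying;
};

/* Hypothetical, cut-down stand-in for struct drm_i915_private. */
struct dev_priv_sketch {
	int other_state;
	struct sched_sketch scheduler;	/* embedded member, not a pointer */
};

/* Nothing is allocated here, so initialisation cannot fail. */
static void sched_sketch_init(struct sched_sketch *scheduler)
{
	scheduler->priority_level_min = -1023;
	scheduler->priority_level_max = 1023;
	scheduler->min_flying = 2;
}
```

Callers would then pass &dev_priv->scheduler around instead of testing a
pointer for NULL.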

>  	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
>  	struct {
>  		int (*execbuf_submit)(struct i915_execbuffer_params *params,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index cff3768..47aa85b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include "i915_scheduler.h"
>  

This should go before any <linux/...> includes so as not to mask
missing includes in the header itself, so the correct place would be
right after #include "i915_trace.h".

>  #define RQ_BUG_ON(expr)
>  
> @@ -5242,6 +5243,10 @@ int i915_gem_init(struct drm_device *dev)
>  	 */
>  	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
>  
> +	ret = i915_scheduler_init(dev);
> +	if (ret)
> +		goto out_unlock;
> +
>  	ret = i915_gem_init_userptr(dev);
>  	if (ret)
>  		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> new file mode 100644
> index 0000000..8cb9063
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -0,0 +1,797 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +#include "intel_drv.h"
> +#include "i915_scheduler.h"

Again, this include should be the first one; the header should not
require any includes before it.

> +
> +static int         i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node);
> +static int         i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
> +						   struct i915_scheduler_queue_entry *remove);
> +static int         i915_scheduler_submit(struct intel_engine_cs *ring,
> +					 bool is_locked);
> +static uint32_t    i915_scheduler_count_flying(struct i915_scheduler *scheduler,
> +					       struct intel_engine_cs *ring);
> +static void        i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler);
> +static int         i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
> +						struct i915_scheduler_queue_entry *target,
> +						uint32_t bump);

Do not indent the function names like this, it becomes unmaintainable
and messy very fast if somebody adds a new function with a more complex
return type, which is a very likely thing to happen.

What I would do is move all the helper functions here at the top and
order them so that the forward declarations are not needed, at least
i915_scheduler_fly_node is not used before its definition.

> +
> +int i915_scheduler_init(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	int                     r;

Again, this indent is a no-go. I'll not mention it on further
functions, assume it to be fixed for next revision.

> +
> +	if (scheduler)
> +		return 0;
> +
> +	scheduler = kzalloc(sizeof(*scheduler), GFP_KERNEL);
> +	if (!scheduler)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&scheduler->lock);
> +
> +	for (r = 0; r < I915_NUM_RINGS; r++)
> +		INIT_LIST_HEAD(&scheduler->node_queue[r]);
> +
> +	/* Default tuning values: */
> +	scheduler->priority_level_min     = -1023;
> +	scheduler->priority_level_max     = 1023;
> +	scheduler->priority_level_preempt = 900;
> +	scheduler->min_flying             = 2;

This kind of indent is tolerable because it is a contained code block,
but not needed either.

> +
> +	dev_priv->scheduler = scheduler;
> +
> +	return 0;
> +}
> +
> +static void i915_generate_dependencies(struct i915_scheduler *scheduler,
> +				       struct i915_scheduler_queue_entry *node,
> +				       uint32_t ring)
> +{
> +	struct i915_scheduler_obj_entry *this, *that;
> +	struct i915_scheduler_queue_entry  *test;
> +	int                 i, j;
> +	bool                found;
> +
> +	list_for_each_entry(test, &scheduler->node_queue[ring], link) {
> +		if (I915_SQS_IS_COMPLETE(test))
> +			continue;
> +
> +		/*
> +		 * Batches on the same ring for the same
> +		 * context must be kept in order.
> +		 */
> +		found = (node->params.ctx == test->params.ctx) &&
> +			(node->params.ring == test->params.ring);
> +
> +		if (found)
> +			goto depends;
> +

This is not needed.

> +		/*
> +		 * Batches working on the same objects must
> +		 * be kept in order.
> +		 */
> +		for (i = 0; (i < node->num_objs) && !found; i++) {

As the test is here already                         ---^

> +			this = node->saved_objects + i;
> +
> +			for (j = 0; j < test->num_objs; j++) {
> +				that = test->saved_objects + j;
> +
> +				if (this->obj != that->obj)
> +					continue;

How about VMAs? There might be multiple mappings to an object, isn't it
enough to depend on the required VMA instead of the whole object?

> +
> +				/* Only need to worry about writes */
> +				if (this->read_only && that->read_only)
> +					continue;
> +
> +				found = true;
> +				break;
> +			}
> +		}
> +

The following block is not needed.

> +		if (!found)
> +			continue;
> +
> +depends:

Rather like this, in order to avoid a goto label;
if (found) {
...
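
One way the goto-free shape could look, as a self-contained userspace
sketch. All types and names here are simplified, hypothetical stand-ins
(the real code walks a list under the scheduler spinlock and uses
I915_SQS_IS_COMPLETE); only the two ordering rules are taken from the
patch: same context on the same ring, or a shared object that is not
read-only on both sides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler types. */
struct obj_entry_sketch {
	const void *obj;
	bool read_only;
};

struct node_sketch {
	int ctx;
	int ring;
	const struct obj_entry_sketch *objs;
	size_t num_objs;
};

/* True if both nodes use a common object and at least one side writes it. */
static bool nodes_share_written_object(const struct node_sketch *node,
				       const struct node_sketch *test)
{
	size_t i, j;

	for (i = 0; i < node->num_objs; i++) {
		for (j = 0; j < test->num_objs; j++) {
			if (node->objs[i].obj != test->objs[j].obj)
				continue;
			/* Only need to worry about writes. */
			if (node->objs[i].read_only && test->objs[j].read_only)
				continue;
			return true;
		}
	}
	return false;
}

/* Same context on the same ring stays ordered; otherwise compare objects. */
static bool node_depends_on(const struct node_sketch *node,
			    const struct node_sketch *test)
{
	if (node->ctx == test->ctx && node->ring == test->ring)
		return true;
	return nodes_share_written_object(node, test);
}

/* Collect dependencies with no goto label; returns the number found. */
static size_t generate_deps_sketch(const struct node_sketch *node,
				   const struct node_sketch *queue,
				   size_t queue_len,
				   const struct node_sketch **dep_list)
{
	size_t i, num_deps = 0;

	for (i = 0; i < queue_len; i++) {
		if (node_depends_on(node, &queue[i]))
			dep_list[num_deps++] = &queue[i];
	}
	return num_deps;
}
```

Splitting the test into a predicate also removes the need for the
"&& !found" loop condition and the trailing "if (!found) continue".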

> +		node->dep_list[node->num_deps] = test;
> +		node->num_deps++;
> +	}
> +}
> +

Please add a brief kerneldoc above each function in the header, it's
required. Adding it to non-trivial inline helper functions too will
make reviewing much easier.

> +int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
> +{
> +	struct drm_i915_private *dev_priv = qe->params.dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct intel_engine_cs  *ring = qe->params.ring;
> +	struct i915_scheduler_queue_entry  *node;
> +	struct i915_scheduler_queue_entry  *test;
> +	unsigned long       flags;
> +	bool                not_flying;
> +	int                 i, r;
> +	int                 incomplete = 0;
> +
> +	WARN_ON(!scheduler);
> +

This kind of situation should be a BUG_ON, because scheduler being NULL
is going to cause an OOPS at the next dereference, which happens
unconditionally. WARN + OOPS is exactly what BUG_ON should be used to
avoid. But this should be removed anyway once the scheduler is made a
data member of dev_priv.

> +	if (1/*i915.scheduler_override & i915_so_direct_submit*/) {

I assume this is going to be addressed in a future commit. Could have
been introduced in this patch, too.

> +		int ret;
> +
> +		scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
> +		ret = dev_priv->gt.execbuf_final(&qe->params);
> +		scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
> +

The kerneldoc should mention locking requirements of this function.

> +		/*
> +		 * Don't do any clean up on failure because the caller will
> +		 * do it all anyway.
> +		 */
> +		if (ret)
> +			return ret;
> +
> +		/* Free everything that is owned by the QE structure: */
> +		kfree(qe->params.cliprects);
> +		if (qe->params.dispatch_flags & I915_DISPATCH_SECURE)
> +			i915_gem_execbuff_release_batch_obj(qe->params.batch_obj);
> +
> +		return 0;

Above piece of code looks like its own function, so it should probably
be one.

> +	}
> +
> +	node = kmalloc(sizeof(*node), GFP_KERNEL);
> +	if (!node)
> +		return -ENOMEM;
> +
> +	*node = *qe;

Any reason we can't simply move ownership of qe? If we can't, I'd
rather make a clone function.

> +	INIT_LIST_HEAD(&node->link);
> +	node->status = i915_sqs_queued;
> +	node->stamp  = jiffies;
> +	i915_gem_request_reference(node->params.request);
> +
> +	/* Need to determine the number of incomplete entries in the list as
> +	 * that will be the maximum size of the dependency list.
> +	 *
> +	 * Note that the allocation must not be made with the spinlock acquired
> +	 * as kmalloc can sleep. However, the unlock/relock is safe because no
> +	 * new entries can be queued up during the unlock as the i915 driver
> +	 * mutex is still held. Entries could be removed from the list but that
> +	 * just means the dep_list will be over-allocated which is fine.
> +	 */
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +	for (r = 0; r < I915_NUM_RINGS; r++) {
> +		list_for_each_entry(test, &scheduler->node_queue[r], link) {
> +			if (I915_SQS_IS_COMPLETE(test))
> +				continue;
> +
> +			incomplete++;
> +		}
> +	}
> +
> +	/* Temporarily unlock to allocate memory: */
> +	spin_unlock_irqrestore(&scheduler->lock, flags);

I'd make the above piece of code a helper; aren't these stats going to
be counted for debugfs anyway?

> +	if (incomplete) {
> +		node->dep_list = kmalloc_array(incomplete,
> +					       sizeof(*node->dep_list),
> +					       GFP_KERNEL);
> +		if (!node->dep_list) {
> +			kfree(node);
> +			return -ENOMEM;
> +		}
> +	} else
> +		node->dep_list = NULL;
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +	node->num_deps = 0;
> +
> +	if (node->dep_list) {
> +		for (r = 0; r < I915_NUM_RINGS; r++)
> +			i915_generate_dependencies(scheduler, node, r);
> +
> +		WARN_ON(node->num_deps > incomplete);

Wouldn't this condition again lead to a crash? If so, should be BUG_ON
to cause that crash as early as possible. WARN_ON is only good if there
is a way of coping with the situation and no imminent system crash is
bound to happen.

> +	}
> +
> +	if (node->priority > scheduler->priority_level_max)
> +		node->priority = scheduler->priority_level_max;
> +	else if (node->priority < scheduler->priority_level_min)
> +		node->priority = scheduler->priority_level_min;
> +

There is a clamp_val macro in linux/kernel.h.
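
For illustration, a userspace stand-in with the same behaviour for plain
ints. clamp_int_sketch is a hypothetical name used only so that the
example builds outside the kernel; in the driver the if/else pair above
would presumably become a single clamp_val() call, as shown in the
comment.

```c
/*
 * Userspace stand-in for the kernel's clamp_val(val, lo, hi).
 * In the patch the equivalent would presumably read:
 *
 *	node->priority = clamp_val(node->priority,
 *				   scheduler->priority_level_min,
 *				   scheduler->priority_level_max);
 */
static int clamp_int_sketch(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```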

> +	if ((node->priority > 0) && node->num_deps) {
> +		i915_scheduler_priority_bump_clear(scheduler);
> +
> +		for (i = 0; i < node->num_deps; i++)
> +			i915_scheduler_priority_bump(scheduler,
> +					node->dep_list[i], node->priority);
> +	}
> +
> +	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
> +
> +	not_flying = i915_scheduler_count_flying(scheduler, ring) <
> +						 scheduler->min_flying;
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	if (not_flying)
> +		i915_scheduler_submit(ring, true);
> +
> +	return 0;
> +}
> +
> +static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
> +{
> +	struct drm_i915_private *dev_priv = node->params.dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct intel_engine_cs  *ring;
> +
> +	WARN_ON(!scheduler);
> +	WARN_ON(!node);
> +	WARN_ON(node->status != i915_sqs_popped);

Other states have their I915_SQS_IS_* macro, why don't all of them?

> +
> +	ring = node->params.ring;
> +
> +	/*
> +	 * Add the node (which should currently be in state popped) to the
> +	 * front of the queue. This ensure that flying nodes are always held
> +	 * in hardware submission order.
> +	 */
> +	list_add(&node->link, &scheduler->node_queue[ring->id]);
> +
> +	node->status = i915_sqs_flying;
> +
> +	if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
> +		bool    success = true;
> +
> +		success = ring->irq_get(ring);
> +		if (success)
> +			scheduler->flags[ring->id] |= i915_sf_interrupts_enabled;
> +		else
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Nodes are considered valid dependencies if they are queued on any ring or
> + * if they are in flight on a different ring. In flight on the same ring is no
> + * longer interesting for non-premptive nodes as the ring serialises execution.
> + * For pre-empting nodes, all in flight dependencies are valid as they must not
> + * be jumped by the act of pre-empting.
> + *
> + * Anything that is neither queued nor flying is uninteresting.
> + */
> +static inline bool i915_scheduler_is_dependency_valid(
> +			struct i915_scheduler_queue_entry *node, uint32_t idx)
> +{
> +	struct i915_scheduler_queue_entry *dep;
> +
> +	dep = node->dep_list[idx];
> +	if (!dep)
> +		return false;
> +
> +	if (I915_SQS_IS_QUEUED(dep))
> +		return true;
> +
> +	if (I915_SQS_IS_FLYING(dep)) {
> +		if (node->params.ring != dep->params.ring)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler,
> +					    struct intel_engine_cs *ring)
> +{
> +	struct i915_scheduler_queue_entry *node;
> +	uint32_t                          flying = 0;
> +
> +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link)

This loop keeps popping up, it could use a define similar to the ones
in i915_drv.h ;

#define for_each_hpd_pin(__pin) \
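
A self-contained sketch of such an iterator. It uses a plain singly
linked list so that it builds in userspace; the macro, type and enum
names are hypothetical. In the driver the macro body would presumably
wrap list_for_each_entry() over scheduler->node_queue[ring->id] instead.

```c
#include <stddef.h>

enum node_status_sketch { SKETCH_QUEUED, SKETCH_FLYING, SKETCH_COMPLETE };

/* Hypothetical stand-in for a scheduler queue entry. */
struct queue_node_sketch {
	enum node_status_sketch status;
	struct queue_node_sketch *next;
};

/* Hypothetical iterator: visit every node on one ring's queue. */
#define for_each_queue_node_sketch(__node, __head) \
	for ((__node) = (__head); (__node); (__node) = (__node)->next)

/* Count the nodes currently in flight on this queue. */
static unsigned int count_flying_sketch(struct queue_node_sketch *head)
{
	struct queue_node_sketch *node;
	unsigned int flying = 0;

	for_each_queue_node_sketch(node, head)
		if (node->status == SKETCH_FLYING)
			flying++;

	return flying;
}
```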

> +		if (I915_SQS_IS_FLYING(node))
> +			flying++;
> +
> +	return flying;
> +}
> +
> +/*
> + * Add a popped node back in to the queue. For example, because the ring was
> + * hung when execfinal() was called and thus the ring submission needs to be
> + * retried later.
> + */
> +static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
> +{
> +	WARN_ON(!node);

	if (WARN_ON(!node))
		return;

Or rather no check at all: this is an internal function, and a NULL
here is going to crash anyway, even in the calling function. Checks
like this are only relevant for input coming from userspace.

> +	WARN_ON(!I915_SQS_IS_FLYING(node));
> +
> +	/* Seqno will be reassigned on relaunch */
> +	node->params.request->seqno = 0;
> +	node->status = i915_sqs_queued;
> +}
> +
> +/*
> + * Give up on a node completely. For example, because it is causing the
> + * ring to hang or is using some resource that no longer exists.
> + */
> +static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node)
> +{
> +	WARN_ON(!node);
> +	WARN_ON(I915_SQS_IS_COMPLETE(node));
> +
> +	node->status = i915_sqs_dead;
> +}
> +
> +/*
> + * A sequence number has popped out of the hardware and the request handling
> + * code has mapped it back to a request and will mark that request complete.
> + * It also calls this function to notify the scheduler about the completion
> + * so the scheduler's node can be updated appropriately.
> + * Returns true if the request is scheduler managed, false if not.
> + */
> +bool i915_scheduler_notify_request(struct drm_i915_gem_request *req)
> +{
> +	struct drm_i915_private *dev_priv  = to_i915(req->ring->dev);
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	/* XXX: Need to map back from request to node */
> +	struct i915_scheduler_queue_entry *node = NULL;
> +	unsigned long       flags;
> +
> +	if (!node)
> +		return false;

I'm not so sure that slicing the series down to the extent that
functions are impossible to review was a good idea. The idea is to
slice things down into reviewable pieces. It's hard to predict, or to
keep looking ahead in the series for, what is going to come.

> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +
> +	WARN_ON(!I915_SQS_IS_FLYING(node));
> +
> +	/* Node was in flight so mark it as complete. */
> +	if (req->cancelled)
> +		node->status = i915_sqs_dead;
> +	else
> +		node->status = i915_sqs_complete;
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	/*
> +	 * XXX: If the in-flight list is now empty then new work should be
> +	 * submitted. However, this function is called from interrupt context
> +	 * and thus cannot acquire mutex locks and other such things that are
> +	 * necessary for fresh submission.
> +	 */
> +
> +	return true;
> +}
> +
> +int i915_scheduler_remove(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry  *node, *node_next;
> +	unsigned long       flags;
> +	int                 flying = 0, queued = 0;
> +	int                 ret = 0;
> +	bool                do_submit;
> +	uint32_t            min_seqno;
> +	struct list_head    remove;
> +
> +	if (list_empty(&scheduler->node_queue[ring->id]))
> +		return 0;
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +
> +	/* /i915_scheduler_dump_locked(ring, "remove/pre");/ */
> +

This should not be here at all.

> +	/*
> +	 * In the case where the system is idle, starting 'min_seqno' from a big
> +	 * number will cause all nodes to be removed as they are now back to
> +	 * being in-order. However, this will be a problem if the last one to
> +	 * complete was actually out-of-order as the ring seqno value will be
> +	 * lower than one or more completed buffers. Thus code looking for the
> +	 * completion of said buffers will wait forever.
> +	 * Instead, use the hardware seqno as the starting point. This means
> +	 * that some buffers might be kept around even in a completely idle
> +	 * system but it should guarantee that no-one ever gets confused when
> +	 * waiting for buffer completion.
> +	 */
> +	min_seqno = ring->get_seqno(ring, true);
> +
> +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> +		if (I915_SQS_IS_QUEUED(node))
> +			queued++;
> +		else if (I915_SQS_IS_FLYING(node))
> +			flying++;
> +		else if (I915_SQS_IS_COMPLETE(node))
> +			continue;
> +
> +		if (node->params.request->seqno == 0)
> +			continue;
> +
> +		if (!i915_seqno_passed(node->params.request->seqno, min_seqno))
> +			min_seqno = node->params.request->seqno;
> +	}

Couldn't these values be kept cached, instead of counting them at each
function?
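
A rough sketch of what caching could look like, assuming every status
change is funnelled through one helper. All names are hypothetical, and
in the driver the updates would have to happen under scheduler->lock,
which this userspace example leaves out.

```c
/* Hypothetical, reduced set of node states. */
enum sqs_sketch {
	SQS_SKETCH_QUEUED,
	SQS_SKETCH_FLYING,
	SQS_SKETCH_COMPLETE,
	SQS_SKETCH_COUNT
};

/* Per-ring cache of how many nodes are in each state. */
struct ring_counts_sketch {
	unsigned int count[SQS_SKETCH_COUNT];
};

struct counted_node_sketch {
	enum sqs_sketch status;
};

/* A new node always enters the queue in the queued state. */
static void counted_node_add(struct ring_counts_sketch *rc,
			     struct counted_node_sketch *node)
{
	node->status = SQS_SKETCH_QUEUED;
	rc->count[SQS_SKETCH_QUEUED]++;
}

/* All transitions go through here so the cached counts cannot go stale. */
static void counted_node_set_status(struct ring_counts_sketch *rc,
				    struct counted_node_sketch *node,
				    enum sqs_sketch status)
{
	rc->count[node->status]--;
	node->status = status;
	rc->count[status]++;
}
```

With this in place the queued and flying totals are a direct read
rather than a list walk in each function.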

> +
> +	INIT_LIST_HEAD(&remove);
> +	list_for_each_entry_safe(node, node_next, &scheduler->node_queue[ring->id], link) {
> +		/*
> +		 * Only remove completed nodes which have a lower seqno than
> +		 * all pending nodes. While there is the possibility of the
> +		 * ring's seqno counting backwards, all higher buffers must
> +		 * be remembered so that the 'i915_seqno_passed()' test can
> +		 * report that they have in fact passed.
> +		 *
> +		 * NB: This is not true for 'dead' nodes. The GPU reset causes
> +		 * the software seqno to restart from its initial value. Thus
> +		 * the dead nodes must be removed even though their seqno values
> +		 * are potentially vastly greater than the current ring seqno.
> +		 */
> +		if (!I915_SQS_IS_COMPLETE(node))
> +			continue;
> +
> +		if (node->status != i915_sqs_dead) {
> +			if (i915_seqno_passed(node->params.request->seqno, min_seqno) &&
> +			    (node->params.request->seqno != min_seqno))
> +				continue;
> +		}
> +
> +		list_del(&node->link);
> +		list_add(&node->link, &remove);
> +
> +		/* Strip the dependency info while the mutex is still locked */
> +		i915_scheduler_remove_dependent(scheduler, node);
> +
> +		continue;
> +	}
> +
> +	/*
> +	 * Release the interrupt reference count if there are no longer any
> +	 * nodes to worry about.
> +	 */
> +	if (!flying && !queued &&
> +	    (scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
> +		ring->irq_put(ring);
> +		scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
> +	}
> +
> +	/* Launch more packets now? */
> +	do_submit = (queued > 0) && (flying < scheduler->min_flying);
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	if (!do_submit && list_empty(&remove))
> +		return ret;
> +
> +	mutex_lock(&ring->dev->struct_mutex);
> +
> +	if (do_submit)
> +		ret = i915_scheduler_submit(ring, true);

It is confusing to have this in a remove function. Either the function
naming needs to be reconsidered or the submission moved out of here.

> +
> +	while (!list_empty(&remove)) {
> +		node = list_first_entry(&remove, typeof(*node), link);
> +		list_del(&node->link);
> +
> +		/*
> +		 * The batch buffer must be unpinned before it is unreferenced
> +		 * otherwise the unpin fails with a missing vma!?
> +		 */
> +		if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
> +			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
> +
> +		/* Free everything that is owned by the node: */
> +		i915_gem_request_unreference(node->params.request);
> +		kfree(node->params.cliprects);
> +		kfree(node->dep_list);
> +		kfree(node);
> +	}
> +
> +	mutex_unlock(&ring->dev->struct_mutex);
> +
> +	return ret;
> +}
> +
> +static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler)
> +{
> +	struct i915_scheduler_queue_entry *node;
> +	int i;
> +
> +	/*
> +	 * Ensure circular dependencies don't cause problems and that a bump
> +	 * by object usage only bumps each using buffer once:
> +	 */
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		list_for_each_entry(node, &scheduler->node_queue[i], link)
> +			node->bumped = false;
> +	}
> +}
> +
> +static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler,
> +				struct i915_scheduler_queue_entry *target,
> +				uint32_t bump)
> +{
> +	uint32_t new_priority;
> +	int      i, count;
> +
> +	if (target->priority >= scheduler->priority_level_max)
> +		return 1;
> +
> +	if (target->bumped)
> +		return 0;
> +
> +	new_priority = target->priority + bump;
> +	if ((new_priority <= target->priority) ||
> +	    (new_priority > scheduler->priority_level_max))
> +		target->priority = scheduler->priority_level_max;
> +	else
> +		target->priority = new_priority;
> +
> +	count = 1;
> +	target->bumped = true;
> +
> +	for (i = 0; i < target->num_deps; i++) {
> +		if (!target->dep_list[i])
> +			continue;
> +
> +		if (target->dep_list[i]->bumped)
> +			continue;
> +
> +		count += i915_scheduler_priority_bump(scheduler,
> +						      target->dep_list[i],
> +						      bump);
> +	}
> +
> +	return count;
> +}
> +
> +static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring,
> +				struct i915_scheduler_queue_entry **pop_node,
> +				unsigned long *flags)
> +{
> +	struct drm_i915_private            *dev_priv = ring->dev->dev_private;
> +	struct i915_scheduler              *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry  *best;
> +	struct i915_scheduler_queue_entry  *node;
> +	int     ret;
> +	int     i;
> +	bool	any_queued;
> +	bool	has_local, has_remote, only_remote;
> +
> +	*pop_node = NULL;
> +	ret = -ENODATA;
> +
> +	any_queued = false;
> +	only_remote = false;
> +	best = NULL;

These should just be initialized in-place. But looking at the code
forward. 

> +
> +	list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
> +		if (!I915_SQS_IS_QUEUED(node))
> +			continue;
> +		any_queued = true;
> +
> +		has_local  = false;
> +		has_remote = false;
> +		for (i = 0; i < node->num_deps; i++) {
> +			if (!i915_scheduler_is_dependency_valid(node, i))
> +				continue;
> +
> +			if (node->dep_list[i]->params.ring == node->params.ring)
> +				has_local = true;
> +			else
> +				has_remote = true;
> +		}
> +
> +		if (has_remote && !has_local)
> +			only_remote = true;
> +
> +		if (!has_local && !has_remote) {
> +			if (!best ||
> +			    (node->priority > best->priority))
> +				best = node;
> +		}
> +	}
> +
> +	if (best) {
> +		list_del(&best->link);
> +
> +		INIT_LIST_HEAD(&best->link);
> +		best->status  = i915_sqs_popped;
> +
> +		ret = 0;
> +	} else {
> +		/* Can only get here if:
> +		 * (a) there are no buffers in the queue
> +		 * (b) all queued buffers are dependent on other buffers
> +		 *     e.g. on a buffer that is in flight on a different ring
> +		 */
> +		if (only_remote) {
> +			/* The only dependent buffers are on another ring. */
> +			ret = -EAGAIN;
> +		} else if (any_queued) {
> +			/* It seems that something has gone horribly wrong! */
> +			DRM_ERROR("Broken dependency tracking on ring %d!\n",
> +				  (int) ring->id);
> +		}
> +	}
> +
> +	/* i915_scheduler_dump_queue_pop(ring, best); */
> +
> +	*pop_node = best;
> +	return ret;
> +}
> +
> +static int i915_scheduler_submit(struct intel_engine_cs *ring, bool was_locked)
> +{
> +	struct drm_device   *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_scheduler   *scheduler = dev_priv->scheduler;
> +	struct i915_scheduler_queue_entry  *node;
> +	unsigned long       flags;
> +	int                 ret = 0, count = 0;
> +
> +	if (!was_locked) {
> +		ret = i915_mutex_lock_interruptible(dev);
> +		if (ret)
> +			return ret;
> +	}
> +

I don't really fancy this construct. The conditional locking should be
moved out of this function, into the callers, for proper lockdep
tracking.
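To illustrate the idea, here is a minimal userspace sketch. The names
(fake_dev, scheduler_submit_locked, scheduler_submit_unlocked) are made
up for the example, and a plain bool stands in for dev->struct_mutex. In
the driver, the assert in the callee would presumably be something like
lockdep_assert_held(). The point is only that the callee never takes the
lock itself, so there is no was_locked parameter left.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device; the bool models struct_mutex. */
struct fake_dev {
	bool struct_mutex_held;
	int submitted;
};

static void fake_lock(struct fake_dev *dev)
{
	assert(!dev->struct_mutex_held);
	dev->struct_mutex_held = true;
}

static void fake_unlock(struct fake_dev *dev)
{
	assert(dev->struct_mutex_held);
	dev->struct_mutex_held = false;
}

/* Callee: must always be entered with the lock held by the caller. */
static int scheduler_submit_locked(struct fake_dev *dev)
{
	assert(dev->struct_mutex_held);
	dev->submitted++;
	return 0;
}

/* Caller that does not hold the lock yet takes it explicitly. */
static int scheduler_submit_unlocked(struct fake_dev *dev)
{
	int ret;

	fake_lock(dev);
	ret = scheduler_submit_locked(dev);
	fake_unlock(dev);

	return ret;
}
```

Callers that already hold the lock call the _locked variant directly, so
every lock and unlock pair is visible in one function.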

> +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> +
> +	spin_lock_irqsave(&scheduler->lock, flags);
> +
> +	/* First time around, complain if anything unexpected occurs: */
> +	ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
> +	if (ret) {
> +		spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +		if (!was_locked)
> +			mutex_unlock(&dev->struct_mutex);
> +
> +		return ret;
> +	}
> +

Once the was_locked handling is dropped, this should become a proper
goto error label, e.g. out_unlock.

> +	do {
> +		WARN_ON(!node);
> +		WARN_ON(node->params.ring != ring);
> +		WARN_ON(node->status != i915_sqs_popped);
> +		count++;
> +
> +		/*
> +		 * The call to pop above will have removed the node from the
> +		 * list. So add it back in and mark it as in flight.
> +		 */
> +		i915_scheduler_fly_node(node);

Why do we want to pull an object out of the list inside the spin lock
and push it back immediately in our critical code path? Seems like a
waste for no obvious gain at this point. Why don't we rather just select
an entry and modify it in place, if it's going to stay in the same queue
anyway?
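Roughly what I have in mind, as a simplified userspace sketch. The
struct, the status names and select_and_fly() are invented for the
example, a plain singly linked list replaces list_head, and dependency
checking is left out entirely. The selected entry only changes status;
it is never unlinked or relinked.

```c
#include <stddef.h>

enum node_status { NODE_QUEUED, NODE_FLYING };

/* Hypothetical, stripped-down queue entry. */
struct node {
	int priority;
	enum node_status status;
	struct node *next;
};

/*
 * Pick the highest-priority queued node and mark it as flying in
 * place. The list itself is not modified.
 */
static struct node *select_and_fly(struct node *head)
{
	struct node *best = NULL;
	struct node *n;

	for (n = head; n; n = n->next) {
		if (n->status != NODE_QUEUED)
			continue;
		if (!best || n->priority > best->priority)
			best = n;
	}

	if (best)
		best->status = NODE_FLYING;

	return best;
}
```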

> +
> +		scheduler->flags[ring->id] |= i915_sf_submitting;
> +		spin_unlock_irqrestore(&scheduler->lock, flags);
> +		ret = dev_priv->gt.execbuf_final(&node->params);
> +		spin_lock_irqsave(&scheduler->lock, flags);
> +		scheduler->flags[ring->id] &= ~i915_sf_submitting;
> +
> +		if (ret) {
> +			int requeue = 1;

A multipurpose variable is not really a good idea. And as commented
further below, it should not exist at all.

> +
> +			/*
> +			 * Oh dear! Either the node is broken or the ring is
> +			 * busy. So need to kill the node or requeue it and try
> +			 * again later as appropriate.
> +			 */
> +
> +			switch (-ret) {
> +			case ENODEV:
> +			case ENOENT:
> +				/* Fatal errors. Kill the node. */
> +				requeue = -1;
> +			break;

"break" indent is wrong.

> +
> +			case EAGAIN:
> +			case EBUSY:
> +			case EIO:
> +			case ENOMEM:
> +			case ERESTARTSYS:
> +			case EINTR:
> +				/* Supposedly recoverable errors. */
> +			break;
> +
> +			default:
> +				/*
> +				 * Assume the error is recoverable and hope
> +				 * for the best.
> +				 */
> +				DRM_DEBUG_DRIVER("<%s> Got unexpected error from execfinal(): %d!\n",
> +						 ring->name, ret);

There's the MISSING_CASE macro, which should be used here.

> +			break;
> +			}
> +

Just move the code below this point into the switch. There is no point
having a switch to categorize your options and then doing a bunch of ifs
to execute code that could be in the switch.
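Something along these lines, as a userspace sketch only. The helper name
handle_submit_error() and the node_requeue()/node_kill() stubs are
hypothetical stand-ins for the scheduler functions, ERESTARTSYS is left
out because it is kernel-internal, and in the driver the default case
would be the place for MISSING_CASE. Each case performs its action
directly, so the requeue variable goes away.

```c
#include <errno.h>
#include <stdbool.h>

enum node_status { NODE_QUEUED, NODE_FLYING, NODE_DEAD };

/* Hypothetical, stripped-down queue entry. */
struct node {
	enum node_status status;
};

static void node_requeue(struct node *node) { node->status = NODE_QUEUED; }
static void node_kill(struct node *node) { node->status = NODE_DEAD; }

/*
 * Handle a failed submission. Returns true if the caller should stop
 * submitting for now, false if it may carry on with the next node.
 */
static bool handle_submit_error(struct node *node, int ret)
{
	/* The watchdog/reset code may have killed the node already. */
	if (node->status == NODE_DEAD)
		return false;

	switch (ret) {
	case -ENODEV:
	case -ENOENT:
		/* Fatal errors: kill the node. */
		node_kill(node);
		return false;

	case -EAGAIN:
	case -EBUSY:
	case -EIO:
	case -ENOMEM:
	case -EINTR:
		/* Supposedly recoverable: requeue and come back later. */
		node_requeue(node);
		return true;

	default:
		/* Unexpected value: warn here, then treat as recoverable. */
		node_requeue(node);
		return true;
	}
}
```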

> +			/*
> +			 * Check that the watchdog/reset code has not nuked
> +			 * the node while we weren't looking:
> +			 */
> +			if (node->status == i915_sqs_dead)
> +				requeue = 0;
> +
> +			if (requeue == 1) {
> +				i915_scheduler_node_requeue(node);
> +				/*
> +				 * No point spinning if the ring is currently
> +				 * unavailable so just give up and come back
> +				 * later.
> +				 */
> +				break;
> +			} else if (requeue == -1)
> +				i915_scheduler_node_kill(node);
> +		}
> +

Up to this point, the actual submission of a single node could go into
its own helper function. These functions are becoming too long to
follow, although they really are not doing anything complicated.

> +		/* Keep launching until the sky is sufficiently full. */
> +		if (i915_scheduler_count_flying(scheduler, ring) >=
> +						scheduler->min_flying)
> +			break;
> +
> +		ret = i915_scheduler_pop_from_queue_locked(ring, &node, &flags);
> +	} while (ret == 0);
> +
> +	spin_unlock_irqrestore(&scheduler->lock, flags);
> +
> +	if (!was_locked)
> +		mutex_unlock(&dev->struct_mutex);
> +
> +	/* Don't complain about not being able to submit extra entries */
> +	if (ret == -ENODATA)
> +		ret = 0;
> +
> +	return (ret < 0) ? ret : count;

This is a combined error and success path. Keeping to the convention of
kernel drivers is preferred:

	if (ret != -ENODATA)
		goto out_foo;

	return count;

out_unlock:
	spin_unlock(...);
out_foo:
	return ret;
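A slightly fuller userspace sketch of the same shape, under made-up
names (fake_lock, pop_next, submit_one, submit_all) and with a bool
standing in for the spinlock. The success path returns the count on its
own, and every error leaves through a single unlock label.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler lock. */
struct fake_lock {
	bool held;
};

/* Pretend queue: -ENODATA once nothing is left to pop. */
static int pop_next(int *queued)
{
	if (*queued == 0)
		return -ENODATA;
	(*queued)--;
	return 0;
}

/* Pretend submission that fails with the given error, if any. */
static int submit_one(int fail_with)
{
	return fail_with;
}

static int submit_all(struct fake_lock *lock, int queued, int fail_with)
{
	int count = 0;
	int ret;

	lock->held = true;

	for (;;) {
		ret = pop_next(&queued);
		if (ret)
			break;

		ret = submit_one(fail_with);
		if (ret)
			break;

		count++;
	}

	/* Running out of queued work is the normal way to finish. */
	if (ret != -ENODATA)
		goto out_unlock;

	lock->held = false;
	return count;

out_unlock:
	lock->held = false;
	return ret;
}
```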

> +}
> +
> +static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
> +				struct i915_scheduler_queue_entry *remove)
> +{
> +	struct i915_scheduler_queue_entry  *node;
> +	int     i, r;
> +	int     count = 0;
> +
> +	/*
> +	 * Ensure that a node is not being removed which is still dependent
> +	 * upon other (not completed) work. If that happens, it implies
> +	 * something has gone very wrong with the dependency tracking! Note
> +	 * that there is no need to worry if this node has been explicitly
> +	 * killed for some reason - it might be being killed before it got
> +	 * sent to the hardware.
> +	 */
> +	if (remove->status != i915_sqs_dead) {
> +		for (i = 0; i < remove->num_deps; i++)
> +			if ((remove->dep_list[i]) &&
> +			    (!I915_SQS_IS_COMPLETE(remove->dep_list[i])))
> +				count++;
> +		WARN_ON(count);
> +	}
> +
> +	/*
> +	 * Remove this node from the dependency lists of any other node which
> +	 * might be waiting on it.
> +	 */
> +	for (r = 0; r < I915_NUM_RINGS; r++) {
> +		list_for_each_entry(node, &scheduler->node_queue[r], link) {
> +			for (i = 0; i < node->num_deps; i++) {
> +				if (node->dep_list[i] != remove)
> +					continue;
> +
> +				node->dep_list[i] = NULL;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> new file mode 100644
> index 0000000..00dc7f3
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -0,0 +1,91 @@
> +/*
> + * Copyright (c) 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef _I915_SCHEDULER_H_
> +#define _I915_SCHEDULER_H_
> +
> +enum i915_scheduler_queue_status {
> +	/* Limbo: */
> +	i915_sqs_none = 0,
> +	/* Not yet submitted to hardware: */
> +	i915_sqs_queued,
> +	/* Popped from queue, ready to fly: */
> +	i915_sqs_popped,
> +	/* Sent to hardware for processing: */
> +	i915_sqs_flying,
> +	/* Finished processing on the hardware: */
> +	i915_sqs_complete,
> +	/* Killed by watchdog or catastrophic submission failure: */
> +	i915_sqs_dead,
> +	/* Limit value for use with arrays/loops */
> +	i915_sqs_MAX
> +};

These should be UPPERCASE_FOR_ENUM_VALUES. See i915_drv.h for samples.

> +
> +#define I915_SQS_IS_QUEUED(node)	(((node)->status == i915_sqs_queued))
> +#define I915_SQS_IS_FLYING(node)	(((node)->status == i915_sqs_flying))
> +#define I915_SQS_IS_COMPLETE(node)	(((node)->status == i915_sqs_complete) || \
> +					 ((node)->status == i915_sqs_dead))
> +

It might be slightly confusing that the macro is named IS_COMPLETE while
there is also an actual COMPLETE value. Rather name the test something
like IS_DONE and have it test for COMPLETE or DEAD, no confusion. Also,
some states have their IS_* macro and others don't; are there going to
be more?
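For example, something like the following. The uppercase value names and
I915_SQS_IS_DONE are only my suggestion for the renaming, and
sketch_node is a placeholder so the snippet stands alone.

```c
#include <stdbool.h>

enum i915_scheduler_queue_status {
	/* Limbo: */
	I915_SQS_NONE = 0,
	/* Not yet submitted to hardware: */
	I915_SQS_QUEUED,
	/* Popped from queue, ready to fly: */
	I915_SQS_POPPED,
	/* Sent to hardware for processing: */
	I915_SQS_FLYING,
	/* Finished processing on the hardware: */
	I915_SQS_COMPLETE,
	/* Killed by watchdog or catastrophic submission failure: */
	I915_SQS_DEAD,
	/* Limit value for use with arrays/loops */
	I915_SQS_MAX
};

/* Placeholder node type, only here to make the macro usable. */
struct sketch_node {
	enum i915_scheduler_queue_status status;
};

/* "Done" covers both normal completion and being killed. */
#define I915_SQS_IS_DONE(node) \
	((node)->status == I915_SQS_COMPLETE || \
	 (node)->status == I915_SQS_DEAD)
```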

> +struct i915_scheduler_obj_entry {
> +	struct drm_i915_gem_object          *obj;
> +	bool                                read_only;
> +};
> +
> +struct i915_scheduler_queue_entry {
> +	struct i915_execbuffer_params       params;
> +	/* -1023 = lowest priority, 0 = default, 1023 = highest */
> +	int32_t                             priority;
> +	struct i915_scheduler_obj_entry     *saved_objects;
> +	int                                 num_objs;
> +	bool                                bumped;
> +	struct i915_scheduler_queue_entry   **dep_list;
> +	int                                 num_deps;
> +	enum i915_scheduler_queue_status    status;
> +	unsigned long                       stamp;
> +	struct list_head                    link;
> +};
> +
> +struct i915_scheduler {
> +	struct list_head    node_queue[I915_NUM_RINGS];
> +	uint32_t            flags[I915_NUM_RINGS];
> +	spinlock_t          lock;
> +
> +	/* Tuning parameters: */
> +	int32_t             priority_level_min;
> +	int32_t             priority_level_max;
> +	int32_t             priority_level_preempt;
> +	uint32_t            min_flying;
> +};
> +
> +/* Flag bits for i915_scheduler::flags */
> +enum {
> +	i915_sf_interrupts_enabled  = (1 << 0),
> +	i915_sf_submitting          = (1 << 1),

Again, these should be uppercase. Also, move the enums to the beginning
of the file.

> +};
> +
> +int         i915_scheduler_init(struct drm_device *dev);
> +int         i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
> +bool        i915_scheduler_notify_request(struct drm_i915_gem_request *req);
> +
> +#endif  /* _I915_SCHEDULER_H_ */