From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx@lists.freedesktop.org,
	"# v4 . 10+" <stable@vger.kernel.org>
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Keep all engine locks across scheduling
Date: Mon, 27 Mar 2017 12:39:38 +0100	[thread overview]
Message-ID: <26f4763c-d603-d0c4-f12f-f6337aecb2f9@linux.intel.com> (raw)
In-Reply-To: <20170327103123.GC10606@nuc-i3427.alporthouse.com>


On 27/03/2017 11:31, Chris Wilson wrote:
> On Mon, Mar 27, 2017 at 11:11:47AM +0100, Tvrtko Ursulin wrote:
>>
>> On 26/03/2017 09:46, Chris Wilson wrote:
>>> Unlocking is dangerous. In this case we combine an early update to the
>>> out-of-queue request, because we know that it will be inserted into the
>>> correct FIFO priority-ordered slot when it becomes ready in the future.
>>> However, given sufficient enthusiasm, it may become ready as we are
>>> continuing to reschedule, and so may gazump the FIFO if we have since
>>> dropped its spinlock. The result is that it may be executed too early,
>>> before its dependees.
>>>
>>> Fixes: 20311bd35060 ("drm/i915/scheduler: Execute requests in order of priorities")
>>> Testcase: igt/gem_exec_whisper
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: <stable@vger.kernel.org> # v4.10+
>>> ---
>>> drivers/gpu/drm/i915/intel_lrc.c | 54 +++++++++++++++++++++++++++-------------
>>> 1 file changed, 37 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index dd0e9d587852..3fdabba0a32d 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -658,30 +658,47 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
>>> 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
>>> }
>>>
>>> -static struct intel_engine_cs *
>>> -pt_lock_engine(struct i915_priotree *pt, struct intel_engine_cs *locked)
>>> +static inline struct intel_engine_cs *
>>> +pt_lock_engine(struct i915_priotree *pt, unsigned long *locked)
>>> {
>>> -	struct intel_engine_cs *engine;
>>> -
>>> -	engine = container_of(pt,
>>> -			      struct drm_i915_gem_request,
>>> -			      priotree)->engine;
>>> -	if (engine != locked) {
>>> -		if (locked)
>>> -			spin_unlock_irq(&locked->timeline->lock);
>>> -		spin_lock_irq(&engine->timeline->lock);
>>> -	}
>>> +	struct intel_engine_cs *engine =
>>> +		container_of(pt, struct drm_i915_gem_request, priotree)->engine;
>>> +
>>> +	/* Locking the engines in a random order will rightfully trigger a
>>> +	 * spasm in lockdep. However, we can ignore lockdep (by marking each
>>> +	 * as a separate nesting) so long as we never nest the
>>> +	 * engine->timeline->lock elsewhere. Also the number of nesting
>>> +	 * subclasses is severely limited (7) which is going to cause an
>>> +	 * issue at some point.
>>> +	 * BUILD_BUG_ON(I915_NUM_ENGINES >= MAX_LOCKDEP_SUBCLASSES);
>>
>> Let's bite the bullet and not hide this BUILD_BUG_ON in a comment. :I
>
> The code would continue to work nevertheless, just lockdep would
> eventually give up. I like it slightly better than taking either a
> global spinlock for engine->execlists_queue insertion, or taking the
> spinlock on every engine for scheduling. How often will we reschedule
> across engines? Not sure.

I think counting on "doesn't happen often" and "it still works" falls 
short of your high standards! ;) So a global execlist_queue lock if it 
must be..
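
Just to illustrate what I mean by that - a rough, untested sketch, where 
sched_lock is a made-up field on drm_i915_private and the body is elided:

	/* Hypothetical: one global lock serialising all execlist_queue
	 * manipulation, taken by both submit_request() and ->schedule(),
	 * instead of nesting the per-engine timeline locks.
	 */
	static void execlists_schedule(struct drm_i915_gem_request *request,
				       int prio)
	{
		struct drm_i915_private *dev_priv = request->i915;

		spin_lock_irq(&dev_priv->sched_lock);
		/* ... walk the dependency graph and bump priorities;
		 * execlists_submit_request() would take the same lock
		 * before touching any engine->execlist_queue rbtree ...
		 */
		spin_unlock_irq(&dev_priv->sched_lock);
	}

It obviously serialises scheduling across engines, but lockdep stays 
happy and there is no subclass limit to worry about.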

>>> +	 */
>>> +	if (!__test_and_set_bit(engine->id, locked))
>>> +		spin_lock_nested(&engine->timeline->lock,
>>> +				 hweight_long(*locked));
>>>
>>> 	return engine;
>>> }
>>>
>>> +static void
>>> +unlock_engines(struct drm_i915_private *i915, unsigned long locked)
>>> +{
>>> +	struct intel_engine_cs *engine;
>>> +	unsigned long tmp;
>>> +
>>> +	for_each_engine_masked(engine, i915, locked, tmp)
>>> +		spin_unlock(&engine->timeline->lock);
>>> +}
>>> +
>>> static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> {
>>> -	struct intel_engine_cs *engine = NULL;
>>> +	struct intel_engine_cs *engine;
>>> 	struct i915_dependency *dep, *p;
>>> 	struct i915_dependency stack;
>>> +	unsigned long locked = 0;
>>> 	LIST_HEAD(dfs);
>>>
>>> +	BUILD_BUG_ON(I915_NUM_ENGINES > BITS_PER_LONG);
>>> +
>>> 	if (prio <= READ_ONCE(request->priotree.priority))
>>> 		return;
>>>
>>> @@ -691,6 +708,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> 	stack.signaler = &request->priotree;
>>> 	list_add(&stack.dfs_link, &dfs);
>>>
>>> +	GEM_BUG_ON(irqs_disabled());
>>> +	local_irq_disable();
>>> +
>>
>> Why not just irqsave/restore? It seems too low level for this
>> position in the flow. If it is just an optimisation it would need a
>> comment I think.
>
> It was because we are not taking the spin lock/unlock inside the same
> block, so it felt dangerous. Who holds the irqflags?

Hm yes, it cannot be made elegant.
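
To spell it out, the irqsave variant would look roughly like this 
(hypothetical, untested):

	unsigned long flags;

	local_irq_save(flags);
	/* ... DFS loop calling pt_lock_engine(pt, &locked), which only
	 * does spin_lock_nested() and never touches the irq state ... */
	unlock_engines(request->i915, locked);
	local_irq_restore(flags);

It would work mechanically, but the flags cannot be paired with any 
single lock acquisition, and it would imply the function may be entered 
with interrupts already disabled, which the GEM_BUG_ON(irqs_disabled()) 
above explicitly rules out.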

>>> 	/* Recursively bump all dependent priorities to match the new request.
>>> 	 *
>>> 	 * A naive approach would be to use recursion:
>>> @@ -719,7 +739,7 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> 		if (!RB_EMPTY_NODE(&pt->node))
>>> 			continue;
>>>
>>> -		engine = pt_lock_engine(pt, engine);
>>> +		engine = pt_lock_engine(pt, &locked);
>>>
>>> 		/* If it is not already in the rbtree, we can update the
>>> 		 * priority inplace and skip over it (and its dependencies)
>>> @@ -737,7 +757,7 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>>
>>> 		INIT_LIST_HEAD(&dep->dfs_link);
>>>
>>> -		engine = pt_lock_engine(pt, engine);
>>> +		engine = pt_lock_engine(pt, &locked);
>>>
>>> 		if (prio <= pt->priority)
>>> 			continue;
>>> @@ -750,8 +770,8 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
>>> 			engine->execlist_first = &pt->node;
>>> 	}
>>>
>>> -	if (engine)
>>> -		spin_unlock_irq(&engine->timeline->lock);
>>> +	unlock_engines(request->i915, locked);
>>> +	local_irq_enable();
>>>
>>> 	/* XXX Do we need to preempt to make room for us and our deps? */
>>> }
>>>
>>
>> I am trying to think whether removing the skip on requests not in
>> the execution tree would work and help any.
>
> It's dangerous due to the duplicate branches in the dependency graph that
> we are resolving to generate the topological ordering. We need a way to
> do a mark-and-sweep whilst also ensuring that we end up with the correct
> order. I'm open to (better :) suggestions.
>
>> Or if the above scheme
>> is completely safe, or whether we would need to atomically lock all
>> engines whose requests will be touched. Especially since the code is
>> only dealing with adjusting the priorities, I don't immediately see
>> how it can cause out of order execution.
>
> An interrupt leading to submit_request(), which wants to insert a
> request into the execlist_queue rbtree, vs ->schedule() also trying to
> manipulate the rbtree (and in this case elements currently outside of
> the rbtree). Our insertion into the rbtree ensures FIFO order, so that
> we don't reorder equivalent-priority dependencies during ->schedule().
> Hence, if we mark an out-of-rbtree request as a higher priority before
> inserting all of its dependencies into the tree, and the submit_notify
> occurs first, it will insert the request into the tree before we get to
> insert its dependencies, hence reordering.

Ok I get the general idea. I don't have any better suggestions at the 
moment than trying the global lock. Luckily you have just removed one 
atomic from the irq handler so one step forward, two steps back. :)
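
To write the problematic interleaving down, as I understand it (roughly):

	->schedule()                        irq -> submit_request()
	------------                        ------------------------
	bump not-yet-ready request R to
	prio P, engine lock then dropped
	                                    R becomes ready and is inserted
	                                    into the execlist_queue rbtree
	                                    at prio P
	insert R's dependencies at prio P;
	FIFO order within a priority now
	places them *after* R
	=> R can be executed before its dependencies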

Regards,

Tvrtko

Thread overview: 20+ messages
2017-03-26  8:44 [PATCH] drm/i915: Keep all engine locks across scheduling Chris Wilson
2017-03-26  8:44 ` Chris Wilson
2017-03-26  8:46 ` Chris Wilson
2017-03-26  8:46   ` Chris Wilson
2017-03-27 10:11   ` [Intel-gfx] " Tvrtko Ursulin
2017-03-27 10:11     ` Tvrtko Ursulin
2017-03-27 10:31     ` [Intel-gfx] " Chris Wilson
2017-03-27 10:31       ` Chris Wilson
2017-03-27 11:39       ` Tvrtko Ursulin [this message]
2017-03-27 11:39         ` Tvrtko Ursulin
2017-03-27 21:06     ` [Intel-gfx] " Chris Wilson
2017-03-27 21:23       ` Chris Wilson
2017-03-26  9:03 ` ✓ Fi.CI.BAT: success for drm/i915: Keep all engine locks across scheduling (rev2) Patchwork
2017-03-27 20:21 ` [PATCH v2] drm/i915: Avoid lock dropping between rescheduling Chris Wilson
2017-03-27 20:21   ` Chris Wilson
2017-03-29  9:33   ` [Intel-gfx] " Tvrtko Ursulin
2017-03-29  9:33     ` Tvrtko Ursulin
2017-03-29 12:15     ` [Intel-gfx] " Chris Wilson
2017-03-29 12:15       ` Chris Wilson
2017-03-27 20:41 ` ✓ Fi.CI.BAT: success for drm/i915: Keep all engine locks across scheduling (rev3) Patchwork
