From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/9] drm/i915/gt: Unlock engine-pm after queuing the kernel context switch
Date: Wed, 20 Nov 2019 12:40:13 +0000	[thread overview]
Message-ID: <b560217a-5e19-c43d-59b7-b6957c50e59d@linux.intel.com> (raw)
In-Reply-To: <157425165727.13839.12483018227008886642@skylake-alporthouse-com>


On 20/11/2019 12:07, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-11-20 11:58:27)
>>
>> On 20/11/2019 09:32, Chris Wilson wrote:
>>> In commit a79ca656b648 ("drm/i915: Push the wakeref->count deferral to
>>> the backend"), I erroneously concluded that we last modify the engine
>>> inside __i915_request_commit() meaning that we could enable concurrent
>>> submission for userspace as we enqueued this request. However, this
>>> falls into a trap with other users of the engine->kernel_context waking
>>> up and submitting their request before the idle-switch is queued, with
>>> the result that the kernel_context is executed out-of-sequence most
>>> likely upsetting the GPU and certainly ourselves when we try to retire
>>> the out-of-sequence requests.
>>>
>>> As such we need to hold onto the effective engine->kernel_context mutex
>>> lock (via the engine pm mutex proxy) until we have finished queuing the
>>> request to the engine.
>>>
>>> v2: Serialise against concurrent intel_gt_retire_requests()
>>> v3: Describe the hairy locking scheme with intel_gt_retire_requests()
>>> for future reference.
>>> v4: Combine timeline->lock and engine pm release; it's hairy.
>>>
>>> Fixes: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend")
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c | 47 +++++++++++++++++++----
>>>    1 file changed, 40 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> index 3c0f490ff2c7..1f517357a268 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> @@ -73,8 +73,25 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
>>>    
>>>    #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
>>>    
>>> +static void
>>> +__intel_timeline_enter_and_pm_release(struct intel_timeline *tl,
>>> +                                   struct intel_engine_cs *engine)
>>> +{
>>> +     struct intel_gt_timelines *timelines = &engine->gt->timelines;
>>> +
>>> +     spin_lock(&timelines->lock);
>>> +
>>> +     if (!atomic_fetch_inc(&tl->active_count))
>>> +             list_add_tail(&tl->link, &timelines->active_list);
>>
>> Hmm, with this new part it maybe matches/answers my question from
>> "drm/i915/gt: Close race between engine_park and
>> intel_gt_retire_requests". I think at least, since it now adds a second
>> place a timeline can enter the active_list.
>>
>> But no, where does the intel_timeline_enter race come from? Can't be
>> userspace submission since they are blocked on wf->lock.
> 
> The race is not just with intel_timeline_enter, but with
> intel_gt_retire_requests.
> 
> As soon as we are on that list, we may be retired. If we are retired
> before adjusting the engine->wakeref.count, we are b0rked.
> 
> As soon as we adjust the engine->wakeref.count, another submission may
> call intel_timeline_enter, and again may even retire this request. The
> enter itself is perfectly fine, but we need to serialise against the
> retires.
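
The serialisation Chris describes can be sketched in user-space C11, outside the kernel. The names below are illustrative (not the real i915 symbols), and a pthread mutex stands in for the timelines spinlock: the first reference publishes the timeline on the active list, and both the counter transition and the list update happen under the same lock that a retirer takes, so a concurrent retire cannot observe the timeline half-published.

```c
/* Hedged sketch of the idiom in the quoted hunk; illustrative names only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct timeline {
	atomic_int active_count;   /* stands in for tl->active_count   */
	bool       on_active_list; /* stands in for list_add/list_del  */
};

static pthread_mutex_t timelines_lock = PTHREAD_MUTEX_INITIALIZER;

/* Mirrors the quoted hunk: only the 0 -> 1 transition publishes the
 * timeline, and it does so while holding the lock retirement uses. */
static void timeline_enter(struct timeline *tl)
{
	pthread_mutex_lock(&timelines_lock);
	if (atomic_fetch_add(&tl->active_count, 1) == 0)
		tl->on_active_list = true;
	pthread_mutex_unlock(&timelines_lock);
}

/* A retirer walks the active list only under the same lock, so it
 * serialises against the enter above. */
static bool timeline_is_active(struct timeline *tl)
{
	bool active;

	pthread_mutex_lock(&timelines_lock);
	active = tl->on_active_list;
	pthread_mutex_unlock(&timelines_lock);
	return active;
}
```

The point of the shared lock is exactly the race described above: without it, a retirer could see the timeline on the list before the counter reflects the reference, or vice versa.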

I think the two patches combined work, I am just not sure the two 
atomic_fetch_inc()->list_add() paths are needed now that you have 
re-ordered __i915_requests_queue and __intel_wakeref_defer_park - that's 
the part which is confusing me. But then it also does no harm...
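
For what it's worth, the enter/exit pairing is symmetric and idempotent past the first reference, which is why a second enter path does no harm: only the 0 -> 1 transition adds to the list and only the 1 -> 0 transition removes. A self-contained user-space sketch under the same assumptions as before (illustrative names, pthread mutex in place of the spinlock):

```c
/* Hedged sketch of the symmetric enter/exit pair; not the kernel code. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct timeline {
	atomic_int active_count;
	bool       on_active_list;
};

static pthread_mutex_t timelines_lock = PTHREAD_MUTEX_INITIALIZER;

/* First reference publishes the timeline; later ones only bump the count. */
static void timeline_enter(struct timeline *tl)
{
	pthread_mutex_lock(&timelines_lock);
	if (atomic_fetch_add(&tl->active_count, 1) == 0)
		tl->on_active_list = true;
	pthread_mutex_unlock(&timelines_lock);
}

/* Last reference unpublishes it; intermediate exits only drop the count. */
static void timeline_exit(struct timeline *tl)
{
	pthread_mutex_lock(&timelines_lock);
	if (atomic_fetch_sub(&tl->active_count, 1) == 1)
		tl->on_active_list = false;
	pthread_mutex_unlock(&timelines_lock);
}
```

Calling timeline_enter() twice from two different paths simply leaves the count at 2; the list membership changes only on the outermost transitions, so the duplicate path is harmless as long as every enter is balanced by an exit.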

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Thread overview: 72+ messages
2019-11-20  9:32 [PATCH 1/9] drm/i915/selftests: Take a ref to the request we wait upon Chris Wilson
2019-11-20  9:32 ` [PATCH 2/9] drm/i915/gt: Close race between engine_park and intel_gt_retire_requests Chris Wilson
2019-11-20 13:19   ` Tvrtko Ursulin
2019-11-20  9:32 ` [PATCH 3/9] drm/i915/gt: Unlock engine-pm after queuing the kernel context switch Chris Wilson
2019-11-20 11:58   ` Tvrtko Ursulin
2019-11-20 12:07     ` Chris Wilson
2019-11-20 12:40       ` Tvrtko Ursulin [this message]
2019-11-20 12:44         ` Chris Wilson
2019-11-20 13:19           ` Tvrtko Ursulin
2019-11-20  9:32 ` [PATCH 4/9] drm/i915: Mark up the calling context for intel_wakeref_put() Chris Wilson
2019-11-20 12:46   ` Tvrtko Ursulin
2019-11-20  9:32 ` [PATCH 5/9] drm/i915/gt: Declare timeline.lock to be irq-free Chris Wilson
2019-11-20  9:32 ` [PATCH 6/9] drm/i915/selftests: Force bonded submission to overlap Chris Wilson
2019-11-20 12:55   ` Tvrtko Ursulin
2019-11-20 12:59     ` Chris Wilson
2019-11-20 13:18       ` Tvrtko Ursulin
2019-11-20 13:29         ` Chris Wilson
2019-11-22  9:34           ` Tvrtko Ursulin
2019-11-22 10:03             ` Chris Wilson
2019-11-22 10:43   ` [PATCH] " Chris Wilson
2019-11-20  9:33 ` [PATCH 7/9] drm/i915/selftests: Flush the active callbacks Chris Wilson
2019-11-20  9:33 ` [PATCH 8/9] drm/i915/selftests: Be explicit in ERR_PTR handling Chris Wilson
2019-11-20 10:23   ` Matthew Auld
2019-11-20  9:33 ` [PATCH 9/9] drm/i915/gt: Schedule request retirement when timeline idles Chris Wilson
2019-11-20 13:16   ` Tvrtko Ursulin
2019-11-20 13:39     ` Chris Wilson
2019-11-20 14:16       ` Tvrtko Ursulin
2019-11-20 14:25         ` Chris Wilson
2019-11-20 10:19 ` [PATCH 1/9] drm/i915/selftests: Take a ref to the request we wait upon Matthew Auld
2019-11-20 10:25   ` Chris Wilson
2019-11-20 10:27 ` [PATCH] " Chris Wilson
2019-11-20 10:34   ` Matthew Auld
2019-11-20 13:51 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with drm/i915/selftests: Take a ref to the request we wait upon (rev2) Patchwork
2019-11-20 14:19 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-11-20 14:20   ` Chris Wilson
2019-11-22 12:33 ` ✗ Fi.CI.BUILD: failure for series starting with drm/i915/selftests: Take a ref to the request we wait upon (rev3) Patchwork
