From: Tvrtko Ursulin
To: Chris Wilson, intel-gfx@lists.freedesktop.org
Date: Mon, 18 May 2020 15:55:46 +0100
Subject: Re: [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
Message-ID: <3878c571-9353-67f7-b979-9d03209fa8c4@linux.intel.com>
In-Reply-To: <158980685142.17769.13828694630708094538@build.alporthouse.com>
References: <20200518081440.17948-1-chris@chris-wilson.co.uk>
 <20200518081440.17948-7-chris@chris-wilson.co.uk>
 <158980685142.17769.13828694630708094538@build.alporthouse.com>

On 18/05/2020 14:00, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-18 13:53:29)
>>
>> On 18/05/2020 09:14, Chris Wilson wrote:
>>> Once a virtual engine has been bound to a sibling, it will remain bound
>>> until we finally schedule out the last active request. We cannot rebind
>>> the context to a new sibling while it is inflight, as the context save
>>> would conflict, hence we wait. As we cannot then use any other sibling
>>> while the context is inflight, only kick the bound sibling while it is
>>> inflight, and upon scheduling out kick the rest (so that we can swap
>>> engines on timeslicing if the previously bound engine becomes
>>> oversubscribed).
>>>
>>> Signed-off-by: Chris Wilson
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
>>>   1 file changed, 13 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index 7a5ac3375225..fe8f3518d6b8 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
>>>   static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>>>   {
>>>   	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
>>> -	struct i915_request *next = READ_ONCE(ve->request);
>>>
>>> -	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
>>> +	if (READ_ONCE(ve->request))
>>>   		tasklet_hi_schedule(&ve->base.execlists.tasklet);
>>>   }
>>>
>>> @@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
>>>   		rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
>>>   		struct i915_request *rq = READ_ONCE(ve->request);
>>>
>>> -		if (!rq) { /* lazily cleanup after another engine handled rq */
>>> +		/* lazily cleanup after another engine handled rq */
>>> +		if (!rq || !virtual_matches(ve, rq, engine)) {
>>>   			rb_erase_cached(rb, &el->virtual);
>>>   			RB_CLEAR_NODE(rb);
>>>   			rb = rb_first_cached(&el->virtual);
>>>   			continue;
>>>   		}
>>>
>>> -		if (!virtual_matches(ve, rq, engine)) {
>>> -			rb = rb_next(rb);
>>> -			continue;
>>> -		}
>>> -
>>>   		return ve;
>>>   	}
>>>
>>> @@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
>>>   	if (unlikely(!mask))
>>>   		return;
>>>
>>> -	local_irq_disable();
>>>   	for (n = 0; n < ve->num_siblings; n++) {
>>>   		struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
>>>   		struct ve_node * const node = &ve->nodes[sibling->id];
>>> @@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
>>>   		if (!READ_ONCE(ve->request))
>>>   			break; /* already handled by a sibling's
tasklet */
>>>
>>> +		spin_lock_irq(&sibling->active.lock);
>>> +
>>>   		if (unlikely(!(mask & sibling->mask))) {
>>>   			if (!RB_EMPTY_NODE(&node->rb)) {
>>> -				spin_lock(&sibling->active.lock);
>>>   				rb_erase_cached(&node->rb,
>>>   						&sibling->execlists.virtual);
>>>   				RB_CLEAR_NODE(&node->rb);
>>> -				spin_unlock(&sibling->active.lock);
>>>   			}
>>> -			continue;
>>> -		}
>>>
>>> -		spin_lock(&sibling->active.lock);
>>> +			goto unlock_engine;
>>> +		}
>>>
>>> -		if (!RB_EMPTY_NODE(&node->rb)) {
>>> +		if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
>>>   			/*
>>>   			 * Cheat and avoid rebalancing the tree if we can
>>>   			 * reuse this node in situ.
>>> @@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
>>>   		if (first && prio >= sibling->execlists.queue_priority_hint)
>>>   			tasklet_hi_schedule(&sibling->execlists.tasklet);
>>>
>>> -		spin_unlock(&sibling->active.lock);
>>> +unlock_engine:
>>> +		spin_unlock_irq(&sibling->active.lock);
>>> +
>>> +		if (intel_context_inflight(&ve->context))
>>> +			break;
>>
>> So the virtual request may not be added to all siblings any longer. Will
>> it still be able to schedule it on any of them if time slicing kicks in
>> under these conditions?
>
> Yes.
>
>> This is equivalent to the hunk in first_virtual_engine which also
>> removes it from all other siblings.
>>
>> I guess it's in line with what the commit message says - that a new
>> sibling will be picked upon time slicing. I just don't quite see the
>> path which would do it. The only path which shuffles the siblings array
>> around is in dequeue, and dequeue on anything other than the engine
>> which first picked it will not happen any more. I must be missing
>> something..
>
> It's all in execlists_schedule_out. During timeslicing we call
> unwind_incomplete_requests, which moves the requests back to the priotree
> (and in this patch back to the virtual engine).
>
> But... We cannot use the virtual request on any other engine until it has
> been scheduled out.
> That only happens after we complete execlists_dequeue()
> and submit a new ELSP[]. Once the HW acks the change, we call
> execlists_schedule_out on the virtual request.
>
> Now we know that intel_context_inflight() will return false, so any
> engine can pick up the request, and so it's time to kick the virtual
> tasklet and in turn kick all the siblings.
>
> So timeslicing works by not submitting the virtual request again, and
> when it expires on this sibling[0], we wake up all the other siblings
> and the first that is idle wins the race.

If a virtual request is on the hw and the timeslice expires:

1. Unwind the request.
   -> kicks the virtual tasklet
2. Virtual tasklet runs and puts the request back on the siblings.
   -> kicks the physical tasklets
3. Sibling's tasklet runs and submits the request.

So two tasklets of latency even if there are no other runnable requests?

Regards,

Tvrtko