From: Tvrtko Ursulin
To: Chris Wilson, intel-gfx@lists.freedesktop.org
Date: Mon, 18 May 2020 15:55:46 +0100
Subject: Re: [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
Message-ID: <3878c571-9353-67f7-b979-9d03209fa8c4@linux.intel.com>
In-Reply-To: <158980685142.17769.13828694630708094538@build.alporthouse.com>
References: <20200518081440.17948-1-chris@chris-wilson.co.uk>
 <20200518081440.17948-7-chris@chris-wilson.co.uk>
 <158980685142.17769.13828694630708094538@build.alporthouse.com>

On 18/05/2020 14:00, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-18 13:53:29)
>>
>> On 18/05/2020 09:14, Chris Wilson wrote:
>>> Once a virtual engine has been bound to a sibling, it will remain bound
>>> until we finally schedule out the last active request. We cannot rebind
>>> the context to a new sibling while it is inflight, as the context save
>>> would conflict, hence we wait. As we cannot then use any other sibling
>>> while the context is inflight, only kick the bound sibling while it is
>>> inflight, and upon scheduling out kick the rest (so that we can swap
>>> engines on timeslicing if the previously bound engine becomes
>>> oversubscribed).
>>>
>>> Signed-off-by: Chris Wilson
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
>>>   1 file changed, 13 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index 7a5ac3375225..fe8f3518d6b8 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
>>>   static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>>>   {
>>>   	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
>>> -	struct i915_request *next = READ_ONCE(ve->request);
>>>
>>> -	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
>>> +	if (READ_ONCE(ve->request))
>>>   		tasklet_hi_schedule(&ve->base.execlists.tasklet);
>>>   }
>>>
>>> @@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
>>>   		rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
>>>   		struct i915_request *rq = READ_ONCE(ve->request);
>>>
>>> -		if (!rq) { /* lazily cleanup after another engine handled rq */
>>> +		/* lazily cleanup after another engine handled rq */
>>> +		if (!rq || !virtual_matches(ve, rq, engine)) {
>>>   			rb_erase_cached(rb, &el->virtual);
>>>   			RB_CLEAR_NODE(rb);
>>>   			rb = rb_first_cached(&el->virtual);
>>>   			continue;
>>>   		}
>>>
>>> -		if (!virtual_matches(ve, rq, engine)) {
>>> -			rb = rb_next(rb);
>>> -			continue;
>>> -		}
>>> -
>>>   		return ve;
>>>   	}
>>>
>>> @@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
>>>   	if (unlikely(!mask))
>>>   		return;
>>>
>>> -	local_irq_disable();
>>>   	for (n = 0; n < ve->num_siblings; n++) {
>>>   		struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
>>>   		struct ve_node * const node = &ve->nodes[sibling->id];
>>> @@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
>>>   		if (!READ_ONCE(ve->request))
>>>   			break; /* already handled by a sibling's
tasklet */
>>>
>>> +		spin_lock_irq(&sibling->active.lock);
>>> +
>>>   		if (unlikely(!(mask & sibling->mask))) {
>>>   			if (!RB_EMPTY_NODE(&node->rb)) {
>>> -				spin_lock(&sibling->active.lock);
>>>   				rb_erase_cached(&node->rb,
>>>   						&sibling->execlists.virtual);
>>>   				RB_CLEAR_NODE(&node->rb);
>>> -				spin_unlock(&sibling->active.lock);
>>>   			}
>>> -			continue;
>>> -		}
>>>
>>> -		spin_lock(&sibling->active.lock);
>>> +			goto unlock_engine;
>>> +		}
>>>
>>> -		if (!RB_EMPTY_NODE(&node->rb)) {
>>> +		if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
>>>   			/*
>>>   			 * Cheat and avoid rebalancing the tree if we can
>>>   			 * reuse this node in situ.
>>> @@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
>>>   		if (first && prio >= sibling->execlists.queue_priority_hint)
>>>   			tasklet_hi_schedule(&sibling->execlists.tasklet);
>>>
>>> -		spin_unlock(&sibling->active.lock);
>>> +unlock_engine:
>>> +		spin_unlock_irq(&sibling->active.lock);
>>> +
>>> +		if (intel_context_inflight(&ve->context))
>>> +			break;
>>
>> So the virtual request may not be added to all siblings any longer. Will
>> it still be able to schedule it on any of them if time slicing kicks in
>> under these conditions?
>
> Yes.
>
>> This is equivalent to the hunk in first_virtual_engine which also
>> removes it from all other siblings.
>>
>> I guess it's in line with what the commit message says - that a new
>> sibling will be picked upon time slicing. I just don't quite see the
>> path which would do it. The only path which shuffles the siblings array
>> around is in dequeue, and dequeue on anything other than the engine
>> which first picked it will not happen any more. I must be missing
>> something..
>
> It's all in execlists_schedule_out. During timeslicing we call
> unwind_incomplete_requests, which moves the requests back to the priotree
> (and in this patch back to the virtual engine).
>
> But... We cannot use the virtual request on any other engine until it has
> been scheduled out.
> That only happens after we complete execlists_dequeue()
> and submit a new ELSP[]. Once the HW acks the change, we call
> execlists_schedule_out on the virtual request.
>
> Now we know that intel_context_inflight() will return false, so any
> engine can pick up the request, and so it's time to kick the virtual
> tasklet and in turn kick all the siblings.
>
> So timeslicing works by not submitting the virtual request again, and
> when it expires on this sibling[0], we wake up all the other siblings
> and the first that is idle wins the race.

If a virtual request is on the hw and the timeslice expires:

1. Unwind the request.
   -> kicks the virtual tasklet
2. Virtual tasklet runs and puts the request back on the siblings.
   -> kicks the physical tasklets
3. Sibling's tasklet runs and submits the request.

So two tasklets of latency even if there are no other runnable requests?

Regards,

Tvrtko