From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D73F3C433E0 for ; Wed, 29 Jul 2020 13:42:13 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AE6EF2070B for ; Wed, 29 Jul 2020 13:42:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AE6EF2070B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 470846E51D; Wed, 29 Jul 2020 13:42:13 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id D2FB36E51D for ; Wed, 29 Jul 2020 13:42:10 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from localhost (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP (TLS) id 21969539-1500050 for multiple; Wed, 29 Jul 2020 14:42:06 +0100 MIME-Version: 1.0 In-Reply-To: <3274fa3b-7c34-1925-2cd6-145c73acb63b@linux.intel.com> References: <20200715115147.11866-1-chris@chris-wilson.co.uk> <20200715115147.11866-8-chris@chris-wilson.co.uk> <54de929a-9449-8ac2-a8bd-641a61d0525e@linux.intel.com> <159594649917.665.6631422765642650487@build.alporthouse.com> <3274fa3b-7c34-1925-2cd6-145c73acb63b@linux.intel.com> From: Chris Wilson To: Tvrtko Ursulin , intel-gfx@lists.freedesktop.org Date: Wed, 29 Jul 2020 14:42:06 +0100 Message-ID: <159603012686.8877.9862976259674771406@build.alporthouse.com> User-Agent: alot/0.9 Subject: Re: [Intel-gfx] [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Quoting Tvrtko Ursulin (2020-07-29 13:40:38) > > On 28/07/2020 15:28, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2020-07-17 14:04:58) > >> > >> On 15/07/2020 12:50, Chris Wilson wrote: > >>> Rather than require the next timeline after idling to match the MRU > >>> before idling, reset the index on the node and allow it to match the > >>> first request. However, this requires cmpxchg(u64) and so is not trivial > >>> on 32b, so for compatibility we just fallback to keeping the cached node > >>> pointing to the MRU timeline. > >>> > >>> Signed-off-by: Chris Wilson > >>> --- > >>> drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++-- > >>> 1 file changed, 19 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c > >>> index 0854b1552bc1..6737b5615c0c 100644 > >>> --- a/drivers/gpu/drm/i915/i915_active.c > >>> +++ b/drivers/gpu/drm/i915/i915_active.c > >>> @@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref) > >>> rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node); > >>> rb_insert_color(&ref->cache->node, &ref->tree); > >>> GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node); > >>> + > >>> + /* Make the cached node available for reuse with any timeline */ > >>> + if (IS_ENABLED(CONFIG_64BIT)) > >>> + ref->cache->timeline = 0; /* needs cmpxchg(u64) */ > >> > >> Or when fence context wraps shock horror. > > > > I more concerned about that we use timeline:0 as a special unordered > > timeline. It's reserved by use in the dma_fence_stub, and everything > > will start to break when the timelines wrap. The earliest causalities > > will be the kernel_context timelines which are also very special indices > > for the barriers. > > > >> > >>> } > >>> > >>> spin_unlock_irqrestore(&ref->tree_lock, flags); > >>> @@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx) > >>> { > >>> struct active_node *it; > >>> > >>> + GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */ > >>> + > >>> it = READ_ONCE(ref->cache); > >>> - if (it && it->timeline == idx) > >>> - return it; > >>> + if (it) { > >>> + u64 cached = READ_ONCE(it->timeline); > >>> + > >>> + if (cached == idx) > >>> + return it; > >>> + > >>> +#ifdef CONFIG_64BIT /* for cmpxchg(u64) */ > >>> + if (!cached && !cmpxchg(&it->timeline, 0, idx)) { > >>> + GEM_BUG_ON(i915_active_fence_isset(&it->base)); > >>> + return it; > >> > >> cpmxchg suggests this needs to be atomic, however above the check for > >> equality comes from a separate read. > > > > That's fine, and quite common to avoid cmpxchg if the current value > > already does not match the expected condition. > > How? What is another thread is about to install its idx into > it->timeline with cmpxchg and this thread does not see it because it > just returned on the "cached == idx" condition. Because it's nonzero. If the idx is already non-zero, it will always remain non-zero until everybody idles (and there are no more threads). If the idx is zero, it can only transition to non-zero once, atomically via cmpxchg. The first and only first cmpxchg will return that the previous value was 0, and so return with it->idx == idx. > > > > >> Since there is a lookup code path under the spinlock, perhaps the > >> unlocked lookup could just fail, and then locked lookup could re-assign > >> the timeline without the need for cmpxchg? > > > > The unlocked/locked lookup are the same routine. You pointed that out > > :-p > > Like I remember from ten days ago.. Anyway, I am pointing out it still > doesn't smell right. > > __active_lookup(...) -> lockless > { > ... > it = fetch_node(ref->tree.rb_node); > while (it) { > if (it->timeline < idx) { > it = fetch_node(it->node.rb_right); > } else if (it->timeline > idx) { > it = fetch_node(it->node.rb_left); > } else { > WRITE_ONCE(ref->cache, it); > break; > } > } > ... > } > > Then in active_instance, locked: > > ... > parent = NULL; > p = &ref->tree.rb_node; > while (*p) { > parent = *p; > > node = rb_entry(parent, struct active_node, node); > if (node->timeline == idx) { > kmem_cache_free(global.slab_cache, prealloc); > goto out; > } > > if (node->timeline < idx) > p = &parent->rb_right; > else > p = &parent->rb_left; > WRITE_ONCE(ref->cache, it); > break; > } > } > ... > > Tree walk could be consolidated between the two. This tree walk is subtly different, as we aren't just interested in the node, but its parent. The exact repetitions have been consolidated into __active_lookup. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx