Date: Sun, 18 Dec 2022 20:50:06 -0500
From: Mathieu Desnoyers
To: Joel Fernandes
Cc: linux-kernel@vger.kernel.org, Josh Triplett, Lai Jiangshan,
    "Paul E. McKenney", rcu@vger.kernel.org, Steven Rostedt
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier
Message-ID: <659763b0-eee4-10dd-5f4a-37241173809c@efficios.com>
References: <20221218191310.130904-1-joel@joelfernandes.org>
 <589da7c9-5fb7-5f6f-db88-ca464987997e@efficios.com>
 <2da94283-4fce-9aff-ac5d-ba181fa0f008@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022-12-18 19:24, Joel Fernandes wrote:
> On Sun, Dec 18, 2022 at 7:04 PM Joel Fernandes wrote:
>>
>> Hi Mathieu,
>>
>> On Sun, Dec 18, 2022 at 6:38 PM Mathieu Desnoyers wrote:
>>>
>>> On 2022-12-18 16:30, Joel Fernandes wrote:
>>>> Hi Mathieu,
>>>>
>>>> On Sun, Dec 18, 2022 at 3:56 PM Mathieu Desnoyers wrote:
>>>>>
>>>>> On 2022-12-18 14:13, Joel Fernandes (Google) wrote:
>>>>>> Hello, I believe the pre-flip memory barrier is not required. The
>>>>>> only reason I can give for removing it, other than the possibility
>>>>>> that it is unnecessary, is to avoid carrying extra code that does
>>>>>> not help. However, since we are issuing a full memory barrier after
>>>>>> the flip, I cannot say that it hurts to do it anyway.
>>>>>>
>>>>>> For this reason, please consider these patches as "informational"
>>>>>> rather than a "please merge". :-) Though, feel free to consider
>>>>>> merging if you agree!
>>>>>>
>>>>>> All SRCU scenarios pass with these, with 6 hours of testing.
>>>>>
>>>>> Hi Joel,
>>>>>
>>>>> Please have a look at the comments in my side-rcu implementation [1, 2].
>>>>> It is similar to what SRCU does (per-cpu counter based grace period
>>>>> tracking), but implemented for userspace. The comments explain why
>>>>> this works without the memory barrier you identify as useless in SRCU.
>>>>>
>>>>> Following my implementation of side-rcu, I reviewed the SRCU comments
>>>>> and identified that the barrier "/* E */" appears to be useless. I
>>>>> even discussed this privately with Paul E. McKenney.
>>>>>
>>>>> My implementation and comments go further though, and skip the period
>>>>> "flip" entirely if the first pass observes that all readers (in both
>>>>> periods) are quiescent.
>>>>
>>>> Actually in SRCU, the first pass scans only one index, then does the
>>>> flip, and the second pass scans the second index. Without doing a
>>>> flip, an index cannot be scanned, for forward-progress reasons,
>>>> because it is still "active". So I am curious how you can skip the
>>>> flip and still scan both indexes. I will dig more into your
>>>> implementation to learn more.
>>>
>>> If we look at the SRCU read side:
>>>
>>> int __srcu_read_lock(struct srcu_struct *ssp)
>>> {
>>>         int idx;
>>>
>>>         idx = READ_ONCE(ssp->srcu_idx) & 0x1;
>>>         this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
>>>         smp_mb(); /* B */  /* Avoid leaking the critical section. */
>>>         return idx;
>>> }
>>>
>>> If the thread is preempted for a long period of time between the load
>>> of ssp->srcu_idx and the increment of srcu_lock_count[idx], then this
>>> thread can appear as a "new reader" for the idx period at any
>>> arbitrary time in the future, independently of which period is the
>>> current one within a future grace period.
>>>
>>> As a result, the grace period algorithm needs to inherently support a
>>> "new reader" appearing in either of the two periods, independently of
>>> the current period state.
>>>
>>> As a result, this means that while within period "0", we _need_ to
>>> allow newly arriving readers to appear as we scan period "0".
>>
>> Sure, it already handles that, but I believe that is a corner case, not
>> the norm.
>>
>>> As a result, we can simply scan both periods 0/1 for reader
>>> quiescence, even while new readers appear within those periods.
>>
>> I think this is a bit dangerous. Yes, there is the preemption scenario
>> you mention above, but that is bounded, since you can only have a fixed
>> number of tasks that underwent that preemption, and it is quite rare in
>> the sense that each such reader must be preempted just after sampling
>> idx but before incrementing the lock count.
>>
>> However, if we scan while new readers appear (outside of the above
>> preemption problem), we can have a counter wrap causing a false match
>> much more quickly. The scan loop is:
>>
>> check_readers(idx) {
>>         count_all_unlocks(idx);
>>         smp_mb();
>>         count_all_locks(idx);
>>         bool done = (locks == unlocks);
>>         if (done) {
>>                 // readers are done, end scan for this idx.
>>         } else {
>>                 // try again later
>>         }
>> }
>>
>> So if check_readers() gets preempted just after the smp_mb(), then you
>> can have lots of tasks enter and exit the read-side critical section
>> and increment the locks count. Eventually locks == unlocks will hold,
>> and we are screwed. Sure, this is also theoretical, but that issue can
>> be made "worse" by deliberately scanning active readers, especially
>> when such readers can also nest arbitrarily.
>>
>>> As a result, flipping between periods 0/1 is just relevant for forward
>>> progress, not for correctness.
>>
>> Sure, agreed, forward progress.
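The two-period scan and the skip-flip behaviour discussed above map to
something like the following toy userspace sketch. To be clear, this is
neither the SRCU nor the side-rcu code: the names, the C11 atomics, the
global (non-per-cpu) counters and the single grace-period-thread
assumption are all simplifications for illustration.

#include <stdatomic.h>
#include <stdbool.h>

struct gp_state {
	atomic_ulong lock_count[2];	/* readers that entered each period */
	atomic_ulong unlock_count[2];	/* readers that left each period */
	atomic_uint period;		/* current period: 0 or 1 */
};

/* Reader: elect a period, then announce presence in it. */
static unsigned int reader_lock(struct gp_state *gp)
{
	unsigned int idx = atomic_load_explicit(&gp->period,
						memory_order_relaxed) & 1;

	atomic_fetch_add_explicit(&gp->lock_count[idx], 1,
				  memory_order_relaxed);
	/* Order the increment before the critical section (like B). */
	atomic_thread_fence(memory_order_seq_cst);
	return idx;
}

static void reader_unlock(struct gp_state *gp, unsigned int idx)
{
	/* Order the critical section before the increment. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add_explicit(&gp->unlock_count[idx], 1,
				  memory_order_relaxed);
}

/* True if every reader that entered period @idx has also left it. */
static bool period_quiescent(struct gp_state *gp, unsigned int idx)
{
	unsigned long unlocks, locks;

	unlocks = atomic_load_explicit(&gp->unlock_count[idx],
				       memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	locks = atomic_load_explicit(&gp->lock_count[idx],
				     memory_order_relaxed);
	return locks == unlocks;
}

/* Grace period; a single updater thread is assumed. */
static void synchronize(struct gp_state *gp)
{
	unsigned int cur = atomic_load_explicit(&gp->period,
						memory_order_relaxed) & 1;

	/* First pass: if both periods are quiescent, skip the flip. */
	if (period_quiescent(gp, 0) && period_quiescent(gp, 1))
		return;

	/* Drain the inactive period, flip so new readers stop electing
	 * @cur, then drain the previously active period. */
	while (!period_quiescent(gp, cur ^ 1))
		;
	atomic_store_explicit(&gp->period, cur ^ 1, memory_order_relaxed);
	while (!period_quiescent(gp, cur))
		;
}

The flip is what guarantees that the second scan terminates: once new
readers elect the other period, the count in @cur can only drain. When
the first pass already observes both periods quiescent, there is nothing
left to drain, which is why the flip is forward-progress machinery rather
than a correctness requirement.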
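As for the counter-wrap concern, a deliberately narrow 8-bit counter makes
the false match easy to reproduce. Again an invented toy, not SRCU code;
with the real 64-bit counters, the very same sequence needs 2^64 read-side
critical sections to occur while the scanner stays preempted.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* One reader has come and gone, one reader is still active. */
	uint8_t locks = 2, unlocks = 1;

	/* Scanner snapshots the unlock count, issues its barrier... */
	uint8_t unlock_snap = unlocks;

	/* ...and is preempted here while 255 readers enter and exit,
	 * wrapping the 8-bit lock count: 2 + 255 = 257, which wraps
	 * to 1 modulo 256. */
	for (int i = 0; i < 255; i++) {
		locks++;
		unlocks++;
	}

	uint8_t lock_snap = locks;

	printf("unlock_snap=%u lock_snap=%u -> %s\n",
	       unlock_snap, lock_snap,
	       unlock_snap == lock_snap ? "false quiescence"
					: "still active");
	return 0;
}

This prints "false quiescence" even though one reader never left, which is
the scenario you describe; only the iteration count separates the 8-bit
toy from the 64-bit reality.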
>
> Adding to the last statement: "But also correctness, as described above".

Exactly how many entries/exits of the read-side critical section while
the grace period is preempted do you need to trigger this? On a 64-bit
system, where 64-bit counters are used, AFAIU this needs to be exactly
2^64 read-side critical sections. Even at one read-side critical section
per nanosecond, wrapping a 64-bit counter within a single preemption of
the scanner would take roughly 585 years.

There are other synchronization algorithms, such as seqlocks, which are
quite happy with much less protection against overflow (a 32-bit counter
even on 64-bit architectures).

For practical purposes, I suspect this issue is really just theoretical.

Or am I missing your point?

Thanks,

Mathieu

>
> thanks,
>
> - Joel

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com