Subject: Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair
To: paulmck@linux.vnet.ibm.com, Alan Stern
Cc: linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
 netdev@vger.kernel.org, oleg@redhat.com, akpm@linux-foundation.org,
 mingo@redhat.com, dave@stgolabs.net, tj@kernel.org, arnd@arndb.de,
 linux-arch@vger.kernel.org, will.deacon@arm.com, peterz@infradead.org,
 parri.andrea@gmail.com, torvalds@linux-foundation.org,
 Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
 "David S. Miller", coreteam@netfilter.org, 1vier1@web.de
References: <20170703171420.GC2393@linux.vnet.ibm.com>
From: Manfred Spraul
Message-ID: <53bbfa1a-2836-863d-3a5c-4f2f7f0baa40@colorfullife.com>
Date: Mon, 3 Jul 2017 21:01:14 +0200
In-Reply-To: <20170703171420.GC2393@linux.vnet.ibm.com>

On 07/03/2017 07:14 PM, Paul E. McKenney wrote:
> On Mon, Jul 03, 2017 at 10:39:49AM -0400, Alan Stern wrote:
>> On Sat, 1 Jul 2017, Manfred Spraul wrote:
>>
>>> As we want to remove spin_unlock_wait() and replace it with explicit
>>> spin_lock()/spin_unlock() calls, we can use this to simplify the
>>> locking.
>>>
>>> In addition:
>>> - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
>>> - The new code avoids the backwards loop.
>>>
>>> Only slightly tested, I did not manage to trigger calls to
>>> nf_conntrack_all_lock().
>>>
>>> Fixes: b16c29191dc8
>>> Signed-off-by: Manfred Spraul
>>> Cc:
>>> Cc: Sasha Levin
>>> Cc: Pablo Neira Ayuso
>>> Cc: netfilter-devel@vger.kernel.org
>>> ---
>>>  net/netfilter/nf_conntrack_core.c | 44 +++++++++++++++++++++------------------
>>>  1 file changed, 24 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>>> index e847dba..1193565 100644
>>> --- a/net/netfilter/nf_conntrack_core.c
>>> +++ b/net/netfilter/nf_conntrack_core.c
>>> @@ -96,19 +96,24 @@ static struct conntrack_gc_work conntrack_gc_work;
>>>
>>>  void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
>>>  {
>>> +	/* 1) Acquire the lock */
>>>  	spin_lock(lock);
>>> -	while (unlikely(nf_conntrack_locks_all)) {
>>> -		spin_unlock(lock);
>>>
>>> -		/*
>>> -		 * Order the 'nf_conntrack_locks_all' load vs. the
>>> -		 * spin_unlock_wait() loads below, to ensure
>>> -		 * that 'nf_conntrack_locks_all_lock' is indeed held:
>>> -		 */
>>> -		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
>>> -		spin_unlock_wait(&nf_conntrack_locks_all_lock);
>>> -		spin_lock(lock);
>>> -	}
>>> +	/* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
>>> +	if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
>>> +		return;
>> As far as I can tell, this read does not need to have ACQUIRE
>> semantics.
>>
>> You need to guarantee that two things can never happen:
>>
>>     (1) We read nf_conntrack_locks_all == false, and this routine's
>>	critical section for nf_conntrack_locks[i] runs after the
>>	(empty) critical section for that lock in
>>	nf_conntrack_all_lock().
>>
>>     (2) We read nf_conntrack_locks_all == true, and this routine's
>>	critical section for nf_conntrack_locks_all_lock runs before
>>	the critical section in nf_conntrack_all_lock().
I was looking at nf_conntrack_all_unlock(): there is an
smp_store_release(). Which memory barrier does this pair with?

nf_conntrack_all_unlock()
    smp_store_release(a, false);
    spin_unlock(b);

nf_conntrack_lock()
    spin_lock(c);
    xx = READ_ONCE(a);
    if (xx == false)
        return;

>> In fact, neither one can happen even if smp_load_acquire() is replaced
>> with READ_ONCE().  The reason is simple enough, using this property of
>> spinlocks:
>>
>>	If critical section CS1 runs before critical section CS2 (for
>>	the same lock) then: (a) every write coming before CS1's
>>	spin_unlock() will be visible to any read coming after CS2's
>>	spin_lock(), and (b) no write coming after CS2's spin_lock()
>>	will be visible to any read coming before CS1's spin_unlock().
Does this apply? The locks are different.

>> Thus for (1), assuming the critical sections run in the order mentioned
>> above, since nf_conntrack_all_lock() writes to nf_conntrack_locks_all
>> before releasing nf_conntrack_locks[i], and since nf_conntrack_lock()
>> acquires nf_conntrack_locks[i] before reading nf_conntrack_locks_all,
>> by (a) the read will always see the write.
>>
>> Similarly for (2), since nf_conntrack_all_lock() acquires
>> nf_conntrack_locks_all_lock before writing to nf_conntrack_locks_all,
>> and since nf_conntrack_lock() reads nf_conntrack_locks_all before
>> releasing nf_conntrack_locks_all_lock, by (b) the read cannot see the
>> write.
> And the Linux kernel memory model (https://lwn.net/Articles/718628/
> and https://lwn.net/Articles/720550/) agrees with Alan.  Here is
> a litmus test, which emulates spin_lock() with xchg_acquire() and
> spin_unlock() with smp_store_release():
>
> ------------------------------------------------------------------------
>
> C C-ManfredSpraul-L1G1xchgnr.litmus
>
> (* Expected result: Never.  *)
>
> {
> }
>
> P0(int *nfcla, spinlock_t *gbl, int *gbl_held, spinlock_t *lcl, int *lcl_held)
> {
>	/* Acquire local lock. */
>	r10 = xchg_acquire(lcl, 1);
>	r1 = READ_ONCE(*nfcla);
>	if (r1) {
>		smp_store_release(lcl, 0);
>		r11 = xchg_acquire(gbl, 1);
>		r12 = xchg_acquire(lcl, 1);
>		smp_store_release(gbl, 0);
>	}
>	r2 = READ_ONCE(*gbl_held);
>	WRITE_ONCE(*lcl_held, 1);
>	WRITE_ONCE(*lcl_held, 0);
>	smp_store_release(lcl, 0);
> }
>
> P1(int *nfcla, spinlock_t *gbl, int *gbl_held, spinlock_t *lcl, int *lcl_held)
> {
>	/* Acquire global lock. */
>	r10 = xchg_acquire(gbl, 1);
>	WRITE_ONCE(*nfcla, 1);
>	r11 = xchg_acquire(lcl, 1);
>	smp_store_release(lcl, 0);
>	r2 = READ_ONCE(*lcl_held);
>	WRITE_ONCE(*gbl_held, 1);
>	WRITE_ONCE(*gbl_held, 0);
Where is the write that resets nfcla=0?

>	smp_store_release(gbl, 0);
> }
>
> exists
> ((0:r2=1 \/ 1:r2=1) /\ 0:r10=0 /\ 0:r11=0 /\ 0:r12=0 /\ 1:r10=0 /\ 1:r11=0)
>
> ------------------------------------------------------------------------
>
> The memory model says that the forbidden state does not happen:
[...]
> Manfred, any objections to my changing your patch as Alan suggests?
I tried to pair the memory barriers:
nf_conntrack_all_unlock() contains an smp_store_release().
What does that pair with?

--
    Manfred
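
To make the pairing question concrete, here is a minimal user-space sketch
of the scheme under discussion, using C11 atomics and pthread mutexes
instead of kernel primitives. It is an analogue built on assumptions, not
the kernel code: every name in it (bucket_lock, all_lock, all_unlock,
locks_all, run_demo, NLOCKS, ITERS, SWEEPS) is hypothetical, the slow path
follows P0 of the litmus test above, and the flag is read with acquire
semantics so that it pairs with the release store in the unlock path, as
the patch proposes. The C11 model is not the Linux kernel memory model, so
this says nothing about whether READ_ONCE() would be enough in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NLOCKS 4      /* hypothetical number of per-bucket locks */
#define ITERS  20000  /* increments per worker thread */
#define SWEEPS 200    /* global-lock sections taken by the sweeper */

static pthread_mutex_t bucket_locks[NLOCKS];  /* stands in for nf_conntrack_locks[] */
static pthread_mutex_t all_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool locks_all;                 /* stands in for nf_conntrack_locks_all */
static long counters[NLOCKS];                 /* data protected by the scheme */

static void bucket_lock(size_t i)
{
	assert(i < NLOCKS);
	pthread_mutex_lock(&bucket_locks[i]);       /* 1) acquire the lock */
	/* 2) acquire load, pairs with the release store in all_unlock() */
	if (!atomic_load_explicit(&locks_all, memory_order_acquire))
		return;
	pthread_mutex_unlock(&bucket_locks[i]);     /* fast path failed */
	pthread_mutex_lock(&all_mutex);             /* wait for the global holder */
	pthread_mutex_lock(&bucket_locks[i]);
	pthread_mutex_unlock(&all_mutex);
}

static void bucket_unlock(size_t i)
{
	pthread_mutex_unlock(&bucket_locks[i]);
}

static void all_lock(void)
{
	pthread_mutex_lock(&all_mutex);
	/* Relaxed store: ordering comes from the lock/unlock pairs below. */
	atomic_store_explicit(&locks_all, true, memory_order_relaxed);
	for (size_t i = 0; i < NLOCKS; i++) {
		/* Empty critical section: waits out current fast-path holders. */
		pthread_mutex_lock(&bucket_locks[i]);
		pthread_mutex_unlock(&bucket_locks[i]);
	}
}

static void all_unlock(void)
{
	/* Release store: publishes the global holder's writes to fast-path readers. */
	atomic_store_explicit(&locks_all, false, memory_order_release);
	pthread_mutex_unlock(&all_mutex);
}

static void *worker(void *arg)
{
	size_t i = *(const size_t *)arg;

	for (int n = 0; n < ITERS; n++) {
		bucket_lock(i);
		counters[i]++;
		bucket_unlock(i);
	}
	return NULL;
}

static void *sweeper(void *arg)
{
	(void)arg;
	for (int n = 0; n < SWEEPS; n++) {
		all_lock();
		for (size_t i = 0; i < NLOCKS; i++)
			counters[i]++;
		all_unlock();
	}
	return NULL;
}

/* Runs one worker per bucket plus one sweeper; returns the counter sum. */
long run_demo(void)
{
	static size_t ids[NLOCKS];
	pthread_t w[NLOCKS], s;
	long total = 0;

	for (size_t i = 0; i < NLOCKS; i++) {
		pthread_mutex_init(&bucket_locks[i], NULL);
		counters[i] = 0;
		ids[i] = i;
	}
	for (size_t i = 0; i < NLOCKS; i++)
		pthread_create(&w[i], NULL, worker, &ids[i]);
	pthread_create(&s, NULL, sweeper, NULL);
	for (size_t i = 0; i < NLOCKS; i++)
		pthread_join(w[i], NULL);
	pthread_join(s, NULL);
	for (size_t i = 0; i < NLOCKS; i++)
		total += counters[i];
	return total;
}
```

Called from a main() and built with something like `cc -std=c11 -pthread`,
run_demo() should return NLOCKS * ITERS + SWEEPS * NLOCKS when no increment
is lost. In this sketch the release store in all_unlock() is what a
fast-path reader that sees `false` synchronizes with.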