From: Manfred Spraul <manfred@colorfullife.com>
To: paulmck@linux.vnet.ibm.com, Alan Stern <stern@rowland.harvard.edu>
Cc: linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
netdev@vger.kernel.org, oleg@redhat.com,
akpm@linux-foundation.org, mingo@redhat.com, dave@stgolabs.net,
tj@kernel.org, arnd@arndb.de, linux-arch@vger.kernel.org,
will.deacon@arm.com, peterz@infradead.org,
parri.andrea@gmail.com, torvalds@linux-foundation.org,
Pablo Neira Ayuso <pablo@netfilter.org>,
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
Florian Westphal <fw@strlen.de>,
"David S. Miller" <davem@davemloft.net>,
coreteam@netfilter.org, 1vier1@web.de
Subject: Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair
Date: Mon, 3 Jul 2017 21:01:14 +0200
Message-ID: <53bbfa1a-2836-863d-3a5c-4f2f7f0baa40@colorfullife.com>
In-Reply-To: <20170703171420.GC2393@linux.vnet.ibm.com>

On 07/03/2017 07:14 PM, Paul E. McKenney wrote:
> On Mon, Jul 03, 2017 at 10:39:49AM -0400, Alan Stern wrote:
>> On Sat, 1 Jul 2017, Manfred Spraul wrote:
>>
>>> As we want to remove spin_unlock_wait() and replace it with explicit
>>> spin_lock()/spin_unlock() calls, we can use this to simplify the
>>> locking.
>>>
>>> In addition:
>>> - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
>>> - The new code avoids the backwards loop.
>>>
>>> Only slightly tested, I did not manage to trigger calls to
>>> nf_conntrack_all_lock().
>>>
>>> Fixes: b16c29191dc8
>>> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>>> Cc: <stable@vger.kernel.org>
>>> Cc: Sasha Levin <sasha.levin@oracle.com>
>>> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
>>> Cc: netfilter-devel@vger.kernel.org
>>> ---
>>> net/netfilter/nf_conntrack_core.c | 44 +++++++++++++++++++++------------------
>>> 1 file changed, 24 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>>> index e847dba..1193565 100644
>>> --- a/net/netfilter/nf_conntrack_core.c
>>> +++ b/net/netfilter/nf_conntrack_core.c
>>> @@ -96,19 +96,24 @@ static struct conntrack_gc_work conntrack_gc_work;
>>>
>>> void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
>>> {
>>> + /* 1) Acquire the lock */
>>> spin_lock(lock);
>>> - while (unlikely(nf_conntrack_locks_all)) {
>>> - spin_unlock(lock);
>>>
>>> - /*
>>> - * Order the 'nf_conntrack_locks_all' load vs. the
>>> - * spin_unlock_wait() loads below, to ensure
>>> - * that 'nf_conntrack_locks_all_lock' is indeed held:
>>> - */
>>> - smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
>>> - spin_unlock_wait(&nf_conntrack_locks_all_lock);
>>> - spin_lock(lock);
>>> - }
>>> + /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
>>> + if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
>>> + return;
>> As far as I can tell, this read does not need to have ACQUIRE
>> semantics.
>>
>> You need to guarantee that two things can never happen:
>>
>> (1) We read nf_conntrack_locks_all == false, and this routine's
>> critical section for nf_conntrack_locks[i] runs after the
>> (empty) critical section for that lock in
>> nf_conntrack_all_lock().
>>
>> (2) We read nf_conntrack_locks_all == true, and this routine's
>> critical section for nf_conntrack_locks_all_lock runs before
>> the critical section in nf_conntrack_all_lock().

I was looking at nf_conntrack_all_unlock():
it contains an smp_store_release(). Which memory barrier does that pair with?

nf_conntrack_all_unlock()
    <arbitrary writes>
    smp_store_release(a, false)
    spin_unlock(b);

nf_conntrack_lock()
    spin_lock(c);
    xx = READ_ONCE(a)
    if (xx == false)
        return
    <arbitrary read>

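
To make the question concrete, here is a minimal userspace sketch of the two code paths above. It assumes C11 atomics as rough stand-ins for the kernel primitives (they are not equivalent to the Linux kernel memory model), and the names model_spin_lock(), model_all_unlock(), model_lock_fast() and payload are hypothetical. The flag read takes the memory order as a parameter so that the READ_ONCE() and smp_load_acquire() variants can be compared side by side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
static atomic_bool a;   /* models nf_conntrack_locks_all */
static atomic_bool b;   /* lock released in nf_conntrack_all_unlock() */
static atomic_bool c;   /* lock taken in nf_conntrack_lock() */
static int payload;     /* target of the <arbitrary writes> */

static void model_spin_lock(atomic_bool *l)
{
	/* Spin until we flip the lock word from false to true. */
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;
}

static void model_spin_unlock(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

/* Model of nf_conntrack_all_unlock(); the caller must hold b. */
static void model_all_unlock(int value)
{
	assert(atomic_load(&b));
	payload = value;                                        /* <arbitrary writes> */
	atomic_store_explicit(&a, false, memory_order_release); /* smp_store_release(a, false) */
	model_spin_unlock(&b);                                  /* spin_unlock(b) */
}

/*
 * Model of the fast path of nf_conntrack_lock(). Pass memory_order_relaxed
 * for the READ_ONCE() variant or memory_order_acquire for the
 * smp_load_acquire() variant. Returns true with c held if the fast path
 * was taken; otherwise drops c and returns false (slow path omitted).
 */
static bool model_lock_fast(memory_order order, int *seen)
{
	model_spin_lock(&c);                    /* spin_lock(c) */
	if (atomic_load_explicit(&a, order)) {  /* xx = read of a */
		model_spin_unlock(&c);
		return false;
	}
	*seen = payload;                        /* <arbitrary read> */
	return true;
}
```

In C11 terms, only the memory_order_acquire variant synchronizes directly with the release store to a. With memory_order_relaxed, any ordering between the write and the read of payload would have to come from somewhere else, and b and c are different locks. Run single-threaded, both variants behave the same; the sketch only pins down which accesses are being discussed.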
>> In fact, neither one can happen even if smp_load_acquire() is replaced
>> with READ_ONCE(). The reason is simple enough, using this property of
>> spinlocks:
>>
>> If critical section CS1 runs before critical section CS2 (for
>> the same lock) then: (a) every write coming before CS1's
>> spin_unlock() will be visible to any read coming after CS2's
>> spin_lock(), and (b) no write coming after CS2's spin_lock()
>> will be visible to any read coming before CS1's spin_unlock().

Does this property apply here? The locks are different:
nf_conntrack_all_unlock() releases b, while nf_conntrack_lock() acquires c.

>> Thus for (1), assuming the critical sections run in the order mentioned
>> above, since nf_conntrack_all_lock() writes to nf_conntrack_locks_all
>> before releasing nf_conntrack_locks[i], and since nf_conntrack_lock()
>> acquires nf_conntrack_locks[i] before reading nf_conntrack_locks_all,
>> by (a) the read will always see the write.
>>
>> Similarly for (2), since nf_conntrack_all_lock() acquires
>> nf_conntrack_locks_all_lock before writing to nf_conntrack_locks_all,
>> and since nf_conntrack_lock() reads nf_conntrack_locks_all before
>> releasing nf_conntrack_locks_all_lock, by (b) the read cannot see the
>> write.
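
For reference, a minimal userspace sketch of the lock-side pattern that the argument for (1) and (2) relies on. It assumes C11 atomics as approximate stand-ins for spin_lock()/spin_unlock() and READ_ONCE(); NLOCKS and the model_*() names are hypothetical, and the order of operations in the slow path is my reading of the patch, so treat it as an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NLOCKS 4  /* hypothetical number of per-bucket locks */

static atomic_bool locks_all;      /* models nf_conntrack_locks_all */
static atomic_bool all_lock;       /* models nf_conntrack_locks_all_lock */
static atomic_bool locks[NLOCKS];  /* models nf_conntrack_locks[] */

static void model_spin_lock(atomic_bool *l)
{
	/* Spin until we flip the lock word from false to true. */
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;
}

static void model_spin_unlock(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

/* Model of nf_conntrack_all_lock(): returns with all_lock held. */
static void model_all_lock(void)
{
	model_spin_lock(&all_lock);
	/* Plain flag write, made inside the all_lock critical section. */
	atomic_store_explicit(&locks_all, true, memory_order_relaxed);
	for (int i = 0; i < NLOCKS; i++) {
		/*
		 * Empty critical section: its unlock publishes the flag
		 * write to whoever acquires locks[i] afterwards.
		 */
		model_spin_lock(&locks[i]);
		model_spin_unlock(&locks[i]);
	}
}

/*
 * Model of nf_conntrack_lock() with a READ_ONCE()-style flag read.
 * Returns with locks[i] held; the result says whether the slow path ran.
 */
static bool model_lock(int i)
{
	assert(i >= 0 && i < NLOCKS);
	model_spin_lock(&locks[i]);
	if (!atomic_load_explicit(&locks_all, memory_order_relaxed))
		return false;
	/* Slow path: wait for the all_lock holder, then retake locks[i]. */
	model_spin_unlock(&locks[i]);
	model_spin_lock(&all_lock);
	model_spin_lock(&locks[i]);
	model_spin_unlock(&all_lock);
	return true;
}
```

Case (1) corresponds to model_lock(i) acquiring locks[i] after the empty critical section in model_all_lock(): the flag write precedes that unlock, so the read after the later lock acquisition sees it. Case (2) corresponds to the slow path holding all_lock while the flag is read. The unlock side, which is what my question is about, is deliberately left out here.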
> And the Linux kernel memory model (https://lwn.net/Articles/718628/
> and https://lwn.net/Articles/720550/) agrees with Alan. Here is
> a litmus test, which emulates spin_lock() with xchg_acquire() and
> spin_unlock() with smp_store_release():
>
> ------------------------------------------------------------------------
>
> C C-ManfredSpraul-L1G1xchgnr.litmus
>
> (* Expected result: Never. *)
>
> {
> }
>
> P0(int *nfcla, spinlock_t *gbl, int *gbl_held, spinlock_t *lcl, int *lcl_held)
> {
> /* Acquire local lock. */
> r10 = xchg_acquire(lcl, 1);
> r1 = READ_ONCE(*nfcla);
> if (r1) {
> smp_store_release(lcl, 0);
> r11 = xchg_acquire(gbl, 1);
> r12 = xchg_acquire(lcl, 1);
> smp_store_release(gbl, 0);
> }
> r2 = READ_ONCE(*gbl_held);
> WRITE_ONCE(*lcl_held, 1);
> WRITE_ONCE(*lcl_held, 0);
> smp_store_release(lcl, 0);
> }
>
> P1(int *nfcla, spinlock_t *gbl, int *gbl_held, spinlock_t *lcl, int *lcl_held)
> {
> /* Acquire global lock. */
> r10 = xchg_acquire(gbl, 1);
> WRITE_ONCE(*nfcla, 1);
> r11 = xchg_acquire(lcl, 1);
> smp_store_release(lcl, 0);
> r2 = READ_ONCE(*lcl_held);
> WRITE_ONCE(*gbl_held, 1);
> WRITE_ONCE(*gbl_held, 0);
Where is the write that resets nfcla=0?
> smp_store_release(gbl, 0);
> }
>
> exists
> ((0:r2=1 \/ 1:r2=1) /\ 0:r10=0 /\ 0:r11=0 /\ 0:r12=0 /\ 1:r10=0 /\ 1:r11=0)
>
> ------------------------------------------------------------------------
>
> The memory model says that the forbidden state does not happen:
[...]
> Manfred, any objections to my changing your patch as Alan suggests?

I tried to pair the memory barriers:
nf_conntrack_all_unlock() contains an smp_store_release().
What does that pair with?

--
Manfred