linux-kernel.vger.kernel.org archive mirror
From: Alan Stern <stern@rowland.harvard.edu>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	<linux-kernel@vger.kernel.org>, <netfilter-devel@vger.kernel.org>,
	<netdev@vger.kernel.org>, <oleg@redhat.com>,
	<akpm@linux-foundation.org>, <mingo@redhat.com>,
	<dave@stgolabs.net>, <tj@kernel.org>, <arnd@arndb.de>,
	<linux-arch@vger.kernel.org>, <will.deacon@arm.com>,
	<peterz@infradead.org>, <parri.andrea@gmail.com>,
	<torvalds@linux-foundation.org>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
	Florian Westphal <fw@strlen.de>,
	"David S. Miller" <davem@davemloft.net>, <coreteam@netfilter.org>
Subject: Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair
Date: Mon, 3 Jul 2017 10:39:49 -0400 (EDT)
Message-ID: <Pine.LNX.4.44L0.1707031012180.2027-100000@iolanthe.rowland.org>
In-Reply-To: <a6642feb-2f3a-980f-5ed6-2deb79563e6b@colorfullife.com>

On Sat, 1 Jul 2017, Manfred Spraul wrote:

> As we want to remove spin_unlock_wait() and replace it with explicit
> spin_lock()/spin_unlock() calls, we can use this to simplify the
> locking.
> 
> In addition:
> - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
> - The new code avoids the backwards loop.
> 
> Only slightly tested, I did not manage to trigger calls to
> nf_conntrack_all_lock().
> 
> Fixes: b16c29191dc8
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> Cc: <stable@vger.kernel.org>
> Cc: Sasha Levin <sasha.levin@oracle.com>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: netfilter-devel@vger.kernel.org
> ---
>  net/netfilter/nf_conntrack_core.c | 44 +++++++++++++++++++++------------------
>  1 file changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index e847dba..1193565 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -96,19 +96,24 @@ static struct conntrack_gc_work conntrack_gc_work;
>  
>  void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
>  {
> +	/* 1) Acquire the lock */
>  	spin_lock(lock);
> -	while (unlikely(nf_conntrack_locks_all)) {
> -		spin_unlock(lock);
>  
> -		/*
> -		 * Order the 'nf_conntrack_locks_all' load vs. the
> -		 * spin_unlock_wait() loads below, to ensure
> -		 * that 'nf_conntrack_locks_all_lock' is indeed held:
> -		 */
> -		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> -		spin_unlock_wait(&nf_conntrack_locks_all_lock);
> -		spin_lock(lock);
> -	}
> +	/* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
> +	if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
> +		return;

As far as I can tell, this read does not need to have ACQUIRE
semantics.

You need to guarantee that two things can never happen:

    (1) We read nf_conntrack_locks_all == false, and this routine's
	critical section for nf_conntrack_locks[i] runs after the
	(empty) critical section for that lock in 
	nf_conntrack_all_lock().

    (2) We read nf_conntrack_locks_all == true, and this routine's 
	critical section for nf_conntrack_locks_all_lock runs before 
	the critical section in nf_conntrack_all_lock().

In fact, neither one can happen even if smp_load_acquire() is replaced
with READ_ONCE().  The reason is simple enough, using this property of
spinlocks:

	If critical section CS1 runs before critical section CS2 (for 
	the same lock) then: (a) every write coming before CS1's
	spin_unlock() will be visible to any read coming after CS2's
	spin_lock(), and (b) no write coming after CS2's spin_lock()
	will be visible to any read coming before CS1's spin_unlock().

Thus for (1), assuming the critical sections run in the order mentioned
above, since nf_conntrack_all_lock() writes to nf_conntrack_locks_all
before releasing nf_conntrack_locks[i], and since nf_conntrack_lock()
acquires nf_conntrack_locks[i] before reading nf_conntrack_locks_all,
by (a) the read will always see the write.

Similarly for (2), since nf_conntrack_all_lock() acquires 
nf_conntrack_locks_all_lock before writing to nf_conntrack_locks_all, 
and since nf_conntrack_lock() reads nf_conntrack_locks_all before 
releasing nf_conntrack_locks_all_lock, by (b) the read cannot see the 
write.

Alan Stern
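
Property (a) above can be sketched in userspace. This is only an illustration, under the assumption that a pthread mutex is an acceptable stand-in for a kernel spinlock; the names payload, cs1_ran, writer, reader and run_property_a are hypothetical and do not appear in the patch. The writer stores to payload before its critical section (CS1); the reader retries until its own critical section (CS2) is known to have run after CS1, and only then reads payload.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int payload;	/* plain variable, written before CS1's unlock */
static bool cs1_ran;	/* protected by lock; records that CS1 has run */

/* CS1: the write to payload comes before this thread's unlock. */
static void *writer(void *arg)
{
	(void)arg;
	payload = 42;		/* even before the lock is taken */
	pthread_mutex_lock(&lock);
	cs1_ran = true;
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* CS2: retry until this critical section is known to follow CS1. */
static void *reader(void *arg)
{
	int *seen = arg;
	bool after_cs1;

	do {
		pthread_mutex_lock(&lock);
		after_cs1 = cs1_ran;
		pthread_mutex_unlock(&lock);
		if (!after_cs1)
			sched_yield();
	} while (!after_cs1);

	/* This read comes after CS2's lock, so by (a) it sees 42. */
	*seen = payload;
	return NULL;
}

/* Hypothetical helper: returns the value the reader observed. */
static int run_property_a(void)
{
	pthread_t w, r;
	int seen = -1;
	int rc;

	payload = 0;
	cs1_ran = false;
	rc = pthread_create(&r, NULL, reader, &seen);
	assert(rc == 0);
	rc = pthread_create(&w, NULL, writer, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Calling run_property_a() from a test program should return 42. Note that payload needs no atomic access or extra barrier: the ordering comes entirely from the unlock in CS1 and the later lock in CS2.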

> +
> +	/* fast path failed, unlock */
> +	spin_unlock(lock);
> +
> +	/* Slow path 1) get global lock */
> +	spin_lock(&nf_conntrack_locks_all_lock);
> +
> +	/* Slow path 2) get the lock we want */
> +	spin_lock(lock);
> +
> +	/* Slow path 3) release the global lock */
> +	spin_unlock(&nf_conntrack_locks_all_lock);
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_lock);
>  
> @@ -149,18 +154,17 @@ static void nf_conntrack_all_lock(void)
>  	int i;
>  
>  	spin_lock(&nf_conntrack_locks_all_lock);
> -	nf_conntrack_locks_all = true;
>  
> -	/*
> -	 * Order the above store of 'nf_conntrack_locks_all' against
> -	 * the spin_unlock_wait() loads below, such that if
> -	 * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
> -	 * we must observe nf_conntrack_locks[] held:
> -	 */
> -	smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> +	nf_conntrack_locks_all = true;
>  
>  	for (i = 0; i < CONNTRACK_LOCKS; i++) {
> -		spin_unlock_wait(&nf_conntrack_locks[i]);
> +		spin_lock(&nf_conntrack_locks[i]);
> +
> +		/* This spin_unlock provides the "release" to ensure that
> +		 * nf_conntrack_locks_all==true is visible to everyone that
> +		 * acquired spin_lock(&nf_conntrack_locks[]).
> +		 */
> +		spin_unlock(&nf_conntrack_locks[i]);
>  	}
>  }
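
The locking pattern in the quoted patch, with the flag read weakened as suggested in the reply, can be modelled in userspace. This is a rough sketch and not the kernel code: pthread mutexes stand in for spinlocks, a relaxed C11 atomic load stands in for READ_ONCE(), and all names (bucket, all_lock, locks_all, model_lock, run_model and so on) are hypothetical. The unlock side is not shown in the patch hunk above; the model assumes it clears the flag with a release store before dropping the global lock. The argument in the reply covers only the lock side, so the model says nothing about whether the unlock side needs a matching ACQUIRE.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NLOCKS   4	/* stand-in for CONNTRACK_LOCKS */
#define NWORKERS 4
#define ITERS    20000
#define ROUNDS   50

static pthread_mutex_t bucket[NLOCKS];	/* models nf_conntrack_locks[] */
static pthread_mutex_t all_lock;	/* models nf_conntrack_locks_all_lock */
static atomic_bool locks_all;		/* models nf_conntrack_locks_all */
static atomic_int in_cs;		/* workers inside a bucket critical section */
static long counter[NLOCKS];		/* data protected by bucket[i] */

static void model_lock(int i)
{
	pthread_mutex_lock(&bucket[i]);
	/* Relaxed load: the userspace analogue of READ_ONCE(). */
	if (!atomic_load_explicit(&locks_all, memory_order_relaxed))
		return;
	pthread_mutex_unlock(&bucket[i]);	/* fast path failed */
	pthread_mutex_lock(&all_lock);		/* slow path 1 */
	pthread_mutex_lock(&bucket[i]);		/* slow path 2 */
	pthread_mutex_unlock(&all_lock);	/* slow path 3 */
}

static void model_all_lock(void)
{
	pthread_mutex_lock(&all_lock);
	atomic_store_explicit(&locks_all, true, memory_order_relaxed);
	for (int i = 0; i < NLOCKS; i++) {
		/* Empty critical section: waits out any current holder. */
		pthread_mutex_lock(&bucket[i]);
		pthread_mutex_unlock(&bucket[i]);
	}
}

static void model_all_unlock(void)
{
	/* Assumption: release store on the unlock side. */
	atomic_store_explicit(&locks_all, false, memory_order_release);
	pthread_mutex_unlock(&all_lock);
}

static void *worker(void *arg)
{
	int id = (int)(intptr_t)arg;

	for (int k = 0; k < ITERS; k++) {
		int i = (id + k) % NLOCKS;

		model_lock(i);
		atomic_fetch_add(&in_cs, 1);
		counter[i]++;
		atomic_fetch_sub(&in_cs, 1);
		pthread_mutex_unlock(&bucket[i]);
	}
	return NULL;
}

/* Returns how often a worker was seen inside a bucket critical
 * section while the "all" lock was held; stores the counter sum. */
static int run_model(long *total)
{
	pthread_t t[NWORKERS];
	int violations = 0, rc;

	pthread_mutex_init(&all_lock, NULL);
	for (int i = 0; i < NLOCKS; i++) {
		pthread_mutex_init(&bucket[i], NULL);
		counter[i] = 0;
	}
	atomic_store(&locks_all, false);
	atomic_store(&in_cs, 0);

	for (int w = 0; w < NWORKERS; w++) {
		rc = pthread_create(&t[w], NULL, worker, (void *)(intptr_t)w);
		assert(rc == 0);
	}
	(void)rc;
	for (int r = 0; r < ROUNDS; r++) {
		model_all_lock();
		for (int n = 0; n < 1000; n++)
			if (atomic_load(&in_cs) != 0)
				violations++;
		model_all_unlock();
		sched_yield();
	}
	*total = 0;
	for (int w = 0; w < NWORKERS; w++)
		pthread_join(t[w], NULL);
	for (int i = 0; i < NLOCKS; i++) {
		*total += counter[i];
		pthread_mutex_destroy(&bucket[i]);
	}
	pthread_mutex_destroy(&all_lock);
	return violations;
}
```

If the pattern behaves as argued, run_model() should report zero violations and a total of NWORKERS * ITERS increments. A passing run proves nothing about weakly ordered hardware; it only makes the two cases (1) and (2) concrete.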

