linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Herbert Xu <herbert@gondor.apana.org.au>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Fengguang Wu <fengguang.wu@intel.com>, LKP <lkp@01.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Netdev <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: rcu_read_lock lost its compiler barrier
Date: Mon, 3 Jun 2019 10:46:40 +0800	[thread overview]
Message-ID: <20190603024640.2soysu4rpkwjuash@gondor.apana.org.au> (raw)
In-Reply-To: <CAHk-=whLGKOmM++OQi5GX08_dfh8Xfz9Wq7khPo+MVtPYh_8hw@mail.gmail.com>

On Sun, Jun 02, 2019 at 01:54:12PM -0700, Linus Torvalds wrote:
> On Sat, Jun 1, 2019 at 10:56 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >
> > You can't then go and decide to remove the compiler barrier!  To do
> > that you'd need to audit every single use of rcu_read_lock in the
> > kernel to ensure that they're not depending on the compiler barrier.
> 
> What's the possible case where it would matter when there is no preemption?

The case we were discussing is from net/ipv4/inet_fragment.c from
the net-next tree:

void fqdir_exit(struct fqdir *fqdir)
{
	...
	fqdir->dead = true;

	/* call_rcu is supposed to provide memory barrier semantics,
	 * separating the setting of fqdir->dead with the destruction
	 * work.  This implicit barrier is paired with inet_frag_kill().
	 */

	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
}

and

void inet_frag_kill(struct inet_frag_queue *fq)
{
		...
		rcu_read_lock();
		/* The RCU read lock provides a memory barrier
		 * guaranteeing that if fqdir->dead is false then
		 * the hash table destruction will not start until
		 * after we unlock.  Paired with inet_frags_exit_net().
		 */
		if (!fqdir->dead) {
			rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
					       fqdir->f->rhash_params);
			...
		}
		...
		rcu_read_unlock();
		...
}

I simplified this to

Initial values:

a = 0
b = 0

CPU1				CPU2
----				----
a = 1				rcu_read_lock
synchronize_rcu			if (a == 0)
b = 2					b = 1
				rcu_read_unlock

On exit we want this to be true:
b == 2

Now what Paul was telling me is that unless every memory operation
is done with READ_ONCE/WRITE_ONCE then his memory model shows that
the exit constraint won't hold.  IOW, we need

CPU1				CPU2
----				----
WRITE_ONCE(a, 1)		rcu_read_lock
synchronize_rcu			if (READ_ONCE(a) == 0)
WRITE_ONCE(b, 2)			WRITE_ONCE(b, 1)
				rcu_read_unlock

Now I think this bullshit because if we really needed these compiler
barriers then we surely would need real memory barriers to go with
them.

In fact, the sole purpose of the RCU mechanism is to provide those
memory barriers.  Quoting from
Documentation/RCU/Design/Requirements/Requirements.html:

<li>	Each CPU that has an RCU read-side critical section that
	begins before <tt>synchronize_rcu()</tt> starts is
	guaranteed to execute a full memory barrier between the time
	that the RCU read-side critical section ends and the time that
	<tt>synchronize_rcu()</tt> returns.
	Without this guarantee, a pre-existing RCU read-side critical section
	might hold a reference to the newly removed <tt>struct foo</tt>
	after the <tt>kfree()</tt> on line&nbsp;14 of
	<tt>remove_gp_synchronous()</tt>.
<li>	Each CPU that has an RCU read-side critical section that ends
	after <tt>synchronize_rcu()</tt> returns is guaranteed
	to execute a full memory barrier between the time that
	<tt>synchronize_rcu()</tt> begins and the time that the RCU
	read-side critical section begins.
	Without this guarantee, a later RCU read-side critical section
	running after the <tt>kfree()</tt> on line&nbsp;14 of
	<tt>remove_gp_synchronous()</tt> might
	later run <tt>do_something_gp()</tt> and find the
	newly deleted <tt>struct foo</tt>.

My review of the RCU code shows that these memory barriers are
indeed present (at least when we're not in tiny mode where all
this discussion would be moot anyway).  For example, in call_rcu
we eventually get down to rcu_segcblist_enqueue which has an smp_mb.
On the reader side (correct me if I'm wrong Paul) the memory
barrier is implicitly coming from the scheduler.

My point is that within our kernel whenever we have a CPU memory
barrier we always have a compiler barrier too.  Therefore my code
example above does not need any extra compiler barriers such as
the ones provided by READ_ONCE/WRITE_ONCE.

I think perhaps Paul was perhaps thinking that I'm expecting
rcu_read_lock/rcu_read_unlock themselves to provide the memory
or compiler barriers.  That would indeed be wrong but this is
not what I need.  All I need is the RCU semantics as documented
for there to be memory and compiler barriers around the whole
grace period.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

  reply	other threads:[~2019-06-03  2:46 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-10  0:57 [rcu] kernel BUG at include/linux/pagemap.h:149! Fengguang Wu
2015-09-10 10:25 ` Boqun Feng
2015-09-10 17:16   ` Paul E. McKenney
2015-09-11  2:19     ` Boqun Feng
     [not found]       ` <CAJzB8QG=1iZW3dQEie6ZSTLv8GZ3YSut0aL1VU7LLmiHQ1B1DQ@mail.gmail.com>
2015-09-11 21:59         ` Paul E. McKenney
2015-09-12  5:46           ` Boqun Feng
2015-09-21 19:30       ` Frederic Weisbecker
2015-09-21 20:43         ` Paul E. McKenney
2019-06-02  5:56           ` rcu_read_lock lost its compiler barrier Herbert Xu
2019-06-02 20:54             ` Linus Torvalds
2019-06-03  2:46               ` Herbert Xu [this message]
2019-06-03  3:47                 ` Paul E. McKenney
2019-06-03  4:01                   ` Herbert Xu
2019-06-03  4:17                     ` Herbert Xu
2019-06-03  7:23                     ` Paul E. McKenney
2019-06-03  8:42                       ` Paul E. McKenney
2019-06-03 15:26                         ` David Laight
2019-06-03 15:40                           ` Linus Torvalds
2019-06-03  5:26                   ` Herbert Xu
2019-06-03  6:42                     ` Boqun Feng
2019-06-03 20:03                       ` Paul E. McKenney
2019-06-04 14:44                         ` Alan Stern
2019-06-04 16:04                           ` Linus Torvalds
2019-06-04 17:00                             ` Alan Stern
2019-06-04 17:29                               ` Linus Torvalds
2019-06-07 14:09                             ` inet: frags: Turn fqdir->dead into an int for old Alphas Herbert Xu
2019-06-07 15:26                               ` Eric Dumazet
2019-06-07 15:32                                 ` Herbert Xu
2019-06-07 16:13                                   ` Eric Dumazet
2019-06-07 16:19                                 ` Linus Torvalds
2019-06-08 15:27                                   ` Paul E. McKenney
2019-06-08 17:42                                     ` Linus Torvalds
2019-06-08 17:50                                       ` Linus Torvalds
2019-06-08 18:50                                         ` Paul E. McKenney
2019-06-08 18:14                                       ` Paul E. McKenney
2019-06-06  4:51                           ` rcu_read_lock lost its compiler barrier Herbert Xu
2019-06-06  6:05                             ` Paul E. McKenney
2019-06-06  6:14                               ` Herbert Xu
2019-06-06  9:06                                 ` Paul E. McKenney
2019-06-06  9:28                                   ` Herbert Xu
2019-06-06 10:58                                     ` Paul E. McKenney
2019-06-06 13:38                                       ` Herbert Xu
2019-06-06 13:48                                         ` Paul E. McKenney
2019-06-06  8:16                           ` Andrea Parri
2019-06-06 14:19                             ` Alan Stern
2019-06-08 15:19                               ` Paul E. McKenney
2019-06-08 15:56                                 ` Alan Stern
2019-06-08 16:31                                   ` Paul E. McKenney
2019-06-03  9:35                     ` Paul E. McKenney
2019-06-06  8:38                 ` Andrea Parri
2019-06-06  9:32                   ` Herbert Xu
2019-06-03  0:06             ` Paul E. McKenney
2019-06-03  3:03               ` Herbert Xu
2019-06-03  9:27                 ` Paul E. McKenney
2019-06-03 15:55                 ` Linus Torvalds
2019-06-03 16:07                   ` Linus Torvalds
2019-06-03 19:53                     ` Paul E. McKenney
2019-06-03 20:24                       ` Linus Torvalds
2019-06-04 21:14                         ` Paul E. McKenney
2019-06-05  2:21                           ` Herbert Xu
2019-06-05  3:30                             ` Paul E. McKenney
2019-06-06  4:37                               ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190603024640.2soysu4rpkwjuash@gondor.apana.org.au \
    --to=herbert@gondor.apana.org.au \
    --cc=boqun.feng@gmail.com \
    --cc=davem@davemloft.net \
    --cc=fengguang.wu@intel.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).