From: Alan Stern <stern@rowland.harvard.edu>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Fengguang Wu <fengguang.wu@intel.com>, LKP <lkp@01.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Netdev <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrea Parri <andrea.parri@amarulasolutions.com>,
	Luc Maranget <luc.maranget@inria.fr>,
	Jade Alglave <j.alglave@ucl.ac.uk>
Subject: Re: rcu_read_lock lost its compiler barrier
Date: Tue, 4 Jun 2019 10:44:18 -0400 (EDT)
Message-ID: <Pine.LNX.4.44L0.1906041026570.1731-100000@iolanthe.rowland.org>
In-Reply-To: <20190603200301.GM28207@linux.ibm.com>

On Mon, 3 Jun 2019, Paul E. McKenney wrote:

> On Mon, Jun 03, 2019 at 02:42:00PM +0800, Boqun Feng wrote:
> > On Mon, Jun 03, 2019 at 01:26:26PM +0800, Herbert Xu wrote:
> > > On Sun, Jun 02, 2019 at 08:47:07PM -0700, Paul E. McKenney wrote:
> > > > 
> > > > 1.	These guarantees are of full memory barriers, -not- compiler
> > > > 	barriers.
> > > 
> > > What I'm saying is that wherever they are, they must come with
> > > compiler barriers.  I'm not aware of any synchronisation mechanism
> > > in the kernel that gives a memory barrier without a compiler barrier.
> > > 
> > > > 2.	These rules don't say exactly where these full memory barriers
> > > > 	go.  SRCU is at one extreme, placing those full barriers in
> > > > 	srcu_read_lock() and srcu_read_unlock(), and !PREEMPT Tree RCU
> > > > 	at the other, placing these barriers entirely within the callback
> > > > 	queueing/invocation, grace-period computation, and the scheduler.
> > > > 	Preemptible Tree RCU is in the middle, with rcu_read_unlock()
> > > > 	sometimes including a full memory barrier, but other times with
> > > > 	the full memory barrier being confined as it is with !PREEMPT
> > > > 	Tree RCU.
> > > 
> > > The rules do say that the (full) memory barrier must precede any
> > > RCU read-side critical section that occurs after the synchronize_rcu,
> > > and must come after the end of any RCU read-side critical section
> > > that occurs before the synchronize_rcu.
> > > 
> > > All I'm arguing is that wherever that full mb is, as long as it
> > > also carries with it a barrier() (which it must do if it's done
> > > using an existing kernel mb/locking primitive), then we're fine.
> > > 
> > > > Interleaving and inserting full memory barriers as per the rules above:
> > > > 
> > > > 	CPU1: WRITE_ONCE(a, 1)
> > > > 	CPU1: synchronize_rcu	
> > > > 	/* Could put a full memory barrier here, but it wouldn't help. */
> > > 
> > > 	CPU1: smp_mb();
> > > 	CPU2: smp_mb();
> > > 
> > > Let's put them in because I think they are critical.  smp_mb() also
> > > carries with it a barrier().
> > > 
> > > > 	CPU2: rcu_read_lock();
> > > > 	CPU1: b = 2;	
> > > > 	CPU2: if (READ_ONCE(a) == 0)
> > > > 	CPU2:         if (b != 1)  /* Weakly ordered CPU moved this up! */
> > > > 	CPU2:                 b = 1;
> > > > 	CPU2: rcu_read_unlock
> > > > 
> > > > In fact, CPU2's load from b might be moved up to race with CPU1's store,
> > > > which (I believe) is why the model complains in this case.
> > > 
> > > Let's put aside my doubt over how we're even allowing a compiler
> > > to turn
> > > 
> > > 	b = 1
> > > 
> > > into
> > > 
> > > 	if (b != 1)
> > > 		b = 1

Even if you don't think the compiler will ever do this, the C standard
gives compilers the right to invent read accesses if a plain (i.e.,
non-atomic and non-volatile) write is present.  The Linux Kernel Memory
Model has to assume that compilers will sometimes do this, even if it
doesn't take the exact form of checking a variable's value before
writing to it.

(Incidentally, regardless of whether the compiler will ever do this, I 
have seen examples in the kernel where people did exactly this 
manually, in order to avoid dirtying a cache line unnecessarily.)
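
To make the point concrete, here is a sketch of the sort of
transformation in question (the function names are mine, purely for
illustration): the source contains only a plain store, but the compiler
may behave as if it had inserted a read, and the same check-before-write
pattern is sometimes written by hand to avoid dirtying a cache line:

	/* What the programmer wrote: a plain (unmarked) store. */
	void set_flag(int *b)
	{
		*b = 1;
	}

	/*
	 * What the compiler is allowed to emit instead: it may invent a
	 * read of *b and skip the store when the value already matches.
	 * The same pattern also shows up hand-written in kernel code to
	 * avoid dirtying a cache line unnecessarily.
	 */
	void set_flag_check_first(int *b)
	{
		if (*b != 1)
			*b = 1;
	}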

> > > Since you seem to be assuming that (a == 0) is true in this case
> > 
> > I think Paul's example assumes (a == 0) is false, and maybe
> 
> Yes, otherwise, P0()'s write to "b" cannot have happened.
> 
> > speculative writes (by compilers) need to be added into consideration?

On the other hand, the C standard does not allow compilers to add
speculative writes.  The LKMM assumes they will never occur.

> I would instead call it the compiler eliminating needless writes
> by inventing reads -- if the variable already has the correct value,
> no write happens.  So no compiler speculation.
> 
> However, it is difficult to create a solid defensible example.  Yes,
> from LKMM's viewpoint, the weakly reordered invented read from "b"
> can be concurrent with P0()'s write to "b", but in that case the value
> loaded would have to manage to be equal to 1 for anything bad to happen.
> This does feel wrong to me, but again, it is difficult to create a solid
> defensible example.
> 
> > Please consider the following case (I added a few smp_mb()s); the case
> > may be a little bit crazy, you have been warned ;-)
> > 
> >  	CPU1: WRITE_ONCE(a, 1)
> >  	CPU1: synchronize_rcu called
> > 
> >  	CPU1: smp_mb(); /* let assume there is one here */
> > 
> >  	CPU2: rcu_read_lock();
> >  	CPU2: smp_mb(); /* let assume there is one here */
> > 
> > 	/* "if (b != 1) b = 1" reordered  */
> >  	CPU2: r0 = b;       /* if (b != 1) reordered here, r0 == 0 */
> >  	CPU2: if (r0 != 1)  /* true */
> > 	CPU2:     b = 1;    /* b == 1 now, this is a speculative write
> > 	                       by the compiler
> > 			     */
> > 
> > 	CPU1: b = 2;        /* b == 2 */
> > 
> >  	CPU2: if (READ_ONCE(a) == 0) /* false */
> > 	CPU2: ...
> > 	CPU2: else                   /* undo the speculative write */
> > 	CPU2:	  b = r0;   /* b == 0 */
> > 
> >  	CPU2: smp_mb();
> > 	CPU2: rcu_read_unlock();
> > 
> > I know it is too crazy to expect a compiler to behave like this, but
> > this might be the reason why the model complains about this case.
> > 
> > Paul, did I get this right? Or you mean something else?
> 
> Mostly there, except that I am not yet desperate enough to appeal to
> compilers speculating stores.  ;-)
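
For clarity, here is the disallowed transformation Boqun describes,
rendered as ordinary C (this is purely hypothetical compiler output; no
real compiler is being quoted, and the function names and the local r0
are mine, introduced only for the illustration):

	/* Source code for the reader: */
	void reader(int *a, int *b)
	{
		rcu_read_lock();
		if (READ_ONCE(*a) == 0)
			*b = 1;
		rcu_read_unlock();
	}

	/*
	 * Hypothetical compiled form with a speculative (invented) store,
	 * which the C standard forbids and the LKMM assumes never happens:
	 * the store to *b is hoisted above the test and undone on the
	 * other branch, so a concurrent observer of *b could see a value
	 * that this code was never asked to store.
	 */
	void reader_as_if_transformed(int *a, int *b)
	{
		int r0;

		rcu_read_lock();
		r0 = *b;
		*b = 1;			/* invented (speculative) store */
		if (READ_ONCE(*a) != 0)
			*b = r0;	/* undo the invented store */
		rcu_read_unlock();
	}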

This example really does point out a weakness in the LKMM's handling of 
data races.  Herbert's litmus test is a great starting point:


C xu

{}

P0(int *a, int *b)
{
	WRITE_ONCE(*a, 1);
	synchronize_rcu();
	*b = 2;
}

P1(int *a, int *b)
{
	rcu_read_lock();
	if (READ_ONCE(*a) == 0)
		*b = 1;
	rcu_read_unlock();
}

exists (~b=2)


Currently the LKMM says the test is allowed and there is a data race, 
but this answer clearly is wrong since it would violate the RCU 
guarantee.
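
(As a sanity check, and purely my own addition rather than something
from Herbert's message: converting the accesses to *b into marked
accesses gives a variant that the current LKMM already handles as one
would expect, namely the outcome is forbidden and, with every access
marked, there is no data race.  The gap is specific to plain accesses.)

C xu-marked

{}

P0(int *a, int *b)
{
	WRITE_ONCE(*a, 1);
	synchronize_rcu();
	WRITE_ONCE(*b, 2);
}

P1(int *a, int *b)
{
	rcu_read_lock();
	if (READ_ONCE(*a) == 0)
		WRITE_ONCE(*b, 1);
	rcu_read_unlock();
}

exists (~b=2)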

The problem is that the LKMM currently requires all ordering/visibility
of plain accesses to be mediated by marked accesses.  But in this case,
the visibility is mediated by RCU.  Technically, we need to add a
relation like

	([M] ; po ; rcu-fence ; po ; [M])

into the definitions of ww-vis, wr-vis, and rw-xbstar.  Doing so
changes the litmus test's result to "not allowed" and no data race.  
However, I'm not certain that this single change is the entire fix;  
more thought is needed.

Alan

