Re: Plain accesses and data races in the Linux Kernel Memory Model

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: Plain accesses and data races in the Linux Kernel Memory Model
       [not found] ` <20190114235426.GV1215@linux.ibm.com>
@ 2019-01-15  7:20   ` Dmitry Vyukov
  2019-01-15 15:03     ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2019-01-15  7:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Alan Stern, LKMM Maintainers -- Akira Yokosawa, Andrea Parri,
	Boqun Feng, Daniel Lustig, David Howells, Jade Alglave,
	Luc Maranget, Nicholas Piggin, Peter Zijlstra, Will Deacon, LKML

On Tue, Jan 15, 2019 at 12:54 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> On Mon, Jan 14, 2019 at 02:41:49PM -0500, Alan Stern wrote:
> > The patch below is my first attempt at adapting the Linux Kernel
> > Memory Model to handle plain accesses (i.e., those which aren't
> > specially marked as READ_ONCE, WRITE_ONCE, acquire, release,
> > read-modify-write, or lock accesses).  This work is based on an
> > initial proposal created by Andrea Parri back in December 2017,
> > although it has grown a lot since then.
>
> Hello, Alan,
>
> Good stuff!!!
>
> I tried applying this in order to test it against the various litmus
> tests, but no joy.  Could you please tell me what commit is this patch
> based on?
>
>                                                         Thanx, Paul
>
> > The adaptation involves two main aspects: recognizing the ordering
> > induced by plain accesses and detecting data races.  They are handled
> > separately.  In fact, the code for figuring out the ordering assumes
> > there are no data races (the idea being that if a data race is
> > present then pretty much anything could happen, so there's no point
> > worrying about it -- obviously this will have to be changed if we want
> > to cover seqlocks).

Hi Alan,

Is there a mailing list dedicated to this effort? Private messages
tend to lost over time, no archive, not possible to send a link or
show full history to anybody, etc.

Re seqlocks, strictly saying defining races for seqlocks is not
necessary. Seqlocks can be expressed without races in C by using
relaxed atomic loads within the read critical section. We may consider
this option as well.


> > This is a relativly major change to the model and it will require a
> > lot of scrutiny and testing.  At the moment, I haven't even tried to
> > compare it with the existing model on our library of litmus tests.
> >
> > The difficulty with incorporating plain accesses in the memory model
> > is that the compiler has very few constraints on how it treats plain
> > accesses.  It can eliminate them, duplicate them, rearrange them,
> > merge them, split them up, and goodness knows what else.  To make some
> > sense of this, I have taken the view that a plain access can exist
> > (perhaps multiple times) within a certain bounded region of code.
> > Ordering of two accesses X and Y means that we guarantee at least one
> > instance of the X access must be executed before any instances of the
> > Y access.  (This is assuming that neither of the accesses is
> > completely eliminated by the compiler; otherwise there is nothing to
> > order!)
> >
> > After adding some simple definitions for the sets of plain and marked
> > accesses and for compiler barriers, the patch updates the ppo
> > relation.  The basic idea here is that ppo can be broken down into
> > categories: memory barriers, overwrites, and dependencies (including
> > dep-rfi).
> >
> >       Memory barriers always provide ordering (compiler barriers do
> >       not but they have indirect effects).
> >
> >       Overwriting always provides ordering.  This may seem
> >       surprising in the case where both X and Y are plain writes,
> >       but in that case the memory model will say that X can be
> >       eliminated unless there is at least a compiler barrier between
> >       X and Y, and this barrier will enforce the ordering.
> >
> >       Some dependencies provide ordering and some don't.  Going by
> >       cases:
> >
> >               An address dependency to a read provides ordering when
> >               the source is a marked read, even when the target is a
> >               plain read.  This is necessary if rcu_dereference() is
> >               to work correctly; it is tantamount to assuming that
> >               the compiler never speculates address dependencies.
> >               However, if the source is a plain read then there is
> >               no ordering.  This is because of Alpha, which does not
> >               respect address dependencies to reads (on Alpha,
> >               marked reads include a memory barrier to enforce the
> >               ordering but plain reads do not).
> >
> >               An address dependency to a write always provides
> >               ordering.  Neither the compiler nor the CPU can
> >               speculate the address of a write, because a wrong
> >               guess could generate a data race.  (Question: do we
> >               need to include the case where the source is a plain
> >               read?)
> >
> >               A data or control dependency to a write provides
> >               ordering if the target is a marked write.  This is
> >               because the compiler is obliged to translate a marked
> >               write as a single machine instruction; if it
> >               speculates such a write there will be no opportunity
> >               to correct a mistake.
> >
> >               Dep-rfi (i.e., a data or address dependency from a
> >               read to a write which is then read from on the same
> >               CPU) provides ordering between the two reads if the
> >               target is a marked read.  This is again because the
> >               marked read will be translated as a machine-level load
> >               instruction, and then the CPU will guarantee the
> >               ordering.
> >
> >               There is a special case (data;rfi) that doesn't
> >               provide ordering in itself but can contribute to other
> >               orderings.  A data;rfi link corresponds to situations
> >               where a value is stored in a temporary shared variable
> >               and then loaded back again.  Since the compiler might
> >               choose to eliminate the temporary, its accesses can't
> >               be said to be ordered -- but the accesses around it
> >               might be.  As a simple example, consider:
> >
> >                       r1 = READ_ONCE(ptr);
> >                       tmp = r1;
> >                       r2 = tmp;
> >                       WRITE_ONCE(*r2, 5);
> >
> >               The plain accesses involving tmp don't have any
> >               particular ordering requirements, but we do know that
> >               the READ_ONCE must be ordered before the WRITE_ONCE.
> >               The chain of relations is:
> >
> >                       [marked] ; data ; rfi ; addr ; [marked]
> >
> >               showing that a data;rfi has been inserted into an
> >               address dependency from a marked read to a marked
> >               write.  In general, any number of data;rfi links can
> >               be inserted in each of the other kinds of dependencies.
> >
> > As mentioned above, ordering applies only in situations where the
> > compiler has not eliminated either of the accesses.  Therefore the
> > memory model tries to identify which accesses must be preserved in
> > some form by the compiler.
> >
> >       All marked accesses must be preserved.
> >
> >       A plain write is preserved unless it is overwritten by a
> >       po-later write with no compiler barrier in between.  Even
> >       then, it must be preserved if it is read by a marked read, or
> >       if it is read by any preserved read on a different CPU.
> >
> >       A plain read is not preserved unless it is the source of a
> >       dependency to a preserved access.  Even then it is not
> >       preserved if it reads from a plain write on the same CPU with
> >       no compiler barrier in between.
> >
> > The hb (happens-before) relation is restricted to apply only to
> > preserved accesses.  In addition, the rfe and prop parts of hb are
> > restricted to apply only to marked accesses.
> >
> > The last part of the patch checks for possible data races.  A
> > potential race is defined as two accesses to the same location on
> > different CPUs, where at least one access is plain and at least one is
> > a write.  In order to qualify as an actual data race, a potential race
> > has to fail to meet a suitable ordering requirement.  This requirement
> > applies in two distinct scenarios:
> >
> >       X is a write and either Y reads from X or Y directly
> >       overwrites X (i.e., Y immediately follows X in the coherence
> >       order).  In this case X must be visible to Y's CPU before Y
> >       executes.
> >
> >       X is a read and there is an fre link from X to Y.  In this
> >       case X must execute before Y.
> >
> > (I don't know whether the second scenario is really needed.
> > Intuitively, it seems that if X doesn't execute before Y then there
> > must be an alternate execution in which X reads from Y, which would
> > then fall under the first scenario.  But I can't prove this, or even
> > express it precisely.)
> >
> > There are cases not covered by either scenario.  For instance, X could
> > be a write and Y could indirectly overwrite X, that is, there might be
> > a third write W coherence-between X and Y.  But in that case the
> > concern is whether X races with W or W races with Y, not whether X
> > races with Y.  Similarly, X could be a read and Y could be a write
> > coherence-before the write W which X reads from.  Again, the concern
> > would be whether X races with W or W races with Y, not whether X races
> > with Y.
> >
> > Above, the phrase "X must be visible to Y's CPU before Y execute"
> > means that there are preserved accesses X' and Y' such that:
> >
> >       X' occurs at or after the end of the bounded region where
> >       X might execute, Y' occurs at or before the start of the
> >       bounded region where Y might execute, and X' propagates to Y's
> >       CPU before Y' executes.  This last is defined by a new
> >       relation vis which has a relatively simple definition.
> >
> > Similarly, "X must execute before Y" means that there are preserved
> > accesses X' and Y' such that:
> >
> >       X' occurs at or after the end of the bounded region where
> >       X might execute, Y' occurs at or before the start of the
> >       bounded region where Y might execute, and X' is related to
> >       Y' by xb+ (where xb = hb | pb | rb).
> >
> > To use this, we have to have bounds for the region where an access
> > might execute.  For marked accesses the bounds are trivial: A marked
> > access can execute only once, so it serves as its own bounds.  However
> > plain accesses can execute anywhere within a more extended region.
> >
> >       If Y is plain, Y' is preserved, and Y' ->ppo Y then we know
> >       that Y' must execute at least once before any execution of Y.
> >       Thus Y' occurs at or before the start of the bounded region
> >       for Y.
> >
> >       If X is plain, X' is preserved, and X' overwrites X or there
> >       is a memory barrier between X and X' then we know X cannot
> >       execute after any instance of X' has executed.  Thus X' occurs
> >       at or after the end of the bounded region for X.  (Exercise:
> >       Show that the case where X' is plain and overwrites X with no
> >       barriers in between doesn't lead to any problems.)
> >
> > If a potential race exists which does not meet the ordering
> > requirement, the execution is flagged as containing a data race.  This
> > definition is less stringent that the one used in the C standard, but
> > I think it should be suitable for the kernel.
> >
> > Here's a simple example illustrating the data race detection:
> >
> > $ cat race.litmus
> > C race
> >
> > {}
> >
> > P0(int *x)
> > {
> >         WRITE_ONCE(*x, 1);
> > }
> >
> > P1(int *x)
> > {
> >         int r1;
> >
> >         r1 = *x;
> > }
> >
> > exists (1:r1=1)
> >
> >
> > $ herd7 -conf linux-kernel.cfg race.litmus
> > Test race Allowed
> > States 2
> > 1:r1=0;
> > 1:r1=1;
> > Ok
> > Witnesses
> > Positive: 1 Negative: 1
> > Flag data-race
> > Condition exists (1:r1=1)
> > Observation race Sometimes 1 1
> > Time race 0.00
> > Hash=5ad41863408315a556a8d38691c4924d
> >
> >
> > Alan
> >
> >
> >
> > Index: usb-4.x/tools/memory-model/linux-kernel.bell
> > ===================================================================
> > --- usb-4.x.orig/tools/memory-model/linux-kernel.bell
> > +++ usb-4.x/tools/memory-model/linux-kernel.bell
> > @@ -24,6 +24,7 @@ instructions RMW[{'once,'acquire,'releas
> >  enum Barriers = 'wmb (*smp_wmb*) ||
> >               'rmb (*smp_rmb*) ||
> >               'mb (*smp_mb*) ||
> > +             'barrier (*barrier*) ||
> >               'rcu-lock (*rcu_read_lock*)  ||
> >               'rcu-unlock (*rcu_read_unlock*) ||
> >               'sync-rcu (*synchronize_rcu*) ||
> > @@ -73,3 +74,8 @@ flag ~empty Srcu-unlock \ range(srcu-rsc
> >
> >  (* Check for use of synchronize_srcu() inside an RCU critical section *)
> >  flag ~empty rcu-rscs & (po ; [Sync-srcu] ; po) as invalid-sleep
> > +
> > +(* Compute marked and plain memory accesses *)
> > +let marked = IW | Once | Release | Acquire | domain(rmw) | range(rmw) |
> > +             LKR | LKW | UL | LF | RL | RU
> > +let plain = M \ marked
> > Index: usb-4.x/tools/memory-model/linux-kernel.cat
> > ===================================================================
> > --- usb-4.x.orig/tools/memory-model/linux-kernel.cat
> > +++ usb-4.x/tools/memory-model/linux-kernel.cat
> > @@ -37,6 +37,10 @@ let gp = po ; [Sync-rcu | Sync-srcu] ; p
> >
> >  let strong-fence = mb | gp
> >
> > +(* Memory barriers are also compiler barriers *)
> > +let barrier = fencerel(Barrier | Wmb | Rmb | Mb | Acquire | Release |
> > +             Sync-rcu | Sync-srcu)
> > +
> >  (* Release Acquire *)
> >  let acq-po = [Acquire] ; po ; [M]
> >  let po-rel = [M] ; po ; [Release]
> > @@ -58,24 +62,48 @@ empty rmw & (fre ; coe) as atomic
> >  (**********************************)
> >
> >  (* Preserved Program Order *)
> > -let dep = addr | data
> > -let rwdep = (dep | ctrl) ; [W]
> > +let data-rfi-star = (data ; rfi)*
> > +let rwdep = data-rfi-star ; (addr | ((data | ctrl) ; [marked])) ; [W]
> >  let overwrite = co | fr
> >  let to-w = rwdep | (overwrite & int)
> > -let to-r = addr | (dep ; rfi)
> > -let fence = strong-fence | wmb | po-rel | rmb | acq-po
> > -let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int)
> > +let to-r = (data-rfi-star ; (addr | data) ; rfi ; [marked]) |
> > +         ([marked] ; data-rfi-star ; addr ; [R])
> > +let fence = strong-fence | wmb | po-rel | rmb | acq-po |
> > +     (po-unlock-rf-lock-po & int)
> > +let ppo = to-r | to-w | fence
> > +
> > +(*
> > + * Preserved accesses.
> > + *
> > + * All marked writes are preserved.
> > + * A plain write need not be preserved if it is overwritten before the
> > + * next compiler barrier.  But if it is read by a marked read or a
> > + * preserved read on another CPU then it is preserved.
> > + *
> > + * All marked reads are preserved.
> > + * A plain read is not preserved unless it is the source of a dependency
> > + * to a preserved access.  Even then, it isn't preserved if it reads from
> > + * a plain write on the same CPU with no compiler barrier in between.
> > + *)
> > +let non-preserved-r = range(([plain] ; rfi) \ barrier)
> > +let rec preserved = preserved-w | preserved-r
> > +    and preserved-w = (W & marked) |
> > +             (W \ domain(coi \ barrier)) | domain(rf ; [marked]) |
> > +             domain(rfe ; [preserved-r])
> > +    and preserved-r = (R & marked) |
> > +             (domain((to-w | to-r) ; [preserved]) \ non-preserved-r)
> >
> >  (* Propagation: Ordering from release operations and strong fences. *)
> >  let A-cumul(r) = rfe? ; r
> >  let cumul-fence = A-cumul(strong-fence | po-rel) | wmb | po-unlock-rf-lock-po
> > -let prop = (overwrite & ext)? ; cumul-fence* ; rfe?
> > +let prop = [marked] ; ((overwrite & ext)? ; cumul-fence* ; rfe?) ; [marked]
> >
> >  (*
> >   * Happens Before: Ordering from the passage of time.
> >   * No fences needed here for prop because relation confined to one process.
> >   *)
> > -let hb = ppo | rfe | ((prop \ id) & int)
> > +let hb = ([preserved] ; ppo ; [preserved]) |
> > +     ([marked] ; rfe ; [marked]) | ((prop \ id) & int)
> >  acyclic hb as happens-before
> >
> >  (****************************************)
> > @@ -83,7 +111,7 @@ acyclic hb as happens-before
> >  (****************************************)
> >
> >  (* Propagation: Each non-rf link needs a strong fence. *)
> > -let pb = prop ; strong-fence ; hb*
> > +let pb = prop ; strong-fence ; hb* ; [marked]
> >  acyclic pb as propagation
> >
> >  (*******)
> > @@ -131,7 +159,7 @@ let rec rcu-fence = rcu-gp | srcu-gp |
> >       (rcu-fence ; rcu-link ; rcu-fence)
> >
> >  (* rb orders instructions just as pb does *)
> > -let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb*
> > +let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb* ; [marked]
> >
> >  irreflexive rb as rcu
> >
> > @@ -143,3 +171,27 @@ irreflexive rb as rcu
> >   * let xb = hb | pb | rb
> >   * acyclic xb as executes-before
> >   *)
> > +
> > +
> > +(*********************************)
> > +(* Plain accesses and data races *)
> > +(*********************************)
> > +
> > +(* Boundaries for lifetimes of plain accesses *)
> > +let bound-before = [marked] | ([preserved] ; ppo)
> > +let bound-after = [marked] | ((fri | coi | fence) ; [preserved])
> > +
> > +(* Visibility of a write *)
> > +let xb = hb | pb | rb
> > +let full-fence = strong-fence | (po ; rcu-fence ; po?)
> > +let vis = cumul-fence* ; rfe? ; ((full-fence ; xb* ) | (xb* & int))
> > +
> > +(* Potential races *)
> > +let pre-race = ext & ((plain * M) | ((M \ IW) * plain))
> > +
> > +(* Actual races *)
> > +let coe-next = (coe \ (co ; co)) | rfe
> > +let race-from-w = (pre-race & coe-next) \ (bound-after ; vis ; bound-before)
> > +let race-from-r = (pre-race & fre) \ (bound-after ; xb+ ; bound-before)
> > +
> > +flag ~empty race-from-w | race-from-r as data-race
> > Index: usb-4.x/tools/memory-model/linux-kernel.def
> > ===================================================================
> > --- usb-4.x.orig/tools/memory-model/linux-kernel.def
> > +++ usb-4.x/tools/memory-model/linux-kernel.def
> > @@ -24,6 +24,7 @@ smp_mb__before_atomic() { __fence{before
> >  smp_mb__after_atomic() { __fence{after-atomic}; }
> >  smp_mb__after_spinlock() { __fence{after-spinlock}; }
> >  smp_mb__after_unlock_lock() { __fence{after-unlock-lock}; }
> > +barrier() { __fence{barrier}; }
> >
> >  // Exchange
> >  xchg(X,V)  __xchg{mb}(X,V)
> >
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-15  7:20   ` Plain accesses and data races in the Linux Kernel Memory Model Dmitry Vyukov
@ 2019-01-15 15:03     ` Alan Stern
  2019-01-15 15:23       ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2019-01-15 15:03 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Paul E. McKenney, LKMM Maintainers -- Akira Yokosawa,
	Andrea Parri, Boqun Feng, Daniel Lustig, David Howells,
	Jade Alglave, Luc Maranget, Nicholas Piggin, Peter Zijlstra,
	Will Deacon, LKML

On Tue, 15 Jan 2019, Dmitry Vyukov wrote:

> On Tue, Jan 15, 2019 at 12:54 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
> >
> > On Mon, Jan 14, 2019 at 02:41:49PM -0500, Alan Stern wrote:
> > > The patch below is my first attempt at adapting the Linux Kernel
> > > Memory Model to handle plain accesses (i.e., those which aren't
> > > specially marked as READ_ONCE, WRITE_ONCE, acquire, release,
> > > read-modify-write, or lock accesses).  This work is based on an
> > > initial proposal created by Andrea Parri back in December 2017,
> > > although it has grown a lot since then.
> >
> > Hello, Alan,
> >
> > Good stuff!!!
> >
> > I tried applying this in order to test it against the various litmus
> > tests, but no joy.  Could you please tell me what commit is this patch
> > based on?
> >
> >                                                         Thanx, Paul
> >
> > > The adaptation involves two main aspects: recognizing the ordering
> > > induced by plain accesses and detecting data races.  They are handled
> > > separately.  In fact, the code for figuring out the ordering assumes
> > > there are no data races (the idea being that if a data race is
> > > present then pretty much anything could happen, so there's no point
> > > worrying about it -- obviously this will have to be changed if we want
> > > to cover seqlocks).
> 
> Hi Alan,
> 
> Is there a mailing list dedicated to this effort? Private messages
> tend to lost over time, no archive, not possible to send a link or
> show full history to anybody, etc.

No specific mailing list.  We've been relying on LKML.

> Re seqlocks, strictly saying defining races for seqlocks is not
> necessary. Seqlocks can be expressed without races in C by using
> relaxed atomic loads within the read critical section. We may consider
> this option as well.

That seems like a reasonable approach.

Alan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-15 15:03     ` Alan Stern
@ 2019-01-15 15:23       ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2019-01-15 15:23 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Vyukov, LKMM Maintainers -- Akira Yokosawa, Andrea Parri,
	Boqun Feng, Daniel Lustig, David Howells, Jade Alglave,
	Luc Maranget, Nicholas Piggin, Peter Zijlstra, Will Deacon, LKML

On Tue, Jan 15, 2019 at 10:03:26AM -0500, Alan Stern wrote:
> On Tue, 15 Jan 2019, Dmitry Vyukov wrote:
> 
> > On Tue, Jan 15, 2019 at 12:54 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
> > >
> > > On Mon, Jan 14, 2019 at 02:41:49PM -0500, Alan Stern wrote:
> > > > The patch below is my first attempt at adapting the Linux Kernel
> > > > Memory Model to handle plain accesses (i.e., those which aren't
> > > > specially marked as READ_ONCE, WRITE_ONCE, acquire, release,
> > > > read-modify-write, or lock accesses).  This work is based on an
> > > > initial proposal created by Andrea Parri back in December 2017,
> > > > although it has grown a lot since then.
> > >
> > > Hello, Alan,
> > >
> > > Good stuff!!!
> > >
> > > I tried applying this in order to test it against the various litmus
> > > tests, but no joy.  Could you please tell me what commit is this patch
> > > based on?
> > >
> > >                                                         Thanx, Paul
> > >
> > > > The adaptation involves two main aspects: recognizing the ordering
> > > > induced by plain accesses and detecting data races.  They are handled
> > > > separately.  In fact, the code for figuring out the ordering assumes
> > > > there are no data races (the idea being that if a data race is
> > > > present then pretty much anything could happen, so there's no point
> > > > worrying about it -- obviously this will have to be changed if we want
> > > > to cover seqlocks).
> > 
> > Hi Alan,
> > 
> > Is there a mailing list dedicated to this effort? Private messages
> > tend to lost over time, no archive, not possible to send a link or
> > show full history to anybody, etc.
> 
> No specific mailing list.  We've been relying on LKML.
> 
> > Re seqlocks, strictly saying defining races for seqlocks is not
> > necessary. Seqlocks can be expressed without races in C by using
> > relaxed atomic loads within the read critical section. We may consider
> > this option as well.
> 
> That seems like a reasonable approach.

What Alan said!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
       [not found] <Pine.LNX.4.44L0.1901141439480.1366-100000@iolanthe.rowland.org>
       [not found] ` <20190114235426.GV1215@linux.ibm.com>
@ 2019-01-15 14:25 ` Andrea Parri
  2019-01-15 15:19   ` Alan Stern
  2019-01-16 21:36 ` Andrea Parri
  2019-01-22 15:47 ` Andrea Parri
  3 siblings, 1 reply; 18+ messages in thread
From: Andrea Parri @ 2019-01-15 14:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	linux-kernel

On Mon, Jan 14, 2019 at 02:41:49PM -0500, Alan Stern wrote:
> The patch below is my first attempt at adapting the Linux Kernel
> Memory Model to handle plain accesses (i.e., those which aren't
> specially marked as READ_ONCE, WRITE_ONCE, acquire, release,
> read-modify-write, or lock accesses).  This work is based on an
> initial proposal created by Andrea Parri back in December 2017,
> although it has grown a lot since then.
> 
> The adaptation involves two main aspects: recognizing the ordering
> induced by plain accesses and detecting data races.  They are handled
> separately.  In fact, the code for figuring out the ordering assumes
> there are no data races (the idea being that if a data race is
> present then pretty much anything could happen, so there's no point
> worrying about it -- obviously this will have to be changed if we want
> to cover seqlocks).
> 
> This is a relativly major change to the model and it will require a
> lot of scrutiny and testing.  At the moment, I haven't even tried to
> compare it with the existing model on our library of litmus tests.
> 
> The difficulty with incorporating plain accesses in the memory model
> is that the compiler has very few constraints on how it treats plain
> accesses.  It can eliminate them, duplicate them, rearrange them,
> merge them, split them up, and goodness knows what else.  To make some
> sense of this, I have taken the view that a plain access can exist
> (perhaps multiple times) within a certain bounded region of code.
> Ordering of two accesses X and Y means that we guarantee at least one
> instance of the X access must be executed before any instances of the
> Y access.  (This is assuming that neither of the accesses is
> completely eliminated by the compiler; otherwise there is nothing to
> order!)
> 
> After adding some simple definitions for the sets of plain and marked
> accesses and for compiler barriers, the patch updates the ppo
> relation.  The basic idea here is that ppo can be broken down into
> categories: memory barriers, overwrites, and dependencies (including
> dep-rfi).
> 
> 	Memory barriers always provide ordering (compiler barriers do
> 	not but they have indirect effects).
> 
> 	Overwriting always provides ordering.  This may seem
> 	surprising in the case where both X and Y are plain writes,
> 	but in that case the memory model will say that X can be
> 	eliminated unless there is at least a compiler barrier between
> 	X and Y, and this barrier will enforce the ordering.
> 
> 	Some dependencies provide ordering and some don't.  Going by
> 	cases:
> 
> 		An address dependency to a read provides ordering when
> 		the source is a marked read, even when the target is a
> 		plain read.  This is necessary if rcu_dereference() is
> 		to work correctly; it is tantamount to assuming that
> 		the compiler never speculates address dependencies.
> 		However, if the source is a plain read then there is
> 		no ordering.  This is because of Alpha, which does not
> 		respect address dependencies to reads (on Alpha,
> 		marked reads include a memory barrier to enforce the
> 		ordering but plain reads do not).
> 
> 		An address dependency to a write always provides
> 		ordering.  Neither the compiler nor the CPU can
> 		speculate the address of a write, because a wrong
> 		guess could generate a data race.  (Question: do we
> 		need to include the case where the source is a plain
> 		read?)
> 
> 		A data or control dependency to a write provides
> 		ordering if the target is a marked write.  This is
> 		because the compiler is obliged to translate a marked
> 		write as a single machine instruction; if it
> 		speculates such a write there will be no opportunity
> 		to correct a mistake.
> 
> 		Dep-rfi (i.e., a data or address dependency from a
> 		read to a write which is then read from on the same
> 		CPU) provides ordering between the two reads if the
> 		target is a marked read.  This is again because the
> 		marked read will be translated as a machine-level load
> 		instruction, and then the CPU will guarantee the
> 		ordering.
> 
> 		There is a special case (data;rfi) that doesn't
> 		provide ordering in itself but can contribute to other
> 		orderings.  A data;rfi link corresponds to situations
> 		where a value is stored in a temporary shared variable
> 		and then loaded back again.  Since the compiler might
> 		choose to eliminate the temporary, its accesses can't
> 		be said to be ordered -- but the accesses around it
> 		might be.  As a simple example, consider:
> 
> 			r1 = READ_ONCE(ptr);
> 			tmp = r1;
> 			r2 = tmp;
> 			WRITE_ONCE(*r2, 5);
> 
> 		The plain accesses involving tmp don't have any
> 		particular ordering requirements, but we do know that
> 		the READ_ONCE must be ordered before the WRITE_ONCE.
> 		The chain of relations is:
> 
> 			[marked] ; data ; rfi ; addr ; [marked]
> 
> 		showing that a data;rfi has been inserted into an
> 		address dependency from a marked read to a marked
> 		write.  In general, any number of data;rfi links can
> 		be inserted in each of the other kinds of dependencies.
> 
> As mentioned above, ordering applies only in situations where the
> compiler has not eliminated either of the accesses.  Therefore the
> memory model tries to identify which accesses must be preserved in
> some form by the compiler.
> 
> 	All marked accesses must be preserved.
> 
> 	A plain write is preserved unless it is overwritten by a
> 	po-later write with no compiler barrier in between.  Even
> 	then, it must be preserved if it is read by a marked read, or
> 	if it is read by any preserved read on a different CPU.
> 
> 	A plain read is not preserved unless it is the source of a
> 	dependency to a preserved access.  Even then it is not
> 	preserved if it reads from a plain write on the same CPU with
> 	no compiler barrier in between.
> 
> The hb (happens-before) relation is restricted to apply only to
> preserved accesses.  In addition, the rfe and prop parts of hb are
> restricted to apply only to marked accesses.
> 
> The last part of the patch checks for possible data races.  A
> potential race is defined as two accesses to the same location on
> different CPUs, where at least one access is plain and at least one is
> a write.  In order to qualify as an actual data race, a potential race
> has to fail to meet a suitable ordering requirement.  This requirement
> applies in two distinct scenarios:
> 
> 	X is a write and either Y reads from X or Y directly
> 	overwrites X (i.e., Y immediately follows X in the coherence
> 	order).  In this case X must be visible to Y's CPU before Y
> 	executes.
> 
> 	X is a read and there is an fre link from X to Y.  In this
> 	case X must execute before Y.
> 
> (I don't know whether the second scenario is really needed.
> Intuitively, it seems that if X doesn't execute before Y then there
> must be an alternate execution in which X reads from Y, which would
> then fall under the first scenario.  But I can't prove this, or even
> express it precisely.)
> 
> There are cases not covered by either scenario.  For instance, X could
> be a write and Y could indirectly overwrite X, that is, there might be
> a third write W coherence-between X and Y.  But in that case the
> concern is whether X races with W or W races with Y, not whether X
> races with Y.  Similarly, X could be a read and Y could be a write
> coherence-before the write W which X reads from.  Again, the concern
> would be whether X races with W or W races with Y, not whether X races
> with Y.
> 
> Above, the phrase "X must be visible to Y's CPU before Y execute"
> means that there are preserved accesses X' and Y' such that:
> 
> 	X' occurs at or after the end of the bounded region where
> 	X might execute, Y' occurs at or before the start of the
> 	bounded region where Y might execute, and X' propagates to Y's
> 	CPU before Y' executes.  This last is defined by a new
> 	relation vis which has a relatively simple definition.
> 
> Similarly, "X must execute before Y" means that there are preserved
> accesses X' and Y' such that:
> 
> 	X' occurs at or after the end of the bounded region where
> 	X might execute, Y' occurs at or before the start of the
> 	bounded region where Y might execute, and X' is related to
> 	Y' by xb+ (where xb = hb | pb | rb).
> 
> To use this, we have to have bounds for the region where an access
> might execute.  For marked accesses the bounds are trivial: A marked
> access can execute only once, so it serves as its own bounds.  However
> plain accesses can execute anywhere within a more extended region.
> 
> 	If Y is plain, Y' is preserved, and Y' ->ppo Y then we know
> 	that Y' must execute at least once before any execution of Y.
> 	Thus Y' occurs at or before the start of the bounded region
> 	for Y.
> 
> 	If X is plain, X' is preserved, and X' overwrites X or there
> 	is a memory barrier between X and X' then we know X cannot
> 	execute after any instance of X' has executed.  Thus X' occurs
> 	at or after the end of the bounded region for X.  (Exercise:
> 	Show that the case where X' is plain and overwrites X with no
> 	barriers in between doesn't lead to any problems.)
> 
> If a potential race exists which does not meet the ordering
> requirement, the execution is flagged as containing a data race.  This
> definition is less stringent that the one used in the C standard, but
> I think it should be suitable for the kernel.

[A resend, +LKML]

Unless I'm mis-reading/-applying this definition, this will flag the
following test (a variation on your "race.litmus") with "data-race":

C no-race

{}

P0(int *x, spinlock_t *s)
{
	spin_lock(s);
        WRITE_ONCE(*x, 1);	/* A */
	spin_unlock(s);	/* B */
}

P1(int *x, spinlock_t *s)
{
        int r1;

	spin_lock(s); /* C */
        r1 = *x;	/* D */
	spin_unlock(s);
}

exists (1:r1=1)

Broadly speaking, this is due to the fact that the modified "happens-
before" axiom does not forbid the execution with the (MP-) cycle

	A ->po-rel B ->rfe C ->acq-po D ->fre A

and then to the link "D ->race-from-r A" here defined.

(In part., similar considerations hold for the following litmus test:

C MP1

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_store_release(y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1 = -1;

	r0 = smp_load_acquire(y);
	if (r0)
		r1 = *x;
}

exists (1:r0=1 /\ 1:r1=0)

)

I wonder whether you actually intended to introduce these "races"...?

  Andrea





> 
> Here's a simple example illustrating the data race detection:
> 
> $ cat race.litmus
> C race
> 
> {}
> 
> P0(int *x)
> {
>         WRITE_ONCE(*x, 1);
> }
> 
> P1(int *x)
> {
>         int r1;
> 
>         r1 = *x;
> }
> 
> exists (1:r1=1)
> 
> 
> $ herd7 -conf linux-kernel.cfg race.litmus 
> Test race Allowed
> States 2
> 1:r1=0;
> 1:r1=1;
> Ok
> Witnesses
> Positive: 1 Negative: 1
> Flag data-race
> Condition exists (1:r1=1)
> Observation race Sometimes 1 1
> Time race 0.00
> Hash=5ad41863408315a556a8d38691c4924d
> 
> 
> Alan
> 
> 
> 
> Index: usb-4.x/tools/memory-model/linux-kernel.bell
> ===================================================================
> --- usb-4.x.orig/tools/memory-model/linux-kernel.bell
> +++ usb-4.x/tools/memory-model/linux-kernel.bell
> @@ -24,6 +24,7 @@ instructions RMW[{'once,'acquire,'releas
>  enum Barriers = 'wmb (*smp_wmb*) ||
>  		'rmb (*smp_rmb*) ||
>  		'mb (*smp_mb*) ||
> +		'barrier (*barrier*) ||
>  		'rcu-lock (*rcu_read_lock*)  ||
>  		'rcu-unlock (*rcu_read_unlock*) ||
>  		'sync-rcu (*synchronize_rcu*) ||
> @@ -73,3 +74,8 @@ flag ~empty Srcu-unlock \ range(srcu-rsc
>  
>  (* Check for use of synchronize_srcu() inside an RCU critical section *)
>  flag ~empty rcu-rscs & (po ; [Sync-srcu] ; po) as invalid-sleep
> +
> +(* Compute marked and plain memory accesses *)
> +let marked = IW | Once | Release | Acquire | domain(rmw) | range(rmw) |
> +		LKR | LKW | UL | LF | RL | RU
> +let plain = M \ marked
> Index: usb-4.x/tools/memory-model/linux-kernel.cat
> ===================================================================
> --- usb-4.x.orig/tools/memory-model/linux-kernel.cat
> +++ usb-4.x/tools/memory-model/linux-kernel.cat
> @@ -37,6 +37,10 @@ let gp = po ; [Sync-rcu | Sync-srcu] ; p
>  
>  let strong-fence = mb | gp
>  
> +(* Memory barriers are also compiler barriers *)
> +let barrier = fencerel(Barrier | Wmb | Rmb | Mb | Acquire | Release |
> +		Sync-rcu | Sync-srcu)
> +
>  (* Release Acquire *)
>  let acq-po = [Acquire] ; po ; [M]
>  let po-rel = [M] ; po ; [Release]
> @@ -58,24 +62,48 @@ empty rmw & (fre ; coe) as atomic
>  (**********************************)
>  
>  (* Preserved Program Order *)
> -let dep = addr | data
> -let rwdep = (dep | ctrl) ; [W]
> +let data-rfi-star = (data ; rfi)*
> +let rwdep = data-rfi-star ; (addr | ((data | ctrl) ; [marked])) ; [W]
>  let overwrite = co | fr
>  let to-w = rwdep | (overwrite & int)
> -let to-r = addr | (dep ; rfi)
> -let fence = strong-fence | wmb | po-rel | rmb | acq-po
> -let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int)
> +let to-r = (data-rfi-star ; (addr | data) ; rfi ; [marked]) |
> +	    ([marked] ; data-rfi-star ; addr ; [R])
> +let fence = strong-fence | wmb | po-rel | rmb | acq-po |
> +	(po-unlock-rf-lock-po & int)
> +let ppo = to-r | to-w | fence
> +
> +(*
> + * Preserved accesses.
> + *
> + * All marked writes are preserved.
> + * A plain write need not be preserved if it is overwritten before the
> + * next compiler barrier.  But if it is read by a marked read or a
> + * preserved read on another CPU then it is preserved.
> + *
> + * All marked reads are preserved.
> + * A plain read is not preserved unless it is the source of a dependency
> + * to a preserved access.  Even then, it isn't preserved if it reads from
> + * a plain write on the same CPU with no compiler barrier in between.
> + *)
> +let non-preserved-r = range(([plain] ; rfi) \ barrier)
> +let rec preserved = preserved-w | preserved-r
> +    and preserved-w = (W & marked) |
> +		(W \ domain(coi \ barrier)) | domain(rf ; [marked]) |
> +		domain(rfe ; [preserved-r])
> +    and preserved-r = (R & marked) |
> +		(domain((to-w | to-r) ; [preserved]) \ non-preserved-r)
>  
>  (* Propagation: Ordering from release operations and strong fences. *)
>  let A-cumul(r) = rfe? ; r
>  let cumul-fence = A-cumul(strong-fence | po-rel) | wmb | po-unlock-rf-lock-po
> -let prop = (overwrite & ext)? ; cumul-fence* ; rfe?
> +let prop = [marked] ; ((overwrite & ext)? ; cumul-fence* ; rfe?) ; [marked]
>  
>  (*
>   * Happens Before: Ordering from the passage of time.
>   * No fences needed here for prop because relation confined to one process.
>   *)
> -let hb = ppo | rfe | ((prop \ id) & int)
> +let hb = ([preserved] ; ppo ; [preserved]) |
> +	([marked] ; rfe ; [marked]) | ((prop \ id) & int)
>  acyclic hb as happens-before
>  
>  (****************************************)
> @@ -83,7 +111,7 @@ acyclic hb as happens-before
>  (****************************************)
>  
>  (* Propagation: Each non-rf link needs a strong fence. *)
> -let pb = prop ; strong-fence ; hb*
> +let pb = prop ; strong-fence ; hb* ; [marked]
>  acyclic pb as propagation
>  
>  (*******)
> @@ -131,7 +159,7 @@ let rec rcu-fence = rcu-gp | srcu-gp |
>  	(rcu-fence ; rcu-link ; rcu-fence)
>  
>  (* rb orders instructions just as pb does *)
> -let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb*
> +let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb* ; [marked]
>  
>  irreflexive rb as rcu
>  
> @@ -143,3 +171,27 @@ irreflexive rb as rcu
>   * let xb = hb | pb | rb
>   * acyclic xb as executes-before
>   *)
> +
> +
> +(*********************************)
> +(* Plain accesses and data races *)
> +(*********************************)
> +
> +(* Boundaries for lifetimes of plain accesses *)
> +let bound-before = [marked] | ([preserved] ; ppo)
> +let bound-after = [marked] | ((fri | coi | fence) ; [preserved])
> +
> +(* Visibility of a write *)
> +let xb = hb | pb | rb
> +let full-fence = strong-fence | (po ; rcu-fence ; po?)
> +let vis = cumul-fence* ; rfe? ; ((full-fence ; xb* ) | (xb* & int))
> +
> +(* Potential races *)
> +let pre-race = ext & ((plain * M) | ((M \ IW) * plain))
> +
> +(* Actual races *)
> +let coe-next = (coe \ (co ; co)) | rfe
> +let race-from-w = (pre-race & coe-next) \ (bound-after ; vis ; bound-before)
> +let race-from-r = (pre-race & fre) \ (bound-after ; xb+ ; bound-before)
> +
> +flag ~empty race-from-w | race-from-r as data-race
> Index: usb-4.x/tools/memory-model/linux-kernel.def
> ===================================================================
> --- usb-4.x.orig/tools/memory-model/linux-kernel.def
> +++ usb-4.x/tools/memory-model/linux-kernel.def
> @@ -24,6 +24,7 @@ smp_mb__before_atomic() { __fence{before
>  smp_mb__after_atomic() { __fence{after-atomic}; }
>  smp_mb__after_spinlock() { __fence{after-spinlock}; }
>  smp_mb__after_unlock_lock() { __fence{after-unlock-lock}; }
> +barrier() { __fence{barrier}; }
>  
>  // Exchange
>  xchg(X,V)  __xchg{mb}(X,V)
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-15 14:25 ` Andrea Parri
@ 2019-01-15 15:19   ` Alan Stern
  2019-01-16 11:57     ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2019-01-15 15:19 UTC (permalink / raw)
  To: Andrea Parri
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	linux-kernel

On Tue, 15 Jan 2019, Andrea Parri wrote:

> Unless I'm mis-reading/-applying this definition, this will flag the
> following test (a variation on your "race.litmus") with "data-race":
> 
> C no-race
> 
> {}
> 
> P0(int *x, spinlock_t *s)
> {
> 	spin_lock(s);
>         WRITE_ONCE(*x, 1);	/* A */
> 	spin_unlock(s);	/* B */
> }
> 
> P1(int *x, spinlock_t *s)
> {
>         int r1;
> 
> 	spin_lock(s); /* C */
>         r1 = *x;	/* D */
> 	spin_unlock(s);
> }
> 
> exists (1:r1=1)
> 
> Broadly speaking, this is due to the fact that the modified "happens-
> before" axiom does not forbid the execution with the (MP-) cycle
> 
> 	A ->po-rel B ->rfe C ->acq-po D ->fre A
> 
> and then to the link "D ->race-from-r A" here defined.

Yes, that cycle certainly should be forbidden.  On the other hand, we
don't want to insist that C happens before D, given that D may not
happen at all.

This is a real problem.  Can we solve it by adding a modified
"happens-before" which says essentially that _if_ D is preserved _then_
C happens before D?  But then what about cycles involving more than one
possibly preserved access?  Or maybe a relation which says that D
cannot execute before C (so if D executes at all, it has to come after
C)?

Now you see why this stuff is so difficult...  At the moment, I don't
know how to fix this.

> (In part., similar considerations hold for the following litmus test:
> 
> C MP1
> 
> {}
> 
> P0(int *x, int *y)
> {
> 	*x = 1;
> 	smp_store_release(y, 1);
> }
> 
> P1(int *x, int *y)
> {
> 	int r0;
> 	int r1 = -1;
> 
> 	r0 = smp_load_acquire(y);
> 	if (r0)
> 		r1 = *x;
> }
> 
> exists (1:r0=1 /\ 1:r1=0)
> 
> )
> 
> I wonder whether you actually intended to introduce these "races"...?

No, I did not.

Alan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-15 15:19   ` Alan Stern
@ 2019-01-16 11:57     ` Peter Zijlstra
  2019-01-16 13:11       ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2019-01-16 11:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Andrea Parri, LKMM Maintainers -- Akira Yokosawa, Boqun Feng,
	Daniel Lustig, David Howells, Jade Alglave, Luc Maranget,
	Nicholas Piggin, Paul E. McKenney, Will Deacon, Dmitry Vyukov,
	linux-kernel

On Tue, Jan 15, 2019 at 10:19:10AM -0500, Alan Stern wrote:
> On Tue, 15 Jan 2019, Andrea Parri wrote:
> 
> > Unless I'm mis-reading/-applying this definition, this will flag the
> > following test (a variation on your "race.litmus") with "data-race":
> > 
> > C no-race
> > 
> > {}
> > 
> > P0(int *x, spinlock_t *s)
> > {
> > 	spin_lock(s);
> >         WRITE_ONCE(*x, 1);	/* A */
> > 	spin_unlock(s);	/* B */
> > }
> > 
> > P1(int *x, spinlock_t *s)
> > {
> >         int r1;
> > 
> > 	spin_lock(s); /* C */
> >         r1 = *x;	/* D */
> > 	spin_unlock(s);
> > }
> > 
> > exists (1:r1=1)
> > 
> > Broadly speaking, this is due to the fact that the modified "happens-
> > before" axiom does not forbid the execution with the (MP-) cycle
> > 
> > 	A ->po-rel B ->rfe C ->acq-po D ->fre A
> > 
> > and then to the link "D ->race-from-r A" here defined.
> 
> Yes, that cycle certainly should be forbidden.  On the other hand, we
> don't want to insist that C happens before D, given that D may not
> happen at all.
> 
> This is a real problem.  Can we solve it by adding a modified
> "happens-before" which says essentially that _if_ D is preserved _then_
> C happens before D?  But then what about cycles involving more than one
> possibly preserved access?  Or maybe a relation which says that D
> cannot execute before C (so if D executes at all, it has to come after
> C)?

The latter; there is a compiler barrier implied at the end of
spin_lock() such that anything later (in PO) must indeed be later.

> Now you see why this stuff is so difficult...  At the moment, I don't
> know how to fix this.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-16 11:57     ` Peter Zijlstra
@ 2019-01-16 13:11       ` Paul E. McKenney
  2019-01-16 15:49         ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2019-01-16 13:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alan Stern, Andrea Parri, LKMM Maintainers -- Akira Yokosawa,
	Boqun Feng, Daniel Lustig, David Howells, Jade Alglave,
	Luc Maranget, Nicholas Piggin, Will Deacon, Dmitry Vyukov,
	linux-kernel

On Wed, Jan 16, 2019 at 12:57:52PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 15, 2019 at 10:19:10AM -0500, Alan Stern wrote:
> > On Tue, 15 Jan 2019, Andrea Parri wrote:
> > 
> > > Unless I'm mis-reading/-applying this definition, this will flag the
> > > following test (a variation on your "race.litmus") with "data-race":
> > > 
> > > C no-race
> > > 
> > > {}
> > > 
> > > P0(int *x, spinlock_t *s)
> > > {
> > > 	spin_lock(s);
> > >         WRITE_ONCE(*x, 1);	/* A */
> > > 	spin_unlock(s);	/* B */
> > > }
> > > 
> > > P1(int *x, spinlock_t *s)
> > > {
> > >         int r1;
> > > 
> > > 	spin_lock(s); /* C */
> > >         r1 = *x;	/* D */
> > > 	spin_unlock(s);
> > > }
> > > 
> > > exists (1:r1=1)
> > > 
> > > Broadly speaking, this is due to the fact that the modified "happens-
> > > before" axiom does not forbid the execution with the (MP-) cycle
> > > 
> > > 	A ->po-rel B ->rfe C ->acq-po D ->fre A
> > > 
> > > and then to the link "D ->race-from-r A" here defined.
> > 
> > Yes, that cycle certainly should be forbidden.  On the other hand, we
> > don't want to insist that C happens before D, given that D may not
> > happen at all.
> > 
> > This is a real problem.  Can we solve it by adding a modified
> > "happens-before" which says essentially that _if_ D is preserved _then_
> > C happens before D?  But then what about cycles involving more than one
> > possibly preserved access?  Or maybe a relation which says that D
> > cannot execute before C (so if D executes at all, it has to come after
> > C)?
> 
> The latter; there is a compiler barrier implied at the end of
> spin_lock() such that anything later (in PO) must indeed be later.
> 
> > Now you see why this stuff is so difficult...  At the moment, I don't
> > know how to fix this.

In the spirit of cutting the Gordian Knot...

Given that we are flagging data races, how much do we really lose by
simply ignoring the possibility of removed accesses?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-16 13:11       ` Paul E. McKenney
@ 2019-01-16 15:49         ` Alan Stern
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Stern @ 2019-01-16 15:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Andrea Parri, LKMM Maintainers -- Akira Yokosawa,
	Boqun Feng, Daniel Lustig, David Howells, Jade Alglave,
	Luc Maranget, Nicholas Piggin, Will Deacon, Dmitry Vyukov,
	linux-kernel

On Wed, 16 Jan 2019, Paul E. McKenney wrote:

> On Wed, Jan 16, 2019 at 12:57:52PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 15, 2019 at 10:19:10AM -0500, Alan Stern wrote:
> > > On Tue, 15 Jan 2019, Andrea Parri wrote:
> > > 
> > > > Unless I'm mis-reading/-applying this definition, this will flag the
> > > > following test (a variation on your "race.litmus") with "data-race":
> > > > 
> > > > C no-race
> > > > 
> > > > {}
> > > > 
> > > > P0(int *x, spinlock_t *s)
> > > > {
> > > > 	spin_lock(s);
> > > >         WRITE_ONCE(*x, 1);	/* A */
> > > > 	spin_unlock(s);	/* B */
> > > > }
> > > > 
> > > > P1(int *x, spinlock_t *s)
> > > > {
> > > >         int r1;
> > > > 
> > > > 	spin_lock(s); /* C */
> > > >         r1 = *x;	/* D */
> > > > 	spin_unlock(s);
> > > > }
> > > > 
> > > > exists (1:r1=1)
> > > > 
> > > > Broadly speaking, this is due to the fact that the modified "happens-
> > > > before" axiom does not forbid the execution with the (MP-) cycle
> > > > 
> > > > 	A ->po-rel B ->rfe C ->acq-po D ->fre A
> > > > 
> > > > and then to the link "D ->race-from-r A" here defined.
> > > 
> > > Yes, that cycle certainly should be forbidden.  On the other hand, we
> > > don't want to insist that C happens before D, given that D may not
> > > happen at all.
> > > 
> > > This is a real problem.  Can we solve it by adding a modified
> > > "happens-before" which says essentially that _if_ D is preserved _then_
> > > C happens before D?  But then what about cycles involving more than one
> > > possibly preserved access?  Or maybe a relation which says that D
> > > cannot execute before C (so if D executes at all, it has to come after
> > > C)?
> > 
> > The latter; there is a compiler barrier implied at the end of
> > spin_lock() such that anything later (in PO) must indeed be later.
> > 
> > > Now you see why this stuff is so difficult...  At the moment, I don't
> > > know how to fix this.
> 
> In the spirit of cutting the Gordian Knot...
> 
> Given that we are flagging data races, how much do we really lose by
> simply ignoring the possibility of removed accesses?

Well, I thought about these issues overnight.  It turns out Andrea's
test cases expose two problems: an easy one and a hard one.

The easy one is that my definition of hb was too stringent; it required
the accesses involved in the prop relation to be marked, but it should
have allowed any preserved access.  At the same time, it was too
lenient in that the overwrite relation allowed any write as the
right-hand argument, but it should have required the write to be
preserved.  Likewise for the rfe? term in A-cumul.  Those issues have 
now been fixed.

The hard problem involves race detection when non-preserved accesses
are present.  (The plain reads in Andrea's examples were non-preserved;  
if the examples are changed to make them preserved then the corrected
model will realize they do not race.)  The point is that non-preserved
accesses can participate in a data race, but if they do it means that
the compiler must have preserved them!  To put it another way, if the
compiler deletes an access then that access can't race with anything.

Hence, when testing whether a particular execution has a data race
between accesses X and Y, we really should re-determine whether the
execution is allowed under the assumption that X and Y are both
preserved.  If it isn't then X and Y don't race in that execution.

Here's a particularly obscure example to illustrate the point.

C non-race1

{}

P0(int *x, int *y)
{
	int r1;
	int r2;

	r1 = READ_ONCE(*x);
	smp_rmb();
	if (r1 == 1)
		r2 = *y;
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r3;

	r3 = READ_ONCE(*y);
	WRITE_ONCE(*x, r3);
}

P2(int *y)
{
	WRITE_ONCE(*y, 2);
}

exists (0:r1=1 /\ 1:r3=1)

This litmus test is allowed, and there's no synchronization at all
between the marked write to y in P2() and the plain read of y in P0().  
Nevertheless, those two accesses do not race, because the "r2 = *y" 
read does not actually occur in any of the allowed executions.

I'm thinking about ways to attack this problem.  One approach is to
ignore non-preserved accesses entirely (they do correspond to dead
code, after all).  But that's not so good, because an access may be
preserved in one execution and non-preserved in another.

Still working on it...

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
       [not found] <Pine.LNX.4.44L0.1901141439480.1366-100000@iolanthe.rowland.org>
       [not found] ` <20190114235426.GV1215@linux.ibm.com>
  2019-01-15 14:25 ` Andrea Parri
@ 2019-01-16 21:36 ` Andrea Parri
  2019-01-17 15:03   ` Andrea Parri
  2019-01-17 19:43   ` Alan Stern
  2019-01-22 15:47 ` Andrea Parri
  3 siblings, 2 replies; 18+ messages in thread
From: Andrea Parri @ 2019-01-16 21:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

[...]

> The difficulty with incorporating plain accesses in the memory model
> is that the compiler has very few constraints on how it treats plain
> accesses.  It can eliminate them, duplicate them, rearrange them,
> merge them, split them up, and goodness knows what else.  To make some
> sense of this, I have taken the view that a plain access can exist
> (perhaps multiple times) within a certain bounded region of code.
> Ordering of two accesses X and Y means that we guarantee at least one
> instance of the X access must be executed before any instances of the
> Y access.  (This is assuming that neither of the accesses is
> completely eliminated by the compiler; otherwise there is nothing to
> order!)
> 
> After adding some simple definitions for the sets of plain and marked
> accesses and for compiler barriers, the patch updates the ppo
> relation.  The basic idea here is that ppo can be broken down into
> categories: memory barriers, overwrites, and dependencies (including
> dep-rfi).
> 
> 	Memory barriers always provide ordering (compiler barriers do
> 	not but they have indirect effects).
> 
> 	Overwriting always provides ordering.  This may seem
> 	surprising in the case where both X and Y are plain writes,
> 	but in that case the memory model will say that X can be
> 	eliminated unless there is at least a compiler barrier between
> 	X and Y, and this barrier will enforce the ordering.
> 
> 	Some dependencies provide ordering and some don't.  Going by
> 	cases:
> 
> 		An address dependency to a read provides ordering when
> 		the source is a marked read, even when the target is a
> 		plain read.  This is necessary if rcu_dereference() is
> 		to work correctly; it is tantamount to assuming that
> 		the compiler never speculates address dependencies.
> 		However, if the source is a plain read then there is
> 		no ordering.  This is because of Alpha, which does not
> 		respect address dependencies to reads (on Alpha,
> 		marked reads include a memory barrier to enforce the
> 		ordering but plain reads do not).

Can the compiler (maybe, it does?) transform, at the C or at the "asm"
level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?

C LB1

{
	int *x = &a;
}

P0(int **x, int *y)
{
	int *r0;

	r0 = rcu_dereference(*x);
	*r0 = 0;
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int **x, int *y, int *b)
{
	int r0;

	r0 = READ_ONCE(*y);
	rcu_assign_pointer(*x, b);
}

exists (0:r0=b /\ 1:r0=1)


C LB2

{
	int *x = &a;
}

P0(int **x, int *y)
{
	int *r0;

	r0 = rcu_dereference(*x);
	if (*r0)
		*r0 = 0;
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int **x, int *y, int *b)
{
	int r0;

	r0 = READ_ONCE(*y);
	rcu_assign_pointer(*x, b);
}

exists (0:r0=b /\ 1:r0=1)

LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
clause is not satisfiable, while LB2's "exists" clause is satisfiable.

I'm adding Nick to Cc (I never spoke with him, but from what I see in
LKML, he must understand compiler better than I currently do... ;-/ )

  Andrea


> 
> 		An address dependency to a write always provides
> 		ordering.  Neither the compiler nor the CPU can
> 		speculate the address of a write, because a wrong
> 		guess could generate a data race.  (Question: do we
> 		need to include the case where the source is a plain
> 		read?)
> 
> 		A data or control dependency to a write provides
> 		ordering if the target is a marked write.  This is
> 		because the compiler is obliged to translate a marked
> 		write as a single machine instruction; if it
> 		speculates such a write there will be no opportunity
> 		to correct a mistake.
> 
> 		Dep-rfi (i.e., a data or address dependency from a
> 		read to a write which is then read from on the same
> 		CPU) provides ordering between the two reads if the
> 		target is a marked read.  This is again because the
> 		marked read will be translated as a machine-level load
> 		instruction, and then the CPU will guarantee the
> 		ordering.
> 
> 		There is a special case (data;rfi) that doesn't
> 		provide ordering in itself but can contribute to other
> 		orderings.  A data;rfi link corresponds to situations
> 		where a value is stored in a temporary shared variable
> 		and then loaded back again.  Since the compiler might
> 		choose to eliminate the temporary, its accesses can't
> 		be said to be ordered -- but the accesses around it
> 		might be.  As a simple example, consider:
> 
> 			r1 = READ_ONCE(ptr);
> 			tmp = r1;
> 			r2 = tmp;
> 			WRITE_ONCE(*r2, 5);
> 
> 		The plain accesses involving tmp don't have any
> 		particular ordering requirements, but we do know that
> 		the READ_ONCE must be ordered before the WRITE_ONCE.
> 		The chain of relations is:
> 
> 			[marked] ; data ; rfi ; addr ; [marked]
> 
> 		showing that a data;rfi has been inserted into an
> 		address dependency from a marked read to a marked
> 		write.  In general, any number of data;rfi links can
> 		be inserted in each of the other kinds of dependencies.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-16 21:36 ` Andrea Parri
@ 2019-01-17 15:03   ` Andrea Parri
  2019-01-17 20:21     ` Alan Stern
  2019-01-18 15:10     ` Alan Stern
  2019-01-17 19:43   ` Alan Stern
  1 sibling, 2 replies; 18+ messages in thread
From: Andrea Parri @ 2019-01-17 15:03 UTC (permalink / raw)
  To: Alan Stern
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

On Wed, Jan 16, 2019 at 10:36:58PM +0100, Andrea Parri wrote:
> [...]
> 
> > The difficulty with incorporating plain accesses in the memory model
> > is that the compiler has very few constraints on how it treats plain
> > accesses.  It can eliminate them, duplicate them, rearrange them,
> > merge them, split them up, and goodness knows what else.  To make some
> > sense of this, I have taken the view that a plain access can exist
> > (perhaps multiple times) within a certain bounded region of code.
> > Ordering of two accesses X and Y means that we guarantee at least one
> > instance of the X access must be executed before any instances of the
> > Y access.  (This is assuming that neither of the accesses is
> > completely eliminated by the compiler; otherwise there is nothing to
> > order!)
> > 
> > After adding some simple definitions for the sets of plain and marked
> > accesses and for compiler barriers, the patch updates the ppo
> > relation.  The basic idea here is that ppo can be broken down into
> > categories: memory barriers, overwrites, and dependencies (including
> > dep-rfi).
> > 
> > 	Memory barriers always provide ordering (compiler barriers do
> > 	not but they have indirect effects).
> > 
> > 	Overwriting always provides ordering.  This may seem
> > 	surprising in the case where both X and Y are plain writes,
> > 	but in that case the memory model will say that X can be
> > 	eliminated unless there is at least a compiler barrier between
> > 	X and Y, and this barrier will enforce the ordering.
> > 
> > 	Some dependencies provide ordering and some don't.  Going by
> > 	cases:
> > 
> > 		An address dependency to a read provides ordering when
> > 		the source is a marked read, even when the target is a
> > 		plain read.  This is necessary if rcu_dereference() is
> > 		to work correctly; it is tantamount to assuming that
> > 		the compiler never speculates address dependencies.
> > 		However, if the source is a plain read then there is
> > 		no ordering.  This is because of Alpha, which does not
> > 		respect address dependencies to reads (on Alpha,
> > 		marked reads include a memory barrier to enforce the
> > 		ordering but plain reads do not).
> 
> Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> 
> C LB1
> 
> {
> 	int *x = &a;
> }
> 
> P0(int **x, int *y)
> {
> 	int *r0;
> 
> 	r0 = rcu_dereference(*x);
> 	*r0 = 0;
> 	smp_wmb();
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int **x, int *y, int *b)
> {
> 	int r0;
> 
> 	r0 = READ_ONCE(*y);
> 	rcu_assign_pointer(*x, b);
> }
> 
> exists (0:r0=b /\ 1:r0=1)
> 
> 
> C LB2
> 
> {
> 	int *x = &a;
> }
> 
> P0(int **x, int *y)
> {
> 	int *r0;
> 
> 	r0 = rcu_dereference(*x);
> 	if (*r0)
> 		*r0 = 0;
> 	smp_wmb();
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int **x, int *y, int *b)
> {
> 	int r0;
> 
> 	r0 = READ_ONCE(*y);
> 	rcu_assign_pointer(*x, b);
> }
> 
> exists (0:r0=b /\ 1:r0=1)
> 
> LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> clause is not satisfiable, while LB2's "exists" clause is satisfiable.
> 
> I'm adding Nick to Cc (I never spoke with him, but from what I see in
> LKML, he must understand compiler better than I currently do... ;-/ )
> 
>   Andrea
> 
> 
> > 
> > 		An address dependency to a write always provides
> > 		ordering.  Neither the compiler nor the CPU can
> > 		speculate the address of a write, because a wrong
> > 		guess could generate a data race.  (Question: do we
> > 		need to include the case where the source is a plain
> > 		read?)
> > 
> > 		A data or control dependency to a write provides
> > 		ordering if the target is a marked write.  This is
> > 		because the compiler is obliged to translate a marked
> > 		write as a single machine instruction; if it
> > 		speculates such a write there will be no opportunity
> > 		to correct a mistake.
> > 
> > 		Dep-rfi (i.e., a data or address dependency from a
> > 		read to a write which is then read from on the same
> > 		CPU) provides ordering between the two reads if the
> > 		target is a marked read.  This is again because the
> > 		marked read will be translated as a machine-level load
> > 		instruction, and then the CPU will guarantee the
> > 		ordering.
> > 
> > 		There is a special case (data;rfi) that doesn't
> > 		provide ordering in itself but can contribute to other
> > 		orderings.  A data;rfi link corresponds to situations
> > 		where a value is stored in a temporary shared variable
> > 		and then loaded back again.  Since the compiler might
> > 		choose to eliminate the temporary, its accesses can't
> > 		be said to be ordered -- but the accesses around it
> > 		might be.  As a simple example, consider:
> > 
> > 			r1 = READ_ONCE(ptr);
> > 			tmp = r1;
> > 			r2 = tmp;
> > 			WRITE_ONCE(*r2, 5);
> > 
> > 		The plain accesses involving tmp don't have any
> > 		particular ordering requirements, but we do know that
> > 		the READ_ONCE must be ordered before the WRITE_ONCE.
> > 		The chain of relations is:
> > 
> > 			[marked] ; data ; rfi ; addr ; [marked]
> > 
> > 		showing that a data;rfi has been inserted into an
> > 		address dependency from a marked read to a marked
> > 		write.  In general, any number of data;rfi links can
> > 		be inserted in each of the other kinds of dependencies.

As a more general comment (disclaimer), I'm not sure we want to/can add
all the constraints above.  On one hand, for some of them, I ignore the
existence of current use cases in the source (and I don't quite see my-
self encouraging their adoption...); on the other hand, these certainly
do not make the model "simpler" or easier to maintain (in a sound way).

Moreover, I doubt that runtime checkers a la KTSan will ever be able to
assist the developer by supporting these "dependency orderings". [1]

Maybe we could start by adding those orderings that we know are "widely"
relied upon _and_ used by the developers, and later add more/strengthen
the model as needed (where feasible).

Thoughts?

  Andrea

[1] https://groups.google.com/d/msg/ktsan/bVZ1c6H2NE0/gapvllYNBQAJ

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-17 15:03   ` Andrea Parri
@ 2019-01-17 20:21     ` Alan Stern
  2019-01-18 15:10     ` Alan Stern
  1 sibling, 0 replies; 18+ messages in thread
From: Alan Stern @ 2019-01-17 20:21 UTC (permalink / raw)
  To: Andrea Parri
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

On Thu, 17 Jan 2019, Andrea Parri wrote:

> > > 		There is a special case (data;rfi) that doesn't
> > > 		provide ordering in itself but can contribute to other
> > > 		orderings.  A data;rfi link corresponds to situations
> > > 		where a value is stored in a temporary shared variable
> > > 		and then loaded back again.  Since the compiler might
> > > 		choose to eliminate the temporary, its accesses can't
> > > 		be said to be ordered -- but the accesses around it
> > > 		might be.  As a simple example, consider:
> > > 
> > > 			r1 = READ_ONCE(ptr);
> > > 			tmp = r1;
> > > 			r2 = tmp;
> > > 			WRITE_ONCE(*r2, 5);
> > > 
> > > 		The plain accesses involving tmp don't have any
> > > 		particular ordering requirements, but we do know that
> > > 		the READ_ONCE must be ordered before the WRITE_ONCE.
> > > 		The chain of relations is:
> > > 
> > > 			[marked] ; data ; rfi ; addr ; [marked]
> > > 
> > > 		showing that a data;rfi has been inserted into an
> > > 		address dependency from a marked read to a marked
> > > 		write.  In general, any number of data;rfi links can
> > > 		be inserted in each of the other kinds of dependencies.
> 
> As a more general comment (disclaimer), I'm not sure we want to/can add
> all the constraints above.  On one hand, for some of them, I ignore the
> existence of current use cases in the source (and I don't quite see my-
> self encouraging their adoption...); on the other hand, these certainly
> do not make the model "simpler" or easier to maintain (in a sound way).
> 
> Moreover, I doubt that runtime checkers a la KTSan will ever be able to
> assist the developer by supporting these "dependency orderings". [1]
> 
> Maybe we could start by adding those orderings that we know are "widely"
> relied upon _and_ used by the developers, and later add more/strengthen
> the model as needed (where feasible).
> 
> Thoughts?

Right now I'm inclined to give up on all dependency orderings other
than address dependency from a marked read.  But this would mean
missing things like

	MR ->addr PR ->data MW

which ought to be a valid ordering (MR stands for "marked read", "PR"
for "plain read", and "MW" for "marked write").  Is that going to be 
okay?  Or should I also include data and control dependencies from 
plain reads to marked writes?

Also, should this still include "[marked] ; (data ; rfi)* ; addr"?  
Without it, we wouldn't be able to tell that the following test does
not race:


C non-race4

{
int *x = a;
}

P0(int **x, int *b)
{
	*b = 1;
	smp_wmb();
	rcu_assign_pointer(*x, b);
}

P1(int **x, int **tmp)
{
	int *r1;
	int *r2;
	int r3;

	r1 = rcu_dereference(*x);
	tmp = r1;
	r2 = tmp;
	r3 = *r2;
}

exists (1:r1=b /\ 1:r3=0)


And it seems reasonable that this pattern could be used in the kernel.  
Although, I admit, in this scenario it's much more likely that tmp
would be a non-shared variable.

Alan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-17 15:03   ` Andrea Parri
  2019-01-17 20:21     ` Alan Stern
@ 2019-01-18 15:10     ` Alan Stern
  2019-01-18 15:56       ` Andrea Parri
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Stern @ 2019-01-18 15:10 UTC (permalink / raw)
  To: Andrea Parri
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

On Thu, 17 Jan 2019, Andrea Parri wrote:

> > Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> > level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> > 
> > C LB1
> > 
> > {
> > 	int *x = &a;
> > }
> > 
> > P0(int **x, int *y)
> > {
> > 	int *r0;
> > 
> > 	r0 = rcu_dereference(*x);
> > 	*r0 = 0;
> > 	smp_wmb();
> > 	WRITE_ONCE(*y, 1);
> > }
> > 
> > P1(int **x, int *y, int *b)
> > {
> > 	int r0;
> > 
> > 	r0 = READ_ONCE(*y);
> > 	rcu_assign_pointer(*x, b);
> > }
> > 
> > exists (0:r0=b /\ 1:r0=1)
> > 
> > 
> > C LB2
> > 
> > {
> > 	int *x = &a;
> > }
> > 
> > P0(int **x, int *y)
> > {
> > 	int *r0;
> > 
> > 	r0 = rcu_dereference(*x);
> > 	if (*r0)
> > 		*r0 = 0;
> > 	smp_wmb();
> > 	WRITE_ONCE(*y, 1);
> > }
> > 
> > P1(int **x, int *y, int *b)
> > {
> > 	int r0;
> > 
> > 	r0 = READ_ONCE(*y);
> > 	rcu_assign_pointer(*x, b);
> > }
> > 
> > exists (0:r0=b /\ 1:r0=1)
> > 
> > LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> > clause is not satisfiable, while LB2's "exists" clause is satisfiable.

A relatively simple solution to this problem would be to say that 
smp_wmb doesn't order plain writes.

I think the rest of the memory model would then be okay.  However, I am
open to arguments that this approach is too complex and we should
insist on the same kind of strict ordering for race avoidance that the
C11 standard uses (i.e., conflicting accesses separated by full memory
barriers or release & acquire barriers or locking).

Alan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-18 15:10     ` Alan Stern
@ 2019-01-18 15:56       ` Andrea Parri
  2019-01-18 16:43         ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Andrea Parri @ 2019-01-18 15:56 UTC (permalink / raw)
  To: Alan Stern
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

On Fri, Jan 18, 2019 at 10:10:22AM -0500, Alan Stern wrote:
> On Thu, 17 Jan 2019, Andrea Parri wrote:
> 
> > > Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> > > level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> > > 
> > > C LB1
> > > 
> > > {
> > > 	int *x = &a;
> > > }
> > > 
> > > P0(int **x, int *y)
> > > {
> > > 	int *r0;
> > > 
> > > 	r0 = rcu_dereference(*x);
> > > 	*r0 = 0;
> > > 	smp_wmb();
> > > 	WRITE_ONCE(*y, 1);
> > > }
> > > 
> > > P1(int **x, int *y, int *b)
> > > {
> > > 	int r0;
> > > 
> > > 	r0 = READ_ONCE(*y);
> > > 	rcu_assign_pointer(*x, b);
> > > }
> > > 
> > > exists (0:r0=b /\ 1:r0=1)
> > > 
> > > 
> > > C LB2
> > > 
> > > {
> > > 	int *x = &a;
> > > }
> > > 
> > > P0(int **x, int *y)
> > > {
> > > 	int *r0;
> > > 
> > > 	r0 = rcu_dereference(*x);
> > > 	if (*r0)
> > > 		*r0 = 0;
> > > 	smp_wmb();
> > > 	WRITE_ONCE(*y, 1);
> > > }
> > > 
> > > P1(int **x, int *y, int *b)
> > > {
> > > 	int r0;
> > > 
> > > 	r0 = READ_ONCE(*y);
> > > 	rcu_assign_pointer(*x, b);
> > > }
> > > 
> > > exists (0:r0=b /\ 1:r0=1)
> > > 
> > > LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> > > clause is not satisfiable, while LB2's "exists" clause is satisfiable.
> 
> A relatively simple solution to this problem would be to say that 
> smp_wmb doesn't order plain writes.

It seems so; I don't have other solutions to suggest ATM.  (But, TBH,
I'm still in the process of reviewing/testing these changes... )

And yes, this is a pain! : I don't have the exact statistics, but I'm
willing to believe that removing this order will take us back ~99% of
the current (~500!) uses of smp_wmb() ;-/

Oh, well, maybe we'll find a better solution one day: after all, that
one doesn't seem worse than what the current LKMM has to say! ;-)


> 
> I think the rest of the memory model would then be okay.  However, I am
> open to arguments that this approach is too complex and we should
> insist on the same kind of strict ordering for race avoidance that the
> C11 standard uses (i.e., conflicting accesses separated by full memory
> barriers or release & acquire barriers or locking).

Indeed;  maybe, we've just found another reason to obsolete smp_wmb()! ;-)

  Andrea


> 
> Alan
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-18 15:56       ` Andrea Parri
@ 2019-01-18 16:43         ` Alan Stern
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Stern @ 2019-01-18 16:43 UTC (permalink / raw)
  To: Andrea Parri
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

On Fri, 18 Jan 2019, Andrea Parri wrote:

> > A relatively simple solution to this problem would be to say that 
> > smp_wmb doesn't order plain writes.
> 
> It seems so; I don't have other solutions to suggest ATM.  (But, TBH,
> I'm still in the process of reviewing/testing these changes... )
> 
> And yes, this is a pain! : I don't have the exact statistics, but I'm
> willing to believe that removing this order will take us back ~99% of
> the current (~500!) uses of smp_wmb() ;-/
> 
> Oh, well, maybe we'll find a better solution one day: after all, that
> one doesn't seem worse than what the current LKMM has to say! ;-)
> 
> 
> > 
> > I think the rest of the memory model would then be okay.  However, I am
> > open to arguments that this approach is too complex and we should
> > insist on the same kind of strict ordering for race avoidance that the
> > C11 standard uses (i.e., conflicting accesses separated by full memory
> > barriers or release & acquire barriers or locking).
> 
> Indeed;  maybe, we've just found another reason to obsolete smp_wmb()! ;-)

Here's another example of how smp_wmb can cause trouble.  In this test,
I have replaced "*x = 1" in P1 with "r2 = *x; if (r2 != 1) *x = 1",
which is a perfectly valid transformation for the compiler to make.  
But as a result of this transformation, the MP pattern between P1 and
P2 is now allowed!

This shows that when plain accesses are involved, smp_wmb() in the
writing thread is not sufficient to forbid MP.

Alan


C bad-wmb

{}

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_store_release(y, 1);
}

P1(int *x, int *y, int *z)
{
	int r1;
	int r2;

	r1 = smp_load_acquire(y);
	if (r1) {
		/* Instead of *x = 1 ... */
		r2 = *x;
		if (r2 != 1)
			*x = 1;
		smp_wmb();
		WRITE_ONCE(*z, 1);
	}
}

P2(int *x, int *z)
{
	int r3;
	int r4 = 0;

	r3 = READ_ONCE(*z);
	if (r3) {
		smp_rmb();
		r4 = READ_ONCE(*x);
	}
}

exists (2:r3=1 /\ 2:r4=0)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-16 21:36 ` Andrea Parri
  2019-01-17 15:03   ` Andrea Parri
@ 2019-01-17 19:43   ` Alan Stern
  2019-01-18 18:53     ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Stern @ 2019-01-17 19:43 UTC (permalink / raw)
  To: Andrea Parri
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

On Wed, 16 Jan 2019, Andrea Parri wrote:

> Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> 
> C LB1
> 
> {
> 	int *x = &a;
> }
> 
> P0(int **x, int *y)
> {
> 	int *r0;
> 
> 	r0 = rcu_dereference(*x);
> 	*r0 = 0;
> 	smp_wmb();
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int **x, int *y, int *b)
> {
> 	int r0;
> 
> 	r0 = READ_ONCE(*y);
> 	rcu_assign_pointer(*x, b);
> }
> 
> exists (0:r0=b /\ 1:r0=1)
> 
> 
> C LB2
> 
> {
> 	int *x = &a;
> }
> 
> P0(int **x, int *y)
> {
> 	int *r0;
> 
> 	r0 = rcu_dereference(*x);
> 	if (*r0)
> 		*r0 = 0;
> 	smp_wmb();
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int **x, int *y, int *b)
> {
> 	int r0;
> 
> 	r0 = READ_ONCE(*y);
> 	rcu_assign_pointer(*x, b);
> }
> 
> exists (0:r0=b /\ 1:r0=1)
> 
> LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> clause is not satisfiable, while LB2's "exists" clause is satisfiable.

Umm.  Transforming

	*r0 = 0;

to

	if (*r0 != 0)
		*r0 = 0;

wouldn't work on Alpha if r0 was assigned from a plain read with no
memory barrier between.  But when r0 is assigned from an
rcu_dereference call, or if there's no indirection (as in "if (a != 0)
a = 0;"), the compiler is indeed allowed to perform this
transformation.

This means my definition of preserved writes was wrong; a write we 
thought had to be preserved could instead be transformed into a read.

This objection throws a serious monkey wrench into my approach.  For
one thing, it implies that (as in the example) we can't expect
smp_wmb() always to order plain writes.  For another, it means we have
to assume a lot more writes need not be preserved.

I don't know.  This may doom the effort to formalize dependencies to
plain accesses.  Or at least, those other than address dependencies
from marked reads.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-17 19:43   ` Alan Stern
@ 2019-01-18 18:53     ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2019-01-18 18:53 UTC (permalink / raw)
  To: Alan Stern
  Cc: Andrea Parri, LKMM Maintainers -- Akira Yokosawa, Boqun Feng,
	Daniel Lustig, David Howells, Jade Alglave, Luc Maranget,
	Nicholas Piggin, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	Nick Desaulniers, linux-kernel

On Thu, Jan 17, 2019 at 02:43:54PM -0500, Alan Stern wrote:
> On Wed, 16 Jan 2019, Andrea Parri wrote:
> 
> > Can the compiler (maybe, it does?) transform, at the C or at the "asm"
> > level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)?
> > 
> > C LB1
> > 
> > {
> > 	int *x = &a;
> > }
> > 
> > P0(int **x, int *y)
> > {
> > 	int *r0;
> > 
> > 	r0 = rcu_dereference(*x);
> > 	*r0 = 0;
> > 	smp_wmb();
> > 	WRITE_ONCE(*y, 1);
> > }
> > 
> > P1(int **x, int *y, int *b)
> > {
> > 	int r0;
> > 
> > 	r0 = READ_ONCE(*y);
> > 	rcu_assign_pointer(*x, b);
> > }
> > 
> > exists (0:r0=b /\ 1:r0=1)
> > 
> > 
> > C LB2
> > 
> > {
> > 	int *x = &a;
> > }
> > 
> > P0(int **x, int *y)
> > {
> > 	int *r0;
> > 
> > 	r0 = rcu_dereference(*x);
> > 	if (*r0)
> > 		*r0 = 0;
> > 	smp_wmb();
> > 	WRITE_ONCE(*y, 1);
> > }
> > 
> > P1(int **x, int *y, int *b)
> > {
> > 	int r0;
> > 
> > 	r0 = READ_ONCE(*y);
> > 	rcu_assign_pointer(*x, b);
> > }
> > 
> > exists (0:r0=b /\ 1:r0=1)
> > 
> > LB1 and LB2 are data-race free, according to the patch; LB1's "exists"
> > clause is not satisfiable, while LB2's "exists" clause is satisfiable.
> 
> Umm.  Transforming
> 
> 	*r0 = 0;
> 
> to
> 
> 	if (*r0 != 0)
> 		*r0 = 0;
> 
> wouldn't work on Alpha if r0 was assigned from a plain read with no
> memory barrier between.  But when r0 is assigned from an
> rcu_dereference call, or if there's no indirection (as in "if (a != 0)
> a = 0;"), the compiler is indeed allowed to perform this
> transformation.
> 
> This means my definition of preserved writes was wrong; a write we 
> thought had to be preserved could instead be transformed into a read.
> 
> This objection throws a serious monkey wrench into my approach.  For
> one thing, it implies that (as in the example) we can't expect
> smp_wmb() always to order plain writes.  For another, it means we have
> to assume a lot more writes need not be preserved.
> 
> I don't know.  This may doom the effort to formalize dependencies to
> plain accesses.  Or at least, those other than address dependencies
> from marked reads.

(Catching up, hello from Auckland!)

At this point, I am very much in favor of taking the simpler starting
point.  If someone is using any sort of dependency from a plain access,
all bets are off.  Similarly, if someone is using a control or data
dependency even from a marked access, the later dependent access must
be marked to guarantee ordering.

I believe that the transformation from "*r0 = 0" should be convincing.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
       [not found] <Pine.LNX.4.44L0.1901141439480.1366-100000@iolanthe.rowland.org>
                   ` (2 preceding siblings ...)
  2019-01-16 21:36 ` Andrea Parri
@ 2019-01-22 15:47 ` Andrea Parri
  2019-01-22 16:19   ` Alan Stern
  3 siblings, 1 reply; 18+ messages in thread
From: Andrea Parri @ 2019-01-22 15:47 UTC (permalink / raw)
  To: Alan Stern
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	linux-kernel

> @@ -131,7 +159,7 @@ let rec rcu-fence = rcu-gp | srcu-gp |
>  	(rcu-fence ; rcu-link ; rcu-fence)
>  
>  (* rb orders instructions just as pb does *)
> -let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb*
> +let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb* ; [marked]

Testing has revealed some subtle semantics changes for some RCU tests
_without_ unmarked memory accesses; an example is reported at the end
of this email.  I suspect that the improvements you mentioned in this
thread can restore the original semantics but I'm reporting this here
for further reference.

With the above definition of 'rb', we're losing links which originate
or target RCU fences, so that this definition is in fact a relaxation
w.r.t. the current semantics (even when limiting to marked accesses).
The test below, for example, is currently forbidden by the LKMM, but
it becomes allowed with this patch.

FWIW, I checked that including the RCU fences in 'marked' can restore
the original semantics of these tests; I'm still not sure whether this
change can make sense though....

Thoughts?

Oh, one last (and unrelated) nit before I forget: IIUC, we used to
upper-case set names, so I'd also suggest s/marked/Marked, s/plain/Plain
and similarly for the other sets to be introduced.

  Andrea

C sync-rcu-is-not-idempotent

{ }

P0(int *x, int *y)
{
	int r0;

	WRITE_ONCE(*x, 1);
	synchronize_rcu();
	synchronize_rcu();
	r0 = READ_ONCE(*y);
}

P1(int *y, int *z)
{
	int r0;

	rcu_read_lock();
	WRITE_ONCE(*y, 1);
	r0 = READ_ONCE(*z);
	rcu_read_unlock();
}

P2(int *z, int *x)
{
	int r0;

	rcu_read_lock();
	WRITE_ONCE(*z, 1);
	r0 = READ_ONCE(*x);
	rcu_read_unlock();
}

exists (0:r0=0 /\ 1:r0=0 /\ 2:r0=0)

>  
>  irreflexive rb as rcu
>  

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Plain accesses and data races in the Linux Kernel Memory Model
  2019-01-22 15:47 ` Andrea Parri
@ 2019-01-22 16:19   ` Alan Stern
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Stern @ 2019-01-22 16:19 UTC (permalink / raw)
  To: Andrea Parri
  Cc: LKMM Maintainers -- Akira Yokosawa, Boqun Feng, Daniel Lustig,
	David Howells, Jade Alglave, Luc Maranget, Nicholas Piggin,
	Paul E. McKenney, Peter Zijlstra, Will Deacon, Dmitry Vyukov,
	linux-kernel

On Tue, 22 Jan 2019, Andrea Parri wrote:

> > @@ -131,7 +159,7 @@ let rec rcu-fence = rcu-gp | srcu-gp |
> >  	(rcu-fence ; rcu-link ; rcu-fence)
> >  
> >  (* rb orders instructions just as pb does *)
> > -let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb*
> > +let rb = prop ; po ; rcu-fence ; po? ; hb* ; pb* ; [marked]
> 
> Testing has revealed some subtle semantics changes for some RCU tests
> _without_ unmarked memory accesses; an example is reported at the end
> of this email.  I suspect that the improvements you mentioned in this
> thread can restore the original semantics but I'm reporting this here
> for further reference.
> 
> With the above definition of 'rb', we're losing links which originate
> or target RCU fences, so that this definition is in fact a relaxation
> w.r.t. the current semantics (even when limiting to marked accesses).
> The test below, for example, is currently forbidden by the LKMM, but
> it becomes allowed with this patch.
> 
> FWIW, I checked that including the RCU fences in 'marked' can restore
> the original semantics of these tests; I'm still not sure whether this
> change can make sense though....
> 
> Thoughts?

Ah, a very good discovery.  I think changing marked to ~plain in a few
places would be a better solution.  Or maybe allowing plain accesses in
those places will also be okay -- it's hard to judge at this point.

> Oh, one last (and unrelated) nit before I forget: IIUC, we used to
> upper-case set names, so I'd also suggest s/marked/Marked, s/plain/Plain
> and similarly for the other sets to be introduced.

Okay, I'll follow that convention.

Alan


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-01-22 16:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.44L0.1901141439480.1366-100000@iolanthe.rowland.org>
     [not found] ` <20190114235426.GV1215@linux.ibm.com>
2019-01-15  7:20   ` Plain accesses and data races in the Linux Kernel Memory Model Dmitry Vyukov
2019-01-15 15:03     ` Alan Stern
2019-01-15 15:23       ` Paul E. McKenney
2019-01-15 14:25 ` Andrea Parri
2019-01-15 15:19   ` Alan Stern
2019-01-16 11:57     ` Peter Zijlstra
2019-01-16 13:11       ` Paul E. McKenney
2019-01-16 15:49         ` Alan Stern
2019-01-16 21:36 ` Andrea Parri
2019-01-17 15:03   ` Andrea Parri
2019-01-17 20:21     ` Alan Stern
2019-01-18 15:10     ` Alan Stern
2019-01-18 15:56       ` Andrea Parri
2019-01-18 16:43         ` Alan Stern
2019-01-17 19:43   ` Alan Stern
2019-01-18 18:53     ` Paul E. McKenney
2019-01-22 15:47 ` Andrea Parri
2019-01-22 16:19   ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.