All of lore.kernel.org
 help / color / mirror / Atom feed
* XDP socket rings, and LKMM litmus tests
@ 2021-03-02 18:46 Björn Töpel
  2021-03-02 19:57 ` Paul E. McKenney
  2021-03-02 21:14 ` Alan Stern
  0 siblings, 2 replies; 33+ messages in thread
From: Björn Töpel @ 2021-03-02 18:46 UTC (permalink / raw)
  To: bpf, LKML
  Cc: stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng,
	npiggin, dhowells, j.alglave, luc.maranget, paulmck, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

Hi!

Firstly; The long Cc-list is to reach the LKMM-folks.

Some background; the XDP sockets use a ring-buffer to communicate
between the kernel and userland. It's a
single-consumer/single-producer ring, and described in
net/xdp/xsk_queue.h.

--8<---
/* The structure of the shared state of the rings are the same as the
 * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
 * ring, the kernel is the producer and user space is the consumer. For
 * the Tx and fill rings, the kernel is the consumer and user space is
 * the producer.
 *
 * producer                         consumer
 *
 * if (LOAD ->consumer) {           LOAD ->producer
 *                    (A)           smp_rmb()       (C)
 *    STORE $data                   LOAD $data
 *    smp_wmb()       (B)           smp_mb()        (D)
 *    STORE ->producer              STORE ->consumer
 * }
 *
 * (A) pairs with (D), and (B) pairs with (C).
...
-->8---

I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
semantics [1], without breaking existing userspace applications.

So, I figured I'd use herd7 and the LKMM model to build a litmus test
for the barrier version, then for the acquire-release version, and
finally permutations of both.

The idea is to use a one element ring, with a state machine outlined
in the litmus test.

The basic test for the existing smp_{r,w,}mb() barriers looks like:

$ cat spsc-rb+1p1c.litmus
C spsc-rb+1p1c

// Stupid one entry ring:
// prod cons     allowed action       prod cons
//    0    0 =>       prod          =>   1    0
//    0    1 =>       cons          =>   0    0
//    1    0 =>       cons          =>   1    1
//    1    1 =>       prod          =>   0    1

{ prod = 1; }

// Here, we start at prod==1,cons==0, data==0, i.e. producer has
// written data=0, so from here only the consumer can start, and should
// consume data==0. Afterwards, producer can continue and write 1 to
// data. Can we enter state prod==0, cons==1, but consumer observerd
// the write of 1?

P0(int *prod, int *cons, int *data)
{
    int p;
    int c;
    int cond = 0;

    p = READ_ONCE(*prod);
    c = READ_ONCE(*cons);
    if (p == 0)
        if (c == 0)
            cond = 1;
    if (p == 1)
        if (c == 1)
            cond = 1;

    if (cond) {
        smp_mb();
        WRITE_ONCE(*data, 1);
        smp_wmb();
        WRITE_ONCE(*prod, p ^ 1);
    }
}

P1(int *prod, int *cons, int *data)
{
    int p;
    int c;
    int d = -1;
    int cond = 0;

    p = READ_ONCE(*prod);
    c = READ_ONCE(*cons);
    if (p == 1)
        if (c == 0)
            cond = 1;
    if (p == 0)
        if (c == 1)
            cond = 1;

    if (cond == 1) {
        smp_rmb();
        d = READ_ONCE(*data);
        smp_mb();
        WRITE_ONCE(*cons, c ^ 1);
    }
}

exists( 1:d=1 /\ prod=0 /\ cons=1 );

--

The weird state changing if-statements is because that I didn't get
'&&' and '||' to work with herd.

When this is run:

$ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
Test spsc-rb+1p1c Allowed
States 2
1:d=0; cons=1; prod=0;
1:d=0; cons=1; prod=1;
No
Witnesses
Positive: 0 Negative: 2
Condition exists (1:d=1 /\ prod=0 /\ cons=1)
Observation spsc-rb+1p1c Never 0 2
Time spsc-rb+1p1c 0.04
Hash=b399756d6a1301ca5bda042f32130791

Now to my question; In P0 there's an smp_mb(). Without that, the d==1
can be observed from P1 (consumer):

$ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
Test spsc-rb+1p1c Allowed
States 3
1:d=0; cons=1; prod=0;
1:d=0; cons=1; prod=1;
1:d=1; cons=1; prod=0;
Ok
Witnesses
Positive: 1 Negative: 2
Condition exists (1:d=1 /\ prod=0 /\ cons=1)
Observation spsc-rb+1p1c Sometimes 1 2
Time spsc-rb+1p1c 0.04
Hash=0047fc21fa77da9a9aee15e35ec367ef

In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
on control dependencies") removes the corresponding smp_mb(), and also
the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
("documentation: Update circular buffer for
load-acquire/store-release")) is missing the smp_mb() at the
producer-side.

I'm trying to wrap my head around why it's OK to remove the smp_mb()
in the cases above? I'm worried that the current XDP socket ring
implementation (which is missing smp_mb()) might be broken.


If you read this far, thanks! :-)
Björn


[1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 18:46 XDP socket rings, and LKMM litmus tests Björn Töpel
@ 2021-03-02 19:57 ` Paul E. McKenney
  2021-03-02 20:04   ` Paul E. McKenney
  2021-03-02 20:24   ` Björn Töpel
  2021-03-02 21:14 ` Alan Stern
  1 sibling, 2 replies; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-02 19:57 UTC (permalink / raw)
  To: Björn Töpel
  Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> Hi!
> 
> Firstly; The long Cc-list is to reach the LKMM-folks.
> 
> Some background; the XDP sockets use a ring-buffer to communicate
> between the kernel and userland. It's a
> single-consumer/single-producer ring, and described in
> net/xdp/xsk_queue.h.
> 
> --8<---
> /* The structure of the shared state of the rings are the same as the
>  * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
>  * ring, the kernel is the producer and user space is the consumer. For
>  * the Tx and fill rings, the kernel is the consumer and user space is
>  * the producer.
>  *
>  * producer                         consumer
>  *
>  * if (LOAD ->consumer) {           LOAD ->producer
>  *                    (A)           smp_rmb()       (C)
>  *    STORE $data                   LOAD $data
>  *    smp_wmb()       (B)           smp_mb()        (D)
>  *    STORE ->producer              STORE ->consumer
>  * }
>  *
>  * (A) pairs with (D), and (B) pairs with (C).
> ...
> -->8---
> 
> I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
> semantics [1], without breaking existing userspace applications.
> 
> So, I figured I'd use herd7 and the LKMM model to build a litmus test
> for the barrier version, then for the acquire-release version, and
> finally permutations of both.
> 
> The idea is to use a one element ring, with a state machine outlined
> in the litmus test.
> 
> The basic test for the existing smp_{r,w,}mb() barriers looks like:
> 
> $ cat spsc-rb+1p1c.litmus
> C spsc-rb+1p1c
> 
> // Stupid one entry ring:
> // prod cons     allowed action       prod cons
> //    0    0 =>       prod          =>   1    0
> //    0    1 =>       cons          =>   0    0
> //    1    0 =>       cons          =>   1    1
> //    1    1 =>       prod          =>   0    1
> 
> { prod = 1; }
> 
> // Here, we start at prod==1,cons==0, data==0, i.e. producer has
> // written data=0, so from here only the consumer can start, and should
> // consume data==0. Afterwards, producer can continue and write 1 to
> // data. Can we enter state prod==0, cons==1, but consumer observerd
> // the write of 1?
> 
> P0(int *prod, int *cons, int *data)
> {
>     int p;
>     int c;
>     int cond = 0;
> 
>     p = READ_ONCE(*prod);
>     c = READ_ONCE(*cons);
>     if (p == 0)
>         if (c == 0)
>             cond = 1;
>     if (p == 1)
>         if (c == 1)
>             cond = 1;
> 
>     if (cond) {
>         smp_mb();
>         WRITE_ONCE(*data, 1);
>         smp_wmb();
>         WRITE_ONCE(*prod, p ^ 1);
>     }
> }
> 
> P1(int *prod, int *cons, int *data)
> {
>     int p;
>     int c;
>     int d = -1;
>     int cond = 0;
> 
>     p = READ_ONCE(*prod);
>     c = READ_ONCE(*cons);
>     if (p == 1)
>         if (c == 0)
>             cond = 1;
>     if (p == 0)
>         if (c == 1)
>             cond = 1;
> 
>     if (cond == 1) {
>         smp_rmb();
>         d = READ_ONCE(*data);
>         smp_mb();
>         WRITE_ONCE(*cons, c ^ 1);
>     }
> }

Before digging in too deeply, does the following simplification
still capture your intent?

P0(int *prod, int *cons, int *data)
{
    int p;
    int cond = 0;

    p = READ_ONCE(*prod);
    if (p == READ_ONCE(*cons))
            cond = 1;
    if (cond) {
	smp_mb();
        WRITE_ONCE(*data, 1);
        smp_wmb();
        WRITE_ONCE(*prod, p ^ 1);
    }
}

P1(int *prod, int *cons, int *data)
{
    int c;
    int d = -1;
    int cond = 0;

    c = READ_ONCE(*cons);
    if (READ_ONCE(*prod) == c)
            cond = 1;

    if (cond == 1) {
        smp_rmb();
        d = READ_ONCE(*data);
        smp_mb();
        WRITE_ONCE(*cons, c ^ 1);
    }
}

							Thanx, Paul

> exists( 1:d=1 /\ prod=0 /\ cons=1 );
> 
> --
> 
> The weird state changing if-statements is because that I didn't get
> '&&' and '||' to work with herd.
> 
> When this is run:
> 
> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> Test spsc-rb+1p1c Allowed
> States 2
> 1:d=0; cons=1; prod=0;
> 1:d=0; cons=1; prod=1;
> No
> Witnesses
> Positive: 0 Negative: 2
> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> Observation spsc-rb+1p1c Never 0 2
> Time spsc-rb+1p1c 0.04
> Hash=b399756d6a1301ca5bda042f32130791
> 
> Now to my question; In P0 there's an smp_mb(). Without that, the d==1
> can be observed from P1 (consumer):
> 
> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> Test spsc-rb+1p1c Allowed
> States 3
> 1:d=0; cons=1; prod=0;
> 1:d=0; cons=1; prod=1;
> 1:d=1; cons=1; prod=0;
> Ok
> Witnesses
> Positive: 1 Negative: 2
> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> Observation spsc-rb+1p1c Sometimes 1 2
> Time spsc-rb+1p1c 0.04
> Hash=0047fc21fa77da9a9aee15e35ec367ef
> 
> In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
> on control dependencies") removes the corresponding smp_mb(), and also
> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> ("documentation: Update circular buffer for
> load-acquire/store-release")) is missing the smp_mb() at the
> producer-side.
> 
> I'm trying to wrap my head around why it's OK to remove the smp_mb()
> in the cases above? I'm worried that the current XDP socket ring
> implementation (which is missing smp_mb()) might be broken.
> 
> 
> If you read this far, thanks! :-)
> Björn
> 
> 
> [1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 19:57 ` Paul E. McKenney
@ 2021-03-02 20:04   ` Paul E. McKenney
  2021-03-02 20:37     ` Björn Töpel
  2021-03-02 20:24   ` Björn Töpel
  1 sibling, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-02 20:04 UTC (permalink / raw)
  To: Björn Töpel
  Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

On Tue, Mar 02, 2021 at 11:57:58AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> > Hi!
> > 
> > Firstly; The long Cc-list is to reach the LKMM-folks.
> > 
> > Some background; the XDP sockets use a ring-buffer to communicate
> > between the kernel and userland. It's a
> > single-consumer/single-producer ring, and described in
> > net/xdp/xsk_queue.h.
> > 
> > --8<---
> > /* The structure of the shared state of the rings are the same as the
> >  * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> >  * ring, the kernel is the producer and user space is the consumer. For
> >  * the Tx and fill rings, the kernel is the consumer and user space is
> >  * the producer.
> >  *
> >  * producer                         consumer
> >  *
> >  * if (LOAD ->consumer) {           LOAD ->producer
> >  *                    (A)           smp_rmb()       (C)
> >  *    STORE $data                   LOAD $data
> >  *    smp_wmb()       (B)           smp_mb()        (D)
> >  *    STORE ->producer              STORE ->consumer
> >  * }
> >  *
> >  * (A) pairs with (D), and (B) pairs with (C).
> > ...
> > -->8---
> > 
> > I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
> > semantics [1], without breaking existing userspace applications.
> > 
> > So, I figured I'd use herd7 and the LKMM model to build a litmus test
> > for the barrier version, then for the acquire-release version, and
> > finally permutations of both.
> > 
> > The idea is to use a one element ring, with a state machine outlined
> > in the litmus test.
> > 
> > The basic test for the existing smp_{r,w,}mb() barriers looks like:
> > 
> > $ cat spsc-rb+1p1c.litmus
> > C spsc-rb+1p1c
> > 
> > // Stupid one entry ring:
> > // prod cons     allowed action       prod cons
> > //    0    0 =>       prod          =>   1    0
> > //    0    1 =>       cons          =>   0    0
> > //    1    0 =>       cons          =>   1    1
> > //    1    1 =>       prod          =>   0    1
> > 
> > { prod = 1; }
> > 
> > // Here, we start at prod==1,cons==0, data==0, i.e. producer has
> > // written data=0, so from here only the consumer can start, and should
> > // consume data==0. Afterwards, producer can continue and write 1 to
> > // data. Can we enter state prod==0, cons==1, but consumer observerd
> > // the write of 1?
> > 
> > P0(int *prod, int *cons, int *data)
> > {
> >     int p;
> >     int c;
> >     int cond = 0;
> > 
> >     p = READ_ONCE(*prod);
> >     c = READ_ONCE(*cons);
> >     if (p == 0)
> >         if (c == 0)
> >             cond = 1;
> >     if (p == 1)
> >         if (c == 1)
> >             cond = 1;
> > 
> >     if (cond) {
> >         smp_mb();
> >         WRITE_ONCE(*data, 1);
> >         smp_wmb();
> >         WRITE_ONCE(*prod, p ^ 1);
> >     }
> > }
> > 
> > P1(int *prod, int *cons, int *data)
> > {
> >     int p;
> >     int c;
> >     int d = -1;
> >     int cond = 0;
> > 
> >     p = READ_ONCE(*prod);
> >     c = READ_ONCE(*cons);
> >     if (p == 1)
> >         if (c == 0)
> >             cond = 1;
> >     if (p == 0)
> >         if (c == 1)
> >             cond = 1;
> > 
> >     if (cond == 1) {
> >         smp_rmb();
> >         d = READ_ONCE(*data);
> >         smp_mb();
> >         WRITE_ONCE(*cons, c ^ 1);
> >     }
> > }
> 
> Before digging in too deeply, does the following simplification
> still capture your intent?
> 
> P0(int *prod, int *cons, int *data)
> {
>     int p;
>     int cond = 0;
> 
>     p = READ_ONCE(*prod);
>     if (p == READ_ONCE(*cons))
>             cond = 1;
>     if (cond) {
> 	smp_mb();
>         WRITE_ONCE(*data, 1);
>         smp_wmb();
>         WRITE_ONCE(*prod, p ^ 1);
>     }
> }
> 
> P1(int *prod, int *cons, int *data)
> {
>     int c;
>     int d = -1;
>     int cond = 0;
> 
>     c = READ_ONCE(*cons);
>     if (READ_ONCE(*prod) == c)
>             cond = 1;
> 
>     if (cond == 1) {
>         smp_rmb();
>         d = READ_ONCE(*data);
>         smp_mb();
>         WRITE_ONCE(*cons, c ^ 1);
>     }
> }

And if the answer is "yes", how about this one?

P0(int *prod, int *cons, int *data)
{
    int p;

    p = READ_ONCE(*prod);
    if (p == READ_ONCE(*cons)) {
        WRITE_ONCE(*data, 1);
        smp_wmb();
        WRITE_ONCE(*prod, p ^ 1);
    }
}

P1(int *prod, int *cons, int *data)
{
    int c;
    int d = -1;

    c = READ_ONCE(*cons);
    if (READ_ONCE(*prod) == c) {
        smp_rmb();
        d = READ_ONCE(*data);
        smp_mb();
        WRITE_ONCE(*cons, c ^ 1);
    }
}

							Thanx, Paul

> > exists( 1:d=1 /\ prod=0 /\ cons=1 );
> > 
> > --
> > 
> > The weird state changing if-statements is because that I didn't get
> > '&&' and '||' to work with herd.
> > 
> > When this is run:
> > 
> > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> > Test spsc-rb+1p1c Allowed
> > States 2
> > 1:d=0; cons=1; prod=0;
> > 1:d=0; cons=1; prod=1;
> > No
> > Witnesses
> > Positive: 0 Negative: 2
> > Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> > Observation spsc-rb+1p1c Never 0 2
> > Time spsc-rb+1p1c 0.04
> > Hash=b399756d6a1301ca5bda042f32130791
> > 
> > Now to my question; In P0 there's an smp_mb(). Without that, the d==1
> > can be observed from P1 (consumer):
> > 
> > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> > Test spsc-rb+1p1c Allowed
> > States 3
> > 1:d=0; cons=1; prod=0;
> > 1:d=0; cons=1; prod=1;
> > 1:d=1; cons=1; prod=0;
> > Ok
> > Witnesses
> > Positive: 1 Negative: 2
> > Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> > Observation spsc-rb+1p1c Sometimes 1 2
> > Time spsc-rb+1p1c 0.04
> > Hash=0047fc21fa77da9a9aee15e35ec367ef
> > 
> > In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
> > on control dependencies") removes the corresponding smp_mb(), and also
> > the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> > ("documentation: Update circular buffer for
> > load-acquire/store-release")) is missing the smp_mb() at the
> > producer-side.
> > 
> > I'm trying to wrap my head around why it's OK to remove the smp_mb()
> > in the cases above? I'm worried that the current XDP socket ring
> > implementation (which is missing smp_mb()) might be broken.
> > 
> > 
> > If you read this far, thanks! :-)
> > Björn
> > 
> > 
> > [1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 19:57 ` Paul E. McKenney
  2021-03-02 20:04   ` Paul E. McKenney
@ 2021-03-02 20:24   ` Björn Töpel
  2021-03-02 20:41     ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Björn Töpel @ 2021-03-02 20:24 UTC (permalink / raw)
  To: paulmck
  Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

On Tue, 2 Mar 2021 at 20:57, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:

[...]

>
> Before digging in too deeply, does the following simplification
> still capture your intent?
>

Thanks for having a look, Paul!

> P0(int *prod, int *cons, int *data)
> {
>     int p;
>     int cond = 0;
>
>     p = READ_ONCE(*prod);
>     if (p == READ_ONCE(*cons))
>             cond = 1;

With this, yes!

>     if (cond) {
>         smp_mb();
>         WRITE_ONCE(*data, 1);
>         smp_wmb();
>         WRITE_ONCE(*prod, p ^ 1);
>     }
> }
>
> P1(int *prod, int *cons, int *data)
> {
>     int c;
>     int d = -1;
>     int cond = 0;
>
>     c = READ_ONCE(*cons);
>     if (READ_ONCE(*prod) == c)
>             cond = 1;

Hmm, this would not be the correct state transition.

c==1 && p==1 would set cond to 1, right?

I would agree with:
  c = READ_ONCE(*cons);
  if (READ_ONCE(*prod) != c)


>
>     if (cond == 1) {
>         smp_rmb();
>         d = READ_ONCE(*data);
>         smp_mb();
>         WRITE_ONCE(*cons, c ^ 1);
>     }
> }
>
>                                                         Thanx, Paul
>

[...]

Björn

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 20:04   ` Paul E. McKenney
@ 2021-03-02 20:37     ` Björn Töpel
  0 siblings, 0 replies; 33+ messages in thread
From: Björn Töpel @ 2021-03-02 20:37 UTC (permalink / raw)
  To: paulmck
  Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

On Tue, 2 Mar 2021 at 21:04, Paul E. McKenney <paulmck@kernel.org> wrote:
>

[...]

>
> And if the answer is "yes", how about this one?
>

With the == to != change in P1, this is OK!

> P0(int *prod, int *cons, int *data)
> {
>     int p;
>
>     p = READ_ONCE(*prod);
>     if (p == READ_ONCE(*cons)) {

...and now d==1 is not a valid state anymore according to herd. If
think I need some input here! :-D

>         WRITE_ONCE(*data, 1);
>         smp_wmb();
>         WRITE_ONCE(*prod, p ^ 1);
>     }
> }
>
> P1(int *prod, int *cons, int *data)
> {
>     int c;
>     int d = -1;
>
>     c = READ_ONCE(*cons);
>     if (READ_ONCE(*prod) == c) {
>         smp_rmb();
>         d = READ_ONCE(*data);
>         smp_mb();
>         WRITE_ONCE(*cons, c ^ 1);
>     }
> }
>

[...]

Björn

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 20:24   ` Björn Töpel
@ 2021-03-02 20:41     ` Paul E. McKenney
  2021-03-02 20:51       ` Björn Töpel
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-02 20:41 UTC (permalink / raw)
  To: Björn Töpel
  Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

On Tue, Mar 02, 2021 at 09:24:04PM +0100, Björn Töpel wrote:
> On Tue, 2 Mar 2021 at 20:57, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> 
> [...]
> 
> >
> > Before digging in too deeply, does the following simplification
> > still capture your intent?
> >
> 
> Thanks for having a look, Paul!
> 
> > P0(int *prod, int *cons, int *data)
> > {
> >     int p;
> >     int cond = 0;
> >
> >     p = READ_ONCE(*prod);
> >     if (p == READ_ONCE(*cons))
> >             cond = 1;
> 
> With this, yes!
> 
> >     if (cond) {
> >         smp_mb();
> >         WRITE_ONCE(*data, 1);
> >         smp_wmb();
> >         WRITE_ONCE(*prod, p ^ 1);
> >     }
> > }
> >
> > P1(int *prod, int *cons, int *data)
> > {
> >     int c;
> >     int d = -1;
> >     int cond = 0;
> >
> >     c = READ_ONCE(*cons);
> >     if (READ_ONCE(*prod) == c)
> >             cond = 1;
> 
> Hmm, this would not be the correct state transition.
> 
> c==1 && p==1 would set cond to 1, right?
> 
> I would agree with:
>   c = READ_ONCE(*cons);
>   if (READ_ONCE(*prod) != c)

Right you are!

With that, it looks to me like LKMM is OK with removing the smp_mb().
My guess is that the issue is that LKMM confines the effect of control
dependencies to a single "if" statement, hence my reworking of your
original.

							Thanx, Paul

> >
> >     if (cond == 1) {
> >         smp_rmb();
> >         d = READ_ONCE(*data);
> >         smp_mb();
> >         WRITE_ONCE(*cons, c ^ 1);
> >     }
> > }
> >
> >                                                         Thanx, Paul
> >
> 
> [...]
> 
> Björn

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 20:41     ` Paul E. McKenney
@ 2021-03-02 20:51       ` Björn Töpel
  0 siblings, 0 replies; 33+ messages in thread
From: Björn Töpel @ 2021-03-02 20:51 UTC (permalink / raw)
  To: paulmck
  Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra,
	boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

On Tue, 2 Mar 2021 at 21:41, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Mar 02, 2021 at 09:24:04PM +0100, Björn Töpel wrote:
> > On Tue, 2 Mar 2021 at 20:57, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> >
> > [...]
> >
> > >
> > > Before digging in too deeply, does the following simplification
> > > still capture your intent?
> > >
> >
> > Thanks for having a look, Paul!
> >
> > > P0(int *prod, int *cons, int *data)
> > > {
> > >     int p;
> > >     int cond = 0;
> > >
> > >     p = READ_ONCE(*prod);
> > >     if (p == READ_ONCE(*cons))
> > >             cond = 1;
> >
> > With this, yes!
> >
> > >     if (cond) {
> > >         smp_mb();
> > >         WRITE_ONCE(*data, 1);
> > >         smp_wmb();
> > >         WRITE_ONCE(*prod, p ^ 1);
> > >     }
> > > }
> > >
> > > P1(int *prod, int *cons, int *data)
> > > {
> > >     int c;
> > >     int d = -1;
> > >     int cond = 0;
> > >
> > >     c = READ_ONCE(*cons);
> > >     if (READ_ONCE(*prod) == c)
> > >             cond = 1;
> >
> > Hmm, this would not be the correct state transition.
> >
> > c==1 && p==1 would set cond to 1, right?
> >
> > I would agree with:
> >   c = READ_ONCE(*cons);
> >   if (READ_ONCE(*prod) != c)
>
> Right you are!
>
> With that, it looks to me like LKMM is OK with removing the smp_mb().
> My guess is that the issue is that LKMM confines the effect of control
> dependencies to a single "if" statement, hence my reworking of your
> original.
>

Interesting!

I tried the acquire/release version:

P0(int *prod, int *cons, int *data)
{
    int p;
    int cond = 0;

    p = READ_ONCE(*prod);
    if (p == READ_ONCE(*cons)) {
        WRITE_ONCE(*data, 1);
        smp_store_release(prod, p ^ 1);
    }
}

P1(int *prod, int *cons, int *data)
{
    int c;
    int d = -1;

    c = READ_ONCE(*cons);
    if (smp_load_acquire(prod) != c) {
        d = READ_ONCE(*data);
        smp_store_release(cons, c ^ 1);
    }
}

and as with the previous example, restructuring the if-statement makes
"if (p == READ_ONCE(*cons)) {" sufficient, instead of "if (p ==
smp_load_acquire(cons)) {".

Yay!


Björn


>                                                         Thanx, Paul
>
> > >
> > >     if (cond == 1) {
> > >         smp_rmb();
> > >         d = READ_ONCE(*data);
> > >         smp_mb();
> > >         WRITE_ONCE(*cons, c ^ 1);
> > >     }
> > > }
> > >
> > >                                                         Thanx, Paul
> > >
> >
> > [...]
> >
> > Björn

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 18:46 XDP socket rings, and LKMM litmus tests Björn Töpel
  2021-03-02 19:57 ` Paul E. McKenney
@ 2021-03-02 21:14 ` Alan Stern
  2021-03-02 23:50   ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Alan Stern @ 2021-03-02 21:14 UTC (permalink / raw)
  To: Björn Töpel
  Cc: bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng,
	npiggin, dhowells, j.alglave, luc.maranget, paulmck, akiyks,
	dlustig, joel, Toke Høiland-Jørgensen, Karlsson,
	Magnus

On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> Hi!
> 
> Firstly; The long Cc-list is to reach the LKMM-folks.
> 
> Some background; the XDP sockets use a ring-buffer to communicate
> between the kernel and userland. It's a
> single-consumer/single-producer ring, and described in
> net/xdp/xsk_queue.h.
> 
> --8<---
> /* The structure of the shared state of the rings are the same as the
>  * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
>  * ring, the kernel is the producer and user space is the consumer. For
>  * the Tx and fill rings, the kernel is the consumer and user space is
>  * the producer.
>  *
>  * producer                         consumer
>  *
>  * if (LOAD ->consumer) {           LOAD ->producer
>  *                    (A)           smp_rmb()       (C)
>  *    STORE $data                   LOAD $data
>  *    smp_wmb()       (B)           smp_mb()        (D)
>  *    STORE ->producer              STORE ->consumer
>  * }
>  *
>  * (A) pairs with (D), and (B) pairs with (C).
> ...
> -->8---
> 
> I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
> semantics [1], without breaking existing userspace applications.
> 
> So, I figured I'd use herd7 and the LKMM model to build a litmus test
> for the barrier version, then for the acquire-release version, and
> finally permutations of both.
> 
> The idea is to use a one element ring, with a state machine outlined
> in the litmus test.
> 
> The basic test for the existing smp_{r,w,}mb() barriers looks like:
> 
> $ cat spsc-rb+1p1c.litmus
> C spsc-rb+1p1c
> 
> // Stupid one entry ring:
> // prod cons     allowed action       prod cons
> //    0    0 =>       prod          =>   1    0
> //    0    1 =>       cons          =>   0    0
> //    1    0 =>       cons          =>   1    1
> //    1    1 =>       prod          =>   0    1
> 
> { prod = 1; }
> 
> // Here, we start at prod==1,cons==0, data==0, i.e. producer has
> // written data=0, so from here only the consumer can start, and should
> // consume data==0. Afterwards, producer can continue and write 1 to
> // data. Can we enter state prod==0, cons==1, but consumer observerd
> // the write of 1?
> 
> P0(int *prod, int *cons, int *data)
> {
>     int p;
>     int c;
>     int cond = 0;
> 
>     p = READ_ONCE(*prod);
>     c = READ_ONCE(*cons);
>     if (p == 0)
>         if (c == 0)
>             cond = 1;
>     if (p == 1)
>         if (c == 1)
>             cond = 1;
> 
>     if (cond) {
>         smp_mb();
>         WRITE_ONCE(*data, 1);
>         smp_wmb();
>         WRITE_ONCE(*prod, p ^ 1);
>     }
> }
> 
> P1(int *prod, int *cons, int *data)
> {
>     int p;
>     int c;
>     int d = -1;
>     int cond = 0;
> 
>     p = READ_ONCE(*prod);
>     c = READ_ONCE(*cons);
>     if (p == 1)
>         if (c == 0)
>             cond = 1;
>     if (p == 0)
>         if (c == 1)
>             cond = 1;
> 
>     if (cond == 1) {
>         smp_rmb();
>         d = READ_ONCE(*data);
>         smp_mb();
>         WRITE_ONCE(*cons, c ^ 1);
>     }
> }
> 
> exists( 1:d=1 /\ prod=0 /\ cons=1 );
> 
> --
> 
> The weird state changing if-statements is because that I didn't get
> '&&' and '||' to work with herd.
> 
> When this is run:
> 
> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> Test spsc-rb+1p1c Allowed
> States 2
> 1:d=0; cons=1; prod=0;
> 1:d=0; cons=1; prod=1;
> No
> Witnesses
> Positive: 0 Negative: 2
> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> Observation spsc-rb+1p1c Never 0 2
> Time spsc-rb+1p1c 0.04
> Hash=b399756d6a1301ca5bda042f32130791
> 
> Now to my question; In P0 there's an smp_mb(). Without that, the d==1
> can be observed from P1 (consumer):
> 
> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> Test spsc-rb+1p1c Allowed
> States 3
> 1:d=0; cons=1; prod=0;
> 1:d=0; cons=1; prod=1;
> 1:d=1; cons=1; prod=0;
> Ok
> Witnesses
> Positive: 1 Negative: 2
> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> Observation spsc-rb+1p1c Sometimes 1 2
> Time spsc-rb+1p1c 0.04
> Hash=0047fc21fa77da9a9aee15e35ec367ef

This result is wrong, apparently because of a bug in herd7.  There 
should be control dependencies from each of the two loads in P0 to each 
of the two stores, but herd7 doesn't detect them.

Maybe Luc can find some time to check whether this really is a bug and 
if it is, fix it.

> In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
> on control dependencies") removes the corresponding smp_mb(), and also
> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> ("documentation: Update circular buffer for
> load-acquire/store-release")) is missing the smp_mb() at the
> producer-side.
> 
> I'm trying to wrap my head around why it's OK to remove the smp_mb()
> in the cases above? I'm worried that the current XDP socket ring
> implementation (which is missing smp_mb()) might be broken.

Because of the control dependencies, the smp_mb isn't needed.  The 
dependencies will order both of the stores after both of the loads.

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 21:14 ` Alan Stern
@ 2021-03-02 23:50   ` Paul E. McKenney
  2021-03-03  6:37     ` maranget
  2021-03-03 17:12     ` Alan Stern
  0 siblings, 2 replies; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-02 23:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> > Hi!
> > 
> > Firstly; The long Cc-list is to reach the LKMM-folks.
> > 
> > Some background; the XDP sockets use a ring-buffer to communicate
> > between the kernel and userland. It's a
> > single-consumer/single-producer ring, and described in
> > net/xdp/xsk_queue.h.
> > 
> > --8<---
> > /* The structure of the shared state of the rings are the same as the
> >  * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> >  * ring, the kernel is the producer and user space is the consumer. For
> >  * the Tx and fill rings, the kernel is the consumer and user space is
> >  * the producer.
> >  *
> >  * producer                         consumer
> >  *
> >  * if (LOAD ->consumer) {           LOAD ->producer
> >  *                    (A)           smp_rmb()       (C)
> >  *    STORE $data                   LOAD $data
> >  *    smp_wmb()       (B)           smp_mb()        (D)
> >  *    STORE ->producer              STORE ->consumer
> >  * }
> >  *
> >  * (A) pairs with (D), and (B) pairs with (C).
> > ...
> > -->8---
> > 
> > I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
> > semantics [1], without breaking existing userspace applications.
> > 
> > So, I figured I'd use herd7 and the LKMM model to build a litmus test
> > for the barrier version, then for the acquire-release version, and
> > finally permutations of both.
> > 
> > The idea is to use a one element ring, with a state machine outlined
> > in the litmus test.
> > 
> > The basic test for the existing smp_{r,w,}mb() barriers looks like:
> > 
> > $ cat spsc-rb+1p1c.litmus
> > C spsc-rb+1p1c
> > 
> > // Stupid one entry ring:
> > // prod cons     allowed action       prod cons
> > //    0    0 =>       prod          =>   1    0
> > //    0    1 =>       cons          =>   0    0
> > //    1    0 =>       cons          =>   1    1
> > //    1    1 =>       prod          =>   0    1
> > 
> > { prod = 1; }
> > 
> > // Here, we start at prod==1,cons==0, data==0, i.e. producer has
> > // written data=0, so from here only the consumer can start, and should
> > // consume data==0. Afterwards, producer can continue and write 1 to
> > // data. Can we enter state prod==0, cons==1, but consumer observerd
> > // the write of 1?
> > 
> > P0(int *prod, int *cons, int *data)
> > {
> >     int p;
> >     int c;
> >     int cond = 0;
> > 
> >     p = READ_ONCE(*prod);
> >     c = READ_ONCE(*cons);
> >     if (p == 0)
> >         if (c == 0)
> >             cond = 1;
> >     if (p == 1)
> >         if (c == 1)
> >             cond = 1;
> > 
> >     if (cond) {
> >         smp_mb();
> >         WRITE_ONCE(*data, 1);
> >         smp_wmb();
> >         WRITE_ONCE(*prod, p ^ 1);
> >     }
> > }
> > 
> > P1(int *prod, int *cons, int *data)
> > {
> >     int p;
> >     int c;
> >     int d = -1;
> >     int cond = 0;
> > 
> >     p = READ_ONCE(*prod);
> >     c = READ_ONCE(*cons);
> >     if (p == 1)
> >         if (c == 0)
> >             cond = 1;
> >     if (p == 0)
> >         if (c == 1)
> >             cond = 1;
> > 
> >     if (cond == 1) {
> >         smp_rmb();
> >         d = READ_ONCE(*data);
> >         smp_mb();
> >         WRITE_ONCE(*cons, c ^ 1);
> >     }
> > }
> > 
> > exists( 1:d=1 /\ prod=0 /\ cons=1 );
> > 
> > --
> > 
> > The weird state changing if-statements is because that I didn't get
> > '&&' and '||' to work with herd.
> > 
> > When this is run:
> > 
> > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> > Test spsc-rb+1p1c Allowed
> > States 2
> > 1:d=0; cons=1; prod=0;
> > 1:d=0; cons=1; prod=1;
> > No
> > Witnesses
> > Positive: 0 Negative: 2
> > Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> > Observation spsc-rb+1p1c Never 0 2
> > Time spsc-rb+1p1c 0.04
> > Hash=b399756d6a1301ca5bda042f32130791
> > 
> > Now to my question; In P0 there's an smp_mb(). Without that, the d==1
> > can be observed from P1 (consumer):
> > 
> > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> > Test spsc-rb+1p1c Allowed
> > States 3
> > 1:d=0; cons=1; prod=0;
> > 1:d=0; cons=1; prod=1;
> > 1:d=1; cons=1; prod=0;
> > Ok
> > Witnesses
> > Positive: 1 Negative: 2
> > Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> > Observation spsc-rb+1p1c Sometimes 1 2
> > Time spsc-rb+1p1c 0.04
> > Hash=0047fc21fa77da9a9aee15e35ec367ef
> 
> This result is wrong, apparently because of a bug in herd7.  There 
> should be control dependencies from each of the two loads in P0 to each 
> of the two stores, but herd7 doesn't detect them.
> 
> Maybe Luc can find some time to check whether this really is a bug and 
> if it is, fix it.

I agree that herd7's control dependency tracking could be improved.

But sadly, it is currently doing exactly what I asked Luc to make it do,
which is to confine the control dependency to its "if" statement.  But as
usual I wasn't thinking globally enough.  And I am not exactly sure what
to ask for.  Here a store to a local was control-dependency ordered after
a read, and so that should propagate to a read from that local variable.
Maybe treat local variables as if they were registers, so that from
herd7's viewpoint the READ_ONCE()s are able to head control-dependency
chains in multiple "if" statements?

Thoughts?

							Thanx, Paul

> > In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
> > on control dependencies") removes the corresponding smp_mb(), and also
> > the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> > ("documentation: Update circular buffer for
> > load-acquire/store-release")) is missing the smp_mb() at the
> > producer-side.
> > 
> > I'm trying to wrap my head around why it's OK to remove the smp_mb()
> > in the cases above? I'm worried that the current XDP socket ring
> > implementation (which is missing smp_mb()) might be broken.
> 
> Because of the control dependencies, the smp_mb isn't needed.  The 
> dependencies will order both of the stores after both of the loads.
> 
> Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 23:50   ` Paul E. McKenney
@ 2021-03-03  6:37     ` maranget
  2021-03-03 16:54       ` Paul E. McKenney
  2021-03-03 17:12     ` Alan Stern
  1 sibling, 1 reply; 33+ messages in thread
From: maranget @ 2021-03-03  6:37 UTC (permalink / raw)
  To: paulmck
  Cc: Alan Stern, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells,
	j.alglave, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

Hi all,

I agree that herd7 computation of control dependencies is problematic here.

Thanks for reporting the problem. Paul kindly takes the responsibility for the problem, but frankly do not remember having followed his recommendations here.

I am quite busy today and canot really work on the issue. However my intuition is that control dependencies follow from chains  of control dependencies in if (say “direct” control dependencies, roughly from "if" condition to if statement)  and data dependencies, ending in control.

—Luc

> On 3 Mar 2021, at 00:50, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
>> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
>>> Hi!
>>> 
>>> Firstly; The long Cc-list is to reach the LKMM-folks.
>>> 
>>> Some background; the XDP sockets use a ring-buffer to communicate
>>> between the kernel and userland. It's a
>>> single-consumer/single-producer ring, and described in
>>> net/xdp/xsk_queue.h.
>>> 
>>> --8<---
>>> /* The structure of the shared state of the rings are the same as the
>>> * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
>>> * ring, the kernel is the producer and user space is the consumer. For
>>> * the Tx and fill rings, the kernel is the consumer and user space is
>>> * the producer.
>>> *
>>> * producer                         consumer
>>> *
>>> * if (LOAD ->consumer) {           LOAD ->producer
>>> *                    (A)           smp_rmb()       (C)
>>> *    STORE $data                   LOAD $data
>>> *    smp_wmb()       (B)           smp_mb()        (D)
>>> *    STORE ->producer              STORE ->consumer
>>> * }
>>> *
>>> * (A) pairs with (D), and (B) pairs with (C).
>>> ...
>>> -->8---
>>> 
>>> I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
>>> semantics [1], without breaking existing userspace applications.
>>> 
>>> So, I figured I'd use herd7 and the LKMM model to build a litmus test
>>> for the barrier version, then for the acquire-release version, and
>>> finally permutations of both.
>>> 
>>> The idea is to use a one element ring, with a state machine outlined
>>> in the litmus test.
>>> 
>>> The basic test for the existing smp_{r,w,}mb() barriers looks like:
>>> 
>>> $ cat spsc-rb+1p1c.litmus
>>> C spsc-rb+1p1c
>>> 
>>> // Stupid one entry ring:
>>> // prod cons     allowed action       prod cons
>>> //    0    0 =>       prod          =>   1    0
>>> //    0    1 =>       cons          =>   0    0
>>> //    1    0 =>       cons          =>   1    1
>>> //    1    1 =>       prod          =>   0    1
>>> 
>>> { prod = 1; }
>>> 
>>> // Here, we start at prod==1,cons==0, data==0, i.e. producer has
>>> // written data=0, so from here only the consumer can start, and should
>>> // consume data==0. Afterwards, producer can continue and write 1 to
>>> // data. Can we enter state prod==0, cons==1, but consumer observerd
>>> // the write of 1?
>>> 
>>> P0(int *prod, int *cons, int *data)
>>> {
>>>    int p;
>>>    int c;
>>>    int cond = 0;
>>> 
>>>    p = READ_ONCE(*prod);
>>>    c = READ_ONCE(*cons);
>>>    if (p == 0)
>>>        if (c == 0)
>>>            cond = 1;
>>>    if (p == 1)
>>>        if (c == 1)
>>>            cond = 1;
>>> 
>>>    if (cond) {
>>>        smp_mb();
>>>        WRITE_ONCE(*data, 1);
>>>        smp_wmb();
>>>        WRITE_ONCE(*prod, p ^ 1);
>>>    }
>>> }
>>> 
>>> P1(int *prod, int *cons, int *data)
>>> {
>>>    int p;
>>>    int c;
>>>    int d = -1;
>>>    int cond = 0;
>>> 
>>>    p = READ_ONCE(*prod);
>>>    c = READ_ONCE(*cons);
>>>    if (p == 1)
>>>        if (c == 0)
>>>            cond = 1;
>>>    if (p == 0)
>>>        if (c == 1)
>>>            cond = 1;
>>> 
>>>    if (cond == 1) {
>>>        smp_rmb();
>>>        d = READ_ONCE(*data);
>>>        smp_mb();
>>>        WRITE_ONCE(*cons, c ^ 1);
>>>    }
>>> }
>>> 
>>> exists( 1:d=1 /\ prod=0 /\ cons=1 );
>>> 
>>> --
>>> 
>>> The weird state changing if-statements is because that I didn't get
>>> '&&' and '||' to work with herd.
>>> 
>>> When this is run:
>>> 
>>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
>>> Test spsc-rb+1p1c Allowed
>>> States 2
>>> 1:d=0; cons=1; prod=0;
>>> 1:d=0; cons=1; prod=1;
>>> No
>>> Witnesses
>>> Positive: 0 Negative: 2
>>> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
>>> Observation spsc-rb+1p1c Never 0 2
>>> Time spsc-rb+1p1c 0.04
>>> Hash=b399756d6a1301ca5bda042f32130791
>>> 
>>> Now to my question; In P0 there's an smp_mb(). Without that, the d==1
>>> can be observed from P1 (consumer):
>>> 
>>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
>>> Test spsc-rb+1p1c Allowed
>>> States 3
>>> 1:d=0; cons=1; prod=0;
>>> 1:d=0; cons=1; prod=1;
>>> 1:d=1; cons=1; prod=0;
>>> Ok
>>> Witnesses
>>> Positive: 1 Negative: 2
>>> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
>>> Observation spsc-rb+1p1c Sometimes 1 2
>>> Time spsc-rb+1p1c 0.04
>>> Hash=0047fc21fa77da9a9aee15e35ec367ef
>> 
>> This result is wrong, apparently because of a bug in herd7.  There 
>> should be control dependencies from each of the two loads in P0 to each 
>> of the two stores, but herd7 doesn't detect them.
>> 
>> Maybe Luc can find some time to check whether this really is a bug and 
>> if it is, fix it.
> 
> I agree that herd7's control dependency tracking could be improved.
> 
> But sadly, it is currently doing exactly what I asked Luc to make it do,
> which is to confine the control dependency to its "if" statement.  But as
> usual I wasn't thinking globally enough.  And I am not exactly sure what
> to ask for.  Here a store to a local was control-dependency ordered after
> a read, and so that should propagate to a read from that local variable.
> Maybe treat local variables as if they were registers, so that from
> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> chains in multiple "if" statements?
> 
> Thoughts?
> 
> 							Thanx, Paul
> 
>>> In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
>>> on control dependencies") removes the corresponding smp_mb(), and also
>>> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
>>> ("documentation: Update circular buffer for
>>> load-acquire/store-release")) is missing the smp_mb() at the
>>> producer-side.
>>> 
>>> I'm trying to wrap my head around why it's OK to remove the smp_mb()
>>> in the cases above? I'm worried that the current XDP socket ring
>>> implementation (which is missing smp_mb()) might be broken.
>> 
>> Because of the control dependencies, the smp_mb isn't needed.  The 
>> dependencies will order both of the stores after both of the loads.
>> 
>> Alan Stern


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03  6:37     ` maranget
@ 2021-03-03 16:54       ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-03 16:54 UTC (permalink / raw)
  To: maranget
  Cc: Alan Stern, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells,
	j.alglave, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 07:37:42AM +0100, maranget wrote:
> Hi all,
> 
> I agree that herd7 computation of control dependencies is problematic here.
> 
> Thanks for reporting the problem. Paul kindly takes the responsibility for the problem, but frankly do not remember having followed his recommendations here.

Accidents happen?  ;-)

> I am quite busy today and canot really work on the issue. However my intuition is that control dependencies follow from chains  of control dependencies in if (say “direct” control dependencies, roughly from "if" condition to if statement)  and data dependencies, ending in control.

My more  operational line of thinking is to treat each local variable
as if it was a machine register, then analyze based on hardware control
dependencies, except that (as before) an LKMM control dependency is
confined to its "if" statement.

Stores to and reloads from shared variables would be handled as they
are now, correct?

My guess is that this turns out to be equivalent to your intuition,
but this stuff can be tricky.

							Thanx, Paul

> —Luc
> 
> > On 3 Mar 2021, at 00:50, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
> >> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote:
> >>> Hi!
> >>> 
> >>> Firstly; The long Cc-list is to reach the LKMM-folks.
> >>> 
> >>> Some background; the XDP sockets use a ring-buffer to communicate
> >>> between the kernel and userland. It's a
> >>> single-consumer/single-producer ring, and described in
> >>> net/xdp/xsk_queue.h.
> >>> 
> >>> --8<---
> >>> /* The structure of the shared state of the rings are the same as the
> >>> * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> >>> * ring, the kernel is the producer and user space is the consumer. For
> >>> * the Tx and fill rings, the kernel is the consumer and user space is
> >>> * the producer.
> >>> *
> >>> * producer                         consumer
> >>> *
> >>> * if (LOAD ->consumer) {           LOAD ->producer
> >>> *                    (A)           smp_rmb()       (C)
> >>> *    STORE $data                   LOAD $data
> >>> *    smp_wmb()       (B)           smp_mb()        (D)
> >>> *    STORE ->producer              STORE ->consumer
> >>> * }
> >>> *
> >>> * (A) pairs with (D), and (B) pairs with (C).
> >>> ...
> >>> -->8---
> >>> 
> >>> I'd like to replace the smp_{r,w,}mb() barriers with acquire-release
> >>> semantics [1], without breaking existing userspace applications.
> >>> 
> >>> So, I figured I'd use herd7 and the LKMM model to build a litmus test
> >>> for the barrier version, then for the acquire-release version, and
> >>> finally permutations of both.
> >>> 
> >>> The idea is to use a one element ring, with a state machine outlined
> >>> in the litmus test.
> >>> 
> >>> The basic test for the existing smp_{r,w,}mb() barriers looks like:
> >>> 
> >>> $ cat spsc-rb+1p1c.litmus
> >>> C spsc-rb+1p1c
> >>> 
> >>> // Stupid one entry ring:
> >>> // prod cons     allowed action       prod cons
> >>> //    0    0 =>       prod          =>   1    0
> >>> //    0    1 =>       cons          =>   0    0
> >>> //    1    0 =>       cons          =>   1    1
> >>> //    1    1 =>       prod          =>   0    1
> >>> 
> >>> { prod = 1; }
> >>> 
> >>> // Here, we start at prod==1,cons==0, data==0, i.e. producer has
> >>> // written data=0, so from here only the consumer can start, and should
> >>> // consume data==0. Afterwards, producer can continue and write 1 to
> >>> // data. Can we enter state prod==0, cons==1, but consumer observerd
> >>> // the write of 1?
> >>> 
> >>> P0(int *prod, int *cons, int *data)
> >>> {
> >>>    int p;
> >>>    int c;
> >>>    int cond = 0;
> >>> 
> >>>    p = READ_ONCE(*prod);
> >>>    c = READ_ONCE(*cons);
> >>>    if (p == 0)
> >>>        if (c == 0)
> >>>            cond = 1;
> >>>    if (p == 1)
> >>>        if (c == 1)
> >>>            cond = 1;
> >>> 
> >>>    if (cond) {
> >>>        smp_mb();
> >>>        WRITE_ONCE(*data, 1);
> >>>        smp_wmb();
> >>>        WRITE_ONCE(*prod, p ^ 1);
> >>>    }
> >>> }
> >>> 
> >>> P1(int *prod, int *cons, int *data)
> >>> {
> >>>    int p;
> >>>    int c;
> >>>    int d = -1;
> >>>    int cond = 0;
> >>> 
> >>>    p = READ_ONCE(*prod);
> >>>    c = READ_ONCE(*cons);
> >>>    if (p == 1)
> >>>        if (c == 0)
> >>>            cond = 1;
> >>>    if (p == 0)
> >>>        if (c == 1)
> >>>            cond = 1;
> >>> 
> >>>    if (cond == 1) {
> >>>        smp_rmb();
> >>>        d = READ_ONCE(*data);
> >>>        smp_mb();
> >>>        WRITE_ONCE(*cons, c ^ 1);
> >>>    }
> >>> }
> >>> 
> >>> exists( 1:d=1 /\ prod=0 /\ cons=1 );
> >>> 
> >>> --
> >>> 
> >>> The weird state changing if-statements is because that I didn't get
> >>> '&&' and '||' to work with herd.
> >>> 
> >>> When this is run:
> >>> 
> >>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> >>> Test spsc-rb+1p1c Allowed
> >>> States 2
> >>> 1:d=0; cons=1; prod=0;
> >>> 1:d=0; cons=1; prod=1;
> >>> No
> >>> Witnesses
> >>> Positive: 0 Negative: 2
> >>> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> >>> Observation spsc-rb+1p1c Never 0 2
> >>> Time spsc-rb+1p1c 0.04
> >>> Hash=b399756d6a1301ca5bda042f32130791
> >>> 
> >>> Now to my question; In P0 there's an smp_mb(). Without that, the d==1
> >>> can be observed from P1 (consumer):
> >>> 
> >>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus
> >>> Test spsc-rb+1p1c Allowed
> >>> States 3
> >>> 1:d=0; cons=1; prod=0;
> >>> 1:d=0; cons=1; prod=1;
> >>> 1:d=1; cons=1; prod=0;
> >>> Ok
> >>> Witnesses
> >>> Positive: 1 Negative: 2
> >>> Condition exists (1:d=1 /\ prod=0 /\ cons=1)
> >>> Observation spsc-rb+1p1c Sometimes 1 2
> >>> Time spsc-rb+1p1c 0.04
> >>> Hash=0047fc21fa77da9a9aee15e35ec367ef
> >> 
> >> This result is wrong, apparently because of a bug in herd7.  There 
> >> should be control dependencies from each of the two loads in P0 to each 
> >> of the two stores, but herd7 doesn't detect them.
> >> 
> >> Maybe Luc can find some time to check whether this really is a bug and 
> >> if it is, fix it.
> > 
> > I agree that herd7's control dependency tracking could be improved.
> > 
> > But sadly, it is currently doing exactly what I asked Luc to make it do,
> > which is to confine the control dependency to its "if" statement.  But as
> > usual I wasn't thinking globally enough.  And I am not exactly sure what
> > to ask for.  Here a store to a local was control-dependency ordered after
> > a read, and so that should propagate to a read from that local variable.
> > Maybe treat local variables as if they were registers, so that from
> > herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> > chains in multiple "if" statements?
> > 
> > Thoughts?
> > 
> > 							Thanx, Paul
> > 
> >>> In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending
> >>> on control dependencies") removes the corresponding smp_mb(), and also
> >>> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> >>> ("documentation: Update circular buffer for
> >>> load-acquire/store-release")) is missing the smp_mb() at the
> >>> producer-side.
> >>> 
> >>> I'm trying to wrap my head around why it's OK to remove the smp_mb()
> >>> in the cases above? I'm worried that the current XDP socket ring
> >>> implementation (which is missing smp_mb()) might be broken.
> >> 
> >> Because of the control dependencies, the smp_mb isn't needed.  The 
> >> dependencies will order both of the stores after both of the loads.
> >> 
> >> Alan Stern
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-02 23:50   ` Paul E. McKenney
  2021-03-03  6:37     ` maranget
@ 2021-03-03 17:12     ` Alan Stern
  2021-03-03 17:37       ` maranget
  2021-03-03 17:40       ` Paul E. McKenney
  1 sibling, 2 replies; 33+ messages in thread
From: Alan Stern @ 2021-03-03 17:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:

> > This result is wrong, apparently because of a bug in herd7.  There 
> > should be control dependencies from each of the two loads in P0 to each 
> > of the two stores, but herd7 doesn't detect them.
> > 
> > Maybe Luc can find some time to check whether this really is a bug and 
> > if it is, fix it.
> 
> I agree that herd7's control dependency tracking could be improved.
> 
> But sadly, it is currently doing exactly what I asked Luc to make it do,
> which is to confine the control dependency to its "if" statement.  But as
> usual I wasn't thinking globally enough.  And I am not exactly sure what
> to ask for.  Here a store to a local was control-dependency ordered after
> a read, and so that should propagate to a read from that local variable.
> Maybe treat local variables as if they were registers, so that from
> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> chains in multiple "if" statements?
> 
> Thoughts?

Local variables absolutely should be treated just like CPU registers, if 
possible.  In fact, the compiler has the option of keeping local 
variables stored in registers.

(Of course, things may get complicated if anyone writes a litmus test 
that uses a pointer to a local variable,  Especially if the pointer 
could hold the address of a local variable in one execution and a 
shared variable in another!  Or if the pointer is itself a shared 
variable and is dereferenced in another thread!)

But even if local variables are treated as non-shared storage locations, 
we should still handle this correctly.  Part of the problem seems to lie 
in the definition of the to-r dependency relation; the relevant portion 
is:

	(dep ; [Marked] ; rfi)

Here dep is the control dependency from the READ_ONCE to the 
local-variable store, and the rfi refers to the following load of the 
local variable.  The problem is that the store to the local variable 
doesn't go in the Marked class, because it is notated as a plain C 
assignment.  (And likewise for the following load.)

Should we change the model to make loads from and stores to local 
variables always count as Marked?

What should have happened if the local variable were instead a shared 
variable which the other thread didn't access at all?  It seems like a 
weak point of the memory model that it treats these two things 
differently.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 17:12     ` Alan Stern
@ 2021-03-03 17:37       ` maranget
  2021-03-03 17:39         ` maranget
  2021-03-03 19:40         ` Alan Stern
  2021-03-03 17:40       ` Paul E. McKenney
  1 sibling, 2 replies; 33+ messages in thread
From: maranget @ 2021-03-03 17:37 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells,
	j.alglave, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus



> On 3 Mar 2021, at 18:12, Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
>> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
> 
>>> This result is wrong, apparently because of a bug in herd7.  There 
>>> should be control dependencies from each of the two loads in P0 to each 
>>> of the two stores, but herd7 doesn't detect them.
>>> 
>>> Maybe Luc can find some time to check whether this really is a bug and 
>>> if it is, fix it.
>> 
>> I agree that herd7's control dependency tracking could be improved.
>> 
>> But sadly, it is currently doing exactly what I asked Luc to make it do,
>> which is to confine the control dependency to its "if" statement.  But as
>> usual I wasn't thinking globally enough.  And I am not exactly sure what
>> to ask for.  Here a store to a local was control-dependency ordered after
>> a read, and so that should propagate to a read from that local variable.
>> Maybe treat local variables as if they were registers, so that from
>> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
>> chains in multiple "if" statements?
>> 
>> Thoughts?
> 
> Local variables absolutely should be treated just like CPU registers, if 
> possible.  In fact, the compiler has the option of keeping local 
> variables stored in registers.
> 

And indeed local variables are treated as registers by herd7.


> (Of course, things may get complicated if anyone writes a litmus test 
> that uses a pointer to a local variable,  Especially if the pointer 
> could hold the address of a local variable in one execution and a 
> shared variable in another!  Or if the pointer is itself a shared 
> variable and is dereferenced in another thread!)
> 
> But even if local variables are treated as non-shared storage locations, 
> we should still handle this correctly.  Part of the problem seems to lie 
> in the definition of the to-r dependency relation; the relevant portion 
> is:

In fact, I’d rather change the computation of “dep” here control-dependency “ctrl”. Notice that “ctrl” is computed by herd7 and present in the initial environment of the Cat interpreter.

I have made a PR to herd7 that performs the change. The commit message states the new definition.


> 
> 	(dep ; [Marked] ; rfi)
> 
> Here dep is the control dependency from the READ_ONCE to the 
> local-variable store, and the rfi refers to the following load of the 
> local variable.  The problem is that the store to the local variable 
> doesn't go in the Marked class, because it is notated as a plain C 
> assignment.  (And likewise for the following load.)
> 
This is a related issue, I am not sure, but perhaps it can be formulated as
"should rfi and rf on registers behave the  same?”



> Should we change the model to make loads from and stores to local 
> variables always count as Marked?
> 
> What should have happened if the local variable were instead a shared 
> variable which the other thread didn't access at all?  It seems like a 
> weak point of the memory model that it treats these two things 
> differently.
> 
> Alan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 17:37       ` maranget
@ 2021-03-03 17:39         ` maranget
  2021-03-03 21:56           ` Paul E. McKenney
  2021-03-03 19:40         ` Alan Stern
  1 sibling, 1 reply; 33+ messages in thread
From: maranget @ 2021-03-03 17:39 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells,
	j.alglave, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus



> On 3 Mar 2021, at 18:37, maranget <luc.maranget@inria.fr> wrote:
> 
> I have made a PR to herd7 that performs the change. The commit message states the new definition.

For those who are interested
<https://github.com/herd/herdtools7/pull/183>

—Luc


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 17:12     ` Alan Stern
  2021-03-03 17:37       ` maranget
@ 2021-03-03 17:40       ` Paul E. McKenney
  2021-03-03 20:22         ` Alan Stern
  1 sibling, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-03 17:40 UTC (permalink / raw)
  To: Alan Stern
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:
> On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
> > On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
> 
> > > This result is wrong, apparently because of a bug in herd7.  There 
> > > should be control dependencies from each of the two loads in P0 to each 
> > > of the two stores, but herd7 doesn't detect them.
> > > 
> > > Maybe Luc can find some time to check whether this really is a bug and 
> > > if it is, fix it.
> > 
> > I agree that herd7's control dependency tracking could be improved.
> > 
> > But sadly, it is currently doing exactly what I asked Luc to make it do,
> > which is to confine the control dependency to its "if" statement.  But as
> > usual I wasn't thinking globally enough.  And I am not exactly sure what
> > to ask for.  Here a store to a local was control-dependency ordered after
> > a read, and so that should propagate to a read from that local variable.
> > Maybe treat local variables as if they were registers, so that from
> > herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> > chains in multiple "if" statements?
> > 
> > Thoughts?
> 
> Local variables absolutely should be treated just like CPU registers, if 
> possible.  In fact, the compiler has the option of keeping local 
> variables stored in registers.
> 
> (Of course, things may get complicated if anyone writes a litmus test 
> that uses a pointer to a local variable,  Especially if the pointer 
> could hold the address of a local variable in one execution and a 
> shared variable in another!  Or if the pointer is itself a shared 
> variable and is dereferenced in another thread!)

Good point!  I did miss this complication.  ;-)

As you say, when its address is taken, the "local" variable needs to be
treated as is it were shared.  There are exceptions where the pointed-to
local is still used only by its process.  Are any of these exceptions
problematic?

> But even if local variables are treated as non-shared storage locations, 
> we should still handle this correctly.  Part of the problem seems to lie 
> in the definition of the to-r dependency relation; the relevant portion 
> is:
> 
> 	(dep ; [Marked] ; rfi)
> 
> Here dep is the control dependency from the READ_ONCE to the 
> local-variable store, and the rfi refers to the following load of the 
> local variable.  The problem is that the store to the local variable 
> doesn't go in the Marked class, because it is notated as a plain C 
> assignment.  (And likewise for the following load.)
> 
> Should we change the model to make loads from and stores to local 
> variables always count as Marked?

As long as the initial (possibly unmarked) load would be properly
complained about.  And I cannot immediately think of a situation where
this approach would break that would not result in a data race being
flagged.  Or is this yet another failure of my imagination?

> What should have happened if the local variable were instead a shared 
> variable which the other thread didn't access at all?  It seems like a 
> weak point of the memory model that it treats these two things 
> differently.

But is this really any different than the situation where a global
variable is only accessed by a single thread?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 17:37       ` maranget
  2021-03-03 17:39         ` maranget
@ 2021-03-03 19:40         ` Alan Stern
  1 sibling, 0 replies; 33+ messages in thread
From: Alan Stern @ 2021-03-03 19:40 UTC (permalink / raw)
  To: maranget
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells,
	j.alglave, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 06:37:36PM +0100, maranget wrote:
> 
> 
> > On 3 Mar 2021, at 18:12, Alan Stern <stern@rowland.harvard.edu> wrote:
> > 
> > On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
> >> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
> > 
> >>> This result is wrong, apparently because of a bug in herd7.  There 
> >>> should be control dependencies from each of the two loads in P0 to each 
> >>> of the two stores, but herd7 doesn't detect them.
> >>> 
> >>> Maybe Luc can find some time to check whether this really is a bug and 
> >>> if it is, fix it.
> >> 
> >> I agree that herd7's control dependency tracking could be improved.
> >> 
> >> But sadly, it is currently doing exactly what I asked Luc to make it do,
> >> which is to confine the control dependency to its "if" statement.  But as
> >> usual I wasn't thinking globally enough.  And I am not exactly sure what
> >> to ask for.  Here a store to a local was control-dependency ordered after
> >> a read, and so that should propagate to a read from that local variable.
> >> Maybe treat local variables as if they were registers, so that from
> >> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> >> chains in multiple "if" statements?
> >> 
> >> Thoughts?
> > 
> > Local variables absolutely should be treated just like CPU registers, if 
> > possible.  In fact, the compiler has the option of keeping local 
> > variables stored in registers.
> > 
> 
> And indeed local variables are treated as registers by herd7.
> 
> 
> > (Of course, things may get complicated if anyone writes a litmus test 
> > that uses a pointer to a local variable,  Especially if the pointer 
> > could hold the address of a local variable in one execution and a 
> > shared variable in another!  Or if the pointer is itself a shared 
> > variable and is dereferenced in another thread!)
> > 
> > But even if local variables are treated as non-shared storage locations, 
> > we should still handle this correctly.  Part of the problem seems to lie 
> > in the definition of the to-r dependency relation; the relevant portion 
> > is:
> 
> In fact, I’d rather change the computation of “dep” here control-dependency “ctrl”. Notice that “ctrl” is computed by herd7 and present in the initial environment of the Cat interpreter.
> 
> I have made a PR to herd7 that performs the change. The commit message states the new definition.

Shouldn't similar reasoning apply to data and address dependencies?

For example, suppose there is a control dependency from a load to a 
register variable, and then a data dependency from the register variable 
to a store.  This should be treated as an overall data dependency from 
the load to the store.

Does your change to herd7 do this?  I couldn't tell from the description 
in the PR.

Also, do you think it's reasonable to add a restriction to herd7 against 
taking the address of a local variable?

> > 	(dep ; [Marked] ; rfi)
> > 
> > Here dep is the control dependency from the READ_ONCE to the 
> > local-variable store, and the rfi refers to the following load of the 
> > local variable.  The problem is that the store to the local variable 
> > doesn't go in the Marked class, because it is notated as a plain C 
> > assignment.  (And likewise for the following load.)
> > 
> This is a related issue, I am not sure, but perhaps it can be formulated as
> "should rfi and rf on registers behave the  same?”

Aren't they already the same thing?  It's not possible to have an rfe 
from a register, is it?

Alan

> > Should we change the model to make loads from and stores to local 
> > variables always count as Marked?
> > 
> > What should have happened if the local variable were instead a shared 
> > variable which the other thread didn't access at all?  It seems like a 
> > weak point of the memory model that it treats these two things 
> > differently.
> > 
> > Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 17:40       ` Paul E. McKenney
@ 2021-03-03 20:22         ` Alan Stern
  2021-03-03 22:03           ` Paul E. McKenney
                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Alan Stern @ 2021-03-03 20:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:

> > Local variables absolutely should be treated just like CPU registers, if 
> > possible.  In fact, the compiler has the option of keeping local 
> > variables stored in registers.
> > 
> > (Of course, things may get complicated if anyone writes a litmus test 
> > that uses a pointer to a local variable,  Especially if the pointer 
> > could hold the address of a local variable in one execution and a 
> > shared variable in another!  Or if the pointer is itself a shared 
> > variable and is dereferenced in another thread!)
> 
> Good point!  I did miss this complication.  ;-)

I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
local variables.

> As you say, when its address is taken, the "local" variable needs to be
> treated as is it were shared.  There are exceptions where the pointed-to
> local is still used only by its process.  Are any of these exceptions
> problematic?

Easiest just to rule out the whole can of worms.

> > But even if local variables are treated as non-shared storage locations, 
> > we should still handle this correctly.  Part of the problem seems to lie 
> > in the definition of the to-r dependency relation; the relevant portion 
> > is:
> > 
> > 	(dep ; [Marked] ; rfi)
> > 
> > Here dep is the control dependency from the READ_ONCE to the 
> > local-variable store, and the rfi refers to the following load of the 
> > local variable.  The problem is that the store to the local variable 
> > doesn't go in the Marked class, because it is notated as a plain C 
> > assignment.  (And likewise for the following load.)
> > 
> > Should we change the model to make loads from and stores to local 
> > variables always count as Marked?
> 
> As long as the initial (possibly unmarked) load would be properly
> complained about.

Sorry, I don't understand what you mean.

>  And I cannot immediately think of a situation where
> this approach would break that would not result in a data race being
> flagged.  Or is this yet another failure of my imagination?

By definition, an access to a local variable cannot participate in a 
data race because all such accesses are confined to a single thread.

However, there are other aspects to consider, in particular, the 
ordering relations on local-variable accesses.  But if, as Luc says, 
local variables are treated just like registers then perhaps the issue 
doesn't arise.

> > What should have happened if the local variable were instead a shared 
> > variable which the other thread didn't access at all?  It seems like a 
> > weak point of the memory model that it treats these two things 
> > differently.
> 
> But is this really any different than the situation where a global
> variable is only accessed by a single thread?

Indeed; it is the _same_ situation.  Which leads to some interesting 
questions, such as: What does READ_ONCE(r) mean when r is a local 
variable?  Should it be allowed at all?  In what way is it different 
from a plain read of r?

One difference is that the LKMM doesn't allow dependencies to originate 
from a plain load.  Of course, when you're dealing with a local 
variable, what matters is not the load from that variable but rather the 
earlier loads which determined the value that had been stored there.  
Which brings us back to the case of the

	dep ; rfi

dependency relation, where the accesses in the middle are plain and 
non-racy.  Should the LKMM be changed to allow this?

There are other differences to consider.  For example:

	r = READ_ONCE(x);
	smp_wmb();
	WRITE_ONCE(y, 1);

If the write to r were treated as a marked store, the smp_wmb would 
order it (and consequently the READ_ONCE) before the WRITE_ONCE.  
However we don't want to do this when r is a local variable.  Indeed, a 
plain store wouldn't be ordered this way because the compiler might 
optimize the store away entirely, leaving the smp_wmb nothing to act on.

So overall the situation is rather puzzling.  Treating local variables 
as registers is probably the best answer.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 17:39         ` maranget
@ 2021-03-03 21:56           ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-03 21:56 UTC (permalink / raw)
  To: maranget
  Cc: Alan Stern, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells,
	j.alglave, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 06:39:14PM +0100, maranget wrote:
> 
> 
> > On 3 Mar 2021, at 18:37, maranget <luc.maranget@inria.fr> wrote:
> > 
> > I have made a PR to herd7 that performs the change. The commit message states the new definition.
> 
> For those who are interested
> <https://github.com/herd/herdtools7/pull/183>

And I just confirmed that with this change, Björn's original litmus
test behaves as desired.  Merci beaucoup, Luc!

The new herd7 also passes the in-kernel regression test, and also all
of the github "litmus" archive's tests up to six processes that are
flagged with expected outcomes except for these five tests, which
are expected failures:

	litmus/manual/deps/LB-addr-equals.litmus
	litmus/manual/deps/LB-ctls-bothvals-a.litmus
	litmus/manual/deps/LB-ctls-diffvals-det.litmus
	litmus/manual/deps/LB-ctls-sameval-barrier.litmus
	litmus/manual/deps/LB-ctls-sameval.litmus

This situation is a bit silly, so I changed the "Results" clause
and added a comment noting that LKMM does not yet know about these
corner cases.

I just started a longer regression test, but this will take some time.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 20:22         ` Alan Stern
@ 2021-03-03 22:03           ` Paul E. McKenney
  2021-03-04  3:21             ` Alan Stern
  2021-03-04  1:26           ` Boqun Feng
  2021-03-04 15:44           ` maranget
  2 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-03 22:03 UTC (permalink / raw)
  To: Alan Stern
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:
> 
> > > Local variables absolutely should be treated just like CPU registers, if 
> > > possible.  In fact, the compiler has the option of keeping local 
> > > variables stored in registers.
> > > 
> > > (Of course, things may get complicated if anyone writes a litmus test 
> > > that uses a pointer to a local variable,  Especially if the pointer 
> > > could hold the address of a local variable in one execution and a 
> > > shared variable in another!  Or if the pointer is itself a shared 
> > > variable and is dereferenced in another thread!)
> > 
> > Good point!  I did miss this complication.  ;-)
> 
> I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> local variables.
> 
> > As you say, when its address is taken, the "local" variable needs to be
> > treated as is it were shared.  There are exceptions where the pointed-to
> > local is still used only by its process.  Are any of these exceptions
> > problematic?
> 
> Easiest just to rule out the whole can of worms.

Good point, given that a global can be used instead of a local for
any case where an address must be taken.

> > > But even if local variables are treated as non-shared storage locations, 
> > > we should still handle this correctly.  Part of the problem seems to lie 
> > > in the definition of the to-r dependency relation; the relevant portion 
> > > is:
> > > 
> > > 	(dep ; [Marked] ; rfi)
> > > 
> > > Here dep is the control dependency from the READ_ONCE to the 
> > > local-variable store, and the rfi refers to the following load of the 
> > > local variable.  The problem is that the store to the local variable 
> > > doesn't go in the Marked class, because it is notated as a plain C 
> > > assignment.  (And likewise for the following load.)
> > > 
> > > Should we change the model to make loads from and stores to local 
> > > variables always count as Marked?
> > 
> > As long as the initial (possibly unmarked) load would be properly
> > complained about.
> 
> Sorry, I don't understand what you mean.

I was thinking in terms of something like this in one of the processes:

	p = gp; // Unmarked!
	r1 = p;
	q = r1; // Implicitly marked now?
	if (q)
		WRITE_ONCE(x, 1); // ctrl dep from gp???

> >  And I cannot immediately think of a situation where
> > this approach would break that would not result in a data race being
> > flagged.  Or is this yet another failure of my imagination?
> 
> By definition, an access to a local variable cannot participate in a 
> data race because all such accesses are confined to a single thread.

True, but its value might have come from a load from a shared variable.

> However, there are other aspects to consider, in particular, the 
> ordering relations on local-variable accesses.  But if, as Luc says, 
> local variables are treated just like registers then perhaps the issue 
> doesn't arise.

Here is hoping!

> > > What should have happened if the local variable were instead a shared 
> > > variable which the other thread didn't access at all?  It seems like a 
> > > weak point of the memory model that it treats these two things 
> > > differently.
> > 
> > But is this really any different than the situation where a global
> > variable is only accessed by a single thread?
> 
> Indeed; it is the _same_ situation.  Which leads to some interesting 
> questions, such as: What does READ_ONCE(r) mean when r is a local 
> variable?  Should it be allowed at all?  In what way is it different 
> from a plain read of r?
> 
> One difference is that the LKMM doesn't allow dependencies to originate 
> from a plain load.  Of course, when you're dealing with a local 
> variable, what matters is not the load from that variable but rather the 
> earlier loads which determined the value that had been stored there.  
> Which brings us back to the case of the
> 
> 	dep ; rfi
> 
> dependency relation, where the accesses in the middle are plain and 
> non-racy.  Should the LKMM be changed to allow this?

It would be nice, give or take the potential side effects.  ;-)
As in it would be nice, but might not be worthwhile.

> There are other differences to consider.  For example:
> 
> 	r = READ_ONCE(x);
> 	smp_wmb();
> 	WRITE_ONCE(y, 1);
> 
> If the write to r were treated as a marked store, the smp_wmb would 
> order it (and consequently the READ_ONCE) before the WRITE_ONCE.  
> However we don't want to do this when r is a local variable.  Indeed, a 
> plain store wouldn't be ordered this way because the compiler might 
> optimize the store away entirely, leaving the smp_wmb nothing to act on.

Agreed, having smp_wmb() order things due to a write to a local variable
would not be what we want.

> So overall the situation is rather puzzling.  Treating local variables 
> as registers is probably the best answer.

That is sounding quite appealing at the moment.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 20:22         ` Alan Stern
  2021-03-03 22:03           ` Paul E. McKenney
@ 2021-03-04  1:26           ` Boqun Feng
  2021-03-04  3:13             ` Alan Stern
  2021-03-04 15:44           ` maranget
  2 siblings, 1 reply; 33+ messages in thread
From: Boqun Feng @ 2021-03-04  1:26 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:
> 
> > > Local variables absolutely should be treated just like CPU registers, if 
> > > possible.  In fact, the compiler has the option of keeping local 
> > > variables stored in registers.
> > > 
> > > (Of course, things may get complicated if anyone writes a litmus test 
> > > that uses a pointer to a local variable,  Especially if the pointer 
> > > could hold the address of a local variable in one execution and a 
> > > shared variable in another!  Or if the pointer is itself a shared 
> > > variable and is dereferenced in another thread!)
> > 
> > Good point!  I did miss this complication.  ;-)
> 
> I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> local variables.
> 
> > As you say, when its address is taken, the "local" variable needs to be
> > treated as is it were shared.  There are exceptions where the pointed-to
> > local is still used only by its process.  Are any of these exceptions
> > problematic?
> 
> Easiest just to rule out the whole can of worms.
> 
> > > But even if local variables are treated as non-shared storage locations, 
> > > we should still handle this correctly.  Part of the problem seems to lie 
> > > in the definition of the to-r dependency relation; the relevant portion 
> > > is:
> > > 
> > > 	(dep ; [Marked] ; rfi)
> > > 
> > > Here dep is the control dependency from the READ_ONCE to the 
> > > local-variable store, and the rfi refers to the following load of the 
> > > local variable.  The problem is that the store to the local variable 
> > > doesn't go in the Marked class, because it is notated as a plain C 
> > > assignment.  (And likewise for the following load.)
> > > 
> > > Should we change the model to make loads from and stores to local 
> > > variables always count as Marked?
> > 
> > As long as the initial (possibly unmarked) load would be properly
> > complained about.
> 
> Sorry, I don't understand what you mean.
> 
> >  And I cannot immediately think of a situation where
> > this approach would break that would not result in a data race being
> > flagged.  Or is this yet another failure of my imagination?
> 
> By definition, an access to a local variable cannot participate in a 
> data race because all such accesses are confined to a single thread.
> 
> However, there are other aspects to consider, in particular, the 
> ordering relations on local-variable accesses.  But if, as Luc says, 
> local variables are treated just like registers then perhaps the issue 
> doesn't arise.
> 
> > > What should have happened if the local variable were instead a shared 
> > > variable which the other thread didn't access at all?  It seems like a 
> > > weak point of the memory model that it treats these two things 
> > > differently.
> > 
> > But is this really any different than the situation where a global
> > variable is only accessed by a single thread?
> 
> Indeed; it is the _same_ situation.  Which leads to some interesting 
> questions, such as: What does READ_ONCE(r) mean when r is a local 
> variable?  Should it be allowed at all?  In what way is it different 
> from a plain read of r?
> 
> One difference is that the LKMM doesn't allow dependencies to originate 
> from a plain load.  Of course, when you're dealing with a local 
> variable, what matters is not the load from that variable but rather the 
> earlier loads which determined the value that had been stored there.  
> Which brings us back to the case of the
> 
> 	dep ; rfi
> 
> dependency relation, where the accesses in the middle are plain and 
> non-racy.  Should the LKMM be changed to allow this?
> 

For this particular question, do we need to consider code as the follow?

	r1 = READ_ONCE(x);  // f
	if (r == 1) {
		local_v = &y; // g
		do_something_a();
	}
	else {
		local_v = &y;
		do_something_b();
	}

	r2 = READ_ONCE(*local_v); // e

, do we have the guarantee that the first READ_ONCE() happens before the
second one? Can compiler optimize the code as:

	r2 = READ_ONCE(y);
	r1 = READ_ONCE(x);

	if (r == 1) {
		do_something_a();
	}
	else {
		do_something_b();
	}

? Although we have:

	f ->dep g ->rfi ->addr e

Regards,
Boqun

> There are other differences to consider.  For example:
> 
> 	r = READ_ONCE(x);
> 	smp_wmb();
> 	WRITE_ONCE(y, 1);
> 
> If the write to r were treated as a marked store, the smp_wmb would 
> order it (and consequently the READ_ONCE) before the WRITE_ONCE.  
> However we don't want to do this when r is a local variable.  Indeed, a 
> plain store wouldn't be ordered this way because the compiler might 
> optimize the store away entirely, leaving the smp_wmb nothing to act on.
> 
> So overall the situation is rather puzzling.  Treating local variables 
> as registers is probably the best answer.
> 
> Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04  1:26           ` Boqun Feng
@ 2021-03-04  3:13             ` Alan Stern
  2021-03-04  6:33               ` Boqun Feng
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2021-03-04  3:13 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Thu, Mar 04, 2021 at 09:26:31AM +0800, Boqun Feng wrote:
> On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:

> > Which brings us back to the case of the
> > 
> > 	dep ; rfi
> > 
> > dependency relation, where the accesses in the middle are plain and 
> > non-racy.  Should the LKMM be changed to allow this?
> > 
> 
> For this particular question, do we need to consider code as the follow?
> 
> 	r1 = READ_ONCE(x);  // f
> 	if (r == 1) {
> 		local_v = &y; // g
> 		do_something_a();
> 	}
> 	else {
> 		local_v = &y;
> 		do_something_b();
> 	}
> 
> 	r2 = READ_ONCE(*local_v); // e
> 
> , do we have the guarantee that the first READ_ONCE() happens before the
> second one? Can compiler optimize the code as:
> 
> 	r2 = READ_ONCE(y);
> 	r1 = READ_ONCE(x);

Well, it can't do that because the compiler isn't allowed to reorder
volatile accesses (which includes READ_ONCE).  But the compiler could
do:

	r1 = READ_ONCE(x);
	r2 = READ_ONCE(y);

> 	if (r == 1) {
> 		do_something_a();
> 	}
> 	else {
> 		do_something_b();
> 	}
> 
> ? Although we have:
> 
> 	f ->dep g ->rfi ->addr e

This would be an example of a problem Paul has described on several
occasions, where both arms of an "if" statement store the same value
(in this case to local_v).  This problem arises even when local
variables are not involved.  For example:

	if (READ_ONCE(x) == 0) {
		WRITE_ONCE(y, 1);
		do_a();
	} else {
		WRITE_ONCE(y, 1);
		do_b();
	}

The compiler can change this to:

	r = READ_ONCE(x);
	WRITE_ONCE(y, 1);
	if (r == 0)
		do_a();
	else
		do_b();

thus allowing the marked accesses to be reordered by the CPU and
breaking the apparent control dependency.

So the answer to your question is: No, we don't have this guarantee,
but the reason is because of doing the same store in both arms, not
because of the use of local variables.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 22:03           ` Paul E. McKenney
@ 2021-03-04  3:21             ` Alan Stern
  2021-03-04  5:04               ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2021-03-04  3:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:
> > 
> > > > Local variables absolutely should be treated just like CPU registers, if 
> > > > possible.  In fact, the compiler has the option of keeping local 
> > > > variables stored in registers.
> > > > 
> > > > (Of course, things may get complicated if anyone writes a litmus test 
> > > > that uses a pointer to a local variable,  Especially if the pointer 
> > > > could hold the address of a local variable in one execution and a 
> > > > shared variable in another!  Or if the pointer is itself a shared 
> > > > variable and is dereferenced in another thread!)
> > > 
> > > Good point!  I did miss this complication.  ;-)
> > 
> > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> > local variables.
> > 
> > > As you say, when its address is taken, the "local" variable needs to be
> > > treated as is it were shared.  There are exceptions where the pointed-to
> > > local is still used only by its process.  Are any of these exceptions
> > > problematic?
> > 
> > Easiest just to rule out the whole can of worms.
> 
> Good point, given that a global can be used instead of a local for
> any case where an address must be taken.

Another thing to consider: Almost all marked accesses involve using the 
address of the storage location (for example, smp_load_acquire's first 
argument must be a pointer).  As far as I can remember at the moment, 
the only ones that don't are READ_ONCE and WRITE_ONCE.  So although we 
might or might not want to allow READ_ONCE or WRITE_ONCE on a local 
variable, we won't have to worry about any of the other kinds of marked 
accesses.

> > > > But even if local variables are treated as non-shared storage locations, 
> > > > we should still handle this correctly.  Part of the problem seems to lie 
> > > > in the definition of the to-r dependency relation; the relevant portion 
> > > > is:
> > > > 
> > > > 	(dep ; [Marked] ; rfi)
> > > > 
> > > > Here dep is the control dependency from the READ_ONCE to the 
> > > > local-variable store, and the rfi refers to the following load of the 
> > > > local variable.  The problem is that the store to the local variable 
> > > > doesn't go in the Marked class, because it is notated as a plain C 
> > > > assignment.  (And likewise for the following load.)
> > > > 
> > > > Should we change the model to make loads from and stores to local 
> > > > variables always count as Marked?
> > > 
> > > As long as the initial (possibly unmarked) load would be properly
> > > complained about.
> > 
> > Sorry, I don't understand what you mean.
> 
> I was thinking in terms of something like this in one of the processes:
> 
> 	p = gp; // Unmarked!
> 	r1 = p;
> 	q = r1; // Implicitly marked now?
> 	if (q)
> 		WRITE_ONCE(x, 1); // ctrl dep from gp???

I hope we won't have to worry about this!  :-)  Treating local variable 
accesses as if they are always marked looks wrong.

> > >  And I cannot immediately think of a situation where
> > > this approach would break that would not result in a data race being
> > > flagged.  Or is this yet another failure of my imagination?
> > 
> > By definition, an access to a local variable cannot participate in a 
> > data race because all such accesses are confined to a single thread.
> 
> True, but its value might have come from a load from a shared variable.

Then that load could have participated in a data race.  But the store to 
the local variable cannot.

> > However, there are other aspects to consider, in particular, the 
> > ordering relations on local-variable accesses.  But if, as Luc says, 
> > local variables are treated just like registers then perhaps the issue 
> > doesn't arise.
> 
> Here is hoping!
> 
> > > > What should have happened if the local variable were instead a shared 
> > > > variable which the other thread didn't access at all?  It seems like a 
> > > > weak point of the memory model that it treats these two things 
> > > > differently.
> > > 
> > > But is this really any different than the situation where a global
> > > variable is only accessed by a single thread?
> > 
> > Indeed; it is the _same_ situation.  Which leads to some interesting 
> > questions, such as: What does READ_ONCE(r) mean when r is a local 
> > variable?  Should it be allowed at all?  In what way is it different 
> > from a plain read of r?
> > 
> > One difference is that the LKMM doesn't allow dependencies to originate 
> > from a plain load.  Of course, when you're dealing with a local 
> > variable, what matters is not the load from that variable but rather the 
> > earlier loads which determined the value that had been stored there.  
> > Which brings us back to the case of the
> > 
> > 	dep ; rfi
> > 
> > dependency relation, where the accesses in the middle are plain and 
> > non-racy.  Should the LKMM be changed to allow this?
> 
> It would be nice, give or take the potential side effects.  ;-)
> As in it would be nice, but might not be worthwhile.

Treating local variables like registers will automatically bring this 
behavior.  So I think we'll be good.

> > There are other differences to consider.  For example:
> > 
> > 	r = READ_ONCE(x);
> > 	smp_wmb();
> > 	WRITE_ONCE(y, 1);
> > 
> > If the write to r were treated as a marked store, the smp_wmb would 
> > order it (and consequently the READ_ONCE) before the WRITE_ONCE.  
> > However we don't want to do this when r is a local variable.  Indeed, a 
> > plain store wouldn't be ordered this way because the compiler might 
> > optimize the store away entirely, leaving the smp_wmb nothing to act on.
> 
> Agreed, having smp_wmb() order things due to a write to a local variable
> would not be what we want.
> 
> > So overall the situation is rather puzzling.  Treating local variables 
> > as registers is probably the best answer.
> 
> That is sounding quite appealing at the moment.

Agreed.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04  3:21             ` Alan Stern
@ 2021-03-04  5:04               ` Paul E. McKenney
  2021-03-04 15:35                 ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-04  5:04 UTC (permalink / raw)
  To: Alan Stern
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> > > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:
> > > 
> > > > > Local variables absolutely should be treated just like CPU registers, if 
> > > > > possible.  In fact, the compiler has the option of keeping local 
> > > > > variables stored in registers.
> > > > > 
> > > > > (Of course, things may get complicated if anyone writes a litmus test 
> > > > > that uses a pointer to a local variable,  Especially if the pointer 
> > > > > could hold the address of a local variable in one execution and a 
> > > > > shared variable in another!  Or if the pointer is itself a shared 
> > > > > variable and is dereferenced in another thread!)
> > > > 
> > > > Good point!  I did miss this complication.  ;-)
> > > 
> > > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> > > local variables.
> > > 
> > > > As you say, when its address is taken, the "local" variable needs to be
> > > > treated as is it were shared.  There are exceptions where the pointed-to
> > > > local is still used only by its process.  Are any of these exceptions
> > > > problematic?
> > > 
> > > Easiest just to rule out the whole can of worms.
> > 
> > Good point, given that a global can be used instead of a local for
> > any case where an address must be taken.
> 
> Another thing to consider: Almost all marked accesses involve using the 
> address of the storage location (for example, smp_load_acquire's first 
> argument must be a pointer).  As far as I can remember at the moment, 
> the only ones that don't are READ_ONCE and WRITE_ONCE.  So although we 
> might or might not want to allow READ_ONCE or WRITE_ONCE on a local 
> variable, we won't have to worry about any of the other kinds of marked 
> accesses.

Good point!

> > > > > But even if local variables are treated as non-shared storage locations, 
> > > > > we should still handle this correctly.  Part of the problem seems to lie 
> > > > > in the definition of the to-r dependency relation; the relevant portion 
> > > > > is:
> > > > > 
> > > > > 	(dep ; [Marked] ; rfi)
> > > > > 
> > > > > Here dep is the control dependency from the READ_ONCE to the 
> > > > > local-variable store, and the rfi refers to the following load of the 
> > > > > local variable.  The problem is that the store to the local variable 
> > > > > doesn't go in the Marked class, because it is notated as a plain C 
> > > > > assignment.  (And likewise for the following load.)
> > > > > 
> > > > > Should we change the model to make loads from and stores to local 
> > > > > variables always count as Marked?
> > > > 
> > > > As long as the initial (possibly unmarked) load would be properly
> > > > complained about.
> > > 
> > > Sorry, I don't understand what you mean.
> > 
> > I was thinking in terms of something like this in one of the processes:
> > 
> > 	p = gp; // Unmarked!
> > 	r1 = p;
> > 	q = r1; // Implicitly marked now?
> > 	if (q)
> > 		WRITE_ONCE(x, 1); // ctrl dep from gp???
> 
> I hope we won't have to worry about this!  :-)  Treating local variable 
> accesses as if they are always marked looks wrong.

Good, that is where I was also heading.  ;-)

> > > >  And I cannot immediately think of a situation where
> > > > this approach would break that would not result in a data race being
> > > > flagged.  Or is this yet another failure of my imagination?
> > > 
> > > By definition, an access to a local variable cannot participate in a 
> > > data race because all such accesses are confined to a single thread.
> > 
> > True, but its value might have come from a load from a shared variable.
> 
> Then that load could have participated in a data race.  But the store to 
> the local variable cannot.

Agreed.  My thought was that if the ordering from the initial (non-local)
load mattered, then that initial load must have participated in a
data race.  Is that true, or am I failing to perceive some corner case?

> > > However, there are other aspects to consider, in particular, the 
> > > ordering relations on local-variable accesses.  But if, as Luc says, 
> > > local variables are treated just like registers then perhaps the issue 
> > > doesn't arise.
> > 
> > Here is hoping!
> > 
> > > > > What should have happened if the local variable were instead a shared 
> > > > > variable which the other thread didn't access at all?  It seems like a 
> > > > > weak point of the memory model that it treats these two things 
> > > > > differently.
> > > > 
> > > > But is this really any different than the situation where a global
> > > > variable is only accessed by a single thread?
> > > 
> > > Indeed; it is the _same_ situation.  Which leads to some interesting 
> > > questions, such as: What does READ_ONCE(r) mean when r is a local 
> > > variable?  Should it be allowed at all?  In what way is it different 
> > > from a plain read of r?
> > > 
> > > One difference is that the LKMM doesn't allow dependencies to originate 
> > > from a plain load.  Of course, when you're dealing with a local 
> > > variable, what matters is not the load from that variable but rather the 
> > > earlier loads which determined the value that had been stored there.  
> > > Which brings us back to the case of the
> > > 
> > > 	dep ; rfi
> > > 
> > > dependency relation, where the accesses in the middle are plain and 
> > > non-racy.  Should the LKMM be changed to allow this?
> > 
> > It would be nice, give or take the potential side effects.  ;-)
> > As in it would be nice, but might not be worthwhile.
> 
> Treating local variables like registers will automatically bring this 
> behavior.  So I think we'll be good.

Sounds good.

> > > There are other differences to consider.  For example:
> > > 
> > > 	r = READ_ONCE(x);
> > > 	smp_wmb();
> > > 	WRITE_ONCE(y, 1);
> > > 
> > > If the write to r were treated as a marked store, the smp_wmb would 
> > > order it (and consequently the READ_ONCE) before the WRITE_ONCE.  
> > > However we don't want to do this when r is a local variable.  Indeed, a 
> > > plain store wouldn't be ordered this way because the compiler might 
> > > optimize the store away entirely, leaving the smp_wmb nothing to act on.
> > 
> > Agreed, having smp_wmb() order things due to a write to a local variable
> > would not be what we want.
> > 
> > > So overall the situation is rather puzzling.  Treating local variables 
> > > as registers is probably the best answer.
> > 
> > That is sounding quite appealing at the moment.
> 
> Agreed.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04  3:13             ` Alan Stern
@ 2021-03-04  6:33               ` Boqun Feng
  2021-03-04 16:11                 ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Boqun Feng @ 2021-03-04  6:33 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 10:13:22PM -0500, Alan Stern wrote:
> On Thu, Mar 04, 2021 at 09:26:31AM +0800, Boqun Feng wrote:
> > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> 
> > > Which brings us back to the case of the
> > > 
> > > 	dep ; rfi
> > > 
> > > dependency relation, where the accesses in the middle are plain and 
> > > non-racy.  Should the LKMM be changed to allow this?
> > > 
> > 
> > For this particular question, do we need to consider code as the follow?
> > 
> > 	r1 = READ_ONCE(x);  // f
> > 	if (r == 1) {
> > 		local_v = &y; // g
> > 		do_something_a();
> > 	}
> > 	else {
> > 		local_v = &y;
> > 		do_something_b();
> > 	}
> > 
> > 	r2 = READ_ONCE(*local_v); // e
> > 
> > , do we have the guarantee that the first READ_ONCE() happens before the
> > second one? Can compiler optimize the code as:
> > 
> > 	r2 = READ_ONCE(y);
> > 	r1 = READ_ONCE(x);
> 
> Well, it can't do that because the compiler isn't allowed to reorder
> volatile accesses (which includes READ_ONCE).  But the compiler could
> do:
> 
> 	r1 = READ_ONCE(x);
> 	r2 = READ_ONCE(y);
> 
> > 	if (r == 1) {
> > 		do_something_a();
> > 	}
> > 	else {
> > 		do_something_b();
> > 	}
> > 
> > ? Although we have:
> > 
> > 	f ->dep g ->rfi ->addr e
> 
> This would be an example of a problem Paul has described on several
> occasions, where both arms of an "if" statement store the same value
> (in this case to local_v).  This problem arises even when local
> variables are not involved.  For example:
> 
> 	if (READ_ONCE(x) == 0) {
> 		WRITE_ONCE(y, 1);
> 		do_a();
> 	} else {
> 		WRITE_ONCE(y, 1);
> 		do_b();
> 	}
> 
> The compiler can change this to:
> 
> 	r = READ_ONCE(x);
> 	WRITE_ONCE(y, 1);
> 	if (r == 0)
> 		do_a();
> 	else
> 		do_b();
> 
> thus allowing the marked accesses to be reordered by the CPU and
> breaking the apparent control dependency.
> 
> So the answer to your question is: No, we don't have this guarantee,
> but the reason is because of doing the same store in both arms, not
> because of the use of local variables.
> 

Right, I was thinking about something unrelated.. but how about the
following case:

	local_v = &y;
	r1 = READ_ONCE(*x); // f

	if (r1 == 1) {
		local_v = &y; // e
	} else {
		local_v = &z; // d
	}

	p = READ_ONCE(local_v); // g

	r2 = READ_ONCE(*p);   // h

if r1 == 1, we definitely think we have:

	f ->ctrl e ->rfi g ->addr h

, and if we treat ctrl;rfi as "to-r", then we have "f" happens before
"h". However compile can optimze the above as:

	local_v = &y;

	r1 = READ_ONCE(*x); // f

	if (r1 != 1) {
		local_v = &z; // d
	}

	p = READ_ONCE(local_v); // g

	r2 = READ_ONCE(*p);   // h

, and when this gets executed, I don't think we have the guarantee we
have "f" happens before "h", because CPU can do optimistic read for "g"
and "h".

Part of this is because when we take plain access into consideration, we
won't guarantee a read-from or other relations exists if compiler
optimization happens.

Maybe I'm missing something subtle, but just try to think through the
effect of making dep; rfi as "to-r".

Regards,
Boqun

> Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04  5:04               ` Paul E. McKenney
@ 2021-03-04 15:35                 ` Alan Stern
  2021-03-04 19:05                   ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2021-03-04 15:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:

> > > > >  And I cannot immediately think of a situation where
> > > > > this approach would break that would not result in a data race being
> > > > > flagged.  Or is this yet another failure of my imagination?
> > > > 
> > > > By definition, an access to a local variable cannot participate in a 
> > > > data race because all such accesses are confined to a single thread.
> > > 
> > > True, but its value might have come from a load from a shared variable.
> > 
> > Then that load could have participated in a data race.  But the store to 
> > the local variable cannot.
> 
> Agreed.  My thought was that if the ordering from the initial (non-local)
> load mattered, then that initial load must have participated in a
> data race.  Is that true, or am I failing to perceive some corner case?

Ordering can matter even when no data race is involved.  Just think
about how much of the memory model is concerned with ordering of
marked accesses, which don't participate in data races unless there is
a conflicting plain access somewhere.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-03 20:22         ` Alan Stern
  2021-03-03 22:03           ` Paul E. McKenney
  2021-03-04  1:26           ` Boqun Feng
@ 2021-03-04 15:44           ` maranget
  2021-03-04 19:07             ` Paul E. McKenney
  2 siblings, 1 reply; 33+ messages in thread
From: maranget @ 2021-03-04 15:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, Andrea Parri,
	Will Deacon, Peter Zijlstra, Boqun Feng, Nicholas Piggin,
	David Howells, Alglave, Jade, Akira Yokosawa, Daniel Lustig,
	joel, Toke Høiland-Jørgensen, Karlsson, Magnus



> On 3 Mar 2021, at 21:22, Alan Stern <stern@rowland.harvard.edu> wrote:
> 
>>> 
>>> Local variables absolutely should be treated just like CPU registers, if 
>>> possible.  In fact, the compiler has the option of keeping local 
>>> variables stored in registers.
>>> 
>>> (Of course, things may get complicated if anyone writes a litmus test 
>>> that uses a pointer to a local variable,  Especially if the pointer 
>>> could hold the address of a local variable in one execution and a 
>>> shared variable in another!  Or if the pointer is itself a shared 
>>> variable and is dereferenced in another thread!)
>> 
>> Good point!  I did miss this complication.  ;-)
> 
> I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> local variables.
> 
> 

Herd7 does disallow taking addresses of local variables.

However, such  tests can still be run on machine, provided function bodies are accepted by the C compiler.

—Luc


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04  6:33               ` Boqun Feng
@ 2021-03-04 16:11                 ` Alan Stern
  2021-03-05  1:12                   ` Boqun Feng
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2021-03-04 16:11 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Thu, Mar 04, 2021 at 02:33:32PM +0800, Boqun Feng wrote:

> Right, I was thinking about something unrelated.. but how about the
> following case:
> 
> 	local_v = &y;
> 	r1 = READ_ONCE(*x); // f
> 
> 	if (r1 == 1) {
> 		local_v = &y; // e
> 	} else {
> 		local_v = &z; // d
> 	}
> 
> 	p = READ_ONCE(local_v); // g
> 
> 	r2 = READ_ONCE(*p);   // h
> 
> if r1 == 1, we definitely think we have:
> 
> 	f ->ctrl e ->rfi g ->addr h
> 
> , and if we treat ctrl;rfi as "to-r", then we have "f" happens before
> "h". However compile can optimze the above as:
> 
> 	local_v = &y;
> 
> 	r1 = READ_ONCE(*x); // f
> 
> 	if (r1 != 1) {
> 		local_v = &z; // d
> 	}
> 
> 	p = READ_ONCE(local_v); // g
> 
> 	r2 = READ_ONCE(*p);   // h
> 
> , and when this gets executed, I don't think we have the guarantee we
> have "f" happens before "h", because CPU can do optimistic read for "g"
> and "h".

In your example, which accesses are supposed to be to actual memory and 
which to registers?  Also, remember that the memory model assumes the 
hardware does not reorder loads if there is an address dependency 
between them.

> Part of this is because when we take plain access into consideration, we
> won't guarantee a read-from or other relations exists if compiler
> optimization happens.
> 
> Maybe I'm missing something subtle, but just try to think through the
> effect of making dep; rfi as "to-r".

Forget about local variables for the time being and just consider

	dep ; [Plain] ; rfi

For example:

	A: r1 = READ_ONCE(x);
	   y = r1;
	B: r2 = READ_ONCE(y);

Should B be ordered after A?  I don't see how any CPU could hope to 
excute B before A, but maybe I'm missing something.

There's another twist, connected with the fact that herd7 can't detect 
control dependencies caused by unexecuted code.  If we have:

	A: r1 = READ_ONCE(x);
	if (r1)
		WRITE_ONCE(y, 5);
	r2 = READ_ONCE(y);
	B: WRITE_ONCE(z, r2);

then in executions where x == 0, herd7 doesn't see any control 
dependency.  But CPUs do see control dependencies whenever there is a 
conditional branch, whether the branch is taken or not, and so they will 
never reorder B before A.

One last thing to think about: My original assessment or Björn's problem 
wasn't right, because the dep in (dep ; rfi) doesn't include control 
dependencies.  Only data and address.  So I believe that the LKMM 
wouldn't consider A to be ordered before B in this example even if x 
was nonzero.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04 15:35                 ` Alan Stern
@ 2021-03-04 19:05                   ` Paul E. McKenney
  2021-03-04 21:27                     ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-04 19:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote:
> On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> 
> > > > > >  And I cannot immediately think of a situation where
> > > > > > this approach would break that would not result in a data race being
> > > > > > flagged.  Or is this yet another failure of my imagination?
> > > > > 
> > > > > By definition, an access to a local variable cannot participate in a 
> > > > > data race because all such accesses are confined to a single thread.
> > > > 
> > > > True, but its value might have come from a load from a shared variable.
> > > 
> > > Then that load could have participated in a data race.  But the store to 
> > > the local variable cannot.
> > 
> > Agreed.  My thought was that if the ordering from the initial (non-local)
> > load mattered, then that initial load must have participated in a
> > data race.  Is that true, or am I failing to perceive some corner case?
> 
> Ordering can matter even when no data race is involved.  Just think
> about how much of the memory model is concerned with ordering of
> marked accesses, which don't participate in data races unless there is
> a conflicting plain access somewhere.

Fair point.  Should I have instead said "then that initial load must
have run concurrently with a store to that same variable"?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04 15:44           ` maranget
@ 2021-03-04 19:07             ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-04 19:07 UTC (permalink / raw)
  To: maranget
  Cc: Alan Stern, Björn Töpel, bpf, LKML, Andrea Parri,
	Will Deacon, Peter Zijlstra, Boqun Feng, Nicholas Piggin,
	David Howells, Alglave, Jade, Akira Yokosawa, Daniel Lustig,
	joel, Toke Høiland-Jørgensen, Karlsson, Magnus

On Thu, Mar 04, 2021 at 04:44:34PM +0100, maranget wrote:
> 
> 
> > On 3 Mar 2021, at 21:22, Alan Stern <stern@rowland.harvard.edu> wrote:
> > 
> >>> 
> >>> Local variables absolutely should be treated just like CPU registers, if 
> >>> possible.  In fact, the compiler has the option of keeping local 
> >>> variables stored in registers.
> >>> 
> >>> (Of course, things may get complicated if anyone writes a litmus test 
> >>> that uses a pointer to a local variable,  Especially if the pointer 
> >>> could hold the address of a local variable in one execution and a 
> >>> shared variable in another!  Or if the pointer is itself a shared 
> >>> variable and is dereferenced in another thread!)
> >> 
> >> Good point!  I did miss this complication.  ;-)
> > 
> > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> > local variables.
> 
> Herd7 does disallow taking addresses of local variables.

Good to know, and thank you!

> However, such  tests can still be run on machine, provided function bodies are accepted by the C compiler.

True, but that would be outside of the LKMM proper, correct?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04 19:05                   ` Paul E. McKenney
@ 2021-03-04 21:27                     ` Alan Stern
  2021-03-04 22:05                       ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2021-03-04 21:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Thu, Mar 04, 2021 at 11:05:15AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > 
> > > > > > >  And I cannot immediately think of a situation where
> > > > > > > this approach would break that would not result in a data race being
> > > > > > > flagged.  Or is this yet another failure of my imagination?
> > > > > > 
> > > > > > By definition, an access to a local variable cannot participate in a 
> > > > > > data race because all such accesses are confined to a single thread.
> > > > > 
> > > > > True, but its value might have come from a load from a shared variable.
> > > > 
> > > > Then that load could have participated in a data race.  But the store to 
> > > > the local variable cannot.
> > > 
> > > Agreed.  My thought was that if the ordering from the initial (non-local)
> > > load mattered, then that initial load must have participated in a
> > > data race.  Is that true, or am I failing to perceive some corner case?
> > 
> > Ordering can matter even when no data race is involved.  Just think
> > about how much of the memory model is concerned with ordering of
> > marked accesses, which don't participate in data races unless there is
> > a conflicting plain access somewhere.
> 
> Fair point.  Should I have instead said "then that initial load must
> have run concurrently with a store to that same variable"?

I'm losing track of the point you were originally trying to make.

Does ordering matter when there are no conflicting accesses?  Sure.  
Consider this:

	A: r1 = READ_ONCE(x);
	B: WRITE_ONCE(y, r1);
	   smp_wmb();
	C: WRITE_ONCE(z, 1);

Even if there are no other accesses to y at all (let alone any 
conflicting ones), the mere existence of B forces A to be ordered before 
C, and this is easily detectable by a litmus test.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04 21:27                     ` Alan Stern
@ 2021-03-04 22:05                       ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2021-03-04 22:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon,
	Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Thu, Mar 04, 2021 at 04:27:53PM -0500, Alan Stern wrote:
> On Thu, Mar 04, 2021 at 11:05:15AM -0800, Paul E. McKenney wrote:
> > On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote:
> > > On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> > > > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > > > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > > 
> > > > > > > >  And I cannot immediately think of a situation where
> > > > > > > > this approach would break that would not result in a data race being
> > > > > > > > flagged.  Or is this yet another failure of my imagination?
> > > > > > > 
> > > > > > > By definition, an access to a local variable cannot participate in a 
> > > > > > > data race because all such accesses are confined to a single thread.
> > > > > > 
> > > > > > True, but its value might have come from a load from a shared variable.
> > > > > 
> > > > > Then that load could have participated in a data race.  But the store to 
> > > > > the local variable cannot.
> > > > 
> > > > Agreed.  My thought was that if the ordering from the initial (non-local)
> > > > load mattered, then that initial load must have participated in a
> > > > data race.  Is that true, or am I failing to perceive some corner case?
> > > 
> > > Ordering can matter even when no data race is involved.  Just think
> > > about how much of the memory model is concerned with ordering of
> > > marked accesses, which don't participate in data races unless there is
> > > a conflicting plain access somewhere.
> > 
> > Fair point.  Should I have instead said "then that initial load must
> > have run concurrently with a store to that same variable"?
> 
> I'm losing track of the point you were originally trying to make.
> 
> Does ordering matter when there are no conflicting accesses?  Sure.  
> Consider this:
> 
> 	A: r1 = READ_ONCE(x);
> 	B: WRITE_ONCE(y, r1);
> 	   smp_wmb();
> 	C: WRITE_ONCE(z, 1);
> 
> Even if there are no other accesses to y at all (let alone any 
> conflicting ones), the mere existence of B forces A to be ordered before 
> C, and this is easily detectable by a litmus test.

Given that herd7 treats all local variables as registers (including
forbidding taking their addresses), and given that we are not thinking of
treating local-variable accesses as if they were marked, this is likely
all moot.

But just in case...

I was trying to figure out if there was a litmus test of the following
form where it might make a difference if local-variable accesses were
treated as if they were marked.  So is there something like this:

	r1 = x;
	if (r1)
	   	WRITE_ONCE(y, 1);

where implicitly treating the accesses to r1 as marked would make a
difference.  I was thinking that any such example would have to result
in LKMM flagging the load from x as a data race.  However, your example
inserting the smp_wmb() does shed some doubt on that theory.

This of course is moot unless we come back to treating local-variable
accesses as if they were marked.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-04 16:11                 ` Alan Stern
@ 2021-03-05  1:12                   ` Boqun Feng
  2021-03-05 16:15                     ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Boqun Feng @ 2021-03-05  1:12 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Thu, Mar 04, 2021 at 11:11:42AM -0500, Alan Stern wrote:
> On Thu, Mar 04, 2021 at 02:33:32PM +0800, Boqun Feng wrote:
> 
> > Right, I was thinking about something unrelated.. but how about the
> > following case:
> > 
> > 	local_v = &y;
> > 	r1 = READ_ONCE(*x); // f
> > 
> > 	if (r1 == 1) {
> > 		local_v = &y; // e
> > 	} else {
> > 		local_v = &z; // d
> > 	}
> > 
> > 	p = READ_ONCE(local_v); // g
> > 
> > 	r2 = READ_ONCE(*p);   // h
> > 
> > if r1 == 1, we definitely think we have:
> > 
> > 	f ->ctrl e ->rfi g ->addr h
> > 
> > , and if we treat ctrl;rfi as "to-r", then we have "f" happens before
> > "h". However compile can optimze the above as:
> > 
> > 	local_v = &y;
> > 
> > 	r1 = READ_ONCE(*x); // f
> > 
> > 	if (r1 != 1) {
> > 		local_v = &z; // d
> > 	}
> > 
> > 	p = READ_ONCE(local_v); // g
> > 
> > 	r2 = READ_ONCE(*p);   // h
> > 
> > , and when this gets executed, I don't think we have the guarantee we
> > have "f" happens before "h", because CPU can do optimistic read for "g"
> > and "h".
> 
> In your example, which accesses are supposed to be to actual memory and 
> which to registers?  Also, remember that the memory model assumes the 

Given that we use READ_ONCE() on local_v, local_v should be a memory
location but only accessed by this thread.

> hardware does not reorder loads if there is an address dependency 
> between them.
> 

Right, so "g" won't be reordered after "h".

> > Part of this is because when we take plain access into consideration, we
> > won't guarantee a read-from or other relations exists if compiler
> > optimization happens.
> > 
> > Maybe I'm missing something subtle, but just try to think through the
> > effect of making dep; rfi as "to-r".
> 
> Forget about local variables for the time being and just consider
> 
> 	dep ; [Plain] ; rfi
> 
> For example:
> 
> 	A: r1 = READ_ONCE(x);
> 	   y = r1;
> 	B: r2 = READ_ONCE(y);
> 
> Should B be ordered after A?  I don't see how any CPU could hope to 
> excute B before A, but maybe I'm missing something.
> 

Agreed.

> There's another twist, connected with the fact that herd7 can't detect 
> control dependencies caused by unexecuted code.  If we have:
> 
> 	A: r1 = READ_ONCE(x);
> 	if (r1)
> 		WRITE_ONCE(y, 5);
> 	r2 = READ_ONCE(y);
> 	B: WRITE_ONCE(z, r2);
> 
> then in executions where x == 0, herd7 doesn't see any control 
> dependency.  But CPUs do see control dependencies whenever there is a 
> conditional branch, whether the branch is taken or not, and so they will 
> never reorder B before A.
> 

Right, because B in this example is a write, what if B is a read that
depends on r2, like in my example? Let y be a pointer to a memory
location, and initialized as a valid value (pointing to a valid memory
location) you example changed to:

	A: r1 = READ_ONCE(x);
	if (r1)
		WRITE_ONCE(y, 5);
	C: r2 = READ_ONCE(y);
	B: r3 = READ_ONCE(*r2);

, then A don't have the control dependency to B, because A and B is
read+read. So B can be ordered before A, right?

> One last thing to think about: My original assessment or Björn's problem 
> wasn't right, because the dep in (dep ; rfi) doesn't include control 
> dependencies.  Only data and address.  So I believe that the LKMM 

Ah, right. I was mising that part (ctrl is not in dep). So I guess my
example is pointless for the question we are discussing here ;-(

> wouldn't consider A to be ordered before B in this example even if x 
> was nonzero.

Yes, and similar to my example (changing B to a read).

I did try to run my example with herd, and got confused no matter I make
dep; [Plain]; rfi as to-r (I got the same result telling me a reorder
can happen). Now the reason is clear, because this is a ctrl; rfi not a
dep; rfi.

Thanks so much for walking with me on this ;-)

Regards,
Boqun

> 
> Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: XDP socket rings, and LKMM litmus tests
  2021-03-05  1:12                   ` Boqun Feng
@ 2021-03-05 16:15                     ` Alan Stern
  0 siblings, 0 replies; 33+ messages in thread
From: Alan Stern @ 2021-03-05 16:15 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea,
	Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel,
	Toke Høiland-Jørgensen, Karlsson, Magnus

On Fri, Mar 05, 2021 at 09:12:30AM +0800, Boqun Feng wrote:
> On Thu, Mar 04, 2021 at 11:11:42AM -0500, Alan Stern wrote:

> > Forget about local variables for the time being and just consider
> > 
> > 	dep ; [Plain] ; rfi
> > 
> > For example:
> > 
> > 	A: r1 = READ_ONCE(x);
> > 	   y = r1;
> > 	B: r2 = READ_ONCE(y);
> > 
> > Should B be ordered after A?  I don't see how any CPU could hope to 
> > excute B before A, but maybe I'm missing something.
> > 
> 
> Agreed.
> 
> > There's another twist, connected with the fact that herd7 can't detect 
> > control dependencies caused by unexecuted code.  If we have:
> > 
> > 	A: r1 = READ_ONCE(x);
> > 	if (r1)
> > 		WRITE_ONCE(y, 5);
> > 	r2 = READ_ONCE(y);
> > 	B: WRITE_ONCE(z, r2);
> > 
> > then in executions where x == 0, herd7 doesn't see any control 
> > dependency.  But CPUs do see control dependencies whenever there is a 
> > conditional branch, whether the branch is taken or not, and so they will 
> > never reorder B before A.
> > 
> 
> Right, because B in this example is a write, what if B is a read that
> depends on r2, like in my example? Let y be a pointer to a memory
> location, and initialized as a valid value (pointing to a valid memory
> location) you example changed to:
> 
> 	A: r1 = READ_ONCE(x);
> 	if (r1)
> 		WRITE_ONCE(y, 5);
> 	C: r2 = READ_ONCE(y);
> 	B: r3 = READ_ONCE(*r2);
> 
> , then A don't have the control dependency to B, because A and B is
> read+read. So B can be ordered before A, right?

Yes, I think that's right: Both C and B can be executed before A.

> > One last thing to think about: My original assessment or Björn's problem 
> > wasn't right, because the dep in (dep ; rfi) doesn't include control 
> > dependencies.  Only data and address.  So I believe that the LKMM 
> 
> Ah, right. I was mising that part (ctrl is not in dep). So I guess my
> example is pointless for the question we are discussing here ;-(
> 
> > wouldn't consider A to be ordered before B in this example even if x 
> > was nonzero.
> 
> Yes, and similar to my example (changing B to a read).
> 
> I did try to run my example with herd, and got confused no matter I make
> dep; [Plain]; rfi as to-r (I got the same result telling me a reorder
> can happen). Now the reason is clear, because this is a ctrl; rfi not a
> dep; rfi.
> 
> Thanks so much for walking with me on this ;-)

You're welcome.  At this point, it looks like the only remaining 
question is whether to include (dep ; [Plain] ; rfi) in to-r.  This 
doesn't seem to be an urgent question.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-03-05 16:16 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02 18:46 XDP socket rings, and LKMM litmus tests Björn Töpel
2021-03-02 19:57 ` Paul E. McKenney
2021-03-02 20:04   ` Paul E. McKenney
2021-03-02 20:37     ` Björn Töpel
2021-03-02 20:24   ` Björn Töpel
2021-03-02 20:41     ` Paul E. McKenney
2021-03-02 20:51       ` Björn Töpel
2021-03-02 21:14 ` Alan Stern
2021-03-02 23:50   ` Paul E. McKenney
2021-03-03  6:37     ` maranget
2021-03-03 16:54       ` Paul E. McKenney
2021-03-03 17:12     ` Alan Stern
2021-03-03 17:37       ` maranget
2021-03-03 17:39         ` maranget
2021-03-03 21:56           ` Paul E. McKenney
2021-03-03 19:40         ` Alan Stern
2021-03-03 17:40       ` Paul E. McKenney
2021-03-03 20:22         ` Alan Stern
2021-03-03 22:03           ` Paul E. McKenney
2021-03-04  3:21             ` Alan Stern
2021-03-04  5:04               ` Paul E. McKenney
2021-03-04 15:35                 ` Alan Stern
2021-03-04 19:05                   ` Paul E. McKenney
2021-03-04 21:27                     ` Alan Stern
2021-03-04 22:05                       ` Paul E. McKenney
2021-03-04  1:26           ` Boqun Feng
2021-03-04  3:13             ` Alan Stern
2021-03-04  6:33               ` Boqun Feng
2021-03-04 16:11                 ` Alan Stern
2021-03-05  1:12                   ` Boqun Feng
2021-03-05 16:15                     ` Alan Stern
2021-03-04 15:44           ` maranget
2021-03-04 19:07             ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.