* XDP socket rings, and LKMM litmus tests @ 2021-03-02 18:46 Björn Töpel 2021-03-02 19:57 ` Paul E. McKenney 2021-03-02 21:14 ` Alan Stern 0 siblings, 2 replies; 33+ messages in thread From: Björn Töpel @ 2021-03-02 18:46 UTC (permalink / raw) To: bpf, LKML Cc: stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, paulmck, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus Hi! Firstly; The long Cc-list is to reach the LKMM-folks. Some background; the XDP sockets use a ring-buffer to communicate between the kernel and userland. It's a single-consumer/single-producer ring, and described in net/xdp/xsk_queue.h. --8<--- /* The structure of the shared state of the rings are the same as the * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion * ring, the kernel is the producer and user space is the consumer. For * the Tx and fill rings, the kernel is the consumer and user space is * the producer. * * producer consumer * * if (LOAD ->consumer) { LOAD ->producer * (A) smp_rmb() (C) * STORE $data LOAD $data * smp_wmb() (B) smp_mb() (D) * STORE ->producer STORE ->consumer * } * * (A) pairs with (D), and (B) pairs with (C). ... -->8--- I'd like to replace the smp_{r,w,}mb() barriers with acquire-release semantics [1], without breaking existing userspace applications. So, I figured I'd use herd7 and the LKMM model to build a litmus test for the barrier version, then for the acquire-release version, and finally permutations of both. The idea is to use a one element ring, with a state machine outlined in the litmus test. The basic test for the existing smp_{r,w,}mb() barriers looks like: $ cat spsc-rb+1p1c.litmus C spsc-rb+1p1c // Stupid one entry ring: // prod cons allowed action prod cons // 0 0 => prod => 1 0 // 0 1 => cons => 0 0 // 1 0 => cons => 1 1 // 1 1 => prod => 0 1 { prod = 1; } // Here, we start at prod==1,cons==0, data==0, i.e. producer has // written data=0, so from here only the consumer can start, and should // consume data==0. Afterwards, producer can continue and write 1 to // data. Can we enter state prod==0, cons==1, but consumer observerd // the write of 1? P0(int *prod, int *cons, int *data) { int p; int c; int cond = 0; p = READ_ONCE(*prod); c = READ_ONCE(*cons); if (p == 0) if (c == 0) cond = 1; if (p == 1) if (c == 1) cond = 1; if (cond) { smp_mb(); WRITE_ONCE(*data, 1); smp_wmb(); WRITE_ONCE(*prod, p ^ 1); } } P1(int *prod, int *cons, int *data) { int p; int c; int d = -1; int cond = 0; p = READ_ONCE(*prod); c = READ_ONCE(*cons); if (p == 1) if (c == 0) cond = 1; if (p == 0) if (c == 1) cond = 1; if (cond == 1) { smp_rmb(); d = READ_ONCE(*data); smp_mb(); WRITE_ONCE(*cons, c ^ 1); } } exists( 1:d=1 /\ prod=0 /\ cons=1 ); -- The weird state changing if-statements is because that I didn't get '&&' and '||' to work with herd. When this is run: $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus Test spsc-rb+1p1c Allowed States 2 1:d=0; cons=1; prod=0; 1:d=0; cons=1; prod=1; No Witnesses Positive: 0 Negative: 2 Condition exists (1:d=1 /\ prod=0 /\ cons=1) Observation spsc-rb+1p1c Never 0 2 Time spsc-rb+1p1c 0.04 Hash=b399756d6a1301ca5bda042f32130791 Now to my question; In P0 there's an smp_mb(). Without that, the d==1 can be observed from P1 (consumer): $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus Test spsc-rb+1p1c Allowed States 3 1:d=0; cons=1; prod=0; 1:d=0; cons=1; prod=1; 1:d=1; cons=1; prod=0; Ok Witnesses Positive: 1 Negative: 2 Condition exists (1:d=1 /\ prod=0 /\ cons=1) Observation spsc-rb+1p1c Sometimes 1 2 Time spsc-rb+1p1c 0.04 Hash=0047fc21fa77da9a9aee15e35ec367ef In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending on control dependencies") removes the corresponding smp_mb(), and also the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5 ("documentation: Update circular buffer for load-acquire/store-release")) is missing the smp_mb() at the producer-side. I'm trying to wrap my head around why it's OK to remove the smp_mb() in the cases above? I'm worried that the current XDP socket ring implementation (which is missing smp_mb()) might be broken. If you read this far, thanks! :-) Björn [1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 18:46 XDP socket rings, and LKMM litmus tests Björn Töpel @ 2021-03-02 19:57 ` Paul E. McKenney 2021-03-02 20:04 ` Paul E. McKenney 2021-03-02 20:24 ` Björn Töpel 2021-03-02 21:14 ` Alan Stern 1 sibling, 2 replies; 33+ messages in thread From: Paul E. McKenney @ 2021-03-02 19:57 UTC (permalink / raw) To: Björn Töpel Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: > Hi! > > Firstly; The long Cc-list is to reach the LKMM-folks. > > Some background; the XDP sockets use a ring-buffer to communicate > between the kernel and userland. It's a > single-consumer/single-producer ring, and described in > net/xdp/xsk_queue.h. > > --8<--- > /* The structure of the shared state of the rings are the same as the > * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion > * ring, the kernel is the producer and user space is the consumer. For > * the Tx and fill rings, the kernel is the consumer and user space is > * the producer. > * > * producer consumer > * > * if (LOAD ->consumer) { LOAD ->producer > * (A) smp_rmb() (C) > * STORE $data LOAD $data > * smp_wmb() (B) smp_mb() (D) > * STORE ->producer STORE ->consumer > * } > * > * (A) pairs with (D), and (B) pairs with (C). > ... > -->8--- > > I'd like to replace the smp_{r,w,}mb() barriers with acquire-release > semantics [1], without breaking existing userspace applications. > > So, I figured I'd use herd7 and the LKMM model to build a litmus test > for the barrier version, then for the acquire-release version, and > finally permutations of both. > > The idea is to use a one element ring, with a state machine outlined > in the litmus test. > > The basic test for the existing smp_{r,w,}mb() barriers looks like: > > $ cat spsc-rb+1p1c.litmus > C spsc-rb+1p1c > > // Stupid one entry ring: > // prod cons allowed action prod cons > // 0 0 => prod => 1 0 > // 0 1 => cons => 0 0 > // 1 0 => cons => 1 1 > // 1 1 => prod => 0 1 > > { prod = 1; } > > // Here, we start at prod==1,cons==0, data==0, i.e. producer has > // written data=0, so from here only the consumer can start, and should > // consume data==0. Afterwards, producer can continue and write 1 to > // data. Can we enter state prod==0, cons==1, but consumer observerd > // the write of 1? > > P0(int *prod, int *cons, int *data) > { > int p; > int c; > int cond = 0; > > p = READ_ONCE(*prod); > c = READ_ONCE(*cons); > if (p == 0) > if (c == 0) > cond = 1; > if (p == 1) > if (c == 1) > cond = 1; > > if (cond) { > smp_mb(); > WRITE_ONCE(*data, 1); > smp_wmb(); > WRITE_ONCE(*prod, p ^ 1); > } > } > > P1(int *prod, int *cons, int *data) > { > int p; > int c; > int d = -1; > int cond = 0; > > p = READ_ONCE(*prod); > c = READ_ONCE(*cons); > if (p == 1) > if (c == 0) > cond = 1; > if (p == 0) > if (c == 1) > cond = 1; > > if (cond == 1) { > smp_rmb(); > d = READ_ONCE(*data); > smp_mb(); > WRITE_ONCE(*cons, c ^ 1); > } > } Before digging in too deeply, does the following simplification still capture your intent? P0(int *prod, int *cons, int *data) { int p; int cond = 0; p = READ_ONCE(*prod); if (p == READ_ONCE(*cons)) cond = 1; if (cond) { smp_mb(); WRITE_ONCE(*data, 1); smp_wmb(); WRITE_ONCE(*prod, p ^ 1); } } P1(int *prod, int *cons, int *data) { int c; int d = -1; int cond = 0; c = READ_ONCE(*cons); if (READ_ONCE(*prod) == c) cond = 1; if (cond == 1) { smp_rmb(); d = READ_ONCE(*data); smp_mb(); WRITE_ONCE(*cons, c ^ 1); } } Thanx, Paul > exists( 1:d=1 /\ prod=0 /\ cons=1 ); > > -- > > The weird state changing if-statements is because that I didn't get > '&&' and '||' to work with herd. > > When this is run: > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > Test spsc-rb+1p1c Allowed > States 2 > 1:d=0; cons=1; prod=0; > 1:d=0; cons=1; prod=1; > No > Witnesses > Positive: 0 Negative: 2 > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > Observation spsc-rb+1p1c Never 0 2 > Time spsc-rb+1p1c 0.04 > Hash=b399756d6a1301ca5bda042f32130791 > > Now to my question; In P0 there's an smp_mb(). Without that, the d==1 > can be observed from P1 (consumer): > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > Test spsc-rb+1p1c Allowed > States 3 > 1:d=0; cons=1; prod=0; > 1:d=0; cons=1; prod=1; > 1:d=1; cons=1; prod=0; > Ok > Witnesses > Positive: 1 Negative: 2 > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > Observation spsc-rb+1p1c Sometimes 1 2 > Time spsc-rb+1p1c 0.04 > Hash=0047fc21fa77da9a9aee15e35ec367ef > > In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending > on control dependencies") removes the corresponding smp_mb(), and also > the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5 > ("documentation: Update circular buffer for > load-acquire/store-release")) is missing the smp_mb() at the > producer-side. > > I'm trying to wrap my head around why it's OK to remove the smp_mb() > in the cases above? I'm worried that the current XDP socket ring > implementation (which is missing smp_mb()) might be broken. > > > If you read this far, thanks! :-) > Björn > > > [1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 19:57 ` Paul E. McKenney @ 2021-03-02 20:04 ` Paul E. McKenney 2021-03-02 20:37 ` Björn Töpel 2021-03-02 20:24 ` Björn Töpel 1 sibling, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2021-03-02 20:04 UTC (permalink / raw) To: Björn Töpel Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, Mar 02, 2021 at 11:57:58AM -0800, Paul E. McKenney wrote: > On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: > > Hi! > > > > Firstly; The long Cc-list is to reach the LKMM-folks. > > > > Some background; the XDP sockets use a ring-buffer to communicate > > between the kernel and userland. It's a > > single-consumer/single-producer ring, and described in > > net/xdp/xsk_queue.h. > > > > --8<--- > > /* The structure of the shared state of the rings are the same as the > > * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion > > * ring, the kernel is the producer and user space is the consumer. For > > * the Tx and fill rings, the kernel is the consumer and user space is > > * the producer. > > * > > * producer consumer > > * > > * if (LOAD ->consumer) { LOAD ->producer > > * (A) smp_rmb() (C) > > * STORE $data LOAD $data > > * smp_wmb() (B) smp_mb() (D) > > * STORE ->producer STORE ->consumer > > * } > > * > > * (A) pairs with (D), and (B) pairs with (C). > > ... > > -->8--- > > > > I'd like to replace the smp_{r,w,}mb() barriers with acquire-release > > semantics [1], without breaking existing userspace applications. > > > > So, I figured I'd use herd7 and the LKMM model to build a litmus test > > for the barrier version, then for the acquire-release version, and > > finally permutations of both. > > > > The idea is to use a one element ring, with a state machine outlined > > in the litmus test. > > > > The basic test for the existing smp_{r,w,}mb() barriers looks like: > > > > $ cat spsc-rb+1p1c.litmus > > C spsc-rb+1p1c > > > > // Stupid one entry ring: > > // prod cons allowed action prod cons > > // 0 0 => prod => 1 0 > > // 0 1 => cons => 0 0 > > // 1 0 => cons => 1 1 > > // 1 1 => prod => 0 1 > > > > { prod = 1; } > > > > // Here, we start at prod==1,cons==0, data==0, i.e. producer has > > // written data=0, so from here only the consumer can start, and should > > // consume data==0. Afterwards, producer can continue and write 1 to > > // data. Can we enter state prod==0, cons==1, but consumer observerd > > // the write of 1? > > > > P0(int *prod, int *cons, int *data) > > { > > int p; > > int c; > > int cond = 0; > > > > p = READ_ONCE(*prod); > > c = READ_ONCE(*cons); > > if (p == 0) > > if (c == 0) > > cond = 1; > > if (p == 1) > > if (c == 1) > > cond = 1; > > > > if (cond) { > > smp_mb(); > > WRITE_ONCE(*data, 1); > > smp_wmb(); > > WRITE_ONCE(*prod, p ^ 1); > > } > > } > > > > P1(int *prod, int *cons, int *data) > > { > > int p; > > int c; > > int d = -1; > > int cond = 0; > > > > p = READ_ONCE(*prod); > > c = READ_ONCE(*cons); > > if (p == 1) > > if (c == 0) > > cond = 1; > > if (p == 0) > > if (c == 1) > > cond = 1; > > > > if (cond == 1) { > > smp_rmb(); > > d = READ_ONCE(*data); > > smp_mb(); > > WRITE_ONCE(*cons, c ^ 1); > > } > > } > > Before digging in too deeply, does the following simplification > still capture your intent? > > P0(int *prod, int *cons, int *data) > { > int p; > int cond = 0; > > p = READ_ONCE(*prod); > if (p == READ_ONCE(*cons)) > cond = 1; > if (cond) { > smp_mb(); > WRITE_ONCE(*data, 1); > smp_wmb(); > WRITE_ONCE(*prod, p ^ 1); > } > } > > P1(int *prod, int *cons, int *data) > { > int c; > int d = -1; > int cond = 0; > > c = READ_ONCE(*cons); > if (READ_ONCE(*prod) == c) > cond = 1; > > if (cond == 1) { > smp_rmb(); > d = READ_ONCE(*data); > smp_mb(); > WRITE_ONCE(*cons, c ^ 1); > } > } And if the answer is "yes", how about this one? P0(int *prod, int *cons, int *data) { int p; p = READ_ONCE(*prod); if (p == READ_ONCE(*cons)) { WRITE_ONCE(*data, 1); smp_wmb(); WRITE_ONCE(*prod, p ^ 1); } } P1(int *prod, int *cons, int *data) { int c; int d = -1; c = READ_ONCE(*cons); if (READ_ONCE(*prod) == c) { smp_rmb(); d = READ_ONCE(*data); smp_mb(); WRITE_ONCE(*cons, c ^ 1); } } Thanx, Paul > > exists( 1:d=1 /\ prod=0 /\ cons=1 ); > > > > -- > > > > The weird state changing if-statements is because that I didn't get > > '&&' and '||' to work with herd. > > > > When this is run: > > > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > > Test spsc-rb+1p1c Allowed > > States 2 > > 1:d=0; cons=1; prod=0; > > 1:d=0; cons=1; prod=1; > > No > > Witnesses > > Positive: 0 Negative: 2 > > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > > Observation spsc-rb+1p1c Never 0 2 > > Time spsc-rb+1p1c 0.04 > > Hash=b399756d6a1301ca5bda042f32130791 > > > > Now to my question; In P0 there's an smp_mb(). Without that, the d==1 > > can be observed from P1 (consumer): > > > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > > Test spsc-rb+1p1c Allowed > > States 3 > > 1:d=0; cons=1; prod=0; > > 1:d=0; cons=1; prod=1; > > 1:d=1; cons=1; prod=0; > > Ok > > Witnesses > > Positive: 1 Negative: 2 > > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > > Observation spsc-rb+1p1c Sometimes 1 2 > > Time spsc-rb+1p1c 0.04 > > Hash=0047fc21fa77da9a9aee15e35ec367ef > > > > In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending > > on control dependencies") removes the corresponding smp_mb(), and also > > the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5 > > ("documentation: Update circular buffer for > > load-acquire/store-release")) is missing the smp_mb() at the > > producer-side. > > > > I'm trying to wrap my head around why it's OK to remove the smp_mb() > > in the cases above? I'm worried that the current XDP socket ring > > implementation (which is missing smp_mb()) might be broken. > > > > > > If you read this far, thanks! :-) > > Björn > > > > > > [1] https://lore.kernel.org/bpf/20210301104318.263262-2-bjorn.topel@gmail.com/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 20:04 ` Paul E. McKenney @ 2021-03-02 20:37 ` Björn Töpel 0 siblings, 0 replies; 33+ messages in thread From: Björn Töpel @ 2021-03-02 20:37 UTC (permalink / raw) To: paulmck Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, 2 Mar 2021 at 21:04, Paul E. McKenney <paulmck@kernel.org> wrote: > [...] > > And if the answer is "yes", how about this one? > With the == to != change in P1, this is OK! > P0(int *prod, int *cons, int *data) > { > int p; > > p = READ_ONCE(*prod); > if (p == READ_ONCE(*cons)) { ...and now d==1 is not a valid state anymore according to herd. If think I need some input here! :-D > WRITE_ONCE(*data, 1); > smp_wmb(); > WRITE_ONCE(*prod, p ^ 1); > } > } > > P1(int *prod, int *cons, int *data) > { > int c; > int d = -1; > > c = READ_ONCE(*cons); > if (READ_ONCE(*prod) == c) { > smp_rmb(); > d = READ_ONCE(*data); > smp_mb(); > WRITE_ONCE(*cons, c ^ 1); > } > } > [...] Björn ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 19:57 ` Paul E. McKenney 2021-03-02 20:04 ` Paul E. McKenney @ 2021-03-02 20:24 ` Björn Töpel 2021-03-02 20:41 ` Paul E. McKenney 1 sibling, 1 reply; 33+ messages in thread From: Björn Töpel @ 2021-03-02 20:24 UTC (permalink / raw) To: paulmck Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, 2 Mar 2021 at 20:57, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: [...] > > Before digging in too deeply, does the following simplification > still capture your intent? > Thanks for having a look, Paul! > P0(int *prod, int *cons, int *data) > { > int p; > int cond = 0; > > p = READ_ONCE(*prod); > if (p == READ_ONCE(*cons)) > cond = 1; With this, yes! > if (cond) { > smp_mb(); > WRITE_ONCE(*data, 1); > smp_wmb(); > WRITE_ONCE(*prod, p ^ 1); > } > } > > P1(int *prod, int *cons, int *data) > { > int c; > int d = -1; > int cond = 0; > > c = READ_ONCE(*cons); > if (READ_ONCE(*prod) == c) > cond = 1; Hmm, this would not be the correct state transition. c==1 && p==1 would set cond to 1, right? I would agree with: c = READ_ONCE(*cons); if (READ_ONCE(*prod) != c) > > if (cond == 1) { > smp_rmb(); > d = READ_ONCE(*data); > smp_mb(); > WRITE_ONCE(*cons, c ^ 1); > } > } > > Thanx, Paul > [...] Björn ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 20:24 ` Björn Töpel @ 2021-03-02 20:41 ` Paul E. McKenney 2021-03-02 20:51 ` Björn Töpel 0 siblings, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2021-03-02 20:41 UTC (permalink / raw) To: Björn Töpel Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, Mar 02, 2021 at 09:24:04PM +0100, Björn Töpel wrote: > On Tue, 2 Mar 2021 at 20:57, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: > > [...] > > > > > Before digging in too deeply, does the following simplification > > still capture your intent? > > > > Thanks for having a look, Paul! > > > P0(int *prod, int *cons, int *data) > > { > > int p; > > int cond = 0; > > > > p = READ_ONCE(*prod); > > if (p == READ_ONCE(*cons)) > > cond = 1; > > With this, yes! > > > if (cond) { > > smp_mb(); > > WRITE_ONCE(*data, 1); > > smp_wmb(); > > WRITE_ONCE(*prod, p ^ 1); > > } > > } > > > > P1(int *prod, int *cons, int *data) > > { > > int c; > > int d = -1; > > int cond = 0; > > > > c = READ_ONCE(*cons); > > if (READ_ONCE(*prod) == c) > > cond = 1; > > Hmm, this would not be the correct state transition. > > c==1 && p==1 would set cond to 1, right? > > I would agree with: > c = READ_ONCE(*cons); > if (READ_ONCE(*prod) != c) Right you are! With that, it looks to me like LKMM is OK with removing the smp_mb(). My guess is that the issue is that LKMM confines the effect of control dependencies to a single "if" statement, hence my reworking of your original. Thanx, Paul > > > > if (cond == 1) { > > smp_rmb(); > > d = READ_ONCE(*data); > > smp_mb(); > > WRITE_ONCE(*cons, c ^ 1); > > } > > } > > > > Thanx, Paul > > > > [...] > > Björn ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 20:41 ` Paul E. McKenney @ 2021-03-02 20:51 ` Björn Töpel 0 siblings, 0 replies; 33+ messages in thread From: Björn Töpel @ 2021-03-02 20:51 UTC (permalink / raw) To: paulmck Cc: bpf, LKML, stern, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, 2 Mar 2021 at 21:41, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Mar 02, 2021 at 09:24:04PM +0100, Björn Töpel wrote: > > On Tue, 2 Mar 2021 at 20:57, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: > > > > [...] > > > > > > > > Before digging in too deeply, does the following simplification > > > still capture your intent? > > > > > > > Thanks for having a look, Paul! > > > > > P0(int *prod, int *cons, int *data) > > > { > > > int p; > > > int cond = 0; > > > > > > p = READ_ONCE(*prod); > > > if (p == READ_ONCE(*cons)) > > > cond = 1; > > > > With this, yes! > > > > > if (cond) { > > > smp_mb(); > > > WRITE_ONCE(*data, 1); > > > smp_wmb(); > > > WRITE_ONCE(*prod, p ^ 1); > > > } > > > } > > > > > > P1(int *prod, int *cons, int *data) > > > { > > > int c; > > > int d = -1; > > > int cond = 0; > > > > > > c = READ_ONCE(*cons); > > > if (READ_ONCE(*prod) == c) > > > cond = 1; > > > > Hmm, this would not be the correct state transition. > > > > c==1 && p==1 would set cond to 1, right? > > > > I would agree with: > > c = READ_ONCE(*cons); > > if (READ_ONCE(*prod) != c) > > Right you are! > > With that, it looks to me like LKMM is OK with removing the smp_mb(). > My guess is that the issue is that LKMM confines the effect of control > dependencies to a single "if" statement, hence my reworking of your > original. > Interesting! I tried the acquire/release version: P0(int *prod, int *cons, int *data) { int p; int cond = 0; p = READ_ONCE(*prod); if (p == READ_ONCE(*cons)) { WRITE_ONCE(*data, 1); smp_store_release(prod, p ^ 1); } } P1(int *prod, int *cons, int *data) { int c; int d = -1; c = READ_ONCE(*cons); if (smp_load_acquire(prod) != c) { d = READ_ONCE(*data); smp_store_release(cons, c ^ 1); } } and as with the previous example, restructuring the if-statement makes "if (p == READ_ONCE(*cons)) {" sufficient, instead of "if (p == smp_load_acquire(cons)) {". Yay! Björn > Thanx, Paul > > > > > > > if (cond == 1) { > > > smp_rmb(); > > > d = READ_ONCE(*data); > > > smp_mb(); > > > WRITE_ONCE(*cons, c ^ 1); > > > } > > > } > > > > > > Thanx, Paul > > > > > > > [...] > > > > Björn ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 18:46 XDP socket rings, and LKMM litmus tests Björn Töpel 2021-03-02 19:57 ` Paul E. McKenney @ 2021-03-02 21:14 ` Alan Stern 2021-03-02 23:50 ` Paul E. McKenney 1 sibling, 1 reply; 33+ messages in thread From: Alan Stern @ 2021-03-02 21:14 UTC (permalink / raw) To: Björn Töpel Cc: bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, paulmck, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: > Hi! > > Firstly; The long Cc-list is to reach the LKMM-folks. > > Some background; the XDP sockets use a ring-buffer to communicate > between the kernel and userland. It's a > single-consumer/single-producer ring, and described in > net/xdp/xsk_queue.h. > > --8<--- > /* The structure of the shared state of the rings are the same as the > * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion > * ring, the kernel is the producer and user space is the consumer. For > * the Tx and fill rings, the kernel is the consumer and user space is > * the producer. > * > * producer consumer > * > * if (LOAD ->consumer) { LOAD ->producer > * (A) smp_rmb() (C) > * STORE $data LOAD $data > * smp_wmb() (B) smp_mb() (D) > * STORE ->producer STORE ->consumer > * } > * > * (A) pairs with (D), and (B) pairs with (C). > ... > -->8--- > > I'd like to replace the smp_{r,w,}mb() barriers with acquire-release > semantics [1], without breaking existing userspace applications. > > So, I figured I'd use herd7 and the LKMM model to build a litmus test > for the barrier version, then for the acquire-release version, and > finally permutations of both. > > The idea is to use a one element ring, with a state machine outlined > in the litmus test. > > The basic test for the existing smp_{r,w,}mb() barriers looks like: > > $ cat spsc-rb+1p1c.litmus > C spsc-rb+1p1c > > // Stupid one entry ring: > // prod cons allowed action prod cons > // 0 0 => prod => 1 0 > // 0 1 => cons => 0 0 > // 1 0 => cons => 1 1 > // 1 1 => prod => 0 1 > > { prod = 1; } > > // Here, we start at prod==1,cons==0, data==0, i.e. producer has > // written data=0, so from here only the consumer can start, and should > // consume data==0. Afterwards, producer can continue and write 1 to > // data. Can we enter state prod==0, cons==1, but consumer observerd > // the write of 1? > > P0(int *prod, int *cons, int *data) > { > int p; > int c; > int cond = 0; > > p = READ_ONCE(*prod); > c = READ_ONCE(*cons); > if (p == 0) > if (c == 0) > cond = 1; > if (p == 1) > if (c == 1) > cond = 1; > > if (cond) { > smp_mb(); > WRITE_ONCE(*data, 1); > smp_wmb(); > WRITE_ONCE(*prod, p ^ 1); > } > } > > P1(int *prod, int *cons, int *data) > { > int p; > int c; > int d = -1; > int cond = 0; > > p = READ_ONCE(*prod); > c = READ_ONCE(*cons); > if (p == 1) > if (c == 0) > cond = 1; > if (p == 0) > if (c == 1) > cond = 1; > > if (cond == 1) { > smp_rmb(); > d = READ_ONCE(*data); > smp_mb(); > WRITE_ONCE(*cons, c ^ 1); > } > } > > exists( 1:d=1 /\ prod=0 /\ cons=1 ); > > -- > > The weird state changing if-statements is because that I didn't get > '&&' and '||' to work with herd. > > When this is run: > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > Test spsc-rb+1p1c Allowed > States 2 > 1:d=0; cons=1; prod=0; > 1:d=0; cons=1; prod=1; > No > Witnesses > Positive: 0 Negative: 2 > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > Observation spsc-rb+1p1c Never 0 2 > Time spsc-rb+1p1c 0.04 > Hash=b399756d6a1301ca5bda042f32130791 > > Now to my question; In P0 there's an smp_mb(). Without that, the d==1 > can be observed from P1 (consumer): > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > Test spsc-rb+1p1c Allowed > States 3 > 1:d=0; cons=1; prod=0; > 1:d=0; cons=1; prod=1; > 1:d=1; cons=1; prod=0; > Ok > Witnesses > Positive: 1 Negative: 2 > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > Observation spsc-rb+1p1c Sometimes 1 2 > Time spsc-rb+1p1c 0.04 > Hash=0047fc21fa77da9a9aee15e35ec367ef This result is wrong, apparently because of a bug in herd7. There should be control dependencies from each of the two loads in P0 to each of the two stores, but herd7 doesn't detect them. Maybe Luc can find some time to check whether this really is a bug and if it is, fix it. > In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending > on control dependencies") removes the corresponding smp_mb(), and also > the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5 > ("documentation: Update circular buffer for > load-acquire/store-release")) is missing the smp_mb() at the > producer-side. > > I'm trying to wrap my head around why it's OK to remove the smp_mb() > in the cases above? I'm worried that the current XDP socket ring > implementation (which is missing smp_mb()) might be broken. Because of the control dependencies, the smp_mb isn't needed. The dependencies will order both of the stores after both of the loads. Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 21:14 ` Alan Stern @ 2021-03-02 23:50 ` Paul E. McKenney 2021-03-03 6:37 ` maranget 2021-03-03 17:12 ` Alan Stern 0 siblings, 2 replies; 33+ messages in thread From: Paul E. McKenney @ 2021-03-02 23:50 UTC (permalink / raw) To: Alan Stern Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote: > On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: > > Hi! > > > > Firstly; The long Cc-list is to reach the LKMM-folks. > > > > Some background; the XDP sockets use a ring-buffer to communicate > > between the kernel and userland. It's a > > single-consumer/single-producer ring, and described in > > net/xdp/xsk_queue.h. > > > > --8<--- > > /* The structure of the shared state of the rings are the same as the > > * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion > > * ring, the kernel is the producer and user space is the consumer. For > > * the Tx and fill rings, the kernel is the consumer and user space is > > * the producer. > > * > > * producer consumer > > * > > * if (LOAD ->consumer) { LOAD ->producer > > * (A) smp_rmb() (C) > > * STORE $data LOAD $data > > * smp_wmb() (B) smp_mb() (D) > > * STORE ->producer STORE ->consumer > > * } > > * > > * (A) pairs with (D), and (B) pairs with (C). > > ... > > -->8--- > > > > I'd like to replace the smp_{r,w,}mb() barriers with acquire-release > > semantics [1], without breaking existing userspace applications. > > > > So, I figured I'd use herd7 and the LKMM model to build a litmus test > > for the barrier version, then for the acquire-release version, and > > finally permutations of both. > > > > The idea is to use a one element ring, with a state machine outlined > > in the litmus test. > > > > The basic test for the existing smp_{r,w,}mb() barriers looks like: > > > > $ cat spsc-rb+1p1c.litmus > > C spsc-rb+1p1c > > > > // Stupid one entry ring: > > // prod cons allowed action prod cons > > // 0 0 => prod => 1 0 > > // 0 1 => cons => 0 0 > > // 1 0 => cons => 1 1 > > // 1 1 => prod => 0 1 > > > > { prod = 1; } > > > > // Here, we start at prod==1,cons==0, data==0, i.e. producer has > > // written data=0, so from here only the consumer can start, and should > > // consume data==0. Afterwards, producer can continue and write 1 to > > // data. Can we enter state prod==0, cons==1, but consumer observerd > > // the write of 1? > > > > P0(int *prod, int *cons, int *data) > > { > > int p; > > int c; > > int cond = 0; > > > > p = READ_ONCE(*prod); > > c = READ_ONCE(*cons); > > if (p == 0) > > if (c == 0) > > cond = 1; > > if (p == 1) > > if (c == 1) > > cond = 1; > > > > if (cond) { > > smp_mb(); > > WRITE_ONCE(*data, 1); > > smp_wmb(); > > WRITE_ONCE(*prod, p ^ 1); > > } > > } > > > > P1(int *prod, int *cons, int *data) > > { > > int p; > > int c; > > int d = -1; > > int cond = 0; > > > > p = READ_ONCE(*prod); > > c = READ_ONCE(*cons); > > if (p == 1) > > if (c == 0) > > cond = 1; > > if (p == 0) > > if (c == 1) > > cond = 1; > > > > if (cond == 1) { > > smp_rmb(); > > d = READ_ONCE(*data); > > smp_mb(); > > WRITE_ONCE(*cons, c ^ 1); > > } > > } > > > > exists( 1:d=1 /\ prod=0 /\ cons=1 ); > > > > -- > > > > The weird state changing if-statements is because that I didn't get > > '&&' and '||' to work with herd. > > > > When this is run: > > > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > > Test spsc-rb+1p1c Allowed > > States 2 > > 1:d=0; cons=1; prod=0; > > 1:d=0; cons=1; prod=1; > > No > > Witnesses > > Positive: 0 Negative: 2 > > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > > Observation spsc-rb+1p1c Never 0 2 > > Time spsc-rb+1p1c 0.04 > > Hash=b399756d6a1301ca5bda042f32130791 > > > > Now to my question; In P0 there's an smp_mb(). Without that, the d==1 > > can be observed from P1 (consumer): > > > > $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > > Test spsc-rb+1p1c Allowed > > States 3 > > 1:d=0; cons=1; prod=0; > > 1:d=0; cons=1; prod=1; > > 1:d=1; cons=1; prod=0; > > Ok > > Witnesses > > Positive: 1 Negative: 2 > > Condition exists (1:d=1 /\ prod=0 /\ cons=1) > > Observation spsc-rb+1p1c Sometimes 1 2 > > Time spsc-rb+1p1c 0.04 > > Hash=0047fc21fa77da9a9aee15e35ec367ef > > This result is wrong, apparently because of a bug in herd7. There > should be control dependencies from each of the two loads in P0 to each > of the two stores, but herd7 doesn't detect them. > > Maybe Luc can find some time to check whether this really is a bug and > if it is, fix it. I agree that herd7's control dependency tracking could be improved. But sadly, it is currently doing exactly what I asked Luc to make it do, which is to confine the control dependency to its "if" statement. But as usual I wasn't thinking globally enough. And I am not exactly sure what to ask for. Here a store to a local was control-dependency ordered after a read, and so that should propagate to a read from that local variable. Maybe treat local variables as if they were registers, so that from herd7's viewpoint the READ_ONCE()s are able to head control-dependency chains in multiple "if" statements? Thoughts? Thanx, Paul > > In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending > > on control dependencies") removes the corresponding smp_mb(), and also > > the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5 > > ("documentation: Update circular buffer for > > load-acquire/store-release")) is missing the smp_mb() at the > > producer-side. > > > > I'm trying to wrap my head around why it's OK to remove the smp_mb() > > in the cases above? I'm worried that the current XDP socket ring > > implementation (which is missing smp_mb()) might be broken. > > Because of the control dependencies, the smp_mb isn't needed. The > dependencies will order both of the stores after both of the loads. > > Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 23:50 ` Paul E. McKenney @ 2021-03-03 6:37 ` maranget 2021-03-03 16:54 ` Paul E. McKenney 2021-03-03 17:12 ` Alan Stern 1 sibling, 1 reply; 33+ messages in thread From: maranget @ 2021-03-03 6:37 UTC (permalink / raw) To: paulmck Cc: Alan Stern, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus Hi all, I agree that herd7 computation of control dependencies is problematic here. Thanks for reporting the problem. Paul kindly takes the responsibility for the problem, but frankly do not remember having followed his recommendations here. I am quite busy today and canot really work on the issue. However my intuition is that control dependencies follow from chains of control dependencies in if (say “direct” control dependencies, roughly from "if" condition to if statement) and data dependencies, ending in control. —Luc > On 3 Mar 2021, at 00:50, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote: >> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: >>> Hi! >>> >>> Firstly; The long Cc-list is to reach the LKMM-folks. >>> >>> Some background; the XDP sockets use a ring-buffer to communicate >>> between the kernel and userland. It's a >>> single-consumer/single-producer ring, and described in >>> net/xdp/xsk_queue.h. >>> >>> --8<--- >>> /* The structure of the shared state of the rings are the same as the >>> * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion >>> * ring, the kernel is the producer and user space is the consumer. For >>> * the Tx and fill rings, the kernel is the consumer and user space is >>> * the producer. >>> * >>> * producer consumer >>> * >>> * if (LOAD ->consumer) { LOAD ->producer >>> * (A) smp_rmb() (C) >>> * STORE $data LOAD $data >>> * smp_wmb() (B) smp_mb() (D) >>> * STORE ->producer STORE ->consumer >>> * } >>> * >>> * (A) pairs with (D), and (B) pairs with (C). >>> ... >>> -->8--- >>> >>> I'd like to replace the smp_{r,w,}mb() barriers with acquire-release >>> semantics [1], without breaking existing userspace applications. >>> >>> So, I figured I'd use herd7 and the LKMM model to build a litmus test >>> for the barrier version, then for the acquire-release version, and >>> finally permutations of both. >>> >>> The idea is to use a one element ring, with a state machine outlined >>> in the litmus test. >>> >>> The basic test for the existing smp_{r,w,}mb() barriers looks like: >>> >>> $ cat spsc-rb+1p1c.litmus >>> C spsc-rb+1p1c >>> >>> // Stupid one entry ring: >>> // prod cons allowed action prod cons >>> // 0 0 => prod => 1 0 >>> // 0 1 => cons => 0 0 >>> // 1 0 => cons => 1 1 >>> // 1 1 => prod => 0 1 >>> >>> { prod = 1; } >>> >>> // Here, we start at prod==1,cons==0, data==0, i.e. producer has >>> // written data=0, so from here only the consumer can start, and should >>> // consume data==0. Afterwards, producer can continue and write 1 to >>> // data. Can we enter state prod==0, cons==1, but consumer observerd >>> // the write of 1? >>> >>> P0(int *prod, int *cons, int *data) >>> { >>> int p; >>> int c; >>> int cond = 0; >>> >>> p = READ_ONCE(*prod); >>> c = READ_ONCE(*cons); >>> if (p == 0) >>> if (c == 0) >>> cond = 1; >>> if (p == 1) >>> if (c == 1) >>> cond = 1; >>> >>> if (cond) { >>> smp_mb(); >>> WRITE_ONCE(*data, 1); >>> smp_wmb(); >>> WRITE_ONCE(*prod, p ^ 1); >>> } >>> } >>> >>> P1(int *prod, int *cons, int *data) >>> { >>> int p; >>> int c; >>> int d = -1; >>> int cond = 0; >>> >>> p = READ_ONCE(*prod); >>> c = READ_ONCE(*cons); >>> if (p == 1) >>> if (c == 0) >>> cond = 1; >>> if (p == 0) >>> if (c == 1) >>> cond = 1; >>> >>> if (cond == 1) { >>> smp_rmb(); >>> d = READ_ONCE(*data); >>> smp_mb(); >>> WRITE_ONCE(*cons, c ^ 1); >>> } >>> } >>> >>> exists( 1:d=1 /\ prod=0 /\ cons=1 ); >>> >>> -- >>> >>> The weird state changing if-statements is because that I didn't get >>> '&&' and '||' to work with herd. >>> >>> When this is run: >>> >>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus >>> Test spsc-rb+1p1c Allowed >>> States 2 >>> 1:d=0; cons=1; prod=0; >>> 1:d=0; cons=1; prod=1; >>> No >>> Witnesses >>> Positive: 0 Negative: 2 >>> Condition exists (1:d=1 /\ prod=0 /\ cons=1) >>> Observation spsc-rb+1p1c Never 0 2 >>> Time spsc-rb+1p1c 0.04 >>> Hash=b399756d6a1301ca5bda042f32130791 >>> >>> Now to my question; In P0 there's an smp_mb(). Without that, the d==1 >>> can be observed from P1 (consumer): >>> >>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus >>> Test spsc-rb+1p1c Allowed >>> States 3 >>> 1:d=0; cons=1; prod=0; >>> 1:d=0; cons=1; prod=1; >>> 1:d=1; cons=1; prod=0; >>> Ok >>> Witnesses >>> Positive: 1 Negative: 2 >>> Condition exists (1:d=1 /\ prod=0 /\ cons=1) >>> Observation spsc-rb+1p1c Sometimes 1 2 >>> Time spsc-rb+1p1c 0.04 >>> Hash=0047fc21fa77da9a9aee15e35ec367ef >> >> This result is wrong, apparently because of a bug in herd7. There >> should be control dependencies from each of the two loads in P0 to each >> of the two stores, but herd7 doesn't detect them. >> >> Maybe Luc can find some time to check whether this really is a bug and >> if it is, fix it. > > I agree that herd7's control dependency tracking could be improved. > > But sadly, it is currently doing exactly what I asked Luc to make it do, > which is to confine the control dependency to its "if" statement. But as > usual I wasn't thinking globally enough. And I am not exactly sure what > to ask for. Here a store to a local was control-dependency ordered after > a read, and so that should propagate to a read from that local variable. > Maybe treat local variables as if they were registers, so that from > herd7's viewpoint the READ_ONCE()s are able to head control-dependency > chains in multiple "if" statements? > > Thoughts? > > Thanx, Paul > >>> In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending >>> on control dependencies") removes the corresponding smp_mb(), and also >>> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5 >>> ("documentation: Update circular buffer for >>> load-acquire/store-release")) is missing the smp_mb() at the >>> producer-side. >>> >>> I'm trying to wrap my head around why it's OK to remove the smp_mb() >>> in the cases above? I'm worried that the current XDP socket ring >>> implementation (which is missing smp_mb()) might be broken. >> >> Because of the control dependencies, the smp_mb isn't needed. The >> dependencies will order both of the stores after both of the loads. >> >> Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 6:37 ` maranget @ 2021-03-03 16:54 ` Paul E. McKenney 0 siblings, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2021-03-03 16:54 UTC (permalink / raw) To: maranget Cc: Alan Stern, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 07:37:42AM +0100, maranget wrote: > Hi all, > > I agree that herd7 computation of control dependencies is problematic here. > > Thanks for reporting the problem. Paul kindly takes the responsibility for the problem, but frankly do not remember having followed his recommendations here. Accidents happen? ;-) > I am quite busy today and canot really work on the issue. However my intuition is that control dependencies follow from chains of control dependencies in if (say “direct” control dependencies, roughly from "if" condition to if statement) and data dependencies, ending in control. My more operational line of thinking is to treat each local variable as if it was a machine register, then analyze based on hardware control dependencies, except that (as before) an LKMM control dependency is confined to its "if" statement. Stores to and reloads from shared variables would be handled as they are now, correct? My guess is that this turns out to be equivalent to your intuition, but this stuff can be tricky. Thanx, Paul > —Luc > > > On 3 Mar 2021, at 00:50, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote: > >> On Tue, Mar 02, 2021 at 07:46:27PM +0100, Björn Töpel wrote: > >>> Hi! > >>> > >>> Firstly; The long Cc-list is to reach the LKMM-folks. > >>> > >>> Some background; the XDP sockets use a ring-buffer to communicate > >>> between the kernel and userland. It's a > >>> single-consumer/single-producer ring, and described in > >>> net/xdp/xsk_queue.h. > >>> > >>> --8<--- > >>> /* The structure of the shared state of the rings are the same as the > >>> * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion > >>> * ring, the kernel is the producer and user space is the consumer. For > >>> * the Tx and fill rings, the kernel is the consumer and user space is > >>> * the producer. > >>> * > >>> * producer consumer > >>> * > >>> * if (LOAD ->consumer) { LOAD ->producer > >>> * (A) smp_rmb() (C) > >>> * STORE $data LOAD $data > >>> * smp_wmb() (B) smp_mb() (D) > >>> * STORE ->producer STORE ->consumer > >>> * } > >>> * > >>> * (A) pairs with (D), and (B) pairs with (C). > >>> ... > >>> -->8--- > >>> > >>> I'd like to replace the smp_{r,w,}mb() barriers with acquire-release > >>> semantics [1], without breaking existing userspace applications. > >>> > >>> So, I figured I'd use herd7 and the LKMM model to build a litmus test > >>> for the barrier version, then for the acquire-release version, and > >>> finally permutations of both. > >>> > >>> The idea is to use a one element ring, with a state machine outlined > >>> in the litmus test. > >>> > >>> The basic test for the existing smp_{r,w,}mb() barriers looks like: > >>> > >>> $ cat spsc-rb+1p1c.litmus > >>> C spsc-rb+1p1c > >>> > >>> // Stupid one entry ring: > >>> // prod cons allowed action prod cons > >>> // 0 0 => prod => 1 0 > >>> // 0 1 => cons => 0 0 > >>> // 1 0 => cons => 1 1 > >>> // 1 1 => prod => 0 1 > >>> > >>> { prod = 1; } > >>> > >>> // Here, we start at prod==1,cons==0, data==0, i.e. producer has > >>> // written data=0, so from here only the consumer can start, and should > >>> // consume data==0. Afterwards, producer can continue and write 1 to > >>> // data. Can we enter state prod==0, cons==1, but consumer observerd > >>> // the write of 1? > >>> > >>> P0(int *prod, int *cons, int *data) > >>> { > >>> int p; > >>> int c; > >>> int cond = 0; > >>> > >>> p = READ_ONCE(*prod); > >>> c = READ_ONCE(*cons); > >>> if (p == 0) > >>> if (c == 0) > >>> cond = 1; > >>> if (p == 1) > >>> if (c == 1) > >>> cond = 1; > >>> > >>> if (cond) { > >>> smp_mb(); > >>> WRITE_ONCE(*data, 1); > >>> smp_wmb(); > >>> WRITE_ONCE(*prod, p ^ 1); > >>> } > >>> } > >>> > >>> P1(int *prod, int *cons, int *data) > >>> { > >>> int p; > >>> int c; > >>> int d = -1; > >>> int cond = 0; > >>> > >>> p = READ_ONCE(*prod); > >>> c = READ_ONCE(*cons); > >>> if (p == 1) > >>> if (c == 0) > >>> cond = 1; > >>> if (p == 0) > >>> if (c == 1) > >>> cond = 1; > >>> > >>> if (cond == 1) { > >>> smp_rmb(); > >>> d = READ_ONCE(*data); > >>> smp_mb(); > >>> WRITE_ONCE(*cons, c ^ 1); > >>> } > >>> } > >>> > >>> exists( 1:d=1 /\ prod=0 /\ cons=1 ); > >>> > >>> -- > >>> > >>> The weird state changing if-statements is because that I didn't get > >>> '&&' and '||' to work with herd. > >>> > >>> When this is run: > >>> > >>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > >>> Test spsc-rb+1p1c Allowed > >>> States 2 > >>> 1:d=0; cons=1; prod=0; > >>> 1:d=0; cons=1; prod=1; > >>> No > >>> Witnesses > >>> Positive: 0 Negative: 2 > >>> Condition exists (1:d=1 /\ prod=0 /\ cons=1) > >>> Observation spsc-rb+1p1c Never 0 2 > >>> Time spsc-rb+1p1c 0.04 > >>> Hash=b399756d6a1301ca5bda042f32130791 > >>> > >>> Now to my question; In P0 there's an smp_mb(). Without that, the d==1 > >>> can be observed from P1 (consumer): > >>> > >>> $ herd7 -conf linux-kernel.cfg litmus-tests/spsc-rb+1p1c.litmus > >>> Test spsc-rb+1p1c Allowed > >>> States 3 > >>> 1:d=0; cons=1; prod=0; > >>> 1:d=0; cons=1; prod=1; > >>> 1:d=1; cons=1; prod=0; > >>> Ok > >>> Witnesses > >>> Positive: 1 Negative: 2 > >>> Condition exists (1:d=1 /\ prod=0 /\ cons=1) > >>> Observation spsc-rb+1p1c Sometimes 1 2 > >>> Time spsc-rb+1p1c 0.04 > >>> Hash=0047fc21fa77da9a9aee15e35ec367ef > >> > >> This result is wrong, apparently because of a bug in herd7. There > >> should be control dependencies from each of the two loads in P0 to each > >> of the two stores, but herd7 doesn't detect them. > >> > >> Maybe Luc can find some time to check whether this really is a bug and > >> if it is, fix it. > > > > I agree that herd7's control dependency tracking could be improved. > > > > But sadly, it is currently doing exactly what I asked Luc to make it do, > > which is to confine the control dependency to its "if" statement. But as > > usual I wasn't thinking globally enough. And I am not exactly sure what > > to ask for. Here a store to a local was control-dependency ordered after > > a read, and so that should propagate to a read from that local variable. > > Maybe treat local variables as if they were registers, so that from > > herd7's viewpoint the READ_ONCE()s are able to head control-dependency > > chains in multiple "if" statements? > > > > Thoughts? > > > > Thanx, Paul > > > >>> In commit c7f2e3cd6c1f ("perf: Optimize ring-buffer write by depending > >>> on control dependencies") removes the corresponding smp_mb(), and also > >>> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5 > >>> ("documentation: Update circular buffer for > >>> load-acquire/store-release")) is missing the smp_mb() at the > >>> producer-side. > >>> > >>> I'm trying to wrap my head around why it's OK to remove the smp_mb() > >>> in the cases above? I'm worried that the current XDP socket ring > >>> implementation (which is missing smp_mb()) might be broken. > >> > >> Because of the control dependencies, the smp_mb isn't needed. The > >> dependencies will order both of the stores after both of the loads. > >> > >> Alan Stern > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-02 23:50 ` Paul E. McKenney 2021-03-03 6:37 ` maranget @ 2021-03-03 17:12 ` Alan Stern 2021-03-03 17:37 ` maranget 2021-03-03 17:40 ` Paul E. McKenney 1 sibling, 2 replies; 33+ messages in thread From: Alan Stern @ 2021-03-03 17:12 UTC (permalink / raw) To: Paul E. McKenney Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote: > On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote: > > This result is wrong, apparently because of a bug in herd7. There > > should be control dependencies from each of the two loads in P0 to each > > of the two stores, but herd7 doesn't detect them. > > > > Maybe Luc can find some time to check whether this really is a bug and > > if it is, fix it. > > I agree that herd7's control dependency tracking could be improved. > > But sadly, it is currently doing exactly what I asked Luc to make it do, > which is to confine the control dependency to its "if" statement. But as > usual I wasn't thinking globally enough. And I am not exactly sure what > to ask for. Here a store to a local was control-dependency ordered after > a read, and so that should propagate to a read from that local variable. > Maybe treat local variables as if they were registers, so that from > herd7's viewpoint the READ_ONCE()s are able to head control-dependency > chains in multiple "if" statements? > > Thoughts? Local variables absolutely should be treated just like CPU registers, if possible. In fact, the compiler has the option of keeping local variables stored in registers. (Of course, things may get complicated if anyone writes a litmus test that uses a pointer to a local variable, Especially if the pointer could hold the address of a local variable in one execution and a shared variable in another! Or if the pointer is itself a shared variable and is dereferenced in another thread!) But even if local variables are treated as non-shared storage locations, we should still handle this correctly. Part of the problem seems to lie in the definition of the to-r dependency relation; the relevant portion is: (dep ; [Marked] ; rfi) Here dep is the control dependency from the READ_ONCE to the local-variable store, and the rfi refers to the following load of the local variable. The problem is that the store to the local variable doesn't go in the Marked class, because it is notated as a plain C assignment. (And likewise for the following load.) Should we change the model to make loads from and stores to local variables always count as Marked? What should have happened if the local variable were instead a shared variable which the other thread didn't access at all? It seems like a weak point of the memory model that it treats these two things differently. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 17:12 ` Alan Stern @ 2021-03-03 17:37 ` maranget 2021-03-03 17:39 ` maranget 2021-03-03 19:40 ` Alan Stern 2021-03-03 17:40 ` Paul E. McKenney 1 sibling, 2 replies; 33+ messages in thread From: maranget @ 2021-03-03 17:37 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus > On 3 Mar 2021, at 18:12, Alan Stern <stern@rowland.harvard.edu> wrote: > > On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote: >> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote: > >>> This result is wrong, apparently because of a bug in herd7. There >>> should be control dependencies from each of the two loads in P0 to each >>> of the two stores, but herd7 doesn't detect them. >>> >>> Maybe Luc can find some time to check whether this really is a bug and >>> if it is, fix it. >> >> I agree that herd7's control dependency tracking could be improved. >> >> But sadly, it is currently doing exactly what I asked Luc to make it do, >> which is to confine the control dependency to its "if" statement. But as >> usual I wasn't thinking globally enough. And I am not exactly sure what >> to ask for. Here a store to a local was control-dependency ordered after >> a read, and so that should propagate to a read from that local variable. >> Maybe treat local variables as if they were registers, so that from >> herd7's viewpoint the READ_ONCE()s are able to head control-dependency >> chains in multiple "if" statements? >> >> Thoughts? > > Local variables absolutely should be treated just like CPU registers, if > possible. In fact, the compiler has the option of keeping local > variables stored in registers. > And indeed local variables are treated as registers by herd7. > (Of course, things may get complicated if anyone writes a litmus test > that uses a pointer to a local variable, Especially if the pointer > could hold the address of a local variable in one execution and a > shared variable in another! Or if the pointer is itself a shared > variable and is dereferenced in another thread!) > > But even if local variables are treated as non-shared storage locations, > we should still handle this correctly. Part of the problem seems to lie > in the definition of the to-r dependency relation; the relevant portion > is: In fact, I’d rather change the computation of “dep” here control-dependency “ctrl”. Notice that “ctrl” is computed by herd7 and present in the initial environment of the Cat interpreter. I have made a PR to herd7 that performs the change. The commit message states the new definition. > > (dep ; [Marked] ; rfi) > > Here dep is the control dependency from the READ_ONCE to the > local-variable store, and the rfi refers to the following load of the > local variable. The problem is that the store to the local variable > doesn't go in the Marked class, because it is notated as a plain C > assignment. (And likewise for the following load.) > This is a related issue, I am not sure, but perhaps it can be formulated as "should rfi and rf on registers behave the same?” > Should we change the model to make loads from and stores to local > variables always count as Marked? > > What should have happened if the local variable were instead a shared > variable which the other thread didn't access at all? It seems like a > weak point of the memory model that it treats these two things > differently. > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 17:37 ` maranget @ 2021-03-03 17:39 ` maranget 2021-03-03 21:56 ` Paul E. McKenney 2021-03-03 19:40 ` Alan Stern 1 sibling, 1 reply; 33+ messages in thread From: maranget @ 2021-03-03 17:39 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus > On 3 Mar 2021, at 18:37, maranget <luc.maranget@inria.fr> wrote: > > I have made a PR to herd7 that performs the change. The commit message states the new definition. For those who are interested <https://github.com/herd/herdtools7/pull/183> —Luc ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 17:39 ` maranget @ 2021-03-03 21:56 ` Paul E. McKenney 0 siblings, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2021-03-03 21:56 UTC (permalink / raw) To: maranget Cc: Alan Stern, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 06:39:14PM +0100, maranget wrote: > > > > On 3 Mar 2021, at 18:37, maranget <luc.maranget@inria.fr> wrote: > > > > I have made a PR to herd7 that performs the change. The commit message states the new definition. > > For those who are interested > <https://github.com/herd/herdtools7/pull/183> And I just confirmed that with this change, Björn's original litmus test behaves as desired. Merci beaucoup, Luc! The new herd7 also passes the in-kernel regression test, and also all of the github "litmus" archive's tests up to six processes that are flagged with expected outcomes except for these five tests, which are expected failures: litmus/manual/deps/LB-addr-equals.litmus litmus/manual/deps/LB-ctls-bothvals-a.litmus litmus/manual/deps/LB-ctls-diffvals-det.litmus litmus/manual/deps/LB-ctls-sameval-barrier.litmus litmus/manual/deps/LB-ctls-sameval.litmus This situation is a bit silly, so I changed the "Results" clause and added a comment noting that LKMM does not yet know about these corner cases. I just started a longer regression test, but this will take some time. Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 17:37 ` maranget 2021-03-03 17:39 ` maranget @ 2021-03-03 19:40 ` Alan Stern 1 sibling, 0 replies; 33+ messages in thread From: Alan Stern @ 2021-03-03 19:40 UTC (permalink / raw) To: maranget Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 06:37:36PM +0100, maranget wrote: > > > > On 3 Mar 2021, at 18:12, Alan Stern <stern@rowland.harvard.edu> wrote: > > > > On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote: > >> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote: > > > >>> This result is wrong, apparently because of a bug in herd7. There > >>> should be control dependencies from each of the two loads in P0 to each > >>> of the two stores, but herd7 doesn't detect them. > >>> > >>> Maybe Luc can find some time to check whether this really is a bug and > >>> if it is, fix it. > >> > >> I agree that herd7's control dependency tracking could be improved. > >> > >> But sadly, it is currently doing exactly what I asked Luc to make it do, > >> which is to confine the control dependency to its "if" statement. But as > >> usual I wasn't thinking globally enough. And I am not exactly sure what > >> to ask for. Here a store to a local was control-dependency ordered after > >> a read, and so that should propagate to a read from that local variable. > >> Maybe treat local variables as if they were registers, so that from > >> herd7's viewpoint the READ_ONCE()s are able to head control-dependency > >> chains in multiple "if" statements? > >> > >> Thoughts? > > > > Local variables absolutely should be treated just like CPU registers, if > > possible. In fact, the compiler has the option of keeping local > > variables stored in registers. > > > > And indeed local variables are treated as registers by herd7. > > > > (Of course, things may get complicated if anyone writes a litmus test > > that uses a pointer to a local variable, Especially if the pointer > > could hold the address of a local variable in one execution and a > > shared variable in another! Or if the pointer is itself a shared > > variable and is dereferenced in another thread!) > > > > But even if local variables are treated as non-shared storage locations, > > we should still handle this correctly. Part of the problem seems to lie > > in the definition of the to-r dependency relation; the relevant portion > > is: > > In fact, I’d rather change the computation of “dep” here control-dependency “ctrl”. Notice that “ctrl” is computed by herd7 and present in the initial environment of the Cat interpreter. > > I have made a PR to herd7 that performs the change. The commit message states the new definition. Shouldn't similar reasoning apply to data and address dependencies? For example, suppose there is a control dependency from a load to a register variable, and then a data dependency from the register variable to a store. This should be treated as an overall data dependency from the load to the store. Does your change to herd7 do this? I couldn't tell from the description in the PR. Also, do you think it's reasonable to add a restriction to herd7 against taking the address of a local variable? > > (dep ; [Marked] ; rfi) > > > > Here dep is the control dependency from the READ_ONCE to the > > local-variable store, and the rfi refers to the following load of the > > local variable. The problem is that the store to the local variable > > doesn't go in the Marked class, because it is notated as a plain C > > assignment. (And likewise for the following load.) > > > This is a related issue, I am not sure, but perhaps it can be formulated as > "should rfi and rf on registers behave the same?” Aren't they already the same thing? It's not possible to have an rfe from a register, is it? Alan > > Should we change the model to make loads from and stores to local > > variables always count as Marked? > > > > What should have happened if the local variable were instead a shared > > variable which the other thread didn't access at all? It seems like a > > weak point of the memory model that it treats these two things > > differently. > > > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 17:12 ` Alan Stern 2021-03-03 17:37 ` maranget @ 2021-03-03 17:40 ` Paul E. McKenney 2021-03-03 20:22 ` Alan Stern 1 sibling, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2021-03-03 17:40 UTC (permalink / raw) To: Alan Stern Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote: > On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote: > > On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote: > > > > This result is wrong, apparently because of a bug in herd7. There > > > should be control dependencies from each of the two loads in P0 to each > > > of the two stores, but herd7 doesn't detect them. > > > > > > Maybe Luc can find some time to check whether this really is a bug and > > > if it is, fix it. > > > > I agree that herd7's control dependency tracking could be improved. > > > > But sadly, it is currently doing exactly what I asked Luc to make it do, > > which is to confine the control dependency to its "if" statement. But as > > usual I wasn't thinking globally enough. And I am not exactly sure what > > to ask for. Here a store to a local was control-dependency ordered after > > a read, and so that should propagate to a read from that local variable. > > Maybe treat local variables as if they were registers, so that from > > herd7's viewpoint the READ_ONCE()s are able to head control-dependency > > chains in multiple "if" statements? > > > > Thoughts? > > Local variables absolutely should be treated just like CPU registers, if > possible. In fact, the compiler has the option of keeping local > variables stored in registers. > > (Of course, things may get complicated if anyone writes a litmus test > that uses a pointer to a local variable, Especially if the pointer > could hold the address of a local variable in one execution and a > shared variable in another! Or if the pointer is itself a shared > variable and is dereferenced in another thread!) Good point! I did miss this complication. ;-) As you say, when its address is taken, the "local" variable needs to be treated as is it were shared. There are exceptions where the pointed-to local is still used only by its process. Are any of these exceptions problematic? > But even if local variables are treated as non-shared storage locations, > we should still handle this correctly. Part of the problem seems to lie > in the definition of the to-r dependency relation; the relevant portion > is: > > (dep ; [Marked] ; rfi) > > Here dep is the control dependency from the READ_ONCE to the > local-variable store, and the rfi refers to the following load of the > local variable. The problem is that the store to the local variable > doesn't go in the Marked class, because it is notated as a plain C > assignment. (And likewise for the following load.) > > Should we change the model to make loads from and stores to local > variables always count as Marked? As long as the initial (possibly unmarked) load would be properly complained about. And I cannot immediately think of a situation where this approach would break that would not result in a data race being flagged. Or is this yet another failure of my imagination? > What should have happened if the local variable were instead a shared > variable which the other thread didn't access at all? It seems like a > weak point of the memory model that it treats these two things > differently. But is this really any different than the situation where a global variable is only accessed by a single thread? Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 17:40 ` Paul E. McKenney @ 2021-03-03 20:22 ` Alan Stern 2021-03-03 22:03 ` Paul E. McKenney ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Alan Stern @ 2021-03-03 20:22 UTC (permalink / raw) To: Paul E. McKenney Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote: > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote: > > Local variables absolutely should be treated just like CPU registers, if > > possible. In fact, the compiler has the option of keeping local > > variables stored in registers. > > > > (Of course, things may get complicated if anyone writes a litmus test > > that uses a pointer to a local variable, Especially if the pointer > > could hold the address of a local variable in one execution and a > > shared variable in another! Or if the pointer is itself a shared > > variable and is dereferenced in another thread!) > > Good point! I did miss this complication. ;-) I suspect it wouldn't be so bad if herd7 disallowed taking addresses of local variables. > As you say, when its address is taken, the "local" variable needs to be > treated as is it were shared. There are exceptions where the pointed-to > local is still used only by its process. Are any of these exceptions > problematic? Easiest just to rule out the whole can of worms. > > But even if local variables are treated as non-shared storage locations, > > we should still handle this correctly. Part of the problem seems to lie > > in the definition of the to-r dependency relation; the relevant portion > > is: > > > > (dep ; [Marked] ; rfi) > > > > Here dep is the control dependency from the READ_ONCE to the > > local-variable store, and the rfi refers to the following load of the > > local variable. The problem is that the store to the local variable > > doesn't go in the Marked class, because it is notated as a plain C > > assignment. (And likewise for the following load.) > > > > Should we change the model to make loads from and stores to local > > variables always count as Marked? > > As long as the initial (possibly unmarked) load would be properly > complained about. Sorry, I don't understand what you mean. > And I cannot immediately think of a situation where > this approach would break that would not result in a data race being > flagged. Or is this yet another failure of my imagination? By definition, an access to a local variable cannot participate in a data race because all such accesses are confined to a single thread. However, there are other aspects to consider, in particular, the ordering relations on local-variable accesses. But if, as Luc says, local variables are treated just like registers then perhaps the issue doesn't arise. > > What should have happened if the local variable were instead a shared > > variable which the other thread didn't access at all? It seems like a > > weak point of the memory model that it treats these two things > > differently. > > But is this really any different than the situation where a global > variable is only accessed by a single thread? Indeed; it is the _same_ situation. Which leads to some interesting questions, such as: What does READ_ONCE(r) mean when r is a local variable? Should it be allowed at all? In what way is it different from a plain read of r? One difference is that the LKMM doesn't allow dependencies to originate from a plain load. Of course, when you're dealing with a local variable, what matters is not the load from that variable but rather the earlier loads which determined the value that had been stored there. Which brings us back to the case of the dep ; rfi dependency relation, where the accesses in the middle are plain and non-racy. Should the LKMM be changed to allow this? There are other differences to consider. For example: r = READ_ONCE(x); smp_wmb(); WRITE_ONCE(y, 1); If the write to r were treated as a marked store, the smp_wmb would order it (and consequently the READ_ONCE) before the WRITE_ONCE. However we don't want to do this when r is a local variable. Indeed, a plain store wouldn't be ordered this way because the compiler might optimize the store away entirely, leaving the smp_wmb nothing to act on. So overall the situation is rather puzzling. Treating local variables as registers is probably the best answer. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 20:22 ` Alan Stern @ 2021-03-03 22:03 ` Paul E. McKenney 2021-03-04 3:21 ` Alan Stern 2021-03-04 1:26 ` Boqun Feng 2021-03-04 15:44 ` maranget 2 siblings, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2021-03-03 22:03 UTC (permalink / raw) To: Alan Stern Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote: > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote: > > > > Local variables absolutely should be treated just like CPU registers, if > > > possible. In fact, the compiler has the option of keeping local > > > variables stored in registers. > > > > > > (Of course, things may get complicated if anyone writes a litmus test > > > that uses a pointer to a local variable, Especially if the pointer > > > could hold the address of a local variable in one execution and a > > > shared variable in another! Or if the pointer is itself a shared > > > variable and is dereferenced in another thread!) > > > > Good point! I did miss this complication. ;-) > > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of > local variables. > > > As you say, when its address is taken, the "local" variable needs to be > > treated as is it were shared. There are exceptions where the pointed-to > > local is still used only by its process. Are any of these exceptions > > problematic? > > Easiest just to rule out the whole can of worms. Good point, given that a global can be used instead of a local for any case where an address must be taken. > > > But even if local variables are treated as non-shared storage locations, > > > we should still handle this correctly. Part of the problem seems to lie > > > in the definition of the to-r dependency relation; the relevant portion > > > is: > > > > > > (dep ; [Marked] ; rfi) > > > > > > Here dep is the control dependency from the READ_ONCE to the > > > local-variable store, and the rfi refers to the following load of the > > > local variable. The problem is that the store to the local variable > > > doesn't go in the Marked class, because it is notated as a plain C > > > assignment. (And likewise for the following load.) > > > > > > Should we change the model to make loads from and stores to local > > > variables always count as Marked? > > > > As long as the initial (possibly unmarked) load would be properly > > complained about. > > Sorry, I don't understand what you mean. I was thinking in terms of something like this in one of the processes: p = gp; // Unmarked! r1 = p; q = r1; // Implicitly marked now? if (q) WRITE_ONCE(x, 1); // ctrl dep from gp??? > > And I cannot immediately think of a situation where > > this approach would break that would not result in a data race being > > flagged. Or is this yet another failure of my imagination? > > By definition, an access to a local variable cannot participate in a > data race because all such accesses are confined to a single thread. True, but its value might have come from a load from a shared variable. > However, there are other aspects to consider, in particular, the > ordering relations on local-variable accesses. But if, as Luc says, > local variables are treated just like registers then perhaps the issue > doesn't arise. Here is hoping! > > > What should have happened if the local variable were instead a shared > > > variable which the other thread didn't access at all? It seems like a > > > weak point of the memory model that it treats these two things > > > differently. > > > > But is this really any different than the situation where a global > > variable is only accessed by a single thread? > > Indeed; it is the _same_ situation. Which leads to some interesting > questions, such as: What does READ_ONCE(r) mean when r is a local > variable? Should it be allowed at all? In what way is it different > from a plain read of r? > > One difference is that the LKMM doesn't allow dependencies to originate > from a plain load. Of course, when you're dealing with a local > variable, what matters is not the load from that variable but rather the > earlier loads which determined the value that had been stored there. > Which brings us back to the case of the > > dep ; rfi > > dependency relation, where the accesses in the middle are plain and > non-racy. Should the LKMM be changed to allow this? It would be nice, give or take the potential side effects. ;-) As in it would be nice, but might not be worthwhile. > There are other differences to consider. For example: > > r = READ_ONCE(x); > smp_wmb(); > WRITE_ONCE(y, 1); > > If the write to r were treated as a marked store, the smp_wmb would > order it (and consequently the READ_ONCE) before the WRITE_ONCE. > However we don't want to do this when r is a local variable. Indeed, a > plain store wouldn't be ordered this way because the compiler might > optimize the store away entirely, leaving the smp_wmb nothing to act on. Agreed, having smp_wmb() order things due to a write to a local variable would not be what we want. > So overall the situation is rather puzzling. Treating local variables > as registers is probably the best answer. That is sounding quite appealing at the moment. Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 22:03 ` Paul E. McKenney @ 2021-03-04 3:21 ` Alan Stern 2021-03-04 5:04 ` Paul E. McKenney 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2021-03-04 3:21 UTC (permalink / raw) To: Paul E. McKenney Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote: > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote: > > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote: > > > > > > Local variables absolutely should be treated just like CPU registers, if > > > > possible. In fact, the compiler has the option of keeping local > > > > variables stored in registers. > > > > > > > > (Of course, things may get complicated if anyone writes a litmus test > > > > that uses a pointer to a local variable, Especially if the pointer > > > > could hold the address of a local variable in one execution and a > > > > shared variable in another! Or if the pointer is itself a shared > > > > variable and is dereferenced in another thread!) > > > > > > Good point! I did miss this complication. ;-) > > > > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of > > local variables. > > > > > As you say, when its address is taken, the "local" variable needs to be > > > treated as is it were shared. There are exceptions where the pointed-to > > > local is still used only by its process. Are any of these exceptions > > > problematic? > > > > Easiest just to rule out the whole can of worms. > > Good point, given that a global can be used instead of a local for > any case where an address must be taken. Another thing to consider: Almost all marked accesses involve using the address of the storage location (for example, smp_load_acquire's first argument must be a pointer). As far as I can remember at the moment, the only ones that don't are READ_ONCE and WRITE_ONCE. So although we might or might not want to allow READ_ONCE or WRITE_ONCE on a local variable, we won't have to worry about any of the other kinds of marked accesses. > > > > But even if local variables are treated as non-shared storage locations, > > > > we should still handle this correctly. Part of the problem seems to lie > > > > in the definition of the to-r dependency relation; the relevant portion > > > > is: > > > > > > > > (dep ; [Marked] ; rfi) > > > > > > > > Here dep is the control dependency from the READ_ONCE to the > > > > local-variable store, and the rfi refers to the following load of the > > > > local variable. The problem is that the store to the local variable > > > > doesn't go in the Marked class, because it is notated as a plain C > > > > assignment. (And likewise for the following load.) > > > > > > > > Should we change the model to make loads from and stores to local > > > > variables always count as Marked? > > > > > > As long as the initial (possibly unmarked) load would be properly > > > complained about. > > > > Sorry, I don't understand what you mean. > > I was thinking in terms of something like this in one of the processes: > > p = gp; // Unmarked! > r1 = p; > q = r1; // Implicitly marked now? > if (q) > WRITE_ONCE(x, 1); // ctrl dep from gp??? I hope we won't have to worry about this! :-) Treating local variable accesses as if they are always marked looks wrong. > > > And I cannot immediately think of a situation where > > > this approach would break that would not result in a data race being > > > flagged. Or is this yet another failure of my imagination? > > > > By definition, an access to a local variable cannot participate in a > > data race because all such accesses are confined to a single thread. > > True, but its value might have come from a load from a shared variable. Then that load could have participated in a data race. But the store to the local variable cannot. > > However, there are other aspects to consider, in particular, the > > ordering relations on local-variable accesses. But if, as Luc says, > > local variables are treated just like registers then perhaps the issue > > doesn't arise. > > Here is hoping! > > > > > What should have happened if the local variable were instead a shared > > > > variable which the other thread didn't access at all? It seems like a > > > > weak point of the memory model that it treats these two things > > > > differently. > > > > > > But is this really any different than the situation where a global > > > variable is only accessed by a single thread? > > > > Indeed; it is the _same_ situation. Which leads to some interesting > > questions, such as: What does READ_ONCE(r) mean when r is a local > > variable? Should it be allowed at all? In what way is it different > > from a plain read of r? > > > > One difference is that the LKMM doesn't allow dependencies to originate > > from a plain load. Of course, when you're dealing with a local > > variable, what matters is not the load from that variable but rather the > > earlier loads which determined the value that had been stored there. > > Which brings us back to the case of the > > > > dep ; rfi > > > > dependency relation, where the accesses in the middle are plain and > > non-racy. Should the LKMM be changed to allow this? > > It would be nice, give or take the potential side effects. ;-) > As in it would be nice, but might not be worthwhile. Treating local variables like registers will automatically bring this behavior. So I think we'll be good. > > There are other differences to consider. For example: > > > > r = READ_ONCE(x); > > smp_wmb(); > > WRITE_ONCE(y, 1); > > > > If the write to r were treated as a marked store, the smp_wmb would > > order it (and consequently the READ_ONCE) before the WRITE_ONCE. > > However we don't want to do this when r is a local variable. Indeed, a > > plain store wouldn't be ordered this way because the compiler might > > optimize the store away entirely, leaving the smp_wmb nothing to act on. > > Agreed, having smp_wmb() order things due to a write to a local variable > would not be what we want. > > > So overall the situation is rather puzzling. Treating local variables > > as registers is probably the best answer. > > That is sounding quite appealing at the moment. Agreed. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 3:21 ` Alan Stern @ 2021-03-04 5:04 ` Paul E. McKenney 2021-03-04 15:35 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2021-03-04 5:04 UTC (permalink / raw) To: Alan Stern Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote: > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote: > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote: > > > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote: > > > > > > > > Local variables absolutely should be treated just like CPU registers, if > > > > > possible. In fact, the compiler has the option of keeping local > > > > > variables stored in registers. > > > > > > > > > > (Of course, things may get complicated if anyone writes a litmus test > > > > > that uses a pointer to a local variable, Especially if the pointer > > > > > could hold the address of a local variable in one execution and a > > > > > shared variable in another! Or if the pointer is itself a shared > > > > > variable and is dereferenced in another thread!) > > > > > > > > Good point! I did miss this complication. ;-) > > > > > > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of > > > local variables. > > > > > > > As you say, when its address is taken, the "local" variable needs to be > > > > treated as is it were shared. There are exceptions where the pointed-to > > > > local is still used only by its process. Are any of these exceptions > > > > problematic? > > > > > > Easiest just to rule out the whole can of worms. > > > > Good point, given that a global can be used instead of a local for > > any case where an address must be taken. > > Another thing to consider: Almost all marked accesses involve using the > address of the storage location (for example, smp_load_acquire's first > argument must be a pointer). As far as I can remember at the moment, > the only ones that don't are READ_ONCE and WRITE_ONCE. So although we > might or might not want to allow READ_ONCE or WRITE_ONCE on a local > variable, we won't have to worry about any of the other kinds of marked > accesses. Good point! > > > > > But even if local variables are treated as non-shared storage locations, > > > > > we should still handle this correctly. Part of the problem seems to lie > > > > > in the definition of the to-r dependency relation; the relevant portion > > > > > is: > > > > > > > > > > (dep ; [Marked] ; rfi) > > > > > > > > > > Here dep is the control dependency from the READ_ONCE to the > > > > > local-variable store, and the rfi refers to the following load of the > > > > > local variable. The problem is that the store to the local variable > > > > > doesn't go in the Marked class, because it is notated as a plain C > > > > > assignment. (And likewise for the following load.) > > > > > > > > > > Should we change the model to make loads from and stores to local > > > > > variables always count as Marked? > > > > > > > > As long as the initial (possibly unmarked) load would be properly > > > > complained about. > > > > > > Sorry, I don't understand what you mean. > > > > I was thinking in terms of something like this in one of the processes: > > > > p = gp; // Unmarked! > > r1 = p; > > q = r1; // Implicitly marked now? > > if (q) > > WRITE_ONCE(x, 1); // ctrl dep from gp??? > > I hope we won't have to worry about this! :-) Treating local variable > accesses as if they are always marked looks wrong. Good, that is where I was also heading. ;-) > > > > And I cannot immediately think of a situation where > > > > this approach would break that would not result in a data race being > > > > flagged. Or is this yet another failure of my imagination? > > > > > > By definition, an access to a local variable cannot participate in a > > > data race because all such accesses are confined to a single thread. > > > > True, but its value might have come from a load from a shared variable. > > Then that load could have participated in a data race. But the store to > the local variable cannot. Agreed. My thought was that if the ordering from the initial (non-local) load mattered, then that initial load must have participated in a data race. Is that true, or am I failing to perceive some corner case? > > > However, there are other aspects to consider, in particular, the > > > ordering relations on local-variable accesses. But if, as Luc says, > > > local variables are treated just like registers then perhaps the issue > > > doesn't arise. > > > > Here is hoping! > > > > > > > What should have happened if the local variable were instead a shared > > > > > variable which the other thread didn't access at all? It seems like a > > > > > weak point of the memory model that it treats these two things > > > > > differently. > > > > > > > > But is this really any different than the situation where a global > > > > variable is only accessed by a single thread? > > > > > > Indeed; it is the _same_ situation. Which leads to some interesting > > > questions, such as: What does READ_ONCE(r) mean when r is a local > > > variable? Should it be allowed at all? In what way is it different > > > from a plain read of r? > > > > > > One difference is that the LKMM doesn't allow dependencies to originate > > > from a plain load. Of course, when you're dealing with a local > > > variable, what matters is not the load from that variable but rather the > > > earlier loads which determined the value that had been stored there. > > > Which brings us back to the case of the > > > > > > dep ; rfi > > > > > > dependency relation, where the accesses in the middle are plain and > > > non-racy. Should the LKMM be changed to allow this? > > > > It would be nice, give or take the potential side effects. ;-) > > As in it would be nice, but might not be worthwhile. > > Treating local variables like registers will automatically bring this > behavior. So I think we'll be good. Sounds good. > > > There are other differences to consider. For example: > > > > > > r = READ_ONCE(x); > > > smp_wmb(); > > > WRITE_ONCE(y, 1); > > > > > > If the write to r were treated as a marked store, the smp_wmb would > > > order it (and consequently the READ_ONCE) before the WRITE_ONCE. > > > However we don't want to do this when r is a local variable. Indeed, a > > > plain store wouldn't be ordered this way because the compiler might > > > optimize the store away entirely, leaving the smp_wmb nothing to act on. > > > > Agreed, having smp_wmb() order things due to a write to a local variable > > would not be what we want. > > > > > So overall the situation is rather puzzling. Treating local variables > > > as registers is probably the best answer. > > > > That is sounding quite appealing at the moment. > > Agreed. Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 5:04 ` Paul E. McKenney @ 2021-03-04 15:35 ` Alan Stern 2021-03-04 19:05 ` Paul E. McKenney 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2021-03-04 15:35 UTC (permalink / raw) To: Paul E. McKenney Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote: > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote: > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote: > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > > > > And I cannot immediately think of a situation where > > > > > this approach would break that would not result in a data race being > > > > > flagged. Or is this yet another failure of my imagination? > > > > > > > > By definition, an access to a local variable cannot participate in a > > > > data race because all such accesses are confined to a single thread. > > > > > > True, but its value might have come from a load from a shared variable. > > > > Then that load could have participated in a data race. But the store to > > the local variable cannot. > > Agreed. My thought was that if the ordering from the initial (non-local) > load mattered, then that initial load must have participated in a > data race. Is that true, or am I failing to perceive some corner case? Ordering can matter even when no data race is involved. Just think about how much of the memory model is concerned with ordering of marked accesses, which don't participate in data races unless there is a conflicting plain access somewhere. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 15:35 ` Alan Stern @ 2021-03-04 19:05 ` Paul E. McKenney 2021-03-04 21:27 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2021-03-04 19:05 UTC (permalink / raw) To: Alan Stern Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote: > On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote: > > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote: > > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote: > > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > > > > > > And I cannot immediately think of a situation where > > > > > > this approach would break that would not result in a data race being > > > > > > flagged. Or is this yet another failure of my imagination? > > > > > > > > > > By definition, an access to a local variable cannot participate in a > > > > > data race because all such accesses are confined to a single thread. > > > > > > > > True, but its value might have come from a load from a shared variable. > > > > > > Then that load could have participated in a data race. But the store to > > > the local variable cannot. > > > > Agreed. My thought was that if the ordering from the initial (non-local) > > load mattered, then that initial load must have participated in a > > data race. Is that true, or am I failing to perceive some corner case? > > Ordering can matter even when no data race is involved. Just think > about how much of the memory model is concerned with ordering of > marked accesses, which don't participate in data races unless there is > a conflicting plain access somewhere. Fair point. Should I have instead said "then that initial load must have run concurrently with a store to that same variable"? Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 19:05 ` Paul E. McKenney @ 2021-03-04 21:27 ` Alan Stern 2021-03-04 22:05 ` Paul E. McKenney 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2021-03-04 21:27 UTC (permalink / raw) To: Paul E. McKenney Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Thu, Mar 04, 2021 at 11:05:15AM -0800, Paul E. McKenney wrote: > On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote: > > On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote: > > > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote: > > > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote: > > > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > > > > > > > > And I cannot immediately think of a situation where > > > > > > > this approach would break that would not result in a data race being > > > > > > > flagged. Or is this yet another failure of my imagination? > > > > > > > > > > > > By definition, an access to a local variable cannot participate in a > > > > > > data race because all such accesses are confined to a single thread. > > > > > > > > > > True, but its value might have come from a load from a shared variable. > > > > > > > > Then that load could have participated in a data race. But the store to > > > > the local variable cannot. > > > > > > Agreed. My thought was that if the ordering from the initial (non-local) > > > load mattered, then that initial load must have participated in a > > > data race. Is that true, or am I failing to perceive some corner case? > > > > Ordering can matter even when no data race is involved. Just think > > about how much of the memory model is concerned with ordering of > > marked accesses, which don't participate in data races unless there is > > a conflicting plain access somewhere. > > Fair point. Should I have instead said "then that initial load must > have run concurrently with a store to that same variable"? I'm losing track of the point you were originally trying to make. Does ordering matter when there are no conflicting accesses? Sure. Consider this: A: r1 = READ_ONCE(x); B: WRITE_ONCE(y, r1); smp_wmb(); C: WRITE_ONCE(z, 1); Even if there are no other accesses to y at all (let alone any conflicting ones), the mere existence of B forces A to be ordered before C, and this is easily detectable by a litmus test. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 21:27 ` Alan Stern @ 2021-03-04 22:05 ` Paul E. McKenney 0 siblings, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2021-03-04 22:05 UTC (permalink / raw) To: Alan Stern Cc: Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, boqun.feng, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Thu, Mar 04, 2021 at 04:27:53PM -0500, Alan Stern wrote: > On Thu, Mar 04, 2021 at 11:05:15AM -0800, Paul E. McKenney wrote: > > On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote: > > > On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote: > > > > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote: > > > > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote: > > > > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > > > > > > > > > > And I cannot immediately think of a situation where > > > > > > > > this approach would break that would not result in a data race being > > > > > > > > flagged. Or is this yet another failure of my imagination? > > > > > > > > > > > > > > By definition, an access to a local variable cannot participate in a > > > > > > > data race because all such accesses are confined to a single thread. > > > > > > > > > > > > True, but its value might have come from a load from a shared variable. > > > > > > > > > > Then that load could have participated in a data race. But the store to > > > > > the local variable cannot. > > > > > > > > Agreed. My thought was that if the ordering from the initial (non-local) > > > > load mattered, then that initial load must have participated in a > > > > data race. Is that true, or am I failing to perceive some corner case? > > > > > > Ordering can matter even when no data race is involved. Just think > > > about how much of the memory model is concerned with ordering of > > > marked accesses, which don't participate in data races unless there is > > > a conflicting plain access somewhere. > > > > Fair point. Should I have instead said "then that initial load must > > have run concurrently with a store to that same variable"? > > I'm losing track of the point you were originally trying to make. > > Does ordering matter when there are no conflicting accesses? Sure. > Consider this: > > A: r1 = READ_ONCE(x); > B: WRITE_ONCE(y, r1); > smp_wmb(); > C: WRITE_ONCE(z, 1); > > Even if there are no other accesses to y at all (let alone any > conflicting ones), the mere existence of B forces A to be ordered before > C, and this is easily detectable by a litmus test. Given that herd7 treats all local variables as registers (including forbidding taking their addresses), and given that we are not thinking of treating local-variable accesses as if they were marked, this is likely all moot. But just in case... I was trying to figure out if there was a litmus test of the following form where it might make a difference if local-variable accesses were treated as if they were marked. So is there something like this: r1 = x; if (r1) WRITE_ONCE(y, 1); where implicitly treating the accesses to r1 as marked would make a difference. I was thinking that any such example would have to result in LKMM flagging the load from x as a data race. However, your example inserting the smp_wmb() does shed some doubt on that theory. This of course is moot unless we come back to treating local-variable accesses as if they were marked. Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 20:22 ` Alan Stern 2021-03-03 22:03 ` Paul E. McKenney @ 2021-03-04 1:26 ` Boqun Feng 2021-03-04 3:13 ` Alan Stern 2021-03-04 15:44 ` maranget 2 siblings, 1 reply; 33+ messages in thread From: Boqun Feng @ 2021-03-04 1:26 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote: > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote: > > > > Local variables absolutely should be treated just like CPU registers, if > > > possible. In fact, the compiler has the option of keeping local > > > variables stored in registers. > > > > > > (Of course, things may get complicated if anyone writes a litmus test > > > that uses a pointer to a local variable, Especially if the pointer > > > could hold the address of a local variable in one execution and a > > > shared variable in another! Or if the pointer is itself a shared > > > variable and is dereferenced in another thread!) > > > > Good point! I did miss this complication. ;-) > > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of > local variables. > > > As you say, when its address is taken, the "local" variable needs to be > > treated as is it were shared. There are exceptions where the pointed-to > > local is still used only by its process. Are any of these exceptions > > problematic? > > Easiest just to rule out the whole can of worms. > > > > But even if local variables are treated as non-shared storage locations, > > > we should still handle this correctly. Part of the problem seems to lie > > > in the definition of the to-r dependency relation; the relevant portion > > > is: > > > > > > (dep ; [Marked] ; rfi) > > > > > > Here dep is the control dependency from the READ_ONCE to the > > > local-variable store, and the rfi refers to the following load of the > > > local variable. The problem is that the store to the local variable > > > doesn't go in the Marked class, because it is notated as a plain C > > > assignment. (And likewise for the following load.) > > > > > > Should we change the model to make loads from and stores to local > > > variables always count as Marked? > > > > As long as the initial (possibly unmarked) load would be properly > > complained about. > > Sorry, I don't understand what you mean. > > > And I cannot immediately think of a situation where > > this approach would break that would not result in a data race being > > flagged. Or is this yet another failure of my imagination? > > By definition, an access to a local variable cannot participate in a > data race because all such accesses are confined to a single thread. > > However, there are other aspects to consider, in particular, the > ordering relations on local-variable accesses. But if, as Luc says, > local variables are treated just like registers then perhaps the issue > doesn't arise. > > > > What should have happened if the local variable were instead a shared > > > variable which the other thread didn't access at all? It seems like a > > > weak point of the memory model that it treats these two things > > > differently. > > > > But is this really any different than the situation where a global > > variable is only accessed by a single thread? > > Indeed; it is the _same_ situation. Which leads to some interesting > questions, such as: What does READ_ONCE(r) mean when r is a local > variable? Should it be allowed at all? In what way is it different > from a plain read of r? > > One difference is that the LKMM doesn't allow dependencies to originate > from a plain load. Of course, when you're dealing with a local > variable, what matters is not the load from that variable but rather the > earlier loads which determined the value that had been stored there. > Which brings us back to the case of the > > dep ; rfi > > dependency relation, where the accesses in the middle are plain and > non-racy. Should the LKMM be changed to allow this? > For this particular question, do we need to consider code as the follow? r1 = READ_ONCE(x); // f if (r == 1) { local_v = &y; // g do_something_a(); } else { local_v = &y; do_something_b(); } r2 = READ_ONCE(*local_v); // e , do we have the guarantee that the first READ_ONCE() happens before the second one? Can compiler optimize the code as: r2 = READ_ONCE(y); r1 = READ_ONCE(x); if (r == 1) { do_something_a(); } else { do_something_b(); } ? Although we have: f ->dep g ->rfi ->addr e Regards, Boqun > There are other differences to consider. For example: > > r = READ_ONCE(x); > smp_wmb(); > WRITE_ONCE(y, 1); > > If the write to r were treated as a marked store, the smp_wmb would > order it (and consequently the READ_ONCE) before the WRITE_ONCE. > However we don't want to do this when r is a local variable. Indeed, a > plain store wouldn't be ordered this way because the compiler might > optimize the store away entirely, leaving the smp_wmb nothing to act on. > > So overall the situation is rather puzzling. Treating local variables > as registers is probably the best answer. > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 1:26 ` Boqun Feng @ 2021-03-04 3:13 ` Alan Stern 2021-03-04 6:33 ` Boqun Feng 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2021-03-04 3:13 UTC (permalink / raw) To: Boqun Feng Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Thu, Mar 04, 2021 at 09:26:31AM +0800, Boqun Feng wrote: > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > Which brings us back to the case of the > > > > dep ; rfi > > > > dependency relation, where the accesses in the middle are plain and > > non-racy. Should the LKMM be changed to allow this? > > > > For this particular question, do we need to consider code as the follow? > > r1 = READ_ONCE(x); // f > if (r == 1) { > local_v = &y; // g > do_something_a(); > } > else { > local_v = &y; > do_something_b(); > } > > r2 = READ_ONCE(*local_v); // e > > , do we have the guarantee that the first READ_ONCE() happens before the > second one? Can compiler optimize the code as: > > r2 = READ_ONCE(y); > r1 = READ_ONCE(x); Well, it can't do that because the compiler isn't allowed to reorder volatile accesses (which includes READ_ONCE). But the compiler could do: r1 = READ_ONCE(x); r2 = READ_ONCE(y); > if (r == 1) { > do_something_a(); > } > else { > do_something_b(); > } > > ? Although we have: > > f ->dep g ->rfi ->addr e This would be an example of a problem Paul has described on several occasions, where both arms of an "if" statement store the same value (in this case to local_v). This problem arises even when local variables are not involved. For example: if (READ_ONCE(x) == 0) { WRITE_ONCE(y, 1); do_a(); } else { WRITE_ONCE(y, 1); do_b(); } The compiler can change this to: r = READ_ONCE(x); WRITE_ONCE(y, 1); if (r == 0) do_a(); else do_b(); thus allowing the marked accesses to be reordered by the CPU and breaking the apparent control dependency. So the answer to your question is: No, we don't have this guarantee, but the reason is because of doing the same store in both arms, not because of the use of local variables. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 3:13 ` Alan Stern @ 2021-03-04 6:33 ` Boqun Feng 2021-03-04 16:11 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Boqun Feng @ 2021-03-04 6:33 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Wed, Mar 03, 2021 at 10:13:22PM -0500, Alan Stern wrote: > On Thu, Mar 04, 2021 at 09:26:31AM +0800, Boqun Feng wrote: > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote: > > > > Which brings us back to the case of the > > > > > > dep ; rfi > > > > > > dependency relation, where the accesses in the middle are plain and > > > non-racy. Should the LKMM be changed to allow this? > > > > > > > For this particular question, do we need to consider code as the follow? > > > > r1 = READ_ONCE(x); // f > > if (r == 1) { > > local_v = &y; // g > > do_something_a(); > > } > > else { > > local_v = &y; > > do_something_b(); > > } > > > > r2 = READ_ONCE(*local_v); // e > > > > , do we have the guarantee that the first READ_ONCE() happens before the > > second one? Can compiler optimize the code as: > > > > r2 = READ_ONCE(y); > > r1 = READ_ONCE(x); > > Well, it can't do that because the compiler isn't allowed to reorder > volatile accesses (which includes READ_ONCE). But the compiler could > do: > > r1 = READ_ONCE(x); > r2 = READ_ONCE(y); > > > if (r == 1) { > > do_something_a(); > > } > > else { > > do_something_b(); > > } > > > > ? Although we have: > > > > f ->dep g ->rfi ->addr e > > This would be an example of a problem Paul has described on several > occasions, where both arms of an "if" statement store the same value > (in this case to local_v). This problem arises even when local > variables are not involved. For example: > > if (READ_ONCE(x) == 0) { > WRITE_ONCE(y, 1); > do_a(); > } else { > WRITE_ONCE(y, 1); > do_b(); > } > > The compiler can change this to: > > r = READ_ONCE(x); > WRITE_ONCE(y, 1); > if (r == 0) > do_a(); > else > do_b(); > > thus allowing the marked accesses to be reordered by the CPU and > breaking the apparent control dependency. > > So the answer to your question is: No, we don't have this guarantee, > but the reason is because of doing the same store in both arms, not > because of the use of local variables. > Right, I was thinking about something unrelated.. but how about the following case: local_v = &y; r1 = READ_ONCE(*x); // f if (r1 == 1) { local_v = &y; // e } else { local_v = &z; // d } p = READ_ONCE(local_v); // g r2 = READ_ONCE(*p); // h if r1 == 1, we definitely think we have: f ->ctrl e ->rfi g ->addr h , and if we treat ctrl;rfi as "to-r", then we have "f" happens before "h". However compile can optimze the above as: local_v = &y; r1 = READ_ONCE(*x); // f if (r1 != 1) { local_v = &z; // d } p = READ_ONCE(local_v); // g r2 = READ_ONCE(*p); // h , and when this gets executed, I don't think we have the guarantee we have "f" happens before "h", because CPU can do optimistic read for "g" and "h". Part of this is because when we take plain access into consideration, we won't guarantee a read-from or other relations exists if compiler optimization happens. Maybe I'm missing something subtle, but just try to think through the effect of making dep; rfi as "to-r". Regards, Boqun > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 6:33 ` Boqun Feng @ 2021-03-04 16:11 ` Alan Stern 2021-03-05 1:12 ` Boqun Feng 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2021-03-04 16:11 UTC (permalink / raw) To: Boqun Feng Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Thu, Mar 04, 2021 at 02:33:32PM +0800, Boqun Feng wrote: > Right, I was thinking about something unrelated.. but how about the > following case: > > local_v = &y; > r1 = READ_ONCE(*x); // f > > if (r1 == 1) { > local_v = &y; // e > } else { > local_v = &z; // d > } > > p = READ_ONCE(local_v); // g > > r2 = READ_ONCE(*p); // h > > if r1 == 1, we definitely think we have: > > f ->ctrl e ->rfi g ->addr h > > , and if we treat ctrl;rfi as "to-r", then we have "f" happens before > "h". However compile can optimze the above as: > > local_v = &y; > > r1 = READ_ONCE(*x); // f > > if (r1 != 1) { > local_v = &z; // d > } > > p = READ_ONCE(local_v); // g > > r2 = READ_ONCE(*p); // h > > , and when this gets executed, I don't think we have the guarantee we > have "f" happens before "h", because CPU can do optimistic read for "g" > and "h". In your example, which accesses are supposed to be to actual memory and which to registers? Also, remember that the memory model assumes the hardware does not reorder loads if there is an address dependency between them. > Part of this is because when we take plain access into consideration, we > won't guarantee a read-from or other relations exists if compiler > optimization happens. > > Maybe I'm missing something subtle, but just try to think through the > effect of making dep; rfi as "to-r". Forget about local variables for the time being and just consider dep ; [Plain] ; rfi For example: A: r1 = READ_ONCE(x); y = r1; B: r2 = READ_ONCE(y); Should B be ordered after A? I don't see how any CPU could hope to excute B before A, but maybe I'm missing something. There's another twist, connected with the fact that herd7 can't detect control dependencies caused by unexecuted code. If we have: A: r1 = READ_ONCE(x); if (r1) WRITE_ONCE(y, 5); r2 = READ_ONCE(y); B: WRITE_ONCE(z, r2); then in executions where x == 0, herd7 doesn't see any control dependency. But CPUs do see control dependencies whenever there is a conditional branch, whether the branch is taken or not, and so they will never reorder B before A. One last thing to think about: My original assessment or Björn's problem wasn't right, because the dep in (dep ; rfi) doesn't include control dependencies. Only data and address. So I believe that the LKMM wouldn't consider A to be ordered before B in this example even if x was nonzero. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 16:11 ` Alan Stern @ 2021-03-05 1:12 ` Boqun Feng 2021-03-05 16:15 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Boqun Feng @ 2021-03-05 1:12 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Thu, Mar 04, 2021 at 11:11:42AM -0500, Alan Stern wrote: > On Thu, Mar 04, 2021 at 02:33:32PM +0800, Boqun Feng wrote: > > > Right, I was thinking about something unrelated.. but how about the > > following case: > > > > local_v = &y; > > r1 = READ_ONCE(*x); // f > > > > if (r1 == 1) { > > local_v = &y; // e > > } else { > > local_v = &z; // d > > } > > > > p = READ_ONCE(local_v); // g > > > > r2 = READ_ONCE(*p); // h > > > > if r1 == 1, we definitely think we have: > > > > f ->ctrl e ->rfi g ->addr h > > > > , and if we treat ctrl;rfi as "to-r", then we have "f" happens before > > "h". However compile can optimze the above as: > > > > local_v = &y; > > > > r1 = READ_ONCE(*x); // f > > > > if (r1 != 1) { > > local_v = &z; // d > > } > > > > p = READ_ONCE(local_v); // g > > > > r2 = READ_ONCE(*p); // h > > > > , and when this gets executed, I don't think we have the guarantee we > > have "f" happens before "h", because CPU can do optimistic read for "g" > > and "h". > > In your example, which accesses are supposed to be to actual memory and > which to registers? Also, remember that the memory model assumes the Given that we use READ_ONCE() on local_v, local_v should be a memory location but only accessed by this thread. > hardware does not reorder loads if there is an address dependency > between them. > Right, so "g" won't be reordered after "h". > > Part of this is because when we take plain access into consideration, we > > won't guarantee a read-from or other relations exists if compiler > > optimization happens. > > > > Maybe I'm missing something subtle, but just try to think through the > > effect of making dep; rfi as "to-r". > > Forget about local variables for the time being and just consider > > dep ; [Plain] ; rfi > > For example: > > A: r1 = READ_ONCE(x); > y = r1; > B: r2 = READ_ONCE(y); > > Should B be ordered after A? I don't see how any CPU could hope to > excute B before A, but maybe I'm missing something. > Agreed. > There's another twist, connected with the fact that herd7 can't detect > control dependencies caused by unexecuted code. If we have: > > A: r1 = READ_ONCE(x); > if (r1) > WRITE_ONCE(y, 5); > r2 = READ_ONCE(y); > B: WRITE_ONCE(z, r2); > > then in executions where x == 0, herd7 doesn't see any control > dependency. But CPUs do see control dependencies whenever there is a > conditional branch, whether the branch is taken or not, and so they will > never reorder B before A. > Right, because B in this example is a write, what if B is a read that depends on r2, like in my example? Let y be a pointer to a memory location, and initialized as a valid value (pointing to a valid memory location) you example changed to: A: r1 = READ_ONCE(x); if (r1) WRITE_ONCE(y, 5); C: r2 = READ_ONCE(y); B: r3 = READ_ONCE(*r2); , then A don't have the control dependency to B, because A and B is read+read. So B can be ordered before A, right? > One last thing to think about: My original assessment or Björn's problem > wasn't right, because the dep in (dep ; rfi) doesn't include control > dependencies. Only data and address. So I believe that the LKMM Ah, right. I was mising that part (ctrl is not in dep). So I guess my example is pointless for the question we are discussing here ;-( > wouldn't consider A to be ordered before B in this example even if x > was nonzero. Yes, and similar to my example (changing B to a read). I did try to run my example with herd, and got confused no matter I make dep; [Plain]; rfi as to-r (I got the same result telling me a reorder can happen). Now the reason is clear, because this is a ctrl; rfi not a dep; rfi. Thanks so much for walking with me on this ;-) Regards, Boqun > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-05 1:12 ` Boqun Feng @ 2021-03-05 16:15 ` Alan Stern 0 siblings, 0 replies; 33+ messages in thread From: Alan Stern @ 2021-03-05 16:15 UTC (permalink / raw) To: Boqun Feng Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, parri.andrea, Will Deacon, Peter Zijlstra, npiggin, dhowells, j.alglave, luc.maranget, akiyks, dlustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Fri, Mar 05, 2021 at 09:12:30AM +0800, Boqun Feng wrote: > On Thu, Mar 04, 2021 at 11:11:42AM -0500, Alan Stern wrote: > > Forget about local variables for the time being and just consider > > > > dep ; [Plain] ; rfi > > > > For example: > > > > A: r1 = READ_ONCE(x); > > y = r1; > > B: r2 = READ_ONCE(y); > > > > Should B be ordered after A? I don't see how any CPU could hope to > > excute B before A, but maybe I'm missing something. > > > > Agreed. > > > There's another twist, connected with the fact that herd7 can't detect > > control dependencies caused by unexecuted code. If we have: > > > > A: r1 = READ_ONCE(x); > > if (r1) > > WRITE_ONCE(y, 5); > > r2 = READ_ONCE(y); > > B: WRITE_ONCE(z, r2); > > > > then in executions where x == 0, herd7 doesn't see any control > > dependency. But CPUs do see control dependencies whenever there is a > > conditional branch, whether the branch is taken or not, and so they will > > never reorder B before A. > > > > Right, because B in this example is a write, what if B is a read that > depends on r2, like in my example? Let y be a pointer to a memory > location, and initialized as a valid value (pointing to a valid memory > location) you example changed to: > > A: r1 = READ_ONCE(x); > if (r1) > WRITE_ONCE(y, 5); > C: r2 = READ_ONCE(y); > B: r3 = READ_ONCE(*r2); > > , then A don't have the control dependency to B, because A and B is > read+read. So B can be ordered before A, right? Yes, I think that's right: Both C and B can be executed before A. > > One last thing to think about: My original assessment or Björn's problem > > wasn't right, because the dep in (dep ; rfi) doesn't include control > > dependencies. Only data and address. So I believe that the LKMM > > Ah, right. I was mising that part (ctrl is not in dep). So I guess my > example is pointless for the question we are discussing here ;-( > > > wouldn't consider A to be ordered before B in this example even if x > > was nonzero. > > Yes, and similar to my example (changing B to a read). > > I did try to run my example with herd, and got confused no matter I make > dep; [Plain]; rfi as to-r (I got the same result telling me a reorder > can happen). Now the reason is clear, because this is a ctrl; rfi not a > dep; rfi. > > Thanks so much for walking with me on this ;-) You're welcome. At this point, it looks like the only remaining question is whether to include (dep ; [Plain] ; rfi) in to-r. This doesn't seem to be an urgent question. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-03 20:22 ` Alan Stern 2021-03-03 22:03 ` Paul E. McKenney 2021-03-04 1:26 ` Boqun Feng @ 2021-03-04 15:44 ` maranget 2021-03-04 19:07 ` Paul E. McKenney 2 siblings, 1 reply; 33+ messages in thread From: maranget @ 2021-03-04 15:44 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Björn Töpel, bpf, LKML, Andrea Parri, Will Deacon, Peter Zijlstra, Boqun Feng, Nicholas Piggin, David Howells, Alglave, Jade, Akira Yokosawa, Daniel Lustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus > On 3 Mar 2021, at 21:22, Alan Stern <stern@rowland.harvard.edu> wrote: > >>> >>> Local variables absolutely should be treated just like CPU registers, if >>> possible. In fact, the compiler has the option of keeping local >>> variables stored in registers. >>> >>> (Of course, things may get complicated if anyone writes a litmus test >>> that uses a pointer to a local variable, Especially if the pointer >>> could hold the address of a local variable in one execution and a >>> shared variable in another! Or if the pointer is itself a shared >>> variable and is dereferenced in another thread!) >> >> Good point! I did miss this complication. ;-) > > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of > local variables. > > Herd7 does disallow taking addresses of local variables. However, such tests can still be run on machine, provided function bodies are accepted by the C compiler. —Luc ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: XDP socket rings, and LKMM litmus tests 2021-03-04 15:44 ` maranget @ 2021-03-04 19:07 ` Paul E. McKenney 0 siblings, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2021-03-04 19:07 UTC (permalink / raw) To: maranget Cc: Alan Stern, Björn Töpel, bpf, LKML, Andrea Parri, Will Deacon, Peter Zijlstra, Boqun Feng, Nicholas Piggin, David Howells, Alglave, Jade, Akira Yokosawa, Daniel Lustig, joel, Toke Høiland-Jørgensen, Karlsson, Magnus On Thu, Mar 04, 2021 at 04:44:34PM +0100, maranget wrote: > > > > On 3 Mar 2021, at 21:22, Alan Stern <stern@rowland.harvard.edu> wrote: > > > >>> > >>> Local variables absolutely should be treated just like CPU registers, if > >>> possible. In fact, the compiler has the option of keeping local > >>> variables stored in registers. > >>> > >>> (Of course, things may get complicated if anyone writes a litmus test > >>> that uses a pointer to a local variable, Especially if the pointer > >>> could hold the address of a local variable in one execution and a > >>> shared variable in another! Or if the pointer is itself a shared > >>> variable and is dereferenced in another thread!) > >> > >> Good point! I did miss this complication. ;-) > > > > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of > > local variables. > > Herd7 does disallow taking addresses of local variables. Good to know, and thank you! > However, such tests can still be run on machine, provided function bodies are accepted by the C compiler. True, but that would be outside of the LKMM proper, correct? Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2021-03-05 16:16 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-02 18:46 XDP socket rings, and LKMM litmus tests Björn Töpel 2021-03-02 19:57 ` Paul E. McKenney 2021-03-02 20:04 ` Paul E. McKenney 2021-03-02 20:37 ` Björn Töpel 2021-03-02 20:24 ` Björn Töpel 2021-03-02 20:41 ` Paul E. McKenney 2021-03-02 20:51 ` Björn Töpel 2021-03-02 21:14 ` Alan Stern 2021-03-02 23:50 ` Paul E. McKenney 2021-03-03 6:37 ` maranget 2021-03-03 16:54 ` Paul E. McKenney 2021-03-03 17:12 ` Alan Stern 2021-03-03 17:37 ` maranget 2021-03-03 17:39 ` maranget 2021-03-03 21:56 ` Paul E. McKenney 2021-03-03 19:40 ` Alan Stern 2021-03-03 17:40 ` Paul E. McKenney 2021-03-03 20:22 ` Alan Stern 2021-03-03 22:03 ` Paul E. McKenney 2021-03-04 3:21 ` Alan Stern 2021-03-04 5:04 ` Paul E. McKenney 2021-03-04 15:35 ` Alan Stern 2021-03-04 19:05 ` Paul E. McKenney 2021-03-04 21:27 ` Alan Stern 2021-03-04 22:05 ` Paul E. McKenney 2021-03-04 1:26 ` Boqun Feng 2021-03-04 3:13 ` Alan Stern 2021-03-04 6:33 ` Boqun Feng 2021-03-04 16:11 ` Alan Stern 2021-03-05 1:12 ` Boqun Feng 2021-03-05 16:15 ` Alan Stern 2021-03-04 15:44 ` maranget 2021-03-04 19:07 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).