* Re: Documentation/memory-barriers.txt: Is "stores are not speculated" correct?
       [not found] <20210426022309.2333D4640475@webmail.sinamail.sina.com.cn>
From: Paul E. McKenney @ 2021-04-26  3:50 UTC
  To: szyhb810501.student
  Cc: stern, parri.andrea, will, peterz, boqun.feng, npiggin, dhowells,
      j.alglave, luc.maranget, linux-kernel

On Mon, Apr 26, 2021 at 10:23:09AM +0800, szyhb810501.student@sina.com wrote:
>
> Hello everyone, I have a question. "Documentation/memory-barriers.txt"
> says: However, stores are not speculated. This means that ordering -is-
> provided for load-store control dependencies, as in the following example:
>
>     q = READ_ONCE(a);
>     if (q) {
>         WRITE_ONCE(b, 1);
>     }
>
> Is "stores are not speculated" correct? I
> think store instructions can be executed speculatively.
> "https://stackoverflow.com/questions/64141366/can-a-speculatively-executed-cpu-branch-contain-opcodes-that-access-ram"
> says: Store instructions can also be executed speculatively thanks to the
> store buffer. The actual execution of a store just writes the address and
> data into the store buffer. Commit to L1d cache happens some time after
> the store instruction retires from the ROB, i.e. when the store is known
> to be non-speculative, the associated store-buffer entry "graduates"
> and becomes eligible to commit to cache and become globally visible.

From the viewpoint of other CPUs, the store hasn't really happened
until it finds its way into a cacheline.  As you yourself note above,
if the store is still in the store buffer, it might be squashed when
speculation fails.

So Documentation/memory-barriers.txt and that stackoverflow entry are
not really in conflict, but are instead using words a bit differently
from each other.  The stackoverflow entry is considering a store to have
in some sense happened during a time when it might later be squashed.
In contrast, the Documentation/memory-barriers.txt document only considers
a store to have completed once it is visible outside of the CPU executing
that store.

So from a stackoverflow viewpoint, stores can be speculated, but until
they are finalized, they must be hidden from other CPUs.

From a Documentation/memory-barriers.txt viewpoint, stores don't complete
until they update their cachelines, and stores may not be speculated.
Some of the actions that lead up to the completion of a store may be
speculated, but not the completion of the store itself.

Different words, but same effect.  Welcome to our world!  ;-)

                                                        Thanx, Paul

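The two vocabularies in the message above can be put side by side in a small single-threaded toy model. This is only a sketch under stated assumptions: `toy_cpu`, `cpu_store`, `cpu_load`, `remote_load`, `cpu_commit` and `cpu_squash` are hypothetical names invented for illustration, and the model ignores everything real hardware does beyond "buffer locally, publish on commit, discard on squash". A store that sits in the buffer has "executed" in the stackoverflow sense, but it has not "completed" in the memory-barriers.txt sense until `cpu_commit()` copies it into the shared array.

```c
#include <assert.h>

#define NVARS  4   /* size of the toy shared memory (assumption) */
#define SB_CAP 8   /* store-buffer capacity (assumption) */

/* Memory as seen by every CPU: only committed stores land here. */
int shared_mem[NVARS];

struct sb_entry {
    int addr;
    int val;
};

/* One CPU with a private store buffer (hypothetical model). */
struct toy_cpu {
    struct sb_entry sb[SB_CAP];
    int n;   /* number of buffered, not-yet-committed stores */
};

/* "Execute" a store: record it locally; other CPUs cannot see it yet. */
void cpu_store(struct toy_cpu *c, int addr, int val)
{
    assert(addr >= 0 && addr < NVARS && c->n < SB_CAP);
    c->sb[c->n].addr = addr;
    c->sb[c->n].val = val;
    c->n++;
}

/* Load by the owning CPU: the newest buffered store wins (forwarding). */
int cpu_load(const struct toy_cpu *c, int addr)
{
    for (int i = c->n - 1; i >= 0; i--)
        if (c->sb[i].addr == addr)
            return c->sb[i].val;
    return shared_mem[addr];
}

/* Load by any other CPU: sees committed state only. */
int remote_load(int addr)
{
    return shared_mem[addr];
}

/* Branch resolved as predicted: drain the buffer in program order. */
void cpu_commit(struct toy_cpu *c)
{
    for (int i = 0; i < c->n; i++)
        shared_mem[c->sb[i].addr] = c->sb[i].val;
    c->n = 0;
}

/* Misprediction: discard buffered stores; they never became visible. */
void cpu_squash(struct toy_cpu *c)
{
    c->n = 0;
}
```

In this model, calling `cpu_store()` followed by `cpu_squash()` leaves `remote_load()` unchanged, which is the sense in which "stores are not speculated" from the viewpoint of other CPUs.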
* Re: Documentation/memory-barriers.txt: Is "stores are not speculated" correct?
From: Luc Maranget @ 2021-04-26  9:30 UTC
  To: Paul E. McKenney
  Cc: szyhb810501.student, stern, parri.andrea, will, peterz, boqun.feng,
      npiggin, dhowells, j.alglave, linux-kernel

> [...]
> Different words, but same effect.  Welcome to our world!  ;-)
>
>                                                         Thanx, Paul

Hi all,

Here is a complement to Paul's excellent answer.

The "CPU-local" speculation of stores can be observed
by the following test (in C11):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

C PPOCA

{}

P0(volatile int* y, volatile int* x) {

  atomic_store(x,1);
  atomic_store(y,1);

}

P1(volatile int* z, volatile int* y, volatile int* x) {

  int r1=-1; int r2=-1;
  int r0 = atomic_load_explicit(y,memory_order_relaxed);
  if (r0) {
    atomic_store_explicit(z,1,memory_order_relaxed);
    r1 = atomic_load_explicit(z,memory_order_relaxed);
    r2 = atomic_load_explicit(x+(r1 & 128),memory_order_relaxed);
  }

}

This is a variation on the MP test.

Because of the conditional "if (..) { S }", the statements "S" can be
executed speculatively.

More precisely, the store statement writes value 1 into the CPU-local
structure for variable z. The next load statement reads that value,
and the last load statement can be performed (speculatively)
as its address is known.

The resulting outcome is observed, for instance, on a Raspberry Pi 3;
see the attached file.

--Luc

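One way to read the explanation above is to replay a single hand-picked interleaving in a sequential toy model. The sketch below is an assumption-laden illustration, not a description of how any particular core works: `ppoca_replay` and `struct ppoca_outcome` are hypothetical names, P1's store buffer is reduced to one entry for z, and the order of steps is chosen by hand to produce r0=1, r1=1, r2=0. Note that `r1 & 128` is 0 when r1 is 1, so the last load really reads x; the expression only makes its address depend on r1.

```c
#include <assert.h>
#include <stdbool.h>

enum { X, Y, Z, NVARS };          /* indices into the toy memory */

struct ppoca_outcome {
    int r0, r1, r2;
    bool z_seen_by_others_early;  /* did z leak before the branch resolved? */
};

/* Replay one interleaving of PPOCA in a toy store-buffer model. */
struct ppoca_outcome ppoca_replay(void)
{
    int mem[NVARS] = { 0, 0, 0 }; /* committed state, visible to all CPUs */
    int sb_z = 0;                 /* P1's one-entry store buffer for z */
    bool sb_z_valid = false;
    struct ppoca_outcome o = { -1, -1, -1, false };

    /* P1: the load of y is still in flight; the branch is predicted
     * taken and the body runs speculatively. */
    sb_z = 1;                             /* store to z: buffer only */
    sb_z_valid = true;
    o.r1 = sb_z_valid ? sb_z : mem[Z];    /* forwarded from the buffer */
    int idx = X + (o.r1 & 128);           /* address depends on r1 */
    assert(idx >= 0 && idx < NVARS);
    o.r2 = mem[idx];                      /* early load of x: still 0 */
    o.z_seen_by_others_early = (mem[Z] != 0);

    /* P0: both stores become visible, x before y. */
    mem[X] = 1;
    mem[Y] = 1;

    /* P1: the load of y finally completes and resolves the branch. */
    o.r0 = mem[Y];
    if (o.r0) {
        mem[Z] = sb_z;                    /* prediction correct: commit z */
    } else {
        o.r1 = -1;                        /* misprediction: squash the body */
        o.r2 = -1;
    }
    sb_z_valid = false;
    return o;
}
```

The replay ends with r0=1, r1=1, r2=0 while z never became visible to other CPUs before the branch resolved, which matches the distinction drawn earlier in the thread between CPU-local speculation and globally visible stores.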
* Re: Documentation/memory-barriers.txt: Is "stores are not speculated" correct?
From: Randy Dunlap @ 2021-04-26 15:13 UTC
  To: Luc Maranget, Paul E. McKenney
  Cc: szyhb810501.student, stern, parri.andrea, will, peterz, boqun.feng,
      npiggin, dhowells, j.alglave, linux-kernel

On 4/26/21 2:30 AM, Luc Maranget wrote:
> [...]
> The resulting outcomme is observed for instance on a RaspBerry Pi3,
> see attached file.

?attached file?

-- 
~Randy

* Re: Documentation/memory-barriers.txt: Is "stores are not speculated" correct?
From: maranget @ 2021-04-26 15:16 UTC
  To: Randy Dunlap
  Cc: Paul E. McKenney, szyhb810501.student, stern, parri.andrea, will,
      peterz, boqun.feng, npiggin, dhowells, j.alglave, linux-kernel

> On 26 Apr 2021, at 17:13, Randy Dunlap <rdunlap@infradead.org> wrote:
>
> On 4/26/21 2:30 AM, Luc Maranget wrote:
>> [...]
>> The resulting outcomme is observed for instance on a RaspBerry Pi3,
>> see attached file.
>
> ?attached file?
>
> --
> ~Randy

Oops, sorry, I forgot the attachment:

--Luc

[-- Attachment #2: LOG.txt --]
[-- Type: text/plain, Size: 1483 bytes --]

Mon Apr 26 09:07:19 UTC 2021
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for PPOCA.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
C PPOCA

{}

P0(volatile int* y, volatile int* x) {

  atomic_store(x,1);
  atomic_store(y,1);

}

P1(volatile int* z, volatile int* y, volatile int* x) {

  int r1=-1; int r2=-1;
  int r0 = atomic_load_explicit(y,memory_order_relaxed);
  if (r0) {
    atomic_store_explicit(z,1,memory_order_relaxed);
    r1 = atomic_load_explicit(z,memory_order_relaxed);
    r2 = atomic_load_explicit(x+(r1 & 128),memory_order_relaxed);
  }

}

exists (1:r0=1 /\ 1:r1=1 /\ 1:r2=0)

Histogram (3 states)
11057696:>1:r0=0; 1:r1=-1; 1:r2=-1;
2       *>1:r0=1; 1:r1=1; 1:r2=0;
8942302 :>1:r0=1; 1:r1=1; 1:r2=1;
Ok

Witnesses
Positive: 2, Negative: 19999998
Condition exists (1:r0=1 /\ 1:r1=1 /\ 1:r2=0) is validated
Hash=bb2426936c19f1555410d1483dd31452
Observation PPOCA Sometimes 2 19999998
Time PPOCA 3.30

Revision 45690d9d0f7a956a6d3dbaf9e912efb22835756e, version 7.56+02~dev
Command line: litmus7 -mach vougeot -c11 true -o R.tar PPOCA.litmus
Parameters
#define SIZE_OF_TEST 10000
#define NUMBER_OF_RUN 100
#define AVAIL 4
#define STRIDE 1
#define MAX_LOOP 0
/* gcc options: -Wall -std=gnu11 -O2 -pthread */
/* barrier: userfence */
/* launch: changing */
/* affinity: none */
/* alloc: dynamic */
/* memory: direct */
/* stride: 1 */
/* safer: write */
/* preload: random */
/* speedcheck: no */
/* proc used: 4 */
GCC=gcc
LITMUSOPTS=-s 5k -r 2k -st 1
Mon Apr 26 09:07:23 UTC 2021

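For readers unfamiliar with the log format, the "Witnesses" line can be recomputed from the histogram: the row marked `*` matches the `exists` clause and counts as positive, and the other rows count as negative. The sketch below is illustrative only; `struct litmus_state`, `struct litmus_tally` and `tally_histogram` are hypothetical names and not part of litmus7, and the counts are simply the three values printed in the log above.

```c
#include <assert.h>
#include <stddef.h>

/* One histogram row: how often a final state was observed. */
struct litmus_state {
    long count;
    int matches;   /* nonzero if the row is marked '*' in the log */
};

struct litmus_tally {
    long total;
    long positive;
    long negative;
};

/* Sum the rows, splitting them by whether they satisfy the condition. */
struct litmus_tally tally_histogram(const struct litmus_state *s, size_t n)
{
    struct litmus_tally t = { 0, 0, 0 };

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        t.total += s[i].count;
        if (s[i].matches)
            t.positive += s[i].count;
        else
            t.negative += s[i].count;
    }
    return t;
}
```

Fed the three rows from the log (11057696, 2 marked `*`, and 8942302), this gives 20000000 outcomes in total, 2 positive and 19999998 negative, in agreement with the "Positive: 2, Negative: 19999998" line.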
* Re: Documentation/memory-barriers.txt: Is "stores are not speculated" correct?
From: Paul E. McKenney @ 2021-04-26 15:34 UTC
  To: maranget
  Cc: Randy Dunlap, szyhb810501.student, stern, parri.andrea, will, peterz,
      boqun.feng, npiggin, dhowells, j.alglave, linux-kernel

On Mon, Apr 26, 2021 at 05:16:15PM +0200, maranget wrote:
> [...]
> exists (1:r0=1 /\ 1:r1=1 /\ 1:r2=0)
>
> Histogram (3 states)
> 11057696:>1:r0=0; 1:r1=-1; 1:r2=-1;
> 2       *>1:r0=1; 1:r1=1; 1:r2=0;

Fun!!!  ;-)

                                                        Thanx, Paul

> 8942302 :>1:r0=1; 1:r1=1; 1:r2=1;
> Ok
>
> Witnesses
> Positive: 2, Negative: 19999998
> Condition exists (1:r0=1 /\ 1:r1=1 /\ 1:r2=0) is validated
> [...]
