Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

* Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
@ 2023-10-27 21:08 Paul E. McKenney
  2023-11-03 17:02 ` Alglave, Jade
  2023-11-05 23:08 ` Fw: " Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-10-27 21:08 UTC
  To: j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer,
	parri.andrea
  Cc: linux-kernel, linux-toolchains, peterz, boqun.feng, davidtgoldblatt

Hello!

FYI, unless someone complains, it is quite likely that C++ (and thus
likely C) compilers and standards will enforce Hans Boehm's proposal
for ordering relaxed loads before relaxed stores.  The document [1]
cites "Bounding data races in space and time" by Dolan et al. [2], and
notes an "average a 2.x% slow down" for ARMv8 and PowerPC.  In the past,
this has been considered unacceptable, among other things, due to the
fact that this issue is strictly theoretical.
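
For readers who have not met the out-of-thin-air (OOTA) problem, the following is a minimal sketch of the load-buffering litmus test that the proposal targets. The names `lb_thread0`, `lb_thread1`, and `run_lb` are illustrative assumptions, not taken from the cited documents. As I understand it, the C/C++ standards do not clearly forbid both threads from reading, say, 42 here, even though no known hardware or compiler produces that result. Ordering each relaxed load before the following relaxed store rules the cycle out by construction.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared variables for the load-buffering (LB) litmus test, both initially 0. */
static atomic_int x, y;

/* Thread 0: copy whatever it reads from x into y. */
static void *lb_thread0(void *out)
{
    int r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, r1, memory_order_relaxed);
    *(int *)out = r1;
    return NULL;
}

/* Thread 1: copy whatever it reads from y into x. */
static void *lb_thread1(void *out)
{
    int r2 = atomic_load_explicit(&y, memory_order_relaxed);
    atomic_store_explicit(&x, r2, memory_order_relaxed);
    *(int *)out = r2;
    return NULL;
}

/* Run both threads once; returns 0 on success and fills in *r1 and *r2. */
static int run_lb(int *r1, int *r2)
{
    pthread_t t0, t1;

    atomic_store(&x, 0);
    atomic_store(&y, 0);
    if (pthread_create(&t0, NULL, lb_thread0, r1))
        return -1;
    if (pthread_create(&t1, NULL, lb_thread1, r2)) {
        pthread_join(t0, NULL);
        return -1;
    }
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}
```

In practice both registers end up 0 on every run, which is why the issue is described as theoretical.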

This would not (repeat, not) affect the current Linux kernel, which
relies on volatile loads and stores rather than C/C++ atomics.
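
As a rough illustration of what "volatile loads and stores" means here, the sketch below shows simplified accessors in the style of the kernel's READ_ONCE() and WRITE_ONCE(). The `_SKETCH` names and the helper functions are assumptions made for this example; the real kernel macros do more, and this version assumes GCC or Clang for `__typeof__`.

```c
/*
 * Simplified sketches of kernel-style once-only accessors.  The real
 * kernel macros do more (size checks, instrumentation hooks); these
 * assume GCC or Clang for __typeof__.
 */
#define READ_ONCE_SKETCH(var)       (*(const volatile __typeof__(var) *)&(var))
#define WRITE_ONCE_SKETCH(var, val) (*(volatile __typeof__(var) *)&(var) = (val))

static int shared_flag;

/* The volatile cast asks the compiler to emit the store once, as written. */
static void set_flag(int v)
{
    WRITE_ONCE_SKETCH(shared_flag, v);
}

/* Likewise for the load: no caching in a register, no refetching. */
static int get_flag(void)
{
    return READ_ONCE_SKETCH(shared_flag);
}
```

Because these accesses are not C/C++ atomics, a change to the compilation of relaxed atomics would not touch them.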

To be clear, the initial proposal is not to change the standards, but
rather to add a command-line argument to enforce the stronger ordering.
However, given the long list of ARM-related folks in the Acknowledgments
section, the future direction is clear.

So, do any ARMv8, PowerPC, or RISC-V people still care?  If so, I strongly
recommend speaking up.  ;-)

							Thanx, Paul

[1] https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
[2] https://dl.acm.org/doi/10.1145/3192366.3192421

----- Forwarded message from David Goldblatt via Parallel <parallel@lists.isocpp.org> -----

Date: Fri, 27 Oct 2023 11:09:18 -0700
From: David Goldblatt via Parallel <parallel@lists.isocpp.org>
To: SG1 concurrency and parallelism <parallel@lists.isocpp.org>
Reply-To: parallel@lists.isocpp.org
Cc: David Goldblatt <davidtgoldblatt@gmail.com>
Subject: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

Those who read this list but not the LLVM discourse might be interested in:
- This discussion, proposing `-mstrict-rlx-atomics`:
https://discourse.llvm.org/t/rfc-strengthen-relaxed-atomics-implementation-behind-mstrict-rlx-atomics-flag/74473
to enforce load-store ordering
- The associated blog post here:
https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
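
One way to picture the "fake branch-after-load" in the subject line is the sketch below. It is an assumption-laden illustration, not the code the RFC generates: the helper name `load_ordered_before_stores` is hypothetical, the AArch64 path uses GNU inline assembly, and every other target falls back to an acquire fence, which is stronger than needed but also orders the load before later stores.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (not from the RFC): a relaxed load that is ordered
 * before later stores in the same thread.
 */
static inline int load_ordered_before_stores(atomic_int *p)
{
    int v = atomic_load_explicit(p, memory_order_relaxed);
#if defined(__aarch64__) && defined(__GNUC__)
    /*
     * "Fake" conditional branch on the loaded value.  Both outcomes land
     * on the same label, but the branch still depends on v.  The "memory"
     * clobber keeps the compiler from hoisting later stores above it.
     */
    __asm__ volatile("cbz %w0, 1f\n1:" : : "r"(v) : "memory");
    return v;
#else
    /* Conservative fallback: an acquire fence orders the load before later stores. */
    atomic_thread_fence(memory_order_acquire);
    return v;
#endif
}
```

The extra compare-and-branch after each relaxed load is where the quoted slowdown on weakly ordered machines would come from.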

- David

_______________________________________________
Parallel mailing list
Parallel@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/parallel
Link to this post: http://lists.isocpp.org/parallel/2023/10/4151.php

----- End forwarded message -----

* Re: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-10-27 21:08 Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion Paul E. McKenney
@ 2023-11-03 17:02 ` Alglave, Jade
  2023-11-04 18:20   ` Jonas Oberhauser
  2023-11-05 23:08 ` Fw: " Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Alglave, Jade @ 2023-11-03 17:02 UTC
  To: will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea,
	paulmck
  Cc: linux-kernel, linux-toolchains, peterz, boqun.feng, davidtgoldblatt

Dear all, (resending because I accidentally sent it in html first, sorry)

Arm’s official position on the topic can be found in this recent blog:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-technical-view-on-relaxed-atomics

Please do reach out to memory-model@arm.com if there are any questions.
Thanks,
Jade

* Re: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-11-03 17:02 ` Alglave, Jade
@ 2023-11-04 18:20   ` Jonas Oberhauser
  0 siblings, 0 replies; 10+ messages in thread
From: Jonas Oberhauser @ 2023-11-04 18:20 UTC
  To: Alglave, Jade, will, catalin.marinas, linux, mpe, npiggin,
	palmer, parri.andrea, paulmck
  Cc: linux-kernel, linux-toolchains, peterz, boqun.feng,
	davidtgoldblatt, viktor

Thanks Jade.

I agree with the position you linked to in that the move is... unwise.

IMO, for a high-level language like C, if you need to rule out OOTA, just
declare it impossible (Viktor, in CC, made this suggestion a while ago)
by adding a "no OOTA" axiom.

BTW, is there at least a proof that just making relaxed atomics ordered 
in this way rules out OOTA in programs that contain non-atomics?
Or can we have something like the LKMM OOTA example I sent around last year?


best wishes,

jonas

* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-10-27 21:08 Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion Paul E. McKenney
  2023-11-03 17:02 ` Alglave, Jade
@ 2023-11-05 23:08 ` Peter Zijlstra
  2023-11-07  2:16   ` Paul E. McKenney
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2023-11-05 23:08 UTC
  To: Paul E. McKenney
  Cc: j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer,
	parri.andrea, linux-kernel, linux-toolchains, boqun.feng,
	davidtgoldblatt

On Fri, Oct 27, 2023 at 02:08:13PM -0700, Paul E. McKenney wrote:
> Hello!
> 
> FYI, unless someone complains, it is quite likely that C++ (and thus
> likely C) compilers and standards will enforce Hans Boehm's proposal
> for ordering relaxed loads before relaxed stores.  The document [1]
> cites "Bounding data races in space and time" by Dolan et al. [2], and
> notes an "average a 2.x% slow down" for ARMv8 and PowerPC.  In the past,
> this has been considered unacceptable, among other things, due to the
> fact that this issue is strictly theoretical.
> 
> This would not (repeat, not) affect the current Linux kernel, which
> relies on volatile loads and stores rather than C/C++ atomics.
> 
> To be clear, the initial proposal is not to change the standards, but
> rather to add a command-line argument to enforce the stronger ordering.
> However, given the long list of ARM-related folks in the Acknowledgments
> section, the future direction is clear.
> 
> So, do any ARMv8, PowerPC, or RISC-V people still care?  If so, I strongly
> recommend speaking up.  ;-)

OK, I finally had some time to read up...

Colour me properly confused. To me this all reads like C people can't
deal with relaxed atomics and are doing crazy things to try and 'fix'
it.

And while I don't speak for ARM/Power, I do worry this all takes C/C++
even further away from LKMM instead of closing the gap.

Worse, things like:

  https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/

Which state:

 "It would solve real issues in the Linux Kernel without costly fences
 (the kernel does not use relaxed atomics or the ISO C/C++ model - the
 load buffering issue affects the ISO C and linux memory models) ..."

Which is a contradiction if ever I saw one. It both claims this atrocity
fixes our volatile_if() woes while at the same time saying we're
unaffected because we don't use any of the C/C++ atomic batshit.

Anyway, I worry that all this faffing about will get in the way of our
volatile_if() 'demands'. Compiler people will tell us, just use relaxed
atomics, which is very much not what we want. We know relaxed loads
and stores behave 'funny', we've been doing that for a long long time.
Don't impose that madness on us. And certainly don't use us as an excuse
to peddle this nonsense.
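
For context, the worry behind volatile_if() is that a compiler may remove or merge a conditional branch and with it the control dependency that orders a load before a later store. The sketch below is a hypothetical stand-in, not the kernel's definition: `barrier_sketch`, `volatile_if_sketch`, `ready`, `data`, and `consume_if_ready` are names invented for this example, and the macro relies on GNU statement expressions.

```c
/* Compiler-only barrier, as commonly written for GCC and Clang. */
#define barrier_sketch() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical stand-in for volatile_if(): put a compiler barrier on the
 * taken path so stores inside the body are not hoisted above the test.
 * This is an illustration only; it does not guarantee that the compiler
 * keeps a conditional branch in the generated code.
 */
#define volatile_if_sketch(cond) if ((cond) && ({ barrier_sketch(); 1; }))

static volatile int ready;
static volatile int data;

/* Store to data only after observing ready, via a control dependency. */
static int consume_if_ready(void)
{
    volatile_if_sketch(ready) {
        data = 1;
        return 1;
    }
    return 0;
}
```

Nothing here uses C/C++ atomics, so a flag that strengthens relaxed atomics would not apply to it.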

Bah, what a load of crazy.

/me stomps off in disgust.

* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-11-05 23:08 ` Fw: " Peter Zijlstra
@ 2023-11-07  2:16   ` Paul E. McKenney
  2023-11-07  9:57     ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2023-11-07  2:16 UTC
  To: Peter Zijlstra
  Cc: j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer,
	parri.andrea, linux-kernel, linux-toolchains, boqun.feng,
	davidtgoldblatt

On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> On Fri, Oct 27, 2023 at 02:08:13PM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > FYI, unless someone complains, it is quite likely that C++ (and thus
> > likely C) compilers and standards will enforce Hans Boehm's proposal
> > for ordering relaxed loads before relaxed stores.  The document [1]
> > cites "Bounding data races in space and time" by Dolan et al. [2], and
> > notes an "average a 2.x% slow down" for ARMv8 and PowerPC.  In the past,
> > this has been considered unacceptable, among other things, due to the
> > fact that this issue is strictly theoretical.
> > 
> > This would not (repeat, not) affect the current Linux kernel, which
> > relies on volatile loads and stores rather than C/C++ atomics.
> > 
> > To be clear, the initial proposal is not to change the standards, but
> > rather to add a command-line argument to enforce the stronger ordering.
> > However, given the long list of ARM-related folks in the Acknowledgments
> > section, the future direction is clear.
> > 
> > So, do any ARMv8, PowerPC, or RISC-V people still care?  If so, I strongly
> > recommend speaking up.  ;-)
> 
> OK, I finally had some time to read up...
> 
> Colour me properly confused. To me this all reads like C people can't
> deal with relaxed atomics and are doing crazy things to try and 'fix'
> it.
> 
> And while I don't speak for ARM/Power, I do worry this all takes C/C++
> even further away from LKMM instead of closing the gap.
> 
> Worse, things like:
> 
>   https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
> 
> Which state:
> 
>  "It would solve real issues in the Linux Kernel without costly fences
>  (the kernel does not use relaxed atomics or the ISO C/C++ model - the
>  load buffering issue affects the ISO C and linux memory models) ..."
> 
> Which is a contradiction if ever I saw one. It both claims this atrocity
> fixes our volatile_if() woes while at the same time saying we're
> unaffected because we don't use any of the C/C++ atomic batshit.

I guess that my traditional reply would be that if you are properly
confused by all this, that just means that you were reading carefully.

> Anyway, I worry that all this faffing about will get in the way of our
> volatile_if() 'demands'. Compiler people will tell us, just use relaxed
> atomics, which is very much not what we want. We know relaxed loads
> and stores behave 'funny', we've been doing that for a long long time.
> Don't impose that madness on us. And certainly don't use us as an excuse
> to peddle this nonsense.

I am very much against incurring real overhead to solve an issue that is
an issue only in theory and not in practice.  I wish I could confidently
say that my view will prevail, but...

> Bah, what a load of crazy.
> 
> /me stomps off in disgust.

If this goes through and if developers see any overhead from relaxed
atomics in a situation that matters to them, they will reach for some
other tool.  Inline assembly and volatile accesses, I suppose.  Or the
traditional approach of a compiler flag.

							Thanx, Paul

* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-11-07  2:16   ` Paul E. McKenney
@ 2023-11-07  9:57     ` Segher Boessenkool
  2023-11-07 16:44       ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2023-11-07  9:57 UTC
  To: Paul E. McKenney
  Cc: Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe,
	npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains,
	boqun.feng, davidtgoldblatt

On Mon, Nov 06, 2023 at 06:16:24PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> > Which is a contradiction if ever I saw one. It both claims this atrocity
> > fixes our volatile_if() woes while at the same time saying we're
> > unaffected because we don't use any of the C/C++ atomic batshit.
> 
> I guess that my traditional reply would be that if you are properly
> confused by all this, that just means that you were reading carefully.

I'll put that in my quote box :-)

> I am very much against incurring real overhead to solve an issue that is
> an issue only in theory and not in practice.  I wish I could confidently
> say that my view will prevail, but...

Given enough time most theory turns out to be practice.  If what you
are writing has a constrained scope, or a limited impact, or both, you
can ignore this "we'll deal with it later if/when it shows up".  But a
compiler does not have that luxury at all: it has to make correct
translations from source code to assembler code (or machine code
directly, for some compilers), or refuse to compile something.  Making
an incorrect translation is not an option.

> If this goes through and if developers see any overhead from relaxed
> atomics in a situation that matters to them, they will reach for some
> other tool.  Inline assembly and volatile accesses, I suppose.  Or the
> traditional approach of a compiler flag.

And I understand you want the standards to be more useful for the kernel
concurrency model?  Why, exactly?  I can think of many reasons, but I'm
a bit lost as to what motivates you here :-)


Segher

* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-11-07  9:57     ` Segher Boessenkool
@ 2023-11-07 16:44       ` Paul E. McKenney
  2023-11-09 16:25         ` Jonas Oberhauser
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2023-11-07 16:44 UTC
  To: Segher Boessenkool
  Cc: Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe,
	npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains,
	boqun.feng, davidtgoldblatt

On Tue, Nov 07, 2023 at 03:57:45AM -0600, Segher Boessenkool wrote:
> On Mon, Nov 06, 2023 at 06:16:24PM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> > > Which is a contradiction if ever I saw one. It both claims this atrocity
> > > fixes our volatile_if() woes while at the same time saying we're
> > > unaffected because we don't use any of the C/C++ atomic batshit.
> > 
> > I guess that my traditional reply would be that if you are properly
> > confused by all this, that just means that you were reading carefully.
> 
> I'll put that in my quote box :-)

Please accept my condolences.  ;-)

> > I am very much against incurring real overhead to solve an issue that is
> > an issue only in theory and not in practice.  I wish I could confidently
> > say that my view will prevail, but...
> 
> Given enough time most theory turns out to be practice.

Actually, in this field, it tends to be the other way around.  After
all, if practice followed theory, we would have all abandoned locks
for non-blocking synchronization in the 1990s, and then switched to
transactional memory in the decade following.

>                                                          If what you
> are writing has a constrained scope, or a limited impact, or both, you
> can ignore this "we'll deal with it later if/when it shows up".  But a
> compiler does not have that luxury at all: it has to make correct
> translations from source code to assembler code (or machine code
> directly, for some compilers), or refuse to compile something.  Making
> an incorrect translation is not an option.

But in this case, it would be most excellent if compiler practice were
to follow theory.  Because the theory of avoiding OOTA without emitting
extraneous instructions is quite simple:

	Avoid breaking semantic dependencies.

Given that, as you say, incorrect translation is not an option, it
should not be hard to satisfy oneself that a correct compiler must
avoid breaking semantic dependencies, at least for compilers that do not
indulge in value speculation, which would be a very brave indulgence.
Or, alternatively, that any such breaking constitutes a compiler bug.
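
To make the term concrete, here is a small sketch contrasting a semantic dependency with a merely syntactic one. The variable and function names are assumptions chosen for the example. In the first function the stored value really varies with the loaded value; in the second, the loaded value appears in the expression but cannot affect the result, so a compiler is free to store a constant.

```c
#include <stdatomic.h>

static atomic_int src, dst;

/* Semantic dependency: the value stored really varies with the value loaded. */
static void copy_semantic(void)
{
    int r = atomic_load_explicit(&src, memory_order_relaxed);
    atomic_store_explicit(&dst, r + 1, memory_order_relaxed);
}

/*
 * Syntactic-only dependency: r appears in the expression, but r ^ r is
 * always 0, so a compiler may store the constant and drop any dependency.
 */
static void copy_syntactic_only(void)
{
    int r = atomic_load_explicit(&src, memory_order_relaxed);
    atomic_store_explicit(&dst, r ^ r, memory_order_relaxed);
}
```

Only the first kind needs to be preserved for the OOTA argument; the difficulty is in defining that distinction precisely.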

But when this approach was put forward about a decade ago, compiler
writers were quite resistant.  Plus they argued that concurrency was a
niche use case, which might be a less convincing argument these days.

> > If this goes through and if developers see any overhead from relaxed
> > atomics in a situation that matters to them, they will reach for some
> > other tool.  Inline assembly and volatile accesses, I suppose.  Or the
> > traditional approach of a compiler flag.
> 
> And I understand you want the standards to be more useful for the kernel
> concurrency model?  Why, exactly?  I can think of many reasons, but I'm
> a bit lost as to what motivates you here :-)

My hope is that a number of useful and efficient concurrent-code idioms
can be implemented with help from the compiler, in the Linux kernel
and elsewhere.  Or failing that, at least with less resistance on the
part of the compiler.

Here is an incomplete list:

1.	When it is necessary for fast-path code to load from a shared
	variable, it should be possible to use a single normal load
	instruction for this purpose.  Our current conversation touches
	on this point.

2.	Address dependencies extending to loads do not generate OOTA,
	but are another example of this "just use a normal load" issue.
	Address dependencies need more careful handling because compilers
	can and do convert them to control dependencies.  Which is why
	the Linux-kernel advice in rcu_dereference.rst is to avoid using
	integers for address dependencies, and most especially to avoid
	using booleans for this.

	But it would be nice to be able to tell the compiler that a
	given variable, function parameter, and/or return value carried
	a dependency.  At a minimum, it would be nice if the compiler
	could complain if that dependency would be broken.

3.	It should be possible to implement 50-year-old concurrent
	algorithms straightforwardly (LIFO push stack being the poster
	boy here [1]).  Right now, pointer provenance gets in the way
	of this.  On the other hand, clang/LLVM does quite well without
	pointer provenance, so one has to wonder just how important
	pointer-provenance-based optimizations really are.

	Hans Boehm, were he looking over my shoulder, would add that
	there are a number of single-threaded algorithms that are made
	unnecessarily complex due to pointer provenance issues [2].  Hans and
	I often come down on opposite sides of concurrency issues, so
	in cases like this one where we do agree, you just might want
	to pay attention.  ;-)

	Anthony Williams goes further, arguing that pointers should at
	least sometimes just be bags of bits, that is, that the role of
	pointer provenance should be greatly reduced or even eliminated [3].

4.	Semantics of volatile.  Perhaps the current state is the best
	that can be hoped for, but given that the current state is a
	few vague words in the standard in combination with the fact
	that C-language device drivers must be able to use volatile
	to reliably and concurrently access memory shared with device
	firmware, one would hope for better.

5.	UB on signed integer overflow.  Right now, the kernel just
	forces wrapping, which works, so maybe we don't really care
	all that much.  But at this point, it seems to me that it was a
	mistake for the language to have failed to provide a means of
	specifying signed integers that wrap (int_wrap?).  (Yes, yes,
	you can get them by making an atomic signed int, but that is
	not exactly an ergonomic workaround.)
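
Item 2's warning about integers and booleans can be sketched as follows. This is a hedged illustration with invented names (`struct config`, `current_cfg`, `table`, `use_override`); relaxed atomic loads stand in for rcu_dereference(), which in the kernel is built on volatile accesses rather than C atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct config {
    int value;
};

static struct config defaults = { 1 };
static struct config *_Atomic current_cfg = &defaults;

/* Pointer-carried dependency: the second load's address comes from the first load. */
static int read_via_pointer(void)
{
    struct config *p = atomic_load_explicit(&current_cfg, memory_order_relaxed);
    return p->value;
}

static struct config table[2] = { { 1 }, { 2 } };
static atomic_bool use_override;

/*
 * Boolean-carried "dependency": with only two possible indices, a compiler
 * may rewrite this as a branch (b ? table[1].value : table[0].value),
 * turning the address dependency into a control dependency.
 */
static int read_via_bool(void)
{
    bool b = atomic_load_explicit(&use_override, memory_order_relaxed);
    return table[b].value;
}
```

Both functions return the right value in a single thread; the difference only matters for ordering against a concurrent updater, which is exactly what a compiler diagnostic for broken dependencies could flag.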

Is this really too much to ask?  If so, why?

The Linux-kernel memory model and associated coding guidelines form
a way of achieving the items on this list with the current level of
cooperation from the compilers, or perhaps more accurately, current
level of lack of cooperation.

The thing is that C11 and C++11 did pretty much the bare minimum to
support concurrency.  After all, when that effort started back in 2005,
concurrency really was a niche use case.  It is only reasonable to
expect a few adjustments almost 20 years on.

Hey, you asked!!!  ;-)

							Thanx, Paul

[1] https://docs.google.com/document/d/12paeC4suYAmVZlQvqytiCjGmEeGdJn0X7oM0SRLMALk/edit
[2] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1726r5.pdf
[3] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2188r1.html

* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-11-07 16:44       ` Paul E. McKenney
@ 2023-11-09 16:25         ` Jonas Oberhauser
  2023-11-09 18:24           ` Boqun Feng
  0 siblings, 1 reply; 10+ messages in thread
From: Jonas Oberhauser @ 2023-11-09 16:25 UTC
  To: paulmck, Segher Boessenkool
  Cc: Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe,
	npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains,
	boqun.feng, davidtgoldblatt


On 11/7/2023 at 5:44 PM, Paul E. McKenney wrote:
> On Tue, Nov 07, 2023 at 03:57:45AM -0600, Segher Boessenkool wrote:
>> On Mon, Nov 06, 2023 at 06:16:24PM -0800, Paul E. McKenney wrote:
>>
>>> I am very much against incurring real overhead to solve an issue that is
>>> an issue only in theory and not in practice.  I wish I could confidently
>>> say that my view will prevail, but...


Where should one complain to increase the chance of your view prevailing?


>>                                                           If what you
>> are writing has a constrained scope, or a limited impact, or both, you
>> can ignore this "we'll deal with it later if/when it shows up".  But a
>> compiler does not have that luxury at all: it has to make correct
>> translations from source code to assembler code (or machine code
>> directly, for some compilers), or refuse to compile something.  Making
>> an incorrect translation is not an option.
> But in this case, it would be most excellent if compiler practice were
> to follow theory.  Because the theory of avoiding OOTA without emitting
> extraneous instructions is quite simple:
>
> 	Avoid breaking semantic dependencies.


The problem with that is that C, unlike LKMM, doesn't use a ppo-based model.

So there's no notion of "breaking a dependency" that could be applied to 
define what it means "not to break" semantic dependencies.

But the solution is actually simpler. Just add the axiom (proposed by 
Viktor):

     There are no OOTA behaviors
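
For comparison, a rough sketch of how the two alternatives look in axiomatic notation, under stated assumptions: $\mathit{sb}$ is sequenced-before, $\mathit{rf}$ is reads-from, and $\mathit{sdep}$ is a hypothetical semantic-dependency relation that the C and C++ standards do not currently define. The first line is the load-to-store ordering condition used in RC11-style models; the second is what a dependency-based rule would need.

```latex
% Load-to-store ordering, as in RC11-style models:
\mathrm{acyclic}(\mathit{sb} \cup \mathit{rf})

% A dependency-based alternative, with a hypothetical relation sdep:
\mathrm{acyclic}(\mathit{sdep} \cup \mathit{rf}),
\qquad \mathit{sdep} \subseteq \mathit{sb}
```

The axiom above sidesteps the missing definition of $\mathit{sdep}$ by forbidding the behaviors directly.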


> 4.	Semantics of volatile.	Perhaps the current state is the best
> 	that can be hoped for, but given that the current state is a
> 	few vague words in the standard in combination with the fact
> 	that C-language device drivers must be able to use volatile
> 	to reliably and concurrently access memory shared with device
> 	firmware, one would hope for better.


Is it really so bad? I think the definition in the manual is quite 
precise, if confusing. (volatiles are visible side effects and must 
therefore have the same program order in the abstract machine and in the 
implementation, and that's pretty much it).

There should just be a large explanatory note about what it implies and 
what it doesn't imply.


>
> 5.	UB on signed integer overflow.	Right now, the kernel just
> 	forces wrapping, which works, so maybe we don't really care
> 	all that much.	But at this point, it seems to me that it was a
> 	mistake for the language to have failed to provide a means of
> 	specifying signed integers that wrap (int_wrap?).  (Yes, yes,
> 	you can get them by making an atomic signed int, but that is
> 	not exactly an ergonomic workaround.)

What I don't understand is why they didn't make signed integer types 
wrap when they became two's complement.
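
Absent a wrapping signed type or a compiler flag such as GCC's -fwrapv, one common way to get wrapping signed addition without undefined behavior is to do the arithmetic in unsigned and convert back. The helper name `add_wrap` is an assumption for this sketch, and the final conversion is implementation-defined in ISO C, though GCC and Clang document it as reduction modulo 2^N.

```c
#include <limits.h>

/*
 * Wrapping signed addition without undefined behavior: do the arithmetic
 * in unsigned, where overflow is defined to wrap, then convert back.
 * The conversion back to int is implementation-defined in ISO C; GCC and
 * Clang define it as two's-complement wraparound.
 */
static int add_wrap(int a, int b)
{
    return (int)((unsigned int)a + (unsigned int)b);
}
```

A dedicated wrapping type would make the casts unnecessary.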





* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-11-09 16:25         ` Jonas Oberhauser
@ 2023-11-09 18:24           ` Boqun Feng
  2023-11-09 20:09             ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Boqun Feng @ 2023-11-09 18:24 UTC
  To: Jonas Oberhauser
  Cc: paulmck, Segher Boessenkool, Peter Zijlstra, j.alglave, will,
	catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea,
	linux-kernel, linux-toolchains, davidtgoldblatt

On Thu, Nov 09, 2023 at 05:25:05PM +0100, Jonas Oberhauser wrote:
[...]
> > 4.	Semantics of volatile.	Perhaps the current state is the best
> > 	that can be hoped for, but given that the current state is a
> > 	few vague words in the standard in combination with the fact
> > 	that C-language device drivers must be able to use volatile
> > 	to reliably and concurrently access memory shared with device
> > 	firmware, one would hope for better.
> 
> 
> Is it really so bad? I think the definition in the manual is quite precise,
> if confusing. (volatiles are visible side effects and must therefore have
> the same program order in the abstract machine and in the implementation,
> and that's pretty much it).
> 

But I don't think there is any mention of whether concurrent volatile
accesses can be excluded from data races, or whether a volatile access
to a machine-word-sized, naturally aligned object can be torn or not.
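
To spell out what "torn" means, the sketch below simulates, in a single thread, a 32-bit store that is carried out as two 16-bit halves, with a reader sampling between them. The function name `observe_torn_store` is an assumption for this example, and it models the effect rather than provoking it on real hardware.

```c
#include <stdint.h>

/*
 * Simulate a torn 32-bit store: the new value is written as two 16-bit
 * halves, and a reader looks in between the two partial stores.
 */
static uint32_t observe_torn_store(uint32_t old_val, uint32_t new_val)
{
    uint32_t mem = old_val;

    /* First partial store: low 16 bits only. */
    mem = (mem & 0xFFFF0000u) | (new_val & 0x0000FFFFu);

    /* A concurrent reader sampling now sees a mix of old and new. */
    return mem;
}
```

The value returned was never stored by anyone, which is why a guarantee against tearing matters for shared word-sized variables.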

Regards,
Boqun

> There should just be a large explanatory note about what it implies and what
> it doesn't imply.
> 
> 

* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
  2023-11-09 18:24           ` Boqun Feng
@ 2023-11-09 20:09             ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2023-11-09 20:09 UTC
  To: Boqun Feng
  Cc: Jonas Oberhauser, Segher Boessenkool, Peter Zijlstra, j.alglave,
	will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea,
	linux-kernel, linux-toolchains, davidtgoldblatt

On Thu, Nov 09, 2023 at 10:24:23AM -0800, Boqun Feng wrote:
> On Thu, Nov 09, 2023 at 05:25:05PM +0100, Jonas Oberhauser wrote:
> [...]
> > > 4.	Semantics of volatile.	Perhaps the current state is the best
> > > 	that can be hoped for, but given that the current state is a
> > > 	few vague words in the standard in combination with the fact
> > > 	that C-language device drivers must be able to use volatile
> > > 	to reliably and concurrently access memory shared with device
> > > 	firmware, one would hope for better.
> > 
> > 
> > Is it really so bad? I think the definition in the manual is quite precise,
> > if confusing. (volatiles are visible side effects and must therefore have
> > the same program order in the abstract machine and in the implementation,
> > and that's pretty much it).
> 
> But I don't think there is any mention of whether concurrent volatile
> accesses can be excluded from data races, or whether a volatile access
> to a machine-word-sized, naturally aligned object can be torn or not.

Here is my understanding:  It must be possible to write C-language
device drivers for devices that...

1.	read and write to normal memory.  If this device has C-language
	firmware, volatile reads and writes involving aligned
	machine-word-sized locations must not invoke undefined behavior.

2.	allow concurrent reads and writes to MMIO registers (or to
	normal memory).  Even ignoring the device firmware, volatile
	reads and writes involving aligned machine-word-sized locations
	must not invoke undefined behavior.
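
The kind of access in question can be sketched as below. This is a simplified illustration with invented names (`fake_device_reg`, `reg_write32`, `reg_read32`, `poke_and_read`); ordinary memory stands in for a device register, whereas a real driver would obtain the address from an MMIO mapping and would usually need barriers as well.

```c
#include <stdint.h>

/* Ordinary memory standing in for a device register; a real driver would map MMIO. */
static uint32_t fake_device_reg;

/* One aligned, word-sized volatile store, as a driver would issue to a register. */
static void reg_write32(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
}

/* One aligned, word-sized volatile load. */
static uint32_t reg_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* Write a value to the stand-in register and read it back. */
static uint32_t poke_and_read(uint32_t v)
{
    reg_write32(&fake_device_reg, v);
    return reg_read32(&fake_device_reg);
}
```

The argument above is that such accesses must have defined behavior even when the device or its firmware touches the same location concurrently.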

Not necessarily a popular view, but then again, in my experience,
objective reality never has been trying to win a popularity contest.

							Thanx, Paul

> Regards,
> Boqun
> 
> > There should just be a large explanatory note about what it implies and what
> > it doesn't imply.
> > 
> > 
