* Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Paul E. McKenney @ 2023-10-27 21:08 UTC
To: j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea
Cc: linux-kernel, linux-toolchains, peterz, boqun.feng, davidtgoldblatt

Hello!

FYI, unless someone complains, it is quite likely that C++ (and thus
likely C) compilers and standards will enforce Hans Boehm's proposal
for ordering relaxed loads before relaxed stores.  The document [1]
cites "Bounding data races in space and time" by Dolan et al. [2], and
notes an "average a 2.x% slow down" for ARMv8 and PowerPC.  In the past,
this has been considered unacceptable, among other things, due to the
fact that this issue is strictly theoretical.

This would not (repeat, not) affect the current Linux kernel, which
relies on volatile loads and stores rather than C/C++ atomics.

To be clear, the initial proposal is not to change the standards, but
rather to add a command-line argument to enforce the stronger ordering.
However, given the long list of ARM-related folks in the Acknowledgments
section, the future direction is clear.

So, do any ARMv8, PowerPC, or RISC-V people still care?  If so, I strongly
recommend speaking up.  ;-)

							Thanx, Paul

[1] https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
[2] https://dl.acm.org/doi/10.1145/3192366.3192421

----- Forwarded message from David Goldblatt via Parallel <parallel@lists.isocpp.org> -----

Date: Fri, 27 Oct 2023 11:09:18 -0700
From: David Goldblatt via Parallel <parallel@lists.isocpp.org>
To: SG1 concurrency and parallelism <parallel@lists.isocpp.org>
Reply-To: parallel@lists.isocpp.org
Cc: David Goldblatt <davidtgoldblatt@gmail.com>
Subject: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

Those who read this list but not the LLVM discourse might be interested in:
- This discussion, proposing `-mstrict-rlx-atomics`:
https://discourse.llvm.org/t/rfc-strengthen-relaxed-atomics-implementation-behind-mstrict-rlx-atomics-flag/74473
to enforce load-store ordering
- The associated blog post here:
https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/

- David

_______________________________________________
Parallel mailing list
Parallel@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/parallel
Link to this post: http://lists.isocpp.org/parallel/2023/10/4151.php

----- End forwarded message -----
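The outcome under discussion can be made concrete with the classic load-buffering litmus test. What follows is only an illustrative sketch, not code from the proposal or the blog post, and the name run_load_buffering is made up for this example. Each thread copies the value it loads into the other variable using only relaxed accesses. An out-of-thin-air (OOTA) result would be both loads returning a value such as 42 that no thread ever stored. No current implementation is known to produce that result, which is the sense in which the issue is called theoretical above; the stronger ordering would additionally keep every relaxed load ordered before later relaxed stores, including stores that do not depend on the load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Load-buffering (LB) litmus test with data dependencies.
// x and y start at 0; each thread stores the value it just loaded.
// An OOTA outcome would be r1 == r2 == 42 (or any other non-zero value),
// even though no such value appears anywhere in the program.
std::pair<int, int> run_load_buffering() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(r1, std::memory_order_relaxed);  // value depends on the load
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(r2, std::memory_order_relaxed);  // value depends on the load
    });

    t1.join();
    t2.join();
    return {r1, r2};  // read only after both threads have finished
}
```

Calling run_load_buffering() on real hardware yields {0, 0}: the hardware respects the data dependency from each load to the following store, so neither store can supply a value that was not already there.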
* Re: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Alglave, Jade @ 2023-11-03 17:02 UTC
To: will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, paulmck
Cc: linux-kernel, linux-toolchains, peterz, boqun.feng, davidtgoldblatt

Dear all,

(resending because I accidentally sent it in html first, sorry)

Arm’s official position on the topic can be found in this recent blog:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-technical-view-on-relaxed-atomics

Please do reach out to memory-model@arm.com if there are any questions.

Thanks,
Jade
* Re: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Jonas Oberhauser @ 2023-11-04 18:20 UTC
To: Alglave, Jade, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, paulmck
Cc: linux-kernel, linux-toolchains, peterz, boqun.feng, davidtgoldblatt, viktor

Thanks Jade. I agree with the position you linked to in that the move
is... unwise.

IMO, for a high-level language like C, if you need to rule out OOTA, just
declare it impossible (Viktor, in CC, made this suggestion a while ago) by
a "no OOTA axiom".

BTW, is there at least a proof that just making relaxed atomics ordered in
this way rules out OOTA in programs that contain non-atomics? Or can we
have something like the LKMM OOTA example I sent around last year?

best wishes,
jonas

On 11/3/2023 at 6:02 PM, Alglave, Jade wrote:
> Dear all,
>
> (resending because I accidentally sent it in html first, sorry)
>
> Arm’s official position on the topic can be found in this recent blog:
> https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-technical-view-on-relaxed-atomics
>
> Please do reach out to memory-model@arm.com if there are any questions.
> Thanks,
> Jade
* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Peter Zijlstra @ 2023-11-05 23:08 UTC
To: Paul E. McKenney
Cc: j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains, boqun.feng, davidtgoldblatt

On Fri, Oct 27, 2023 at 02:08:13PM -0700, Paul E. McKenney wrote:
> Hello!
>
> FYI, unless someone complains, it is quite likely that C++ (and thus
> likely C) compilers and standards will enforce Hans Boehm's proposal
> for ordering relaxed loads before relaxed stores.  The document [1]
> cites "Bounding data races in space and time" by Dolan et al. [2], and
> notes an "average a 2.x% slow down" for ARMv8 and PowerPC.  In the past,
> this has been considered unacceptable, among other things, due to the
> fact that this issue is strictly theoretical.
>
> This would not (repeat, not) affect the current Linux kernel, which
> relies on volatile loads and stores rather than C/C++ atomics.
>
> To be clear, the initial proposal is not to change the standards, but
> rather to add a command-line argument to enforce the stronger ordering.
> However, given the long list of ARM-related folks in the Acknowledgments
> section, the future direction is clear.
>
> So, do any ARMv8, PowerPC, or RISC-V people still care?  If so, I strongly
> recommend speaking up.  ;-)

OK, I finally had some time to read up...

Colour me properly confused. To me this all reads like C people can't
deal with relaxed atomics and are doing crazy things to try and 'fix'
it.

And while I don't speak for ARM/Power, I do worry this all takes C/C++
even further away from LKMM instead of closing the gap.

Worse, things like:

  https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/

Which states:

 "It would solve real issues in the Linux Kernel without costly fences
 (the kernel does not use relaxed atomics or the ISO C/C++ model - the
 load buffering issue affects the ISO C and linux memory models) ..."

Which is a contradiction if ever I saw one. It both claims this atrocity
fixes our volatile_if() woes while at the same time saying we're
unaffected because we don't use any of the C/C++ atomic batshit.

Anyway, I worry that all this faffing about will get in the way of our
volatile_if() 'demands'. Compiler people will tell us, just use relaxed
atomics, which is very much not what we want. We know relaxed loads
and stores behave 'funny', we've been doing that for a long long time.
Don't impose that madness on us. And certainly don't use us as an excuse
to peddle this nonsense.

Bah, what a load of crazy.

/me stomps off in disgust.
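For readers unfamiliar with the volatile_if() discussion: as far as this thread indicates, the concern is control dependencies, where a store is ordered after a volatile load only because it sits behind a branch on the loaded value. The sketch below does not implement volatile_if(); it only illustrates the pattern and one well-known way it can be lost. The helpers read_once() and write_once() are hypothetical stand-ins for the kernel's READ_ONCE() and WRITE_ONCE(), and the function names are made up for this example.

```cpp
#include <cassert>

// Hypothetical stand-ins for READ_ONCE()/WRITE_ONCE(): a volatile access
// makes the compiler emit exactly one load or store for the access.
inline int read_once(const int& v) {
    return *static_cast<const volatile int*>(&v);
}
inline void write_once(int& v, int val) {
    *static_cast<volatile int*>(&v) = val;
}

// The store is control-dependent on the load: it happens only when the
// loaded flag is non-zero, so the branch must be resolved first.
void publish_if_set(const int& flag, int& data) {
    if (read_once(flag))
        write_once(data, 1);
}

// Both arms perform the same store. A compiler is allowed to hoist that
// store above the branch, at which point nothing orders it after the load
// any more. Single-threaded results are unchanged, which is exactly why
// the compiler considers the transformation legal.
void fragile_dependency(const int& flag, int& data, int& taken) {
    if (read_once(flag)) {
        write_once(data, 1);
        taken = 1;
    } else {
        write_once(data, 1);
        taken = 0;
    }
}
```

Both functions behave identically before and after such a transformation when run on one thread; the difference is only observable by a concurrent reader on a weakly ordered machine, which is the gap a primitive like volatile_if() is meant to close.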
* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Paul E. McKenney @ 2023-11-07 2:16 UTC
To: Peter Zijlstra
Cc: j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains, boqun.feng, davidtgoldblatt

On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> On Fri, Oct 27, 2023 at 02:08:13PM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > FYI, unless someone complains, it is quite likely that C++ (and thus
> > likely C) compilers and standards will enforce Hans Boehm's proposal
> > for ordering relaxed loads before relaxed stores.  The document [1]
> > cites "Bounding data races in space and time" by Dolan et al. [2], and
> > notes an "average a 2.x% slow down" for ARMv8 and PowerPC.  In the past,
> > this has been considered unacceptable, among other things, due to the
> > fact that this issue is strictly theoretical.
> >
> > This would not (repeat, not) affect the current Linux kernel, which
> > relies on volatile loads and stores rather than C/C++ atomics.
> >
> > To be clear, the initial proposal is not to change the standards, but
> > rather to add a command-line argument to enforce the stronger ordering.
> > However, given the long list of ARM-related folks in the Acknowledgments
> > section, the future direction is clear.
> >
> > So, do any ARMv8, PowerPC, or RISC-V people still care?  If so, I strongly
> > recommend speaking up.  ;-)
>
> OK, I finally had some time to read up...
>
> Colour me properly confused. To me this all reads like C people can't
> deal with relaxed atomics and are doing crazy things to try and 'fix'
> it.
>
> And while I don't speak for ARM/Power, I do worry this all takes C/C++
> even further away from LKMM instead of closing the gap.
>
> Worse, things like:
>
>   https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
>
> Which state:
>
>  "It would solve real issues in the Linux Kernel without costly fences
>  (the kernel does not use relaxed atomics or the ISO C/C++ model - the
>  load buffering issue affects the ISO C and linux memory models) ..."
>
> Which is a contradiction if ever I saw one. It both claims this atrocity
> fixes our volatile_if() woes while at the same time saying we're
> unaffected because we don't use any of the C/C++ atomic batshit.

I guess that my traditional reply would be that if you are properly
confused by all this, that just means that you were reading carefully.

> Anyway, I worry that all this faffing about will get in the way of our
> volatile_if() 'demands'. Compiler people will tell us, just use relaxed
> atomics, which that is very much not what we want. We know relaxed loads
> and stores behave 'funny', we've been doing that for a long long time.
> Don't impose that madness on us. And certainly don't use us as an excuse
> to peddle this nonsense.

I am very much against incurring real overhead to solve an issue that is
an issue only in theory and not in practice.  I wish I could confidently
say that my view will prevail, but...

> Bah, what a load of crazy.
>
> /me stomps off in disgust.

If this goes through and if developers see any overhead from relaxed
atomics in a situation that matters to them, they will reach for some
other tool.  Inline assembly and volatile accesses, I suppose.  Or the
traditional approach of a compiler flag.

							Thanx, Paul
* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Segher Boessenkool @ 2023-11-07 9:57 UTC
To: Paul E. McKenney
Cc: Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains, boqun.feng, davidtgoldblatt

On Mon, Nov 06, 2023 at 06:16:24PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> > Which is a contradiction if ever I saw one. It both claims this atrocity
> > fixes our volatile_if() woes while at the same time saying we're
> > unaffected because we don't use any of the C/C++ atomic batshit.
>
> I guess that my traditional reply would be that if you are properly
> confused by all this, that just means that you were reading carefully.

I'll put that in my quote box :-)

> I am very much against incurring real overhead to solve an issue that is
> an issue only in theory and not in practice.  I wish I could confidently
> say that my view will prevail, but...

Given enough time most theory turns out to be practice.  If what you
are writing has a constrained scope, or a limited impact, or both, you
can ignore this "we'll deal with it later if/when it shows up".  But a
compiler does not have that luxury at all: it has to make correct
translations from source code to assembler code (or machine code
directly, for some compilers), or refuse to compile something.  Making
an incorrect translation is not an option.

> If this goes through and if developers see any overhead from relaxed
> atomics in a situation that matters to them, they will reach for some
> other tool.  Inline assembly and volatile accesses, I suppose.  Or the
> traditional approach of a compiler flag.

And I understand you want the standards to be more useful for the kernel
concurrency model?  Why, exactly?  I can think of many reasons, but I'm
a bit lost as to what motivates you here :-)


Segher
* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Paul E. McKenney @ 2023-11-07 16:44 UTC
To: Segher Boessenkool
Cc: Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains, boqun.feng, davidtgoldblatt

On Tue, Nov 07, 2023 at 03:57:45AM -0600, Segher Boessenkool wrote:
> On Mon, Nov 06, 2023 at 06:16:24PM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> > > Which is a contradiction if ever I saw one. It both claims this atrocity
> > > fixes our volatile_if() woes while at the same time saying we're
> > > unaffected because we don't use any of the C/C++ atomic batshit.
> >
> > I guess that my traditional reply would be that if you are properly
> > confused by all this, that just means that you were reading carefully.
>
> I'll put that in my quote box :-)

Please accept my condolences.  ;-)

> > I am very much against incurring real overhead to solve an issue that is
> > an issue only in theory and not in practice.  I wish I could confidently
> > say that my view will prevail, but...
>
> Given enough time most theory turns out to be practice.

Actually, in this field, it tends to be the other way around.  After all,
if practice followed theory, we would have all abandoned locks for
non-blocking synchronization in the 1990s, and then switched to
transactional memory in the decade following.

> If what you
> are writing has a constrained scope, or a limited impact, or both, you
> can ignore this "we'll deal with it later if/when it shows up".  But a
> compiler does not have that luxury at all: it has to make correct
> translations from source code to assembler code (or machine code
> directly, for some compilers), or refuse to compile something.  Making
> an incorrect translation is not an option.

But in this case, it would be most excellent if compiler practice were
to follow theory.  Because the theory of avoiding OOTA without emitting
extraneous instructions is quite simple:

	Avoid breaking semantic dependencies.

Given that, as you say, incorrect translation is not an option, it
should not be hard to satisfy oneself that a correct compiler must avoid
breaking semantic dependencies, at least for compilers that do not
indulge in value speculation, which would be a very brave indulgence.
Or, alternatively, that any such breaking constitutes a compiler bug.

But when this approach was put forward about a decade ago, compiler
writers were quite resistant.  Plus they argued that concurrency was a
niche use case, which might be a less convincing argument these days.

> > If this goes through and if developers see any overhead from relaxed
> > atomics in a situation that matters to them, they will reach for some
> > other tool.  Inline assembly and volatile accesses, I suppose.  Or the
> > traditional approach of a compiler flag.
>
> And I understand you want the standards to be more useful for the kernel
> concurrency model?  Why, exactly?  I can think of many reasons, but I'm
> a bit lost as to what motivates you here :-)

My hope is that a number of useful and efficient concurrent-code idioms
can be implemented with help from the compiler, in the Linux kernel and
elsewhere.  Or failing that, at least with less resistance on the part
of the compiler.  Here is an incomplete list:

1.	When it is necessary for fast-path code to load from a shared
	variable, it should be possible to use a single normal load
	instruction for this purpose.  Our current conversation touches
	on this point.

2.	Address dependencies extending to loads do not generate OOTA,
	but are another example of this "just use a normal load" issue.

	Address dependencies need more careful handling because compilers
	can and do convert them to control dependencies.  Which is
	why the Linux-kernel advice in rcu_dereference.rst is to avoid
	using integers for address dependencies, and most especially to
	avoid using booleans for this.

	But it would be nice to be able to tell the compiler that a given
	variable, function parameter, and/or return value carried a
	dependency.  At a minimum, it would be nice if the compiler could
	complain if that dependency would be broken.

3.	It should be possible to implement 50-year-old concurrent
	algorithms straightforwardly (LIFO push stack being the poster
	boy here [1]).  Right now, pointer provenance gets in the way
	of this.  On the other hand, clang/LLVM does quite well without
	pointer provenance, so one has to wonder just how important
	pointer-provenance-based optimizations really are.

	Hans Boehm, were he looking over my shoulder, would add that there
	are a number of single-threaded algorithms that are made
	unnecessarily complex due to pointer provenance issues [2].
	Hans and I often come down on opposite sides of concurrency
	issues, so in cases like this one where we do agree, you just
	might want to pay attention.  ;-)

	Anthony Williams goes further, arguing that pointers should at
	least sometimes just be bags of bits, that is, that the role of
	pointer provenance should be greatly reduced or even eliminated [3].

4.	Semantics of volatile.  Perhaps the current state is the best
	that can be hoped for, but given that the current state is a
	few vague words in the standard in combination with the fact
	that C-language device drivers must be able to use volatile
	to reliably and concurrently access memory shared with device
	firmware, one would hope for better.

5.	UB on signed integer overflow.  Right now, the kernel just
	forces wrapping, which works, so maybe we don't really care
	all that much.  But at this point, it seems to me that it was a
	mistake for the language to have failed to provide a means of
	specifying signed integers that wrap (int_wrap?).  (Yes, yes,
	you can get them by making an atomic signed int, but that is
	not exactly an ergonomic workaround.)

Is this really too much to ask?  If so, why?

The Linux-kernel memory model and associated coding guidelines form a
way of achieving the items on this list with the current level of
cooperation from the compilers, or perhaps more accurately, current
level of lack of cooperation.

The thing is that C11 and C++11 did pretty much the bare minimum to
support concurrency.  After all, when that effort started back in 2005,
concurrency really was a niche use case.  It is only reasonable to expect
a few adjustments almost 20 years on.

Hey, you asked!!!  ;-)

							Thanx, Paul

[1] https://docs.google.com/document/d/12paeC4suYAmVZlQvqytiCjGmEeGdJn0X7oM0SRLMALk/edit
[2] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1726r5.pdf
[3] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2188r1.html
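Item 3 above refers to the LIFO push stack. A minimal sketch of that algorithm, written with C++ atomics, might look as follows. The Node and LifoStack names are made up for this example, and the code is not taken from the documents cited as [1] to [3]. The comment in push() marks the spot where, as I understand the cited papers, pointer provenance becomes a problem.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    Node* next;
};

// LIFO stack supporting concurrent push() and a pop_all() that detaches
// the whole list at once.
class LifoStack {
public:
    void push(Node* n) {
        n->next = top_.load(std::memory_order_relaxed);
        // Between the load above and a successful compare-exchange, another
        // thread may have popped the old top node, freed it, and pushed a
        // new node that was allocated at the same address. The comparison
        // still succeeds on the pointer bits, and the list is correct, but
        // n->next then formally holds a pointer whose original object has
        // ended its lifetime. That is the provenance concern.
        // On failure, compare_exchange_weak stores the current top into
        // n->next, so the loop simply retries.
        while (!top_.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        }
    }

    // Detach and return the entire list (most recently pushed node first).
    Node* pop_all() {
        return top_.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<Node*> top_{nullptr};
};
```

Because pop_all() removes every node in one exchange, this variant avoids the usual ABA hazard of a single-element pop, which is part of why the algorithm is considered simple.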
* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Jonas Oberhauser @ 2023-11-09 16:25 UTC
To: paulmck, Segher Boessenkool
Cc: Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains, boqun.feng, davidtgoldblatt

On 11/7/2023 at 5:44 PM, Paul E. McKenney wrote:
> On Tue, Nov 07, 2023 at 03:57:45AM -0600, Segher Boessenkool wrote:
>> On Mon, Nov 06, 2023 at 06:16:24PM -0800, Paul E. McKenney wrote:
>>
>>> I am very much against incurring real overhead to solve an issue that is
>>> an issue only in theory and not in practice.  I wish I could confidently
>>> say that my view will prevail, but...

Where should one complain to increase the chance of your view prevailing?

>> If what you
>> are writing has a constrained scope, or a limited impact, or both, you
>> can ignore this "we'll deal with it later if/when it shows up".  But a
>> compiler does not have that luxury at all: it has to make correct
>> translations from source code to assembler code (or machine code
>> directly, for some compilers), or refuse to compile something.  Making
>> an incorrect translation is not an option.
> But in this case, it would be most excellent if compiler practice were
> to follow theory.  Because the theory of avoiding OOTA without emitting
> extraneous instructions is quite simple:
>
>	Avoid breaking semantic dependencies.

The problem with that is that C, unlike LKMM, doesn't use a ppo-based
(preserved program order) model. So there's no notion of "breaking a
dependency" that could be applied to define what it means "not to break"
semantic dependencies.

But the solution is actually simpler. Just add the axiom (proposed by
Viktor):

	There are no OOTA behaviors

> 4.	Semantics of volatile.  Perhaps the current state is the best
>	that can be hoped for, but given that the current state is a
>	few vague words in the standard in combination with the fact
>	that C-language device drivers must be able to use volatile
>	to reliably and concurrently access memory shared with device
>	firmware, one would hope for better.

Is it really so bad? I think the definition in the manual is quite precise,
if confusing. (volatiles are visible side effects and must therefore have
the same program order in the abstract machine and in the implementation,
and that's pretty much it).

There should just be a large explanatory note about what it implies and what
it doesn't imply.

>
> 5.	UB on signed integer overflow.  Right now, the kernel just
>	forces wrapping, which works, so maybe we don't really care
>	all that much.  But at this point, it seems to me that it was a
>	mistake for the language to have failed to provide a means of
>	specifying signed integers that wrap (int_wrap?).  (Yes, yes,
>	you can get them by making an atomic signed int, but that is
>	not exactly an ergonomic workaround.)

What I don't understand is why they didn't make signed integer types wrap
when they became two's complement.
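To illustrate the signed-overflow point, here is a small sketch of two ways to get wrapping signed arithmetic in standard C++ today without a compiler flag. The function names are made up for this example. The first routes the addition through unsigned arithmetic, where wraparound is well defined; the conversion back to a signed type is modular as of C++20 and implementation-defined (modular in practice on GCC and Clang) before that. The second is the atomic workaround mentioned in item 5: arithmetic on atomic signed integers is specified to wrap with no undefined results.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Wrapping addition via unsigned arithmetic: unsigned overflow is defined
// to wrap modulo 2^32, and the result is converted back to the signed type.
inline std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}

// The atomic workaround: fetch_add on an atomic signed integer wraps
// rather than invoking undefined behavior.
inline std::int32_t atomic_wrapping_increment(std::int32_t start) {
    std::atomic<std::int32_t> v{start};
    v.fetch_add(1, std::memory_order_relaxed);
    return v.load(std::memory_order_relaxed);
}
```

Neither form is as convenient as a dedicated wrapping type would be, which is the ergonomic complaint made above; a whole-program alternative is a compiler option such as -fwrapv in GCC and Clang.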
* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Boqun Feng @ 2023-11-09 18:24 UTC
To: Jonas Oberhauser
Cc: paulmck, Segher Boessenkool, Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains, davidtgoldblatt

On Thu, Nov 09, 2023 at 05:25:05PM +0100, Jonas Oberhauser wrote:
[...]
> > 4.	Semantics of volatile.  Perhaps the current state is the best
> >	that can be hoped for, but given that the current state is a
> >	few vague words in the standard in combination with the fact
> >	that C-language device drivers must be able to use volatile
> >	to reliably and concurrently access memory shared with device
> >	firmware, one would hope for better.
>
>
> Is it really so bad? I think the definition in the manual is quite precise,
> if confusing. (volatiles are visible side effects and must therefore have
> the same program order in the abstract machine and in the implementation,
> and that's pretty much it).
>

But I don't think there is any mention of whether concurrent volatile
accesses are exempt from data races, or whether a volatile access to a
machine-word-sized, naturally aligned object can be torn or not.

Regards,
Boqun

> There should just be a large explanatory note about what it implies and what
> it doesn't imply.
>
>
* Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
From: Paul E. McKenney @ 2023-11-09 20:09 UTC
To: Boqun Feng
Cc: Jonas Oberhauser, Segher Boessenkool, Peter Zijlstra, j.alglave, will, catalin.marinas, linux, mpe, npiggin, palmer, parri.andrea, linux-kernel, linux-toolchains, davidtgoldblatt

On Thu, Nov 09, 2023 at 10:24:23AM -0800, Boqun Feng wrote:
> On Thu, Nov 09, 2023 at 05:25:05PM +0100, Jonas Oberhauser wrote:
> [...]
> > > 4.	Semantics of volatile.  Perhaps the current state is the best
> > >	that can be hoped for, but given that the current state is a
> > >	few vague words in the standard in combination with the fact
> > >	that C-language device drivers must be able to use volatile
> > >	to reliably and concurrently access memory shared with device
> > >	firmware, one would hope for better.
> >
> >
> > Is it really so bad? I think the definition in the manual is quite precise,
> > if confusing. (volatiles are visible side effects and must therefore have
> > the same program order in the abstract machine and in the implementation,
> > and that's pretty much it).
>
> But I don't think there is any mention of whether concurrent volatile
> accesses are exempt from data races, or whether a volatile access to a
> machine-word-sized, naturally aligned object can be torn or not.

Here is my understanding:

It must be possible to write C-language device drivers for devices that...

1.	read and write to normal memory.  If this device has C-language
	firmware, volatile reads and writes involving aligned
	machine-word-sized locations must not invoke undefined behavior.

2.	allow concurrent reads and writes to MMIO registers (or to
	normal memory).  Even ignoring the device firmware, volatile
	reads and writes involving aligned machine-word-sized locations
	must not invoke undefined behavior.

Not necessarily a popular view, but then again, in my experience,
objective reality never has been trying to win a popularity contest.

							Thanx, Paul

> Regards,
> Boqun
>
> > There should just be a large explanatory note about what it implies and what
> > it doesn't imply.
> >
> >