* Compilers and RCU readers: Once more unto the breach! @ 2015-05-20 0:55 Paul E. McKenney 2015-05-20 1:57 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 0:55 UTC (permalink / raw) To: linux-kernel, c++std-parallel, linux-arch, gcc Cc: Peter.Sewell, torvalds, Mark.Batty, peterz, will.deacon, Ramana.Radhakrishnan, dhowells, akpm, mingo, michaelw Hello! Following up on last year's discussion (https://lwn.net/Articles/586838/, https://lwn.net/Articles/588300/), I believe that we have a solution. If I am wrong, I am sure you all will let me know, and in great detail. ;-) The key simplification is to "just say no" to RCU-protected array indexes: https://lkml.org/lkml/2015/5/12/827, as was suggested by several people. This simplification means that rcu_dereference (AKA memory_order_consume) need only return pointers. This in ture avoids things like (x-x), (x*0), and (x%1) because if "x" is a pointer, these expressions either return non-pointers are compilation errors. With a very few exceptions, dependency chains can lead -to- non-pointers, but cannot pass -through- them. The result is that dependencies are carried only by operations for which the compiler cannot easily optimize the away those dependencies, these operations including simple assignment, integer offset (including indexing), dereferencing, casts, passing as a function argument, return values from functions and so on. A complete list with commentary starts on page 28 of: http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf Dependency chains are broken if a pointer compares equal to some other pointer not part of the same dependency chain, if too many bits are ORed onto or ANDed off of a intptr_t or uintptr_t, or if the dependency is explicitly killed (which should now strictly speaking never be necessary, but which might allow better diagnostics). These are set out in more detail on page 30 of the above PDF. This covers all the uses in the Linux kernel that I am aware of without any source-code changes (other than to the rcu_dereference() primitives themselves) and should also work for compilers and standards. Thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 0:55 Compilers and RCU readers: Once more unto the breach! Paul E. McKenney @ 2015-05-20 1:57 ` Linus Torvalds 2015-05-20 2:10 ` Linus Torvalds 2015-05-20 2:34 ` Paul E. McKenney 2015-05-26 17:08 ` [c++std-parallel-1611] " Torvald Riegel 2015-07-14 0:44 ` Paul E. McKenney 2 siblings, 2 replies; 44+ messages in thread From: Linus Torvalds @ 2015-05-20 1:57 UTC (permalink / raw) To: Paul McKenney Cc: Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Tue, May 19, 2015 at 5:55 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf >From a very quick read-through, the restricted dependency chain in 7.9 seems to be reasonable, and essentially covers "thats' what hardware gives us anyway", making compiler writers happy. I would clarify the language somewhat: - it says that the result of a cast of a pointer is a dependency. You need to make an exception for casting to bool, methinks (because that's effectively just a test-against-NULL, which you later describe as terminating the dependency). Maybe get rid of the "any type", and just limit it to casts to types of size intptr_t, ie ones that don't drop significant bits. All the other rules talk about [u]intptr_t anyway. - you clarify that the trivial "& 0" and "| ~0" kill the dependency chain, but if you really want to be a stickler, you might want to extend it to a few more cases. Things like "& 1" (to extract a tag from the lot bit of a tagged pointer) should likely also drop the dependency, since a compiler would commonly end up using the end result as a conditional even if the code was written to then use casting etc to look like a dereference. - the "you can add/subtract integral values" still opens you up to language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the dependency, which it clearly doesn't. But language-lawyering it does, since all those operations (cast to pointer, cast to integer, subtracting an integer) claim to be dependency-preserving operations. So I think you want to limit the logical operators to things that don't mask off too many bits, and you should probably limit the add/subtract operations some way (maybe specify that the integer value you add/subtract cannot be related to the pointer). But I think limiting it to mostly pointer ops (and a _few_ integer operations to do offsets and remove tag bits) is otherwise a good approach. Linus ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 1:57 ` Linus Torvalds @ 2015-05-20 2:10 ` Linus Torvalds 2015-05-20 2:41 ` Paul E. McKenney 2015-05-20 2:34 ` Paul E. McKenney 1 sibling, 1 reply; 44+ messages in thread From: Linus Torvalds @ 2015-05-20 2:10 UTC (permalink / raw) To: Paul McKenney Cc: Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > - the "you can add/subtract integral values" still opens you up to > language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the > dependency, which it clearly doesn't. But language-lawyering it does, > since all those operations (cast to pointer, cast to integer, > subtracting an integer) claim to be dependency-preserving operations. > > So I think you want to limit the logical operators to things that > don't mask off too many bits, and you should probably limit the > add/subtract operations some way (maybe specify that the integer value > you add/subtract cannot be related to the pointer). Actually, "not related" doesn't work. For some buddy allocator thing, you very much might want some of the bits to be related. So I think you're better off just saying that operations designed to drop significant bits break the dependency chain, and give things like "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such. Making that just an extension of your existing "& 0" language would seem to be natural. Humans will understand, and compiler writers won't care. They will either depend on hardware semantics anyway (and argue that your language is tight enough that they don't need to do anything special) or they will turn the consume into an acquire (on platforms that have too weak hardware). Linus ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 2:10 ` Linus Torvalds @ 2015-05-20 2:41 ` Paul E. McKenney 2015-05-20 11:47 ` Will Deacon 0 siblings, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 2:41 UTC (permalink / raw) To: Linus Torvalds Cc: Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Tue, May 19, 2015 at 07:10:12PM -0700, Linus Torvalds wrote: > On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > - the "you can add/subtract integral values" still opens you up to > > language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the > > dependency, which it clearly doesn't. But language-lawyering it does, > > since all those operations (cast to pointer, cast to integer, > > subtracting an integer) claim to be dependency-preserving operations. > > > > So I think you want to limit the logical operators to things that > > don't mask off too many bits, and you should probably limit the > > add/subtract operations some way (maybe specify that the integer value > > you add/subtract cannot be related to the pointer). > > Actually, "not related" doesn't work. For some buddy allocator thing, > you very much might want some of the bits to be related. Good point, you could do the buddy-allocator computations with add and subtract instead of exclusive OR. > So I think you're better off just saying that operations designed to > drop significant bits break the dependency chain, and give things like > "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such. > > Making that just an extension of your existing "& 0" language would > seem to be natural. Works for me! I added the following bullet to the list of things that break dependencies: If a pointer is part of a dependency chain, and if the values added to or subtracted from that pointer cancel the pointer value so as to allow the compiler to precisely determine the resulting value, then the resulting value will not be part of any dependency chain. For example, if p is part of a dependency chain, then ((char *)p-(uintptr_t)p)+65536 will not be. Seem reasonable? > Humans will understand, and compiler writers won't care. They will > either depend on hardware semantics anyway (and argue that your > language is tight enough that they don't need to do anything special) > or they will turn the consume into an acquire (on platforms that have > too weak hardware). Agreed. Plus Core Working Group will hammer out the exact wording, should this approach meet their approval. Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 2:41 ` Paul E. McKenney @ 2015-05-20 11:47 ` Will Deacon 2015-05-20 12:15 ` Paul E. McKenney 2015-05-20 13:18 ` David Howells 0 siblings, 2 replies; 44+ messages in thread From: Will Deacon @ 2015-05-20 11:47 UTC (permalink / raw) To: Paul E. McKenney Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw Hi Paul, On Wed, May 20, 2015 at 03:41:48AM +0100, Paul E. McKenney wrote: > On Tue, May 19, 2015 at 07:10:12PM -0700, Linus Torvalds wrote: > > On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > So I think you're better off just saying that operations designed to > > drop significant bits break the dependency chain, and give things like > > "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such. > > > > Making that just an extension of your existing "& 0" language would > > seem to be natural. > > Works for me! I added the following bullet to the list of things > that break dependencies: > > If a pointer is part of a dependency chain, and if the values > added to or subtracted from that pointer cancel the pointer > value so as to allow the compiler to precisely determine the > resulting value, then the resulting value will not be part of > any dependency chain. For example, if p is part of a dependency > chain, then ((char *)p-(uintptr_t)p)+65536 will not be. > > Seem reasonable? Whilst I understand what you're saying (the ARM architecture makes these sorts of distinctions when calling out dependency-based ordering), it feels like we're dangerously close to defining the difference between a true and a false dependency. If we want to do this in the context of the C language specification, you run into issues because you need to evaluate the program in order to determine data values in order to determine the nature of the dependency. You tackle this above by saying "to allow the compiler to precisely determine the resulting value", but I can't see how that can be cleanly fitted into something like the C language specification. Even if it can, then we'd need to reword the "?:" treatment that you currently have: "If a pointer is part of a dependency chain, and that pointer appears in the entry of a ?: expression selected by the condition, then the chain extends to the result." which I think requires the state of the condition to be known statically if we only want to extend the chain from the selected expression. In the general case, wouldn't a compiler have to assume that the chain is extended from both? Additionally, what about the following code? char *x = y ? z : z; Does that extend a dependency chain from z to x? If so, I can imagine a CPU breaking that in practice. > > Humans will understand, and compiler writers won't care. They will > > either depend on hardware semantics anyway (and argue that your > > language is tight enough that they don't need to do anything special) > > or they will turn the consume into an acquire (on platforms that have > > too weak hardware). > > Agreed. Plus Core Working Group will hammer out the exact wording, > should this approach meet their approval. For the avoidance of doubt, I'm completely behind any attempts to tackle this problem, but I anticipate an uphill struggle getting this text into the C standard. Is your intention to change the carries-a-dependency relation to encompass this change? Cheers, Will ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 11:47 ` Will Deacon @ 2015-05-20 12:15 ` Paul E. McKenney 2015-05-20 15:46 ` Will Deacon 2015-05-20 13:18 ` David Howells 1 sibling, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 12:15 UTC (permalink / raw) To: Will Deacon Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 12:47:45PM +0100, Will Deacon wrote: > Hi Paul, > > On Wed, May 20, 2015 at 03:41:48AM +0100, Paul E. McKenney wrote: > > On Tue, May 19, 2015 at 07:10:12PM -0700, Linus Torvalds wrote: > > > On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds > > > <torvalds@linux-foundation.org> wrote: > > > So I think you're better off just saying that operations designed to > > > drop significant bits break the dependency chain, and give things like > > > "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such. > > > > > > Making that just an extension of your existing "& 0" language would > > > seem to be natural. > > > > Works for me! I added the following bullet to the list of things > > that break dependencies: > > > > If a pointer is part of a dependency chain, and if the values > > added to or subtracted from that pointer cancel the pointer > > value so as to allow the compiler to precisely determine the > > resulting value, then the resulting value will not be part of > > any dependency chain. For example, if p is part of a dependency > > chain, then ((char *)p-(uintptr_t)p)+65536 will not be. > > > > Seem reasonable? > > Whilst I understand what you're saying (the ARM architecture makes these > sorts of distinctions when calling out dependency-based ordering), it > feels like we're dangerously close to defining the difference between a > true and a false dependency. If we want to do this in the context of the > C language specification, you run into issues because you need to evaluate > the program in order to determine data values in order to determine the > nature of the dependency. Indeed, something like this does -not- carry a dependency from the memory_order_consume load to q: char *p, q; p = atomic_load_explicit(&gp, memory_order_consume); q = gq + (intptr_t)p - (intptr_t)p; If this was compiled with -O0, ARM and Power might well carry a dependency, but given any optimization, the assembly language would have no hint of any such dependency. So I am not seeing any particular danger. > You tackle this above by saying "to allow the compiler to precisely > determine the resulting value", but I can't see how that can be cleanly > fitted into something like the C language specification. I am sure that there will be significant rework from where this document is to language appropriate from the standard. Which is why I am glad that Jens is taking an interest in this, as he is particularly good at producing standards language. > Even if it can, > then we'd need to reword the "?:" treatment that you currently have: > > "If a pointer is part of a dependency chain, and that pointer appears > in the entry of a ?: expression selected by the condition, then the > chain extends to the result." > > which I think requires the state of the condition to be known statically > if we only want to extend the chain from the selected expression. In the > general case, wouldn't a compiler have to assume that the chain is > extended from both? In practice, yes, if the compiler cannot determine which expression is selected, it must arrange for the dependency to be carried from either, depending on the run-time value of the condition. But you would have to work pretty hard to create code that did not carry the dependencies as require, not? > Additionally, what about the following code? > > char *x = y ? z : z; > > Does that extend a dependency chain from z to x? If so, I can imagine a > CPU breaking that in practice. I am not seeing this. I would expect the compiler to optimize to something like this: char *x = z; How does this avoid carrying the dependency? Or are you saying that ARM loses the dependency via a store to memory and a later reload? That would be a bit surprising... > > > Humans will understand, and compiler writers won't care. They will > > > either depend on hardware semantics anyway (and argue that your > > > language is tight enough that they don't need to do anything special) > > > or they will turn the consume into an acquire (on platforms that have > > > too weak hardware). > > > > Agreed. Plus Core Working Group will hammer out the exact wording, > > should this approach meet their approval. > > For the avoidance of doubt, I'm completely behind any attempts to tackle > this problem, but I anticipate an uphill struggle getting this text into > the C standard. Is your intention to change the carries-a-dependency > relation to encompass this change? I completely agree that this won't be easy, but this is the task at hand. And yes, the intent is to change carries-a-dependency, given that the current wording isn't helping anything. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 12:15 ` Paul E. McKenney @ 2015-05-20 15:46 ` Will Deacon 2015-05-20 15:54 ` Andrew Haley 2015-05-20 18:16 ` Paul E. McKenney 0 siblings, 2 replies; 44+ messages in thread From: Will Deacon @ 2015-05-20 15:46 UTC (permalink / raw) To: Paul E. McKenney Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 01:15:22PM +0100, Paul E. McKenney wrote: > On Wed, May 20, 2015 at 12:47:45PM +0100, Will Deacon wrote: > > On Wed, May 20, 2015 at 03:41:48AM +0100, Paul E. McKenney wrote: > > > If a pointer is part of a dependency chain, and if the values > > > added to or subtracted from that pointer cancel the pointer > > > value so as to allow the compiler to precisely determine the > > > resulting value, then the resulting value will not be part of > > > any dependency chain. For example, if p is part of a dependency > > > chain, then ((char *)p-(uintptr_t)p)+65536 will not be. > > > > > > Seem reasonable? > > > > Whilst I understand what you're saying (the ARM architecture makes these > > sorts of distinctions when calling out dependency-based ordering), it > > feels like we're dangerously close to defining the difference between a > > true and a false dependency. If we want to do this in the context of the > > C language specification, you run into issues because you need to evaluate > > the program in order to determine data values in order to determine the > > nature of the dependency. > > Indeed, something like this does -not- carry a dependency from the > memory_order_consume load to q: > > char *p, q; > > p = atomic_load_explicit(&gp, memory_order_consume); > q = gq + (intptr_t)p - (intptr_t)p; > > If this was compiled with -O0, ARM and Power might well carry a > dependency, but given any optimization, the assembly language would have > no hint of any such dependency. So I am not seeing any particular danger. The above is a welcome relaxation over C11, since ARM doesn't even give you ordering based off false data dependencies. My concern is more to do with how this can be specified precisely without prohibing honest compiler and hardware optimisations. Out of interest, how do you tackle examples (4) and (5) of (assuming the reads are promoted to consume loads)?: http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html my understanding is that you permit both outcomes (I appreciate you're not directly tackling out-of-thin-air, but treatment of dependencies is heavily related). > > You tackle this above by saying "to allow the compiler to precisely > > determine the resulting value", but I can't see how that can be cleanly > > fitted into something like the C language specification. > > I am sure that there will be significant rework from where this document > is to language appropriate from the standard. Which is why I am glad > that Jens is taking an interest in this, as he is particularly good at > producing standards language. Ok. I'm curious to see how that comes along. > > Even if it can, > > then we'd need to reword the "?:" treatment that you currently have: > > > > "If a pointer is part of a dependency chain, and that pointer appears > > in the entry of a ?: expression selected by the condition, then the > > chain extends to the result." > > > > which I think requires the state of the condition to be known statically > > if we only want to extend the chain from the selected expression. In the > > general case, wouldn't a compiler have to assume that the chain is > > extended from both? > > In practice, yes, if the compiler cannot determine which expression is > selected, it must arrange for the dependency to be carried from either, > depending on the run-time value of the condition. But you would have > to work pretty hard to create code that did not carry the dependencies > as require, not? I'm not sure... you'd require the compiler to perform static analysis of loops to determine the state of the machine when they exit (if they exit!) in order to show whether or not a dependency is carried to subsequent operations. If it can't prove otherwise, it would have to assume that a dependency *is* carried, and it's not clear to me how it would use this information to restrict any subsequent dependency removing optimisations. I guess that's one for the GCC folks. > > Additionally, what about the following code? > > > > char *x = y ? z : z; > > > > Does that extend a dependency chain from z to x? If so, I can imagine a > > CPU breaking that in practice. > > I am not seeing this. I would expect the compiler to optimize to > something like this: > > char *x = z; > > How does this avoid carrying the dependency? Or are you saying that > ARM loses the dependency via a store to memory and a later reload? > That would be a bit surprising... I was thinking that the compiler would have to preserve the conditional structure so that the dependency chain could be tracked correctly, but if it can just assume that the dependency is carried regardless of y then I agree that it doesn't matter for this code. All the CPU could do is remove the conditional hazard. > > > > Humans will understand, and compiler writers won't care. They will > > > > either depend on hardware semantics anyway (and argue that your > > > > language is tight enough that they don't need to do anything special) > > > > or they will turn the consume into an acquire (on platforms that have > > > > too weak hardware). > > > > > > Agreed. Plus Core Working Group will hammer out the exact wording, > > > should this approach meet their approval. > > > > For the avoidance of doubt, I'm completely behind any attempts to tackle > > this problem, but I anticipate an uphill struggle getting this text into > > the C standard. Is your intention to change the carries-a-dependency > > relation to encompass this change? > > I completely agree that this won't be easy, but this is the task at hand. > And yes, the intent is to change carries-a-dependency, given that the > current wording isn't helping anything. ;-) Agreed on that! Will ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 15:46 ` Will Deacon @ 2015-05-20 15:54 ` Andrew Haley 2015-05-20 18:16 ` [c++std-parallel-1632] " Paul E. McKenney 2015-05-20 18:16 ` Paul E. McKenney 1 sibling, 1 reply; 44+ messages in thread From: Andrew Haley @ 2015-05-20 15:54 UTC (permalink / raw) To: Will Deacon, Paul E. McKenney Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On 05/20/2015 04:46 PM, Will Deacon wrote: > I'm not sure... you'd require the compiler to perform static analysis of > loops to determine the state of the machine when they exit (if they exit!) > in order to show whether or not a dependency is carried to subsequent > operations. If it can't prove otherwise, it would have to assume that a > dependency *is* carried, and it's not clear to me how it would use this > information to restrict any subsequent dependency removing optimisations. It'd just convert consume to acquire. Andrew. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1632] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 15:54 ` Andrew Haley @ 2015-05-20 18:16 ` Paul E. McKenney 2015-05-21 14:22 ` Michael Matz 0 siblings, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 18:16 UTC (permalink / raw) To: c++std-parallel Cc: Will Deacon, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 04:54:51PM +0100, Andrew Haley wrote: > On 05/20/2015 04:46 PM, Will Deacon wrote: > > I'm not sure... you'd require the compiler to perform static analysis of > > loops to determine the state of the machine when they exit (if they exit!) > > in order to show whether or not a dependency is carried to subsequent > > operations. If it can't prove otherwise, it would have to assume that a > > dependency *is* carried, and it's not clear to me how it would use this > > information to restrict any subsequent dependency removing optimisations. > > It'd just convert consume to acquire. It should not need to, actually. Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1632] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 18:16 ` [c++std-parallel-1632] " Paul E. McKenney @ 2015-05-21 14:22 ` Michael Matz 2015-05-21 15:10 ` Paul E. McKenney 0 siblings, 1 reply; 44+ messages in thread From: Michael Matz @ 2015-05-21 14:22 UTC (permalink / raw) To: Paul E. McKenney Cc: c++std-parallel, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw Hi, On Wed, 20 May 2015, Paul E. McKenney wrote: > > > I'm not sure... you'd require the compiler to perform static analysis of > > > loops to determine the state of the machine when they exit (if they exit!) > > > in order to show whether or not a dependency is carried to subsequent > > > operations. If it can't prove otherwise, it would have to assume that a > > > dependency *is* carried, and it's not clear to me how it would use this > > > information to restrict any subsequent dependency removing optimisations. > > > > It'd just convert consume to acquire. > > It should not need to, actually. [with GCC hat, and having only lightly read your document] Then you need to provide language or at least informal reasons why the compiler is allowed to not do that. Without that a compiler would have to be conservative, if it can't _prove_ that a dependency chain is stopped, then it has to assume it hasn't. For instance I can't really make out easily what your document says about the following simple situation (well, actually I have difficulties to differ between what you're proposing as the good-new model of this all, and what you're merely describing as different current states of affair): char * fancy_assign (char *in) { return in; } ... char *x, *y; x = atomic_load_explicit(p, memory_order_consume); y = fancy_assign (x); atomic_store_explicit(q, y, memory_order_relaxed); So, is there, or is there not a dependency carried from x to y in your proposed model (and which rule in your document states so)? Clearly, without any other language the compiler would have to assume that there is (because the equivalent 'y = x' assignment would carry the dependency). If it has to assume this, then the whole model is not going to work very well, as usual with models that assume a certain less-optimal fact ("carries-dep" is less optimal for code generation purposes that "not-carries-dep") unless very specific circumstances say it can be ignored. Ciao, Michael. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1632] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 14:22 ` Michael Matz @ 2015-05-21 15:10 ` Paul E. McKenney 2015-05-21 16:17 ` Michael Matz 0 siblings, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-05-21 15:10 UTC (permalink / raw) To: Michael Matz Cc: c++std-parallel, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw@ca.ibm.com On Thu, May 21, 2015 at 04:22:38PM +0200, Michael Matz wrote: > Hi, > > On Wed, 20 May 2015, Paul E. McKenney wrote: > > > > > I'm not sure... you'd require the compiler to perform static analysis of > > > > loops to determine the state of the machine when they exit (if they exit!) > > > > in order to show whether or not a dependency is carried to subsequent > > > > operations. If it can't prove otherwise, it would have to assume that a > > > > dependency *is* carried, and it's not clear to me how it would use this > > > > information to restrict any subsequent dependency removing optimisations. > > > > > > It'd just convert consume to acquire. > > > > It should not need to, actually. > > [with GCC hat, and having only lightly read your document] Understood. > Then you need to provide language or at least informal reasons why the > compiler is allowed to not do that. Without that a compiler would have to > be conservative, if it can't _prove_ that a dependency chain is stopped, > then it has to assume it hasn't. > > For instance I can't really make out easily what your document says about > the following simple situation (well, actually I have difficulties to > differ between what you're proposing as the good-new model of this all, > and what you're merely describing as different current states of affair): The point is -exactly- to codify the current state of affairs. I expect a follow-on effort to specify some sort of marking regimen, as noted in the last paragraph of 7.9 and as discussed with Torvald Riegel. However, given that there are not yet any implementations or practical experience with such markings, I suspect that some time will be required to hammer out a good marking scheme. > char * fancy_assign (char *in) { return in; } > ... > char *x, *y; > > x = atomic_load_explicit(p, memory_order_consume); > y = fancy_assign (x); > atomic_store_explicit(q, y, memory_order_relaxed); > > So, is there, or is there not a dependency carried from x to y in your > proposed model (and which rule in your document states so)? Clearly, > without any other language the compiler would have to assume that there is > (because the equivalent 'y = x' assignment would carry the dependency). The dependency is not carried, though this is due to the current set of rules not covering atomic loads and stores, which I need to fix. Here is the sequence of events: o A memory_order_consume load heads a dependency chain. o Rule 2 says that if a value is part of a dependency chain and is used as the right-hand side of an assignment operator, the expression extends the chain to cover the assignment. And I switched to numbered bullet items here: http://www2.rdrop.com/users/paulmck/RCU/consume.2015.05.21a.pdf o Rule 14 says that if a value is part of a dependency chain and is used as the actual parameter of a function call, then the dependency chain extends to the corresponding formal parameter, namely "in" of fancy_assign(). o Rule 15 says that if a value is part of a dependency chain and is returned from a function, then the dependency chain extends to the returned value in the calling function. o And you are right. I need to make the first and second rules cover the relaxed atomic operations, or at least atomic loads and stores. Not that this is an issue for existing Linux-kernel code. But given such a change, the new version of rule 2 would extend the dependency chain to cover the atomic_store_explicit(). > If it has to assume this, then the whole model is not going to work very > well, as usual with models that assume a certain less-optimal fact > ("carries-dep" is less optimal for code generation purposes that > "not-carries-dep") unless very specific circumstances say it can be > ignored. Although that is a good general rule of thumb, I do not believe that it applies to this situation, with the exception that I do indeed assume that no one is insane enough to do value-speculation optimizations for non-NULL values on loads from pointers. So what am I missing here? Do you have a specific example where the compiler would need to suppress a production-quality optimization? Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1632] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 15:10 ` Paul E. McKenney @ 2015-05-21 16:17 ` Michael Matz 2015-05-21 18:37 ` Paul E. McKenney 0 siblings, 1 reply; 44+ messages in thread From: Michael Matz @ 2015-05-21 16:17 UTC (permalink / raw) To: Paul E. McKenney Cc: c++std-parallel, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw@ca.ibm.com Hi, On Thu, 21 May 2015, Paul E. McKenney wrote: > The point is -exactly- to codify the current state of affairs. Ah, I see, so it's not yet about creating a more useful (for compilers, that is) model. > > char * fancy_assign (char *in) { return in; } > > ... > > char *x, *y; > > > > x = atomic_load_explicit(p, memory_order_consume); > > y = fancy_assign (x); > > atomic_store_explicit(q, y, memory_order_relaxed); > > > > So, is there, or is there not a dependency carried from x to y in your > > proposed model (and which rule in your document states so)? Clearly, > > without any other language the compiler would have to assume that there is > > (because the equivalent 'y = x' assignment would carry the dependency). > > The dependency is not carried, though this is due to the current set > of rules not covering atomic loads and stores, which I need to fix. Okay, so with the current regime(s), the dependency carries ... > o Rule 14 says that if a value is part of a dependency chain and > is used as the actual parameter of a function call, then the > dependency chain extends to the corresponding formal parameter, > namely "in" of fancy_assign(). > > o Rule 15 says that if a value is part of a dependency chain and > is returned from a function, then the dependency chain extends > to the returned value in the calling function. > > o And you are right. I need to make the first and second rules > cover the relaxed atomic operations, or at least atomic loads and > stores. Not that this is an issue for existing Linux-kernel code. > > But given such a change, the new version of rule 2 would > extend the dependency chain to cover the atomic_store_explicit(). ... (if this detail would be fixed). Okay, that's quite awful ... > > If it has to assume this, then the whole model is not going to work > > very well, as usual with models that assume a certain less-optimal > > fact ("carries-dep" is less optimal for code generation purposes that > > "not-carries-dep") unless very specific circumstances say it can be > > ignored. > > Although that is a good general rule of thumb, I do not believe that it > applies to this situation, with the exception that I do indeed assume > that no one is insane enough to do value-speculation optimizations for > non-NULL values on loads from pointers. > > So what am I missing here? ... because you are then missing that if "carries-dep" can flow through function calls from arguments to return values by default, the compiler has to assume this in fact always happens when it can't see the function body, or can't analyze it. In effect that's making the whole "carries-dep stops at these and those uses" a useless excercise because a malicious user (malicious in the sense of abusing the model to show that it's hindering optimizations), i.e. me, can hide all such carries-dep stopping effects inside a function, et voila, the dependecy carries through. So for a slightly more simple example: extern void *foo (void *); // body not available x = load y = foo (x); store (y) the compiler has to assume that there's a dep-chain from x to y; always. What's worse, it also has to assume a carries-dep for this: extern void foo (void *in, void **out1, void **out2); x = load foo (x, &o1, &o2); store (o1); store (o2); Now the compiler has to assume that the body of 'foo' is just mean enough to make the dep-chain carry from in to *out1 or *out2 (i.e. it has to assume that for both). This extends to _all_ memory accessible from foo's body, i.e. generally all global and all local address-taken variables, so as soon as you have a function call into which a dep-chain value flows you're creating a dep-chain extension from that value to each and every global piece of memory, because the compiler cannot assume that the black box called foo is not mean. This could conceivably be stopped by making normal stores not to carry the dependency; then only the return value might be infected; but I don't see that in your rules, as a "normal store" is just an assigment in your model and hence rules 1 and 2 apply (that is, carries-dep flows through all assignments, incl. loads and stores). Basically whenever you can construct black boxes for the compiler, you have to limit their effects on such transitive relations like carries-dep by default, at the border of such black boxes; otherwise that transitive relation quickly becomes an most-x-everything relation (i.e. mostthings carries-dep to everything), and as such is then totally useless, because such a universally filled relation (like an empty one) doesn't bear any interesting information, at which point it then is questionably why the compiler should jump through hoops to analyse the few cases that would be allowed to stop the carries-dep flow, when it more often than not have to give up anyway and generate slow code. > Do you have a specific example where the compiler would need to suppress > a production-quality optimization? I can't say; I haven't completely grokked all details of the different sub mem-models and their interaction with compiler optimizations, and when to convert consume to aquire. But I do know that transitive relations that carry a code generation cost (i.e. that when two entities are related you're limited in what you can do), let's call them infections :), have to be quite strictly limited in scope, otherwise they become useless. Real world examples that are quite terrible to get right, and get good code out of at the same time, is aliasing and effective-type of memory cells, which have some similar properties to the current carries-dep. Both concepts added to the language after-the-fact (to capture some assumptions that were used in practice, and that seemed sensible), but then in a way that weren't limiting their scope very well. For instance, if you follow the c++ language strictly, you can't assume that a anonymuous cell that was holding an int before a function call is still holding an int afterwards, even without any stores in between, because the function could use placement new to change the cells type. Guess how GCC deals with this? We've designed and redesigned our memory model to accomodate for this: It's conservatively assuming that crap-happens-here is part of most function calls (because in the real world, it indeed is the case that crap-happens-there) :) Ciao, Michael. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1632] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 16:17 ` Michael Matz @ 2015-05-21 18:37 ` Paul E. McKenney 0 siblings, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-21 18:37 UTC (permalink / raw) To: Michael Matz Cc: c++std-parallel, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw@ca.ibm.com On Thu, May 21, 2015 at 06:17:43PM +0200, Michael Matz wrote: > Hi, > > On Thu, 21 May 2015, Paul E. McKenney wrote: > > > The point is -exactly- to codify the current state of affairs. > > Ah, I see, so it's not yet about creating a more useful (for compilers, > that is) model. There are several approaches being considered for that as well, but we do need to codify current usage. > > > char * fancy_assign (char *in) { return in; } > > > ... > > > char *x, *y; > > > > > > x = atomic_load_explicit(p, memory_order_consume); > > > y = fancy_assign (x); > > > atomic_store_explicit(q, y, memory_order_relaxed); > > > > > > So, is there, or is there not a dependency carried from x to y in your > > > proposed model (and which rule in your document states so)? Clearly, > > > without any other language the compiler would have to assume that there is > > > (because the equivalent 'y = x' assignment would carry the dependency). > > > > The dependency is not carried, though this is due to the current set > > of rules not covering atomic loads and stores, which I need to fix. > > Okay, so with the current regime(s), the dependency carries ... Yes, that is the intent. > > o Rule 14 says that if a value is part of a dependency chain and > > is used as the actual parameter of a function call, then the > > dependency chain extends to the corresponding formal parameter, > > namely "in" of fancy_assign(). > > > > o Rule 15 says that if a value is part of a dependency chain and > > is returned from a function, then the dependency chain extends > > to the returned value in the calling function. > > > > o And you are right. I need to make the first and second rules > > cover the relaxed atomic operations, or at least atomic loads and > > stores. Not that this is an issue for existing Linux-kernel code. > > > > But given such a change, the new version of rule 2 would > > extend the dependency chain to cover the atomic_store_explicit(). > > ... (if this detail would be fixed). Okay, that's quite awful ... > > > > If it has to assume this, then the whole model is not going to work > > > very well, as usual with models that assume a certain less-optimal > > > fact ("carries-dep" is less optimal for code generation purposes that > > > "not-carries-dep") unless very specific circumstances say it can be > > > ignored. > > > > Although that is a good general rule of thumb, I do not believe that it > > applies to this situation, with the exception that I do indeed assume > > that no one is insane enough to do value-speculation optimizations for > > non-NULL values on loads from pointers. > > > > So what am I missing here? > > ... because you are then missing that if "carries-dep" can flow through > function calls from arguments to return values by default, the compiler > has to assume this in fact always happens when it can't see the function > body, or can't analyze it. In effect that's making the whole "carries-dep > stops at these and those uses" a useless excercise because a malicious > user (malicious in the sense of abusing the model to show that it's > hindering optimizations), i.e. me, can hide all such carries-dep stopping > effects inside a function, et voila, the dependecy carries through. So > for a slightly more simple example: > > extern void *foo (void *); // body not available > x = load > y = foo (x); > store (y) > > the compiler has to assume that there's a dep-chain from x to y; always. Yes, the compiler does have to make this assumption. And the intent behind the rules is to ensure that this assumption does not get in the way of reasonable optimizations. So although I am sure that you are as busy as the rest of us, I really do need you to go through the rules in detail before you get too much more excited about this. > What's worse, it also has to assume a carries-dep for this: > > extern void foo (void *in, void **out1, void **out2); > x = load > foo (x, &o1, &o2); > store (o1); > store (o2); > > Now the compiler has to assume that the body of 'foo' is just mean enough > to make the dep-chain carry from in to *out1 or *out2 (i.e. it has to > assume that for both). This extends to _all_ memory accessible from foo's > body, i.e. generally all global and all local address-taken variables, so > as soon as you have a function call into which a dep-chain value flows > you're creating a dep-chain extension from that value to each and every > global piece of memory, because the compiler cannot assume that the > black box called foo is not mean. This could conceivably be stopped by > making normal stores not to carry the dependency; then only the return > value might be infected; but I don't see that in your rules, as a "normal > store" is just an assigment in your model and hence rules 1 and 2 apply > (that is, carries-dep flows through all assignments, incl. loads and > stores). > > Basically whenever you can construct black boxes for the compiler, you > have to limit their effects on such transitive relations like carries-dep > by default, at the border of such black boxes; otherwise that transitive > relation quickly becomes an most-x-everything relation (i.e. > mostthings carries-dep to everything), and as such is then totally > useless, because such a universally filled relation (like an empty one) > doesn't bear any interesting information, at which point it then is > questionably why the compiler should jump through hoops to analyse the few > cases that would be allowed to stop the carries-dep flow, when it more > often than not have to give up anyway and generate slow code. Furthermore, in general, if the compiler loads a pointer (only a pointer, not any other type) from some random location, it needs to assume that the value loaded is part of a dependency chain. Again, the intent behind the rules is to make sure that the developers get the guarantees that they need from the compiler without getting in the way of reasonable optimizations. (Value speculation of pointer loads being the only example unreasonable optimization that I am aware of.) > > Do you have a specific example where the compiler would need to suppress > > a production-quality optimization? > > I can't say; I haven't completely grokked all details of the different sub > mem-models and their interaction with compiler optimizations, and when to > convert consume to aquire. But I do know that transitive relations that > carry a code generation cost (i.e. that when two entities are related > you're limited in what you can do), let's call them infections :), have to > be quite strictly limited in scope, otherwise they become useless. When I get the rules right (and I am sure that they need additional help), it should never be necessary to promote consume to acquire. > Real world examples that are quite terrible to get right, and get good > code out of at the same time, is aliasing and effective-type of memory > cells, which have some similar properties to the current carries-dep. > Both concepts added to the language after-the-fact (to capture some > assumptions that were used in practice, and that seemed sensible), but > then in a way that weren't limiting their scope very well. For instance, > if you follow the c++ language strictly, you can't assume that a > anonymuous cell that was holding an int before a function call is still > holding an int afterwards, even without any stores in between, because the > function could use placement new to change the cells type. Guess how GCC > deals with this? We've designed and redesigned our memory model to > accomodate for this: It's conservatively assuming that crap-happens-here > is part of most function calls (because in the real world, it indeed is > the case that crap-happens-there) :) Agreed, crap-happens-everywhere is a good approximation for real-world code. ;-) But again, I believe that the rules do what needs to be done, although some additional adjustments are no doubt needed. I have updated the document: http://www2.rdrop.com/users/paulmck/RCU/consume.2015.05.21b.pdf Section 7.9 starts on page 28. Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 15:46 ` Will Deacon 2015-05-20 15:54 ` Andrew Haley @ 2015-05-20 18:16 ` Paul E. McKenney 2015-05-21 19:24 ` Will Deacon 1 sibling, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 18:16 UTC (permalink / raw) To: Will Deacon Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 04:46:17PM +0100, Will Deacon wrote: > On Wed, May 20, 2015 at 01:15:22PM +0100, Paul E. McKenney wrote: > > On Wed, May 20, 2015 at 12:47:45PM +0100, Will Deacon wrote: > > > On Wed, May 20, 2015 at 03:41:48AM +0100, Paul E. McKenney wrote: > > > > If a pointer is part of a dependency chain, and if the values > > > > added to or subtracted from that pointer cancel the pointer > > > > value so as to allow the compiler to precisely determine the > > > > resulting value, then the resulting value will not be part of > > > > any dependency chain. For example, if p is part of a dependency > > > > chain, then ((char *)p-(uintptr_t)p)+65536 will not be. > > > > > > > > Seem reasonable? > > > > > > Whilst I understand what you're saying (the ARM architecture makes these > > > sorts of distinctions when calling out dependency-based ordering), it > > > feels like we're dangerously close to defining the difference between a > > > true and a false dependency. If we want to do this in the context of the > > > C language specification, you run into issues because you need to evaluate > > > the program in order to determine data values in order to determine the > > > nature of the dependency. > > > > Indeed, something like this does -not- carry a dependency from the > > memory_order_consume load to q: > > > > char *p, q; > > > > p = atomic_load_explicit(&gp, memory_order_consume); > > q = gq + (intptr_t)p - (intptr_t)p; > > > > If this was compiled with -O0, ARM and Power might well carry a > > dependency, but given any optimization, the assembly language would have > > no hint of any such dependency. So I am not seeing any particular danger. > > The above is a welcome relaxation over C11, since ARM doesn't even give > you ordering based off false data dependencies. My concern is more to do > with how this can be specified precisely without prohibing honest compiler > and hardware optimisations. That last is the challenge. I believe that I am pretty close, but I am sure that additional adjustment will be required. Especially given that we also need the memory model to be amenable to formal analysis. > Out of interest, how do you tackle examples (4) and (5) of (assuming the > reads are promoted to consume loads)?: > > http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html > > my understanding is that you permit both outcomes (I appreciate you're > not directly tackling out-of-thin-air, but treatment of dependencies > is heavily related). Let's see... #4 is as follows, given promotion to memory_order_consume and (I am guessing) memory_order_relaxed: r1 = atomic_load_explicit(&x, memory_order_consume); if (r1 == 42) atomic_store_explicit(&y, r1, memory_order_relaxed); ------------------------------------------------------ r2 = atomic_load_explicit(&y, memory_order_consume); if (r2 == 42) atomic_store_explicit(&x, 42, memory_order_relaxed); else atomic_store_explicit(&x, 42, memory_order_relaxed); The second thread does not have a proper control dependency, even with the memory_order_consume load because both branches assign the same value to "x". This means that the compiler is within its rights to optimize this into the following: r1 = atomic_load_explicit(&x, memory_order_consume); if (r1 == 42) atomic_store_explicit(&y, r1, memory_order_relaxed); ------------------------------------------------------ r2 = atomic_load_explicit(&y, memory_order_consume); atomic_store_explicit(&x, 42, memory_order_relaxed); There is no dependency between the second thread's pair of statements, so both the compiler and the CPU are within their rights to optimize further as follows: r1 = atomic_load_explicit(&x, memory_order_consume); if (r1 == 42) atomic_store_explicit(&y, r1, memory_order_relaxed); ------------------------------------------------------ atomic_store_explicit(&x, 42, memory_order_relaxed); r2 = atomic_load_explicit(&y, memory_order_consume); If the compiler makes this final optimization, even mythical SC hardware is within its rights to end up with (r1 == 42 && r2 == 42). Which is fine, as far as I am concerned. Or at least something that can be lived with. On to #5: r1 = atomic_load_explicit(&x, memory_order_consume); if (r1 == 42) atomic_store_explicit(&y, r1, memory_order_relaxed); ---------------------------------------------------- r2 = atomic_load_explicit(&y, memory_order_consume); if (r2 == 42) atomic_store_explicit(&x, 42, memory_order_relaxed); The first thread's accesses are dependency ordered. The second thread's ordering is in a corner case that memory-barriers.txt does not cover. You are supposed to start control dependencies with READ_ONCE_CTRL(), not a memory_order_consume load (AKA rcu_dereference and friends). However, Alpha would have a full barrier as part of the memory_order_consume load, and the rest of the processors would (one way or another) respect the control dependency. And the compiler would have some fun trying to break it. So the current Linux memory model would allow (r1 == 42 && r2 == 42), but I don't know of any hardware/compiler combination that would allow it. And no, I am -not- going to update memory-barriers.txt for this litmus test, its theoretical interest notwithstanding! ;-) In summary, both #4 and #5 would be allowed, as modified above. Seem reasonable? > > > You tackle this above by saying "to allow the compiler to precisely > > > determine the resulting value", but I can't see how that can be cleanly > > > fitted into something like the C language specification. > > > > I am sure that there will be significant rework from where this document > > is to language appropriate from the standard. Which is why I am glad > > that Jens is taking an interest in this, as he is particularly good at > > producing standards language. > > Ok. I'm curious to see how that comes along. Me too... > > > Even if it can, > > > then we'd need to reword the "?:" treatment that you currently have: > > > > > > "If a pointer is part of a dependency chain, and that pointer appears > > > in the entry of a ?: expression selected by the condition, then the > > > chain extends to the result." > > > > > > which I think requires the state of the condition to be known statically > > > if we only want to extend the chain from the selected expression. In the > > > general case, wouldn't a compiler have to assume that the chain is > > > extended from both? > > > > In practice, yes, if the compiler cannot determine which expression is > > selected, it must arrange for the dependency to be carried from either, > > depending on the run-time value of the condition. But you would have > > to work pretty hard to create code that did not carry the dependencies > > as require, not? > > I'm not sure... you'd require the compiler to perform static analysis of > loops to determine the state of the machine when they exit (if they exit!) > in order to show whether or not a dependency is carried to subsequent > operations. If it can't prove otherwise, it would have to assume that a > dependency *is* carried, and it's not clear to me how it would use this > information to restrict any subsequent dependency removing optimisations. > > I guess that's one for the GCC folks. The goal is to restrict the dependency carrying such that the compiler folks don't need to care. (Yeah, I know, great goal and all that...) > > > Additionally, what about the following code? > > > > > > char *x = y ? z : z; > > > > > > Does that extend a dependency chain from z to x? If so, I can imagine a > > > CPU breaking that in practice. > > > > I am not seeing this. I would expect the compiler to optimize to > > something like this: > > > > char *x = z; > > > > How does this avoid carrying the dependency? Or are you saying that > > ARM loses the dependency via a store to memory and a later reload? > > That would be a bit surprising... > > I was thinking that the compiler would have to preserve the conditional > structure so that the dependency chain could be tracked correctly, but > if it can just assume that the dependency is carried regardless of y then > I agree that it doesn't matter for this code. All the CPU could do is > remove the conditional hazard. OK, sounds like we are on the same page on this one. > > > > > Humans will understand, and compiler writers won't care. They will > > > > > either depend on hardware semantics anyway (and argue that your > > > > > language is tight enough that they don't need to do anything special) > > > > > or they will turn the consume into an acquire (on platforms that have > > > > > too weak hardware). > > > > > > > > Agreed. Plus Core Working Group will hammer out the exact wording, > > > > should this approach meet their approval. > > > > > > For the avoidance of doubt, I'm completely behind any attempts to tackle > > > this problem, but I anticipate an uphill struggle getting this text into > > > the C standard. Is your intention to change the carries-a-dependency > > > relation to encompass this change? > > > > I completely agree that this won't be easy, but this is the task at hand. > > And yes, the intent is to change carries-a-dependency, given that the > > current wording isn't helping anything. ;-) > > Agreed on that! ;-) ;-) ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 18:16 ` Paul E. McKenney @ 2015-05-21 19:24 ` Will Deacon 2015-05-21 20:02 ` Paul E. McKenney 0 siblings, 1 reply; 44+ messages in thread From: Will Deacon @ 2015-05-21 19:24 UTC (permalink / raw) To: Paul E. McKenney Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote: > On Wed, May 20, 2015 at 04:46:17PM +0100, Will Deacon wrote: > > On Wed, May 20, 2015 at 01:15:22PM +0100, Paul E. McKenney wrote: > > > Indeed, something like this does -not- carry a dependency from the > > > memory_order_consume load to q: > > > > > > char *p, q; > > > > > > p = atomic_load_explicit(&gp, memory_order_consume); > > > q = gq + (intptr_t)p - (intptr_t)p; > > > > > > If this was compiled with -O0, ARM and Power might well carry a > > > dependency, but given any optimization, the assembly language would have > > > no hint of any such dependency. So I am not seeing any particular danger. > > > > The above is a welcome relaxation over C11, since ARM doesn't even give > > you ordering based off false data dependencies. My concern is more to do > > with how this can be specified precisely without prohibing honest compiler > > and hardware optimisations. > > That last is the challenge. I believe that I am pretty close, but I am > sure that additional adjustment will be required. Especially given that > we also need the memory model to be amenable to formal analysis. Well, there's still the whole thin-air problem which unfortunately doesn't go away with your proposal... (I was hoping that differentiating between true and false dependencies would solve that, but your set of rules isn't broad enough and I don't blame you at all for that!). > > Out of interest, how do you tackle examples (4) and (5) of (assuming the > > reads are promoted to consume loads)?: > > > > http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html > > > > my understanding is that you permit both outcomes (I appreciate you're > > not directly tackling out-of-thin-air, but treatment of dependencies > > is heavily related). Thanks for taking the time to walk these two examples through. > Let's see... #4 is as follows, given promotion to memory_order_consume > and (I am guessing) memory_order_relaxed: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > ------------------------------------------------------ > r2 = atomic_load_explicit(&y, memory_order_consume); > if (r2 == 42) > atomic_store_explicit(&x, 42, memory_order_relaxed); > else > atomic_store_explicit(&x, 42, memory_order_relaxed); > > The second thread does not have a proper control dependency, even with > the memory_order_consume load because both branches assign the same > value to "x". This means that the compiler is within its rights to > optimize this into the following: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > ------------------------------------------------------ > r2 = atomic_load_explicit(&y, memory_order_consume); > atomic_store_explicit(&x, 42, memory_order_relaxed); > > There is no dependency between the second thread's pair of statements, > so both the compiler and the CPU are within their rights to optimize > further as follows: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > ------------------------------------------------------ > atomic_store_explicit(&x, 42, memory_order_relaxed); > r2 = atomic_load_explicit(&y, memory_order_consume); > > If the compiler makes this final optimization, even mythical SC hardware > is within its rights to end up with (r1 == 42 && r2 == 42). Which is > fine, as far as I am concerned. Or at least something that can be > lived with. Agreed. > On to #5: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > ---------------------------------------------------- > r2 = atomic_load_explicit(&y, memory_order_consume); > if (r2 == 42) > atomic_store_explicit(&x, 42, memory_order_relaxed); > > The first thread's accesses are dependency ordered. The second thread's > ordering is in a corner case that memory-barriers.txt does not cover. > You are supposed to start control dependencies with READ_ONCE_CTRL(), not > a memory_order_consume load (AKA rcu_dereference and friends). However, > Alpha would have a full barrier as part of the memory_order_consume load, > and the rest of the processors would (one way or another) respect the > control dependency. And the compiler would have some fun trying to > break it. But this is interesting because the first thread is ordered whilst the second is not, so doesn't that effectively forbid the compiler from constant-folding values if it can't prove that there is no dependency chain? > So the current Linux memory model would allow (r1 == 42 && r2 == 42), > but I don't know of any hardware/compiler combination that would > allow it. And no, I am -not- going to update memory-barriers.txt for > this litmus test, its theoretical interest notwithstanding! ;-) Indeed, I also don't know of any hardware which permits speculative writes to become visible, but it's the compiler (and the language definition) that we need to think about here. > In summary, both #4 and #5 would be allowed, as modified above. > Seem reasonable? It feels like it's suppressing a reasonable compiler optimisation, but again, I'm not a compiler writer ;) Will ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 19:24 ` Will Deacon @ 2015-05-21 20:02 ` Paul E. McKenney 2015-05-21 20:42 ` Linus Torvalds 2015-05-22 17:30 ` Will Deacon 0 siblings, 2 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-21 20:02 UTC (permalink / raw) To: Will Deacon Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Thu, May 21, 2015 at 08:24:22PM +0100, Will Deacon wrote: > On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote: > > On Wed, May 20, 2015 at 04:46:17PM +0100, Will Deacon wrote: > > > On Wed, May 20, 2015 at 01:15:22PM +0100, Paul E. McKenney wrote: > > > > Indeed, something like this does -not- carry a dependency from the > > > > memory_order_consume load to q: > > > > > > > > char *p, q; > > > > > > > > p = atomic_load_explicit(&gp, memory_order_consume); > > > > q = gq + (intptr_t)p - (intptr_t)p; > > > > > > > > If this was compiled with -O0, ARM and Power might well carry a > > > > dependency, but given any optimization, the assembly language would have > > > > no hint of any such dependency. So I am not seeing any particular danger. > > > > > > The above is a welcome relaxation over C11, since ARM doesn't even give > > > you ordering based off false data dependencies. My concern is more to do > > > with how this can be specified precisely without prohibing honest compiler > > > and hardware optimisations. > > > > That last is the challenge. I believe that I am pretty close, but I am > > sure that additional adjustment will be required. Especially given that > > we also need the memory model to be amenable to formal analysis. > > Well, there's still the whole thin-air problem which unfortunately doesn't > go away with your proposal... (I was hoping that differentiating between > true and false dependencies would solve that, but your set of rules isn't > broad enough and I don't blame you at all for that!). > > > > Out of interest, how do you tackle examples (4) and (5) of (assuming the > > > reads are promoted to consume loads)?: > > > > > > http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html > > > > > > my understanding is that you permit both outcomes (I appreciate you're > > > not directly tackling out-of-thin-air, but treatment of dependencies > > > is heavily related). > > Thanks for taking the time to walk these two examples through. ;-) ;-) ;-) > > Let's see... #4 is as follows, given promotion to memory_order_consume > > and (I am guessing) memory_order_relaxed: > > > > r1 = atomic_load_explicit(&x, memory_order_consume); > > if (r1 == 42) > > atomic_store_explicit(&y, r1, memory_order_relaxed); > > ------------------------------------------------------ > > r2 = atomic_load_explicit(&y, memory_order_consume); > > if (r2 == 42) > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > else > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > > > The second thread does not have a proper control dependency, even with > > the memory_order_consume load because both branches assign the same > > value to "x". This means that the compiler is within its rights to > > optimize this into the following: > > > > r1 = atomic_load_explicit(&x, memory_order_consume); > > if (r1 == 42) > > atomic_store_explicit(&y, r1, memory_order_relaxed); > > ------------------------------------------------------ > > r2 = atomic_load_explicit(&y, memory_order_consume); > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > > > There is no dependency between the second thread's pair of statements, > > so both the compiler and the CPU are within their rights to optimize > > further as follows: > > > > r1 = atomic_load_explicit(&x, memory_order_consume); > > if (r1 == 42) > > atomic_store_explicit(&y, r1, memory_order_relaxed); > > ------------------------------------------------------ > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > r2 = atomic_load_explicit(&y, memory_order_consume); > > > > If the compiler makes this final optimization, even mythical SC hardware > > is within its rights to end up with (r1 == 42 && r2 == 42). Which is > > fine, as far as I am concerned. Or at least something that can be > > lived with. > > Agreed. > > > On to #5: > > > > r1 = atomic_load_explicit(&x, memory_order_consume); > > if (r1 == 42) > > atomic_store_explicit(&y, r1, memory_order_relaxed); > > ---------------------------------------------------- > > r2 = atomic_load_explicit(&y, memory_order_consume); > > if (r2 == 42) > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > > > The first thread's accesses are dependency ordered. The second thread's > > ordering is in a corner case that memory-barriers.txt does not cover. > > You are supposed to start control dependencies with READ_ONCE_CTRL(), not > > a memory_order_consume load (AKA rcu_dereference and friends). However, > > Alpha would have a full barrier as part of the memory_order_consume load, > > and the rest of the processors would (one way or another) respect the > > control dependency. And the compiler would have some fun trying to > > break it. > > But this is interesting because the first thread is ordered whilst the > second is not, so doesn't that effectively forbid the compiler from > constant-folding values if it can't prove that there is no dependency > chain? You lost me on this one. Are you suggesting that the compiler speculate the second thread's atomic store? That would be very bad regardless of dependency chains. So what constant-folding optimization are you thinking of here? If the above example is not amenable to such an optimization, could you please give me an example where constant folding would apply in a way that is sensitive to dependency chains? > > So the current Linux memory model would allow (r1 == 42 && r2 == 42), > > but I don't know of any hardware/compiler combination that would > > allow it. And no, I am -not- going to update memory-barriers.txt for > > this litmus test, its theoretical interest notwithstanding! ;-) > > Indeed, I also don't know of any hardware which permits speculative > writes to become visible, but it's the compiler (and the language > definition) that we need to think about here. The compiler can (and does) speculate non-atomic non-volatile writes in some cases, but I do not believe that it is permitted to speculate either volatile or atomic writes. All that aside, I reiterate that I am assuming that the compiler does not speculate the value of pointer loads. > > In summary, both #4 and #5 would be allowed, as modified above. > > Seem reasonable? > > It feels like it's suppressing a reasonable compiler optimisation, but again, > I'm not a compiler writer ;) The intent of this proposal is to avoid suppressing reasonable compiler optimizations, so could you please send along an example? Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 20:02 ` Paul E. McKenney @ 2015-05-21 20:42 ` Linus Torvalds 2015-05-21 22:02 ` Paul E. McKenney ` (2 more replies) 2015-05-22 17:30 ` Will Deacon 1 sibling, 3 replies; 44+ messages in thread From: Linus Torvalds @ 2015-05-21 20:42 UTC (permalink / raw) To: Paul McKenney Cc: Will Deacon, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Thu, May 21, 2015 at 1:02 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > The compiler can (and does) speculate non-atomic non-volatile writes > in some cases, but I do not believe that it is permitted to speculate > either volatile or atomic writes. I do *not* believe that a compiler is ever allowed to speculate *any* writes - volatile or not - unless the compiler can prove that the end result is either single-threaded, or the write in question is guaranteed to only be visible in that thread (ie local stack variable etc). Quite frankly, I'd be much happier if the C standard just said so outright. Also, I do think that the whole "consume" read should be explained better to compiler writers. Right now the language (including very much in the "restricted dependency" model) is described in very abstract terms. Yet those abstract terms are actually very subtle and complex, and very opaque to a compiler writer. If I was a compiler writer, I'd absolutely detest that definition. It's very far removed from my problem space as a compiler writer, and nothing in the language *explains* the odd and subtle abstract rules. It smells ad-hoc to me. Now, I actually understand the point of those odd and abstract rules, but to a compiler writer that doesn't understand the background, the whole section reads as "this is really painful for me to track all those dependencies and what kills them". So I would very much suggest that there would be language that *explains* this. Basically, tell the compiler writer: (a) the "official" rules are completely pointless, and make sense only because the standard is written for some random "abstract machine" that doesn't actually exist. (b) in *real life*, the whole and only point of the rules is to make sure that the compiler doesn't turn a data depenency into a control dependency, which on ARM and POWERPC does not honor causal memory ordering (c) on x86, since *all* memory accesses are causal, all the magical dependency rules are just pointless anyway, and what it really means is that you cannot re-order accesses with value speculation. (c) the *actual* relevant rule for a compiler writer is very simple: the compiler must not do value speculation on a "consume" load, and the abstract machine rules are written so that any other sane optimization is legal. (d) if the compiler writer really thinks they want to do value speculation, they have to turn the "consume" load into an "acquire" load. And you have to do that anyway on architectures like alpha that aren't causal even for data dependencies. I personally think the whole "abstract machine" model of the C language is a mistake. It would be much better to talk about things in terms of actual code generation and actual issues. Make all the problems much more concrete, with actual examples of how memory ordering matters on different architectures. 99% of all the problems with the whole "consume" memory ordering comes not from anything relevant to a compiler writer. All of it comes from trying to "define" the issue in the wrong terms. Linus ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 20:42 ` Linus Torvalds @ 2015-05-21 22:02 ` Paul E. McKenney 2015-05-22 6:43 ` Ingo Molnar 2015-05-26 17:37 ` [c++std-parallel-1641] " Torvald Riegel 2 siblings, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-21 22:02 UTC (permalink / raw) To: Linus Torvalds Cc: Will Deacon, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Thu, May 21, 2015 at 01:42:11PM -0700, Linus Torvalds wrote: > On Thu, May 21, 2015 at 1:02 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > > > The compiler can (and does) speculate non-atomic non-volatile writes > > in some cases, but I do not believe that it is permitted to speculate > > either volatile or atomic writes. > > I do *not* believe that a compiler is ever allowed to speculate *any* > writes - volatile or not - unless the compiler can prove that the end > result is either single-threaded, or the write in question is > guaranteed to only be visible in that thread (ie local stack variable > etc). > > Quite frankly, I'd be much happier if the C standard just said so outright. So would I. ;-) The usual example is the following, where x is non-volatile and non-atomic: if (a) x = 42; else x = 7; The current rules allow this to be transformed as follows: x = 7; if (a) x = 42; So the variable x takes on the value 7 momentarily when it otherwise would not have. At least C11 now prohibits "drive-by" speculative writes, where the compiler writes a larger size and then fixes things up afterwards. > Also, I do think that the whole "consume" read should be explained > better to compiler writers. Right now the language (including very > much in the "restricted dependency" model) is described in very > abstract terms. Yet those abstract terms are actually very subtle and > complex, and very opaque to a compiler writer. > > If I was a compiler writer, I'd absolutely detest that definition. > It's very far removed from my problem space as a compiler writer, and > nothing in the language *explains* the odd and subtle abstract rules. > It smells ad-hoc to me. > > Now, I actually understand the point of those odd and abstract rules, > but to a compiler writer that doesn't understand the background, the > whole section reads as "this is really painful for me to track all > those dependencies and what kills them". > > So I would very much suggest that there would be language that > *explains* this. Basically, tell the compiler writer: > > (a) the "official" rules are completely pointless, and make sense > only because the standard is written for some random "abstract > machine" that doesn't actually exist. > > (b) in *real life*, the whole and only point of the rules is to make > sure that the compiler doesn't turn a data depenency into a control > dependency, which on ARM and POWERPC does not honor causal memory > ordering > > (c) on x86, since *all* memory accesses are causal, all the magical > dependency rules are just pointless anyway, and what it really means > is that you cannot re-order accesses with value speculation. > > (c) the *actual* relevant rule for a compiler writer is very simple: > the compiler must not do value speculation on a "consume" load, and > the abstract machine rules are written so that any other sane > optimization is legal. > > (d) if the compiler writer really thinks they want to do value > speculation, they have to turn the "consume" load into an "acquire" > load. And you have to do that anyway on architectures like alpha that > aren't causal even for data dependencies. I am certainly more than willing to add this sort of wording to the document. > I personally think the whole "abstract machine" model of the C > language is a mistake. It would be much better to talk about things in > terms of actual code generation and actual issues. Make all the > problems much more concrete, with actual examples of how memory > ordering matters on different architectures. > > 99% of all the problems with the whole "consume" memory ordering comes > not from anything relevant to a compiler writer. All of it comes from > trying to "define" the issue in the wrong terms. I certainly cannot deny that there seems to be significant confusion in the discussion thus far. Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 20:42 ` Linus Torvalds 2015-05-21 22:02 ` Paul E. McKenney @ 2015-05-22 6:43 ` Ingo Molnar 2015-05-22 10:43 ` Richard Kenner 2015-05-22 13:12 ` Paul E. McKenney 2015-05-26 17:37 ` [c++std-parallel-1641] " Torvald Riegel 2 siblings, 2 replies; 44+ messages in thread From: Ingo Molnar @ 2015-05-22 6:43 UTC (permalink / raw) To: Linus Torvalds Cc: Paul McKenney, Will Deacon, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, michaelw * Linus Torvalds <torvalds@linux-foundation.org> wrote: > (a) the "official" rules are completely pointless, and make sense > only because the standard is written for some random "abstract > machine" that doesn't actually exist. Presuming the intent of the abstract machine specification is to avoid being seen as biased towards any specific machine (politics), maybe write this as: (a) the "official" rules are written for a somewhat weird and complex "union of all known and theoretically possible CPU architectures that exist or which might exist in the future", which machine does not actually exist in practice, but which allows a single abstract set of rules to apply to all machines. These rules are complex, but if applied to a specific machine they become considerably simpler. Here's a few examples: ... ? (Assuming it's a goal of this standard to be human parseable to more than a few dozen people on the planet.) Thanks, Ingo ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-22 6:43 ` Ingo Molnar @ 2015-05-22 10:43 ` Richard Kenner 2015-05-22 13:11 ` Paul E. McKenney 2015-05-22 13:12 ` Paul E. McKenney 1 sibling, 1 reply; 44+ messages in thread From: Richard Kenner @ 2015-05-22 10:43 UTC (permalink / raw) To: mingo Cc: akpm, c++std-parallel, dhowells, gcc, linux-arch, linux-kernel, Mark.Batty, michaelw, paulmck, Peter.Sewell, peterz, Ramana.Radhakrishnan, torvalds, will.deacon > (Assuming it's a goal of this standard to be human parseable to more > than a few dozen people on the planet.) Unfortunately, that's rarely a goal of most standards. ;-) ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-22 10:43 ` Richard Kenner @ 2015-05-22 13:11 ` Paul E. McKenney 0 siblings, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-22 13:11 UTC (permalink / raw) To: Richard Kenner Cc: mingo, akpm, c++std-parallel, dhowells, gcc, linux-arch, linux-kernel, Mark.Batty, michaelw, Peter.Sewell, peterz, Ramana.Radhakrishnan, torvalds, will.deacon On Fri, May 22, 2015 at 06:43:32AM -0400, Richard Kenner wrote: > > (Assuming it's a goal of this standard to be human parseable to more > > than a few dozen people on the planet.) > > Unfortunately, that's rarely a goal of most standards. ;-) My experience does match Richard's, sad to say. There has been some vigorous discussion in another thread involving undefined behavior and value narrowing, which has resulted in some useful changes to Section 7.9. The updated document is here: http://www2.rdrop.com/users/paulmck/RCU/consume.2015.05.22a.pdf Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-22 6:43 ` Ingo Molnar 2015-05-22 10:43 ` Richard Kenner @ 2015-05-22 13:12 ` Paul E. McKenney 1 sibling, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-22 13:12 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, Will Deacon, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, michaelw On Fri, May 22, 2015 at 08:43:44AM +0200, Ingo Molnar wrote: > > * Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > (a) the "official" rules are completely pointless, and make sense > > only because the standard is written for some random "abstract > > machine" that doesn't actually exist. > > Presuming the intent of the abstract machine specification is to avoid > being seen as biased towards any specific machine (politics), maybe > write this as: > > (a) the "official" rules are written for a somewhat weird and > complex "union of all known and theoretically possible CPU > architectures that exist or which might exist in the future", > which machine does not actually exist in practice, but which > allows a single abstract set of rules to apply to all machines. > These rules are complex, but if applied to a specific machine > they become considerably simpler. Here's a few examples: ... > > ? > > (Assuming it's a goal of this standard to be human parseable to more > than a few dozen people on the planet.) Should something based on Section 7.9 go in, then I would need to add a more developer-friendly explanation in Documentation/RCU, no two ways about it! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1641] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 20:42 ` Linus Torvalds 2015-05-21 22:02 ` Paul E. McKenney 2015-05-22 6:43 ` Ingo Molnar @ 2015-05-26 17:37 ` Torvald Riegel 2 siblings, 0 replies; 44+ messages in thread From: Torvald Riegel @ 2015-05-26 17:37 UTC (permalink / raw) To: c++std-parallel Cc: Paul McKenney, Will Deacon, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Thu, 2015-05-21 at 13:42 -0700, Linus Torvalds wrote: > On Thu, May 21, 2015 at 1:02 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > > > The compiler can (and does) speculate non-atomic non-volatile writes > > in some cases, but I do not believe that it is permitted to speculate > > either volatile or atomic writes. > > I do *not* believe that a compiler is ever allowed to speculate *any* > writes - volatile or not - unless the compiler can prove that the end > result is either single-threaded, or the write in question is > guaranteed to only be visible in that thread (ie local stack variable > etc). It must not speculative volatile accesses. It could speculate non-volatiles even if those are atomic and observable by other threads but that would require further work/checks on all potential observers of those (ie, to still satisfy as-if). Thus, compilers are unlikely to do such speculation, I'd say. The as-if rule (ie, equality of observable behavior (ie, volatiles, ...) to the abstract machine) makes all this clear. > Also, I do think that the whole "consume" read should be explained > better to compiler writers. Right now the language (including very > much in the "restricted dependency" model) is described in very > abstract terms. Yet those abstract terms are actually very subtle and > complex, and very opaque to a compiler writer. I believe the issues around the existing specification of mo_consume where pointed out by compiler folks. It's a complex problem, and I'm all for more explanations, but I did get the impression that the compiler writers in ISO C++ Study Group 1 do have a good understanding of the problem. > I personally think the whole "abstract machine" model of the C > language is a mistake. It would be much better to talk about things in > terms of actual code generation and actual issues. Make all the > problems much more concrete, with actual examples of how memory > ordering matters on different architectures. As someone working for a toolchain team, I don't see how the abstract-machine-based specification is a problem at all, nor have I seen compiler writers struggling with it. It does give precise rules for code generation. The level of abstraction is a good thing for most programs because for those, the primary concern is that the observable behavior and end result is computed -- it's secondary and QoI how that happens. In contrast, if you specify on the level of code generation, you'd have to foresee how code generation might look in the future, including predict future optimizations and all that. That doesn't look future-proof to me. I do realize that this may be less than ideal for cases when one would want to use a C compiler more like a convenient assembler. But that case isn't the 99%, I believe. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-21 20:02 ` Paul E. McKenney 2015-05-21 20:42 ` Linus Torvalds @ 2015-05-22 17:30 ` Will Deacon 2015-05-22 18:55 ` Paul E. McKenney 1 sibling, 1 reply; 44+ messages in thread From: Will Deacon @ 2015-05-22 17:30 UTC (permalink / raw) To: Paul E. McKenney Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw Hi Paul, On Thu, May 21, 2015 at 09:02:12PM +0100, Paul E. McKenney wrote: > On Thu, May 21, 2015 at 08:24:22PM +0100, Will Deacon wrote: > > On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote: > > > On to #5: > > > > > > r1 = atomic_load_explicit(&x, memory_order_consume); > > > if (r1 == 42) > > > atomic_store_explicit(&y, r1, memory_order_relaxed); > > > ---------------------------------------------------- > > > r2 = atomic_load_explicit(&y, memory_order_consume); > > > if (r2 == 42) > > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > > > > > The first thread's accesses are dependency ordered. The second thread's > > > ordering is in a corner case that memory-barriers.txt does not cover. > > > You are supposed to start control dependencies with READ_ONCE_CTRL(), not > > > a memory_order_consume load (AKA rcu_dereference and friends). However, > > > Alpha would have a full barrier as part of the memory_order_consume load, > > > and the rest of the processors would (one way or another) respect the > > > control dependency. And the compiler would have some fun trying to > > > break it. > > > > But this is interesting because the first thread is ordered whilst the > > second is not, so doesn't that effectively forbid the compiler from > > constant-folding values if it can't prove that there is no dependency > > chain? > > You lost me on this one. Are you suggesting that the compiler > speculate the second thread's atomic store? That would be very > bad regardless of dependency chains. > > So what constant-folding optimization are you thinking of here? > If the above example is not amenable to such an optimization, could > you please give me an example where constant folding would apply > in a way that is sensitive to dependency chains? Unless I'm missing something, I can't see what would prevent a compiler from looking at the code in thread1 and transforming it into the code in thread2 (i.e. constant folding r1 with 42 given that the taken branch must mean that r1 == 42). However, such an optimisation breaks the dependency chain, which means that a compiler needs to walk backwards to see if there is a dependency chain extending to r1. > > > So the current Linux memory model would allow (r1 == 42 && r2 == 42), > > > but I don't know of any hardware/compiler combination that would > > > allow it. And no, I am -not- going to update memory-barriers.txt for > > > this litmus test, its theoretical interest notwithstanding! ;-) Of course, I'm not asking for that at all! I'm just trying to see how your proposal holds up with the example. Will ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-22 17:30 ` Will Deacon @ 2015-05-22 18:55 ` Paul E. McKenney 0 siblings, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-22 18:55 UTC (permalink / raw) To: Will Deacon Cc: Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Fri, May 22, 2015 at 06:30:29PM +0100, Will Deacon wrote: > Hi Paul, > > On Thu, May 21, 2015 at 09:02:12PM +0100, Paul E. McKenney wrote: > > On Thu, May 21, 2015 at 08:24:22PM +0100, Will Deacon wrote: > > > On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote: > > > > On to #5: > > > > > > > > r1 = atomic_load_explicit(&x, memory_order_consume); > > > > if (r1 == 42) > > > > atomic_store_explicit(&y, r1, memory_order_relaxed); > > > > ---------------------------------------------------- > > > > r2 = atomic_load_explicit(&y, memory_order_consume); > > > > if (r2 == 42) > > > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > > > > > > > The first thread's accesses are dependency ordered. The second thread's > > > > ordering is in a corner case that memory-barriers.txt does not cover. > > > > You are supposed to start control dependencies with READ_ONCE_CTRL(), not > > > > a memory_order_consume load (AKA rcu_dereference and friends). However, > > > > Alpha would have a full barrier as part of the memory_order_consume load, > > > > and the rest of the processors would (one way or another) respect the > > > > control dependency. And the compiler would have some fun trying to > > > > break it. > > > > > > But this is interesting because the first thread is ordered whilst the > > > second is not, so doesn't that effectively forbid the compiler from > > > constant-folding values if it can't prove that there is no dependency > > > chain? > > > > You lost me on this one. Are you suggesting that the compiler > > speculate the second thread's atomic store? That would be very > > bad regardless of dependency chains. > > > > So what constant-folding optimization are you thinking of here? > > If the above example is not amenable to such an optimization, could > > you please give me an example where constant folding would apply > > in a way that is sensitive to dependency chains? > > Unless I'm missing something, I can't see what would prevent a compiler > from looking at the code in thread1 and transforming it into the code in > thread2 (i.e. constant folding r1 with 42 given that the taken branch > must mean that r1 == 42). However, such an optimisation breaks the > dependency chain, which means that a compiler needs to walk backwards > to see if there is a dependency chain extending to r1. Indeed! Which is one reason that (1) integers are not allowed in dependency chains with a very few extremely constrained exceptions and (2) sequences of comparisons and/or undefined-behavior considerations that allow the compiler to exactly determine the pointer value break the dependency chain. > > > > So the current Linux memory model would allow (r1 == 42 && r2 == 42), > > > > but I don't know of any hardware/compiler combination that would > > > > allow it. And no, I am -not- going to update memory-barriers.txt for > > > > this litmus test, its theoretical interest notwithstanding! ;-) > > Of course, I'm not asking for that at all! I'm just trying to see how > your proposal holds up with the example. Whew! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 11:47 ` Will Deacon 2015-05-20 12:15 ` Paul E. McKenney @ 2015-05-20 13:18 ` David Howells 2015-05-20 13:30 ` Paul E. McKenney 2015-05-20 13:37 ` David Howells 1 sibling, 2 replies; 44+ messages in thread From: David Howells @ 2015-05-20 13:18 UTC (permalink / raw) To: paulmck Cc: dhowells, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, Andrew Morton, Ingo Molnar, michaelw Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > Additionally, what about the following code? > > > > char *x = y ? z : z; > > > > Does that extend a dependency chain from z to x? If so, I can imagine a > > CPU breaking that in practice. > > I am not seeing this. I would expect the compiler to optimize to > something like this: > > char *x = z; Why? What if y has a potential side-effect (say it makes a function call)? David ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 13:18 ` David Howells @ 2015-05-20 13:30 ` Paul E. McKenney 2015-05-20 13:37 ` David Howells 1 sibling, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 13:30 UTC (permalink / raw) To: David Howells Cc: Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 02:18:37PM +0100, David Howells wrote: > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > > Additionally, what about the following code? > > > > > > char *x = y ? z : z; > > > > > > Does that extend a dependency chain from z to x? If so, I can imagine a > > > CPU breaking that in practice. > > > > I am not seeing this. I would expect the compiler to optimize to > > something like this: > > > > char *x = z; > > Why? What if y has a potential side-effect (say it makes a function call)? I was thinking of "y" as a simple variable, but if it is something more complex, then the compiler could do this, right? char *x; y; x = z; Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 13:18 ` David Howells 2015-05-20 13:30 ` Paul E. McKenney @ 2015-05-20 13:37 ` David Howells 2015-05-20 13:44 ` Ramana Radhakrishnan 2015-05-20 14:02 ` [c++std-parallel-1624] " Paul E. McKenney 1 sibling, 2 replies; 44+ messages in thread From: David Howells @ 2015-05-20 13:37 UTC (permalink / raw) To: paulmck Cc: dhowells, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, Andrew Morton, Ingo Molnar, michaelw Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > I was thinking of "y" as a simple variable, but if it is something more > complex, then the compiler could do this, right? > > char *x; > > y; > x = z; Yeah. I presume it has to maintain the ordering, though. David ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 13:37 ` David Howells @ 2015-05-20 13:44 ` Ramana Radhakrishnan 2015-05-20 14:03 ` Paul E. McKenney 2015-05-20 14:02 ` [c++std-parallel-1624] " Paul E. McKenney 1 sibling, 1 reply; 44+ messages in thread From: Ramana Radhakrishnan @ 2015-05-20 13:44 UTC (permalink / raw) To: David Howells, paulmck Cc: Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Andrew Morton, Ingo Molnar, michaelw On 20/05/15 14:37, David Howells wrote: > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > >> I was thinking of "y" as a simple variable, but if it is something more >> complex, then the compiler could do this, right? >> >> char *x; >> >> y; >> x = z; > > Yeah. I presume it has to maintain the ordering, though. The scheduler for e.g. is free to reorder if it can prove there is no dependence (or indeed side-effects for y) between insns produced for y and `x = z'. regards Ramana > > David > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 13:44 ` Ramana Radhakrishnan @ 2015-05-20 14:03 ` Paul E. McKenney 2015-05-20 14:15 ` Ramana Radhakrishnan 0 siblings, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 14:03 UTC (permalink / raw) To: Ramana Radhakrishnan Cc: David Howells, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 02:44:30PM +0100, Ramana Radhakrishnan wrote: > > > On 20/05/15 14:37, David Howells wrote: > >Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > >>I was thinking of "y" as a simple variable, but if it is something more > >>complex, then the compiler could do this, right? > >> > >> char *x; > >> > >> y; > >> x = z; > > > >Yeah. I presume it has to maintain the ordering, though. > > The scheduler for e.g. is free to reorder if it can prove there is > no dependence (or indeed side-effects for y) between insns produced > for y and `x = z'. So for example, if y is independent of z, the compiler can do the following: char *x; x = z; y; But the dependency ordering is still maintained from z to x, so this is not a problem. Or am I missing something subtle here? Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 14:03 ` Paul E. McKenney @ 2015-05-20 14:15 ` Ramana Radhakrishnan 2015-05-20 15:12 ` Paul E. McKenney 2015-05-20 15:46 ` David Howells 0 siblings, 2 replies; 44+ messages in thread From: Ramana Radhakrishnan @ 2015-05-20 14:15 UTC (permalink / raw) To: paulmck Cc: David Howells, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Andrew Morton, Ingo Molnar, michaelw On 20/05/15 15:03, Paul E. McKenney wrote: > On Wed, May 20, 2015 at 02:44:30PM +0100, Ramana Radhakrishnan wrote: >> >> >> On 20/05/15 14:37, David Howells wrote: >>> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: >>> >>>> I was thinking of "y" as a simple variable, but if it is something more >>>> complex, then the compiler could do this, right? >>>> >>>> char *x; >>>> >>>> y; >>>> x = z; >>> >>> Yeah. I presume it has to maintain the ordering, though. >> >> The scheduler for e.g. is free to reorder if it can prove there is >> no dependence (or indeed side-effects for y) between insns produced >> for y and `x = z'. > > So for example, if y is independent of z, the compiler can do the > following: > > char *x; > > x = z; > y; > > But the dependency ordering is still maintained from z to x, so this > is not a problem. Well, reads if any of x (assuming x was initialized elsewhere) would need to happen before x got assigned to z. I understood the original "maintain the ordering" as between the statements `x = z' and `y'. > > Or am I missing something subtle here? No, it sounds like we are on the same page here. regards Ramana > > Thanx, Paul > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 14:15 ` Ramana Radhakrishnan @ 2015-05-20 15:12 ` Paul E. McKenney 2015-05-20 15:46 ` David Howells 1 sibling, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 15:12 UTC (permalink / raw) To: Ramana Radhakrishnan Cc: David Howells, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 03:15:48PM +0100, Ramana Radhakrishnan wrote: > > > On 20/05/15 15:03, Paul E. McKenney wrote: > >On Wed, May 20, 2015 at 02:44:30PM +0100, Ramana Radhakrishnan wrote: > >> > >> > >>On 20/05/15 14:37, David Howells wrote: > >>>Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > >>> > >>>>I was thinking of "y" as a simple variable, but if it is something more > >>>>complex, then the compiler could do this, right? > >>>> > >>>> char *x; > >>>> > >>>> y; > >>>> x = z; > >>> > >>>Yeah. I presume it has to maintain the ordering, though. > >> > >>The scheduler for e.g. is free to reorder if it can prove there is > >>no dependence (or indeed side-effects for y) between insns produced > >>for y and `x = z'. > > > >So for example, if y is independent of z, the compiler can do the > >following: > > > > char *x; > > > > x = z; > > y; > > > >But the dependency ordering is still maintained from z to x, so this > >is not a problem. > > > Well, reads if any of x (assuming x was initialized elsewhere) would > need to happen before x got assigned to z. Agreed, there needs to be a memory_order_consume load up there somewhere. (AKA rcu_dereference().) > I understood the original "maintain the ordering" as between the > statements `x = z' and `y'. Ah, I was assuming between x and z. David, what was your intent? ;-) > >Or am I missing something subtle here? > > No, it sounds like we are on the same page here. Whew! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 14:15 ` Ramana Radhakrishnan 2015-05-20 15:12 ` Paul E. McKenney @ 2015-05-20 15:46 ` David Howells 1 sibling, 0 replies; 44+ messages in thread From: David Howells @ 2015-05-20 15:46 UTC (permalink / raw) To: paulmck Cc: dhowells, Ramana Radhakrishnan, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Andrew Morton, Ingo Molnar, michaelw Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > Ah, I was assuming between x and z. David, what was your intent? ;-) Clarification. David ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1624] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 13:37 ` David Howells 2015-05-20 13:44 ` Ramana Radhakrishnan @ 2015-05-20 14:02 ` Paul E. McKenney 1 sibling, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 14:02 UTC (permalink / raw) To: c++std-parallel Cc: dhowells, Will Deacon, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Ramana Radhakrishnan, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 02:37:05PM +0100, David Howells wrote: > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > > > I was thinking of "y" as a simple variable, but if it is something more > > complex, then the compiler could do this, right? > > > > char *x; > > > > y; > > x = z; > > Yeah. I presume it has to maintain the ordering, though. Agreed. Unless of course y writes to x or some such. Given that there is already code in the Linux kernel relying on dependencies being carried through stores to local variables, this should not be a problem. Or am I missing something? Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 1:57 ` Linus Torvalds 2015-05-20 2:10 ` Linus Torvalds @ 2015-05-20 2:34 ` Paul E. McKenney 2015-05-20 7:34 ` [c++std-parallel-1614] " Jens Maurer 1 sibling, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 2:34 UTC (permalink / raw) To: Linus Torvalds Cc: Linux Kernel Mailing List, c++std-parallel, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Tue, May 19, 2015 at 06:57:02PM -0700, Linus Torvalds wrote: > On Tue, May 19, 2015 at 5:55 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf > > >From a very quick read-through, the restricted dependency chain in 7.9 > seems to be reasonable, and essentially covers "thats' what hardware > gives us anyway", making compiler writers happy. > > I would clarify the language somewhat: > > - it says that the result of a cast of a pointer is a dependency. You > need to make an exception for casting to bool, methinks (because > that's effectively just a test-against-NULL, which you later describe > as terminating the dependency). > > Maybe get rid of the "any type", and just limit it to casts to > types of size intptr_t, ie ones that don't drop significant bits. All > the other rules talk about [u]intptr_t anyway. Excellent point! I now say: If a pointer is part of a dependency chain, then casting it (either explicitly or implicitly) to any pointer-sized type extends the chain to the result. If this approach works out, the people in the Core Working Group will come up with alternative language-lawyer-proof wording, but this informal version will hopefully do for the moment. > - you clarify that the trivial "& 0" and "| ~0" kill the dependency > chain, but if you really want to be a stickler, you might want to > extend it to a few more cases. Things like "& 1" (to extract a tag > from the lot bit of a tagged pointer) should likely also drop the > dependency, since a compiler would commonly end up using the end > result as a conditional even if the code was written to then use > casting etc to look like a dereference. Ah, how about the following? If a value of type intptr_t or uintptr_t is part of a dependency chain, and if that value is one of the operands to an & or | infix operator whose result has too few or too many bits set, then the resulting value will not be part of any dependency chain. For example, on a 64-bit system, if p is part of a dependency chain, then (p & 0x7) provides just the tag bits, and normally cannot even be legally dereferenced. Similarly, (p | ~0) normally cannot be legally dereferenced. > - the "you can add/subtract integral values" still opens you up to > language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the > dependency, which it clearly doesn't. But language-lawyering it does, > since all those operations (cast to pointer, cast to integer, > subtracting an integer) claim to be dependency-preserving operations. My thought was that the result of "(char *)ptr - (intptr_t)ptr" is a NULL pointer in most environments, and dereferencing a NULL pointer is undefined behavior. So it becomes irrelevant whether or not the NULL pointer carries a dependency. There are some stranger examples, such as "(char *)ptr - ((intptr_t)ptr)/7", but in that case, if the resulting pointer happens by chance to reference valid memory, I believe a dependency would still be carried. Of course, if you are producing code like that, I am guessing that dependencies are the very least of your concerns. However, I will give this some more thought. > So I think you want to limit the logical operators to things that > don't mask off too many bits, and you should probably limit the > add/subtract operations some way (maybe specify that the integer value > you add/subtract cannot be related to the pointer). But I think > limiting it to mostly pointer ops (and a _few_ integer operations to > do offsets and remove tag bits) is otherwise a good approach. Glad you mostly like it! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1614] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 2:34 ` Paul E. McKenney @ 2015-05-20 7:34 ` Jens Maurer 2015-05-20 9:03 ` Richard Biener 2015-05-20 12:01 ` [c++std-parallel-1616] " Paul E. McKenney 0 siblings, 2 replies; 44+ messages in thread From: Jens Maurer @ 2015-05-20 7:34 UTC (permalink / raw) To: c++std-parallel Cc: Paul E. McKenney, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On 05/20/2015 04:34 AM, Paul E. McKenney wrote: > On Tue, May 19, 2015 at 06:57:02PM -0700, Linus Torvalds wrote: >> - the "you can add/subtract integral values" still opens you up to >> language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the >> dependency, which it clearly doesn't. But language-lawyering it does, >> since all those operations (cast to pointer, cast to integer, >> subtracting an integer) claim to be dependency-preserving operations. [...] > There are some stranger examples, such as "(char *)ptr - ((intptr_t)ptr)/7", > but in that case, if the resulting pointer happens by chance to reference > valid memory, I believe a dependency would still be carried. [...] >From a language lawyer standpoint, pointer arithmetic is only valid within an array. These examples seem to go beyond the bounds of the array and therefore have undefined behavior. C++ standard section 5.7 paragraph 4 "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." C99 and C11 identical phrasing in 6.5.6 paragraph 8 Jens ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1614] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 7:34 ` [c++std-parallel-1614] " Jens Maurer @ 2015-05-20 9:03 ` Richard Biener 2015-05-20 12:02 ` Paul E. McKenney 2015-05-20 12:01 ` [c++std-parallel-1616] " Paul E. McKenney 1 sibling, 1 reply; 44+ messages in thread From: Richard Biener @ 2015-05-20 9:03 UTC (permalink / raw) To: Jens Maurer Cc: c++std-parallel, Paul E. McKenney, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 9:34 AM, Jens Maurer <Jens.Maurer@gmx.net> wrote: > On 05/20/2015 04:34 AM, Paul E. McKenney wrote: >> On Tue, May 19, 2015 at 06:57:02PM -0700, Linus Torvalds wrote: > >>> - the "you can add/subtract integral values" still opens you up to >>> language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the >>> dependency, which it clearly doesn't. But language-lawyering it does, >>> since all those operations (cast to pointer, cast to integer, >>> subtracting an integer) claim to be dependency-preserving operations. > > [...] > >> There are some stranger examples, such as "(char *)ptr - ((intptr_t)ptr)/7", >> but in that case, if the resulting pointer happens by chance to reference >> valid memory, I believe a dependency would still be carried. > [...] > > From a language lawyer standpoint, pointer arithmetic is only valid > within an array. These examples seem to go beyond the bounds of the > array and therefore have undefined behavior. > > C++ standard section 5.7 paragraph 4 > "If both the pointer operand and the result point to elements of the > same array object, or one past the last element of the array object, > the evaluation shall not produce an overflow; otherwise, the behavior > is undefined." > > C99 and C11 > identical phrasing in 6.5.6 paragraph 8 Of course you can try to circumvent that by doing (char*)((intptr_t)ptr - (intptr_t)ptr + (intptr_t)ptr) (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 for extra fun). Which (IMHO) gets you into the standard language that only makes conversion of the exact same integer back to a pointer well-defined(?) Richard. > Jens ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1614] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 9:03 ` Richard Biener @ 2015-05-20 12:02 ` Paul E. McKenney 0 siblings, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 12:02 UTC (permalink / raw) To: Richard Biener Cc: Jens Maurer, c++std-parallel, Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 11:03:00AM +0200, Richard Biener wrote: > On Wed, May 20, 2015 at 9:34 AM, Jens Maurer <Jens.Maurer@gmx.net> wrote: > > On 05/20/2015 04:34 AM, Paul E. McKenney wrote: > >> On Tue, May 19, 2015 at 06:57:02PM -0700, Linus Torvalds wrote: > > > >>> - the "you can add/subtract integral values" still opens you up to > >>> language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the > >>> dependency, which it clearly doesn't. But language-lawyering it does, > >>> since all those operations (cast to pointer, cast to integer, > >>> subtracting an integer) claim to be dependency-preserving operations. > > > > [...] > > > >> There are some stranger examples, such as "(char *)ptr - ((intptr_t)ptr)/7", > >> but in that case, if the resulting pointer happens by chance to reference > >> valid memory, I believe a dependency would still be carried. > > [...] > > > > From a language lawyer standpoint, pointer arithmetic is only valid > > within an array. These examples seem to go beyond the bounds of the > > array and therefore have undefined behavior. > > > > C++ standard section 5.7 paragraph 4 > > "If both the pointer operand and the result point to elements of the > > same array object, or one past the last element of the array object, > > the evaluation shall not produce an overflow; otherwise, the behavior > > is undefined." > > > > C99 and C11 > > identical phrasing in 6.5.6 paragraph 8 > > Of course you can try to circumvent that by doing > (char*)((intptr_t)ptr - (intptr_t)ptr + (intptr_t)ptr) > (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 for extra fun). > > Which (IMHO) gets you into the standard language that only makes conversion of > the exact same integer back to a pointer well-defined(?) I am feeling good about leaving the restriction and calling out the two paragraphs in a footnote, then. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1616] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 7:34 ` [c++std-parallel-1614] " Jens Maurer 2015-05-20 9:03 ` Richard Biener @ 2015-05-20 12:01 ` Paul E. McKenney 1 sibling, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-20 12:01 UTC (permalink / raw) To: c++std-parallel Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch, gcc, p796231, mark.batty@cl.cam.ac.uk, Peter Zijlstra, Will Deacon, Ramana Radhakrishnan, David Howells, Andrew Morton, Ingo Molnar, michaelw On Wed, May 20, 2015 at 09:34:10AM +0200, Jens Maurer wrote: > On 05/20/2015 04:34 AM, Paul E. McKenney wrote: > > On Tue, May 19, 2015 at 06:57:02PM -0700, Linus Torvalds wrote: > > >> - the "you can add/subtract integral values" still opens you up to > >> language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the > >> dependency, which it clearly doesn't. But language-lawyering it does, > >> since all those operations (cast to pointer, cast to integer, > >> subtracting an integer) claim to be dependency-preserving operations. > > [...] > > > There are some stranger examples, such as "(char *)ptr - ((intptr_t)ptr)/7", > > but in that case, if the resulting pointer happens by chance to reference > > valid memory, I believe a dependency would still be carried. > [...] > > >From a language lawyer standpoint, pointer arithmetic is only valid > within an array. These examples seem to go beyond the bounds of the > array and therefore have undefined behavior. > > C++ standard section 5.7 paragraph 4 > "If both the pointer operand and the result point to elements of the > same array object, or one past the last element of the array object, > the evaluation shall not produce an overflow; otherwise, the behavior > is undefined." > > C99 and C11 > identical phrasing in 6.5.6 paragraph 8 Even better! I added a footnote calling out these two paragraphs. Thax, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1611] Compilers and RCU readers: Once more unto the breach! 2015-05-20 0:55 Compilers and RCU readers: Once more unto the breach! Paul E. McKenney 2015-05-20 1:57 ` Linus Torvalds @ 2015-05-26 17:08 ` Torvald Riegel 2015-05-27 1:41 ` [c++std-parallel-1651] " Paul E. McKenney 2015-07-14 0:44 ` Paul E. McKenney 2 siblings, 1 reply; 44+ messages in thread From: Torvald Riegel @ 2015-05-26 17:08 UTC (permalink / raw) To: c++std-parallel Cc: linux-kernel, linux-arch, gcc, Peter.Sewell, torvalds, Mark.Batty, peterz, will.deacon, Ramana.Radhakrishnan, dhowells, akpm, mingo, michaelw On Tue, 2015-05-19 at 17:55 -0700, Paul E. McKenney wrote: > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf I have been discussing Section 7.9 with Paul during last week. While I think that 7.9 helps narrow down the problem somewhat, I'm still concerned that it effectively requires compilers to either track dependencies or conservatively prevent optimizations like value speculation and specialization based on that. Neither is a good thing for a compiler. 7.9 adds requirements that dependency chains stop if the program itself informs the compiler about the value of something in the dependency chain (e.g., as shown in Figure 33). Also, if a correct program that does not have undefined behavior must use a particular value, this is also considered as "informing" the compiler about that value. For example: int arr[2]; int* x = foo.load(mo_consume); if (x > arr) // implies same object/array, so x is in arr[] int r1 = *x; // compiler knows x == arr + 1 The program, assuming no undefined behavior, first tells the compiler that x should be within arr, and then the comparison tells the compiler that x!=arr, so x==arr+1 must hold because there are just two elements in the array. Having these requirements is generally good, but we don't yet know how to specify this properly. For example, I suppose we'd need to also say that the compiler cannot assume to know anything about a value returned from an mo_consume load; otherwise, nothing prevents a compiler from using knowledge about the stores that the mo_consume load can read from (see Section 7.2). Also, a compiler is still required to not do value-speculation or optimizations based on that. For example, this program: void op(type *p) { foo /= p->a; bar = p->b; } void bar() { pointer = ppp.load(mo_consume); op(pointer); } ... must not be transformed into this program, even if the compiler knows that global_var->a == 1: void op(type *p) { /* unchanged */} void bar() { pointer = ppp.load(mo_consume); if (pointer != global_var) { op(pointer); else // specialization for global_var { // compiler knows global_var->a==1; // compiler uses global_var directly, inlines, optimizes: bar = global_var->b; } The compiler could optimize out the division if pointer==global_var but it must not access field b directly through global_var. This would be pretty awkwaard; the compiler may work based on an optimized expression in the specialization (ie, create code that assumes global_var instead of pointer) but it would still have to carry around and use the non-optimized expression. This wouldn't be as bad if it were easily constrained to code sequences that really need the dependencies. However, 7.9 does not effectively contain dependencies to only the code that really needs them, IMO. Unless the compiler proves otherwise, it has to assume that a load from a pointer carries a dependency. Proving that is often very hard because it requires points-to analysis; 7.9 restricts this to intra-thread analysis but that is still nontrivial. Michael Matz' had a similar concern (in terms of what it results in). Given that mo_consume is useful but a very specialized feature, I wouldn't be optimistic that 7.9 would actually be supported by many compilers. The trade-off between having to track dependencies or having to disallow optimizations is a bad one to make. The simple way out for a compiler would be to just emit mo_acquire instead of mo_consume and be done with all -- and this might be the most practical decision overall, or the default general-purpose implementation. At least I haven't heard any compiler implementer say that they think it's obviously worth implementing. I also don't think 7.9 is ready for ISO standardization yet (or any of the other alternatives mentioned in the paper). Standardizing a feature that we're not sure whether it will actually be implemented is not a good thing to do; it's too costly for all involved parties (compiler writers *and* users). IMO, the approach outlined in Section 7.7 is still the most promising contender in the long run. It avoid the perhaps more pervasive changes that a type-system-based approach as the one in Section 7.2 might result in, yet still informs the compiler where dependencies are actually used and which chain of expressions would be involved in that. Tracking is probably simplified, as dependencies are never open-ended and potentially leaking into various other regions of code. It seems easier to specify in a standard because we just need the programmer to annotate the intent and the rest is compiler QoI. It would require users to annotate their use of dependencies, but they don't need to follow further rules; performance tuning of the code so it actually makes use of dependencies is mostly a compiler QoI thing, and if the compiler can't maintain a dependency, it can issue warnings and thus make the tuning interactive for the user. Of course, finding a good way to reference the source of the dependency in the API (see the paper for some of the sub-issues) will be a challenge. But, currently, I'm more optimistic we can find a useful solution for this than finding a standardizable version of 7.9. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [c++std-parallel-1651] Re: Compilers and RCU readers: Once more unto the breach! 2015-05-26 17:08 ` [c++std-parallel-1611] " Torvald Riegel @ 2015-05-27 1:41 ` Paul E. McKenney 0 siblings, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-05-27 1:41 UTC (permalink / raw) To: c++std-parallel Cc: linux-kernel, linux-arch, gcc, Peter.Sewell, torvalds, Mark.Batty, peterz, will.deacon, Ramana.Radhakrishnan, dhowells, akpm, mingo, michaelw On Tue, May 26, 2015 at 07:08:35PM +0200, Torvald Riegel wrote: > On Tue, 2015-05-19 at 17:55 -0700, Paul E. McKenney wrote: > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf > > I have been discussing Section 7.9 with Paul during last week. > > While I think that 7.9 helps narrow down the problem somewhat, I'm still > concerned that it effectively requires compilers to either track > dependencies or conservatively prevent optimizations like value > speculation and specialization based on that. Neither is a good thing > for a compiler. I do believe that we can find some middle ground. > 7.9 adds requirements that dependency chains stop if the program itself > informs the compiler about the value of something in the dependency > chain (e.g., as shown in Figure 33). Also, if a correct program that > does not have undefined behavior must use a particular value, this is > also considered as "informing" the compiler about that value. For > example: > int arr[2]; > int* x = foo.load(mo_consume); > if (x > arr) // implies same object/array, so x is in arr[] > int r1 = *x; // compiler knows x == arr + 1 > The program, assuming no undefined behavior, first tells the compiler > that x should be within arr, and then the comparison tells the compiler > that x!=arr, so x==arr+1 must hold because there are just two elements > in the array. The updated version of Section 7.9 says that if undefined behavior allows the compiler to deduce the exact pointer value, as in the case you show above, the dependency chain is broken. > Having these requirements is generally good, but we don't yet know how > to specify this properly. For example, I suppose we'd need to also say > that the compiler cannot assume to know anything about a value returned > from an mo_consume load; otherwise, nothing prevents a compiler from > using knowledge about the stores that the mo_consume load can read from > (see Section 7.2). I expect that the Linux kernel's rcu_dereference() primitives would map to volatile memory_order_consume loads for this reason. > Also, a compiler is still required to not do value-speculation or > optimizations based on that. For example, this program: > > void op(type *p) > { > foo /= p->a; > bar = p->b; > } > void bar() > { > pointer = ppp.load(mo_consume); > op(pointer); > } > > ... must not be transformed into this program, even if the compiler > knows that global_var->a == 1: > > void op(type *p) { /* unchanged */} > void bar() > { > pointer = ppp.load(mo_consume); > if (pointer != global_var) { > op(pointer); > else // specialization for global_var > { > // compiler knows global_var->a==1; > // compiler uses global_var directly, inlines, optimizes: > bar = global_var->b; > } > > The compiler could optimize out the division if pointer==global_var but > it must not access field b directly through global_var. This would be > pretty awkwaard; the compiler may work based on an optimized expression > in the specialization (ie, create code that assumes global_var instead > of pointer) but it would still have to carry around and use the > non-optimized expression. Exactly how valuable is this sort of optimization in real code? And how likely is the compiler to actually be able to apply it? (I nevertheless will take another pass through the Linux kernel looking for global variables being added to RCU-protected linked structures. Putting a global into an RCU-protected structure seems more likely than is an RCU-protected pointer into a two-element array.) > This wouldn't be as bad if it were easily constrained to code sequences > that really need the dependencies. However, 7.9 does not effectively > contain dependencies to only the code that really needs them, IMO. > Unless the compiler proves otherwise, it has to assume that a load from > a pointer carries a dependency. Proving that is often very hard because > it requires points-to analysis; 7.9 restricts this to intra-thread > analysis but that is still nontrivial. > Michael Matz' had a similar concern (in terms of what it results in). Again, I will be looking through the Linux kernel for vulnerabilities to this sort of transformation. However, I am having a very hard time seeing how the compiler is going to know that much about the vast majority of the Linux-kernel use cases. The linked structures are allocated on the heap, not in arrays or in globals. > Given that mo_consume is useful but a very specialized feature, I > wouldn't be optimistic that 7.9 would actually be supported by many > compilers. The trade-off between having to track dependencies or having > to disallow optimizations is a bad one to make. The simple way out for > a compiler would be to just emit mo_acquire instead of mo_consume and be > done with all -- and this might be the most practical decision overall, > or the default general-purpose implementation. At least I haven't heard > any compiler implementer say that they think it's obviously worth > implementing. I believe that we need to balance the specialization of the memory_order_consume feature against the real-world usefulness of the optimizations. In addition, I believe that we can rule out some of the more ornate situations, which would allow the compiler to do the optimizations and allow the designated set of dependencies to be preserved. > I also don't think 7.9 is ready for ISO standardization yet (or any of > the other alternatives mentioned in the paper). Standardizing a feature > that we're not sure whether it will actually be implemented is not a > good thing to do; it's too costly for all involved parties (compiler > writers *and* users). Understood, but it seems premature to give up on 7.9. I believe that we are actually quite close. > IMO, the approach outlined in Section 7.7 is still the most promising > contender in the long run. It avoid the perhaps more pervasive changes > that a type-system-based approach as the one in Section 7.2 might result > in, yet still informs the compiler where dependencies are actually used > and which chain of expressions would be involved in that. Tracking is > probably simplified, as dependencies are never open-ended and > potentially leaking into various other regions of code. It seems easier > to specify in a standard because we just need the programmer to annotate > the intent and the rest is compiler QoI. It would require users to > annotate their use of dependencies, but they don't need to follow > further rules; performance tuning of the code so it actually makes use > of dependencies is mostly a compiler QoI thing, and if the compiler > can't maintain a dependency, it can issue warnings and thus make the > tuning interactive for the user. > > Of course, finding a good way to reference the source of the dependency > in the API (see the paper for some of the sub-issues) will be a > challenge. But, currently, I'm more optimistic we can find a useful > solution for this than finding a standardizable version of 7.9. As discussed before, I have no objection to exploring the other options, including the one in Section 7.7, but we really do need to be able to handle the common-case dependency chain without overdosing on syntactic sugar. Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-05-20 0:55 Compilers and RCU readers: Once more unto the breach! Paul E. McKenney 2015-05-20 1:57 ` Linus Torvalds 2015-05-26 17:08 ` [c++std-parallel-1611] " Torvald Riegel @ 2015-07-14 0:44 ` Paul E. McKenney 2015-09-22 17:00 ` Paul E. McKenney 2 siblings, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-07-14 0:44 UTC (permalink / raw) To: linux-kernel, c++std-parallel, linux-arch, gcc Cc: Peter.Sewell, torvalds, Mark.Batty, peterz, will.deacon, Ramana.Radhakrishnan, dhowells, akpm, mingo, michaelw On Tue, May 19, 2015 at 05:55:10PM -0700, Paul E. McKenney wrote: > Hello! > > Following up on last year's discussion (https://lwn.net/Articles/586838/, > https://lwn.net/Articles/588300/), I believe that we have a solution. If > I am wrong, I am sure you all will let me know, and in great detail. ;-) > > The key simplification is to "just say no" to RCU-protected array indexes: > https://lkml.org/lkml/2015/5/12/827, as was suggested by several people. > This simplification means that rcu_dereference (AKA memory_order_consume) > need only return pointers. This in ture avoids things like (x-x), > (x*0), and (x%1) because if "x" is a pointer, these expressions either > return non-pointers are compilation errors. With a very few exceptions, > dependency chains can lead -to- non-pointers, but cannot pass -through- > them. > > The result is that dependencies are carried only by operations for > which the compiler cannot easily optimize the away those dependencies, > these operations including simple assignment, integer offset (including > indexing), dereferencing, casts, passing as a function argument, return > values from functions and so on. A complete list with commentary starts > on page 28 of: > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf And an update is available here: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf Among other things, this update addresses the points about equality comparisons introduced by the compiler, and outlines some of the issues head-/tail-marked alternatives face with respect to abstraction. The updated "Restricted Dependency Chains" section starts on page 28. Thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Compilers and RCU readers: Once more unto the breach! 2015-07-14 0:44 ` Paul E. McKenney @ 2015-09-22 17:00 ` Paul E. McKenney [not found] ` <CAPUmR1aqV_cQWjE8qC9x2sfmW-1ocKKMtCgNbjZH0cJ-AO2WTg@mail.gmail.com> 0 siblings, 1 reply; 44+ messages in thread From: Paul E. McKenney @ 2015-09-22 17:00 UTC (permalink / raw) To: linux-kernel, c++std-parallel, linux-arch, gcc Cc: Peter.Sewell, torvalds, Mark.Batty, peterz, will.deacon, Ramana.Radhakrishnan, dhowells, akpm, mingo, michaelw On Mon, Jul 13, 2015 at 05:44:59PM -0700, Paul E. McKenney wrote: > On Tue, May 19, 2015 at 05:55:10PM -0700, Paul E. McKenney wrote: > > Hello! > > > > Following up on last year's discussion (https://lwn.net/Articles/586838/, > > https://lwn.net/Articles/588300/), I believe that we have a solution. If > > I am wrong, I am sure you all will let me know, and in great detail. ;-) > > > > The key simplification is to "just say no" to RCU-protected array indexes: > > https://lkml.org/lkml/2015/5/12/827, as was suggested by several people. > > This simplification means that rcu_dereference (AKA memory_order_consume) > > need only return pointers. This in ture avoids things like (x-x), > > (x*0), and (x%1) because if "x" is a pointer, these expressions either > > return non-pointers are compilation errors. With a very few exceptions, > > dependency chains can lead -to- non-pointers, but cannot pass -through- > > them. > > > > The result is that dependencies are carried only by operations for > > which the compiler cannot easily optimize the away those dependencies, > > these operations including simple assignment, integer offset (including > > indexing), dereferencing, casts, passing as a function argument, return > > values from functions and so on. A complete list with commentary starts > > on page 28 of: > > > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf > > And an update is available here: > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf > > Among other things, this update addresses the points about equality > comparisons introduced by the compiler, and outlines some of the > issues head-/tail-marked alternatives face with respect to abstraction. > The updated "Restricted Dependency Chains" section starts on page 28. > > Thoughts? This is updated based on many offline discussions. Section 7.9 has been updated accordingly, but still starts on page 28. This approach is again intended for large existing RCU code bases such as the Linux kernel. A new Section 7.10 starting on page 35 describes a possible longer-term solution for new-to-RCU code bases. This approach adds a storage class that indicates that the marked object carries dependencies. This allows better diagnostics (and arguably better code self-documentation) as well as providing something that formal-verification tools can handle while avoiding the need for compilers to trace dependency chains. http://www.rdrop.com/users/paulmck/submission/consume.2015.09.22a.pdf Thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <CAPUmR1aqV_cQWjE8qC9x2sfmW-1ocKKMtCgNbjZH0cJ-AO2WTg@mail.gmail.com>]
* Re: [c++std-parallel-2008] Re: Compilers and RCU readers: Once more unto the breach! [not found] ` <CAPUmR1aqV_cQWjE8qC9x2sfmW-1ocKKMtCgNbjZH0cJ-AO2WTg@mail.gmail.com> @ 2015-09-23 23:26 ` Paul E. McKenney 0 siblings, 0 replies; 44+ messages in thread From: Paul E. McKenney @ 2015-09-23 23:26 UTC (permalink / raw) To: c++std-parallel Cc: linux-kernel, linux-arch, gcc, p796231, torvalds, Mark Batty, peterz, will.deacon, Ramana.Radhakrishnan, dhowells, akpm, mingo, Michael Wong On Wed, Sep 23, 2015 at 03:30:34PM -0700, Hans Boehm wrote: > I'd really like this to converge. It's great that it tries to focus on a > couple of alternatives. But I'm not sure they're the right ones. > > I kind of like the 7.9 approach of restricting dependency chain to those > case in which no sane existing optimization break the dependency anyway. > The hope would then be that we can restrict future optimizations to not do > so either without too much cost. But I'm again very concerned as to > whether this can be specified acceptably. > > The paper discusses the problem with pointer comparisons that effectively > allow the compiler to remove a dependency by inferring the identity of a > pointer. I don't see how to describe the set of all such cases in a way > that's comprehensible and acceptable in a standard. > > Consider: > > p = x; > y = p -> a; > if (x == well_known_pointer) { > foo(well_known_pointer -> a); > z = true; > } else { > y = 2; > z = false; > } > > Clearly if z = true, the load of y should have depended on that of x. But > this should be transformed to: > > p = x; > if (x == well_known_pointer) { > foo(y = well_known_pointer -> a); > z = true; > } else { > y = 2; > z = false; > } > > which no longer has that property. And I suspect we can construct examples > in which the conditional is in a subsequent inlined function call, and > entirely invisible to the programmer. And there are probably issues with > compilers using more complicated chains of reasoning to infer that pointers > must be equal, e.g. because anything else would result in undefined > behavior. I'm once again being doubtful that we can pull off anything > along these lines. Indeed, the first paragraph of "Equality comparisons" on page 32 says "This dependency-chain breakage can back-propagate to earlier uses of the pointer". The "Narrowing magnitude comparisons" paragraph on that same page makes this same point for sequences of inequality comparisons that allow the compiler to deduce the exact value of the pointer. The comparisons currently in the Linux kernel are against compile-time initialized constant-value structures, in which case breaking the dependency chain is OK. I agree that the compiler could legally introduce a pointer comparison, as noted in the "Equality comparisons" section, for example, when using feedback-directed profiling. Do you see other reasons why the compiler would do this? > 7.10 looks like a good direction, but it seems to me that we need to say > more about handling expression temporaries and function results. > Presumably at least ordinary expression temporaries implicitly carry > dependencies? Thus if both x and y are marked with _Carries_dependency, > then > > y = ((x ^ x) & 0) * 0 > > carries a dependency from x to y? Yes, but a high-quality implementation would be capable of issuing a warning in this case. After all, the developer presumably used memory_order_consume to gain better performance, but in this case that added performance is reduced or perhaps even eliminated completely by the additional code required to maintain the dependency chain. The reason for carrying the dependency in this case is instead to make it easier on the people producing tooling. > I'm leaning again towards something similar to what we attempted to specify > in the current standard, but restricted to temporaries, maybe function > locals, and explicitly annotated locations. This would be some combination > of 7.10 and maybe 7.5. It has the down side that it will likely require a > bunch of explicit attributes (or some kind of C equivalent, though I'm not > sure storage class works well for function arguments and results). But it > should avoid the kill-dependency proliferation required by the current > standard. And we could start to actually clean up the remaining problems > with carries_dependency. The nice thing about having some sort of marking is that it helps in generation of good diagnostics. We also need dependency chains to pass into and out of functions, and this need will become more acute as link-time optimization becomes more mainstream, which can remove the function-call overhead from calls to external functions. We need the compiler to know for sure whether or not a given formal parameter or return value carries a dependency, so some sort of marking is required. In addition, where structures (as opposed to pointers/references to structures) are passed into and out of functions, we will very likely need some way to mark the fields of those structures that are expected to carry dependencies. (The Linux kernel tends to avoid this, which I like, but other projects are likely to make other choices, regardless of my opinions.) That marking could be the [[carries_dependency]] attribute (at least assuming that C adds attributes), a type modifier (which might or might not go over well in Core), a variable modifier (which I don't understand well enough to have an opinion), or a storage class as suggested in 7.10. I am not all that worried about exactly what sort of marking we use, but I am convinced that marking is required. So, given that we need to mark things, what sort of marking would work best from your perspective? Thanx, Paul > On Tue, Sep 22, 2015 at 10:00 AM, Paul E. McKenney < > paulmck@linux.vnet.ibm.com> wrote: > > > On Mon, Jul 13, 2015 at 05:44:59PM -0700, Paul E. McKenney wrote: > > > On Tue, May 19, 2015 at 05:55:10PM -0700, Paul E. McKenney wrote: > > > > Hello! > > > > > > > > Following up on last year's discussion ( > > https://lwn.net/Articles/586838/, > > > > https://lwn.net/Articles/588300/), I believe that we have a > > solution. If > > > > I am wrong, I am sure you all will let me know, and in great detail. > > ;-) > > > > > > > > The key simplification is to "just say no" to RCU-protected array > > indexes: > > > > https://lkml.org/lkml/2015/5/12/827, as was suggested by several > > people. > > > > This simplification means that rcu_dereference (AKA > > memory_order_consume) > > > > need only return pointers. This in ture avoids things like (x-x), > > > > (x*0), and (x%1) because if "x" is a pointer, these expressions either > > > > return non-pointers are compilation errors. With a very few > > exceptions, > > > > dependency chains can lead -to- non-pointers, but cannot pass -through- > > > > them. > > > > > > > > The result is that dependencies are carried only by operations for > > > > which the compiler cannot easily optimize the away those dependencies, > > > > these operations including simple assignment, integer offset (including > > > > indexing), dereferencing, casts, passing as a function argument, return > > > > values from functions and so on. A complete list with commentary > > starts > > > > on page 28 of: > > > > > > > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf > > > > > > And an update is available here: > > > > > > http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf > > > > > > Among other things, this update addresses the points about equality > > > comparisons introduced by the compiler, and outlines some of the > > > issues head-/tail-marked alternatives face with respect to abstraction. > > > The updated "Restricted Dependency Chains" section starts on page 28. > > > > > > Thoughts? > > > > This is updated based on many offline discussions. Section 7.9 has > > been updated accordingly, but still starts on page 28. This approach is > > again intended for large existing RCU code bases such as the Linux kernel. > > > > A new Section 7.10 starting on page 35 describes a possible longer-term > > solution for new-to-RCU code bases. This approach adds a storage class > > that indicates that the marked object carries dependencies. This allows > > better diagnostics (and arguably better code self-documentation) as well > > as providing something that formal-verification tools can handle while > > avoiding the need for compilers to trace dependency chains. > > > > http://www.rdrop.com/users/paulmck/submission/consume.2015.09.22a.pdf > > > > Thoughts? > > > > Thanx, Paul > > > > > > ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2015-09-23 23:26 UTC | newest] Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-05-20 0:55 Compilers and RCU readers: Once more unto the breach! Paul E. McKenney 2015-05-20 1:57 ` Linus Torvalds 2015-05-20 2:10 ` Linus Torvalds 2015-05-20 2:41 ` Paul E. McKenney 2015-05-20 11:47 ` Will Deacon 2015-05-20 12:15 ` Paul E. McKenney 2015-05-20 15:46 ` Will Deacon 2015-05-20 15:54 ` Andrew Haley 2015-05-20 18:16 ` [c++std-parallel-1632] " Paul E. McKenney 2015-05-21 14:22 ` Michael Matz 2015-05-21 15:10 ` Paul E. McKenney 2015-05-21 16:17 ` Michael Matz 2015-05-21 18:37 ` Paul E. McKenney 2015-05-20 18:16 ` Paul E. McKenney 2015-05-21 19:24 ` Will Deacon 2015-05-21 20:02 ` Paul E. McKenney 2015-05-21 20:42 ` Linus Torvalds 2015-05-21 22:02 ` Paul E. McKenney 2015-05-22 6:43 ` Ingo Molnar 2015-05-22 10:43 ` Richard Kenner 2015-05-22 13:11 ` Paul E. McKenney 2015-05-22 13:12 ` Paul E. McKenney 2015-05-26 17:37 ` [c++std-parallel-1641] " Torvald Riegel 2015-05-22 17:30 ` Will Deacon 2015-05-22 18:55 ` Paul E. McKenney 2015-05-20 13:18 ` David Howells 2015-05-20 13:30 ` Paul E. McKenney 2015-05-20 13:37 ` David Howells 2015-05-20 13:44 ` Ramana Radhakrishnan 2015-05-20 14:03 ` Paul E. McKenney 2015-05-20 14:15 ` Ramana Radhakrishnan 2015-05-20 15:12 ` Paul E. McKenney 2015-05-20 15:46 ` David Howells 2015-05-20 14:02 ` [c++std-parallel-1624] " Paul E. McKenney 2015-05-20 2:34 ` Paul E. McKenney 2015-05-20 7:34 ` [c++std-parallel-1614] " Jens Maurer 2015-05-20 9:03 ` Richard Biener 2015-05-20 12:02 ` Paul E. McKenney 2015-05-20 12:01 ` [c++std-parallel-1616] " Paul E. McKenney 2015-05-26 17:08 ` [c++std-parallel-1611] " Torvald Riegel 2015-05-27 1:41 ` [c++std-parallel-1651] " Paul E. McKenney 2015-07-14 0:44 ` Paul E. McKenney 2015-09-22 17:00 ` Paul E. McKenney [not found] ` <CAPUmR1aqV_cQWjE8qC9x2sfmW-1ocKKMtCgNbjZH0cJ-AO2WTg@mail.gmail.com> 2015-09-23 23:26 ` [c++std-parallel-2008] " Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).