linux-kernel.vger.kernel.org archive mirror
* Proposal for new memory_order_consume definition
@ 2016-02-18  1:10 Paul E. McKenney
  2016-02-20  2:15 ` [isocpp-parallel] " Tony V E
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2016-02-18  1:10 UTC (permalink / raw)
  To: parallel, linux-kernel, linux-arch, gcc, llvm-dev
  Cc: Peter.Sewell, torvalds, Mark.Batty, peterz, will.deacon,
	Ramana.Radhakrishnan, dhowells, akpm, mingo, michaelw, boehm,
	Jens.Maurer, luc.maranget, j.alglave

Hello!

A proposal (quaintly identified as P0190R0) for a new memory_order_consume
definition may be found here:

	http://www2.rdrop.com/users/paulmck/submission/consume.2016.02.10b.pdf

As requested at the October C++ Standards Committee meeting, this
is a follow-on to P0098R1 that picks one alternative and describes
it in detail.  This approach focuses on existing practice, with the
goal of supporting existing code with existing compilers.  In the last
clang/LLVM patch I saw for basic support of this change, you could count
the changed lines and still have lots of fingers and toes left over.
Those who have been following this story will recognize that this is
a very happy contrast to the work that would be required to implement the
definition in the current standard.

I expect that P0190R0 will be discussed at the upcoming C++ Standards
Committee meeting taking place the week of February 29th.  Points of
discussion are likely to include:

o	May memory_order_consume dependency ordering be used in
	unannotated code?  I believe that this must be the case,
	especially given that this is our experience base.  P0190R0
	therefore recommends this approach.

o	If memory_order_consume dependency ordering can be used in
	unannotated code, must implementations support annotation?
	I believe that annotation support should be required, at the very
	least for formal verification, which can be quite difficult to
	carry out on unannotated code.  In addition, it seems likely
	that annotations can enable much better diagnostics.  P0190R0
	therefore recommends this approach.

o	If implementations must support annotation, what form should that
	annotation take?  P0190R0 recommends the [[carries_dependency]]
	attribute, but I am not picky as long as it can be (1) applied
	to all relevant pointer-like objects and (2) used in C as well
	as C++.  ;-)

o	If memory_order_consume dependency ordering can be used in
	unannotated code, how best to define the situations where
	the compiler can determine the exact value of the pointer in
	question?  (In current de facto implementations, this can
	defeat dependency ordering.  Interestingly enough, this case
	is not present in the Linux kernel, but still needs to be
	defined.)

	Options include:
	
	o	Provide new intrinsics that carry out the
		comparisons, but guarantee to preserve dependencies,
		as recommended by P0190R0 (std::pointer_cmp_eq_dep(),
		std::pointer_cmp_ne_dep(), std::pointer_cmp_gt_dep(),
		std::pointer_cmp_ge_dep(), std::pointer_cmp_lt_dep(),
		and std::pointer_cmp_le_dep()).

	o	State that -any- comparison involving an unannotated
		pointer loses the dependency.

o	How is the common idiom of marking pointers by setting low-order
	bits to be supported when those pointers carry dependencies?
	At the moment, I believe that setting bits in pointers results in
	undefined behavior even without dependency ordering, so P0190R0
	kicks this particular can down the road.  One option that
	has been suggested is to provide intrinsics for this purpose.
	(Sorry, but I forget who suggested this.)
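
To make the low-order-bit idiom concrete, here is a minimal C++ sketch.
The names Node, MarkedPtr, mark(), is_marked(), and strip() are
hypothetical and do not come from P0190R0.  The sketch keeps the marked
value in a std::uintptr_t rather than in a pointer, which avoids forming
a misaligned pointer value but relies on the implementation-defined
mapping between pointers and integers.  It says nothing about whether a
dependency survives the round trip, which is the open question above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node type, used only for illustration.
struct Node {
    int value;
};

// The lowest address bit is free only if Node is at least 2-byte aligned.
static_assert(alignof(Node) >= 2, "need a spare low-order bit");

using MarkedPtr = std::uintptr_t;        // pointer bits plus a mark in bit 0
constexpr MarkedPtr kMarkBit = 1;

// Converts a pointer to its integer form and sets the mark bit.
inline MarkedPtr mark(Node* p) {
    auto bits = reinterpret_cast<MarkedPtr>(p);
    assert((bits & kMarkBit) == 0);      // expects an aligned pointer
    return bits | kMarkBit;
}

inline bool is_marked(MarkedPtr m) {
    return (m & kMarkBit) != 0;
}

// Clears the mark and converts back to a pointer that may be dereferenced.
inline Node* strip(MarkedPtr m) {
    return reinterpret_cast<Node*>(m & ~kMarkBit);
}
```

A caller would test is_marked() first and dereference only the result
of strip(), never the marked value itself.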

Thoughts?

							Thanx, Paul


* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-18  1:10 Proposal for new memory_order_consume definition Paul E. McKenney
@ 2016-02-20  2:15 ` Tony V E
  2016-02-20 19:53   ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Tony V E @ 2016-02-20  2:15 UTC (permalink / raw)
  To: Paul E. McKenney, parallel, linux-kernel, linux-arch, gcc, llvm-dev
  Cc: peterz, j.alglave, will.deacon, dhowells, Ramana.Radhakrishnan,
	luc.maranget, akpm, Peter.Sewell, torvalds, mingo

There's at least one easy answer in there:

> If implementations must support annotation, what form should that
> annotation take?  P0190R0 recommends the [[carries_dependency]]
> attribute, but I am not picky as long as it can be (1) applied
> to all relevant pointer-like objects and (2) used in C as well
> as C++.  ;-)

If an implementation must support it, then it is not an annotation but a keyword. So no [[]]




* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-20  2:15 ` [isocpp-parallel] " Tony V E
@ 2016-02-20 19:53   ` Paul E. McKenney
       [not found]     ` <CAPUmR1bw=N4NkjAK1zn_X0+84KEaEAM6HZCHZJy_txqC9hMgSg@mail.gmail.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2016-02-20 19:53 UTC (permalink / raw)
  To: parallel
  Cc: linux-kernel, linux-arch, gcc, llvm-dev, Peter.Sewell, peterz,
	will.deacon, dhowells, Ramana.Radhakrishnan, luc.maranget, akpm,
	j.alglave, torvalds, mingo

On Fri, Feb 19, 2016 at 09:15:16PM -0500, Tony V E wrote:
> There's at least one easy answer in there:
> 
> > If implementations must support annotation, what form should that
> > annotation take?  P0190R0 recommends the [[carries_dependency]]
> > attribute, but I am not picky as long as it can be (1) applied
> > to all relevant pointer-like objects and (2) used in C as well
> > as C++.  ;-)
> 
> If an implementation must support it, then it is not an annotation but a keyword. So no [[]]

I would be good with that approach, especially if WG14 continues
to stay away from annotations.

For whatever it is worth, the introduction of intrinsics for comparisons
that avoid breaking dependencies enables the annotation to remain
optional.

							Thanx, Paul


* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
       [not found]     ` <CAPUmR1bw=N4NkjAK1zn_X0+84KEaEAM6HZCHZJy_txqC9hMgSg@mail.gmail.com>
@ 2016-02-26 23:56       ` Lawrence Crowl
  2016-02-27 17:06       ` Paul E. McKenney
  1 sibling, 0 replies; 18+ messages in thread
From: Lawrence Crowl @ 2016-02-26 23:56 UTC (permalink / raw)
  To: parallel
  Cc: linux-arch, gcc, p796231, llvm-dev, will.deacon, linux-kernel,
	dhowells, peterz, Ramana.Radhakrishnan, Luc Maranget, akpm,
	Jade Alglave, torvalds, mingo

On 2/25/16, Hans Boehm <boehm@acm.org> wrote:
> If carries_dependency affects semantics, then it should not be an
> attribute.
>
> The original design, or at least my understanding of it, was that it not
> have semantics; it was only a suggestion to the compiler that it should
> preserve dependencies instead of inserting a fence at the call site.
> Dependency-based ordering would be preserved in either case.

Yes, but there is a performance penalty, though I do not know how severe.
When do the pragmatics become sufficiently severe that they become
semantics?
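
For concreteness, the two strategies being compared can be sketched as
follows.  All names here are hypothetical.  The consume version asks
for ordering through the data dependency alone.  The acquire version is
what an implementation effectively provides when it conservatively
strengthens consume (current GCC and Clang are understood to do this).
On weakly ordered hardware the acquire form typically costs a barrier
or a special load instruction, which is the penalty in question.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical message type and publication slot, for illustration only.
struct Message {
    int payload;
};

std::atomic<Message*> g_msg{nullptr};

// Writer: fill in the message, then publish the pointer.
inline void send(Message* m, int payload) {
    assert(m != nullptr);
    m->payload = payload;
    g_msg.store(m, std::memory_order_release);
}

// Reader relying on dependency ordering: the load of m->payload is
// ordered after the pointer load because its address depends on it.
inline int receive_consume() {
    Message* m = g_msg.load(std::memory_order_consume);
    return m ? m->payload : -1;
}

// Reader relying on acquire ordering: every later access is ordered
// after the pointer load, dependent or not.
inline int receive_acquire() {
    Message* m = g_msg.load(std::memory_order_acquire);
    return m ? m->payload : -1;
}
```

Both readers return -1 until a message has been published and the
payload afterwards; they differ only in how much ordering they request.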

> But I think we're moving away from that view towards something that doesn't
> quietly add fences.
>
> I do not think we can quite get away with defining a dependency in a way
> that is unconditionally preserved by existing compilers, and thus I think
> that we do probably need annotations along the dependency path.  I just
> don't see a way to otherwise deal with the case in which a compiler infers
> an equivalent pointer and dereferences that instead of the original.  This
> can happen under so many (unlikely but) hard-to-define conditions that it
> seems undefinable in an implementation-independent manner. "If the
> implementation is able then <the semantics change>" is, in my opinion, not
> acceptable standards text.
>
> Thus I see no way to both avoid adding syntax to functions that preserve
> dependencies and continue to allow existing transformations that remove
> dependencies we care about, e.g. due to equality comparisons.  We can
> hopefully ensure that without annotations compilers break things with very
> low probability, so that there is a reasonable path forward for existing
> code relying on dependency ordering (which currently also breaks with very
> low probability unless you understand what the compiler is doing).  But I
> don't see a way for the standard to guarantee correctness without the added
> syntax (or added optimization constraints that effectively assume all
> functions were annotated).


-- 
Lawrence Crowl


* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
       [not found]     ` <CAPUmR1bw=N4NkjAK1zn_X0+84KEaEAM6HZCHZJy_txqC9hMgSg@mail.gmail.com>
  2016-02-26 23:56       ` Lawrence Crowl
@ 2016-02-27 17:06       ` Paul E. McKenney
       [not found]         ` <CA+55aFyHmykKc=YybJMo9ZUO352MY5noJVB4-K1Lkjmw4UHXfA@mail.gmail.com>
  2016-02-29 18:17         ` Michael Matz
  1 sibling, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2016-02-27 17:06 UTC (permalink / raw)
  To: parallel
  Cc: linux-arch, gcc, p796231, llvm-dev, will.deacon, linux-kernel,
	dhowells, peterz, Ramana.Radhakrishnan, Luc Maranget, akpm,
	Jade Alglave, torvalds, mingo

On Thu, Feb 25, 2016 at 04:46:50PM -0800, Hans Boehm wrote:
> If carries_dependency affects semantics, then it should not be an attribute.

I am not picky about the form of the marking.

> The original design, or at least my understanding of it, was that it not
> have semantics; it was only a suggestion to the compiler that it should
> preserve dependencies instead of inserting a fence at the call site.
> Dependency-based ordering would be preserved in either case.  But I think
> we're moving away from that view towards something that doesn't quietly add
> fences.

Yes, we do need to allow typical implementations to avoid quiet fence
addition.

> I do not think we can quite get away with defining a dependency in a way
> that is unconditionally preserved by existing compilers, and thus I think
> that we do probably need annotations along the dependency path.  I just
> don't see a way to otherwise deal with the case in which a compiler infers
> an equivalent pointer and dereferences that instead of the original.  This
> can happen under so many (unlikely but) hard-to-define conditions that it
> seems undefinable in an implementation-independent manner. "If the
> implementation is able then <the semantics change>" is, in my opinion, not
> acceptable standards text.

Hmmm...

But we do already have something very similar with signed integer
overflow.  If the compiler can see a way to generate faster code that
does not handle the overflow case, then the semantics suddenly change
from two's-complement arithmetic to something very strange.  The standard
does not specify all the ways that the implementation might deduce that
faster code can be generated by ignoring the overflow case; it instead
simply says that signed integer overflow invokes undefined behavior.

And if that is a problem, you use unsigned integers instead of signed
integers.
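
A small sketch of that escape hatch, with hypothetical helper names:
unsigned addition is defined to wrap, so it can be used when wraparound
is wanted, and a signed addition can be guarded so that it never reaches
the undefined case.

```cpp
#include <cassert>
#include <limits>

// Unsigned arithmetic wraps modulo 2^N by definition, so this never
// invokes undefined behavior.
inline unsigned wrapping_add(unsigned a, unsigned b) {
    return a + b;
}

// Signed addition that refuses to overflow.  Returns false and leaves
// *out untouched when a + b would not fit in an int.
inline bool checked_add(int a, int b, int* out) {
    assert(out != nullptr);
    if ((b > 0 && a > std::numeric_limits<int>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int>::min() - b)) {
        return false;  // a + b would overflow
    }
    *out = a + b;
    return true;
}
```

The range test runs before the addition, so the overflowing expression
is never evaluated.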

So it seems that we should be able to do something very similar here.
If you don't use marking, and the compiler deduces that a given pointer
that carries a given dependency is equal to some other pointer not
carrying that same dependency, there is no dependency ordering.  And,
just as with the signed-integer-overflow case, if that is a problem for
you, you can mark the pointers that you intend to carry dependencies.

In both the signed-integer-overflow and pointer-value-deduction cases,
most use cases don't need to care.  In the integer case, this is because
most use cases have small integer values that don't overflow.  In the
pointer case, this is because when the data structure is composed of
lots of heap-allocated data items, the compiler really cannot deduce
anything.

Other safe pointer use cases involve statically allocated data items
whose contents are compile-time constants (thus avoiding the need for
any sort of ordering) and sentinel data items (as in the Linux kernel's
circular linked lists) where there is no dereferencing.
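
The sentinel case can be sketched as below.  The types and functions
are hypothetical and only loosely modeled on a circular list with a
sentinel head; this is not the kernel's list API.  The sketch is
single-threaded, and a concurrent reader would additionally load each
next pointer with dependency ordering.

```cpp
#include <cassert>

// Hypothetical link and item types, for illustration only.
struct Link {
    Link* next;
    Link* prev;
};

struct Item {
    Link link;   // first member, so a Link* to it can be cast back to Item*
    int value;
};

// An empty list is a sentinel that points at itself.
inline void list_init(Link* head) {
    head->next = head;
    head->prev = head;
}

inline void list_add_tail(Link* head, Link* node) {
    assert(head != nullptr && node != nullptr);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

// Sums all items.  Once p compares equal to the sentinel the loop
// exits, so the one pointer whose value the compiler could deduce from
// the comparison is never dereferenced afterwards.
inline int list_sum(const Link* head) {
    int sum = 0;
    for (const Link* p = head->next; p != head; p = p->next) {
        sum += reinterpret_cast<const Item*>(p)->value;
    }
    return sum;
}
```

The not-equal outcome of the comparison tells the compiler nothing
about the exact value of p, so the data loads inside the loop body
still go through the traversed pointer.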

> Thus I see no way to both avoid adding syntax to functions that preserve
> dependencies and continue to allow existing transformations that remove
> dependencies we care about, e.g. due to equality comparisons.  We can
> hopefully ensure that without annotations compilers break things with very
> low probability, so that there is a reasonable path forward for existing
> code relying on dependency ordering (which currently also breaks with very
> low probability unless you understand what the compiler is doing).  But I
> don't see a way for the standard to guarantee correctness without the added
> syntax (or added optimization constraints that effectively assume all
> functions were annotated).

Your second sentence ("We can hopefully ensure...") does give me hope
that we might be able to reach agreement.  The intent of P0190R0 is
to define a subset of operations where dependencies will be carried.
Note that P0190R0 does call out comparisons as potentially unsafe.
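
The comparison hazard can be sketched as follows.  The names are
hypothetical, and the exact set of dependency-carrying operations is
whatever P0190R0 specifies rather than what this example happens to
use.  The first reader only dereferences the loaded pointer.  The
second compares it against a statically allocated object first, which
lets the compiler substitute the known address inside the branch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical types and names, for illustration only.
struct Node {
    int value;
};

Node g_default{0};                        // statically allocated, updated at run time
std::atomic<Node*> g_current{&g_default};

// Writer: initialize the node, then publish it.
inline void publish(Node* n, int v) {
    assert(n != nullptr);
    n->value = v;
    g_current.store(n, std::memory_order_release);
}

// Reader that only dereferences the consumed pointer.
inline int read_plain() {
    Node* p = g_current.load(std::memory_order_consume);
    return p->value;  // address depends on the loaded value of p
}

// Reader that compares first.  Inside the branch the compiler knows
// that p == &g_default and may load g_default.value directly, in which
// case the address no longer depends on the consume load.
inline int read_after_compare() {
    Node* p = g_current.load(std::memory_order_consume);
    if (p == &g_default) {
        return p->value;  // may be compiled as g_default.value
    }
    return p->value;
}
```

Both readers return the same values when run single-threaded.  The
difference matters only where the consume load is implemented purely by
hardware dependency ordering and another thread is updating g_default.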

							Thanx, Paul

> On Sat, Feb 20, 2016 at 11:53 AM, Paul E. McKenney <
> paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Fri, Feb 19, 2016 at 09:15:16PM -0500, Tony V E wrote:
> > > There's at least one easy answer in there:
> > >
> > > > ‎If implementations must support annotation, what form should that
> > >       annotation take?  P0190R0 recommends the [[carries_dependency]]
> > >       attribute, but I am not picky as long as it can be (1) applied
> > >       to all relevant pointer-like objects and (2) used in C as well
> > >       as C++.  ;-)
> > >
> > > If an implementation must support it, then it is not an annotation but a
> > keyword. So no [[]]
> >
> > I would be good with that approach, especially if the WG14 continues
> > to stay away from annotations.
> >
> > For whatever it is worth, the introduction of intrinsics for comparisons
> > that avoid breaking dependencies enables the annotation to remain
> > optional.
> >
> >                                                         Thanx, Paul
> >
> > > Sent from my BlackBerry portable Babbage Device
> > >   Original Message
> > > From: Paul E. McKenney
> > > Sent: Thursday, February 18, 2016 4:58 AM
> > > To: parallel@lists.isocpp.org; linux-kernel@vger.kernel.org;
> > linux-arch@vger.kernel.org; gcc@gcc.gnu.org; llvm-dev@lists.llvm.org
> > > Reply To: parallel@lists.isocpp.org
> > > Cc: peterz@infradead.org; j.alglave@ucl.ac.uk; will.deacon@arm.com;
> > dhowells@redhat.com; Ramana.Radhakrishnan@arm.com; luc.maranget@inria.fr;
> > akpm@linux-foundation.org; Peter.Sewell@cl.cam.ac.uk;
> > torvalds@linux-foundation.org; mingo@kernel.org
> > > Subject: [isocpp-parallel] Proposal for new memory_order_consume
> > definition
> > >
> > > Hello!
> > >
> > > A proposal (quaintly identified as P0190R0) for a new
> > memory_order_consume
> > > definition may be found here:
> > >
> > > http://www2.rdrop.com/users/paulmck/submission/consume.2016.02.10b.pdf
> > >
> > > As requested at the October C++ Standards Committee meeting, this
> > > is a follow-on to P0098R1 that picks one alternative and describes
> > > it in detail. This approach focuses on existing practice, with the
> > > goal of supporting existing code with existing compilers. In the last
> > > clang/LLVM patch I saw for basic support of this change, you could count
> > > the changed lines and still have lots of fingers and toes left over.
> > > Those who have been following this story will recognize that this is
> > > a very happy contrast to work that would be required to implement the
> > > definition in the current standard.
> > >
> > > I expect that P0190R0 will be discussed at the upcoming C++ Standards
> > > Committee meeting taking place the week of February 29th. Points of
> > > discussion are likely to include:
> > >
> > > o     May memory_order_consume dependency ordering be used in
> > > unannotated code? I believe that this must be the case,
> > > especially given that this is our experience base. P0190R0
> > > therefore recommends this approach.
> > >
> > > o     If memory_order_consume dependency ordering can be used in
> > > unannotated code, must implementations support annotation?
> > > I believe that annotation support should be required, at the very
> > > least for formal verification, which can be quite difficult to
> > > carry out on unannotated code. In addition, it seems likely
> > > that annotations can enable much better diagnostics. P0190R0
> > > therefore recommends this approach.
> > >
> > > o     If implementations must support annotation, what form should that
> > > annotation take? P0190R0 recommends the [[carries_dependency]]
> > > attribute, but I am not picky as long as it can be (1) applied
> > > to all relevant pointer-like objects and (2) used in C as well
> > > as C++. ;-)
> > >
> > > o     If memory_order_consume dependency ordering can be used in
> > > unannotated code, how best to define the situations where
> > > the compiler can determine the exact value of the pointer in
> > > question? (In current defacto implementations, this can
> > > defeat dependency ordering. Interestingly enough, this case
> > > is not present in the Linux kernel, but still needs to be
> > > defined.)
> > >
> > > Options include:
> > >
> > > o     Provide new intrinsics that carry out the
> > > comparisons, but guarantee to preserve dependencies,
> > > as recommended by P0190R0 (std::pointer_cmp_eq_dep(),
> > > std::pointer_cmp_ne_dep(), std::pointer_cmp_gt_dep(),
> > > std::pointer_cmp_ge_dep(), std::pointer_cmp_lt_dep(),
> > > and std::pointer_cmp_le_dep()).
> > >
> > > o     State that -any- comparison involving an unannotated
> > > pointer loses the dependency.
> > >
> > > o     How is the common idiom of marking pointers by setting low-order
> > > bits to be supported when those pointers carry dependencies?
> > > At the moment, I believe that setting bits in pointers results in
> > > undefined behavior even without dependency ordering, so P0190R0
> > > kicks this particular can down the road. One option that
> > > has been suggested is to provide intrinsics for this purpose.
> > > (Sorry, but I forget who suggested this.)
> > >
> > > Thoughts?
> > >
> > > Thanx, Paul
> > >
> > > _______________________________________________
> > > Parallel mailing list
> > > Parallel@lists.isocpp.org
> > > Subscription: http://lists.isocpp.org/mailman/listinfo.cgi/parallel
> > > Link to this post: http://lists.isocpp.org/parallel/2016/02/0040.php
> >


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
       [not found]         ` <CA+55aFyHmykKc=YybJMo9ZUO352MY5noJVB4-K1Lkjmw4UHXfA@mail.gmail.com>
@ 2016-02-27 23:10           ` Paul E. McKenney
  2016-02-28  8:27             ` Markus Trippelsdorf
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2016-02-27 23:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Jade Alglave, p796231, gcc, llvm-dev, Luc Maranget,
	mingo, akpm, dhowells, linux-kernel, parallel, will.deacon,
	peterz, Ramana.Radhakrishnan

On Sat, Feb 27, 2016 at 11:16:51AM -0800, Linus Torvalds wrote:
> On Feb 27, 2016 09:06, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> wrote:
> >
> >
> > But we do already have something very similar with signed integer
> > overflow.  If the compiler can see a way to generate faster code that
> > does not handle the overflow case, then the semantics suddenly change
> > from twos-complement arithmetic to something very strange.  The standard
> > does not specify all the ways that the implementation might deduce that
> > faster code can be generated by ignoring the overflow case; it instead
> > simply says that signed integer overflow invokes undefined behavior.
> >
> > And if that is a problem, you use unsigned integers instead of signed
> > integers.
> 
> Actually, in the case of the Linux kernel we just tell the compiler to
> not be an ass. We use
> 
>   -fno-strict-overflow

That is the one!

> or something. I forget the exact compiler flag needed for "the standard is
> a broken piece of shit and made things undefined for very bad reasons".
> 
> See also the idiotic standard C alias rules. Same deal.

For which we use -fno-strict-aliasing.

> So no, standards aren't that important. When the standards screw up, the
> right answer is not to turn the other cheek.

Agreed, hence my current (perhaps quixotic and insane) attempt to get
the standard to do something useful for dependency ordering.  But if
that doesn't work, yes, a fallback position is to get the relevant
compilers to provide flags to avoid problematic behavior, similar to
-fno-strict-overflow.

							Thanx, Paul

> And undefined behavior is pretty much *always* a sign of "the standard is
> wrong".
> 
>      Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-27 23:10           ` Paul E. McKenney
@ 2016-02-28  8:27             ` Markus Trippelsdorf
  2016-02-28 16:13               ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Trippelsdorf @ 2016-02-28  8:27 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, linux-arch, gcc, parallel, llvm-dev, will.deacon,
	linux-kernel, dhowells, peterz, Ramana.Radhakrishnan,
	Luc Maranget, akpm, Jade Alglave, mingo

On 2016.02.27 at 15:10 -0800, Paul E. McKenney via llvm-dev wrote:
> On Sat, Feb 27, 2016 at 11:16:51AM -0800, Linus Torvalds wrote:
> > On Feb 27, 2016 09:06, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > wrote:
> > >
> > >
> > > But we do already have something very similar with signed integer
> > > overflow.  If the compiler can see a way to generate faster code that
> > > does not handle the overflow case, then the semantics suddenly change
> > > from twos-complement arithmetic to something very strange.  The standard
> > > does not specify all the ways that the implementation might deduce that
> > > faster code can be generated by ignoring the overflow case; it instead
> > > simply says that signed integer overflow invokes undefined behavior.
> > >
> > > And if that is a problem, you use unsigned integers instead of signed
> > > integers.
> > 
> > Actually, in the case of the Linux kernel we just tell the compiler to
> > not be an ass. We use
> > 
> >   -fno-strict-overflow
> 
> That is the one!
> 
> > or something. I forget the exact compiler flag needed for "the standard is
> > a broken piece of shit and made things undefined for very bad reasons".
> > 
> > See also the idiotic standard C alias rules. Same deal.
> 
> For which we use -fno-strict-aliasing.

Do not forget -fno-delete-null-pointer-checks. 

So the kernel obviously is already using its own C dialect, that is
pretty far from standard C.
All these options also have a negative impact on the performance of the
generated code.

-- 
Markus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-28  8:27             ` Markus Trippelsdorf
@ 2016-02-28 16:13               ` Linus Torvalds
  2016-02-28 16:50                 ` [llvm-dev] " cbergstrom
                                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Linus Torvalds @ 2016-02-28 16:13 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Paul McKenney, linux-arch, gcc, parallel, llvm-dev, Will Deacon,
	Linux Kernel Mailing List, David Howells, Peter Zijlstra,
	Ramana Radhakrishnan, Luc Maranget, Andrew Morton, Jade Alglave,
	Ingo Molnar

On Sun, Feb 28, 2016 at 12:27 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>> >
>> >   -fno-strict-overflow
>>
>>   -fno-strict-aliasing.
>
> Do not forget -fno-delete-null-pointer-checks.
>
> So the kernel obviously is already using its own C dialect, that is
> pretty far from standard C.
> All these options also have a negative impact on the performance of the
> generated code.

They really don't.

Have you ever seen code that cared about signed integer overflow?
Yeah, getting it right can make the compiler generate an extra ALU
instruction once in a blue moon, but trust me - you'll never notice.
You *will* notice when you suddenly have a crash or a security issue
due to bad code generation, though.
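
A minimal sketch of the kind of code at stake here (not kernel code; every function name below is made up for illustration). A check that inspects the wrapped result of a signed addition only works when the compiler keeps two's-complement semantics, for example under -fno-strict-overflow or -fwrapv, whereas a check on the operands alone is valid either way:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Fragile (hypothetical) check: it looks at the result of a + b, so the
 * overflow it is trying to detect has already happened.  In standard C
 * that is undefined behavior, and an optimizer may fold the test to
 * "false" unless wrapping semantics are requested.
 */
bool add_overflows_fragile(int a, int b)
{
    return b > 0 ? a + b < a : a + b > a;
}

/* Portable check: looks only at the operands, so nothing overflows. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* Debug-checked addition built on the portable test. */
int checked_add(int a, int b)
{
    assert(!add_overflows(a, b));
    return a + b;
}
```

The portable form costs a subtraction and a compare, which is roughly the "extra ALU instruction" being discussed.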

The idiotic C alias rules aren't even worth discussing. They were a
mistake. The kernel doesn't use some "C dialect pretty far from
standard C". Yeah, let's just say that the original C designers were
better at their job than a gaggle of standards people who were making
bad crap up to make some Fortran-style programs go faster.

They don't speed up normal code either, they just introduce undefined
behavior in a lot of code.
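
For illustration only, here is the usual shape of code that the alias rules make undefined, next to a form that is valid with or without -fno-strict-aliasing. The function names are hypothetical, and the sketch assumes a 32-bit IEEE 754 float:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/*
 * Reads a float object through a uint32_t lvalue.  This violates the
 * standard C aliasing rules, so it is only dependable when the compiler
 * is told not to exploit them (e.g. -fno-strict-aliasing).
 */
uint32_t float_bits_cast(const float *f)
{
    return *(const uint32_t *)f;
}

/* Same bits via memcpy: defined behavior regardless of compiler flags. */
uint32_t float_bits_memcpy(const float *f)
{
    uint32_t bits;

    memcpy(&bits, f, sizeof bits);
    return bits;
}
```

Compilers typically turn the fixed-size memcpy into a plain load, so the defined form need not be slower.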

And deleting NULL pointer checks because somebody made a mistake, and
then turning that small mistake into a real and exploitable security
hole? Not so smart either.
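
A sketch of the pattern in question, with hypothetical names (struct packet and both functions are invented for this example). In the first function the pointer is dereferenced before it is tested, so a compiler that assumes dereferenced pointers are non-NULL may drop the test unless given -fno-delete-null-pointer-checks:

```c
#include <stddef.h>

struct packet {
    int len;
};

/*
 * Small mistake: p is dereferenced before the NULL test.  Because the
 * dereference is undefined for a NULL pointer, the compiler may conclude
 * that p cannot be NULL here and delete the check that follows.
 */
int packet_len_buggy(const struct packet *p)
{
    int len = p->len;

    if (p == NULL)
        return -1;
    return len;
}

/* Corrected ordering: test first, dereference afterwards. */
int packet_len(const struct packet *p)
{
    if (p == NULL)
        return -1;
    return p->len;
}
```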

The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.

Performance doesn't come from occasional small and odd
micro-optimizations. I care about performance a lot, and I actually
look at generated code and do profiling etc. None of those three
options have *ever* shown up as issues. But the incorrect code they
generate? It has.

     Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [llvm-dev] [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-28 16:13               ` Linus Torvalds
@ 2016-02-28 16:50                 ` cbergstrom
  2016-02-29 17:37                 ` Michael Matz
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: cbergstrom @ 2016-02-28 16:50 UTC (permalink / raw)
  To: Linus Torvalds via llvm-dev, Markus Trippelsdorf
  Cc: linux-arch, gcc, Jade Alglave, parallel, llvm-dev, Will Deacon,
	Linux Kernel Mailing List, David Howells, Peter Zijlstra,
	Ramana Radhakrishnan, Luc Maranget, Andrew Morton, Paul McKenney,
	Ingo Molnar

Sometimes Linus says some really flippant and funny things, but gosh, I couldn't agree more... with one tiny nit.

Properly written Fortran with a good compiler is potentially as fast as or faster than the typical C version in HPC codes. (Yes, you may be able to get the C version faster, but it would take some effort.)

  Original Message  
From: Linus Torvalds via llvm-dev
Sent: Sunday, February 28, 2016 23:13
To: Markus Trippelsdorf
Reply To: Linus Torvalds
Cc: linux-arch@vger.kernel.org; gcc@gcc.gnu.org; Jade Alglave; parallel@lists.isocpp.org; llvm-dev@lists.llvm.org; Will Deacon; Linux Kernel Mailing List; David Howells; Peter Zijlstra; Ramana Radhakrishnan; Luc Maranget; Andrew Morton; Paul McKenney; Ingo Molnar
Subject: Re: [llvm-dev] [isocpp-parallel] Proposal for new memory_order_consume definition

On Sun, Feb 28, 2016 at 12:27 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>> >
>> > -fno-strict-overflow
>>
>> -fno-strict-aliasing.
>
> Do not forget -fno-delete-null-pointer-checks.
>
> So the kernel obviously is already using its own C dialect, that is
> pretty far from standard C.
> All these options also have a negative impact on the performance of the
> generated code.

They really don't.

Have you ever seen code that cared about signed integer overflow?
Yeah, getting it right can make the compiler generate an extra ALU
instruction once in a blue moon, but trust me - you'll never notice.
You *will* notice when you suddenly have a crash or a security issue
due to bad code generation, though.

The idiotic C alias rules aren't even worth discussing. They were a
mistake. The kernel doesn't use some "C dialect pretty far from
standard C". Yeah, let's just say that the original C designers were
better at their job than a gaggle of standards people who were making
bad crap up to make some Fortran-style programs go faster.

They don't speed up normal code either, they just introduce undefined
behavior in a lot of code.

And deleting NULL pointer checks because somebody made a mistake, and
then turning that small mistake into a real and exploitable security
hole? Not so smart either.

The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.

Performance doesn't come from occasional small and odd
micro-optimizations. I care about performance a lot, and I actually
look at generated code and do profiling etc. None of those three
options have *ever* shown up as issues. But the incorrect code they
generate? It has.

Linus
_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-28 16:13               ` Linus Torvalds
  2016-02-28 16:50                 ` [llvm-dev] " cbergstrom
@ 2016-02-29 17:37                 ` Michael Matz
  2016-02-29 17:57                   ` Linus Torvalds
  2016-02-29 19:38                 ` Lawrence Crowl
  2016-02-29 20:45                 ` Toon Moene
  3 siblings, 1 reply; 18+ messages in thread
From: Michael Matz @ 2016-02-29 17:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Markus Trippelsdorf, Paul McKenney, linux-arch, gcc, parallel,
	llvm-dev, Will Deacon, Linux Kernel Mailing List, David Howells,
	Peter Zijlstra, Ramana Radhakrishnan, Luc Maranget,
	Andrew Morton, Jade Alglave, Ingo Molnar

Hi,

On Sun, 28 Feb 2016, Linus Torvalds wrote:

> > So the kernel obviously is already using its own C dialect, that is 
> > pretty far from standard C. All these options also have a negative 
> > impact on the performance of the generated code.
> 
> They really don't.

They do.

> Have you ever seen code that cared about signed integer overflow?
> 
> Yeah, getting it right can make the compiler generate an extra ALU
> instruction once in a blue moon, but trust me - you'll never notice.
> You *will* notice when you suddenly have a crash or a security issue
> due to bad code generation, though.

No, that's not at all the important piece of making signed overflow 
undefined.  The important part is with induction variables controlling 
loops:

  short i;  for (i = start; i < end; i++)
vs.
  unsigned short u; for (u = start; u < end; u++)

For the former you're allowed to assume that the loop will terminate, and 
that its iteration count is easily computable.  For the latter you get 
modulo arithmetic and (if start/end are of larger type than u, say 'int') 
it might not even terminate at all.  That has direct consequences for the 
vectorizability of such loops (or the profitability of such a transformation) 
and hence quite important performance implications in practice.  Not for 
the kernel of course.  Now we can endlessly debate how (non)practical it 
is to write HPC code in C or C++, but there we are.
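
The unsigned case can be observed directly. The following is a sketch under stated assumptions (16-bit unsigned short, int at least 32 bits wide; both helper names are hypothetical): the loop is run with an iteration cap so that the non-terminating case returns instead of hanging.

```c
#include <assert.h>
#include <limits.h>

/*
 * Signed counter: since overflow of the counter would be undefined, a
 * compiler may treat the trip count of
 *     for (i = start; i < end; i++)
 * as simply end - start (or zero when start >= end).
 */
long long trip_count_signed(int start, int end)
{
    return end > start ? (long long)end - (long long)start : 0;
}

/*
 * Unsigned short counter compared against an int bound.  The counter
 * wraps from USHRT_MAX back to 0, so for end > USHRT_MAX the condition
 * never becomes false.  Returns the number of iterations, or -1 once
 * more than `cap` iterations have run.
 */
long long count_ushort_loop(int start, int end, long long cap)
{
    long long n = 0;
    unsigned short u;

    assert(start >= 0 && start <= USHRT_MAX);
    for (u = (unsigned short)start; u < end; u++)
        if (++n > cap)
            return -1;
    return n;
}
```

With these definitions, count_ushort_loop(0, 10, cap) runs ten iterations, while an end of 70000 exhausts any cap.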

> The fact is, undefined compiler behavior is never a good idea. Not for 
> serious projects.

Perhaps if these undefined behaviors hadn't been put into the standard, 
people wouldn't have written HPC code, and if that were so, the world would 
be a nicer place sometimes (certainly for the compiler).  Alas, it isn't.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-29 17:37                 ` Michael Matz
@ 2016-02-29 17:57                   ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2016-02-29 17:57 UTC (permalink / raw)
  To: Michael Matz
  Cc: Markus Trippelsdorf, Paul McKenney, linux-arch, gcc, parallel,
	llvm-dev, Will Deacon, Linux Kernel Mailing List, David Howells,
	Peter Zijlstra, Ramana Radhakrishnan, Luc Maranget,
	Andrew Morton, Jade Alglave, Ingo Molnar

On Mon, Feb 29, 2016 at 9:37 AM, Michael Matz <matz@suse.de> wrote:
>
> The important part is with induction variables controlling
> loops:
>
>   short i;  for (i = start; i < end; i++)
> vs.
>   unsigned short u; for (u = start; u < end; u++)
>
> For the former you're allowed to assume that the loop will terminate, and
> that its iteration count is easily computable.  For the latter you get
> modulo arithmetic and (if start/end are of larger type than u, say 'int')
> it might not even terminate at all.  That has direct consequences for the
> vectorizability of such loops (or the profitability of such a transformation)
> and hence quite important performance implications in practice.

Stop bullshitting me.

It would generally force the compiler to add a few extra checks when
you do vectorize (or, more generally, do any kind of loop unrolling),
and yes, it would make things slightly more painful. You might, for
example, need to add code to handle the wraparound and have a more
complex non-unrolled head/tail version for that case.

In theory you could do a whole "restart the unrolled loop around the
index wraparound" if you actually cared about the performance of such
a case - but since nobody would ever care about that, it's more likely
that you'd just do it with a non-unrolled fallback (which would likely
be identical to the tail fixup).

It would be painful, yes.

But it wouldn't be fundamentally hard, or hurt actual performance fundamentally.

It would be _inconvenient_ for compiler writers, and the bad ones
would argue vehemently against it.

.. and it's how a "go fast" mode would be implemented by a compiler
writer initially as a compiler option, for those HPC people. Then you
have a use case and implementation example, and can go to the
standards body and say "look, we have people who use this already, it
breaks almost no code, and it makes our compiler able to generate much
faster code".

Which is why the standard was written to be good for compiler writers,
not actual users.

Of course, in real life HPC performance is often more about doing the
cache blocking etc, and I've seen people move to more parameterized
languages rather than C to get best performance. Generate the code
from a much higher-level description, and be able to do a much better
job, and leave C to do the low-level job, and let people do the
important part.

But no. Instead the C compiler people still argue for bad features
that were a misdesign and a wart on the language.

At the very least it should have been left as a "go unsafe, go fast"
option, and standardize *that*, instead of screwing everybody else
over.

The HPC people end up often using those anyway, because it turns out
that they'll happily get rid of proper rounding etc if it buys them a
couple of percent on their workload.  Things like "I really want you
to generate multiply-accumulate instructions because I don't mind
having intermediates with higher precision" etc.

             Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-27 17:06       ` Paul E. McKenney
       [not found]         ` <CA+55aFyHmykKc=YybJMo9ZUO352MY5noJVB4-K1Lkjmw4UHXfA@mail.gmail.com>
@ 2016-02-29 18:17         ` Michael Matz
  2016-03-01  1:28           ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Matz @ 2016-02-29 18:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: parallel, linux-arch, gcc, p796231, llvm-dev, will.deacon,
	linux-kernel, dhowells, peterz, Ramana.Radhakrishnan,
	Luc Maranget, akpm, Jade Alglave, torvalds, mingo

Hi,

On Sat, 27 Feb 2016, Paul E. McKenney wrote:

> But we do already have something very similar with signed integer
> overflow.  If the compiler can see a way to generate faster code that
> does not handle the overflow case, then the semantics suddenly change
> from twos-complement arithmetic to something very strange.  The standard
> does not specify all the ways that the implementation might deduce that
> faster code can be generated by ignoring the overflow case; it instead
> simply says that signed integer overflow invokes undefined behavior.
> 
> And if that is a problem, you use unsigned integers instead of signed
> integers.
> 
> So it seems that we should be able to do something very similar here.

For this case the important piece of information to convey one or the other 
meaning in source code is the _type_ of involved entities, not annotations 
on the operations.  signed type -> undefined overflow, unsigned type -> 
modulo arithmetic; easy, and it nicely carries automatically through 
operation chains (and pointers) without any annotations.

I feel much of the complexity in the memory order specifications, also 
with your recent (much better) wording to explain dependency chains, would 
be much easier if the 'carries-dependency' would be encoded into the types 
of operands.  For purpose of example, let's call the marker "blaeh" (not 
atomic to not confuse with existing use :) ):

int foo;
blaeh int global;
int *somep;
blaeh int *blaehp;
f () {
  blaehp = &foo;  // might be okay, adds restrictions on accesses through 
                  // blaehp, but not through 'foo' directly
  blaehp = &global;
  if (somep == blaehp)
    {
      /* Even though the value is equal ... */
      ... *blaehp ... /* ... a compiler can't rewrite this into *somep */
    }
}

A "carries-dependency" on some operation (e.g. a call) would be added by 
using a properly typed pointer at those arguments (or return type) where 
it matters.  You can't give a blaeh pointer to something only accepting 
non-blaeh pointers (without cast).

Pointer addition and similar transformations involving a blaeh pointer and 
some integer would still give a blaeh pointer, and hence by default also 
solve the problem of cancellations.
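
Standard C has no way to add a new qualifier, but the type-checking half of this idea can be approximated today with a wrapper struct. The sketch below is only that: struct dep_int_ptr and the dep_*() helpers are hypothetical names, and nothing here makes the compiler preserve dependency ordering. It merely shows a marked pointer that cannot be passed where a plain int * is expected without an explicit conversion, and that stays marked under pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for "blaeh int *": a distinct type wrapping the raw pointer. */
struct dep_int_ptr {
    int *raw;
};

/* Pointer plus integer yields another marked pointer. */
struct dep_int_ptr dep_add(struct dep_int_ptr p, ptrdiff_t n)
{
    assert(p.raw != NULL);
    return (struct dep_int_ptr){ p.raw + n };
}

/* Dereference through the marked pointer. */
int dep_load(struct dep_int_ptr p)
{
    return *p.raw;
}

/* The explicit "cast": the only way to get an unmarked pointer back. */
int *dep_strip(struct dep_int_ptr p)
{
    return p.raw;
}

/* An ordinary function that accepts only unmarked pointers. */
int plain_load(const int *p)
{
    return *p;
}
```

Calling plain_load() with a struct dep_int_ptr argument is a compile-time error, which is the enforcement described above; plain_load(dep_strip(p)) is the deliberate escape hatch.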

Such marking via types would not solve all problems in an optimal way if 
you had two overlapping but independent dependency chains (all of them 
would collapse to one chain and hence be made dependent, which still is 
conservatively correct).

OTOH introducing new type qualifiers is a much larger undertaking, so I 
can understand one wants to avoid this.  I think it'd ultimately be 
clearer, though.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-28 16:13               ` Linus Torvalds
  2016-02-28 16:50                 ` [llvm-dev] " cbergstrom
  2016-02-29 17:37                 ` Michael Matz
@ 2016-02-29 19:38                 ` Lawrence Crowl
  2016-02-29 21:12                   ` [llvm-dev] " James Y Knight
  2016-02-29 20:45                 ` Toon Moene
  3 siblings, 1 reply; 18+ messages in thread
From: Lawrence Crowl @ 2016-02-29 19:38 UTC (permalink / raw)
  To: parallel
  Cc: Markus Trippelsdorf, linux-arch, gcc, Jade Alglave, llvm-dev,
	Will Deacon, Linux Kernel Mailing List, David Howells,
	Peter Zijlstra, Ramana Radhakrishnan, Luc Maranget,
	Andrew Morton, Ingo Molnar

On 2/28/16, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> The fact is, undefined compiler behavior is never a good idea. Not for
> serious projects.

Actually, undefined behavior is essential for serious projects, but
not for the reasons mentioned.

If the language has no undefined behavior, then from the compiler's view,
there is no such thing as a bad program.  All programs will compile and
enter functional debug (possibly after shipping to customer).  On the
other hand, a language with undefined behavior makes it possible for
compilers (and their run-time support) to identify a program as wrong.

The problem with the latest spate of compiler optimizations was not the
optimization, but the lack of warnings about exploiting undefined behavior.

-- 
Lawrence Crowl

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-28 16:13               ` Linus Torvalds
                                   ` (2 preceding siblings ...)
  2016-02-29 19:38                 ` Lawrence Crowl
@ 2016-02-29 20:45                 ` Toon Moene
  3 siblings, 0 replies; 18+ messages in thread
From: Toon Moene @ 2016-02-29 20:45 UTC (permalink / raw)
  To: Linus Torvalds, Markus Trippelsdorf
  Cc: Paul McKenney, linux-arch, gcc, parallel, llvm-dev, Will Deacon,
	Linux Kernel Mailing List, David Howells, Peter Zijlstra,
	Ramana Radhakrishnan, Luc Maranget, Andrew Morton, Jade Alglave,
	Ingo Molnar

On 02/28/2016 05:13 PM, Linus Torvalds wrote:

> Yeah, let's just say that the original C designers were
> better at their job than a gaggle of standards people who were making
> bad crap up to make some Fortran-style programs go faster.

The original C designers were defining a language that would make it 
easy to write operating systems in (and not having to rely on assembler).

I mislaid the quote where they said they first tried Fortran (and 
concluded it didn't fit their purpose).

BTW, Fortran was designed around floating point arithmetic (and its 
non-relation to the mathematical concept of the field of the reals).

It used integers only for counting and indexing arrays, so it had no 
purpose for "signed integers that overflowed". Therefore, to the Fortran 
standard, this was "undefined". It was literally "undefined" - as it was 
not described by the standard's text.

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [llvm-dev] [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-29 19:38                 ` Lawrence Crowl
@ 2016-02-29 21:12                   ` James Y Knight
  0 siblings, 0 replies; 18+ messages in thread
From: James Y Knight @ 2016-02-29 21:12 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: parallel, linux-arch, gcc, llvm-dev, Will Deacon,
	Linux Kernel Mailing List, David Howells, Peter Zijlstra,
	Ramana Radhakrishnan, Luc Maranget, Andrew Morton, Jade Alglave,
	Ingo Molnar, Markus Trippelsdorf

No, you really don't need undefined behavior in the standard in order
to enable bug-finding.

The standard could have made (and still could make) signed integer
overflow "implementation-defined" rather than "undefined". Compilers
would thus be required to have *some documented meaning* for it (e.g.
wrap 2's-complement, wrap 1's-complement, saturate to min/max, trap,
or whatever...), but must not have the current "Anything goes! I can
set your cat on fire if the optimizer feels like it today!" behavior.

Such a change to the standard would not reduce any ability to do error
checking, as compilers that want to be helpful could perfectly-well
define it to trap at runtime when given certain compiler flags, and
perfectly well warn you of your dependence upon unportable
implementation-defined behavior (or, that your program is going to
trap), at build-time.
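
To make the candidate "documented meanings" concrete, here is a sketch of three of them written as explicit helpers. It assumes GCC or Clang, whose __builtin_add_overflow() reports whether the mathematically exact sum fits and stores the wrapped result; the add_*() names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Policy 1: wrap, as in two's-complement arithmetic. */
int add_wrap(int a, int b)
{
    int r;

    __builtin_add_overflow(a, b, &r);   /* r holds the wrapped sum */
    return r;
}

/* Policy 2: saturate to INT_MAX or INT_MIN. */
int add_saturate(int a, int b)
{
    int r;

    if (__builtin_add_overflow(a, b, &r))
        return b > 0 ? INT_MAX : INT_MIN;
    return r;
}

/* Policy 3: trap (here, abort the program) on overflow. */
int add_trap(int a, int b)
{
    int r;

    if (__builtin_add_overflow(a, b, &r))
        abort();
    return r;
}
```

Each helper has fully defined behavior for all inputs, which is the property an "implementation-defined" rule would give plain `a + b`.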

[Sending again as a plain-text email, since a bunch of mailing lists
apparently hate on multipart messages that even contain a text/html
part...]

On Mon, Feb 29, 2016 at 2:38 PM, Lawrence Crowl via llvm-dev
<llvm-dev@lists.llvm.org> wrote:
> On 2/28/16, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> The fact is, undefined compiler behavior is never a good idea. Not for
>> serious projects.
>
> Actually, undefined behavior is essential for serious projects, but
> not for the reasons mentioned.
>
> If the language has no undefined behavior, then from the compiler's view,
> there is no such thing as a bad program.  All programs will compile and
> enter functional debug (possibly after shipping to customer).  On the
> other hand, a language with undefined behavior makes it possible for
> compilers (and their run-time support) to identify a program as wrong.
>
> The problem with the latest spate of compiler optimizations was not the
> optimization, but the lack of warnings about exploiting undefined behavior.
>
> --
> Lawrence Crowl

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [isocpp-parallel] Proposal for new memory_order_consume definition
  2016-02-29 18:17         ` Michael Matz
@ 2016-03-01  1:28           ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2016-03-01  1:28 UTC (permalink / raw)
  To: Michael Matz
  Cc: parallel, linux-arch, gcc, p796231, llvm-dev, will.deacon,
	linux-kernel, dhowells, peterz, Ramana.Radhakrishnan,
	Luc Maranget, akpm, Jade Alglave, torvalds, mingo

On Mon, Feb 29, 2016 at 07:17:55PM +0100, Michael Matz wrote:
> Hi,
> 
> On Sat, 27 Feb 2016, Paul E. McKenney wrote:
> 
> > But we do already have something very similar with signed integer
> > overflow.  If the compiler can see a way to generate faster code that
> > does not handle the overflow case, then the semantics suddenly change
> > from twos-complement arithmetic to something very strange.  The standard
> > does not specify all the ways that the implementation might deduce that
> > faster code can be generated by ignoring the overflow case; it instead
> > simply says that signed integer overflow invokes undefined behavior.
> > 
> > And if that is a problem, you use unsigned integers instead of signed
> > integers.
> > 
> > So it seems that we should be able to do something very similar here.
> 
> For this case the important piece of information to convey one or the other 
> meaning in source code is the _type_ of involved entities, not annotations 
> on the operations.  signed type -> undefined overflow, unsigned type -> 
> modulo arithmetic; easy, and it nicely carries automatically through 
> operation chains (and pointers) without any annotations.
> 
> I feel much of the complexity in the memory order specifications, also 
> with your recent (much better) wording to explain dependency chains, would 
> be much easier if the 'carries-dependency' would be encoded into the types 
> of operands.  For purpose of example, let's call the marker "blaeh" (not 
> atomic to not confuse with existing use :) ):
> 
> int foo;
> blaeh int global;
> int *somep;
> blaeh int *blaehp;
> f () {
>   blaehp = &foo;  // might be okay, adds restrictions on accesses through 
>                   // blaehp, but not through 'foo' directly
>   blaehp = &global;
>   if (somep == blaehp)
>     {
>       /* Even though the value is equal ... */
>       ... *blaehp ... /* ... a compiler can't rewrite this into *somep */
>     }
> }
> 
> A "carries-dependency" on some operation (e.g. a call) would be added by 
> using a properly typed pointer at those arguments (or return type) where 
> it matters.  You can't give a blaeh pointer to something only accepting 
> non-blaeh pointers (without cast).
> 
> Pointer addition and similar transformations involving a blaeh pointer and 
> some integer would still give a blaeh pointer, and hence by default also 
> solve the problem of cancellations.
> 
> Such marking via types would not solve all problems in an optimal way if 
> you had two overlapping but independent dependency chains (all of them 
> would collapse to one chain and hence be made dependent, which still is 
> conservatively correct).
> 
> OTOH introducing new type qualifiers is a much larger undertaking, so I 
> can understand one wants to avoid this.  I think it'd ultimately be 
> clearer, though.

As has been stated in this thread, we do need the unmarked variant.

For the marked variant, there are quite a few possible solutions with
varying advantages and disadvantages:

o	Attribute already exists, but is not carried by the type system.
	Could be enforced by external tools.

o	Storage class could be added with fewer effects on the type
	system, but the reaction to this suggestion in October was
	not all that positive.

o	Non-type keywords for objects have been suggested; this might be
	worth revisiting.

o	Adding to the type system allows type enforcement on the one
	hand, but makes it harder to write code that can be used for
	both RCU-protected and not-RCU-protected data structures.
	(This sort of thing is not uncommon in the Linux kernel.)

There are probably others, but those are the ones I recall at the
moment.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Proposal for new memory_order_consume definition
  2016-01-11 23:11 Paul E. McKenney
@ 2016-01-19 22:25 ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2016-01-19 22:25 UTC (permalink / raw)
  To: c++std-parallel
  Cc: linux-kernel, triegel, jeff, boehm, clark.nelson, OGiroux,
	Lawrence, dhowells, joseph, torvalds, Mark.Batty, Peter.Sewell,
	peterz, will.deacon, behanw, jfb, Jens.Maurer, michaelw


On Mon, Jan 11, 2016 at 03:11:42PM -0800, Paul E. McKenney wrote:
> Hello!
> 
> As requested at the October 2015 C++ Standards Committee Meeting, I have
> created a single proposal for memory_order_consume in C++:
> 
> http://www2.rdrop.com/users/paulmck/submission/consume.2016.01.11b.pdf
> 
> This contains an informal description of the proposal, rough-draft
> wording changes, and a number of litmus tests demonstrating how the
> proposal works.
> 
> The required changes to compilers appear to be extremely small;
> however, I would like to get more compiler writers' thoughts on the
> pointer_cmp_eq_dep(), pointer_cmp_ne_dep(), pointer_cmp_gt_dep(),
> pointer_cmp_ge_dep(), pointer_cmp_lt_dep(), and pointer_cmp_le_dep()
> intrinsics that do pointer comparisons without breaking dependencies on
> their first argument.  Figures 25 and 26 on page 16 demonstrate their use.
> These intrinsics were suggested at the October meeting, but it would be
> good to get wider feedback on them.
> 
> Note that last I checked, the Linux kernel actually does not depend
> on pointer comparisons not breaking dependency chains, because all
> comparisons are against NULL or a list-head structure, in which case
> the pointer is not going to be dereferenced after an equals comparison.
> But I do believe that some past versions of the Linux kernel have depended
> on this.

And an update based on considerable off-list feedback.

More thoughts?

 							Thanx, Paul

> PS.  For more background information, please see:
> 
> 	http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0098r0.pdf

[-- Attachment #2: consume.2016.01.19a.pdf --]
[-- Type: application/pdf, Size: 288461 bytes --]


* Proposal for new memory_order_consume definition
@ 2016-01-11 23:11 Paul E. McKenney
  2016-01-19 22:25 ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2016-01-11 23:11 UTC (permalink / raw)
  To: c++std-parallel
  Cc: linux-kernel, triegel, jeff, boehm, clark.nelson, OGiroux,
	Lawrence, dhowells, joseph, torvalds, Mark.Batty, Peter.Sewell,
	peterz, will.deacon, behanw, jfb, Jens.Maurer, michaelw

Hello!

As requested at the October 2015 C++ Standards Committee Meeting, I have
created a single proposal for memory_order_consume in C++:

http://www2.rdrop.com/users/paulmck/submission/consume.2016.01.11b.pdf

This contains an informal description of the proposal, rough-draft
wording changes, and a number of litmus tests demonstrating how the
proposal works.

The required changes to compilers appear to be extremely small;
however, I would like to get more compiler writers' thoughts on the
pointer_cmp_eq_dep(), pointer_cmp_ne_dep(), pointer_cmp_gt_dep(),
pointer_cmp_ge_dep(), pointer_cmp_lt_dep(), and pointer_cmp_le_dep()
intrinsics that do pointer comparisons without breaking dependencies on
their first argument.  Figures 25 and 26 on page 16 demonstrate their use.
These intrinsics were suggested at the October meeting, but it would be
good to get wider feedback on them.
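
As a rough illustration of the call shape, here is a minimal sketch.
The pointer_cmp_eq_dep() below is a HYPOTHETICAL placeholder written as
an ordinary function template; no compiler provides the intrinsic, and
this placeholder gives none of the guarantees the proposal asks for.
The names Node, gp, static_node, and reader() are likewise made up for
the example.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
};

std::atomic<Node*> gp{nullptr};  // published pointer (hypothetical)
Node static_node{42};            // statically allocated node (hypothetical)

// HYPOTHETICAL placeholder for the proposed intrinsic.  A real intrinsic
// would compare without breaking the dependency carried by its first
// argument; this plain function only shows how a call site would look.
template <typename T>
bool pointer_cmp_eq_dep(T* dep, T* other) {
    return dep == other;
}

int reader() {
    Node* p = gp.load(std::memory_order_consume);
    if (p == nullptr)
        return -1;
    // After a plain `p == &static_node` test succeeds, a compiler may
    // rewrite p->value as static_node.value, which no longer depends on
    // the load from gp.  The proposed intrinsic is meant to rule out
    // that substitution for its first argument.
    if (pointer_cmp_eq_dep(p, &static_node))
        return p->value;
    return p->value;
}
```

Mainstream compilers such as GCC and Clang currently treat
memory_order_consume as memory_order_acquire, so this sketch runs
correctly today even without a real intrinsic.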

Note that last I checked, the Linux kernel actually does not depend
on pointer comparisons not breaking dependency chains, because all
comparisons are against NULL or a list-head structure, in which case
the pointer is not going to be dereferenced after an equals comparison.
But I do believe that some past versions of the Linux kernel have depended
on this.
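
The pattern described above can be sketched as follows, assuming a
circular singly linked list with a sentinel head.  ListNode and
sum_list() are hypothetical names for illustration, not Linux-kernel
code.  The only pointer comparison is against the list head, and when
that comparison is equal the loop exits without dereferencing the
pointer.

```cpp
#include <atomic>
#include <cassert>

struct ListNode {
    std::atomic<ListNode*> next{nullptr};
    int value{0};
};

// Sum the values in a circular list whose sentinel is `head`.
int sum_list(ListNode& head) {
    int sum = 0;
    for (ListNode* p = head.next.load(std::memory_order_consume);
         p != &head;  // on equality the loop ends; p is not dereferenced
         p = p->next.load(std::memory_order_consume)) {
        // Reached only after a not-equal comparison, so the compiler
        // learns nothing that would let it replace p with &head.
        sum += p->value;
    }
    return sum;
}
```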

Thoughts?

							Thanx, Paul

PS.  For more background information, please see:

	http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0098r0.pdf


Thread overview: 18+ messages
2016-02-18  1:10 Proposal for new memory_order_consume definition Paul E. McKenney
2016-02-20  2:15 ` [isocpp-parallel] " Tony V E
2016-02-20 19:53   ` Paul E. McKenney
     [not found]     ` <CAPUmR1bw=N4NkjAK1zn_X0+84KEaEAM6HZCHZJy_txqC9hMgSg@mail.gmail.com>
2016-02-26 23:56       ` Lawrence Crowl
2016-02-27 17:06       ` Paul E. McKenney
     [not found]         ` <CA+55aFyHmykKc=YybJMo9ZUO352MY5noJVB4-K1Lkjmw4UHXfA@mail.gmail.com>
2016-02-27 23:10           ` Paul E. McKenney
2016-02-28  8:27             ` Markus Trippelsdorf
2016-02-28 16:13               ` Linus Torvalds
2016-02-28 16:50                 ` [llvm-dev] " cbergstrom
2016-02-29 17:37                 ` Michael Matz
2016-02-29 17:57                   ` Linus Torvalds
2016-02-29 19:38                 ` Lawrence Crowl
2016-02-29 21:12                   ` [llvm-dev] " James Y Knight
2016-02-29 20:45                 ` Toon Moene
2016-02-29 18:17         ` Michael Matz
2016-03-01  1:28           ` Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2016-01-11 23:11 Paul E. McKenney
2016-01-19 22:25 ` Paul E. McKenney
