linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* cmpxchg and x86 flags output
@ 2016-06-14 23:53 H. Peter Anvin
  2016-06-15  8:50 ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2016-06-14 23:53 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-arch, Linus Torvalds,
	Ingo Molnar, Thomas Gleixner

The x86 gcc now has the ability to return the value of flags output.  In
most use cases, this has been trivial to use in the kernel.

However, cmpxchg() presents a problem.  The current definition of
cmpxchg() and its variants is:

	out = cmpxchg(ptr, old, new);

... which is then frequently followed by:

	if (likely(old == out))

... or something along those lines.

This test is unnecessary and can now be elided, but this means changing
the signature on the cmpxchg() function (macro, generally).

It seems to me that the sanest way to handle this is to add a new
interface with a fourth parameter, so:

	changed = cmpxchgx(ptr, old, new, out);

A generic implementation of cmpxchgx() would be provided, looking like:

#define cmpxchgx(ptr, old, new, out) ({			\
	__typeof__((*(ptr))) __old = (old);		\
	__typeof__((*(ptr))) __new = (new);		\
	__typeof__((*(ptr))) __old = (old);		\
	__typeof__((*(ptr))) __out; 			\
	(out) = __out = cmpxchg(ptr, __old, __new);	\
	(__old != __out);				\
})

... and so on for all the many other variants.

However, I'm wondering how well this will fit in with other
architectures.  Keep in mind gcc will probably gain this ability for
other architectures with flags at some point, although that doesn't
inherently mean that cmpxchg will be able to make use of it.

This means a lot of changes even to common code, so I want to make sure
the interface is right before embarking on an implementation.

Thoughts?

	-hpa

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-14 23:53 cmpxchg and x86 flags output H. Peter Anvin
@ 2016-06-15  8:50 ` Peter Zijlstra
  2016-06-16 22:21   ` H. Peter Anvin
  2016-06-21  9:06   ` David Howells
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2016-06-15  8:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linux Kernel Mailing List, linux-arch, Linus Torvalds,
	Ingo Molnar, Thomas Gleixner, David Howells

On Tue, Jun 14, 2016 at 04:53:00PM -0700, H. Peter Anvin wrote:
> The x86 gcc now has the ability to return the value of flags output.  In
> most use cases, this has been trivial to use in the kernel.
> 
> However, cmpxchg() presents a problem.  The current definition of
> cmpxchg() and its variants is:
> 
> 	out = cmpxchg(ptr, old, new);
> 
> ... which is then frequently followed by:
> 
> 	if (likely(old == out))
> 
> ... or something along those lines.
> 
> This test is unnecessary and can now be elided, but this means changing
> the signature on the cmpxchg() function (macro, generally).
> 
> It seems to me that the sanest way to handle this is to add a new
> interface with a fourth parameter, so:
> 
> 	changed = cmpxchgx(ptr, old, new, out);

See also:

  lkml.kernel.org/r/146358429016.8596.3381723959064491676.stgit@warthog.procyon.org.uk

where David suggests the same.

> 
> A generic implementation of cmpxchgx() would be provided, looking like:
> 
> #define cmpxchgx(ptr, old, new, out) ({			\
> 	__typeof__((*(ptr))) __old = (old);		\
> 	__typeof__((*(ptr))) __new = (new);		\
> 	__typeof__((*(ptr))) __old = (old);		\
> 	__typeof__((*(ptr))) __out; 			\
> 	(out) = __out = cmpxchg(ptr, __old, __new);	\
> 	(__old != __out);				\
> })
> 
> ... and so on for all the many other variants.
> 
> However, I'm wondering how well this will fit in with other
> architectures.

All ll/sc based archs also already know if the operation succeeded
without having to do the extra comparison.

SPARCv9,S390x which are native CAS architectures, also places the
success of the operation in condition codes.

IA64 might be the odd duck out (or I'm not reading the manual right,
which is entirely possible).

> Keep in mind gcc will probably gain this ability for
> other architectures with flags at some point, although that doesn't
> inherently mean that cmpxchg will be able to make use of it.
> 
> This means a lot of changes even to common code, so I want to make sure
> the interface is right before embarking on an implementation.
> 
> Thoughts?

David has already done lots of the conversions for you.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-15  8:50 ` Peter Zijlstra
@ 2016-06-16 22:21   ` H. Peter Anvin
  2016-06-21  9:06   ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-06-16 22:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, linux-arch, Linus Torvalds,
	Ingo Molnar, Thomas Gleixner, David Howells

On 06/15/16 01:50, Peter Zijlstra wrote:
>>
>> It seems to me that the sanest way to handle this is to add a new
>> interface with a fourth parameter, so:
>>
>> 	changed = cmpxchgx(ptr, old, new, out);
> 
> See also:
> 
>   lkml.kernel.org/r/146358429016.8596.3381723959064491676.stgit@warthog.procyon.org.uk
> 
> where David suggests the same.
> 
>>
>> A generic implementation of cmpxchgx() would be provided, looking like:
>>
>> #define cmpxchgx(ptr, old, new, out) ({			\
>> 	__typeof__((*(ptr))) __old = (old);		\
>> 	__typeof__((*(ptr))) __new = (new);		\
>> 	__typeof__((*(ptr))) __old = (old);		\
>> 	__typeof__((*(ptr))) __out; 			\
>> 	(out) = __out = cmpxchg(ptr, __old, __new);	\
>> 	(__old != __out);				\
>> })
>>
>> ... and so on for all the many other variants.
>>
>> However, I'm wondering how well this will fit in with other
>> architectures.
> 
> All ll/sc based archs also already know if the operation succeeded
> without having to do the extra comparison.
> 
> SPARCv9,S390x which are native CAS architectures, also places the
> success of the operation in condition codes.
> 
> IA64 might be the odd duck out (or I'm not reading the manual right,
> which is entirely possible).
> 
>> Keep in mind gcc will probably gain this ability for
>> other architectures with flags at some point, although that doesn't
>> inherently mean that cmpxchg will be able to make use of it.
>>
>> This means a lot of changes even to common code, so I want to make sure
>> the interface is right before embarking on an implementation.
>>
>> Thoughts?
> 
> David has already done lots of the conversions for you.
> 

Well, that sounds promising.  I wonder how David's model, using
intrinsics (do we have enough intrinsics to actually be able to do this
"correctly"?), compare to using the flags output from assembly.

	-hpa

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-15  8:50 ` Peter Zijlstra
  2016-06-16 22:21   ` H. Peter Anvin
@ 2016-06-21  9:06   ` David Howells
  2016-06-21 17:00     ` H. Peter Anvin
                       ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: David Howells @ 2016-06-21  9:06 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dhowells, Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

H. Peter Anvin <hpa@zytor.com> wrote:

> Well, that sounds promising.  I wonder how David's model, using
> intrinsics (do we have enough intrinsics to actually be able to do this
> "correctly"?), compare to using the flags output from assembly.

There is an advantage to using the intriniscs on arches with explicit
barriers.  On powerpc64, for example, the compiler can move the release memory
barrier earlier to push register-only instructions between the barrier and the
lwarx.  This would allow the memory barrier to be executed concurrently with
those instructions.

The compiler could also move the acquire memory barrier later, pulling
register-only instructions between the stwcx and that barrier, though I don't
see any advantage to doing so.

Whereas if the release barrier is in the same asm block as the lwarx, the
compiler cannot do anything with it.


Another advantage is that the compiler can switch between instruction variants
automatically, allowing us to get rid of the size-based switch statements for
things like cmpxchg().


However, there's probably not a great deal of difference to be had if the
inline asm codes the appropriate instruction in each case for something like
x86*.  The emitted code ought to look the same.  The second biggest win for
the intriniscs, I think, is the ability to ask the CMPXCHG instruction whether
it actually did anything rather than comparing the result.  I added two
variants, one that only returned the yes/no and one that passed back the value
as well as the yes/no.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-21  9:06   ` David Howells
@ 2016-06-21 17:00     ` H. Peter Anvin
  2016-06-21 17:24     ` H. Peter Anvin
  2016-06-22 16:11     ` David Howells
  2 siblings, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-06-21 17:00 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

On June 21, 2016 2:06:20 AM PDT, David Howells <dhowells@redhat.com> wrote:
>H. Peter Anvin <hpa@zytor.com> wrote:
>
>> Well, that sounds promising.  I wonder how David's model, using
>> intrinsics (do we have enough intrinsics to actually be able to do
>this
>> "correctly"?), compare to using the flags output from assembly.
>
>There is an advantage to using the intriniscs on arches with explicit
>barriers.  On powerpc64, for example, the compiler can move the release
>memory
>barrier earlier to push register-only instructions between the barrier
>and the
>lwarx.  This would allow the memory barrier to be executed concurrently
>with
>those instructions.
>
>The compiler could also move the acquire memory barrier later, pulling
>register-only instructions between the stwcx and that barrier, though I
>don't
>see any advantage to doing so.
>
>Whereas if the release barrier is in the same asm block as the lwarx,
>the
>compiler cannot do anything with it.
>
>
>Another advantage is that the compiler can switch between instruction
>variants
>automatically, allowing us to get rid of the size-based switch
>statements for
>things like cmpxchg().
>
>
>However, there's probably not a great deal of difference to be had if
>the
>inline asm codes the appropriate instruction in each case for something
>like
>x86*.  The emitted code ought to look the same.  The second biggest win
>for
>the intriniscs, I think, is the ability to ask the CMPXCHG instruction
>whether
>it actually did anything rather than comparing the result.  I added two
>variants, one that only returned the yes/no and one that passed back
>the value
>as well as the yes/no.
>
>David

The question for me is for things like lock patching that we do on x86...
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-21  9:06   ` David Howells
  2016-06-21 17:00     ` H. Peter Anvin
@ 2016-06-21 17:24     ` H. Peter Anvin
  2016-06-22  0:09       ` H. Peter Anvin
  2016-06-22 16:14       ` David Howells
  2016-06-22 16:11     ` David Howells
  2 siblings, 2 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-06-21 17:24 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

On 06/21/16 02:06, David Howells wrote:
> 
> However, there's probably not a great deal of difference to be had if the
> inline asm codes the appropriate instruction in each case for something like
> x86*.  The emitted code ought to look the same.  The second biggest win for
> the intriniscs, I think, is the ability to ask the CMPXCHG instruction whether
> it actually did anything rather than comparing the result.  I added two
> variants, one that only returned the yes/no and one that passed back the value
> as well as the yes/no.
> 

Right, and we want that either way.  The API change that you are
proposing is definitely what we want; the specifics of the x86
implementation is sort of orthogonal.

	-hpa

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-21 17:24     ` H. Peter Anvin
@ 2016-06-22  0:09       ` H. Peter Anvin
  2016-06-22 16:14       ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-06-22  0:09 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

On 06/21/16 10:24, H. Peter Anvin wrote:
> On 06/21/16 02:06, David Howells wrote:
>>
>> However, there's probably not a great deal of difference to be had if the
>> inline asm codes the appropriate instruction in each case for something like
>> x86*.  The emitted code ought to look the same.  The second biggest win for
>> the intriniscs, I think, is the ability to ask the CMPXCHG instruction whether
>> it actually did anything rather than comparing the result.  I added two
>> variants, one that only returned the yes/no and one that passed back the value
>> as well as the yes/no.
>>
> 
> Right, and we want that either way.  The API change that you are
> proposing is definitely what we want; the specifics of the x86
> implementation is sort of orthogonal.
> 

So how do we make this move forward?

	-hpa

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-21  9:06   ` David Howells
  2016-06-21 17:00     ` H. Peter Anvin
  2016-06-21 17:24     ` H. Peter Anvin
@ 2016-06-22 16:11     ` David Howells
  2016-06-22 16:36       ` H. Peter Anvin
  2 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2016-06-22 16:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dhowells, Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

H. Peter Anvin <hpa@zytor.com> wrote:

> The question for me is for things like lock patching that we do on x86...

This might be pertinent to what you're asking:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-21 17:24     ` H. Peter Anvin
  2016-06-22  0:09       ` H. Peter Anvin
@ 2016-06-22 16:14       ` David Howells
  2016-08-19 17:22         ` H. Peter Anvin
  2016-08-22 15:13         ` David Howells
  1 sibling, 2 replies; 14+ messages in thread
From: David Howells @ 2016-06-22 16:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dhowells, Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

H. Peter Anvin <hpa@zytor.com> wrote:

> So how do we make this move forward?

Getting my API additions in is relatively straightforward, I think.  The
whether-or-notness of the cmpxchg operation succeeding can be calculated by
comparing the original value read from memory with the value-to-be-replaced
inside the API function.

This can later be replaced with the boolean output from the CMPXCHG
instruction, the branch target from the LL/SC skipping or the result of the
intrinsics.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-22 16:11     ` David Howells
@ 2016-06-22 16:36       ` H. Peter Anvin
  2016-06-22 17:11         ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2016-06-22 16:36 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

On 06/22/16 09:11, David Howells wrote:
> H. Peter Anvin <hpa@zytor.com> wrote:
> 
>> The question for me is for things like lock patching that we do on x86...
> 
> This might be pertinent to what you're asking:
> 
> 	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70973
> 

I am kind of hesitant to put knowledge of this into gcc, because it
freezes something that currently is not gcc-dependent (although we could
separate out the gcc-generated and non-gcc-generated bits if we really
care.)  With the gcc flags output we can do this with assembly code as
well today (on x86), so it is unclear if we have any compelling reason
to need intrinsics that we won't be able to rely on existing for a long
time (the difference between gcc versions that have flags support and
don't have it has already been abstracted out in the x86/asm branch of
the tip tree, so we don't need two versions.)

	-hpa

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-22 16:36       ` H. Peter Anvin
@ 2016-06-22 17:11         ` Linus Torvalds
  2016-06-22 17:49           ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2016-06-22 17:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Howells, Peter Zijlstra, Linux Kernel Mailing List,
	linux-arch, Ingo Molnar, Thomas Gleixner

On Wed, Jun 22, 2016 at 9:36 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> I am kind of hesitant to put knowledge of this into gcc, because it
> freezes something that currently is not gcc-dependent (although we could
> separate out the gcc-generated and non-gcc-generated bits if we really
> care.)

I'm pretty down on the whole intrinsics thing in general. We have
*not* had great luck with most intrinsics, largely because it takes so
long for people to upgrade compilers, and it's such a pain to check
every single little random new gcc addition.

There seems to be no advantage (at least on x86) of some new intrinsic
over just using the asm with condition code outputs. And that's a much
more generic gcc feature that we would use in other places.

I thought Richard Henderson already had a patch for the condition code
asm outputs, but maybe I misremember.

              Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-22 17:11         ` Linus Torvalds
@ 2016-06-22 17:49           ` H. Peter Anvin
  0 siblings, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-06-22 17:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, Peter Zijlstra, Linux Kernel Mailing List,
	linux-arch, Ingo Molnar, Thomas Gleixner

On 06/22/16 10:11, Linus Torvalds wrote:
> 
> I thought Richard Henderson already had a patch for the condition code
> asm outputs, but maybe I misremember.
> 

It is already in the released version of gcc 6.1; there is a patchset in
-tip already for using it for everything other than cmpxchg using the
CC_SET/CC_OUT macros I proposed a while ago.  cmpxchg is special because
it requires API changes to take advantage of, which as far as I
understand happens to be exactly the API changes that David has already
implemented.  Apparently some other architectures really can benefit
from intrinsics, so that is okay; we can use inline asm with flags
output on x86.

	-hpa

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-22 16:14       ` David Howells
@ 2016-08-19 17:22         ` H. Peter Anvin
  2016-08-22 15:13         ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-08-19 17:22 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

On 06/22/16 09:14, David Howells wrote:
> H. Peter Anvin <hpa@zytor.com> wrote:
> 
>> So how do we make this move forward?
> 
> Getting my API additions in is relatively straightforward, I think.  The
> whether-or-notness of the cmpxchg operation succeeding can be calculated by
> comparing the original value read from memory with the value-to-be-replaced
> inside the API function.
> 
> This can later be replaced with the boolean output from the CMPXCHG
> instruction, the branch target from the LL/SC skipping or the result of the
> intrinsics.
> 

Hi David,

Any news on this?  Can we perhaps help making this go forward at some point?

	-hpa

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: cmpxchg and x86 flags output
  2016-06-22 16:14       ` David Howells
  2016-08-19 17:22         ` H. Peter Anvin
@ 2016-08-22 15:13         ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: David Howells @ 2016-08-22 15:13 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: dhowells, Peter Zijlstra, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, Ingo Molnar, Thomas Gleixner

H. Peter Anvin <hpa@zytor.com> wrote:

> Any news on this?  Can we perhaps help making this go forward at some point?

I'm just bringing my patches up to date with an eye on extracting and pushing
the more general (ie. less contentious) bits.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-08-22 15:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-14 23:53 cmpxchg and x86 flags output H. Peter Anvin
2016-06-15  8:50 ` Peter Zijlstra
2016-06-16 22:21   ` H. Peter Anvin
2016-06-21  9:06   ` David Howells
2016-06-21 17:00     ` H. Peter Anvin
2016-06-21 17:24     ` H. Peter Anvin
2016-06-22  0:09       ` H. Peter Anvin
2016-06-22 16:14       ` David Howells
2016-08-19 17:22         ` H. Peter Anvin
2016-08-22 15:13         ` David Howells
2016-06-22 16:11     ` David Howells
2016-06-22 16:36       ` H. Peter Anvin
2016-06-22 17:11         ` Linus Torvalds
2016-06-22 17:49           ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).