* Implementing 64bit atomic gcc built-ins
@ 2014-07-16 12:40 Guy Martin
  2014-07-16 14:24 ` Carlos O'Donell
  2014-07-16 21:16 ` James Bottomley
  0 siblings, 2 replies; 8+ messages in thread
From: Guy Martin @ 2014-07-16 12:40 UTC
  To: linux-parisc

Hi all,


It seems that gcc on hppa currently doesn't support 64 bit atomic 
built-ins such as __sync_compare_and_swap().

Looking at the current implementation, glibc calls the LWS CAS in the 
kernel to do the compare and swap operation in an atomic way.
The current implementation of lws_compare_and_swap64 works only with 64 
bit kernel.

In the case of a 32 bit kernel, I'm not sure if it's possible to 
implement an atomic CAS that would work on two registers at once. If 
it's possible, most probably a lws_compare_and_swap_dword or so LWS 
should be created as I can't see the current ABI working in this 
scenario. As far as I understand the code in syscall.S, it would just be 
a matter of adding a ldw/stw instruction pair in cas_action to have 
64bit operations (on top of changing the ABI).

If we are running a 64bit kernel, I guess it might be possible to call
lws_compare_and_swap64 from userspace, but it means that we would have
to switch to wide mode in userspace prior to performing the call.
Again, I'm not sure that this is doable, as it seems that doing so
requires the RSM instruction, which is a privileged instruction.
Another option is to create lws_compare_and_swap_dword with a different
ABI that would take 64bit integers stored in two '32bit' registers,
merge the registers into a single one and call lws_compare_and_swap64.



For me, the best course of action here is to create 
lws_compare_and_swap_dword. Provided we can perform the CAS operation on 
two registers at once, it would solve the problem for 32bit userspace 
CAS to either kernel word size.

Would this approach work, or is it a dead end?

Any comments or advice?


Thanks,
   Guy



* Re: Implementing 64bit atomic gcc built-ins
  2014-07-16 12:40 Implementing 64bit atomic gcc built-ins Guy Martin
@ 2014-07-16 14:24 ` Carlos O'Donell
  2014-07-16 19:37   ` Helge Deller
  2014-07-16 21:16 ` James Bottomley
  1 sibling, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2014-07-16 14:24 UTC
  To: Guy Martin; +Cc: linux-parisc

On Wed, Jul 16, 2014 at 8:40 AM, Guy Martin <gmsoft@tuxicoman.be> wrote:
> It seems that gcc on hppa currently doesn't support 64 bit atomic built-ins
> such as __sync_compare_and_swap().
>
> Looking at the current implementation, glibc calls the LWS CAS in the kernel
> to do the compare and swap operation in an atomic way.
> The current implementation of lws_compare_and_swap64 works only with 64 bit
> kernel.

This doesn't do what you want.

IIRC both swaps, swap32 and swap64, operate only on a 32-bit quantity.

However swap32 operates using 32-bit operands e.g. addresses and values.

While swap64 operates using 64-bit operands allowing you to access
64-bit addresses, but still doing only a 32-bit swap. The upper-half
of the 64-bit "new value" should be all zeros or the swap will never
succeed. I expect the loaded 32-bit value will be sign extended so you
have to take that into account. The truth is that I don't remember
ever testing the 64-bit entry point, but it's there for 64-bit
processes.

This is what I intended when I wrote the ABI.

> In the case of a 32 bit kernel, I'm not sure if it's possible to implement
> an atomic CAS that would work on two registers at once. If it's possible,
> most probably a lws_compare_and_swap_dword or so LWS should be created as I
> can't see the current ABI working in this scenario. As far as I understand
> the code in syscall.S, it would just be a matter of adding a ldw/stw
> instruction pair in cas_action to have 64bit operations (on top of changing
> the ABI).

It is *absolutely* possible and very easy.

For a 32-bit kernel to do a 64-bit atomic operation it needs to do
everything in two steps while holding the lws_cas locks.

For a 64-bit kernel you can simply use load double-word.
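
A minimal sketch of the two-step idea above, written as portable userspace C
rather than the PA-RISC assembly in syscall.S. Everything named here
(struct dword, cas_dword, the cas_lock helpers) is hypothetical and only
illustrates the shape of the operation: load both halves, compare, and store
both halves, all while one lock is held. The lock is a GCC builtin spin lock
standing in for the lws_cas locks mentioned above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the lock held around the LWS CAS operation. */
static volatile int cas_lock;

static void cas_lock_acquire(void)
{
    /* Spin until the previous value was 0, i.e. we took the lock. */
    while (__sync_lock_test_and_set(&cas_lock, 1))
        ;
}

static void cas_lock_release(void)
{
    __sync_lock_release(&cas_lock);
}

/* A 64-bit quantity handled as two 32-bit words. */
struct dword {
    uint32_t hi;
    uint32_t lo;
};

/* Returns 0 if the swap happened, 1 if the comparison failed.
 * *prev always receives the value that was found in memory. */
static int cas_dword(struct dword *mem, struct dword expected,
                     struct dword desired, struct dword *prev)
{
    int failed = 1;

    cas_lock_acquire();
    /* Step 1: load both halves under the lock. */
    prev->hi = mem->hi;
    prev->lo = mem->lo;
    if (prev->hi == expected.hi && prev->lo == expected.lo) {
        /* Step 2: store both halves under the same lock. */
        mem->hi = desired.hi;
        mem->lo = desired.lo;
        failed = 0;
    }
    cas_lock_release();
    return failed;
}
```

The result is only atomic with respect to other callers that take the same
lock; a plain store to either half from elsewhere is not excluded by it.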

> If we are running a 64bit kernel, I guess it might be possible to call
> lws_compare_and_swap64 from userspace, but it means that we would have to
> switch to wide mode in userspace prior to perform the call.
> Again, I'm not sure that this is doable as it seems that to do so, the RSM
> instruction needs to be used while it's a privileged level instruction.
> Another option is to create lws_compare_and_swap_dword with a different ABI
> that would take 64bit integers stored in two '32bit' registers, merge the
> registers into a single one and call lws_compare_and_swap64.

I would avoid switching to wide mode in userspace because you can
still take a signal between the switch and entering the kernel and
that would be catastrophic since you'd be running 32-bit code in wide
mode without having taken the appropriate precautions.

I would suggest creating lws_compare_and_swap_dword with a distinct ABI.

You will need two return registers for the 32-bit ABI, high and low.
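
As an illustration of the high/low convention, here is a small C sketch of
how a 64-bit value could be split into two 32-bit words on the way in and
merged again on the way out. The helper names merge_words and split_dword
are made up for this example, and which registers would carry each half is
not decided anywhere in this thread.

```c
#include <stdint.h>

/* Combine a high and a low 32-bit word into one 64-bit value. */
static uint64_t merge_words(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit value into its high and low 32-bit words. */
static void split_dword(uint64_t value, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(value >> 32);
    *lo = (uint32_t)value;
}
```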

> For me, the best course of action here is to create
> lws_compare_and_swap_dword. Provided we can perform the CAS operation on two
> registers at once, it would solve the problem for 32bit userspace CAS to
> either kernel word size.

Agreed.

> Would this approach work or is it a dead end ?

It is not a dead end, it's quite a nice project and enables 64-bit or
N-bit atomics using LWS CAS.

> Any comments/advices ?

Cheers,
Carlos.


* Re: Implementing 64bit atomic gcc built-ins
  2014-07-16 14:24 ` Carlos O'Donell
@ 2014-07-16 19:37   ` Helge Deller
  0 siblings, 0 replies; 8+ messages in thread
From: Helge Deller @ 2014-07-16 19:37 UTC
  To: Carlos O'Donell, Guy Martin; +Cc: linux-parisc

On 07/16/2014 04:24 PM, Carlos O'Donell wrote:
> On Wed, Jul 16, 2014 at 8:40 AM, Guy Martin <gmsoft@tuxicoman.be> wrote:
>> In the case of a 32 bit kernel, I'm not sure if it's possible to implement
>> an atomic CAS that would work on two registers at once. If it's possible,
>> most probably a lws_compare_and_swap_dword or so LWS should be created as I
>> can't see the current ABI working in this scenario. As far as I understand
>> the code in syscall.S, it would just be a matter of adding a ldw/stw
>> instruction pair in cas_action to have 64bit operations (on top of changing
>> the ABI).
> 
> It is *absolutely* possible and very easy.
> 
> For a 32-bit kernel to do a 64-bit atomic operation it needs to do
> everything in two steps while holding the lws_cas locks.

I agree this would work if userspace only modifies the atomic dword through the new lws call.
But is it still "atomic" if some other thread modifies the 64bit value not through the lws?
I mean, isn't there a possibility, that the upper or lower 32bit may be changed unexpectedly and out-of-sync?

Helge


* Re: Implementing 64bit atomic gcc built-ins
  2014-07-16 12:40 Implementing 64bit atomic gcc built-ins Guy Martin
  2014-07-16 14:24 ` Carlos O'Donell
@ 2014-07-16 21:16 ` James Bottomley
  2014-07-17  1:52   ` Carlos O'Donell
  1 sibling, 1 reply; 8+ messages in thread
From: James Bottomley @ 2014-07-16 21:16 UTC
  To: Guy Martin; +Cc: linux-parisc

On Wed, 2014-07-16 at 14:40 +0200, Guy Martin wrote:
> Hi all,
> 
> 
> It seems that gcc on hppa currently doesn't support 64 bit atomic 
> built-ins such as __sync_compare_and_swap().
> 
> Looking at the current implementation, glibc calls the LWS CAS in the 
> kernel to do the compare and swap operation in an atomic way.
> The current implementation of lws_compare_and_swap64 works only with 64 
> bit kernel.
> 
> In the case of a 32 bit kernel, I'm not sure if it's possible to 
> implement an atomic CAS that would work on two registers at once. If 
> it's possible, most probably a lws_compare_and_swap_dword or so LWS 
> should be created as I can't see the current ABI working in this 
> scenario. As far as I understand the code in syscall.S, it would just be 
> a matter of adding a ldw/stw instruction pair in cas_action to have 
> 64bit operations (on top of changing the ABI).

Atomic operations are only done at the natural width of the binary
architecture.  That means even on 32 bit x86 there's no natural atomic
64 bit swap and no expectation of one.  Glibc can emulate one, but it
shouldn't assume there's any natural machine op.

This should mean that a 32 bit kernel has no need at all for the 64 bit
ops to be implemented in kernel.

> If we are running a 64bit kernel, I guess it might be possible to call 
> lws_compare_and_swap64 from userspace, but it means that we would have 
> to switch to wide mode in userspace prior to perform the call.
> Again, I'm not sure that this is doable as it seems that to do so, the 
> RSM instruction needs to be used while it's a privileged level 
> instruction.
> Another option is to create lws_compare_and_swap_dword with a different 
> ABI that would take 64bit integers stored in two '32bit' registers, 
> merge the registers into a single one and call lws_compare_and_swap64.

Same rule applies.  Until we have a 64 bit userspace, we have no use for
64 bit atomic swaps.

James




* Re: Implementing 64bit atomic gcc built-ins
  2014-07-16 21:16 ` James Bottomley
@ 2014-07-17  1:52   ` Carlos O'Donell
  2014-07-17  8:44     ` Guy Martin
  0 siblings, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2014-07-17  1:52 UTC
  To: James Bottomley; +Cc: Guy Martin, linux-parisc

On Wed, Jul 16, 2014 at 5:16 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2014-07-16 at 14:40 +0200, Guy Martin wrote:
>> Hi all,
>>
>>
>> It seems that gcc on hppa currently doesn't support 64 bit atomic
>> built-ins such as __sync_compare_and_swap().
>>
>> Looking at the current implementation, glibc calls the LWS CAS in the
>> kernel to do the compare and swap operation in an atomic way.
>> The current implementation of lws_compare_and_swap64 works only with 64
>> bit kernel.
>>
>> In the case of a 32 bit kernel, I'm not sure if it's possible to
>> implement an atomic CAS that would work on two registers at once. If
>> it's possible, most probably a lws_compare_and_swap_dword or so LWS
>> should be created as I can't see the current ABI working in this
>> scenario. As far as I understand the code in syscall.S, it would just be
>> a matter of adding a ldw/stw instruction pair in cas_action to have
>> 64bit operations (on top of changing the ABI).
>
> Atomic operations are only done at the natural width of the binary
> architecture.  That means even on 32 bit x86 there's no natural atomic
> 64 bit swap and no expectation of one.  Glibc can emulate one, but it
> shouldn't assume there's any natural machine op.

That's right.

Which speaks to Helge's comment.

Userspace algorithms can't expect that a 64-bit write will be atomic
with respect to a 64-bit atomic operation.

You must use the atomic operations to both read and write the
wider-than-natural-width types.
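
A short sketch of what "use the atomic operations to both read and write"
can look like in C, using only the compare-and-swap builtin. The helper
names atomic_read64 and atomic_write64 are hypothetical, and the sketch
assumes a target where the 64-bit __sync_val_compare_and_swap() is
available, which is exactly what hppa lacks at the start of this thread.

```c
#include <stdint.h>

/* Read through the CAS primitive. Swapping 0 for 0 never changes the
 * stored value, and the builtin returns whatever was in memory. */
static uint64_t atomic_read64(uint64_t *p)
{
    return __sync_val_compare_and_swap(p, (uint64_t)0, (uint64_t)0);
}

/* Write through the CAS primitive. Retry until the swap is made
 * against the value that was actually observed. */
static void atomic_write64(uint64_t *p, uint64_t value)
{
    uint64_t seen = atomic_read64(p);

    for (;;) {
        uint64_t prev = __sync_val_compare_and_swap(p, seen, value);
        if (prev == seen)
            break;      /* our swap went in */
        seen = prev;    /* someone else changed it; retry */
    }
}
```

Because every access goes through the same primitive, a reader never
observes one half from before a write and the other half from after it.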

> This should mean that a 32 bit kernel has no need at all for the 64 bit
> ops to be implemented in kernel.

Has no need, yes, but from the userspace perspective such an operation
helps write algorithms that take advantage of 64-bits of atomic
storage without having to deal with signals, interruptions, locking
semantics etc.

Particularly when you are porting an algorithm that is already proven
and uses a 64-bit atomic, having these features means you don't have
to rewrite all the algorithms during the port.

Case in point I created LWS CAS to avoid having to rewrite all the
nptl threading algorithms for hppa based on ldcw.

>> If we are running a 64bit kernel, I guess it might be possible to call
>> lws_compare_and_swap64 from userspace, but it means that we would have
>> to switch to wide mode in userspace prior to perform the call.
>> Again, I'm not sure that this is doable as it seems that to do so, the
>> RSM instruction needs to be used while it's a privileged level
>> instruction.
>> Another option is to create lws_compare_and_swap_dword with a different
>> ABI that would take 64bit integers stored in two '32bit' registers,
>> merge the registers into a single one and call lws_compare_and_swap64.
>
> Same rule applies.  Until we have a 64 bit userspace, we have no use for
> 64 bit atomic swaps.

I disagree. See the rationale above which argues it is out of a
practical necessity for porting algorithms wholesale without
modification.

Cheers,
Carlos.


* Re: Implementing 64bit atomic gcc built-ins
  2014-07-17  1:52   ` Carlos O'Donell
@ 2014-07-17  8:44     ` Guy Martin
  2014-07-17 12:46       ` John David Anglin
  2014-07-19  0:30       ` John David Anglin
  0 siblings, 2 replies; 8+ messages in thread
From: Guy Martin @ 2014-07-17  8:44 UTC
  To: Carlos O'Donell; +Cc: James Bottomley, linux-parisc

On 2014-07-17 03:52, Carlos O'Donell wrote:
> Userspace algorithms can't expect that a 64-bit write will be atomic
> with respect to a 64-bit atomic operation.
> 
> You must use the atomic operations to both read and write the
> wider-than-natural-width types.
> 

Reviewing HPPA 1.1 specs, I see that the FPU has 64bit registers.
Can't we use those registers to perform 64bit atomic load and store ?

As far as I can see, FLDDX and FSTDX should be able to do the job.
Unless there is something I'm missing about the FPU :)

>> This should mean that a 32 bit kernel has no need at all for the 64 bit
>> ops to be implemented in kernel.
> 
> Has no need, yes, but from the userspace perspective such an operation
> helps write algorithms that take advantage of 64-bits of atomic
> storage without having to deal with signals, interruptions, locking
> semantics etc.
> 
> Particularly when you are porting an algorithm that is already proven
> and uses a 64-bit atomic, having these features means you don't have
> to rewrite all the algorithms during the port.
> 
> Case in point I created LWS CAS to avoid having to rewrite all the
> nptl threading algorithms for hppa based on ldcw.
> 

Just to add to this, I often use 64 bit __sync_fetch_and_add() for
performance counters in software I develop. It's really useful if
you don't want to bother with having a lock per performance object.
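
A minimal sketch of that counter pattern, assuming a target where the
64-bit builtin is available. The struct perf_object layout and the
perf_account helper are invented for illustration and are not taken from
any real program.

```c
#include <stdint.h>

/* Hypothetical per-object statistics, with no lock attached. */
struct perf_object {
    uint64_t packets;
    uint64_t bytes;
};

/* Account for one packet of nbytes. Each counter is bumped with its
 * own atomic add, so concurrent callers do not lose updates. */
static void perf_account(struct perf_object *obj, uint64_t nbytes)
{
    __sync_fetch_and_add(&obj->packets, (uint64_t)1);
    __sync_fetch_and_add(&obj->bytes, nbytes);
}
```

Note that the two counters are updated independently, so a reader may
briefly see a packet counted whose bytes are not yet added.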

   Guy



* Re: Implementing 64bit atomic gcc built-ins
  2014-07-17  8:44     ` Guy Martin
@ 2014-07-17 12:46       ` John David Anglin
  2014-07-19  0:30       ` John David Anglin
  1 sibling, 0 replies; 8+ messages in thread
From: John David Anglin @ 2014-07-17 12:46 UTC
  To: Guy Martin; +Cc: Carlos O'Donell, James Bottomley, linux-parisc

On 17-Jul-14, at 4:44 AM, Guy Martin wrote:

> On 2014-07-17 03:52, Carlos O'Donell wrote:
>> Userspace algorithms can't expect that a 64-bit write will be atomic
>> with respect to a 64-bit atomic operation.
>> You must use the atomic operations to both read and write the
>> wider-than-natural-width types.
>
> Reviewing HPPA 1.1 specs, I see that the FPU has 64bit registers.
> Can't we use those registers to perform 64bit atomic load and store ?
>
> As far as I can see, FLDDX and FSTDX should be able to do the job.
> Unless there is something I'm missing about the FPU :)

Yes, although I believe there are some restrictions involving loads and
stores to I/O space.  There are atomic_loaddi and atomic_storedi patterns
implemented in GCC to do 64-bit loads and stores.

>
>>> This should mean that a 32 bit kernel has no need at all for the 64 bit
>>> ops to be implemented in kernel.
>> Has no need, yes, but from the userspace perspective such an operation
>> helps write algorithms that take advantage of 64-bits of atomic
>> storage without having to deal with signals, interruptions, locking
>> semantics etc.
>> Particularly when you are porting an algorithm that is already proven
>> and uses a 64-bit atomic, having these features means you don't have
>> to rewrite all the algorithms during the port.
>> Case in point I created LWS CAS to avoid having to rewrite all the
>> nptl threading algorithms for hppa based on ldcw.
>
> Just to add up on this, I often use 64 bit __sync_fetch_and_add() for
> performance counter in a software I develop. It's really useful if
> you don't want to bother with having a lock per performance object.

There is a need for 64-bit atomics in a number of packages.  In the other
direction, some packages would like atomic bit operations.

As Helge pointed out, none of the sync atomics available on parisc are
lock free.  As such, one can't use loads and stores to operate on locations
modified by LWS CAS operations.  Even in a simple userspace spin lock
implementation, we found a race condition in trying to reset the lock with
a simple store.  The lock had to be reset using an LWS CAS operation.
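
A rough sketch of a spin lock that follows this rule, with both the acquire
and the release going through compare-and-swap. The function names and the
0/1 encoding of the lock word are assumptions made for the example; this is
not the implementation in which the race was found.

```c
/* Lock word encoding assumed here: 0 = free, 1 = held. */

/* Try to take the lock once; returns nonzero on success. */
static int spin_trylock(int *lock)
{
    return __sync_val_compare_and_swap(lock, 0, 1) == 0;
}

/* Spin until the lock is taken. */
static void spin_lock(int *lock)
{
    while (!spin_trylock(lock))
        ;
}

/* Release with a CAS rather than a plain store, so the reset is
 * ordered against other CAS operations on the same word. */
static void spin_unlock(int *lock)
{
    __sync_val_compare_and_swap(lock, 1, 0);
}
```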

Dave
--
John David Anglin	dave.anglin@bell.net





* Re: Implementing 64bit atomic gcc built-ins
  2014-07-17  8:44     ` Guy Martin
  2014-07-17 12:46       ` John David Anglin
@ 2014-07-19  0:30       ` John David Anglin
  1 sibling, 0 replies; 8+ messages in thread
From: John David Anglin @ 2014-07-19  0:30 UTC
  To: Guy Martin; +Cc: Carlos O'Donell, James Bottomley, linux-parisc

On 17-Jul-14, at 4:44 AM, Guy Martin wrote:

> Reviewing HPPA 1.1 specs, I see that the FPU has 64bit registers.
> Can't we use those registers to perform 64bit atomic load and store ?
>
> As far as I can see, FLDDX and FSTDX should be able to do the job.
> Unless there is something I'm missing about the FPU :)

The main inefficiency is that one needs to copy a value through memory to
get a value in a general register into a floating point register, etc.
Probably, for this technique, it would be most efficient to pass the values
in floating point registers.

Most people are running PA 2.0 machines now and these have 64-bit registers
and 64-bit loads and stores.  Problem is we don't have a proper 64-bit
context on these machines like hpux.  So, we don't save/restore everything
on an interruption.  However, it might be possible to use 64-bit operations
in the region where interrupts are disabled.  Note that interruptions are
still possible in this region but it might be we don't care about the values
in this case.

Dave
--
John David Anglin	dave.anglin@bell.net



