* [RFC PATCH] 64bit LWS CAS
@ 2014-07-17 20:00 Guy Martin
  2014-07-17 21:12 ` John David Anglin
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Guy Martin @ 2014-07-17 20:00 UTC (permalink / raw)
  To: linux-parisc

[-- Attachment #1: Type: text/plain, Size: 374 bytes --]

Hi,

I've attached the gcc and kernel patches for 64-bit CAS. So far I've
implemented the easiest use case, which is a 64-bit kernel.

I'll investigate using an FPU register for 64-bit operations with
32-bit kernels.

I feel like there is a lot of code duplication in my patches; it could
probably be reduced, although that might hurt readability.

Any comments?


Thanks,
  Guy
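
For context, this is roughly the kind of 32-bit user code the new LWS entry
is meant to serve (an illustrative example, not part of the attached patches;
the names are made up):

  #include <stdint.h>

  /* A 64-bit counter shared between threads of an ELF32 program.  With
     these patches, gcc maps the builtin below onto __sync_fetch_and_add_8
     in libgcc, which calls the new 64-bit LWS CAS in a retry loop.  */
  static int64_t counter;

  void
  bump_counter (int64_t delta)
  {
    __sync_fetch_and_add (&counter, delta);
  }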

[-- Attachment #2: linux-hppa-atomic-cas-64bit.patch --]
[-- Type: text/x-patch, Size: 5827 bytes --]

--- ./arch/parisc/kernel/syscall.S.orig	2014-07-16 16:39:20.684498341 +0200
+++ ./arch/parisc/kernel/syscall.S	2014-07-17 21:34:35.091933739 +0200
@@ -74,7 +74,7 @@
 	/* ADDRESS 0xb0 to 0xb8, lws uses two insns for entry */
 	/* Light-weight-syscall entry must always be located at 0xb0 */
 	/* WARNING: Keep this number updated with table size changes */
-#define __NR_lws_entries (2)
+#define __NR_lws_entries (3)
 
 lws_entry:
 	gate	lws_start, %r0		/* increase privilege */
@@ -502,7 +502,7 @@
 
 	
 	/***************************************************
-		Implementing CAS as an atomic operation:
+		Implementing 32bit CAS as an atomic operation:
 
 		%r26 - Address to examine
 		%r25 - Old value to check (old)
@@ -658,6 +658,161 @@
 	ASM_EXCEPTIONTABLE_ENTRY(1b-linux_gateway_page, 3b-linux_gateway_page)
 	ASM_EXCEPTIONTABLE_ENTRY(2b-linux_gateway_page, 3b-linux_gateway_page)
 
+	
+	/***************************************************
+		Implementing 64bit CAS as an atomic operation for ELF32:
+
+		%r26 - Address to examine
+		%r25 - Old 32bit high value to check (old)
+		%r24 - Old 32bit low value to check (old)
+		%r23 - New 32bit high value to set (new)
+		%r22 - New 32bit low value to set (new)
+		%r28 - Return prev 32bit high through this register.
+		%r29 - Return prev 32bit low through this register.
+		%r21 - Kernel error code
+
+		If debugging is DISabled:
+
+		%r21 has the following meanings:
+
+		EAGAIN - CAS is busy, ldcw failed, try again.
+		EFAULT - Read or write failed.		
+
+		If debugging is enabled:
+
+		EDEADLOCK - CAS called recursively.
+		EAGAIN && r28 == 1 - CAS is busy. Lock contended.
+		EAGAIN && r28 == 2 - CAS is busy. ldcw failed.
+		EFAULT - Read or write failed.
+
+		Scratch: r20, r28, r1
+
+	****************************************************/
+
+	/* ELF32 Process entry path */
+lws_compare_and_swap_dword:
+#ifdef CONFIG_64BIT
+	/* Clip all the input registers */
+	depdi	0, 31, 32, %r26
+	/* Merge low/high bits */
+	shld	%r25, 32, %r24
+	shld	%r23, 32, %r22
+#else
+#error Not implemented
+#endif
+	/* Load start of lock table */
+	ldil	L%lws_lock_start, %r20
+	ldo	R%lws_lock_start(%r20), %r28
+
+	/* Extract four bits from r26 and hash lock (Bits 4-7) */
+	extru  %r26, 27, 4, %r20
+
+	/* Find the lock to use: the hash is one of 0 to 15,
+	   multiplied by 16 (to keep it 16-byte aligned) and
+	   added to the lock table offset. */
+	shlw	%r20, 4, %r20
+	add	%r20, %r28, %r20
+
+# if ENABLE_LWS_DEBUG
+	/*	
+		DEBUG, check for deadlock! 
+		If the thread register values are the same
+		then we were the one that locked it last and
+		this is a recursive call that will deadlock.
+		We *must* give up this call and fail.
+	*/
+	ldw	4(%sr2,%r20), %r28			/* Load thread register */
+	/* WARNING: If cr27 cycles to the same value we have problems */
+	mfctl	%cr27, %r21				/* Get current thread register */
+	cmpb,<>,n	%r21, %r28, cas_dword_lock	/* Called recursive? */
+	b	lws_exit				/* Return error! */
+	ldo	-EDEADLOCK(%r0), %r21
+cas_dword_lock:
+	cmpb,=,n	%r0, %r28, cas_dword_nocontend /* Is nobody using it? */
+	ldo	1(%r0), %r28				/* 1st case */
+	b	lws_exit				/* Contended... */
+	ldo	-EAGAIN(%r0), %r21			/* Spin in userspace */
+cas_dword_nocontend:
+# endif
+/* ENABLE_LWS_DEBUG */
+
+	rsm	PSW_SM_I, %r0				/* Disable interrupts */
+	/* COW breaks can cause contention on UP systems */
+	LDCW	0(%sr2,%r20), %r28			/* Try to acquire the lock */
+	cmpb,<>,n	%r0, %r28, cas_dword_action	/* Did we get it? */
+cas_dword_wouldblock:
+	ldo	2(%r0), %r28				/* 2nd case */
+	ssm	PSW_SM_I, %r0
+	b	lws_exit				/* Contended... */
+	ldo	-EAGAIN(%r0), %r21			/* Spin in userspace */
+
+	/*
+		prev = *addr;
+		if ( prev == old )
+		  *addr = new;
+		return prev;
+	*/
+
+	/* NOTES:
+		This all works because intr_do_signal
+		and schedule both check the return iasq
+		and see that we are on the kernel page,
+		so this process is never scheduled off
+		or ever sent any signal of any sort;
+		thus it is wholly atomic from userspace's
+		perspective.
+	*/
+cas_dword_action:
+#if defined CONFIG_SMP && ENABLE_LWS_DEBUG
+	/* DEBUG */
+	mfctl	%cr27, %r1
+	stw	%r1, 4(%sr2,%r20)
+#endif
+
+#ifdef CONFIG_64BIT
+	/* The load and store could fail */
+4:	ldd,ma	0(%sr3,%r26), %r29
+	sub,<>	%r29, %r24, %r0
+5:	std,ma	%r22, 0(%sr3,%r26)
+	/* Split the high/low bit of the result */
+	shrd	%r29,32,%r28
+	depdi	0, 31, 32, %r28
+#else
+#error Not implemented
+#endif
+
+	/* Free lock */
+	stw,ma	%r20, 0(%sr2,%r20)
+#if ENABLE_LWS_DEBUG
+	/* Clear thread register indicator */
+	stw	%r0, 4(%sr2,%r20)
+#endif
+	/* Enable interrupts */
+	ssm	PSW_SM_I, %r0
+	/* Return to userspace, set no error */
+	b	lws_exit
+	copy	%r0, %r21
+
+6:		
+	/* Error occurred on load or store */
+	/* Free lock */
+	stw	%r20, 0(%sr2,%r20)
+#if ENABLE_LWS_DEBUG
+	stw	%r0, 4(%sr2,%r20)
+#endif
+	ssm	PSW_SM_I, %r0
+	b	lws_exit
+	ldo	-EFAULT(%r0),%r21	/* set errno */
+	nop
+	nop
+	nop
+	nop
+
+	/* Two exception table entries, one for the loads,
+	   the other for the store. Either return -EFAULT.
+	   Each of the entries must be relocated. */
+	ASM_EXCEPTIONTABLE_ENTRY(4b-linux_gateway_page, 6b-linux_gateway_page)
+	ASM_EXCEPTIONTABLE_ENTRY(5b-linux_gateway_page, 6b-linux_gateway_page)
 
 	/* Make sure nothing else is placed on this page */
 	.align PAGE_SIZE
@@ -675,8 +830,9 @@
 	/* Light-weight-syscall table */
 	/* Start of lws table. */
 ENTRY(lws_table)
-	LWS_ENTRY(compare_and_swap32)	/* 0 - ELF32 Atomic compare and swap */
-	LWS_ENTRY(compare_and_swap64)	/* 1 - ELF64 Atomic compare and swap */
+	LWS_ENTRY(compare_and_swap32)		/* 0 - ELF32 Atomic 32bit compare and swap */
+	LWS_ENTRY(compare_and_swap64)		/* 1 - ELF64 Atomic 32bit compare and swap */
+	LWS_ENTRY(compare_and_swap_dword)	/* 2 - ELF32 Atomic 64bit compare and swap */
 END(lws_table)
 	/* End of lws table */
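
For readability, the new lws_compare_and_swap_dword path boils down to
roughly the C below (a paraphrase of the assembly above, not code from the
patch; ldcw_try_acquire() and lock_release() are stand-ins for the ldcw/stw
sequence, and the real code also disables interrupts around it):

  long lws_cas_dword (uint64_t *addr, uint64_t old, uint64_t new,
                      uint64_t *prev)
  {
    /* Hash bits 4-7 of the address into one of 16 locks, 16 bytes apart.  */
    void *lock = (char *) lws_lock_start
                 + (((unsigned long) addr >> 4) & 0xf) * 16;

    if (!ldcw_try_acquire (lock))
      return -EAGAIN;               /* busy: userspace simply retries */

    *prev = *addr;                  /* may fault -> -EFAULT */
    if (*prev == old)
      *addr = new;                  /* may fault -> -EFAULT */

    lock_release (lock);
    return 0;
  }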
 

[-- Attachment #3: gcc-atomic-buildins-64bit.patch --]
[-- Type: text/x-patch, Size: 5008 bytes --]

--- libgcc/config/pa/linux-atomic.c.orig	2014-07-16 19:29:28.670595484 +0000
+++ libgcc/config/pa/linux-atomic.c	2014-07-16 19:31:32.754003341 +0000
@@ -24,6 +24,8 @@
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
+#include <stdint.h>
+
 #define EFAULT  14 
 #define EBUSY   16
 #define ENOSYS 251 
@@ -75,6 +77,39 @@
   return lws_errno;
 }
 
+/* Kernel helper for compare-and-exchange a 64-bit value from ELF32.  */
+static inline long
+__kernel_cmpxchg_dword32 (int64_t oldval, int64_t newval, int64_t *mem)
+{
+  register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
+  register long lws_ret_h   asm("r28");
+  register long lws_ret_l   asm("r29");
+  register long lws_errno   asm("r21");
+  register int lws_old_h    asm("r25") = oldval >> 32;
+  register int lws_old_l    asm("r24") = oldval & 0xffffffff;
+  register int lws_new_h    asm("r23") = newval >> 32;
+  register int lws_new_l    asm("r22") = newval & 0xffffffff;
+  asm volatile (	"ble	0xb0(%%sr2, %%r0)	\n\t"
+			"ldi	%8, %%r20		\n\t"
+	: "=r" (lws_ret_h), "=r" (lws_ret_l), "=r" (lws_errno), "=r" (lws_mem),
+	  "=r" (lws_old_h), "=r" (lws_old_l), "=r" (lws_new_h), "=r" (lws_new_l)
+	: "i" (2), "3" (lws_mem), "4" (lws_old_h), "5" (lws_old_l), "6" (lws_new_h), "7" (lws_new_l)
+	: "r1", "r20", "r31", "memory"
+  );
+  if (__builtin_expect (lws_errno == -EFAULT || lws_errno == -ENOSYS, 0))
+    ABORT_INSTRUCTION;
+
+   int64_t lws_ret = ((int64_t)lws_ret_h << 32) | (int64_t)lws_ret_l;
+
+  /* If the kernel LWS call succeeded (lws_errno == 0), lws_ret contains
+     the old value from memory.  If this value is equal to OLDVAL, the
+     new value was written to memory.  If not, return -EBUSY.  */
+  if (!lws_errno && lws_ret != oldval)
+    lws_errno = -EBUSY;
+
+  return lws_errno;
+}
+
 #define HIDDEN __attribute__ ((visibility ("hidden")))
 
 /* Big endian masks  */
@@ -84,6 +119,28 @@
 #define MASK_1 0xffu
 #define MASK_2 0xffffu
 
+#define FETCH_AND_OP_DWORD(OP, PFX_OP, INF_OP)					\
+  int64_t HIDDEN								\
+  __sync_fetch_and_##OP##_8 (int64_t *ptr, int64_t val)				\
+  {										\
+    int64_t tmp;								\
+    int failure;								\
+										\
+    do {									\
+      tmp = *ptr;								\
+      failure = __kernel_cmpxchg_dword32 (tmp, PFX_OP (tmp INF_OP val), ptr);	\
+    } while (failure != 0);							\
+										\
+    return tmp;									\
+  }
+
+FETCH_AND_OP_DWORD (add,   , +)
+FETCH_AND_OP_DWORD (sub,   , -)
+FETCH_AND_OP_DWORD (or,    , |)
+FETCH_AND_OP_DWORD (and,   , &)
+FETCH_AND_OP_DWORD (xor,   , ^)
+FETCH_AND_OP_DWORD (nand, ~, &)
+
 #define FETCH_AND_OP_WORD(OP, PFX_OP, INF_OP)				\
   int HIDDEN								\
   __sync_fetch_and_##OP##_4 (int *ptr, int val)				\
@@ -147,6 +204,28 @@
 SUBWORD_SYNC_OP (xor,   , ^, unsigned char, 1, oldval)
 SUBWORD_SYNC_OP (nand, ~, &, unsigned char, 1, oldval)
 
+#define OP_AND_FETCH_DWORD(OP, PFX_OP, INF_OP)					\
+  int64_t HIDDEN								\
+  __sync_##OP##_and_fetch_8 (int64_t *ptr, int64_t val)				\
+  {										\
+    int64_t tmp;								\
+    int failure;								\
+										\
+    do {									\
+      tmp = *ptr;								\
+      failure = __kernel_cmpxchg_dword32 (tmp, PFX_OP (tmp INF_OP val), ptr);	\
+    } while (failure != 0);							\
+										\
+    return PFX_OP (tmp INF_OP val);						\
+  }
+
+OP_AND_FETCH_DWORD (add,   , +)
+OP_AND_FETCH_DWORD (sub,   , -)
+OP_AND_FETCH_DWORD (or,    , |)
+OP_AND_FETCH_DWORD (and,   , &)
+OP_AND_FETCH_DWORD (xor,   , ^)
+OP_AND_FETCH_DWORD (nand, ~, &)
+
 #define OP_AND_FETCH_WORD(OP, PFX_OP, INF_OP)				\
   int HIDDEN								\
   __sync_##OP##_and_fetch_4 (int *ptr, int val)				\
@@ -182,6 +261,26 @@
 SUBWORD_SYNC_OP (xor,   , ^, unsigned char, 1, newval)
 SUBWORD_SYNC_OP (nand, ~, &, unsigned char, 1, newval)
 
+int64_t HIDDEN
+__sync_val_compare_and_swap_8 (int64_t *ptr, int64_t oldval, int64_t newval)
+{
+  int64_t actual_oldval;
+  int fail;
+    
+  while (1)
+    {
+      actual_oldval = *ptr;
+
+      if (__builtin_expect (oldval != actual_oldval, 0))
+	return actual_oldval;
+
+      fail = __kernel_cmpxchg_dword32 (actual_oldval, newval, ptr);
+  
+      if (__builtin_expect (!fail, 1))
+	return actual_oldval;
+    }
+}
+
 int HIDDEN
 __sync_val_compare_and_swap_4 (int *ptr, int oldval, int newval)
 {
@@ -256,6 +355,20 @@
 SUBWORD_BOOL_CAS (unsigned short, 2)
 SUBWORD_BOOL_CAS (unsigned char,  1)
 
+int64_t HIDDEN
+__sync_lock_test_and_set_8 (int64_t *ptr, int64_t val)
+{
+  int64_t oldval;
+  int failure;
+
+  do {
+    oldval = *ptr;
+    failure = __kernel_cmpxchg_dword32 (oldval, val, ptr);
+  } while (failure != 0);
+
+  return oldval;
+}
+
 int HIDDEN
 __sync_lock_test_and_set_4 (int *ptr, int val)
 {
@@ -300,6 +413,7 @@
     *ptr = 0;								\
   }
 
-SYNC_LOCK_RELEASE (int,   4)
-SYNC_LOCK_RELEASE (short, 2)
-SYNC_LOCK_RELEASE (char,  1)
+SYNC_LOCK_RELEASE (int64_t, 8)
+SYNC_LOCK_RELEASE (int,     4)
+SYNC_LOCK_RELEASE (short,   2)
+SYNC_LOCK_RELEASE (char,    1)
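
Spelled out, FETCH_AND_OP_DWORD (add,  , +) above expands to roughly the
following (hand-expanded for clarity, not compiler output):

  int64_t HIDDEN
  __sync_fetch_and_add_8 (int64_t *ptr, int64_t val)
  {
    int64_t tmp;
    int failure;

    do
      {
        tmp = *ptr;                                        /* snapshot */
        failure = __kernel_cmpxchg_dword32 (tmp, tmp + val, ptr);
      }
    while (failure != 0);                                  /* retry until the CAS sticks */

    return tmp;                                            /* the old value */
  }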


* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-17 20:00 [RFC PATCH] 64bit LWS CAS Guy Martin
@ 2014-07-17 21:12 ` John David Anglin
  2014-07-17 23:27   ` John David Anglin
  2014-07-17 21:30 ` Helge Deller
  2014-07-18 14:49 ` John David Anglin
  2 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2014-07-17 21:12 UTC (permalink / raw)
  To: Guy Martin, linux-parisc

On 7/17/2014 4:00 PM, Guy Martin wrote:
> +#include <stdint.h>
> +
I would kill the above include and use long long instead of int64_t.  
Generally,
we don't want libgcc routines to depend on system includes.

SYNC_LOCK_RELEASE needs to be fixed to use __kernel_cmpxchg and 
__kernel_cmpxchg_dword32.
The current implementation won't be atomic for long long, and in glibc, 
a similar implementation caused
problems for the int type.  I've been testing a patch for the latter and 
will apply it soon.

Dave

-- 
John David Anglin    dave.anglin@bell.net
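
A minimal sketch of the kind of fix being described, for the 8-byte case
only (illustrative, not the change Dave actually committed):

  /* Release by atomically installing 0 through the kernel helper.  A plain
     "*ptr = 0" would be two separate word stores on 32-bit parisc and
     therefore not atomic.  Uses the __kernel_cmpxchg_dword32 helper from
     the RFC patch above.  */
  void HIDDEN
  __sync_lock_release_8 (int64_t *ptr)
  {
    int64_t oldval;
    int failure;

    do
      {
        oldval = *ptr;
        failure = __kernel_cmpxchg_dword32 (oldval, 0, ptr);
      }
    while (failure != 0);
  }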



* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-17 20:00 [RFC PATCH] 64bit LWS CAS Guy Martin
  2014-07-17 21:12 ` John David Anglin
@ 2014-07-17 21:30 ` Helge Deller
  2014-07-17 22:51   ` John David Anglin
  2014-07-18 14:49 ` John David Anglin
  2 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2014-07-17 21:30 UTC (permalink / raw)
  To: Guy Martin, linux-parisc

On 07/17/2014 10:00 PM, Guy Martin wrote:
+/* Kernel helper for compare-and-exchange a 64-bit value from ELF32.  */
+static inline long
+__kernel_cmpxchg_dword32 (int64_t oldval, int64_t newval, int64_t *mem)
+{
+  register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
+  register long lws_ret_h   asm("r28");
+  register long lws_ret_l   asm("r29");
+  register long lws_errno   asm("r21");
+  register int lws_old_h    asm("r25") = oldval >> 32;
+  register int lws_old_l    asm("r24") = oldval & 0xffffffff;
+  register int lws_new_h    asm("r23") = newval >> 32;
+  register int lws_new_l    asm("r22") = newval & 0xffffffff;
+  asm volatile (	"ble	0xb0(%%sr2, %%r0)	\n\t"
+			"ldi	%8, %%r20		\n\t"
+	: "=r" (lws_ret_h), "=r" (lws_ret_l), "=r" (lws_errno), "=r" (lws_mem),
+	  "=r" (lws_old_h), "=r" (lws_old_l), "=r" (lws_new_h), "=r" (lws_new_l)
+	: "i" (2), "3" (lws_mem), "4" (lws_old_h), "5" (lws_old_l), "6" (lws_new_h), "7" (lws_new_l)
+	: "r1", "r20", "r31", "memory"
+  );

Just a thought:
I'm not sure how well gcc optimizes the assignment of the 64-bit parameters to their final destination registers (r22-r25) with regard to the shifting and masking, but it might be worth checking whether gcc's built-in "%R2" operand functionality (sorry, I don't know the name of this feature!) can help here?

As an example, see the __put_kernel_asm64() macro in the kernel header arch/parisc/include/asm/uaccess.h:

#define __put_kernel_asm64(__val,ptr) do {                  \
        __asm__ __volatile__ (                              \
                "\n1:\tstw %2,0(%1)"                        \
                "\n2:\tstw %R2,4(%1)\n\t"                   \
                : "=r"(__pu_err)                            \
                : "r"(ptr), "r"(__val), "0"(__pu_err) \
                : "r1");

Helge


* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-17 21:30 ` Helge Deller
@ 2014-07-17 22:51   ` John David Anglin
  0 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2014-07-17 22:51 UTC (permalink / raw)
  To: Helge Deller; +Cc: Guy Martin, linux-parisc

On 17-Jul-14, at 5:30 PM, Helge Deller wrote:

> On 07/17/2014 10:00 PM, Guy Martin wrote:
> +/* Kernel helper for compare-and-exchange a 64-bit value from  
> ELF32.  */
> +static inline long
> +__kernel_cmpxchg_dword32 (int64_t oldval, int64_t newval, int64_t  
> *mem)
> +{
> +  register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
> +  register long lws_ret_h   asm("r28");
> +  register long lws_ret_l   asm("r29");
> +  register long lws_errno   asm("r21");
> +  register int lws_old_h    asm("r25") = oldval >> 32;
> +  register int lws_old_l    asm("r24") = oldval & 0xffffffff;
> +  register int lws_new_h    asm("r23") = newval >> 32;
> +  register int lws_new_l    asm("r22") = newval & 0xffffffff;
> +  asm volatile (	"ble	0xb0(%%sr2, %%r0)	\n\t"
> +			"ldi	%8, %%r20		\n\t"
> +	: "=r" (lws_ret_h), "=r" (lws_ret_l), "=r" (lws_errno),  
> "=r" (lws_mem),
> +	  "=r" (lws_old_h), "=r" (lws_old_l), "=r" (lws_new_h),  
> "=r" (lws_new_l)
> +	: "i" (2), "3" (lws_mem), "4" (lws_old_h), "5" (lws_old_l),  
> "6" (lws_new_h), "7" (lws_new_l)
> +	: "r1", "r20", "r31", "memory"
> +  );
>
> Just a thought:
> I'm not sure how good gcc optimizes the assignment of the 64bit  
> parameters to their final destination registers (r22-r25) with  
> regard to the shifting and masking, but it might be worth to check  
> if gcc's built-in "R2" functionality (sorry, I don't know the name  
> of this feature!) can help here?
>
> As an example see the __put_kernel_asm64() macro in the the kernel  
> header arch/parisc/include/asm/uaccess.h:
>
> #define __put_kernel_asm64(__val,ptr) do {                  \
>        __asm__ __volatile__ (                              \
>                "\n1:\tstw %2,0(%1)"                        \
>                "\n2:\tstw %R2,4(%1)\n\t"                   \
>                : "=r"(__pu_err)                            \
>                : "r"(ptr), "r"(__val), "0"(__pu_err) \
>                : "r1");


Maybe.  On the other hand, if mem were the third argument in the ble
call, it would likely be possible to write:

register long long lws_ret asm("r28");
register long long lws_old asm("r25");
register long long lws_new asm("r23");
register unsigned long lws_mem asm("r22");

This is consistent with the parisc calling conventions for 64-bit
objects.  If it works, the low part of lws_ret, etc., should
automatically get allocated to the correct register and the copy
done efficiently.  I have to say I have not tried it.

Dave
--
John David Anglin	dave.anglin@bell.net
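
What Dave describes might look roughly like the sketch below.  It assumes
gcc places each 64-bit register variable in the named register plus its
neighbour (as the 32-bit calling convention does for 64-bit arguments) and
that the kernel-side LWS entry is changed to take mem in %r22, so it is
untested guesswork rather than working code:

  static inline long
  __kernel_cmpxchg_dword (long long oldval, long long newval, long long *mem)
  {
    register long long lws_ret     asm("r28");                        /* prev value, register pair */
    register long      lws_errno   asm("r21");
    register long long lws_old     asm("r25") = oldval;
    register long long lws_new     asm("r23") = newval;
    register unsigned long lws_mem asm("r22") = (unsigned long) mem;  /* hypothetical new slot */

    asm volatile ("ble	0xb0(%%sr2, %%r0)	\n\t"
		  "ldi	%5, %%r20		\n\t"
	  : "=r" (lws_ret), "=r" (lws_errno),
	    "+r" (lws_old), "+r" (lws_new), "+r" (lws_mem)
	  : "i" (2)                          /* LWS index; the RFC uses 2 */
	  : "r1", "r20", "r31", "memory");

    if (__builtin_expect (lws_errno == -EFAULT || lws_errno == -ENOSYS, 0))
      ABORT_INSTRUCTION;

    /* As in the original: success means the old memory value matched.  */
    if (!lws_errno && lws_ret != oldval)
      lws_errno = -EBUSY;

    return lws_errno;
  }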





* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-17 21:12 ` John David Anglin
@ 2014-07-17 23:27   ` John David Anglin
  2014-07-17 23:57     ` John David Anglin
  0 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2014-07-17 23:27 UTC (permalink / raw)
  To: John David Anglin; +Cc: Guy Martin, linux-parisc

On 17-Jul-14, at 5:12 PM, John David Anglin wrote:

> SYNC_LOCK_RELEASE needs to be fixed to use __kernel_cmpxchg and  
> __kernel_cmpxchg_dword32.
> The current implementation won't be atomic for long long, and in  
> glibc, a similar implementation caused
> problems for the int type.  I've been testing a patch for the latter  
> and will apply it soon.

The change is now committed to GCC 4.10.

Dave
--
John David Anglin	dave.anglin@bell.net





* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-17 23:27   ` John David Anglin
@ 2014-07-17 23:57     ` John David Anglin
  0 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2014-07-17 23:57 UTC (permalink / raw)
  To: John David Anglin; +Cc: Guy Martin, linux-parisc

On 17-Jul-14, at 7:27 PM, John David Anglin wrote:

> On 17-Jul-14, at 5:12 PM, John David Anglin wrote:
>
>> SYNC_LOCK_RELEASE needs to be fixed to use __kernel_cmpxchg and  
>> __kernel_cmpxchg_dword32.
>> The current implementation won't be atomic for long long, and in  
>> glibc, a similar implementation caused
>> problems for the int type.  I've been testing a patch for the  
>> latter and will apply it soon.
>
> The change is now committed to GCC 4.10.

Also, see this change:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01242.html

We will need a DI mode define for the 64-bit sync operation.

This enables "future" support in libstdc++.

Dave
--
John David Anglin	dave.anglin@bell.net





* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-17 20:00 [RFC PATCH] 64bit LWS CAS Guy Martin
  2014-07-17 21:12 ` John David Anglin
  2014-07-17 21:30 ` Helge Deller
@ 2014-07-18 14:49 ` John David Anglin
  2014-07-18 21:55   ` Carlos O'Donell
  2 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2014-07-18 14:49 UTC (permalink / raw)
  To: Guy Martin, linux-parisc

On 7/17/2014 4:00 PM, Guy Martin wrote:
> +/* Kernel helper for compare-and-exchange a 64-bit value from ELF32.  */
> +static inline long
> +__kernel_cmpxchg_dword32 (int64_t oldval, int64_t newval, int64_t *mem)
I'm thinking we don't need the suffix "32".  If a 64-bit runtime is
ever developed, there would probably be a separate file for it, as on arm.

This discussion has got me thinking that the char and short
implementations may be broken (i.e., the CAS operation may clobber
data adjacent to the location being operated on).  I'm thinking we
need separate LWS calls for them.  Thoughts?

Dave

-- 
John David Anglin    dave.anglin@bell.net



* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-18 14:49 ` John David Anglin
@ 2014-07-18 21:55   ` Carlos O'Donell
  2014-07-18 22:12     ` John David Anglin
  0 siblings, 1 reply; 11+ messages in thread
From: Carlos O'Donell @ 2014-07-18 21:55 UTC (permalink / raw)
  To: John David Anglin; +Cc: Guy Martin, linux-parisc

On Fri, Jul 18, 2014 at 10:49 AM, John David Anglin
<dave.anglin@bell.net> wrote:
> On 7/17/2014 4:00 PM, Guy Martin wrote:
>>
>> +/* Kernel helper for compare-and-exchange a 64-bit value from ELF32.  */
>> +static inline long
>> +__kernel_cmpxchg_dword32 (int64_t oldval, int64_t newval, int64_t *mem)
>
> I'm thinking we don't need suffix "32".  If a 64-bit runtime is ever
> developed,
> there probably would be a separate file for it as on arm.
>
> This discussion has got me thinking that the char and short implementations
> may be broken (i.e,
> the CAS operation may clobber data adjacent to location being operated on).
> I'm thinking we need
> separate LWS calls for them.  Thoughts?

Yes, you need masking or short load variants. How did you test the
char and short implementations? ;-)

Cheers,
Carlos.


* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-18 21:55   ` Carlos O'Donell
@ 2014-07-18 22:12     ` John David Anglin
  2014-07-19  0:38       ` Carlos O'Donell
  0 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2014-07-18 22:12 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Guy Martin, linux-parisc

On 18-Jul-14, at 5:55 PM, Carlos O'Donell wrote:

>> This discussion has got me thinking that the char and short  
>> implementations
>> may be broken (i.e,
>> the CAS operation may clobber data adjacent to location being  
>> operated on).
>> I'm thinking we need
>> separate LWS calls for them.  Thoughts?
>
> Yes, you need masking or short load variants. How did you test the
> char and short implementations? ;-)

There is no testing beyond the general testing of atomics in GCC.   
There is masking
in the GCC code but I think this is racy.  Another thread might modify  
the masked data
during the CAS operation and this might be lost.  I've never seen it  
in the real world though.

The implementation was originally copied from arm.

Dave
--
John David Anglin	dave.anglin@bell.net





* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-18 22:12     ` John David Anglin
@ 2014-07-19  0:38       ` Carlos O'Donell
  2014-07-19  0:42         ` John David Anglin
  0 siblings, 1 reply; 11+ messages in thread
From: Carlos O'Donell @ 2014-07-19  0:38 UTC (permalink / raw)
  To: John David Anglin; +Cc: Guy Martin, linux-parisc

On Fri, Jul 18, 2014 at 6:12 PM, John David Anglin <dave.anglin@bell.net> wrote:
> There is no testing beyond the general testing of atomics in GCC.  There is
> masking
> in the GCC code but I think this is racy.  Another thread might modify the
> masked data
> during the CAS operation and this might be lost.  I've never seen it in the
> real world though.

The race is real IMO, but you're right that it might be hard to trigger.

My opinion is that the only way you can do this is to write new LWS
CAS variants that do half-word or byte loads, compares and stores. You
can't touch the surrounding data safely.

Cheers,
Carlos.
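
To make the concern concrete, here is one possible interleaving under the
current word-masking scheme (a hand-written illustration, not observed
output):

  /* Word W holds bytes {b0,b1,b2,b3}.  Thread A does a subword CAS on b0
     through the word-sized LWS CAS; thread B does an ordinary store to b1
     (no LWS lock involved).

       A: oldword = W;                        reads {b0,b1,b2,b3}
       A: newword = (oldword & ~MASK) | b0';  still carries the old b1
       A: LWS CAS: kernel ldw W   -> equals oldword, so the CAS proceeds
       B:           stb b1' -> b1             plain store lands here
       A: LWS CAS: kernel stw newword -> W    writes back the stale b1

     B's update to b1 is silently overwritten even though A never meant to
     touch it.  A byte/half-word LWS variant that stores only the addressed
     byte avoids this.  */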


* Re: [RFC PATCH] 64bit LWS CAS
  2014-07-19  0:38       ` Carlos O'Donell
@ 2014-07-19  0:42         ` John David Anglin
  0 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2014-07-19  0:42 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Guy Martin, linux-parisc

On 18-Jul-14, at 8:38 PM, Carlos O'Donell wrote:

> On Fri, Jul 18, 2014 at 6:12 PM, John David Anglin <dave.anglin@bell.net 
> > wrote:
>> There is no testing beyond the general testing of atomics in GCC.   
>> There is
>> masking
>> in the GCC code but I think this is racy.  Another thread might  
>> modify the
>> masked data
>> during the CAS operation and this might be lost.  I've never seen  
>> it in the
>> real world though.
>
> The race is real IMO, but you're right that it might be hard to  
> trigger.
>
> My opinion is that the only way you can do this is to write new LWS
> CAS variants that do half-word or byte loads, compares and stores. You
> can't touch the surrounding data safely.

I totally agree.

Dave
--
John David Anglin	dave.anglin@bell.net




