* [PATCH 00/12] x86: Ticket lock cleanup
From: Jeremy Fitzhardinge @ 2011-08-22 23:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>


Hi all,

This is a repost of the ticketlock cleanup I posted for the last merge window.

The main differences are:
 - put "asm volatile" into the locked unlock path
 - tidy up the xadd helper and move it to a common place
 - clean up the rest of cmpxchg.h to remove most of the 32/64 duplication

The main downside of removing the 32/64 cmpxchg/xadd duplication is that passing a
64-bit parameter to cmpxchg/xadd on a 32-bit system will generate a bad instruction
rather than a link error, but I don't think that's a huge issue.

As a bonus, I added some cmpxchg_flag() variants, which return a bool rather than
the old value; this saves the caller from having to do the comparison itself
(the cmpxchg instruction already sets the flags).  This is useful since many
callers of cmpxchg() only care whether it succeeded and never use the old value.
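
As an illustration, a minimal sketch of the intended call-site change (the
variable names are made up for the example):

        /* before: the caller has to redo the comparison on the return value */
        if (cmpxchg(&val, old, new) == old)
                /* we got it */;

        /* after: cmpxchg_flag() directly returns true on success */
        if (cmpxchg_flag(&val, old, new))
                /* we got it */;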

Thanks,
	J

Jeremy Fitzhardinge (12):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
    same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86: add xadd helper macro
  x86/ticketlock: use xadd helper
  x86/cmpxchg: linux/alternative.h has LOCK_PREFIX
  x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit.
  x86/cmpxchg: move 64-bit set64_bit() to match 32-bit
  x86/cmpxchg: unify cmpxchg into cmpxchg.h

 arch/x86/include/asm/cmpxchg.h        |  165 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_32.h     |  114 -----------------------
 arch/x86/include/asm/cmpxchg_64.h     |  131 --------------------------
 arch/x86/include/asm/spinlock.h       |  140 +++++++++------------------
 arch/x86/include/asm/spinlock_types.h |   22 ++++-
 5 files changed, 232 insertions(+), 340 deletions(-)

-- 
1.7.6


* [PATCH 01/15] x86/ticketlock: clean up types and accessors
From: Jeremy Fitzhardinge @ 2011-08-22 23:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t, which is the size of a spinlock ticket (i.e., enough
   bits to hold all the CPUs)
 - define struct arch_spinlock as a union containing the plain slock and
   the head and tail tickets
 - use head and tail to implement some of the spinlock predicates
 - make all ticket variables unsigned
 - use TICKET_SHIFT to form constants

Most of this will be used in later patches.
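
A worked illustration (not part of the patch): with NR_CPUS < 256 the tickets
are 8 bits each, so on little-endian x86 the low byte of slock is the head and
the next byte is the tail.  For example:

        arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;

        lock.slock = 0x0201;    /* byte 0 = 0x01, byte 1 = 0x02 */
        /* now lock.tickets.head == 1 and lock.tickets.tail == 2, */
        /* so head != tail and __ticket_spin_is_locked() returns 1. */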

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   24 ++++++++++--------------
 arch/x86/include/asm/spinlock_types.h |   20 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	unsigned short inc = 1 << TICKET_SHIFT;
 
 	asm volatile (
 		LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp, new;
+	unsigned int tmp, new;
 
 	asm volatile("movzwl %2, %0\n\t"
 		     "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	int inc = 0x00010000;
-	int tmp;
+	unsigned inc = 1 << TICKET_SHIFT;
+	unsigned tmp;
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 		     "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp;
-	int new;
+	unsigned tmp;
+	unsigned new;
 
 	asm volatile("movl %2,%0\n\t"
 		     "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
+#define TICKET_MASK	((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-	unsigned int slock;
+	union {
+		unsigned int slock;
+		struct __raw_tickets {
+			__ticket_t head, tail;
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.6


* [PATCH 02/15] x86/ticketlock: convert spin loop to C
From: Jeremy Fitzhardinge @ 2011-08-22 23:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8-bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes of the ticket pair, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However, benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need a barrier at the end of the lock loop to ensure that the
compiler doesn't move any instructions from within the locked region into
the region where we don't yet own the lock.
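
For reference (not something this patch adds), barrier() is just the usual
compiler barrier, roughly:

        /* roughly what include/linux/compiler-gcc.h already provides */
        #define barrier()       __asm__ __volatile__("" : : : "memory")

        /* no code is emitted, but the "memory" clobber stops the compiler
         * from caching memory values across it or moving memory accesses
         * past it, which is all the end of the spin loop needs. */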

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   58 +++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned short inc = 1 << TICKET_SHIFT;
-
-	asm volatile (
-		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
-		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
-		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
-		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
-		:
-		: "memory", "cc");
+	register union {
+		struct __raw_tickets tickets;
+		unsigned short slock;
+	} inc = { .slock = 1 << TICKET_SHIFT };
+
+	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+	for (;;) {
+		if (inc.tickets.head == inc.tickets.tail)
+			goto out;
+		cpu_relax();
+		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned inc = 1 << TICKET_SHIFT;
-	unsigned tmp;
+	__ticket_t tmp;
 
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-		     "movzwl %w0, %2\n\t"
-		     "shrl $16, %0\n\t"
-		     "1:\t"
-		     "cmpl %0, %2\n\t"
-		     "je 2f\n\t"
-		     "rep ; nop\n\t"
-		     "movzwl %1, %2\n\t"
-		     /* don't need lfence here, because loads are in-order */
-		     "jmp 1b\n"
-		     "2:"
-		     : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-		     :
-		     : "memory", "cc");
+	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+		     : "+r" (inc), "+m" (lock->slock)
+		     : : "memory", "cc");
+
+	tmp = inc;
+	inc >>= TICKET_SHIFT;
+
+	for (;;) {
+		if ((__ticket_t)inc == tmp)
+			goto out;
+		cpu_relax();
+		tmp = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.6


* [PATCH 03/15] x86/ticketlock: Use C for __ticket_spin_unlock
From: Jeremy Fitzhardinge @ 2011-08-22 23:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If we don't need to use a locked increment for the unlock, implement it in C.
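
In effect (a sketch of the combined result, not the literal hunks below), the
common case becomes a plain increment bracketed by compiler barriers; that is
safe because only the lock owner ever writes head, so no locked RMW is needed:

        static inline void sketch_ticket_unlock(arch_spinlock_t *lock)
        {
                barrier();              /* keep the critical section before the release */
                lock->tickets.head++;   /* only the owner writes head */
                barrier();              /* keep later code out of the critical section */
        }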

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   33 ++++++++++++++++++---------------
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..61feeb5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "incb %0"
+			      : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm volatile (LOCK_PREFIX "incw %0"
+			      : "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	lock->tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,14 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	barrier();		/* prevent reordering out of locked region */
+	__ticket_unlock_release(lock);
+	barrier();		/* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.6


* [PATCH 04/15] x86/ticketlock: make large and small ticket versions of spin_lock the same
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make the bulk of __ticket_spin_lock look identical for large and small
numbers of CPUs.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   23 ++++++++---------------
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 61feeb5..894753f 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	register union {
-		struct __raw_tickets tickets;
-		unsigned short slock;
-	} inc = { .slock = 1 << TICKET_SHIFT };
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
 	for (;;) {
-		if (inc.tickets.head == inc.tickets.tail)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned inc = 1 << TICKET_SHIFT;
-	__ticket_t tmp;
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->slock)
+		     : "+r" (inc), "+m" (lock->tickets)
 		     : : "memory", "cc");
 
-	tmp = inc;
-	inc >>= TICKET_SHIFT;
-
 	for (;;) {
-		if ((__ticket_t)inc == tmp)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		tmp = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
-- 
1.7.6


* [PATCH 05/15] x86/ticketlock: make __ticket_spin_lock common
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Aside from the particular form of the xadd instruction, the large- and
small-ticket versions of __ticket_spin_lock are identical.  So factor out
the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   42 ++++++++++++++++++--------------------
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 894753f..37c3b10 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets tickets = { .tail = 1 };
+
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+			      : "+r" (tickets), "+m" (lock->tickets)
+			      : : "memory", "cc");
+	else
+		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+			     : "+r" (tickets), "+m" (lock->tickets)
+			     : : "memory", "cc");
 
-	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+	return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+	register struct __raw_tickets inc;
+
+	inc = __ticket_spin_claim(lock);
 
 	for (;;) {
 		if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-	register struct __raw_tickets inc = { .tail = 1 };
-
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->tickets)
-		     : : "memory", "cc");
-
-	for (;;) {
-		if (inc.head == inc.tail)
-			goto out;
-		cpu_relax();
-		inc.head = ACCESS_ONCE(lock->tickets.head);
-	}
-out:	barrier();		/* make sure nothing creeps before the lock is taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned tmp;
-- 
1.7.6


* [PATCH 06/15] x86/ticketlock: make __ticket_spin_trylock common
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make the trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   49 +++++++--------------------------
 arch/x86/include/asm/spinlock_types.h |    6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 37c3b10..4857dee 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp, new;
-
-	asm volatile("movzwl %2, %0\n\t"
-		     "cmpb %h0,%b0\n\t"
-		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned tmp;
-	unsigned new;
-
-	asm volatile("movl %2,%0\n\t"
-		     "movl %0,%1\n\t"
-		     "roll $16, %0\n\t"
-		     "cmpl %0,%1\n\t"
-		     "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
+	arch_spinlock_t old, new;
+
+	old.tickets = ACCESS_ONCE(lock->tickets);
+	if (old.tickets.head != old.tickets.tail)
+		return 0;
+
+	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+	/* cmpxchg is a full barrier, so nothing can move before it */
+	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
 	union {
-		unsigned int slock;
+		__ticketpair_t head_tail;
 		struct __raw_tickets {
 			__ticket_t head, tail;
 		} tickets;
 	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.6


* [PATCH 07/15] x86: add xadd helper macro
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Add a common xadd implementation.

This has the side effect of generating a bad instruction if you try to
use it on a 64-bit value on a 32-bit system - but don't do that.
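
A usage sketch with a made-up counter; as with the underlying instruction, the
macro adds its second argument to *ptr and hands the old value of *ptr back in
that same argument:

        u32 counter = 10;
        u32 val = 5;

        xadd(&counter, val);
        /* counter is now 15, and val holds the pre-add value, 10 */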

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index a460fa0..b771cdb 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -1,5 +1,37 @@
+#ifndef ASM_X86_CMPXCHG_H
+#define ASM_X86_CMPXCHG_H
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
 # include "cmpxchg_64.h"
 #endif
+
+
+#define xadd(ptr, inc)							\
+	do {								\
+		switch (sizeof(*(ptr))) {				\
+		case 1:							\
+			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 2:							\
+			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 4:							\
+			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 8:							\
+			asm volatile (LOCK_PREFIX "xaddq %q0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		}							\
+	} while(0)
+
+#endif	/* ASM_X86_CMPXCHG_H */
-- 
1.7.6


* [PATCH 08/15] x86/ticketlock: use xadd helper
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4857dee..05fd59e 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -71,14 +71,7 @@ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spin
 {
 	register struct __raw_tickets tickets = { .tail = 1 };
 
-	if (sizeof(lock->tickets.head) == sizeof(u8))
-		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-			      : "+r" (tickets), "+m" (lock->tickets)
-			      : : "memory", "cc");
-	else
-		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
-			     : "+r" (tickets), "+m" (lock->tickets)
-			     : : "memory", "cc");
+	xadd(&lock->tickets, tickets);
 
 	return tickets;
 }
-- 
1.7.6


* [PATCH 09/15] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

LOCK_PREFIX comes from asm/alternative.h, not linux/bitops.h, so include
the former directly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_32.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 284a6e8..8d53b16 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -1,7 +1,7 @@
 #ifndef _ASM_X86_CMPXCHG_32_H
 #define _ASM_X86_CMPXCHG_32_H
 
-#include <linux/bitops.h> /* for LOCK_PREFIX */
+#include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
 /*
  * Note: if you use set64_bit(), __cmpxchg64(), or their variants, you
-- 
1.7.6


* [PATCH 10/15] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit.
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_32.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 8d53b16..0ef9b82 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -9,6 +9,7 @@
  */
 
 extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
 
 /*
  * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
@@ -84,8 +85,6 @@ static inline void set_64bit(volatile u64 *ptr, u64 value)
 		     : "memory");
 }
 
-extern void __cmpxchg_wrong_size(void);
-
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
-- 
1.7.6


* [PATCH 11/15] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_64.h |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 423ae58..aeaa610 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -3,11 +3,6 @@
 
 #include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
-static inline void set_64bit(volatile u64 *ptr, u64 val)
-{
-	*ptr = val;
-}
-
 extern void __xchg_wrong_size(void);
 extern void __cmpxchg_wrong_size(void);
 
@@ -66,6 +61,11 @@ extern void __cmpxchg_wrong_size(void);
 #define xchg(ptr, v)							\
 	__xchg((v), (ptr), sizeof(*ptr))
 
+static inline void set_64bit(volatile u64 *ptr, u64 val)
+{
+	*ptr = val;
+}
+
 #define __HAVE_ARCH_CMPXCHG 1
 
 /*
-- 
1.7.6


* [PATCH 12/15] x86/cmpxchg: unify cmpxchg into cmpxchg.h
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Everything that's actually common between the 32-bit and 64-bit headers is
moved into cmpxchg.h.

The common variants are derived from the 64-bit versions, so they
will emit 64-bit instructions if given 64-bit arguments, which will
be bad instructions on a 32-bit target.  Don't do that.
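
A sketch of the failure mode being warned about (the variable is hypothetical):

        u64 big = 0;

        /*
         * On a 32-bit build this now expands to cmpxchgq, which will not
         * assemble there:
         *
         *      cmpxchg(&big, 0ULL, 1ULL);
         *
         * For a 64-bit value on 32-bit, use cmpxchg64() instead (both
         * headers still provide it).
         */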

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h    |  133 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_32.h |  113 -------------------------------
 arch/x86/include/asm/cmpxchg_64.h |  131 ------------------------------------
 3 files changed, 133 insertions(+), 244 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index b771cdb..76375ba 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -1,6 +1,128 @@
 #ifndef ASM_X86_CMPXCHG_H
 #define ASM_X86_CMPXCHG_H
 
+#include <asm/alternative.h> /* Provides LOCK_PREFIX */
+
+extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
+ * Since this is generally used to protect other memory information, we
+ * use "asm volatile" and "memory" clobbers to prevent gcc from moving
+ * information around.
+ */
+#define __xchg(x, ptr, size)						\
+({									\
+	__typeof(*(ptr)) __x = (x);					\
+	switch (size) {							\
+	case 1:								\
+	{								\
+		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
+		asm volatile("xchgb %0,%1"				\
+			     : "=q" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 2:								\
+	{								\
+		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
+		asm volatile("xchgw %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 4:								\
+	{								\
+		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
+		asm volatile("xchgl %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 8:								\
+	{								\
+		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
+		asm volatile("xchgq %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	default:							\
+		__xchg_wrong_size();					\
+	}								\
+	__x;								\
+})
+
+#define xchg(ptr, v)							\
+	__xchg((v), (ptr), sizeof(*ptr))
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+({									\
+	__typeof__(*(ptr)) __ret;					\
+	__typeof__(*(ptr)) __old = (old);				\
+	__typeof__(*(ptr)) __new = (new);				\
+	switch (size) {							\
+	case 1:								\
+	{								\
+		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
+		asm volatile(lock "cmpxchgb %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "q" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 2:								\
+	{								\
+		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
+		asm volatile(lock "cmpxchgw %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 4:								\
+	{								\
+		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
+		asm volatile(lock "cmpxchgl %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 8:								\
+	{								\
+		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
+		asm volatile(lock "cmpxchgq %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	default:							\
+		__cmpxchg_wrong_size();					\
+	}								\
+	__ret;								\
+})
+
+#define __cmpxchg(ptr, old, new, size)					\
+	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
+
+#define __sync_cmpxchg(ptr, old, new, size)				\
+	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
+
+#define __cmpxchg_local(ptr, old, new, size)				\
+	__raw_cmpxchg((ptr), (old), (new), (size), "")
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
@@ -8,6 +130,17 @@
 #endif
 
 
+#ifdef __HAVE_ARCH_CMPXCHG
+#define cmpxchg(ptr, old, new)						\
+	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new)					\
+	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new)					\
+	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
+#endif
+
 #define xadd(ptr, inc)							\
 	do {								\
 		switch (sizeof(*(ptr))) {				\
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 0ef9b82..3b573f6 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -1,62 +1,11 @@
 #ifndef _ASM_X86_CMPXCHG_32_H
 #define _ASM_X86_CMPXCHG_32_H
 
-#include <asm/alternative.h> /* Provides LOCK_PREFIX */
-
 /*
  * Note: if you use set64_bit(), __cmpxchg64(), or their variants, you
  *       you need to test for the feature in boot_cpu_data.
  */
 
-extern void __xchg_wrong_size(void);
-extern void __cmpxchg_wrong_size(void);
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define __xchg(x, ptr, size)						\
-({									\
-	__typeof(*(ptr)) __x = (x);					\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile("xchgb %0,%1"				\
-			     : "=q" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile("xchgw %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile("xchgl %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__xchg_wrong_size();					\
-	}								\
-	__x;								\
-})
-
-#define xchg(ptr, v)							\
-	__xchg((v), (ptr), sizeof(*ptr))
-
 /*
  * CMPXCHG8B only writes to the target if we had the previous
  * value in registers, otherwise it acts as a read and gives us the
@@ -85,70 +34,8 @@ static inline void set_64bit(volatile u64 *ptr, u64 value)
 		     : "memory");
 }
 
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
-({									\
-	__typeof__(*(ptr)) __ret;					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "q" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__cmpxchg_wrong_size();					\
-	}								\
-	__ret;								\
-})
-
-#define __cmpxchg(ptr, old, new, size)					\
-	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
-#define __sync_cmpxchg(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
-
-#define __cmpxchg_local(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "")
-
 #ifdef CONFIG_X86_CMPXCHG
 #define __HAVE_ARCH_CMPXCHG 1
-
-#define cmpxchg(ptr, old, new)						\
-	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define sync_cmpxchg(ptr, old, new)					\
-	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define cmpxchg_local(ptr, old, new)					\
-	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
 #endif
 
 #ifdef CONFIG_X86_CMPXCHG64
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index aeaa610..9d6c147 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -1,66 +1,6 @@
 #ifndef _ASM_X86_CMPXCHG_64_H
 #define _ASM_X86_CMPXCHG_64_H
 
-#include <asm/alternative.h> /* Provides LOCK_PREFIX */
-
-extern void __xchg_wrong_size(void);
-extern void __cmpxchg_wrong_size(void);
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define __xchg(x, ptr, size)						\
-({									\
-	__typeof(*(ptr)) __x = (x);					\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile("xchgb %0,%1"				\
-			     : "=q" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile("xchgw %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile("xchgl %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 8:								\
-	{								\
-		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile("xchgq %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__xchg_wrong_size();					\
-	}								\
-	__x;								\
-})
-
-#define xchg(ptr, v)							\
-	__xchg((v), (ptr), sizeof(*ptr))
-
 static inline void set_64bit(volatile u64 *ptr, u64 val)
 {
 	*ptr = val;
@@ -68,77 +8,6 @@ static inline void set_64bit(volatile u64 *ptr, u64 val)
 
 #define __HAVE_ARCH_CMPXCHG 1
 
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
-({									\
-	__typeof__(*(ptr)) __ret;					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "q" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 8:								\
-	{								\
-		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile(lock "cmpxchgq %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__cmpxchg_wrong_size();					\
-	}								\
-	__ret;								\
-})
-
-#define __cmpxchg(ptr, old, new, size)					\
-	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
-#define __sync_cmpxchg(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
-
-#define __cmpxchg_local(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "")
-
-#define cmpxchg(ptr, old, new)						\
-	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define sync_cmpxchg(ptr, old, new)					\
-	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define cmpxchg_local(ptr, old, new)					\
-	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
-
 #define cmpxchg64(ptr, o, n)						\
 ({									\
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
-- 
1.7.6


* [PATCH 13/15] x86: add cmpxchg_flag() variant
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Most callers of cmpxchg() directly compare RETURN with OLD to see if the
operation was successful.  This results in unnecessary comparisons and
conditionals, since the cmpxchg instruction already sets the flags to
indicate success/failure.

Add cmpxchg_flag() variants which return a boolean flag directly indicating
success.  Unfortunately an asm() statement can't directly export the status
flags, but sete isn't too bad.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h    |   48 +++++++++++++++++++++++++++++++-----
 arch/x86/include/asm/cmpxchg_32.h |   14 ++++++++++
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 76375ba..57d6706 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -64,18 +64,18 @@ extern void __cmpxchg_wrong_size(void);
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
+ * determined by "compare"
  */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+#define __raw_cmpxchg_cmp(ptr, old, new, size, lock, rettype, compare)	\
 ({									\
-	__typeof__(*(ptr)) __ret;					\
+	rettype __ret;							\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
 	switch (size) {							\
 	case 1:								\
 	{								\
 		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
+		asm volatile(lock "cmpxchgb %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "q" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -84,7 +84,7 @@ extern void __cmpxchg_wrong_size(void);
 	case 2:								\
 	{								\
 		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
+		asm volatile(lock "cmpxchgw %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -93,7 +93,7 @@ extern void __cmpxchg_wrong_size(void);
 	case 4:								\
 	{								\
 		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
+		asm volatile(lock "cmpxchgl %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -102,7 +102,7 @@ extern void __cmpxchg_wrong_size(void);
 	case 8:								\
 	{								\
 		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile(lock "cmpxchgq %2,%1"			\
+		asm volatile(lock "cmpxchgq %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -114,15 +114,40 @@ extern void __cmpxchg_wrong_size(void);
 	__ret;								\
 })
 
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+	__raw_cmpxchg_cmp(ptr, old, new, size, lock, __typeof__(*(ptr)), "")
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by a true return.
+ */
+#define __raw_cmpxchg_flag(ptr, old, new, size, lock)			\
+	__raw_cmpxchg_cmp(ptr, old, new, size, lock, unsigned char, "sete %0")
+
 #define __cmpxchg(ptr, old, new, size)					\
 	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
 
+#define __cmpxchg_flag(ptr, old, new, size)				\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), LOCK_PREFIX)
+
 #define __sync_cmpxchg(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
 
+#define __sync_cmpxchg_flag(ptr, old, new, size)			\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), "lock; ")
+
 #define __cmpxchg_local(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "")
 
+#define __cmpxchg_local_flag(ptr, old, new, size)		\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), "")
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
@@ -134,11 +159,20 @@ extern void __cmpxchg_wrong_size(void);
 #define cmpxchg(ptr, old, new)						\
 	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
 
+#define cmpxchg_flag(ptr, old, new)					\
+	__cmpxchg_flag((ptr), (old), (new), sizeof(*ptr))
+
 #define sync_cmpxchg(ptr, old, new)					\
 	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
 
+#define sync_cmpxchg_flag(ptr, old, new)				\
+	__sync_cmpxchg_flag((ptr), (old), (new), sizeof(*ptr))
+
 #define cmpxchg_local(ptr, old, new)					\
 	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local_flag(ptr, old, new)				\
+	__cmpxchg_local_flag((ptr), (old), (new), sizeof(*ptr))
 #endif
 
 #define xadd(ptr, inc)							\
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 3b573f6..0797bc6 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -111,6 +111,13 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 				sizeof(*(ptr)));			\
 	__ret;								\
 })
+
+#define cmpxchg_flag(ptr, o, n)			\
+({						\
+	__typeof__(*(ptr)) __orig = (o);	\
+	cmpxchg((ptr), __orig, (n)) == __orig;	\
+})
+
 #define cmpxchg_local(ptr, o, n)					\
 ({									\
 	__typeof__(*(ptr)) __ret;					\
@@ -124,6 +131,13 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 				sizeof(*(ptr)));			\
 	__ret;								\
 })
+
+#define cmpxchg_local_flag(ptr, o, n)			\
+({							\
+	__typeof__(*(ptr)) __orig = (o);		\
+	cmpxchg_local((ptr), __orig, (n)) == __orig;	\
+})
+
 #endif
 
 #ifndef CONFIG_X86_CMPXCHG64
-- 
1.7.6


* [PATCH 14/15] x86/ticketlocks: use cmpxchg_flag for trylock
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 05fd59e..e208c8f 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -102,7 +102,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
 
 	/* cmpxchg is a full barrier, so nothing can move before it */
-	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
+	return cmpxchg_flag(&lock->head_tail, old.head_tail, new.head_tail);
 }
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-- 
1.7.6


* [PATCH 15/15] x86: use cmpxchg_flag() where applicable
From: Jeremy Fitzhardinge @ 2011-08-22 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cacheflush.h |    2 +-
 arch/x86/include/asm/rwsem.h      |    7 ++-----
 arch/x86/kernel/acpi/boot.c       |   10 ++++------
 arch/x86/kernel/cpu/mcheck/mce.c  |    2 +-
 arch/x86/xen/p2m.c                |   10 +++++-----
 5 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 4e12668..6ceae20 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -56,7 +56,7 @@ static inline void set_page_memtype(struct page *pg, unsigned long memtype)
 	do {
 		old_flags = pg->flags;
 		new_flags = (old_flags & _PGMT_CLEAR_MASK) | memtype_flags;
-	} while (cmpxchg(&pg->flags, old_flags, new_flags) != old_flags);
+	} while (!cmpxchg_flag(&pg->flags, old_flags, new_flags));
 }
 #else
 static inline unsigned long get_page_memtype(struct page *pg) { return -1; }
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index df4cd32..5a35263 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -126,11 +126,8 @@ static inline void __down_write(struct rw_semaphore *sem)
  */
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long ret = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-			   RWSEM_ACTIVE_WRITE_BIAS);
-	if (ret == RWSEM_UNLOCKED_VALUE)
-		return 1;
-	return 0;
+	return cmpxchg_flag(&sem->count, RWSEM_UNLOCKED_VALUE,
+			    RWSEM_ACTIVE_WRITE_BIAS);
 }
 
 /*
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9a966c5..6bf0bf8 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1669,22 +1669,20 @@ early_param("acpi_sci", setup_acpi_sci);
 
 int __acpi_acquire_global_lock(unsigned int *lock)
 {
-	unsigned int old, new, val;
+	unsigned int old, new;
 	do {
 		old = *lock;
 		new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
-		val = cmpxchg(lock, old, new);
-	} while (unlikely (val != old));
+	} while (unlikely (!cmpxchg_flag(lock, old, new)));
 	return (new < 3) ? -1 : 0;
 }
 
 int __acpi_release_global_lock(unsigned int *lock)
 {
-	unsigned int old, new, val;
+	unsigned int old, new;
 	do {
 		old = *lock;
 		new = old & ~0x3;
-		val = cmpxchg(lock, old, new);
-	} while (unlikely (val != old));
+	} while (unlikely (!cmpxchg_flag(lock, old, new)));
 	return old & 0x1;
 }
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3385ea2..99ae2d1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -198,7 +198,7 @@ void mce_log(struct mce *mce)
 		}
 		smp_rmb();
 		next = entry + 1;
-		if (cmpxchg(&mcelog.next, entry, next) == entry)
+		if (cmpxchg_flag(&mcelog.next, entry, next))
 			break;
 	}
 	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 141eb0d..5510082 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -448,7 +448,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_mid_init(mid);
 
-		if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
+		if (!cmpxchg_flag(top_p, p2m_mid_missing, mid))
 			free_p2m_page(mid);
 	}
 
@@ -470,7 +470,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
 		mid_mfn_mfn = virt_to_mfn(mid_mfn);
-		if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
+		if (!cmpxchg_flag(top_mfn_p, missing_mfn, mid_mfn_mfn))
 			free_p2m_page(mid_mfn);
 		else
 			p2m_top_mfn_p[topidx] = mid_mfn;
@@ -488,7 +488,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_init(p2m);
 
-		if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
+		if (!cmpxchg_flag(&mid[mididx], p2m_orig, p2m))
 			free_p2m_page(p2m);
 		else
 			mid_mfn[mididx] = virt_to_mfn(p2m);
@@ -600,8 +600,8 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 		/* Swap over from MISSING to IDENTITY if needed. */
 		if (p2m_top[topidx][mididx] == p2m_missing) {
-			WARN_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
-				p2m_identity) != p2m_missing);
+			WARN_ON(!cmpxchg_flag(&p2m_top[topidx][mididx], p2m_missing,
+					      p2m_identity));
 			return true;
 		}
 	}
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/15] x86: add xadd helper macro
  2011-08-22 23:15   ` [PATCH 07/15] x86: add xadd helper macro Jeremy Fitzhardinge
@ 2011-08-22 23:29     ` H. Peter Anvin
  2011-08-22 23:43       ` Jeremy Fitzhardinge
  2011-08-23 10:59     ` Peter Zijlstra
  1 sibling, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-22 23:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/22/2011 04:15 PM, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Add a common xadd implementation.
> 
> This has the side effect of generating a bad instruction if you try to
> use it on a 64-bit value on a 32-bit system - but don't do that.
> 

It would be better to barf at that point, so we get the error with a C
line... also, there needs to be a default clause with
__compiletime_error() in it.
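
A minimal sketch of what that default clause ends up looking like,
assuming the __compiletime_error() attribute wrapper from the compiler
headers (the helper name here is only illustrative):

	/*
	 * Never defined anywhere; any call that survives the switch
	 * becomes a build error pointing at the offending C line.
	 */
	extern void __xadd_wrong_size(void)
		__compiletime_error("Bad argument size for xadd");

with the size-specific asm arms left as in the patch and a final arm added:

	default:
		__xadd_wrong_size();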

There are a few additional xadd users which should be converted unless
I'm mistaken:

rwsem_atomic_update() in asm/rwsem.h.

atomic_add_return() in asm/atomic.h.

atomic64_add_return() in asm/atomic64_64.h.

atom_asr() in asm/uv/uv_bau.h (*VOMIT* - the UV people have created a
whole different type in private code...)

	-hpa

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/15] x86: add xadd helper macro
  2011-08-22 23:29     ` H. Peter Anvin
@ 2011-08-22 23:43       ` Jeremy Fitzhardinge
  2011-08-23  4:41         ` H. Peter Anvin
  0 siblings, 1 reply; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-22 23:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/22/2011 04:29 PM, H. Peter Anvin wrote:
> On 08/22/2011 04:15 PM, Jeremy Fitzhardinge wrote:
>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
>> Add a common xadd implementation.
>>
>> This has the side effect of generating a bad instruction if you try to
>> use it on a 64-bit value on a 32-bit system - but don't do that.
>>
> It would be better to barf at that point, so we get the error with a C
> line... also, there needs to be a default clause with
> __compiletime_error() in it.

OK, but the "standard of care" here is the old "calling undefined
function" link error; getting a bad asm instruction is a little more
helpful, in a sense.  But agreed on the default: case.  I'll see if I
can fix up xadd and cmpxchg to fail better as well.

> There are a few additional xadd users which should be converted unless
> I'm mistaken:
>
> rwsem_atomic_update() in asm/rwsem.h.
>
> atomic_add_return() in asm/atomic.h.
>
> atomic64_add_return() in asm/atomic64_64.h.
>
> atom_asr() in asm/uv/uv_bau.h (*VOMIT* - the UV people have created a
> whole different type in private code...)
>

Yeah, I hadn't done a full xadd audit.  I'll take a look.

    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/15] x86: add xadd helper macro
  2011-08-22 23:43       ` Jeremy Fitzhardinge
@ 2011-08-23  4:41         ` H. Peter Anvin
  0 siblings, 0 replies; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-23  4:41 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/22/2011 04:43 PM, Jeremy Fitzhardinge wrote:
> On 08/22/2011 04:29 PM, H. Peter Anvin wrote:
>> On 08/22/2011 04:15 PM, Jeremy Fitzhardinge wrote:
>>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>
>>> Add a common xadd implementation.
>>>
>>> This has the side effect of generating a bad instruction if you try to
>>> use it on a 64-bit value on a 32-bit system - but don't do that.
>>>
>> It would be better to barf at that point, so we get the error with a C
>> line... also, there needs to be a default clause with
>> __compiletime_error() in it.
> 
> OK, but the "standard of care" here is the old "calling undefined
> function" link error; getting a bad asm instruction is a little more
> helpful, in a sense.  But agreed on the default: case.  I'll see if I
> can fix up xadd and cmpxchg to fail better as well.
> 

__compiletime_error() is better than either, obviously...

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/15] x86: add xadd helper macro
  2011-08-22 23:15   ` [PATCH 07/15] x86: add xadd helper macro Jeremy Fitzhardinge
  2011-08-22 23:29     ` H. Peter Anvin
@ 2011-08-23 10:59     ` Peter Zijlstra
  2011-08-23 16:34       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2011-08-23 10:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge



What you could do is something like:

#define CASE_B	case 1
#define CASE_W	case 2
#define CASE_L	case 4
#ifdef CONFIG_64BIT
# define CASE_Q	case 8
#else
# define CASE_Q case -1 /* sizeof() won't ever return this */
#endif


> +#define xadd(ptr, inc)                                                 \
> +       do {                                                            \
> +               switch (sizeof(*(ptr))) {                               \
                 CASE_B:                                                 \
> +                       asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"     \
> +                                     : "+r" (inc), "+m" (*(ptr))       \
> +                                     : : "memory", "cc");              \
> +                       break;                                          \
		 CASE_W:                                                 \
> +                       asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"     \
> +                                     : "+r" (inc), "+m" (*(ptr))       \
> +                                     : : "memory", "cc");              \
> +                       break;                                          \
                 CASE_L:                                                 \
> +                       asm volatile (LOCK_PREFIX "xaddl %0, %1\n"      \
> +                                     : "+r" (inc), "+m" (*(ptr))       \
> +                                     : : "memory", "cc");              \
> +                       break;                                          \
                 CASE_Q:                                                 \
> +                       asm volatile (LOCK_PREFIX "xaddq %q0, %1\n"     \
> +                                     : "+r" (inc), "+m" (*(ptr))       \
> +                                     : : "memory", "cc");              \
> +                       break;                                          \
		 default:						 \
			 __xadd_wrong_size();				 \
> +               }                                                       \
> +       } while(0) 



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 07/15] x86: add xadd helper macro
  2011-08-23 10:59     ` Peter Zijlstra
@ 2011-08-23 16:34       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-23 16:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/23/2011 03:59 AM, Peter Zijlstra wrote:
>
> What you could do is something like:
>
> #define CASE_B	case 1
> #define CASE_W	case 2
> #define CASE_L	case 4
> #ifdef CONFIG_64BIT
> # define CASE_Q	case 8
> #else
> # define CASE_Q case -1 /* sizeof() won't ever return this */
> #endif

Yeah, that's OK, though I'll just make them plain constants so there's still
a case keyword visible in the switch.
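
Something along these lines, presumably (the constant names are only
illustrative, and the value used for the 8-byte case on 32-bit is one
sizeof() can never return, so that arm is simply dead there):

	#define __X86_CASE_B	1
	#define __X86_CASE_W	2
	#define __X86_CASE_L	4
	#ifdef CONFIG_64BIT
	# define __X86_CASE_Q	8
	#else
	# define __X86_CASE_Q	-1	/* sizeof() never returns this */
	#endif

so the switch arms read "case __X86_CASE_B:" and so on, keeping the case
keyword at each arm.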

    J


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-22 23:15   ` [PATCH 13/15] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
@ 2011-08-23 19:01     ` Christoph Lameter
  2011-08-23 19:22       ` H. Peter Anvin
  2011-08-23 19:53       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 40+ messages in thread
From: Christoph Lameter @ 2011-08-23 19:01 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Mon, 22 Aug 2011, Jeremy Fitzhardinge wrote:

> Most callers of cmpxchg() directly compare RETURN with OLD to see if it was
> successful.  This results in unnecessary conditional comparisons
> and conditionals since the cmpxchg instruction directly sets the flags
> to indicate success/failure.

> Add cmpxchg_flag() variants which return a boolean flag directly indicating
> success.  Unfortunately an asm() statement can't directly export the
> status flags, but sete isn't too bad.

And so what happens through this patch is that a cmp with a value that is
likely in a register is replaced by a sete. Is there really a benefit?

What I wish we would have is the actual use of the processor flag.

if (cmpxchg_flags(....)) {
}

where the cmpxchg is followed immediately by a jump. I tried in the past
to pass a goto label to cmpxchg but that did not work.




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 19:01     ` Christoph Lameter
@ 2011-08-23 19:22       ` H. Peter Anvin
  2011-08-23 19:52         ` Christoph Lameter
  2011-08-23 19:53       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-23 19:22 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jeremy Fitzhardinge, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/23/2011 12:01 PM, Christoph Lameter wrote:
> On Mon, 22 Aug 2011, Jeremy Fitzhardinge wrote:
> 
>> Most callers of cmpxchg() directly compare RETURN with OLD to see if it was
>> successful.  This results in unnecessary conditional comparisons
>> and conditionals since the cmpxchg instruction directly sets the flags
>> to indicate success/failure.
> 
>> Add cmpxchg_flag() variants which return a boolean flag directly indicating
>> success.  Unfortunately an asm() statement can't directly export the
>> status flags, but sete isn't too bad.
> 
> And so what happens through this patch is that a cmp with a value that is
> likely in a register is replaced by a sete. Is there really a benefit?
> 
> What I wish we would have is the actual use of the processor flag.
> 
> if (cmpxchg_flags(....)) {
> }
> 
> where the cmpxchg is followed immediately by a jump. I tried in the past
> to pass a goto label to cmpxchg but that did not work.
> 

Yes; asm goto can't have output values :(

I keep meaning to write up a proposal for the gcc people to output
arithmetic flag status.

	-hpa


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 19:22       ` H. Peter Anvin
@ 2011-08-23 19:52         ` Christoph Lameter
  2011-08-23 21:03           ` H. Peter Anvin
  0 siblings, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2011-08-23 19:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, 23 Aug 2011, H. Peter Anvin wrote:

> I keep meaning to write up a proposal for the gcc people to output
> arithmetic flag status.

Should be relatively straightforward if you make the processor flag a
builtin variable.

i.e. __processor_flag_zero, __processor_flag_gt, etc.

Then the inline cmpxchg functions could simply do a

	return __processor_flag_zero;

The code generator then needs to realize from the expression that we are
referring to the zero flag and insert the correct jxx instruction.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 19:01     ` Christoph Lameter
  2011-08-23 19:22       ` H. Peter Anvin
@ 2011-08-23 19:53       ` Jeremy Fitzhardinge
  2011-08-23 20:45         ` H. Peter Anvin
  1 sibling, 1 reply; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-23 19:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/23/2011 12:01 PM, Christoph Lameter wrote:
> On Mon, 22 Aug 2011, Jeremy Fitzhardinge wrote:
>
>> Most callers of cmpxchg() directly compare RETURN with OLD to see if it was
>> successful.  This results in unnecessary conditional comparisons
>> and conditionals since the cmpxchg instruction directly sets the flags
>> to indicate success/failure.
>> Add cmpxchg_flag() variants which return a boolean flag directly indicating
>> success.  Unfortunately an asm() statement can't directly export the
>> status flags, but sete isn't too bad.
> And so what happens through this patch is that a cmp with a value that is
> likely in a register is replaced by a sete. Is there really a benefit?

There's definitely a benefit for functions which return a bool resulting
from cmpxchg, which occurs in a few places.

> What I wish we would have is the actual use of the processor flag.
>
> if (cmpxchg_flags(....)) {
> }
>
> where the cmpxchg is followed immediately by a jump. I tried in the past
> to pass a goto label to cmpxchg but that did not work.

Yes, that would be ideal.  The closest you can get is asm goto(), but the
syntax for that would be awful; something like:

#define cmpxchg_jump(ptr, old, new, fail)\
	asm goto (...)


:
:

again:
	old = *thingp;
	new = frobulate(old);
	cmpxchg_jump(thingp, old, new, again);
	/* worked */


Would this be useful enough?

Also, cmpxchg_flag() is limited to arch/x86; I hadn't looked into what
it would take to make it a generic part of the cmpxchg API for the rest
of the kernel.

    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 19:53       ` Jeremy Fitzhardinge
@ 2011-08-23 20:45         ` H. Peter Anvin
  2011-08-23 22:15           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-23 20:45 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Christoph Lameter, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/23/2011 12:53 PM, Jeremy Fitzhardinge wrote:
> 
> Yes, that would be ideal.  The closest you can get is asm goto(), but the
> syntax for that would be awful; something like:
> 
> #define cmpxchg_jump(ptr, old, new, fail)\
> 	asm goto (...)
> 
> 
> :
> :
> 
> again:
> 	old = *thingp;
> 	new = frobulate(old);
> 	cmpxchg_jump(thingp, old, new, again);
> 	/* worked */
> 
> Would this be useful enough?
> 

Actually there is a trick:

static inline bool ....
{
	asm goto(... yes);
no:
	return false;
yes:
	return true;
}

... which makes syntax a heckuva lot less awkward.
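
Fleshed out for one operand size it might look roughly like the sketch
below.  This is illustrative only: one such inline is needed per size,
LOCK_PREFIX is the usual macro from asm/alternative.h, and the old value
is copied into %eax inside the asm because asm goto cannot take outputs:

	static inline bool __cmpxchg_flag_goto32(volatile unsigned int *ptr,
						 unsigned int old,
						 unsigned int new)
	{
		asm goto("movl %1, %%eax\n\t"
			 LOCK_PREFIX "cmpxchgl %2, %0\n\t"
			 "jne %l[failed]"
			 : /* no outputs allowed with asm goto */
			 : "m" (*ptr), "r" (old), "r" (new)
			 : "memory", "cc", "eax"
			 : failed);
		return true;		/* ZF set: our store happened */
	failed:
		return false;		/* lost the race */
	}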

	-hpa

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 19:52         ` Christoph Lameter
@ 2011-08-23 21:03           ` H. Peter Anvin
  0 siblings, 0 replies; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-23 21:03 UTC (permalink / raw)
  To: Christoph Lameter, H.J. Lu
  Cc: Jeremy Fitzhardinge, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/23/2011 12:52 PM, Christoph Lameter wrote:
> 
> Should be relatively straightforward if you make the processor flag a
> builtin variable.
> 
> i.e. __processor_flag_zero, __processor_flag__gt etc.
> 
> Then the inline cmpxchg functions could simply do a
> 
> 	return __processor_flag_zero;
> 
> The code generator then needs to realize from the expression that we are
> referring to the zero flag and insert the correct jxx instruction.
>

The right thing is probably to have not the *flags* but the *conditions*
(exposed as variables):

bool __builtin_x86_s,  __builtin_x86_ns;
bool __builtin_x86_pe, __builtin_x86_po;
...


Or perhaps even

register bool x86_s asm("s");?

H.J. thoughts/ideas?

	-hpa

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 20:45         ` H. Peter Anvin
@ 2011-08-23 22:15           ` Jeremy Fitzhardinge
  2011-08-23 22:43             ` H. Peter Anvin
  2011-08-24 13:53             ` Christoph Lameter
  0 siblings, 2 replies; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-23 22:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Lameter, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/23/2011 01:45 PM, H. Peter Anvin wrote:
> On 08/23/2011 12:53 PM, Jeremy Fitzhardinge wrote:
>> Yes, that would be ideal.  The closest you can get is asm goto(), but the
>> syntax for that would be awful; something like:
>>
>> #define cmpxchg_jump(ptr, old, new, fail)\
>> 	asm goto (...)
>>
>>
>> :
>> :
>>
>> again:
>> 	old = *thingp;
>> 	new = frobulate(old);
>> 	cmpxchg_jump(thingp, old, new, again);
>> 	/* worked */
>>
>> Would this be useful enough?
>>
> Actually there is a trick:
>
> static inline bool ....
> {
> 	asm goto(... yes);
> no:
> 	return false;
> yes:
> 	return true;
> }
>
> ... which makes syntax a heckuva lot less awkward.

Yeah, but you'd need to define an inline for each type, since the
function isn't polymorphic.  But it can be done with a macro.

However, having prototyped it, I dunno, it doesn't really seem like much
of a win for all the extra code it adds.  I just can't get too excited
about an extra test instruction adjacent to a monster like a locked
cmpxchg.  The jump variant avoids the test, but gcc still generates some
pretty bogus stuff:

        lock; cmpxchgq %rbx,(%rcx); jne .L88    # D.24853, MEM[(volatile u64 *)top_p_26],
# 0 "" 2
#NO_APP
        jmp     .L87    #
.L88:
        xorl    %esi, %esi      #
        movq    %rbx, %rdi      # D.24853,
        call    free_pages      #
.L87:
        addq    p2m_top_mfn(%rip), %r13 # p2m_top_mfn, D.24896
        movq    p2m_top_mfn_p(%rip), %rax       # p2m_top_mfn_p, p2m_top_mfn_p


and adding unlikely()s doesn't help at all.

    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 22:15           ` Jeremy Fitzhardinge
@ 2011-08-23 22:43             ` H. Peter Anvin
  2011-08-24 13:54               ` Christoph Lameter
  2011-08-24 13:53             ` Christoph Lameter
  1 sibling, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-23 22:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Christoph Lameter, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/23/2011 03:15 PM, Jeremy Fitzhardinge wrote:
> 
> However, having prototyped it, I dunno, it doesn't really seem like much
> of a win for all the extra code it adds.  I just can't get too excited
> about an extra test instruction adjacent to a monster like a locked
> cmpxchg.  The jump variant avoids the test, but gcc still generates some
> pretty bogus stuff:
> 

A compare is hardly a big cost, as you're quite correctly pointing out.

	-hpa


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 22:15           ` Jeremy Fitzhardinge
  2011-08-23 22:43             ` H. Peter Anvin
@ 2011-08-24 13:53             ` Christoph Lameter
  2011-08-24 16:33               ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 40+ messages in thread
From: Christoph Lameter @ 2011-08-24 13:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, 23 Aug 2011, Jeremy Fitzhardinge wrote:

> However, having prototyped it, I dunno, it doesn't really seem like much
> of a win for all the extra code it adds.  I just can't get too excited
> about an extra test instruction adjacent to a monster like a locked
> cmpxchg.  The jump variant avoids the test, but gcc still generates some
> pretty bogus stuff:

There are also unlocked cmpxchges in use. And if the cmpxchg is a 16 byte
cmpxchg (cmpxchg_double) then the comparison is getting more expensive.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-23 22:43             ` H. Peter Anvin
@ 2011-08-24 13:54               ` Christoph Lameter
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Lameter @ 2011-08-24 13:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, 23 Aug 2011, H. Peter Anvin wrote:

> On 08/23/2011 03:15 PM, Jeremy Fitzhardinge wrote:
> >
> > However, having prototyped it, I dunno, it doesn't really seem like much
> > of a win for all the extra code it adds.  I just can't get too excited
> > about an extra test instruction adjacent to a monster like a locked
> > cmpxchg.  The jump variant avoids the test, but gcc still generates some
> > pretty bogus stuff:
> >
>
> A compare is hardly a big cost, as you're quite correctly pointing out.

Could become relatively costly if the cmpxchg is not locked or the compare
involves comparing multiple words.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 13:53             ` Christoph Lameter
@ 2011-08-24 16:33               ` Jeremy Fitzhardinge
  2011-08-24 19:27                 ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 16:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 06:53 AM, Christoph Lameter wrote:
> On Tue, 23 Aug 2011, Jeremy Fitzhardinge wrote:
>
>> However, having prototyped it, I dunno, it doesn't really seem like much
>> of a win for all the extra code it adds.  I just can't get too excited
>> about an extra test instruction adjacent to a monster like a locked
>> cmpxchg.  The jump variant avoids the test, but gcc still generates some
>> pretty bogus stuff:
> There are also unlocked cmpxchges in use.

I only looked in arch/x86, but I didn't find any that were
straightforward candidates for cmpxchg_flag.

>  And if the cmpxchg is a 16 byte
> cmpxchg (cmpxchg_double) then the comparison is getting more expensive.

We're talking about the difference between cmpxchg_flag() - which does
a sete based on the flags set by cmpxchg - and a variant based on "asm
goto" which could, in principle, avoid the need for sete by allowing a
control flow statement to directly use the asm goto's conditional jump.
The performance of both is invariant wrt the cmpxchg argument size.
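
For concreteness, the sete-based flavour for the 4-byte case boils down
to something like this (a sketch of the idea, not the exact code in the
patch):

	static inline bool __cmpxchg_flag32(volatile unsigned int *ptr,
					    unsigned int old,
					    unsigned int new)
	{
		bool success;

		asm volatile(LOCK_PREFIX "cmpxchgl %3, %1\n\t"
			     "sete %0"
			     : "=q" (success), "+m" (*ptr), "+a" (old)
			     : "r" (new)
			     : "memory", "cc");
		return success;	/* ZF from cmpxchg, materialized by sete */
	}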

    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 16:33               ` Jeremy Fitzhardinge
@ 2011-08-24 19:27                 ` Christoph Lameter
  2011-08-24 20:15                   ` Jeremy Fitzhardinge
  2011-08-24 20:31                   ` H. Peter Anvin
  0 siblings, 2 replies; 40+ messages in thread
From: Christoph Lameter @ 2011-08-24 19:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, 24 Aug 2011, Jeremy Fitzhardinge wrote:

> I only looked in arch/x86, but I didn't find any that were
> straightforward candidates for cmpxchg_flag.

Look at core code: mm/slub.c

> >  And if the cmpxchg is a 16 byte
> > cmpxchg (cmpxchg_double) then the comparison is getting more expensive.
>
> We're talking about the difference between cmpxchg_flag() - which does
> a sete based on the flags set by cmpxchg - and a variant based on "asm
> goto" which could, in principle, avoid the need for sete by allowing a
> control flow statement to directly use the asm goto's conditional jump.
> The performance of both is invariant wrt the cmpxchg argument size.

Indeed for sete the size of the argument does not matter. Look at
percpu_cmpxchg_double() and cmpxchg_double() in arch/x86/include for some
of the functions I wrote. The sete is used to avoid the double word
comparisons that would otherwise have been necessary.

But still the solution with the flags would save another instruction and
the generated code would not be as ugly.  Not only do you have an
additional sete, you will then also have to check the result again.  This
means at least two additional instructions.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 19:27                 ` Christoph Lameter
@ 2011-08-24 20:15                   ` Jeremy Fitzhardinge
  2011-08-24 20:31                   ` H. Peter Anvin
  1 sibling, 0 replies; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 20:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 12:27 PM, Christoph Lameter wrote:
> On Wed, 24 Aug 2011, Jeremy Fitzhardinge wrote:
>
>> I only looked in arch/x86, but I didn't find any that were
>> straightforward candidates for cmpxchg_flag.
> Look at core code: mm/slub.c

These changes are currently only in arch/x86.  I haven't looked at
extending the cmpxchg API kernel-wide.

> But still the solution with the flags would save another instruction and
> the generated code would not be as ugly. For not only do you have an
> additional sete you will then also have to check the result again. This
> means at least two additional instruction.

Sure.  And the asm goto() variant avoids the sete and subsequent
gcc-generated test, but at the cost of generating a number of
unnecessary jumps - so it doesn't look like much of a win.  Perhaps a
hypothetical gcc extension to add "cc" as an output for asms would help,
but that's conjecture at this point.


    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 19:27                 ` Christoph Lameter
  2011-08-24 20:15                   ` Jeremy Fitzhardinge
@ 2011-08-24 20:31                   ` H. Peter Anvin
  2011-08-24 20:38                     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-24 20:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jeremy Fitzhardinge, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 12:27 PM, Christoph Lameter wrote:
> 
> Indeed for sete the size of the argument does not matter. Look at
> percpu_cmpxchg_double() and cmpxchg_double() in arch/x86/include for some
> of the functions I wrote. The sete is used to avoid the double word
> comparisons that would otherwise have been necessary.
> 
> But still the solution with the flags would save another instruction and
> the generated code would not be as ugly.  Not only do you have an
> additional sete, you will then also have to check the result again.  This
> means at least two additional instructions.
> 

The sete is actually more expensive than the compare for the single-word
case.  The double-word case is a different matter.

	-hpa



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 20:31                   ` H. Peter Anvin
@ 2011-08-24 20:38                     ` Jeremy Fitzhardinge
  2011-08-24 23:04                       ` H. Peter Anvin
  0 siblings, 1 reply; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 20:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Lameter, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 01:31 PM, H. Peter Anvin wrote:
> The sete is actually more expensive than the compare for the single-word
> case.  The double-word case is a different matter.

Hm, setcc has a longer latency but a higher throughput, is that
correct?  Still, it's recommended where you can use it to avoid a branch.

I was also thinking that using it reduces register pressure, since you
don't need to keep the "old" value around, so it dies sooner.

    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 20:38                     ` Jeremy Fitzhardinge
@ 2011-08-24 23:04                       ` H. Peter Anvin
  2011-08-24 23:11                         ` Linus Torvalds
  0 siblings, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-24 23:04 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Christoph Lameter, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 01:38 PM, Jeremy Fitzhardinge wrote:
> On 08/24/2011 01:31 PM, H. Peter Anvin wrote:
>> The sete is actually more expensive than the compare for the single-word
>> case.  The double-word case is a different matter.
> 
> Hm, setcc has a longer latency but a higher throughput, is that
> correct?  Still, it's recommended where you can use it to avoid a branch.

Yes, but you don't know when that is (the compiler does, of course.)

> I was also thinking that using it reduces register pressure, since you
> don't need to keep the "old" value around, so it dies sooner.

... but you burn a register for the intermediate flag value, so you're
just as bad off.

	-hpa

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 23:04                       ` H. Peter Anvin
@ 2011-08-24 23:11                         ` Linus Torvalds
  2011-08-24 23:19                           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2011-08-24 23:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Christoph Lameter, Peter Zijlstra,
	Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Nick Piggin, Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 4:04 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
>> I was also thinking that using it reduces register pressure, since you
>> don't need to keep the "old" value around, so it dies sooner.
>
> ... but you burn a register for the intermediate flag value, so you're
> just as bad off.

But that register has much shorter liveness - so I do agree that it *can* help.

Whether it actually *does* help is unclear. I do agree that we might
be better off without introducing yet another (questionable)
interface.

                     Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 23:11                         ` Linus Torvalds
@ 2011-08-24 23:19                           ` Jeremy Fitzhardinge
  2011-08-25 14:07                             ` Christoph Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 23:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Christoph Lameter, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 04:11 PM, Linus Torvalds wrote:
> On Wed, Aug 24, 2011 at 4:04 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> I was also thinking that using it reduces register pressure, since you
>>> don't need to keep the "old" value around, so it dies sooner.
>> ... but you burn a register for the intermediate flag value, so you're
>> just as bad off.
> But that register has much shorter liveness - so I do agree that it *can* help.
>
> Whether it actually *does* help is unclear. I do agree that we might
> be better off without introducing yet another (questionable)
> interface.

I think the interface is useful because it directly expresses what many
cmpxchg users want to know: "Did that work?"  There are very few users
which actually care what the "old" value was if it wasn't what they were
expecting.

There's 3(ish) ways it could be implemented, but I don't have a strong
opinion on them:

 1. with a direct compare, as people are doing now
 2. with sete to set a flag (clearly better for > wordsize arguments)
 3. with asm goto
 4. (with a hypothetical gcc extension which exposes condition codes)

My experiments with asm goto were not very promising at first glance,
but it *should* be the best of the lot for most users.  A hybrid 1/2
implementation would also be possible.

But the interface hides the implementation specifics, so it's not all
that important.

    J

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 13/15] x86: add cmpxchg_flag() variant
  2011-08-24 23:19                           ` Jeremy Fitzhardinge
@ 2011-08-25 14:07                             ` Christoph Lameter
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Lameter @ 2011-08-25 14:07 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, 24 Aug 2011, Jeremy Fitzhardinge wrote:

> But the interface hides the implementation specifics, so it's not all
> that important.

Ok, but could you make it consistent throughout? The cmpxchg_double
variants do not have the _flag suffix but already do a sete.  So either
rename those or come up with another convention?


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2011-08-25 14:07 UTC | newest]

Thread overview: 40+ messages
2011-08-22 23:14 [PATCH 00/12] x86: Ticket lock cleanup Jeremy Fitzhardinge
     [not found] ` <cover.1314054734.git.jeremy.fitzhardinge@citrix.com>
2011-08-22 23:14   ` [PATCH 01/15] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
2011-08-22 23:14   ` [PATCH 02/15] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
2011-08-22 23:14   ` [PATCH 03/15] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 04/15] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 05/15] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 06/15] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 07/15] x86: add xadd helper macro Jeremy Fitzhardinge
2011-08-22 23:29     ` H. Peter Anvin
2011-08-22 23:43       ` Jeremy Fitzhardinge
2011-08-23  4:41         ` H. Peter Anvin
2011-08-23 10:59     ` Peter Zijlstra
2011-08-23 16:34       ` Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 08/15] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 09/15] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 10/15] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 11/15] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 12/15] x86/cmpxchg: unify cmpxchg into cmpxchg.h Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 13/15] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
2011-08-23 19:01     ` Christoph Lameter
2011-08-23 19:22       ` H. Peter Anvin
2011-08-23 19:52         ` Christoph Lameter
2011-08-23 21:03           ` H. Peter Anvin
2011-08-23 19:53       ` Jeremy Fitzhardinge
2011-08-23 20:45         ` H. Peter Anvin
2011-08-23 22:15           ` Jeremy Fitzhardinge
2011-08-23 22:43             ` H. Peter Anvin
2011-08-24 13:54               ` Christoph Lameter
2011-08-24 13:53             ` Christoph Lameter
2011-08-24 16:33               ` Jeremy Fitzhardinge
2011-08-24 19:27                 ` Christoph Lameter
2011-08-24 20:15                   ` Jeremy Fitzhardinge
2011-08-24 20:31                   ` H. Peter Anvin
2011-08-24 20:38                     ` Jeremy Fitzhardinge
2011-08-24 23:04                       ` H. Peter Anvin
2011-08-24 23:11                         ` Linus Torvalds
2011-08-24 23:19                           ` Jeremy Fitzhardinge
2011-08-25 14:07                             ` Christoph Lameter
2011-08-22 23:15   ` [PATCH 14/15] x86/ticketlocks: use cmpxchg_flag for trylock Jeremy Fitzhardinge
2011-08-22 23:15   ` [PATCH 15/15] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
