* [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
@ 2011-08-24 17:52 Jeremy Fitzhardinge
  2011-08-24 17:52 ` [PATCH 01/18] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
                   ` (20 more replies)
  0 siblings, 21 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Hi all,

[ Change from last post: use PeterZ's idea to retain link-time
  reporting of badly sized arguments for xchg/cmpxchg/xadd. ]

This is a repost of the ticketlock cleanup I posted for the last merge window.

The main differences are:
 - put "asm volatiles" into the locked unlock path
 - tidy up xadd helper to put it into a common place
 - clean up the rest of cmpxchg.h to remove most of the 32/64 duplication

As a bonus, I added some cmpxchg_flag() variants, which return a bool
rather than the old value, saving the caller from having to do its own
comparison (since cmpxchg itself already sets the flags).  This is
useful because many callers of cmpxchg() only care whether it succeeded
and never use the old value.
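
For illustration, this is the shape of the call-site simplification the
later patches make (condensed from the __down_write_trylock() change in
patch 15):

	/* before: compare the returned old value by hand */
	return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
		       RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;

	/* after: cmpxchg_flag() reports success directly */
	return cmpxchg_flag(&sem->count, RWSEM_UNLOCKED_VALUE,
			    RWSEM_ACTIVE_WRITE_BIAS);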

This is also available at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/ticketlock-cleanup

Thanks,
	J

Jeremy Fitzhardinge (18):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
    same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86: add xadd helper macro
  x86/ticketlock: use xadd helper
  x86/cmpxchg: linux/alternative.h has LOCK_PREFIX
  x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit.
  x86/cmpxchg: move 64-bit set64_bit() to match 32-bit
  x86/cmpxchg: unify cmpxchg into cmpxchg.h
  x86: add cmpxchg_flag() variant
  x86/ticketlocks: use cmpxchg_flag for trylock
  x86: use cmpxchg_flag() where applicable
  x86: report xchg/cmpxchg/xadd usage errors consistently
  x86: add local and sync variants of xadd
  x86: use xadd helper more widely

 arch/x86/include/asm/atomic.h         |    4 +-
 arch/x86/include/asm/atomic64_64.h    |    4 +-
 arch/x86/include/asm/cacheflush.h     |    2 +-
 arch/x86/include/asm/cmpxchg.h        |  223 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_32.h     |  128 ++-----------------
 arch/x86/include/asm/cmpxchg_64.h     |  131 -------------------
 arch/x86/include/asm/rwsem.h          |   11 +--
 arch/x86/include/asm/spinlock.h       |  140 +++++++--------------
 arch/x86/include/asm/spinlock_types.h |   22 +++-
 arch/x86/include/asm/uv/uv_bau.h      |    4 +-
 arch/x86/kernel/acpi/boot.c           |   10 +-
 arch/x86/kernel/cpu/mcheck/mce.c      |    2 +-
 arch/x86/xen/p2m.c                    |   10 +-
 13 files changed, 321 insertions(+), 370 deletions(-)

-- 
1.7.6



* [PATCH 01/18] x86/ticketlock: clean up types and accessors
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
@ 2011-08-24 17:52 ` Jeremy Fitzhardinge
  2011-08-24 17:52 ` [PATCH 02/18] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

A few cleanups to the way spinlocks are defined and accessed:
 - Define __ticket_t, which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates
 - Make all ticket variables unsigned
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.
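
For reference, a rough user-space model of the new layout (not part of
the patch; it assumes the NR_CPUS < 256 case and x86's little-endian
byte order):

#include <stdint.h>
#include <stdio.h>

typedef uint8_t __ticket_t;			/* NR_CPUS < 256 */

typedef struct arch_spinlock {
	union {
		unsigned int slock;
		struct __raw_tickets {
			__ticket_t head, tail;
		} tickets;
	};
} arch_spinlock_t;

int main(void)
{
	arch_spinlock_t lock = { .slock = 0 };

	lock.tickets.head = 1;		/* ticket now being served */
	lock.tickets.tail = 2;		/* next ticket to hand out */

	/* head is the low byte of slock on little-endian x86 */
	printf("slock = %#x\n", lock.slock);	/* prints 0x201 */
	return 0;
}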

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   24 ++++++++++--------------
 arch/x86/include/asm/spinlock_types.h |   20 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	unsigned short inc = 1 << TICKET_SHIFT;
 
 	asm volatile (
 		LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp, new;
+	unsigned int tmp, new;
 
 	asm volatile("movzwl %2, %0\n\t"
 		     "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	int inc = 0x00010000;
-	int tmp;
+	unsigned inc = 1 << TICKET_SHIFT;
+	unsigned tmp;
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 		     "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp;
-	int new;
+	unsigned tmp;
+	unsigned new;
 
 	asm volatile("movl %2,%0\n\t"
 		     "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
+#define TICKET_MASK	((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-	unsigned int slock;
+	union {
+		unsigned int slock;
+		struct __raw_tickets {
+			__ticket_t head, tail;
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.6



* [PATCH 02/18] x86/ticketlock: convert spin loop to C
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
  2011-08-24 17:52 ` [PATCH 01/18] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
@ 2011-08-24 17:52 ` Jeremy Fitzhardinge
  2011-08-24 20:02   ` Andi Kleen
  2011-08-24 17:52 ` [PATCH 03/18] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However, benchmarking shows that this is actually a small runtime
improvement on some workloads, and never a slowdown.

We also need a barrier at the end of the lock loop to prevent the
compiler from moving any instructions from within the locked region
into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   58 +++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned short inc = 1 << TICKET_SHIFT;
-
-	asm volatile (
-		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
-		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
-		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
-		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
-		:
-		: "memory", "cc");
+	register union {
+		struct __raw_tickets tickets;
+		unsigned short slock;
+	} inc = { .slock = 1 << TICKET_SHIFT };
+
+	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+	for (;;) {
+		if (inc.tickets.head == inc.tickets.tail)
+			goto out;
+		cpu_relax();
+		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned inc = 1 << TICKET_SHIFT;
-	unsigned tmp;
+	__ticket_t tmp;
 
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-		     "movzwl %w0, %2\n\t"
-		     "shrl $16, %0\n\t"
-		     "1:\t"
-		     "cmpl %0, %2\n\t"
-		     "je 2f\n\t"
-		     "rep ; nop\n\t"
-		     "movzwl %1, %2\n\t"
-		     /* don't need lfence here, because loads are in-order */
-		     "jmp 1b\n"
-		     "2:"
-		     : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-		     :
-		     : "memory", "cc");
+	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+		     : "+r" (inc), "+m" (lock->slock)
+		     : : "memory", "cc");
+
+	tmp = inc;
+	inc >>= TICKET_SHIFT;
+
+	for (;;) {
+		if ((__ticket_t)inc == tmp)
+			goto out;
+		cpu_relax();
+		tmp = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.6



* [PATCH 03/18] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
  2011-08-24 17:52 ` [PATCH 01/18] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
  2011-08-24 17:52 ` [PATCH 02/18] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
@ 2011-08-24 17:52 ` Jeremy Fitzhardinge
  2011-08-24 18:01   ` Linus Torvalds
  2011-08-24 17:52 ` [PATCH 04/18] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   33 ++++++++++++++++++---------------
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..61feeb5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "incb %0"
+			      : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm volatile (LOCK_PREFIX "incw %0"
+			      : "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	lock->tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,14 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	barrier();		/* prevent reordering out of locked region */
+	__ticket_unlock_release(lock);
+	barrier();		/* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.6



* [PATCH 04/18] x86/ticketlock: make large and small ticket versions of spin_lock the same
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2011-08-24 17:52 ` [PATCH 03/18] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
@ 2011-08-24 17:52 ` Jeremy Fitzhardinge
  2011-08-24 17:52 ` [PATCH 05/18] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make the bulk of __ticket_spin_lock look identical for large and small
numbers of cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   23 ++++++++---------------
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 61feeb5..894753f 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	register union {
-		struct __raw_tickets tickets;
-		unsigned short slock;
-	} inc = { .slock = 1 << TICKET_SHIFT };
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
 	for (;;) {
-		if (inc.tickets.head == inc.tickets.tail)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned inc = 1 << TICKET_SHIFT;
-	__ticket_t tmp;
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->slock)
+		     : "+r" (inc), "+m" (lock->tickets)
 		     : : "memory", "cc");
 
-	tmp = inc;
-	inc >>= TICKET_SHIFT;
-
 	for (;;) {
-		if ((__ticket_t)inc == tmp)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		tmp = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
-- 
1.7.6



* [PATCH 05/18] x86/ticketlock: make __ticket_spin_lock common
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (3 preceding siblings ...)
  2011-08-24 17:52 ` [PATCH 04/18] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
@ 2011-08-24 17:52 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Aside from the particular form of the xadd instruction, the large- and
small-ticket versions of __ticket_spin_lock are identical.  So factor
out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   42 ++++++++++++++++++--------------------
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 894753f..37c3b10 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets tickets = { .tail = 1 };
+
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+			      : "+r" (tickets), "+m" (lock->tickets)
+			      : : "memory", "cc");
+	else
+		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+			     : "+r" (tickets), "+m" (lock->tickets)
+			     : : "memory", "cc");
 
-	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+	return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+	register struct __raw_tickets inc;
+
+	inc = __ticket_spin_claim(lock);
 
 	for (;;) {
 		if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-	register struct __raw_tickets inc = { .tail = 1 };
-
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->tickets)
-		     : : "memory", "cc");
-
-	for (;;) {
-		if (inc.head == inc.tail)
-			goto out;
-		cpu_relax();
-		inc.head = ACCESS_ONCE(lock->tickets.head);
-	}
-out:	barrier();		/* make sure nothing creeps before the lock is taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned tmp;
-- 
1.7.6



* [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (4 preceding siblings ...)
  2011-08-24 17:52 ` [PATCH 05/18] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 20:00   ` Andi Kleen
  2011-08-24 17:53 ` [PATCH 07/18] x86: add xadd helper macro Jeremy Fitzhardinge
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make trylock code common regardless of ticket size.
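
For reference, a rough user-space model of the resulting trylock (not
part of the patch; __sync_bool_compare_and_swap() stands in for the
kernel's cmpxchg(), returning a success bool the way the cmpxchg_flag()
conversion in patch 14 does, and the NR_CPUS < 256 types are assumed):

#include <stdint.h>
#include <stdio.h>

typedef uint8_t  __ticket_t;
typedef uint16_t __ticketpair_t;

#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)

typedef struct arch_spinlock {
	union {
		__ticketpair_t head_tail;
		struct __raw_tickets {
			__ticket_t head, tail;
		} tickets;
	};
} arch_spinlock_t;

static int trylock(arch_spinlock_t *lock)
{
	arch_spinlock_t old, new;

	old.tickets = lock->tickets;	/* ACCESS_ONCE() in the real code */
	if (old.tickets.head != old.tickets.tail)
		return 0;		/* lock is held */

	/* bump the tail: hand the next ticket to ourselves */
	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);

	return __sync_bool_compare_and_swap(&lock->head_tail,
					    old.head_tail, new.head_tail);
}

int main(void)
{
	arch_spinlock_t lock = { .head_tail = 0 };

	printf("first trylock:  %d\n", trylock(&lock));	/* 1: acquired */
	printf("second trylock: %d\n", trylock(&lock));	/* 0: already held */
	return 0;
}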

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   49 +++++++--------------------------
 arch/x86/include/asm/spinlock_types.h |    6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 37c3b10..4857dee 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp, new;
-
-	asm volatile("movzwl %2, %0\n\t"
-		     "cmpb %h0,%b0\n\t"
-		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned tmp;
-	unsigned new;
-
-	asm volatile("movl %2,%0\n\t"
-		     "movl %0,%1\n\t"
-		     "roll $16, %0\n\t"
-		     "cmpl %0,%1\n\t"
-		     "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
+	arch_spinlock_t old, new;
+
+	old.tickets = ACCESS_ONCE(lock->tickets);
+	if (old.tickets.head != old.tickets.tail)
+		return 0;
+
+	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+	/* cmpxchg is a full barrier, so nothing can move before it */
+	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
 	union {
-		unsigned int slock;
+		__ticketpair_t head_tail;
 		struct __raw_tickets {
 			__ticket_t head, tail;
 		} tickets;
 	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.6



* [PATCH 07/18] x86: add xadd helper macro
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (5 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 08/18] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Add a common xadd implementation.

This has the side effect of generating a bad instruction if you try to
use it on a 64-bit value on a 32-bit system - but don't do that.
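
For reference, a stand-alone user-space sketch of the semantics the
macro wraps (not part of the patch; x86-only): after the instruction,
the register operand holds the old value of the memory operand.

#include <stdio.h>

int main(void)
{
	unsigned int counter = 5;
	unsigned int inc = 3;

	/* same pattern as the xadd() helper's 4-byte case */
	asm volatile("lock; xaddl %0, %1"
		     : "+r" (inc), "+m" (counter)
		     : : "memory", "cc");

	/* counter is now 8, inc holds the old value 5 */
	printf("counter=%u inc=%u\n", counter, inc);
	return 0;
}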

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index a460fa0..b771cdb 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -1,5 +1,37 @@
+#ifndef ASM_X86_CMPXCHG_H
+#define ASM_X86_CMPXCHG_H
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
 # include "cmpxchg_64.h"
 #endif
+
+
+#define xadd(ptr, inc)							\
+	do {								\
+		switch (sizeof(*(ptr))) {				\
+		case 1:							\
+			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 2:							\
+			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 4:							\
+			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 8:							\
+			asm volatile (LOCK_PREFIX "xaddq %q0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		}							\
+	} while(0)
+
+#endif	/* ASM_X86_CMPXCHG_H */
-- 
1.7.6



* [PATCH 08/18] x86/ticketlock: use xadd helper
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (6 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 07/18] x86: add xadd helper macro Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 09/18] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4857dee..05fd59e 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -71,14 +71,7 @@ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spin
 {
 	register struct __raw_tickets tickets = { .tail = 1 };
 
-	if (sizeof(lock->tickets.head) == sizeof(u8))
-		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-			      : "+r" (tickets), "+m" (lock->tickets)
-			      : : "memory", "cc");
-	else
-		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
-			     : "+r" (tickets), "+m" (lock->tickets)
-			     : : "memory", "cc");
+	xadd(&lock->tickets, tickets);
 
 	return tickets;
 }
-- 
1.7.6



* [PATCH 09/18] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (7 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 08/18] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 10/18] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Not linux/bitops.h.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_32.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 284a6e8..8d53b16 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -1,7 +1,7 @@
 #ifndef _ASM_X86_CMPXCHG_32_H
 #define _ASM_X86_CMPXCHG_32_H
 
-#include <linux/bitops.h> /* for LOCK_PREFIX */
+#include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
 /*
  * Note: if you use set64_bit(), __cmpxchg64(), or their variants, you
-- 
1.7.6



* [PATCH 10/18] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit.
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (8 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 09/18] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 11/18] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit Jeremy Fitzhardinge
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_32.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 8d53b16..0ef9b82 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -9,6 +9,7 @@
  */
 
 extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
 
 /*
  * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
@@ -84,8 +85,6 @@ static inline void set_64bit(volatile u64 *ptr, u64 value)
 		     : "memory");
 }
 
-extern void __cmpxchg_wrong_size(void);
-
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
-- 
1.7.6



* [PATCH 11/18] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (9 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 10/18] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 12/18] x86/cmpxchg: unify cmpxchg into cmpxchg.h Jeremy Fitzhardinge
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_64.h |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 423ae58..aeaa610 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -3,11 +3,6 @@
 
 #include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
-static inline void set_64bit(volatile u64 *ptr, u64 val)
-{
-	*ptr = val;
-}
-
 extern void __xchg_wrong_size(void);
 extern void __cmpxchg_wrong_size(void);
 
@@ -66,6 +61,11 @@ extern void __cmpxchg_wrong_size(void);
 #define xchg(ptr, v)							\
 	__xchg((v), (ptr), sizeof(*ptr))
 
+static inline void set_64bit(volatile u64 *ptr, u64 val)
+{
+	*ptr = val;
+}
+
 #define __HAVE_ARCH_CMPXCHG 1
 
 /*
-- 
1.7.6



* [PATCH 12/18] x86/cmpxchg: unify cmpxchg into cmpxchg.h
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (10 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 11/18] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 13/18] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Everything that's actually common between 32 and 64-bit is moved into
cmpxchg.h.

The common variants are derived from the 64-bit versions, so they
will emit 64-bit instructions if given 64-bit arguments, which will
be bad instructions on a 32-bit target.  Don't do that.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h    |  133 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_32.h |  113 -------------------------------
 arch/x86/include/asm/cmpxchg_64.h |  131 ------------------------------------
 3 files changed, 133 insertions(+), 244 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index b771cdb..76375ba 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -1,6 +1,128 @@
 #ifndef ASM_X86_CMPXCHG_H
 #define ASM_X86_CMPXCHG_H
 
+#include <asm/alternative.h> /* Provides LOCK_PREFIX */
+
+extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
+ * Since this is generally used to protect other memory information, we
+ * use "asm volatile" and "memory" clobbers to prevent gcc from moving
+ * information around.
+ */
+#define __xchg(x, ptr, size)						\
+({									\
+	__typeof(*(ptr)) __x = (x);					\
+	switch (size) {							\
+	case 1:								\
+	{								\
+		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
+		asm volatile("xchgb %0,%1"				\
+			     : "=q" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 2:								\
+	{								\
+		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
+		asm volatile("xchgw %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 4:								\
+	{								\
+		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
+		asm volatile("xchgl %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 8:								\
+	{								\
+		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
+		asm volatile("xchgq %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	default:							\
+		__xchg_wrong_size();					\
+	}								\
+	__x;								\
+})
+
+#define xchg(ptr, v)							\
+	__xchg((v), (ptr), sizeof(*ptr))
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+({									\
+	__typeof__(*(ptr)) __ret;					\
+	__typeof__(*(ptr)) __old = (old);				\
+	__typeof__(*(ptr)) __new = (new);				\
+	switch (size) {							\
+	case 1:								\
+	{								\
+		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
+		asm volatile(lock "cmpxchgb %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "q" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 2:								\
+	{								\
+		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
+		asm volatile(lock "cmpxchgw %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 4:								\
+	{								\
+		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
+		asm volatile(lock "cmpxchgl %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case 8:								\
+	{								\
+		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
+		asm volatile(lock "cmpxchgq %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	default:							\
+		__cmpxchg_wrong_size();					\
+	}								\
+	__ret;								\
+})
+
+#define __cmpxchg(ptr, old, new, size)					\
+	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
+
+#define __sync_cmpxchg(ptr, old, new, size)				\
+	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
+
+#define __cmpxchg_local(ptr, old, new, size)				\
+	__raw_cmpxchg((ptr), (old), (new), (size), "")
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
@@ -8,6 +130,17 @@
 #endif
 
 
+#ifdef __HAVE_ARCH_CMPXCHG
+#define cmpxchg(ptr, old, new)						\
+	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new)					\
+	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new)					\
+	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
+#endif
+
 #define xadd(ptr, inc)							\
 	do {								\
 		switch (sizeof(*(ptr))) {				\
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 0ef9b82..3b573f6 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -1,62 +1,11 @@
 #ifndef _ASM_X86_CMPXCHG_32_H
 #define _ASM_X86_CMPXCHG_32_H
 
-#include <asm/alternative.h> /* Provides LOCK_PREFIX */
-
 /*
  * Note: if you use set64_bit(), __cmpxchg64(), or their variants, you
  *       you need to test for the feature in boot_cpu_data.
  */
 
-extern void __xchg_wrong_size(void);
-extern void __cmpxchg_wrong_size(void);
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define __xchg(x, ptr, size)						\
-({									\
-	__typeof(*(ptr)) __x = (x);					\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile("xchgb %0,%1"				\
-			     : "=q" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile("xchgw %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile("xchgl %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__xchg_wrong_size();					\
-	}								\
-	__x;								\
-})
-
-#define xchg(ptr, v)							\
-	__xchg((v), (ptr), sizeof(*ptr))
-
 /*
  * CMPXCHG8B only writes to the target if we had the previous
  * value in registers, otherwise it acts as a read and gives us the
@@ -85,70 +34,8 @@ static inline void set_64bit(volatile u64 *ptr, u64 value)
 		     : "memory");
 }
 
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
-({									\
-	__typeof__(*(ptr)) __ret;					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "q" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__cmpxchg_wrong_size();					\
-	}								\
-	__ret;								\
-})
-
-#define __cmpxchg(ptr, old, new, size)					\
-	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
-#define __sync_cmpxchg(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
-
-#define __cmpxchg_local(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "")
-
 #ifdef CONFIG_X86_CMPXCHG
 #define __HAVE_ARCH_CMPXCHG 1
-
-#define cmpxchg(ptr, old, new)						\
-	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define sync_cmpxchg(ptr, old, new)					\
-	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define cmpxchg_local(ptr, old, new)					\
-	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
 #endif
 
 #ifdef CONFIG_X86_CMPXCHG64
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index aeaa610..9d6c147 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -1,66 +1,6 @@
 #ifndef _ASM_X86_CMPXCHG_64_H
 #define _ASM_X86_CMPXCHG_64_H
 
-#include <asm/alternative.h> /* Provides LOCK_PREFIX */
-
-extern void __xchg_wrong_size(void);
-extern void __cmpxchg_wrong_size(void);
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define __xchg(x, ptr, size)						\
-({									\
-	__typeof(*(ptr)) __x = (x);					\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile("xchgb %0,%1"				\
-			     : "=q" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile("xchgw %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile("xchgl %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 8:								\
-	{								\
-		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile("xchgq %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__xchg_wrong_size();					\
-	}								\
-	__x;								\
-})
-
-#define xchg(ptr, v)							\
-	__xchg((v), (ptr), sizeof(*ptr))
-
 static inline void set_64bit(volatile u64 *ptr, u64 val)
 {
 	*ptr = val;
@@ -68,77 +8,6 @@ static inline void set_64bit(volatile u64 *ptr, u64 val)
 
 #define __HAVE_ARCH_CMPXCHG 1
 
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
-({									\
-	__typeof__(*(ptr)) __ret;					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "q" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 8:								\
-	{								\
-		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile(lock "cmpxchgq %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__cmpxchg_wrong_size();					\
-	}								\
-	__ret;								\
-})
-
-#define __cmpxchg(ptr, old, new, size)					\
-	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
-#define __sync_cmpxchg(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
-
-#define __cmpxchg_local(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "")
-
-#define cmpxchg(ptr, old, new)						\
-	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define sync_cmpxchg(ptr, old, new)					\
-	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define cmpxchg_local(ptr, old, new)					\
-	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
-
 #define cmpxchg64(ptr, o, n)						\
 ({									\
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
-- 
1.7.6



* [PATCH 13/18] x86: add cmpxchg_flag() variant
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (11 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 12/18] x86/cmpxchg: unify cmpxchg into cmpxchg.h Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 14/18] x86/ticketlocks: use cmpxchg_flag for trylock Jeremy Fitzhardinge
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Most callers of cmpxchg() directly compare RETURN with OLD to see if it
was successful.  This results in unnecessary comparisons and conditional
branches, since the cmpxchg instruction already sets the flags to
indicate success/failure.

Add cmpxchg_flag() variants which return a boolean flag directly
indicating success.  Unfortunately an asm() statement can't directly
export the status flags, but sete isn't too bad.
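
For reference, a stand-alone user-space sketch of the sete trick (not
part of the patch; x86-only, and the operand layout is simplified
relative to __raw_cmpxchg_flag()):

#include <stdio.h>

int main(void)
{
	unsigned int val = 5;
	unsigned int oldval = 5, newval = 9;
	unsigned char ok;

	/* cmpxchg sets ZF; sete turns it into a bool, no extra compare */
	asm volatile("lock; cmpxchgl %3, %1; sete %0"
		     : "=q" (ok), "+m" (val), "+a" (oldval)
		     : "r" (newval)
		     : "memory", "cc");

	/* ok=1 and val=9: the compare-and-swap succeeded */
	printf("ok=%u val=%u\n", ok, val);
	return 0;
}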

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h    |   48 +++++++++++++++++++++++++++++++-----
 arch/x86/include/asm/cmpxchg_32.h |   14 ++++++++++
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 76375ba..57d6706 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -64,18 +64,18 @@ extern void __cmpxchg_wrong_size(void);
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
+ * determined by "compare"
  */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+#define __raw_cmpxchg_cmp(ptr, old, new, size, lock, rettype, compare)	\
 ({									\
-	__typeof__(*(ptr)) __ret;					\
+	rettype __ret;							\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
 	switch (size) {							\
 	case 1:								\
 	{								\
 		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
+		asm volatile(lock "cmpxchgb %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "q" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -84,7 +84,7 @@ extern void __cmpxchg_wrong_size(void);
 	case 2:								\
 	{								\
 		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
+		asm volatile(lock "cmpxchgw %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -93,7 +93,7 @@ extern void __cmpxchg_wrong_size(void);
 	case 4:								\
 	{								\
 		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
+		asm volatile(lock "cmpxchgl %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -102,7 +102,7 @@ extern void __cmpxchg_wrong_size(void);
 	case 8:								\
 	{								\
 		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile(lock "cmpxchgq %2,%1"			\
+		asm volatile(lock "cmpxchgq %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -114,15 +114,40 @@ extern void __cmpxchg_wrong_size(void);
 	__ret;								\
 })
 
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+	__raw_cmpxchg_cmp(ptr, old, new, size, lock, __typeof__(*(ptr)), "")
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by a true return.
+ */
+#define __raw_cmpxchg_flag(ptr, old, new, size, lock)			\
+	__raw_cmpxchg_cmp(ptr, old, new, size, lock, unsigned char, "sete %0")
+
 #define __cmpxchg(ptr, old, new, size)					\
 	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
 
+#define __cmpxchg_flag(ptr, old, new, size)				\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), LOCK_PREFIX)
+
 #define __sync_cmpxchg(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
 
+#define __sync_cmpxchg_flag(ptr, old, new, size)			\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), "lock; ")
+
 #define __cmpxchg_local(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "")
 
+#define __cmpxchg_local_flag(ptr, old, new, size)		\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), "")
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
@@ -134,11 +159,20 @@ extern void __cmpxchg_wrong_size(void);
 #define cmpxchg(ptr, old, new)						\
 	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
 
+#define cmpxchg_flag(ptr, old, new)					\
+	__cmpxchg_flag((ptr), (old), (new), sizeof(*ptr))
+
 #define sync_cmpxchg(ptr, old, new)					\
 	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
 
+#define sync_cmpxchg_flag(ptr, old, new)				\
+	__sync_cmpxchg_flag((ptr), (old), (new), sizeof(*ptr))
+
 #define cmpxchg_local(ptr, old, new)					\
 	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local_flag(ptr, old, new)				\
+	__cmpxchg_local_flag((ptr), (old), (new), sizeof(*ptr))
 #endif
 
 #define xadd(ptr, inc)							\
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 3b573f6..0797bc6 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -111,6 +111,13 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 				sizeof(*(ptr)));			\
 	__ret;								\
 })
+
+#define cmpxchg_flag(ptr, o, n)			\
+({						\
+	__typeof__(*(ptr)) __orig = (o);	\
+	cmpxchg((ptr), __orig, (n)) == __orig;	\
+})
+
 #define cmpxchg_local(ptr, o, n)					\
 ({									\
 	__typeof__(*(ptr)) __ret;					\
@@ -124,6 +131,13 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 				sizeof(*(ptr)));			\
 	__ret;								\
 })
+
+#define cmpxchg_local_flag(ptr, o, n)			\
+({							\
+	__typeof__(*(ptr)) __orig = (o);		\
+	cmpxchg_local((ptr), __orig, (n)) == __orig;	\
+})
+
 #endif
 
 #ifndef CONFIG_X86_CMPXCHG64
-- 
1.7.6



* [PATCH 14/18] x86/ticketlocks: use cmpxchg_flag for trylock
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (12 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 13/18] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 15/18] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 05fd59e..e208c8f 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -102,7 +102,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
 
 	/* cmpxchg is a full barrier, so nothing can move before it */
-	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
+	return cmpxchg_flag(&lock->head_tail, old.head_tail, new.head_tail);
 }
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-- 
1.7.6



* [PATCH 15/18] x86: use cmpxchg_flag() where applicable
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (13 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 14/18] x86/ticketlocks: use cmpxchg_flag for trylock Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 16/18] x86: report xchg/cmpxchg/xadd usage errors consistently Jeremy Fitzhardinge
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cacheflush.h |    2 +-
 arch/x86/include/asm/rwsem.h      |    7 ++-----
 arch/x86/kernel/acpi/boot.c       |   10 ++++------
 arch/x86/kernel/cpu/mcheck/mce.c  |    2 +-
 arch/x86/xen/p2m.c                |   10 +++++-----
 5 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 4e12668..6ceae20 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -56,7 +56,7 @@ static inline void set_page_memtype(struct page *pg, unsigned long memtype)
 	do {
 		old_flags = pg->flags;
 		new_flags = (old_flags & _PGMT_CLEAR_MASK) | memtype_flags;
-	} while (cmpxchg(&pg->flags, old_flags, new_flags) != old_flags);
+	} while (!cmpxchg_flag(&pg->flags, old_flags, new_flags));
 }
 #else
 static inline unsigned long get_page_memtype(struct page *pg) { return -1; }
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index df4cd32..5a35263 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -126,11 +126,8 @@ static inline void __down_write(struct rw_semaphore *sem)
  */
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long ret = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-			   RWSEM_ACTIVE_WRITE_BIAS);
-	if (ret == RWSEM_UNLOCKED_VALUE)
-		return 1;
-	return 0;
+	return cmpxchg_flag(&sem->count, RWSEM_UNLOCKED_VALUE,
+			    RWSEM_ACTIVE_WRITE_BIAS);
 }
 
 /*
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9a966c5..6bf0bf8 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1669,22 +1669,20 @@ early_param("acpi_sci", setup_acpi_sci);
 
 int __acpi_acquire_global_lock(unsigned int *lock)
 {
-	unsigned int old, new, val;
+	unsigned int old, new;
 	do {
 		old = *lock;
 		new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
-		val = cmpxchg(lock, old, new);
-	} while (unlikely (val != old));
+	} while (unlikely (!cmpxchg_flag(lock, old, new)));
 	return (new < 3) ? -1 : 0;
 }
 
 int __acpi_release_global_lock(unsigned int *lock)
 {
-	unsigned int old, new, val;
+	unsigned int old, new;
 	do {
 		old = *lock;
 		new = old & ~0x3;
-		val = cmpxchg(lock, old, new);
-	} while (unlikely (val != old));
+	} while (unlikely (!cmpxchg_flag(lock, old, new)));
 	return old & 0x1;
 }
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3385ea2..99ae2d1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -198,7 +198,7 @@ void mce_log(struct mce *mce)
 		}
 		smp_rmb();
 		next = entry + 1;
-		if (cmpxchg(&mcelog.next, entry, next) == entry)
+		if (cmpxchg_flag(&mcelog.next, entry, next))
 			break;
 	}
 	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 141eb0d..5510082 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -448,7 +448,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_mid_init(mid);
 
-		if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
+		if (!cmpxchg_flag(top_p, p2m_mid_missing, mid))
 			free_p2m_page(mid);
 	}
 
@@ -470,7 +470,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
 		mid_mfn_mfn = virt_to_mfn(mid_mfn);
-		if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
+		if (!cmpxchg_flag(top_mfn_p, missing_mfn, mid_mfn_mfn))
 			free_p2m_page(mid_mfn);
 		else
 			p2m_top_mfn_p[topidx] = mid_mfn;
@@ -488,7 +488,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_init(p2m);
 
-		if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
+		if (!cmpxchg_flag(&mid[mididx], p2m_orig, p2m))
 			free_p2m_page(p2m);
 		else
 			mid_mfn[mididx] = virt_to_mfn(p2m);
@@ -600,8 +600,8 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 		/* Swap over from MISSING to IDENTITY if needed. */
 		if (p2m_top[topidx][mididx] == p2m_missing) {
-			WARN_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
-				p2m_identity) != p2m_missing);
+			WARN_ON(!cmpxchg_flag(&p2m_top[topidx][mididx], p2m_missing,
+					      p2m_identity));
 			return true;
 		}
 	}
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 16/18] x86: report xchg/cmpxchg/xadd usage errors consistently
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (14 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 15/18] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 17/18] x86: add local and sync variants of xadd Jeremy Fitzhardinge
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make sure that passing a variable of an unusable size causes a consistent
link-time failure.  Previously, using a 64-bit value on a 32-bit system
would cause an assembler error, which isn't easy to correlate with a line
of code.
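
The trick is easy to see in isolation.  As a stand-alone userspace
sketch of the same idea (invented names, illustration only -- not the
kernel macro itself):

#include <stdio.h>

/* Declared on purpose, defined nowhere: any surviving reference fails to link. */
extern void __demo_wrong_size(void);

/*
 * Size dispatch in the same style as __xchg()/__cmpxchg(): supported
 * sizes are handled by their case arms, any other operand size reaches
 * the default case and pulls in the undefined symbol.  Build with
 * optimization (e.g. "gcc -O2"), as the kernel always does, so the
 * dead default case is discarded for correct callers.
 */
#define demo_op(ptr)                                              \
do {                                                              \
        switch (sizeof(*(ptr))) {                                 \
        case 1:                                                   \
        case 2:                                                   \
        case 4:                                                   \
        case 8:                                                   \
                printf("%zu-byte operand: ok\n", sizeof(*(ptr))); \
                break;                                            \
        default:                                                  \
                __demo_wrong_size();                              \
        }                                                         \
} while (0)

int main(void)
{
        int i = 0;

        demo_op(&i);    /* links and runs fine */
        /*
         * demo_op() on, say, a 3-byte struct would still compile, but
         * linking would fail with a message that names the caller:
         * "undefined reference to `__demo_wrong_size'".
         */
        return 0;
}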

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h |   44 +++++++++++++++++++++++++++++----------
 1 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 57d6706..c99ce79 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -3,8 +3,26 @@
 
 #include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
+/* Non-existent functions to indicate usage errors at link time. */
 extern void __xchg_wrong_size(void);
 extern void __cmpxchg_wrong_size(void);
+extern void __xadd_wrong_size(void);
+
+/*
+ * Constants for operation sizes. On 32-bit, the 64-bit size is set to
+ * -1 because sizeof will never return -1, thereby making those switch
+ * case statements guaranteed dead code which the compiler will
+ * eliminate, and allowing the "missing symbol in the default case" to
+ * indicate a usage error.
+ */
+#define __X86_CASE_B	1
+#define __X86_CASE_W	2
+#define __X86_CASE_L	4
+#ifdef CONFIG_64BIT
+#define __X86_CASE_Q	8
+#else
+#define	__X86_CASE_Q	-1		/* sizeof will never return -1 */
+#endif
 
 /*
  * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
@@ -16,7 +34,7 @@ extern void __cmpxchg_wrong_size(void);
 ({									\
 	__typeof(*(ptr)) __x = (x);					\
 	switch (size) {							\
-	case 1:								\
+	case __X86_CASE_B:						\
 	{								\
 		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
 		asm volatile("xchgb %0,%1"				\
@@ -25,7 +43,7 @@ extern void __cmpxchg_wrong_size(void);
 			     : "memory");				\
 		break;							\
 	}								\
-	case 2:								\
+	case __X86_CASE_W:						\
 	{								\
 		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
 		asm volatile("xchgw %0,%1"				\
@@ -34,7 +52,7 @@ extern void __cmpxchg_wrong_size(void);
 			     : "memory");				\
 		break;							\
 	}								\
-	case 4:								\
+	case __X86_CASE_L:						\
 	{								\
 		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
 		asm volatile("xchgl %0,%1"				\
@@ -43,7 +61,7 @@ extern void __cmpxchg_wrong_size(void);
 			     : "memory");				\
 		break;							\
 	}								\
-	case 8:								\
+	case __X86_CASE_Q:						\
 	{								\
 		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
 		asm volatile("xchgq %0,%1"				\
@@ -72,7 +90,7 @@ extern void __cmpxchg_wrong_size(void);
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
 	switch (size) {							\
-	case 1:								\
+	case __X86_CASE_B:						\
 	{								\
 		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
 		asm volatile(lock "cmpxchgb %2,%1; " compare		\
@@ -81,7 +99,7 @@ extern void __cmpxchg_wrong_size(void);
 			     : "memory");				\
 		break;							\
 	}								\
-	case 2:								\
+	case __X86_CASE_W:						\
 	{								\
 		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
 		asm volatile(lock "cmpxchgw %2,%1; " compare		\
@@ -90,7 +108,7 @@ extern void __cmpxchg_wrong_size(void);
 			     : "memory");				\
 		break;							\
 	}								\
-	case 4:								\
+	case __X86_CASE_L:						\
 	{								\
 		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
 		asm volatile(lock "cmpxchgl %2,%1; " compare		\
@@ -99,7 +117,7 @@ extern void __cmpxchg_wrong_size(void);
 			     : "memory");				\
 		break;							\
 	}								\
-	case 8:								\
+	case __X86_CASE_Q:						\
 	{								\
 		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
 		asm volatile(lock "cmpxchgq %2,%1; " compare		\
@@ -178,26 +196,28 @@ extern void __cmpxchg_wrong_size(void);
 #define xadd(ptr, inc)							\
 	do {								\
 		switch (sizeof(*(ptr))) {				\
-		case 1:							\
+		case __X86_CASE_B:					\
 			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
-		case 2:							\
+		case __X86_CASE_W:					\
 			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
-		case 4:							\
+		case __X86_CASE_L:					\
 			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
-		case 8:							\
+		case __X86_CASE_Q:					\
 			asm volatile (LOCK_PREFIX "xaddq %q0, %1\n"	\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
+		default:						\
+			__xadd_wrong_size();				\
 		}							\
 	} while(0)
 
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 17/18] x86: add local and sync variants of xadd
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (15 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 16/18] x86: report xchg/cmpxchg/xadd usage errors consistently Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 17:53 ` [PATCH 18/18] x86: use xadd helper more widely Jeremy Fitzhardinge
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index c99ce79..992ac80 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -193,26 +193,26 @@ extern void __xadd_wrong_size(void);
 	__cmpxchg_local_flag((ptr), (old), (new), sizeof(*ptr))
 #endif
 
-#define xadd(ptr, inc)							\
+#define __xadd(ptr, inc, lock)						\
 	do {								\
 		switch (sizeof(*(ptr))) {				\
 		case __X86_CASE_B:					\
-			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
+			asm volatile (lock "xaddb %b0, %1\n"		\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
 		case __X86_CASE_W:					\
-			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
+			asm volatile (lock "xaddw %w0, %1\n"		\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
 		case __X86_CASE_L:					\
-			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
+			asm volatile (lock "xaddl %0, %1\n"		\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
 		case __X86_CASE_Q:					\
-			asm volatile (LOCK_PREFIX "xaddq %q0, %1\n"	\
+			asm volatile (lock "xaddq %q0, %1\n"		\
 				      : "+r" (inc), "+m" (*(ptr))	\
 				      : : "memory", "cc");		\
 			break;						\
@@ -221,4 +221,8 @@ extern void __xadd_wrong_size(void);
 		}							\
 	} while(0)
 
+#define xadd(ptr, inc)		__xadd((ptr), (inc), LOCK_PREFIX)
+#define xadd_sync(ptr, inc)	__xadd((ptr), (inc), "lock; ")
+#define xadd_local(ptr, inc)	__xadd((ptr), (inc), "")
+
 #endif	/* ASM_X86_CMPXCHG_H */
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 18/18] x86: use xadd helper more widely
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (16 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 17/18] x86: add local and sync variants of xadd Jeremy Fitzhardinge
@ 2011-08-24 17:53 ` Jeremy Fitzhardinge
  2011-08-24 18:03 ` [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Peter Zijlstra
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

This covers the trivial cases.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/atomic.h      |    4 +---
 arch/x86/include/asm/atomic64_64.h |    4 +---
 arch/x86/include/asm/rwsem.h       |    4 +---
 arch/x86/include/asm/uv/uv_bau.h   |    4 +---
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 952a826..f201fab 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -180,9 +180,7 @@ static inline int atomic_add_return(int i, atomic_t *v)
 #endif
 	/* Modern 486+ processor */
 	__i = i;
-	asm volatile(LOCK_PREFIX "xaddl %0, %1"
-		     : "+r" (i), "+m" (v->counter)
-		     : : "memory");
+	xadd(&v->counter, i);
 	return i + __i;
 
 #ifdef CONFIG_M386
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 49fd1ea..ab3b586 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -171,9 +171,7 @@ static inline int atomic64_add_negative(long i, atomic64_t *v)
 static inline long atomic64_add_return(long i, atomic64_t *v)
 {
 	long __i = i;
-	asm volatile(LOCK_PREFIX "xaddq %0, %1;"
-		     : "+r" (i), "+m" (v->counter)
-		     : : "memory");
+	xadd(&v->counter, i);
 	return i + __i;
 }
 
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 5a35263..958841a 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -203,9 +203,7 @@ static inline long rwsem_atomic_update(long delta, struct rw_semaphore *sem)
 {
 	long tmp = delta;
 
-	asm volatile(LOCK_PREFIX "xadd %0,%1"
-		     : "+r" (tmp), "+m" (sem->count)
-		     : : "memory");
+	xadd(&sem->count, tmp);
 
 	return tmp + delta;
 }
diff --git a/arch/x86/include/asm/uv/uv_bau.h b/arch/x86/include/asm/uv/uv_bau.h
index 130f1ee..3d36436 100644
--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -488,9 +488,7 @@ static inline int atomic_read_short(const struct atomic_short *v)
 static inline int atomic_add_short_return(short i, struct atomic_short *v)
 {
 	short __i = i;
-	asm volatile(LOCK_PREFIX "xaddw %0, %1"
-			: "+r" (i), "+m" (v->counter)
-			: : "memory");
+	xadd(&v->counter, i);
 	return i + __i;
 }
 
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/18] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-08-24 17:52 ` [PATCH 03/18] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
@ 2011-08-24 18:01   ` Linus Torvalds
  0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2011-08-24 18:01 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 10:52 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> If we don't need to use a locked inc for unlock, then implement it in C.

Ok, so I really hate this patch.

What the f&*^ is the point of doing it this way? It just ends up being
incredibly ugly, with a mixture of C, asm, and barriers. The "C"
version is actually *uglier* than the non-C version, so why do it in C
at all?

Just get rid of this one.  It adds lines of code, makes the code
harder to read, and is just a disaster.

                        Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (17 preceding siblings ...)
  2011-08-24 17:53 ` [PATCH 18/18] x86: use xadd helper more widely Jeremy Fitzhardinge
@ 2011-08-24 18:03 ` Peter Zijlstra
  2011-08-24 18:24   ` Linus Torvalds
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
  2011-08-24 21:46 ` [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
  20 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2011-08-24 18:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, 2011-08-24 at 10:52 -0700, Jeremy Fitzhardinge wrote:
> 
>  arch/x86/include/asm/atomic.h         |    4 +-
>  arch/x86/include/asm/atomic64_64.h    |    4 +-
>  arch/x86/include/asm/cacheflush.h     |    2 +-
>  arch/x86/include/asm/cmpxchg.h        |  223 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/cmpxchg_32.h     |  128 ++-----------------
>  arch/x86/include/asm/cmpxchg_64.h     |  131 -------------------
>  arch/x86/include/asm/rwsem.h          |   11 +--
>  arch/x86/include/asm/spinlock.h       |  140 +++++++--------------
>  arch/x86/include/asm/spinlock_types.h |   22 +++-
>  arch/x86/include/asm/uv/uv_bau.h      |    4 +-
>  arch/x86/kernel/acpi/boot.c           |   10 +-
>  arch/x86/kernel/cpu/mcheck/mce.c      |    2 +-
>  arch/x86/xen/p2m.c                    |   10 +-
>  13 files changed, 321 insertions(+), 370 deletions(-)

The patches look good to me, and 50 lines less is always nice too!

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 18:03 ` [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Peter Zijlstra
@ 2011-08-24 18:24   ` Linus Torvalds
  2011-08-24 20:10     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2011-08-24 18:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jeremy Fitzhardinge, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 11:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The patches look good to me, and 50 lines less is always nice too!
>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

So I mostly agree, except for the abortion that is the
__ticket_spin_unlock changes. Please just kill that part.

I also personally would have preferred to re-order things a bit. For
example,  I despise

  [PATCH 05/18] x86/ticketlock: make __ticket_spin_lock common

because it is totally pointless. If patch "[PATCH 07/18] x86: add xadd
helper macro" had been before that patch, then the whole "if
(sizeof..)" crap would never have existed. So even though the end
result there looks good (unlike the __ticket_spin_unlock case, which
just *stays* uglier after the whole series), the individual patches
are much uglier than needed, just because they are done in the wrong
order.

So I'd much have preferred that the xadd helper go in first (07/18),
then the patch to make __ticket_spin_lock the same for 32/64 would go
in next (04/18) using that helper, and then they'd just have been
identical automatically.

Instead, there are a few patches that actually make the code uglier
temporarily, only to then be undone later. That seems bogus, when it
looks so simple to fix it.

                       Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 17:53 ` [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-08-24 20:00   ` Andi Kleen
  2011-08-24 21:38     ` Linus Torvalds
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2011-08-24 20:00 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Make trylock code common regardless of ticket size.

Can't we just get rid of the small ticket size code path?

I couldn't benchmark any difference between the two last time 
I tried.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/18] x86/ticketlock: convert spin loop to C
  2011-08-24 17:52 ` [PATCH 02/18] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
@ 2011-08-24 20:02   ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2011-08-24 20:02 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

Jeremy Fitzhardinge <jeremy@goop.org> writes:
> +
> +	for (;;) {
> +		if (inc.tickets.head == inc.tickets.tail)
> +			goto out;

What's wrong with break ?  Similar below.

> +		cpu_relax();
> +		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
> +	}
> +out:	barrier();		/* make sure nothing creeps before the lock is taken */
>  }


-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 18:24   ` Linus Torvalds
@ 2011-08-24 20:10     ` Jeremy Fitzhardinge
  2011-08-24 22:53       ` Linus Torvalds
  0 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 20:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 11:24 AM, Linus Torvalds wrote:
> On Wed, Aug 24, 2011 at 11:03 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> The patches look good to me, and 50 lines less is always nice too!
>>
>> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> So I mostly agree, except for the abortion that is the
> __ticket_spin_unlock changes. Please just kill that part.

Could we just kill SMP support for OOSTORE machines now?  That would be
the cleanest possible fix...

> So I'd much have preferred that the xadd helper go in first (07/18),
> then the patch to make __ticket_spin_lock the same for 32/64 would go
> in next (04/18) using that helper, and then they'd just have been
> identical automatically.
>
> Instead, there are a few patches that actually make the code uglier
> temporarily, only to then be undone later. That seems bogus, when it
> looks so simple to fix it.

Yep, OK.  The current series is a history of incremental development; I
can easily clean it up into something that appeared perfectly formed.

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (18 preceding siblings ...)
  2011-08-24 18:03 ` [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Peter Zijlstra
@ 2011-08-24 21:36 ` Jeremy Fitzhardinge
  2011-08-24 21:36   ` [PATCH 02/12] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
                     ` (10 more replies)
  2011-08-24 21:46 ` [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
  20 siblings, 11 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Not linux/bitops.h.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_32.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 284a6e8..8d53b16 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -1,7 +1,7 @@
 #ifndef _ASM_X86_CMPXCHG_32_H
 #define _ASM_X86_CMPXCHG_32_H
 
-#include <linux/bitops.h> /* for LOCK_PREFIX */
+#include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
 /*
  * Note: if you use set64_bit(), __cmpxchg64(), or their variants, you
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 02/12] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit.
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
@ 2011-08-24 21:36   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 03/12] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit Jeremy Fitzhardinge
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_32.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 8d53b16..0ef9b82 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -9,6 +9,7 @@
  */
 
 extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
 
 /*
  * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
@@ -84,8 +85,6 @@ static inline void set_64bit(volatile u64 *ptr, u64 value)
 		     : "memory");
 }
 
-extern void __cmpxchg_wrong_size(void);
-
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 03/12] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
  2011-08-24 21:36   ` [PATCH 02/12] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 04/12] x86/cmpxchg: unify cmpxchg into cmpxchg.h Jeremy Fitzhardinge
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_64.h |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 423ae58..aeaa610 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -3,11 +3,6 @@
 
 #include <asm/alternative.h> /* Provides LOCK_PREFIX */
 
-static inline void set_64bit(volatile u64 *ptr, u64 val)
-{
-	*ptr = val;
-}
-
 extern void __xchg_wrong_size(void);
 extern void __cmpxchg_wrong_size(void);
 
@@ -66,6 +61,11 @@ extern void __cmpxchg_wrong_size(void);
 #define xchg(ptr, v)							\
 	__xchg((v), (ptr), sizeof(*ptr))
 
+static inline void set_64bit(volatile u64 *ptr, u64 val)
+{
+	*ptr = val;
+}
+
 #define __HAVE_ARCH_CMPXCHG 1
 
 /*
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 04/12] x86/cmpxchg: unify cmpxchg into cmpxchg.h
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
  2011-08-24 21:36   ` [PATCH 02/12] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 03/12] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 05/12] x86: add xadd helper macro Jeremy Fitzhardinge
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Everything that's actually common between 32 and 64-bit is moved into
cmpxchg.h.

xchg/cmpxchg will fail with a link error if they're passed an
unsupported size (which includes 64-bit args on 32-bit systems).
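
As a stand-alone illustration of the 32-bit case (userspace sketch,
invented names, not the kernel header itself): the 8-byte case label
is compiled as -1 on a 32-bit build, a value sizeof() can never
produce, so a 64-bit operand drops into the default case and
references a symbol that is never defined.

#include <stdio.h>

extern void __demo_wrong_size(void);    /* deliberately never defined */

#if defined(__i386__)
# define DEMO_CASE_Q    -1      /* dead case: sizeof() never returns -1 */
#else
# define DEMO_CASE_Q    8
#endif

#define demo_read(ptr)                                          \
({                                                              \
        unsigned long long __val;                               \
        switch (sizeof(*(ptr))) {                               \
        case 4:                                                 \
                __val = *(volatile unsigned int *)(ptr);        \
                break;                                          \
        case DEMO_CASE_Q:                                       \
                __val = *(volatile unsigned long long *)(ptr);  \
                break;                                          \
        default:                                                \
                __demo_wrong_size();    /* link-time failure */ \
                __val = 0;                                      \
        }                                                       \
        __val;                                                  \
})

int main(void)
{
        unsigned int small = 32;
        unsigned long long big = 64;

        /* Build with optimization (e.g. "gcc -O2") so dead arms vanish. */
        printf("%llu\n", demo_read(&small));    /* fine everywhere */
        printf("%llu\n", demo_read(&big));      /* fine on 64-bit; with
                                                 * "gcc -O2 -m32" the link
                                                 * fails on the undefined
                                                 * __demo_wrong_size() */
        return 0;
}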

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h    |  155 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_32.h |  113 ---------------------------
 arch/x86/include/asm/cmpxchg_64.h |  131 -------------------------------
 3 files changed, 155 insertions(+), 244 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index a460fa0..efe3ec7 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -1,5 +1,160 @@
+#ifndef ASM_X86_CMPXCHG_H
+#define ASM_X86_CMPXCHG_H
+
+#include <asm/alternative.h> /* Provides LOCK_PREFIX */
+
+/* Non-existent functions to indicate usage errors at link time. */
+extern void __xchg_wrong_size(void);
+extern void __cmpxchg_wrong_size(void);
+
+/*
+ * Constants for operation sizes. On 32-bit, the 64-bit size is set to
+ * -1 because sizeof will never return -1, thereby making those switch
+ * case statements guaranteed dead code which the compiler will
+ * eliminate, and allowing the "missing symbol in the default case" to
+ * indicate a usage error.
+ */
+#define __X86_CASE_B	1
+#define __X86_CASE_W	2
+#define __X86_CASE_L	4
+#ifdef CONFIG_64BIT
+#define __X86_CASE_Q	8
+#else
+#define	__X86_CASE_Q	-1		/* sizeof will never return -1 */
+#endif
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
+ * Since this is generally used to protect other memory information, we
+ * use "asm volatile" and "memory" clobbers to prevent gcc from moving
+ * information around.
+ */
+#define __xchg(x, ptr, size)						\
+({									\
+	__typeof(*(ptr)) __x = (x);					\
+	switch (size) {							\
+	case __X86_CASE_B:						\
+	{								\
+		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
+		asm volatile("xchgb %0,%1"				\
+			     : "=q" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_W:						\
+	{								\
+		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
+		asm volatile("xchgw %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_L:						\
+	{								\
+		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
+		asm volatile("xchgl %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_Q:						\
+	{								\
+		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
+		asm volatile("xchgq %0,%1"				\
+			     : "=r" (__x), "+m" (*__ptr)		\
+			     : "0" (__x)				\
+			     : "memory");				\
+		break;							\
+	}								\
+	default:							\
+		__xchg_wrong_size();					\
+	}								\
+	__x;								\
+})
+
+#define xchg(ptr, v)							\
+	__xchg((v), (ptr), sizeof(*ptr))
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+({									\
+	__typeof__(*(ptr)) __ret;					\
+	__typeof__(*(ptr)) __old = (old);				\
+	__typeof__(*(ptr)) __new = (new);				\
+	switch (size) {							\
+	case __X86_CASE_B:						\
+	{								\
+		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
+		asm volatile(lock "cmpxchgb %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "q" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_W:						\
+	{								\
+		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
+		asm volatile(lock "cmpxchgw %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_L:						\
+	{								\
+		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
+		asm volatile(lock "cmpxchgl %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	case __X86_CASE_Q:						\
+	{								\
+		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
+		asm volatile(lock "cmpxchgq %2,%1"			\
+			     : "=a" (__ret), "+m" (*__ptr)		\
+			     : "r" (__new), "0" (__old)			\
+			     : "memory");				\
+		break;							\
+	}								\
+	default:							\
+		__cmpxchg_wrong_size();					\
+	}								\
+	__ret;								\
+})
+
+#define __cmpxchg(ptr, old, new, size)					\
+	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
+
+#define __sync_cmpxchg(ptr, old, new, size)				\
+	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
+
+#define __cmpxchg_local(ptr, old, new, size)				\
+	__raw_cmpxchg((ptr), (old), (new), (size), "")
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
 # include "cmpxchg_64.h"
 #endif
+
+#ifdef __HAVE_ARCH_CMPXCHG
+#define cmpxchg(ptr, old, new)						\
+	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define sync_cmpxchg(ptr, old, new)					\
+	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local(ptr, old, new)					\
+	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
+#endif
+
+#endif	/* ASM_X86_CMPXCHG_H */
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 0ef9b82..3b573f6 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -1,62 +1,11 @@
 #ifndef _ASM_X86_CMPXCHG_32_H
 #define _ASM_X86_CMPXCHG_32_H
 
-#include <asm/alternative.h> /* Provides LOCK_PREFIX */
-
 /*
  * Note: if you use set64_bit(), __cmpxchg64(), or their variants, you
  *       you need to test for the feature in boot_cpu_data.
  */
 
-extern void __xchg_wrong_size(void);
-extern void __cmpxchg_wrong_size(void);
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define __xchg(x, ptr, size)						\
-({									\
-	__typeof(*(ptr)) __x = (x);					\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile("xchgb %0,%1"				\
-			     : "=q" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile("xchgw %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile("xchgl %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__xchg_wrong_size();					\
-	}								\
-	__x;								\
-})
-
-#define xchg(ptr, v)							\
-	__xchg((v), (ptr), sizeof(*ptr))
-
 /*
  * CMPXCHG8B only writes to the target if we had the previous
  * value in registers, otherwise it acts as a read and gives us the
@@ -85,70 +34,8 @@ static inline void set_64bit(volatile u64 *ptr, u64 value)
 		     : "memory");
 }
 
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
-({									\
-	__typeof__(*(ptr)) __ret;					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "q" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__cmpxchg_wrong_size();					\
-	}								\
-	__ret;								\
-})
-
-#define __cmpxchg(ptr, old, new, size)					\
-	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
-#define __sync_cmpxchg(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
-
-#define __cmpxchg_local(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "")
-
 #ifdef CONFIG_X86_CMPXCHG
 #define __HAVE_ARCH_CMPXCHG 1
-
-#define cmpxchg(ptr, old, new)						\
-	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define sync_cmpxchg(ptr, old, new)					\
-	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define cmpxchg_local(ptr, old, new)					\
-	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
 #endif
 
 #ifdef CONFIG_X86_CMPXCHG64
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index aeaa610..9d6c147 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -1,66 +1,6 @@
 #ifndef _ASM_X86_CMPXCHG_64_H
 #define _ASM_X86_CMPXCHG_64_H
 
-#include <asm/alternative.h> /* Provides LOCK_PREFIX */
-
-extern void __xchg_wrong_size(void);
-extern void __cmpxchg_wrong_size(void);
-
-/*
- * Note: no "lock" prefix even on SMP: xchg always implies lock anyway.
- * Since this is generally used to protect other memory information, we
- * use "asm volatile" and "memory" clobbers to prevent gcc from moving
- * information around.
- */
-#define __xchg(x, ptr, size)						\
-({									\
-	__typeof(*(ptr)) __x = (x);					\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile("xchgb %0,%1"				\
-			     : "=q" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile("xchgw %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile("xchgl %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 8:								\
-	{								\
-		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile("xchgq %0,%1"				\
-			     : "=r" (__x), "+m" (*__ptr)		\
-			     : "0" (__x)				\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__xchg_wrong_size();					\
-	}								\
-	__x;								\
-})
-
-#define xchg(ptr, v)							\
-	__xchg((v), (ptr), sizeof(*ptr))
-
 static inline void set_64bit(volatile u64 *ptr, u64 val)
 {
 	*ptr = val;
@@ -68,77 +8,6 @@ static inline void set_64bit(volatile u64 *ptr, u64 val)
 
 #define __HAVE_ARCH_CMPXCHG 1
 
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
-({									\
-	__typeof__(*(ptr)) __ret;					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	switch (size) {							\
-	case 1:								\
-	{								\
-		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "q" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 2:								\
-	{								\
-		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 4:								\
-	{								\
-		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	case 8:								\
-	{								\
-		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile(lock "cmpxchgq %2,%1"			\
-			     : "=a" (__ret), "+m" (*__ptr)		\
-			     : "r" (__new), "0" (__old)			\
-			     : "memory");				\
-		break;							\
-	}								\
-	default:							\
-		__cmpxchg_wrong_size();					\
-	}								\
-	__ret;								\
-})
-
-#define __cmpxchg(ptr, old, new, size)					\
-	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
-
-#define __sync_cmpxchg(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
-
-#define __cmpxchg_local(ptr, old, new, size)				\
-	__raw_cmpxchg((ptr), (old), (new), (size), "")
-
-#define cmpxchg(ptr, old, new)						\
-	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define sync_cmpxchg(ptr, old, new)					\
-	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
-
-#define cmpxchg_local(ptr, old, new)					\
-	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
-
 #define cmpxchg64(ptr, o, n)						\
 ({									\
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 05/12] x86: add xadd helper macro
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (2 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 04/12] x86/cmpxchg: unify cmpxchg into cmpxchg.h Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 06/12] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Add a common xadd implementation.
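
For reference, a stand-alone userspace sketch of what the helper
amounts to for the 4-byte case (x86 only, invented wrapper name, not
the kernel macro itself):

#include <stdio.h>

/*
 * "lock xadd" atomically adds the register operand to memory and
 * leaves the previous memory value in the register, so the old value
 * comes back for free.
 */
static inline int xadd32(volatile int *ptr, int inc)
{
        asm volatile("lock; xaddl %0, %1"
                     : "+r" (inc), "+m" (*ptr)
                     : : "memory", "cc");
        return inc;     /* now holds the old value of *ptr */
}

int main(void)
{
        volatile int counter = 40;
        int old = xadd32(&counter, 2);

        printf("old=%d new=%d\n", old, counter);        /* old=40 new=42 */
        return 0;
}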

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h |   43 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index efe3ec7..0d0d9cd 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -6,6 +6,7 @@
 /* Non-existent functions to indicate usage errors at link time. */
 extern void __xchg_wrong_size(void);
 extern void __cmpxchg_wrong_size(void);
+extern void __xadd_wrong_size(void);
 
 /*
  * Constants for operation sizes. On 32-bit, the 64-bit size it set to
@@ -157,4 +158,46 @@ extern void __cmpxchg_wrong_size(void);
 	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
 #endif
 
+#define __xadd(ptr, inc, lock)						\
+	({								\
+	        __typeof__ (*(ptr)) __ret = (inc);			\
+		switch (sizeof(*(ptr))) {				\
+		case __X86_CASE_B:					\
+			asm volatile (lock "xaddb %b0, %1\n"		\
+				      : "+r" (__ret), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case __X86_CASE_W:					\
+			asm volatile (lock "xaddw %w0, %1\n"		\
+				      : "+r" (__ret), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case __X86_CASE_L:					\
+			asm volatile (lock "xaddl %0, %1\n"		\
+				      : "+r" (__ret), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case __X86_CASE_Q:					\
+			asm volatile (lock "xaddq %q0, %1\n"		\
+				      : "+r" (__ret), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		default:						\
+			__xadd_wrong_size();				\
+		}							\
+		__ret;							\
+	})
+
+/*
+ * xadd() adds "inc" to "*ptr" and atomically returns the previous
+ * value of "*ptr".
+ *
+ * xadd() is locked when multiple CPUs are online
+ * xadd_sync() is always locked
+ * xadd_local() is never locked
+ */
+#define xadd(ptr, inc)		__xadd((ptr), (inc), LOCK_PREFIX)
+#define xadd_sync(ptr, inc)	__xadd((ptr), (inc), "lock; ")
+#define xadd_local(ptr, inc)	__xadd((ptr), (inc), "")
+
 #endif	/* ASM_X86_CMPXCHG_H */
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 06/12] x86: add cmpxchg_flag() variant
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (3 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 05/12] x86: add xadd helper macro Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 07/12] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Most callers of cmpxchg() directly compare RETURN with OLD to see if it
was successful.  This results in unnecessary comparisons and
conditionals, since the cmpxchg instruction already sets the flags to
indicate success/failure.

Add cmpxchg_flag() variants which return a boolean flag directly indicating
success.  Unfortunately an asm() statement can't directly export the
status flags, but sete isn't too bad.
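
For callers, the difference is the one GCC's own builtins expose; as a
userspace aside (not kernel code):

#include <stdio.h>

/*
 * __sync_val_compare_and_swap() returns the old value, so the caller
 * has to compare it again; __sync_bool_compare_and_swap() returns
 * success directly -- the same saving cmpxchg_flag() gives kernel
 * callers.
 */
static unsigned long counter;

int main(void)
{
        unsigned long old;

        old = counter;
        if (__sync_val_compare_and_swap(&counter, old, old + 1) == old)
                puts("val CAS: won the race (caller re-compared)");

        old = counter;
        if (__sync_bool_compare_and_swap(&counter, old, old + 1))
                puts("bool CAS: won the race (flag used directly)");

        printf("counter = %lu\n", counter);     /* prints 2 */
        return 0;
}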

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg.h    |   48 +++++++++++++++++++++++++++++++-----
 arch/x86/include/asm/cmpxchg_32.h |   14 ++++++++++
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 0d0d9cd..6013247 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -82,18 +82,18 @@ extern void __xadd_wrong_size(void);
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
+ * determined by "compare"
  */
-#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+#define __raw_cmpxchg_cmp(ptr, old, new, size, lock, rettype, compare)	\
 ({									\
-	__typeof__(*(ptr)) __ret;					\
+	rettype __ret;							\
 	__typeof__(*(ptr)) __old = (old);				\
 	__typeof__(*(ptr)) __new = (new);				\
 	switch (size) {							\
 	case __X86_CASE_B:						\
 	{								\
 		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
-		asm volatile(lock "cmpxchgb %2,%1"			\
+		asm volatile(lock "cmpxchgb %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "q" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -102,7 +102,7 @@ extern void __xadd_wrong_size(void);
 	case __X86_CASE_W:						\
 	{								\
 		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
-		asm volatile(lock "cmpxchgw %2,%1"			\
+		asm volatile(lock "cmpxchgw %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -111,7 +111,7 @@ extern void __xadd_wrong_size(void);
 	case __X86_CASE_L:						\
 	{								\
 		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
-		asm volatile(lock "cmpxchgl %2,%1"			\
+		asm volatile(lock "cmpxchgl %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -120,7 +120,7 @@ extern void __xadd_wrong_size(void);
 	case __X86_CASE_Q:						\
 	{								\
 		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
-		asm volatile(lock "cmpxchgq %2,%1"			\
+		asm volatile(lock "cmpxchgq %2,%1; " compare		\
 			     : "=a" (__ret), "+m" (*__ptr)		\
 			     : "r" (__new), "0" (__old)			\
 			     : "memory");				\
@@ -132,15 +132,40 @@ extern void __xadd_wrong_size(void);
 	__ret;								\
 })
 
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+#define __raw_cmpxchg(ptr, old, new, size, lock)			\
+	__raw_cmpxchg_cmp(ptr, old, new, size, lock, __typeof__(*(ptr)), "")
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by a true return.
+ */
+#define __raw_cmpxchg_flag(ptr, old, new, size, lock)			\
+	__raw_cmpxchg_cmp(ptr, old, new, size, lock, unsigned char, "sete %0")
+
 #define __cmpxchg(ptr, old, new, size)					\
 	__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
 
+#define __cmpxchg_flag(ptr, old, new, size)				\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), LOCK_PREFIX)
+
 #define __sync_cmpxchg(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
 
+#define __sync_cmpxchg_flag(ptr, old, new, size)			\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), "lock; ")
+
 #define __cmpxchg_local(ptr, old, new, size)				\
 	__raw_cmpxchg((ptr), (old), (new), (size), "")
 
+#define __cmpxchg_local_flag(ptr, old, new, size)		\
+	__raw_cmpxchg_flag((ptr), (old), (new), (size), "")
+
 #ifdef CONFIG_X86_32
 # include "cmpxchg_32.h"
 #else
@@ -151,11 +176,20 @@ extern void __xadd_wrong_size(void);
 #define cmpxchg(ptr, old, new)						\
 	__cmpxchg((ptr), (old), (new), sizeof(*ptr))
 
+#define cmpxchg_flag(ptr, old, new)					\
+	__cmpxchg_flag((ptr), (old), (new), sizeof(*ptr))
+
 #define sync_cmpxchg(ptr, old, new)					\
 	__sync_cmpxchg((ptr), (old), (new), sizeof(*ptr))
 
+#define sync_cmpxchg_flag(ptr, old, new)				\
+	__sync_cmpxchg_flag((ptr), (old), (new), sizeof(*ptr))
+
 #define cmpxchg_local(ptr, old, new)					\
 	__cmpxchg_local((ptr), (old), (new), sizeof(*ptr))
+
+#define cmpxchg_local_flag(ptr, old, new)				\
+	__cmpxchg_local_flag((ptr), (old), (new), sizeof(*ptr))
 #endif
 
 #define __xadd(ptr, inc, lock)						\
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 3b573f6..0797bc6 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -111,6 +111,13 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 				sizeof(*(ptr)));			\
 	__ret;								\
 })
+
+#define cmpxchg_flag(ptr, o, n)			\
+({						\
+	__typeof__(*(ptr)) __orig = (o);	\
+	cmpxchg((ptr), __orig, (n)) == __orig;	\
+})
+
 #define cmpxchg_local(ptr, o, n)					\
 ({									\
 	__typeof__(*(ptr)) __ret;					\
@@ -124,6 +131,13 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 				sizeof(*(ptr)));			\
 	__ret;								\
 })
+
+#define cmpxchg_local_flag(ptr, o, n)			\
+({							\
+	__typeof__(*(ptr)) __orig = (o);		\
+	cmpxchg_local((ptr), __orig, (n)) == __orig;	\
+})
+
 #endif
 
 #ifndef CONFIG_X86_CMPXCHG64
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 07/12] x86: use cmpxchg_flag() where applicable
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (4 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 06/12] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:56     ` Linus Torvalds
  2011-08-24 21:37   ` [PATCH 08/12] x86: use xadd helper more widely Jeremy Fitzhardinge
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cacheflush.h |    2 +-
 arch/x86/include/asm/rwsem.h      |    7 ++-----
 arch/x86/kernel/acpi/boot.c       |   10 ++++------
 arch/x86/kernel/cpu/mcheck/mce.c  |    2 +-
 arch/x86/xen/p2m.c                |   10 +++++-----
 5 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 4e12668..6ceae20 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -56,7 +56,7 @@ static inline void set_page_memtype(struct page *pg, unsigned long memtype)
 	do {
 		old_flags = pg->flags;
 		new_flags = (old_flags & _PGMT_CLEAR_MASK) | memtype_flags;
-	} while (cmpxchg(&pg->flags, old_flags, new_flags) != old_flags);
+	} while (!cmpxchg_flag(&pg->flags, old_flags, new_flags));
 }
 #else
 static inline unsigned long get_page_memtype(struct page *pg) { return -1; }
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index df4cd32..5a35263 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -126,11 +126,8 @@ static inline void __down_write(struct rw_semaphore *sem)
  */
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long ret = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-			   RWSEM_ACTIVE_WRITE_BIAS);
-	if (ret == RWSEM_UNLOCKED_VALUE)
-		return 1;
-	return 0;
+	return cmpxchg_flag(&sem->count, RWSEM_UNLOCKED_VALUE,
+			    RWSEM_ACTIVE_WRITE_BIAS);
 }
 
 /*
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9a966c5..6bf0bf8 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1669,22 +1669,20 @@ early_param("acpi_sci", setup_acpi_sci);
 
 int __acpi_acquire_global_lock(unsigned int *lock)
 {
-	unsigned int old, new, val;
+	unsigned int old, new;
 	do {
 		old = *lock;
 		new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
-		val = cmpxchg(lock, old, new);
-	} while (unlikely (val != old));
+	} while (unlikely (!cmpxchg_flag(lock, old, new)));
 	return (new < 3) ? -1 : 0;
 }
 
 int __acpi_release_global_lock(unsigned int *lock)
 {
-	unsigned int old, new, val;
+	unsigned int old, new;
 	do {
 		old = *lock;
 		new = old & ~0x3;
-		val = cmpxchg(lock, old, new);
-	} while (unlikely (val != old));
+	} while (unlikely (!cmpxchg_flag(lock, old, new)));
 	return old & 0x1;
 }
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3385ea2..99ae2d1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -198,7 +198,7 @@ void mce_log(struct mce *mce)
 		}
 		smp_rmb();
 		next = entry + 1;
-		if (cmpxchg(&mcelog.next, entry, next) == entry)
+		if (cmpxchg_flag(&mcelog.next, entry, next))
 			break;
 	}
 	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 141eb0d..5510082 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -448,7 +448,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_mid_init(mid);
 
-		if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
+		if (!cmpxchg_flag(top_p, p2m_mid_missing, mid))
 			free_p2m_page(mid);
 	}
 
@@ -470,7 +470,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
 		mid_mfn_mfn = virt_to_mfn(mid_mfn);
-		if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
+		if (!cmpxchg_flag(top_mfn_p, missing_mfn, mid_mfn_mfn))
 			free_p2m_page(mid_mfn);
 		else
 			p2m_top_mfn_p[topidx] = mid_mfn;
@@ -488,7 +488,7 @@ static bool alloc_p2m(unsigned long pfn)
 
 		p2m_init(p2m);
 
-		if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig)
+		if (!cmpxchg_flag(&mid[mididx], p2m_orig, p2m))
 			free_p2m_page(p2m);
 		else
 			mid_mfn[mididx] = virt_to_mfn(p2m);
@@ -600,8 +600,8 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 		/* Swap over from MISSING to IDENTITY if needed. */
 		if (p2m_top[topidx][mididx] == p2m_missing) {
-			WARN_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
-				p2m_identity) != p2m_missing);
+			WARN_ON(!cmpxchg_flag(&p2m_top[topidx][mididx], p2m_missing,
+					      p2m_identity));
 			return true;
 		}
 	}
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 08/12] x86: use xadd helper more widely
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (5 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 07/12] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 09/12] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

This covers the trivial cases.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/atomic.h      |    8 ++------
 arch/x86/include/asm/atomic64_64.h |    6 +-----
 arch/x86/include/asm/rwsem.h       |    8 +-------
 arch/x86/include/asm/uv/uv_bau.h   |    6 +-----
 4 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 952a826..dfc1b7b 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -172,18 +172,14 @@ static inline int atomic_add_negative(int i, atomic_t *v)
  */
 static inline int atomic_add_return(int i, atomic_t *v)
 {
-	int __i;
 #ifdef CONFIG_M386
+	int __i;
 	unsigned long flags;
 	if (unlikely(boot_cpu_data.x86 <= 3))
 		goto no_xadd;
 #endif
 	/* Modern 486+ processor */
-	__i = i;
-	asm volatile(LOCK_PREFIX "xaddl %0, %1"
-		     : "+r" (i), "+m" (v->counter)
-		     : : "memory");
-	return i + __i;
+	return i + xadd(&v->counter, i);
 
 #ifdef CONFIG_M386
 no_xadd: /* Legacy 386 processor */
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 49fd1ea..d76cdaa 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -170,11 +170,7 @@ static inline int atomic64_add_negative(long i, atomic64_t *v)
  */
 static inline long atomic64_add_return(long i, atomic64_t *v)
 {
-	long __i = i;
-	asm volatile(LOCK_PREFIX "xaddq %0, %1;"
-		     : "+r" (i), "+m" (v->counter)
-		     : : "memory");
-	return i + __i;
+	return i + xadd(&v->counter, i);
 }
 
 static inline long atomic64_sub_return(long i, atomic64_t *v)
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 5a35263..1751107 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -201,13 +201,7 @@ static inline void rwsem_atomic_add(long delta, struct rw_semaphore *sem)
  */
 static inline long rwsem_atomic_update(long delta, struct rw_semaphore *sem)
 {
-	long tmp = delta;
-
-	asm volatile(LOCK_PREFIX "xadd %0,%1"
-		     : "+r" (tmp), "+m" (sem->count)
-		     : : "memory");
-
-	return tmp + delta;
+	return delta + xadd(&sem->count, delta);
 }
 
 #endif /* __KERNEL__ */
diff --git a/arch/x86/include/asm/uv/uv_bau.h b/arch/x86/include/asm/uv/uv_bau.h
index 130f1ee..ebc9344 100644
--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -487,11 +487,7 @@ static inline int atomic_read_short(const struct atomic_short *v)
  */
 static inline int atomic_add_short_return(short i, struct atomic_short *v)
 {
-	short __i = i;
-	asm volatile(LOCK_PREFIX "xaddw %0, %1"
-			: "+r" (i), "+m" (v->counter)
-			: : "memory");
-	return i + __i;
+	return i + xadd(&v->counter, i);
 }
 
 #endif /* _ASM_X86_UV_UV_BAU_H */
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread
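
A rough model of what the xadd() helper (added earlier in the series) does, and
why the callers above can reconstruct the post-add value as 'i + xadd(...)'.
Treating __sync_fetch_and_add as a stand-in for the lock-prefixed xadd
instruction is an assumption for illustration only.

/* xadd() atomically adds 'inc' to '*ptr' and returns the value that was
 * there *before* the addition. */
#define xadd(ptr, inc)  __sync_fetch_and_add((ptr), (inc))

static int atomic_add_return_model(int i, int *counter)
{
        /* previous value + increment == value after the atomic add */
        return i + xadd(counter, i);
}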

* [PATCH 09/12] x86/ticketlock: clean up types and accessors
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (6 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 08/12] x86: use xadd helper more widely Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 10/12] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   24 ++++++++++--------------
 arch/x86/include/asm/spinlock_types.h |   20 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	unsigned short inc = 1 << TICKET_SHIFT;
 
 	asm volatile (
 		LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp, new;
+	unsigned int tmp, new;
 
 	asm volatile("movzwl %2, %0\n\t"
 		     "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	int inc = 0x00010000;
-	int tmp;
+	unsigned inc = 1 << TICKET_SHIFT;
+	unsigned tmp;
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 		     "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp;
-	int new;
+	unsigned tmp;
+	unsigned new;
 
 	asm volatile("movl %2,%0\n\t"
 		     "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
+#define TICKET_MASK	((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-	unsigned int slock;
+	union {
+		unsigned int slock;
+		struct __raw_tickets {
+			__ticket_t head, tail;
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread
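
A small worked example of the new layout for the NR_CPUS < 256 case. The struct
mirrors the patch (with __ticket_t spelled as uint8_t so the sketch builds in
plain user space); the sample head/tail values are made up.

#include <stdint.h>
#include <stdio.h>

typedef uint8_t __ticket_t;             /* NR_CPUS < 256 */

#define TICKET_SHIFT    (sizeof(__ticket_t) * 8)
#define TICKET_MASK     ((__ticket_t)((1 << TICKET_SHIFT) - 1))

typedef struct arch_spinlock {
        union {
                unsigned int slock;
                struct __raw_tickets {
                        __ticket_t head, tail;
                } tickets;
        };
} arch_spinlock_t;

int main(void)
{
        arch_spinlock_t lock;

        /* holder owns ticket 3, one waiter spins with ticket 4,
         * and the next ticket to be handed out is 5 */
        lock.tickets.head = 3;
        lock.tickets.tail = 5;

        printf("locked=%d contended=%d\n",
               !!(lock.tickets.tail ^ lock.tickets.head),
               ((lock.tickets.tail - lock.tickets.head) & TICKET_MASK) > 1);
        return 0;
}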

* [PATCH 10/12] x86/ticketlock: convert spin loop to C
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (7 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 09/12] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 11/12] x86/ticketlock: convert __ticket_spin_lock to use xadd() Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 12/12] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8-bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However, benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   58 +++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..1c8a23c 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned short inc = 1 << TICKET_SHIFT;
-
-	asm volatile (
-		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
-		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
-		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
-		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
-		:
-		: "memory", "cc");
+	register union {
+		struct __raw_tickets tickets;
+		unsigned short slock;
+	} inc = { .slock = 1 << TICKET_SHIFT };
+
+	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+	for (;;) {
+		if (inc.tickets.head == inc.tickets.tail)
+			break;
+		cpu_relax();
+		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+	}
+	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned inc = 1 << TICKET_SHIFT;
-	unsigned tmp;
+	__ticket_t tmp;
 
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-		     "movzwl %w0, %2\n\t"
-		     "shrl $16, %0\n\t"
-		     "1:\t"
-		     "cmpl %0, %2\n\t"
-		     "je 2f\n\t"
-		     "rep ; nop\n\t"
-		     "movzwl %1, %2\n\t"
-		     /* don't need lfence here, because loads are in-order */
-		     "jmp 1b\n"
-		     "2:"
-		     : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-		     :
-		     : "memory", "cc");
+	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+		     : "+r" (inc), "+m" (lock->slock)
+		     : : "memory", "cc");
+
+	tmp = inc;
+	inc >>= TICKET_SHIFT;
+
+	for (;;) {
+		if ((__ticket_t)inc == tmp)
+			break;
+		cpu_relax();
+		tmp = ACCESS_ONCE(lock->tickets.head);
+	}
+	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread
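
The same spin-lock algorithm as a stand-alone sketch, with the kernel plumbing
stripped out. This is a model only: __sync_fetch_and_add takes the place of the
lock-prefixed xadd (which the patch still does in inline asm), "rep; nop" plays
the role of cpu_relax(), and the ticket_t / struct ticket_lock names are
invented here.

#include <stdint.h>

typedef uint8_t ticket_t;

struct ticket_lock {
        volatile ticket_t head;         /* ticket currently being served */
        volatile ticket_t tail;         /* next ticket to hand out       */
};

static void ticket_lock_acquire(struct ticket_lock *lock)
{
        /* atomically take a ticket number (returns the old tail) */
        ticket_t me = __sync_fetch_and_add(&lock->tail, 1);

        /* spin until our number comes up */
        while (lock->head != me)
                asm volatile("rep; nop" ::: "memory");  /* cpu_relax() */

        /* compiler barrier: keep the critical section after the lock */
        asm volatile("" ::: "memory");
}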

* [PATCH 11/12] x86/ticketlock: convert __ticket_spin_lock to use xadd()
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (8 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 10/12] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  2011-08-24 21:37   ` [PATCH 12/12] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Convert the two variants of __ticket_spin_lock() to use xadd(), which
has the effect of making them identical, so remove the duplicate function.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   35 +++++------------------------------
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 1c8a23c..adf010a 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -55,26 +55,22 @@
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of line.
  */
-#if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	register union {
-		struct __raw_tickets tickets;
-		unsigned short slock;
-	} inc = { .slock = 1 << TICKET_SHIFT };
+	register struct __raw_tickets inc = { .tail = 1 };
 
-	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+	inc = xadd(&lock->tickets, inc);
 
 	for (;;) {
-		if (inc.tickets.head == inc.tickets.tail)
+		if (inc.head == inc.tail)
 			break;
 		cpu_relax();
-		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp, new;
@@ -102,27 +98,6 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-	unsigned inc = 1 << TICKET_SHIFT;
-	__ticket_t tmp;
-
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->slock)
-		     : : "memory", "cc");
-
-	tmp = inc;
-	inc >>= TICKET_SHIFT;
-
-	for (;;) {
-		if ((__ticket_t)inc == tmp)
-			break;
-		cpu_relax();
-		tmp = ACCESS_ONCE(lock->tickets.head);
-	}
-	barrier();		/* make sure nothing creeps before the lock is taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned tmp;
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 12/12] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
                     ` (9 preceding siblings ...)
  2011-08-24 21:37   ` [PATCH 11/12] x86/ticketlock: convert __ticket_spin_lock to use xadd() Jeremy Fitzhardinge
@ 2011-08-24 21:37   ` Jeremy Fitzhardinge
  10 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make trylock code common regardless of ticket size.

(Also, rename arch_spinlock.slock to head_tail.)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   51 ++++++++-------------------------
 arch/x86/include/asm/spinlock_types.h |    6 ++-
 2 files changed, 16 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index adf010a..69f96d2 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,60 +70,33 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp, new;
-
-	asm volatile("movzwl %2, %0\n\t"
-		     "cmpb %h0,%b0\n\t"
-		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	arch_spinlock_t old, new;
+
+	old.tickets = ACCESS_ONCE(lock->tickets);
+	if (old.tickets.head != old.tickets.tail)
+		return 0;
 
-	return tmp;
+	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+	/* cmpxchg is a full barrier, so nothing can move before it */
+	return cmpxchg_flag(&lock->head_tail, old.head_tail, new.head_tail);
 }
 
+#if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
 	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
+		     : "+m" (lock->head_tail)
 		     :
 		     : "memory", "cc");
 }
 #else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned tmp;
-	unsigned new;
-
-	asm volatile("movl %2,%0\n\t"
-		     "movl %0,%1\n\t"
-		     "roll $16, %0\n\t"
-		     "cmpl %0,%1\n\t"
-		     "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
-}
-
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
 	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
+		     : "+m" (lock->head_tail)
 		     :
 		     : "memory", "cc");
 }
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
 	union {
-		unsigned int slock;
+		__ticketpair_t head_tail;
 		struct __raw_tickets {
 			__ticket_t head, tail;
 		} tickets;
 	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread
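
The new trylock logic as a self-contained user-space sketch. The union mirrors
the patch, __sync_bool_compare_and_swap stands in for cmpxchg_flag(), and the
*_model names are invented. Adding 1 << TICKET_SHIFT to head_tail bumps the
tail because x86 is little-endian, so head sits in the low half of the word.

#include <stdint.h>

typedef uint8_t  ticket_t;
typedef uint16_t ticketpair_t;

#define TICKET_SHIFT    (sizeof(ticket_t) * 8)

typedef struct {
        union {
                ticketpair_t head_tail;
                struct { ticket_t head, tail; } tickets;
        };
} ticketlock_model_t;

static int ticket_trylock_model(ticketlock_model_t *lock)
{
        ticketlock_model_t old, new;

        old.tickets = lock->tickets;            /* one racy snapshot      */
        if (old.tickets.head != old.tickets.tail)
                return 0;                       /* lock is held, give up  */

        /* bump the tail (the high half on little-endian) ... */
        new.head_tail = old.head_tail + (1 << TICKET_SHIFT);

        /* ... and try to install it; failure means someone raced us */
        return __sync_bool_compare_and_swap(&lock->head_tail,
                                            old.head_tail, new.head_tail);
}

As in the patch, the cmpxchg acts as a full barrier, so nothing from the
critical section can move before a successful trylock.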

* Re: [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 20:00   ` Andi Kleen
@ 2011-08-24 21:38     ` Linus Torvalds
  2011-08-24 21:43       ` Jeremy Fitzhardinge
  2011-08-24 21:43       ` Andi Kleen
  0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2011-08-24 21:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 1:00 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
> Can't we just get rid of the small ticket size code path?
>
> I couldn't benchmark any difference between the two last time
> I tried.

The small locks aren't any faster. They are just smaller.

Many data structures have spinlocks inside of them, and the smaller
spinlock *should* be able to result in smaller data structures.

Of course, that assumes that they have been packed correctly. And they
seldom are ;(

Looking at 'struct task_struct', for example, the spinlocks there
aren't next to each other, and have pointers and 'unsigned int's
around them, so rather than shrinking the data structure, it just
results in holes.

                       Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread
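
To make the hole problem concrete, a made-up example (not taken from
task_struct); offsets assume x86-64 with 8-byte pointers, a 4-byte int and a
2-byte ticket lock:

struct badly_packed {
        void           *ptr;       /* offset  0, size 8                */
        unsigned short  lock_a;    /* offset  8, size 2                */
        unsigned int    count;     /* offset 12, size 4: 2-byte hole   */
        unsigned short  lock_b;    /* offset 16, size 2                */
};                                 /* sizeof == 24, 6 bytes tail pad   */

struct better_packed {
        void           *ptr;       /* offset  0, size 8                */
        unsigned int    count;     /* offset  8, size 4                */
        unsigned short  lock_a;    /* offset 12, size 2                */
        unsigned short  lock_b;    /* offset 14, size 2                */
};                                 /* sizeof == 16, no holes           */

pahole makes such holes easy to spot; the smaller lock only pays off in the
second arrangement.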

* Re: [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 21:38     ` Linus Torvalds
@ 2011-08-24 21:43       ` Jeremy Fitzhardinge
  2011-08-24 21:43       ` Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 02:38 PM, Linus Torvalds wrote:
> Many data structures have spinlocks inside of them, and the smaller
> spinlock *should* be able to result in smaller data structures.
>
> Of course, that assumes that they have been packed correctly. And they
> seldom are ;(
>
> Looking at 'struct task_struct', for example, the spinlocks there
> aren't next to each other, and have pointers and 'unsigned int's
> around them, so rather than shrinking the data structure, it just
> results in holes.

Wouldn't sticking all the spinlocks together just result in false
sharing?  Wouldn't it be best to put them right next to the fields they
protect so the act of getting the lock also gets you your data?

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 21:38     ` Linus Torvalds
  2011-08-24 21:43       ` Jeremy Fitzhardinge
@ 2011-08-24 21:43       ` Andi Kleen
  2011-08-24 21:48         ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2011-08-24 21:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Jeremy Fitzhardinge, H. Peter Anvin, Peter Zijlstra,
	Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Nick Piggin, Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 02:38:12PM -0700, Linus Torvalds wrote:
> On Wed, Aug 24, 2011 at 1:00 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> > Can't we just get rid of the small ticket size code path?
> >
> > I couldn't benchmark any difference between the two last time
> > I tried.
> 
> The small locks aren't any faster. They are just smaller.

Are you sure? 

AFAIK it's always

typedef struct arch_spinlock {
        unsigned int slock;
} arch_spinlock_t;

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
                   ` (19 preceding siblings ...)
  2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
@ 2011-08-24 21:46 ` Jeremy Fitzhardinge
  20 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 02:36 PM, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Hi all,
>
> Another spin of the cmpxchg/xadd/ticketlock cleanups.
>
> Changes from last post:
>   - reorder all the cmpxchg/xadd patches first
>   - change xadd to return prev rather than implicitly update arg
>   - recast ticketlock cleanup in terms of xadd
>   - drop conversion of unlock to C
>
> This is also available at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/ticketlock-cleanup
>
>  arch/x86/include/asm/atomic.h         |    8 +-
>  arch/x86/include/asm/atomic64_64.h    |    6 +-
>  arch/x86/include/asm/cacheflush.h     |    2 +-
>  arch/x86/include/asm/cmpxchg.h        |  232 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/cmpxchg_32.h     |  128 ++----------------
>  arch/x86/include/asm/cmpxchg_64.h     |  131 -------------------
>  arch/x86/include/asm/rwsem.h          |   15 +--
>  arch/x86/include/asm/spinlock.h       |  110 ++++------------
>  arch/x86/include/asm/spinlock_types.h |   22 +++-
>  arch/x86/include/asm/uv/uv_bau.h      |    6 +-
>  arch/x86/kernel/acpi/boot.c           |   10 +-
>  arch/x86/kernel/cpu/mcheck/mce.c      |    2 +-
>  arch/x86/xen/p2m.c                    |   10 +-
>  13 files changed, 311 insertions(+), 371 deletions(-)
>
> Thanks,
> 	J

Uh, that's 00/12, and ignore the below; that's left over from the
previous posting.

    J

>
> Jeremy Fitzhardinge (18):
>   x86/ticketlock: clean up types and accessors
>   x86/ticketlock: convert spin loop to C
>   x86/ticketlock: Use C for __ticket_spin_unlock
>   x86/ticketlock: make large and small ticket versions of spin_lock the
>     same
>   x86/ticketlock: make __ticket_spin_lock common
>   x86/ticketlock: make __ticket_spin_trylock common
>   x86: add xadd helper macro
>   x86/ticketlock: use xadd helper
>   x86/cmpxchg: linux/alternative.h has LOCK_PREFIX
>   x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit.
>   x86/cmpxchg: move 64-bit set64_bit() to match 32-bit
>   x86/cmpxchg: unify cmpxchg into cmpxchg.h
>   x86: add cmpxchg_flag() variant
>   x86/ticketlocks: use cmpxchg_flag for trylock
>   x86: use cmpxchg_flag() where applicable
>   x86: report xchg/cmpxchg/xadd usage errors consistently
>   x86: add local and sync variants of xadd
>   x86: use xadd helper more widely
>
>  arch/x86/include/asm/atomic.h         |    4 +-
>  arch/x86/include/asm/atomic64_64.h    |    4 +-
>  arch/x86/include/asm/cacheflush.h     |    2 +-
>  arch/x86/include/asm/cmpxchg.h        |  223 +++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/cmpxchg_32.h     |  128 ++-----------------
>  arch/x86/include/asm/cmpxchg_64.h     |  131 -------------------
>  arch/x86/include/asm/rwsem.h          |   11 +--
>  arch/x86/include/asm/spinlock.h       |  140 +++++++--------------
>  arch/x86/include/asm/spinlock_types.h |   22 +++-
>  arch/x86/include/asm/uv/uv_bau.h      |    4 +-
>  arch/x86/kernel/acpi/boot.c           |   10 +-
>  arch/x86/kernel/cpu/mcheck/mce.c      |    2 +-
>  arch/x86/xen/p2m.c                    |   10 +-
>  13 files changed, 321 insertions(+), 370 deletions(-)
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 21:43       ` Andi Kleen
@ 2011-08-24 21:48         ` Jeremy Fitzhardinge
  2011-08-24 21:53           ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 21:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 02:43 PM, Andi Kleen wrote:
> On Wed, Aug 24, 2011 at 02:38:12PM -0700, Linus Torvalds wrote:
>> On Wed, Aug 24, 2011 at 1:00 PM, Andi Kleen <andi@firstfloor.org> wrote:
>>> Can't we just get rid of the small ticket size code path?
>>>
>>> I couldn't benchmark any difference between the two last time
>>> I tried.
>> The small locks aren't any faster. They are just smaller.
> Are you sure? 
>
> AFAIK it's always
>
> typedef struct arch_spinlock {
>         unsigned int slock;
> } arch_spinlock_t;

Hm, I changed that in this series; arch_spinlock can be 16 bits in the
<256 case.  I wonder if it matters...

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common
  2011-08-24 21:48         ` Jeremy Fitzhardinge
@ 2011-08-24 21:53           ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2011-08-24 21:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andi Kleen, Linus Torvalds, H. Peter Anvin, Peter Zijlstra,
	Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Nick Piggin, Jeremy Fitzhardinge

> Hm, I changed that in this series; arch_spinlock can be 16 bits in the
> <256 case.  I wonder if it matters...

Oh I see.  You could diff pahole listings of the kernel, but I suppose
as Linus said, there will be mainly lots of new holes.

Besides, mainstream systems (<= 8S) are steadily marching towards
the 256T barrier.

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/12] x86: use cmpxchg_flag() where applicable
  2011-08-24 21:37   ` [PATCH 07/12] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
@ 2011-08-24 21:56     ` Linus Torvalds
  2011-08-24 22:01       ` H. Peter Anvin
  2011-08-24 22:02       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2011-08-24 21:56 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

Ok, I see nothing horrible in this series.

The one reaction I have is that the cmpxchg_flag() thing returns an
8-bit value, but then a lot of the users end up having to extend it to
a full "int" purely for calling convention reasons (eg I think
'down_write_trylock()' will have 'sete + movzl' - not a new problem,
but since the whole point was to remove extraneous instructions and we
no longer have the silly 'testl', it now annoys me more).

So it seems a bit sad. But I guess it doesn't really matter.

                          Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/12] x86: use cmpxchg_flag() where applicable
  2011-08-24 21:56     ` Linus Torvalds
@ 2011-08-24 22:01       ` H. Peter Anvin
  2011-08-24 22:03         ` Jeremy Fitzhardinge
  2011-08-24 22:02       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2011-08-24 22:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeremy Fitzhardinge, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 02:56 PM, Linus Torvalds wrote:
> Ok, I see nothing horrible in this series.
> 
> The one reaction I have is that the cmpxchg_flag() thing returns an
> 8-bit value, but then a lot of the users end up having to extend it to
> a full "int" purely for calling convention reasons (eg I think
> 'down_write_trylock()' will have 'sete + movzl' - not a new problem,
> but since the whole point was to remove extraneous instructions and we
> no longer have the silly 'testl', it now annoys me more).
> 
> So it seems a bit sad. But I guess it doesn't really matter.
> 

I think it is a net lose.  The most common case is probably going to be
to use it immediately, in which case we have:

cmpxchg -> sete -> compare -> conditional

versus

cmpxchg -> compare -> conditional

For doubleword cmpxchg it's another matter entirely, because doubleword
comparisons are significantly more expensive than sete + test.

So unless there is actual data showing this is better, I would like to
see this dropped for now.

	-hpa

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/12] x86: use cmpxchg_flag() where applicable
  2011-08-24 21:56     ` Linus Torvalds
  2011-08-24 22:01       ` H. Peter Anvin
@ 2011-08-24 22:02       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 22:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 02:56 PM, Linus Torvalds wrote:
> Ok, I see nothing horrible in this series.
>
> The one reaction I have is that the cmpxchg_flag() thing returns an
> 8-bit value, but then a lot of the users end up having to extend it to
> a full "int" purely for calling convention reasons (eg I think
> 'down_write_trylock()' will have 'sete + movzl' - not a new problem,
> but since the whole point was to remove extraneous instructions and we
> no longer have the silly 'testl', it now annoys me more).
>
> So it seems a bit sad. But I guess it doesn't really matter.

It's still annoying because in practice sete sets a result, but most
users are just going to have to testb it to do a conditional anyway -
it's a pity there's no way to tell gcc that an asm has already set the
flags, and it can generate a jcc/setcc based on that.

I experimented with a variant based on "asm goto", so that the
conditional control flow is done by the asm itself, and gcc should be
able to make use of that (via constant folding).  But the resulting code
wasn't very spectacular, with quite a few unnecessary jumps.

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread
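
For reference, one possible shape of such an "asm goto" variant, written in
plain GCC syntax with no kernel macros. This is a hypothetical sketch, not the
code that was actually tried: asm goto cannot have output operands, so %eax is
named as a clobber (cmpxchg overwrites it on failure) and the result is
communicated purely through control flow, with jne consuming the flags directly
instead of going through sete.

static inline int cmpxchg_flag_goto(unsigned int *ptr,
                                    unsigned int old, unsigned int new)
{
        asm goto("movl %1, %%eax\n\t"
                 "lock; cmpxchgl %2, %0\n\t"
                 "jne %l[fail]"
                 : /* no outputs allowed in asm goto */
                 : "m" (*ptr), "r" (old), "r" (new)
                 : "eax", "memory", "cc"
                 : fail);
        return 1;       /* the store happened */
fail:
        return 0;       /* *ptr had changed under us */
}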

* Re: [PATCH 07/12] x86: use cmpxchg_flag() where applicable
  2011-08-24 22:01       ` H. Peter Anvin
@ 2011-08-24 22:03         ` Jeremy Fitzhardinge
  2011-08-24 22:05           ` H. Peter Anvin
  0 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 22:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 03:01 PM, H. Peter Anvin wrote:
> On 08/24/2011 02:56 PM, Linus Torvalds wrote:
>> Ok, I see nothing horrible in this series.
>>
>> The one reaction I have is that the cmpxchg_flag() thing returns an
>> 8-bit value, but then a lot of the users end up having to extend it to
>> a full "int" purely for calling convention reasons (eg I think
>> 'down_write_trylock()' will have 'sete + movzl' - not a new problem,
>> but since the whole point was to remove extraneous instructions and we
>> no longer have the silly 'testl', it now annoys me more).
>>
>> So it seems a bit sad. But I guess it doesn't really matter.
>>
> I think it is a net lose.  The most common case is probably going to be
> to use it immediately, in which case we have:
>
> cmpxchg -> sete -> compare -> conditional
>
> versus
>
> cmpxchg -> compare -> conditional
>
> For doubleword cmpxchg it's another matter entirely, because doubleword
> comparisons are significantly more expensive than sete + test.
>
> So unless there is actual data showing this is better, I would like to
> see this dropped for now.

Well, we could keep the API (since it is convenient), but just implement
it with a compare.

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/12] x86: use cmpxchg_flag() where applicable
  2011-08-24 22:03         ` Jeremy Fitzhardinge
@ 2011-08-24 22:05           ` H. Peter Anvin
  0 siblings, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2011-08-24 22:05 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 03:03 PM, Jeremy Fitzhardinge wrote:
> 
> Well, we could keep the API (since it is convenient), but just implement
> it with a compare.
> 

It seems like a gratuitous change at this point to me, but I don't have
a strong opinion one way or the other.

	-hpa


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 20:10     ` Jeremy Fitzhardinge
@ 2011-08-24 22:53       ` Linus Torvalds
  2011-08-24 22:59         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2011-08-24 22:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 1:10 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
> Could we just kill SMP support for OOSTORE machines now?  That would be
> the cleanest possible fix...

No it wouldn't. The asm version would *still* be cleaner than the "C
plus random barriers".

It's not like the C version is "portable" in any case.

                         Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 22:53       ` Linus Torvalds
@ 2011-08-24 22:59         ` Jeremy Fitzhardinge
  2011-08-24 23:05           ` H. Peter Anvin
  0 siblings, 1 reply; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 22:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 03:53 PM, Linus Torvalds wrote:
> On Wed, Aug 24, 2011 at 1:10 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> Could we just kill SMP support for OOSTORE machines now?  That would be
>> the cleanest possible fix...
> No it wouldn't. The asm version would *still* be cleaner than the "C
> plus random barriers".
>
> It's not like the C version is "portable" in any case.

If there's no need to have a locked instruction, then it could simply be:

    barrier();
    lock->head++;
    barrier();

with no need for asm at all.

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread
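
Spelled out as a stand-alone sketch, with barrier() written as the usual empty
asm with a "memory" clobber; the function name and the bare pointer argument
are illustrative only:

#define barrier()       asm volatile("" ::: "memory")

static void ticket_unlock_model(volatile unsigned char *head)
{
        barrier();      /* keep the critical section's accesses before the release */
        (*head)++;      /* gcc normally emits a single incb/addb here, though as
                         * the follow-ups note, that is not strictly guaranteed */
        barrier();      /* and keep later code from being hoisted into the
                         * locked region */
}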

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 22:59         ` Jeremy Fitzhardinge
@ 2011-08-24 23:05           ` H. Peter Anvin
  2011-08-24 23:09             ` Linus Torvalds
  2011-08-24 23:10             ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 55+ messages in thread
From: H. Peter Anvin @ 2011-08-24 23:05 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 03:59 PM, Jeremy Fitzhardinge wrote:
> On 08/24/2011 03:53 PM, Linus Torvalds wrote:
>> On Wed, Aug 24, 2011 at 1:10 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> Could we just kill SMP support for OOSTORE machines now?  That would be
>>> the cleanest possible fix...
>> No it wouldn't. The asm version would *still* be cleaner than the "C
>> plus random barriers".
>>
>> It's not like the C version is "portable" in any case.
> 
> If there's no need to have a locked instruction, then it could simply be:
> 
>     barrier();
>     lock->head++;
>     barrier();
> 
> with no need for asm at all.
> 

That's not guaranteed in any way to generate a locally atomic instruction.

	-hpa



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 23:05           ` H. Peter Anvin
@ 2011-08-24 23:09             ` Linus Torvalds
  2011-08-24 23:21               ` Linus Torvalds
  2011-08-24 23:10             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2011-08-24 23:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 4:05 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> That's not guaranteed in any way to generate a locally atomic instruction.

No such guarantee is needed - we hold the lock (until the last store),
so as long as it doesn't do anything completely crazy, then the
"head++" will work.

However, the reason I disagree with it is that I don't think that it's
any prettier at all to have the two barriers than it is to just have
the asm.

I see no advantage of a three-line "pseudo-C" code with two magic
barriers over just doing it with the inline asm. The fact that the
inline asm also makes the OOSTORE case trivial is just gravy.

                     Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 23:05           ` H. Peter Anvin
  2011-08-24 23:09             ` Linus Torvalds
@ 2011-08-24 23:10             ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 23:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 04:05 PM, H. Peter Anvin wrote:
> On 08/24/2011 03:59 PM, Jeremy Fitzhardinge wrote:
>> On 08/24/2011 03:53 PM, Linus Torvalds wrote:
>>> On Wed, Aug 24, 2011 at 1:10 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>>> Could we just kill SMP support for OOSTORE machines now?  That would be
>>>> the cleanest possible fix...
>>> No it wouldn't. The asm version would *still* be cleaner than the "C
>>> plus random barriers".
>>>
>>> It's not like the C version is "portable" in any case.
>> If there's no need to have a locked instruction, then it could simply be:
>>
>>     barrier();
>>     lock->head++;
>>     barrier();
>>
>> with no need for asm at all.
>>
> That's not guaranteed in any way to generate a locally atomic instruction.

Doesn't need to be.  The final write needs to be locally atomic, but we
assume that a lot.

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 23:09             ` Linus Torvalds
@ 2011-08-24 23:21               ` Linus Torvalds
  2011-08-24 23:30                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2011-08-24 23:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Wed, Aug 24, 2011 at 4:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> However, the reason I disagree with it is that I don't think that it's
> any prettier at all to have the two barriers than it is to just have
> the asm.

.. and btw, we probably do need *both* barriers. We definitely need
the one before. The one after is a bit less obvious, since it is
technically legal for code to move into the locked region. However,
it's not necessarily a *good* idea for code to move into the locked
region, so the two barriers are likely the RightThing(tm). But the two
barriers are what makes me think that the C version really isn't any
better. And the OOSTORE case then just clinches the deal for me.

                         Linus

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup
  2011-08-24 23:21               ` Linus Torvalds
@ 2011-08-24 23:30                 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 55+ messages in thread
From: Jeremy Fitzhardinge @ 2011-08-24 23:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Peter Zijlstra, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 08/24/2011 04:21 PM, Linus Torvalds wrote:
> On Wed, Aug 24, 2011 at 4:09 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> However, the reason I disagree with it is that I don't think that it's
>> any prettier at all to have the two barriers than it is to just have
>> the asm.
> .. and btw, we probably do need *both* barriers. We definitely need
> the one before. The one after is a bit less obvious, since it is
> technically legal for code to move into the locked region. However,
> it's not necessarily a *good* idea for code to move into the locked
> region, so the two barriers are likely the RightThing(tm).

Originally I left the second barrier off for that reason, but I got
mysterious lockups.  The second barrier fixed them, so I never got
around to do a full root-cause analysis.

I still think the C version is more straightforward given that the asm
version is confused with the details of the ticket sizes, etc.  But,
shrug, it's a pretty minor detail.

The OOSTORE stuff is a complete red herring; I bet it's been *years*
since someone specifically compiled a kernel with OOSTORE SMP support
because they actually wanted to use it.

    J

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2011-08-24 23:30 UTC | newest]

Thread overview: 55+ messages
-- links below jump to the message on this page --
2011-08-24 17:52 [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
2011-08-24 17:52 ` [PATCH 01/18] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
2011-08-24 17:52 ` [PATCH 02/18] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
2011-08-24 20:02   ` Andi Kleen
2011-08-24 17:52 ` [PATCH 03/18] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
2011-08-24 18:01   ` Linus Torvalds
2011-08-24 17:52 ` [PATCH 04/18] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
2011-08-24 17:52 ` [PATCH 05/18] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 06/18] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
2011-08-24 20:00   ` Andi Kleen
2011-08-24 21:38     ` Linus Torvalds
2011-08-24 21:43       ` Jeremy Fitzhardinge
2011-08-24 21:43       ` Andi Kleen
2011-08-24 21:48         ` Jeremy Fitzhardinge
2011-08-24 21:53           ` Andi Kleen
2011-08-24 17:53 ` [PATCH 07/18] x86: add xadd helper macro Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 08/18] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 09/18] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 10/18] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 11/18] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 12/18] x86/cmpxchg: unify cmpxchg into cmpxchg.h Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 13/18] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 14/18] x86/ticketlocks: use cmpxchg_flag for trylock Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 15/18] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 16/18] x86: report xchg/cmpxchg/xadd usage errors consistently Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 17/18] x86: add local and sync variants of xadd Jeremy Fitzhardinge
2011-08-24 17:53 ` [PATCH 18/18] x86: use xadd helper more widely Jeremy Fitzhardinge
2011-08-24 18:03 ` [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Peter Zijlstra
2011-08-24 18:24   ` Linus Torvalds
2011-08-24 20:10     ` Jeremy Fitzhardinge
2011-08-24 22:53       ` Linus Torvalds
2011-08-24 22:59         ` Jeremy Fitzhardinge
2011-08-24 23:05           ` H. Peter Anvin
2011-08-24 23:09             ` Linus Torvalds
2011-08-24 23:21               ` Linus Torvalds
2011-08-24 23:30                 ` Jeremy Fitzhardinge
2011-08-24 23:10             ` Jeremy Fitzhardinge
2011-08-24 21:36 ` [PATCH 01/12] x86/cmpxchg: linux/alternative.h has LOCK_PREFIX Jeremy Fitzhardinge
2011-08-24 21:36   ` [PATCH 02/12] x86/cmpxchg: move 32-bit __cmpxchg_wrong_size to match 64 bit Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 03/12] x86/cmpxchg: move 64-bit set64_bit() to match 32-bit Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 04/12] x86/cmpxchg: unify cmpxchg into cmpxchg.h Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 05/12] x86: add xadd helper macro Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 06/12] x86: add cmpxchg_flag() variant Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 07/12] x86: use cmpxchg_flag() where applicable Jeremy Fitzhardinge
2011-08-24 21:56     ` Linus Torvalds
2011-08-24 22:01       ` H. Peter Anvin
2011-08-24 22:03         ` Jeremy Fitzhardinge
2011-08-24 22:05           ` H. Peter Anvin
2011-08-24 22:02       ` Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 08/12] x86: use xadd helper more widely Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 09/12] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 10/12] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 11/12] x86/ticketlock: convert __ticket_spin_lock to use xadd() Jeremy Fitzhardinge
2011-08-24 21:37   ` [PATCH 12/12] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
2011-08-24 21:46 ` [PATCH 00/18] x86: Ticket lock + cmpxchg cleanup Jeremy Fitzhardinge
