* [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Hi all,

I'm proposing this series for 3.1.

This is a repost of a series to clean up the x86 ticket lock
code by converting it to a mostly C implementation and removing
lots of duplicate code relating to the ticket size.

The last time I posted this series, the only significant comments
were from Nick Piggin, specifically relating to:

 1. A wrongly placed barrier on unlock (which may have allowed the
    compiler to move things out of the locked region).  I went
    belt-and-suspenders by having two barriers to prevent motion
    into or out of the locked region; see the sketch below.

 2. With NR_CPUS < 256 the ticket size is 8 bits.  The compiler doesn't
    use the same trick as the hand-coded asm to directly compare the high
    and low bytes in the word, but does a bit of extra shuffling around.
    However, the Intel optimisation guide and several x86 experts have
    opined that it's best to avoid the high-byte operations anyway, since
    they will cause a partial word stall, and the gcc-generated code should
    be better.

    Overall the compiler-generated code is very similar to the hand-coded
    versions, with the partial byte operations being the only significant
    difference. (Curiously, gcc does generate a high-byte compare for me
    in trylock, so it can if it wants to.)
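
For reference, the point 1 fix means unlock ends up bracketed by two
compiler barriers; this is the shape the series arrives at by patch 7:

static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
{
	barrier();		/* prevent reordering out of locked region */
	__ticket_unlock_release(lock);
	barrier();		/* prevent reordering into locked region */
}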

I've been running with this code in place for several months on 4 core
systems without any problems.

I couldn't measure a consistent performance difference between the two
implementations; the differences were in the +/- ~1% range, which is the
level of variation I see from simply recompiling the kernel with
slightly different code alignment.

Overall, I think the large reduction in code size is a big win.

Thanks,
	J

Jeremy Fitzhardinge (7):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
    same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86/ticketlock: prevent memory accesses from being reordered out of
    lock region

 arch/x86/include/asm/spinlock.h       |  147 ++++++++++++---------------------
 arch/x86/include/asm/spinlock_types.h |   22 +++++-
 2 files changed, 74 insertions(+), 95 deletions(-)

-- 
1.7.5.4



* [PATCH 1/7] x86/ticketlock: clean up types and accessors
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   24 ++++++++++--------------
 arch/x86/include/asm/spinlock_types.h |   20 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	unsigned short inc = 1 << TICKET_SHIFT;
 
 	asm volatile (
 		LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp, new;
+	unsigned int tmp, new;
 
 	asm volatile("movzwl %2, %0\n\t"
 		     "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	int inc = 0x00010000;
-	int tmp;
+	unsigned inc = 1 << TICKET_SHIFT;
+	unsigned tmp;
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 		     "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp;
-	int new;
+	unsigned tmp;
+	unsigned new;
 
 	asm volatile("movl %2,%0\n\t"
 		     "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
+#define TICKET_MASK	((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-	unsigned int slock;
+	union {
+		unsigned int slock;
+		struct __raw_tickets {
+			__ticket_t head, tail;
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.5.4
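
To illustrate the resulting layout (a standalone sketch, not part of the
patch): with CONFIG_NR_CPUS < 256, __ticket_t is u8, so TICKET_SHIFT is 8
and TICKET_MASK is 0xff, and on little-endian x86 head aliases the low
byte of slock and tail the high byte:

	arch_spinlock_t lock = { { .slock = 0x0201 } };
	/* lock.tickets.head == 1, lock.tickets.tail == 2, so	  */
	/* __ticket_spin_is_locked() == 1 (head != tail), and	  */
	/* __ticket_spin_is_contended() == 0 ((2 - 1) is not > 1) */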



* [PATCH 2/7] x86/ticketlock: convert spin loop to C
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However, benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   58 +++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned short inc = 1 << TICKET_SHIFT;
-
-	asm volatile (
-		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
-		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
-		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
-		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
-		:
-		: "memory", "cc");
+	register union {
+		struct __raw_tickets tickets;
+		unsigned short slock;
+	} inc = { .slock = 1 << TICKET_SHIFT };
+
+	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+	for (;;) {
+		if (inc.tickets.head == inc.tickets.tail)
+			goto out;
+		cpu_relax();
+		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned inc = 1 << TICKET_SHIFT;
-	unsigned tmp;
+	__ticket_t tmp;
 
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-		     "movzwl %w0, %2\n\t"
-		     "shrl $16, %0\n\t"
-		     "1:\t"
-		     "cmpl %0, %2\n\t"
-		     "je 2f\n\t"
-		     "rep ; nop\n\t"
-		     "movzwl %1, %2\n\t"
-		     /* don't need lfence here, because loads are in-order */
-		     "jmp 1b\n"
-		     "2:"
-		     : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-		     :
-		     : "memory", "cc");
+	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+		     : "+r" (inc), "+m" (lock->slock)
+		     : : "memory", "cc");
+
+	tmp = inc;
+	inc >>= TICKET_SHIFT;
+
+	for (;;) {
+		if ((__ticket_t)inc == tmp)
+			goto out;
+		cpu_relax();
+		tmp = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.5.4
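
As a minimal standalone model of what the C loop is doing (a sketch only:
the names are made up, the kernel's lock prefix annotations are ignored,
and the xadd is done with a gcc intrinsic):

	typedef struct { volatile unsigned char head, tail; } tlock_t;

	static void tlock_lock(tlock_t *l)
	{
		/* atomically take a ticket; the old tail is our number */
		unsigned char me = __sync_fetch_and_add(&l->tail, 1);

		while (l->head != me)			/* not our turn yet */
			asm volatile("rep; nop");	/* i.e. cpu_relax() */
	}

	static void tlock_unlock(tlock_t *l)
	{
		l->head++;	/* hand the lock to the next ticket holder */
	}

(A real unlock also needs the compiler barriers discussed in the cover
letter and added by the later patches.)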



* [PATCH 3/7] x86/ticketlock: Use C for __ticket_spin_unlock
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   32 +++++++++++++++++---------------
 1 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..0170ba9 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm (LOCK_PREFIX "incb %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm (LOCK_PREFIX "incw %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	lock->tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	__ticket_unlock_release(lock);
+	barrier();		/* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.5.4
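
The reason a plain increment is a safe release in the normal case is that
x86 stores become visible in program order, so the head update cannot
overtake the critical-section stores. Roughly, the two variants of
__ticket_unlock_release() compile down to (assumed output, for
illustration only):

	incb (%rdi)		# normal case: plain increment releases the lock
	lock incb (%rdi)	# PPro errata 66/92 / OOSTORE: locked form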



* [PATCH 4/7] x86/ticketlock: make large and small ticket versions of spin_lock the same
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make the bulk of __ticket_spin_lock look identical for large and small
numbers of cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   23 ++++++++---------------
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 0170ba9..1b81809 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	register union {
-		struct __raw_tickets tickets;
-		unsigned short slock;
-	} inc = { .slock = 1 << TICKET_SHIFT };
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
 	for (;;) {
-		if (inc.tickets.head == inc.tickets.tail)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned inc = 1 << TICKET_SHIFT;
-	__ticket_t tmp;
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->slock)
+		     : "+r" (inc), "+m" (lock->tickets)
 		     : : "memory", "cc");
 
-	tmp = inc;
-	inc >>= TICKET_SHIFT;
-
 	for (;;) {
-		if ((__ticket_t)inc == tmp)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		tmp = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
-- 
1.7.5.4
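
Why xadd'ing { .tail = 1 } works: viewed as a word, that initializer is
1 << TICKET_SHIFT, so the xadd bumps the tail while handing back the old
head/tail pair, which contains our ticket number. Numerically, with 8-bit
tickets (a sketch):

	/* before: lock->tickets == { .head = 3, .tail = 5 } (slock 0x0503) */
	/* xaddw adds 0x0100: the lock's tail becomes 6                     */
	/* after:  inc == { .head = 3, .tail = 5 } -- our ticket is 5       */
	/* the loop then re-reads inc.head until it reaches 5               */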



* [PATCH 5/7] x86/ticketlock: make __ticket_spin_lock common
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   42 ++++++++++++++++++--------------------
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 1b81809..f722f96 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets tickets = { .tail = 1 };
+
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+			      : "+r" (tickets), "+m" (lock->tickets)
+			      : : "memory", "cc");
+	else
+		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+			     : "+r" (tickets), "+m" (lock->tickets)
+			     : : "memory", "cc");
 
-	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+	return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+	register struct __raw_tickets inc;
+
+	inc = __ticket_spin_claim(lock);
 
 	for (;;) {
 		if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-	register struct __raw_tickets inc = { .tail = 1 };
-
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->tickets)
-		     : : "memory", "cc");
-
-	for (;;) {
-		if (inc.head == inc.tail)
-			goto out;
-		cpu_relax();
-		inc.head = ACCESS_ONCE(lock->tickets.head);
-	}
-out:	barrier();		/* make sure nothing creeps before the lock is taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned tmp;
-- 
1.7.5.4
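
With a generic size-dispatching xadd() helper (one is sketched by Peter
Zijlstra later in this thread; the name is hypothetical here), the claim
function would shrink to something like:

	static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
	{
		register struct __raw_tickets tickets = { .tail = 1 };

		xadd(&lock->tickets, tickets);	/* old value comes back in tickets */
		return tickets;
	}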



* [PATCH 6/7] x86/ticketlock: make __ticket_spin_trylock common
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   49 +++++++--------------------------
 arch/x86/include/asm/spinlock_types.h |    6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f722f96..3afb1a7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp, new;
-
-	asm volatile("movzwl %2, %0\n\t"
-		     "cmpb %h0,%b0\n\t"
-		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned tmp;
-	unsigned new;
-
-	asm volatile("movl %2,%0\n\t"
-		     "movl %0,%1\n\t"
-		     "roll $16, %0\n\t"
-		     "cmpl %0,%1\n\t"
-		     "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
+	arch_spinlock_t old, new;
+
+	old.tickets = ACCESS_ONCE(lock->tickets);
+	if (old.tickets.head != old.tickets.tail)
+		return 0;
+
+	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+	/* cmpxchg is a full barrier, so nothing can move before it */
+	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
 	union {
-		unsigned int slock;
+		__ticketpair_t head_tail;
 		struct __raw_tickets {
 			__ticket_t head, tail;
 		} tickets;
 	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.5.4
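
The trylock logic in words: snapshot the whole head/tail word, give up if
the lock looks taken, otherwise try to install the same word with the
tail bumped by one; the cmpxchg fails exactly when another CPU raced us.
A worked example with 8-bit tickets (a sketch):

	/* old == { .head = 4, .tail = 4 }: lock looks free                */
	/* new.head_tail = old.head_tail + (1 << TICKET_SHIFT): tail -> 5  */
	/* cmpxchg succeeds -> we hold ticket 4 and own the lock           */
	/* cmpxchg fails    -> someone took a ticket first; return 0       */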



* [PATCH 7/7] x86/ticketlock: prevent memory accesses from being reordered out of lock region
From: Jeremy Fitzhardinge @ 2011-06-16 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3afb1a7..dac6fc6 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -114,6 +114,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
+	barrier();		/* prevent reordering out of locked region */
 	__ticket_unlock_release(lock);
 	barrier();		/* prevent reordering into locked region */
 }
-- 
1.7.5.4
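
The failure mode the new leading barrier() guards against (hypothetical
caller, for illustration): in the normal case the release is a plain
store, so without this barrier the compiler could legally sink a
protected access past it:

	spin_lock(&lock);
	shared++;		/* must not be moved past the head update */
	spin_unlock(&lock);	/* now: barrier(); head++; barrier(); */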



* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: Peter Zijlstra @ 2011-06-21 14:01 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

On Thu, 2011-06-16 at 14:40 -0700, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Hi all,
> 
> I'm proposing this series for 3.1.
> 
> This is a repost of a series to clean up the x86 ticket lock
> code by converting it to a mostly C implementation and removing
> lots of duplicate code relating to the ticket size.
> 
> The last time I posted this series, the only significant comments
> were from Nick Piggin, specifically relating to:
> 
>  1. A wrongly placed barrier on unlock (which may have allowed the
>     compiler to move things out of the locked region).  I went
>     belt-and-suspenders by having two barriers to prevent motion
>     into or out of the locked region.
> 
>  2. With NR_CPUS < 256 the ticket size is 8 bits.  The compiler doesn't
>     use the same trick as the hand-coded asm to directly compare the high
>     and low bytes in the word, but does a bit of extra shuffling around.
>     However, the Intel optimisation guide and several x86 experts have
>     opined that it's best to avoid the high-byte operations anyway, since
>     they will cause a partial word stall, and the gcc-generated code should
>     be better.
> 
>     Overall the compiler-generated code is very similar to the hand-coded
>     versions, with the partial byte operations being the only significant
>     difference. (Curiously, gcc does generate a high-byte compare for me
>     in trylock, so it can if it wants to.)
> 
> I've been running with this code in place for several months on 4 core
> systems without any problems.
> 
> I couldn't measure a consistent performance difference between the two
> implementations; the differences were in the +/- ~1% range, which is the
> level of variation I see from simply recompiling the kernel with
> slightly different code alignment.
> 
> Overall, I think the large reduction in code size is a big win.

No complaints from me, I rather like the result.

One other thing you could contemplate is adding something like:

#define xadd(ptr, inc)							\
do {									\
	switch (sizeof(*(ptr))) {					\
	case 1:								\
		asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"		\
			      : "+q" (inc), "+m" (*(ptr))		\
			      : : "memory", "cc");			\
		break;							\
	case 2:								\
		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"		\
			      : "+r" (inc), "+m" (*(ptr))		\
			      : : "memory", "cc");			\
		break;							\
	case 4:								\
		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"		\
			      : "+r" (inc), "+m" (*(ptr))		\
			      : : "memory", "cc");			\
		break;							\
	}								\
} while (0)

and a similar something for inc. For both there seem to be various asm
bits left (we could even consider adding xadd to
arch/x86/include/asm/cmpxchg*.h).

Also, you might have wanted to CC Linus on this, he's usually interested
in these bits.


* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: Konrad Rzeszutek Wilk @ 2011-06-21 14:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Thu, Jun 16, 2011 at 02:40:47PM -0700, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Hi all,
> 
> I'm proposing this series for 3.1.

Is there a git branch with these patches? I've looked over them
and they look good (and they make the code much easier to read)
but I would like to ingest them in my branches for some extra
testing goodness.


* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: Jeremy Fitzhardinge @ 2011-06-21 17:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

On 06/21/2011 07:01 AM, Peter Zijlstra wrote:
> No complaints from me, I rather like the result.

OK, good.

> One other thing you could contemplate is adding something like:
>
> #define xadd(ptr, inc)							\
> do {									\
> 	switch(sizeof(*(ptr))) {					\
> 	case 1:								\
> 		asm volatile (LOCK_PREFIX "xaddb %0, %1\n"		\
>                              : "+r" (inc), "+m" (*(ptr))		\
>                              : : "memory", "cc");			\
> 	case 2:
> 		... xaddw ...
> 	case 4:
> 		... xaddl ...
> } while (0)
>
> and a similar something for inc. For both there seem to be various asm
> bits left (we could even consider adding xadd to
> arch/x86/include/asm/cmpxchg*.h).
>
> Also, you might have wanted to CC Linus on this, he's usually interested
> in these bits.

OK, I'll add the xadd/inc helpers and repost.

    J


* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: Jeremy Fitzhardinge @ 2011-06-22 19:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge,
	Linus Torvalds

On 06/21/2011 07:01 AM, Peter Zijlstra wrote:
> One other thing you could contemplate is adding something like:
>
> #define xadd(ptr, inc)							\
> do {									\
> 	switch(sizeof(*(ptr))) {					\
> 	case 1:								\
> 		asm volatile (LOCK_PREFIX "xaddb %0, %1\n"		\
>                              : "+r" (inc), "+m" (*(ptr))		\
>                              : : "memory", "cc");			\
> 	case 2:
> 		... xaddw ...
> 	case 4:
> 		... xaddl ...
> } while (0)
>
> and a similar something for inc. For both there seem to be various asm
> bits left (we could even consider adding xadd to
> arch/x86/include/asm/cmpxchg*.h).

A friend just pointed out that gcc has a "__sync_fetch_and_add()"
intrinsic, which compiles to xadd when used in this context.  What's the
general feeling about using these kinds of gcc features?

It also has __sync_bool_compare_and_swap(), which would simplify a lot
of asm/cmpxchg.h...
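
For the ticket-claim and trylock cases it would look something like this
(a sketch, reusing the head_tail word from patch 6):

	old.head_tail = __sync_fetch_and_add(&lock->head_tail,
					     1 << TICKET_SHIFT);

	/* ...and for trylock: */
	return __sync_bool_compare_and_swap(&lock->head_tail,
					    old.head_tail, new.head_tail);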

    J


* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: H. Peter Anvin @ 2011-06-22 20:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge,
	Linus Torvalds

On 06/22/2011 12:21 PM, Jeremy Fitzhardinge wrote:
> 
> A friend just pointed out that gcc has a "__sync_fetch_and_add()"
> intrinsic, which compiles to xadd when used in this context.  What's the
> general feeling about using these kinds of gcc features?
> 

In general they are good, IF:

a) they cover all versions of gcc we care about (or we have a fallback),
and
b) they have the right semantics.

Using gcc intrinsics can generate better code than we can in inline
assembly.

However, (b) is a killer since gcc doesn't have a way to generate our
lock prefix annotations, and so it isn't really useful here.
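
For reference, the annotation in question: LOCK_PREFIX records the
address of each lock byte in the .smp_locks section so a UP kernel can
patch the prefixes out at boot. It expands to roughly:

	#define LOCK_PREFIX_HERE \
		".section .smp_locks,\"a\"\n"	\
		".balign 4\n"			\
		".long 671f - .\n"		\
		".previous\n"			\
		"671:"

	#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "

The intrinsics emit a bare lock prefix with no way to attach that
annotation.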

	-hpa


* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: Jeremy Fitzhardinge @ 2011-06-22 20:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge,
	Linus Torvalds

On 06/22/2011 01:19 PM, H. Peter Anvin wrote:
> On 06/22/2011 12:21 PM, Jeremy Fitzhardinge wrote:
>> A friend just pointed out that gcc has a "__sync_fetch_and_add()"
>> intrinsic, which compiles to xadd when used in this context.  What's the
>> general feeling about using these kinds of gcc features?
>>
> In general they are good, IF:
>
> a) they cover all versions of gcc we care about (or we have a fallback),

What is the supported range for these days?

> and
> b) they have the right semantics.

My main concern was making sure that it's a strong enough barrier, but
the documentation is pretty explicit about that.

> Using gcc intrinsics can generate better code than we can in inline
> assembly.

It does seem to do a pretty good job; it generates a plain locked add if
you never look at the returned value, for example.
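
For example (a sketch of the generated code):

	__sync_fetch_and_add(&v, 1);		/* result unused: lock addl $1, (v)   */
	r = __sync_fetch_and_add(&v, 1);	/* result used:   lock xaddl %eax, (v) */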

> However, (b) is a killer since gcc doesn't have a way to generate our
> lock prefix annotations, and so it isn't really useful here.

Yeah, I thought about that.  It's a bit unfortunate we're getting into
spinlock code at all on a UP system, but we don't have a mechanism to
stomp locking at a higher level.  (Ignoring all the insane stuff that
happens when the system becomes UP transiently just because all the
other CPUs have been unplugged for suspend, etc; we just shouldn't
bother in that case.)

    J


* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: H. Peter Anvin @ 2011-06-22 21:07 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge,
	Linus Torvalds

On 06/22/2011 01:59 PM, Jeremy Fitzhardinge wrote:
> On 06/22/2011 01:19 PM, H. Peter Anvin wrote:
>> On 06/22/2011 12:21 PM, Jeremy Fitzhardinge wrote:
>>> A friend just pointed out that gcc has a "__sync_fetch_and_add()"
>>> intrinsic, which compiles to xadd when used in this context.  What's the
>>> general feeling about using these kinds of gcc features?
>>>
>> In general they are good, IF:
>>
>> a) they cover all versions of gcc we care about (or we have a fallback),
> 
> What is the supported range for these days?
> 

For x86, we support 3.4, 4.0 and 4.1.2 and above (not sure if 4.0.x
actually works).  Other architectures have different rules.  At some
point we'll probably drop anything below 4.1.2, but we apparently can't yet.

>> and
>> b) they have the right semantics.
> 
> My main concern was making sure that it's a strong enough barrier, but
> the documentation is pretty explicit about that.
> 
>> Using gcc intrinsics can generate better code than we can in inline
>> assembly.
> 
> It does seem to do a pretty good job; it generates a plain locked add if
> you never look at the returned value, for example.
> 
>> However, (b) is a killer since gcc doesn't have a way to generate our
>> lock prefix annotations, and so it isn't really useful here.
> 
> Yeah, I thought about that.  It's a bit unfortunate we're getting into
> spinlock code at all on a UP system, but we don't have a mechanism to
> stomp locking at a higher level.  (Ignoring all the insane stuff that
> happens when the system becomes UP transiently just because all the
> other CPUs have been unplugged for suspend, etc; we just shouldn't
> bother in that case.)

Yep...

	-hpa



* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: Jeremy Fitzhardinge @ 2011-06-22 21:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge,
	Linus Torvalds

On 06/22/2011 02:07 PM, H. Peter Anvin wrote:
> For x86, we support 3.4, 4.0 and 4.1.2 and above (not sure if 4.0.x
> actually works).  Other architectures have different rules.  At some
> point we'll probably drop anything below 4.1.2, but we apparently can't yet.

Looks like the intrinsics turned up in 4.1.0.

    J


* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
From: H. Peter Anvin @ 2011-06-22 23:16 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge,
	Linus Torvalds

On 06/22/2011 02:35 PM, Jeremy Fitzhardinge wrote:
> On 06/22/2011 02:07 PM, H. Peter Anvin wrote:
>> For x86, we support 3.4, 4.0 and 4.1.2 and above (not sure if 4.0.x
>> actually works).  Other architectures have different rules.  At some
>> point we'll probably drop anything below 4.1.2, but we apparently can't yet.
> 
> Looks like the intrinsics turned up in 4.1.0.
> 

Yes, but they still don't do what we want...

	-hpa



* [PATCH 1/8] x86/ticketlock: clean up types and accessors
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   24 ++++++++++--------------
 arch/x86/include/asm/spinlock_types.h |   20 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	unsigned short inc = 1 << TICKET_SHIFT;
 
 	asm volatile (
 		LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp, new;
+	unsigned int tmp, new;
 
 	asm volatile("movzwl %2, %0\n\t"
 		     "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	int inc = 0x00010000;
-	int tmp;
+	unsigned inc = 1 << TICKET_SHIFT;
+	unsigned tmp;
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 		     "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp;
-	int new;
+	unsigned tmp;
+	unsigned new;
 
 	asm volatile("movl %2,%0\n\t"
 		     "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
+#define TICKET_MASK	((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-	unsigned int slock;
+	union {
+		unsigned int slock;
+		struct __raw_tickets {
+			__ticket_t head, tail;
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.5.4



* [PATCH 2/8] x86/ticketlock: convert spin loop to C
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However, benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   58 +++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned short inc = 1 << TICKET_SHIFT;
-
-	asm volatile (
-		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
-		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
-		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
-		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
-		:
-		: "memory", "cc");
+	register union {
+		struct __raw_tickets tickets;
+		unsigned short slock;
+	} inc = { .slock = 1 << TICKET_SHIFT };
+
+	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+	for (;;) {
+		if (inc.tickets.head == inc.tickets.tail)
+			goto out;
+		cpu_relax();
+		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned inc = 1 << TICKET_SHIFT;
-	unsigned tmp;
+	__ticket_t tmp;
 
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-		     "movzwl %w0, %2\n\t"
-		     "shrl $16, %0\n\t"
-		     "1:\t"
-		     "cmpl %0, %2\n\t"
-		     "je 2f\n\t"
-		     "rep ; nop\n\t"
-		     "movzwl %1, %2\n\t"
-		     /* don't need lfence here, because loads are in-order */
-		     "jmp 1b\n"
-		     "2:"
-		     : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-		     :
-		     : "memory", "cc");
+	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+		     : "+r" (inc), "+m" (lock->slock)
+		     : : "memory", "cc");
+
+	tmp = inc;
+	inc >>= TICKET_SHIFT;
+
+	for (;;) {
+		if ((__ticket_t)inc == tmp)
+			goto out;
+		cpu_relax();
+		tmp = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.5.4



* [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   33 ++++++++++++++++++---------------
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..704b0c3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm (LOCK_PREFIX "incb %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm (LOCK_PREFIX "incw %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	lock->tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,14 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	barrier();		/* prevent reordering out of locked region */
+	__ticket_unlock_release(lock);
+	barrier();		/* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.5.4



* [PATCH 4/8] x86/ticketlock: make large and small ticket versions of spin_lock the same
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make the bulk of __ticket_spin_lock look identical for large and small
numbers of cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   23 ++++++++---------------
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 704b0c3..694b8a0 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	register union {
-		struct __raw_tickets tickets;
-		unsigned short slock;
-	} inc = { .slock = 1 << TICKET_SHIFT };
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
 	for (;;) {
-		if (inc.tickets.head == inc.tickets.tail)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned inc = 1 << TICKET_SHIFT;
-	__ticket_t tmp;
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->slock)
+		     : "+r" (inc), "+m" (lock->tickets)
 		     : : "memory", "cc");
 
-	tmp = inc;
-	inc >>= TICKET_SHIFT;
-
 	for (;;) {
-		if ((__ticket_t)inc == tmp)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		tmp = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
-- 
1.7.5.4



* [PATCH 5/8] x86/ticketlock: make __ticket_spin_lock common
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   42 ++++++++++++++++++--------------------
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 694b8a0..0b8d2b2 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets tickets = { .tail = 1 };
+
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+			      : "+r" (tickets), "+m" (lock->tickets)
+			      : : "memory", "cc");
+	else
+		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+			     : "+r" (tickets), "+m" (lock->tickets)
+			     : : "memory", "cc");
 
-	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+	return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+	register struct __raw_tickets inc;
+
+	inc = __ticket_spin_claim(lock);
 
 	for (;;) {
 		if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-	register struct __raw_tickets inc = { .tail = 1 };
-
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->tickets)
-		     : : "memory", "cc");
-
-	for (;;) {
-		if (inc.head == inc.tail)
-			goto out;
-		cpu_relax();
-		inc.head = ACCESS_ONCE(lock->tickets.head);
-	}
-out:	barrier();		/* make sure nothing creeps before the lock is taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned tmp;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 6/8] x86/ticketlock: make __ticket_spin_trylock common
  2011-06-24  1:19 ` [PATCH 1/8] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
                     ` (3 preceding siblings ...)
  2011-06-24  1:19   ` [PATCH 5/8] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
@ 2011-06-24  1:19   ` Jeremy Fitzhardinge
  2011-07-22 19:57     ` [tip:x86/spinlocks] x86, ticketlock: Make " tip-bot for Jeremy Fitzhardinge
  2011-06-24  1:19   ` [PATCH 7/8] x86: add xadd helper macro Jeremy Fitzhardinge
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   49 +++++++--------------------------
 arch/x86/include/asm/spinlock_types.h |    6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 0b8d2b2..dac6fc6 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp, new;
-
-	asm volatile("movzwl %2, %0\n\t"
-		     "cmpb %h0,%b0\n\t"
-		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned tmp;
-	unsigned new;
-
-	asm volatile("movl %2,%0\n\t"
-		     "movl %0,%1\n\t"
-		     "roll $16, %0\n\t"
-		     "cmpl %0,%1\n\t"
-		     "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
+	arch_spinlock_t old, new;
+
+	old.tickets = ACCESS_ONCE(lock->tickets);
+	if (old.tickets.head != old.tickets.tail)
+		return 0;
+
+	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+	/* cmpxchg is a full barrier, so nothing can move before it */
+	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
 	union {
-		unsigned int slock;
+		__ticketpair_t head_tail;
 		struct __raw_tickets {
 			__ticket_t head, tail;
 		} tickets;
 	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.5.4
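
As an illustration of the new common trylock, here is a user-space
model (my sketch, not kernel code).  It substitutes the GCC
__atomic_compare_exchange_n() builtin for the kernel's cmpxchg(), and
assumes the little-endian x86 layout in which head is the low byte of
head_tail and tail the high byte:

#include <stdint.h>
#include <stdio.h>

typedef uint8_t  ticket_t;
typedef uint16_t ticketpair_t;
#define TICKET_SHIFT	(sizeof(ticket_t) * 8)

typedef struct {
	union {
		ticketpair_t head_tail;
		struct { ticket_t head, tail; } tickets;
	};
} spinlock_model_t;

static int trylock_model(spinlock_model_t *lock)
{
	spinlock_model_t old, new;

	old.tickets = lock->tickets;	/* snapshot; the kernel uses ACCESS_ONCE() */
	if (old.tickets.head != old.tickets.tail)
		return 0;		/* lock is held or contended */

	/* claim the next tail ticket in a single compare-and-swap */
	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
	return __atomic_compare_exchange_n(&lock->head_tail, &old.head_tail,
					   new.head_tail, 0,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

int main(void)
{
	spinlock_model_t lock = { { 0 } };

	printf("first:  %d\n", trylock_model(&lock));	/* 1: lock taken */
	printf("second: %d\n", trylock_model(&lock));	/* 0: already held */
	return 0;
}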


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 7/8] x86: add xadd helper macro
  2011-06-24  1:19 ` [PATCH 1/8] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
                     ` (4 preceding siblings ...)
  2011-06-24  1:19   ` [PATCH 6/8] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-06-24  1:19   ` Jeremy Fitzhardinge
  2011-07-22 19:58     ` [tip:x86/spinlocks] x86: Add " tip-bot for Jeremy Fitzhardinge
  2011-06-24  1:19   ` [PATCH 8/8] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
  2011-07-22 19:55   ` [tip:x86/spinlocks] x86, ticketlock: Clean up types and accessors tip-bot for Jeremy Fitzhardinge
  7 siblings, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/cmpxchg_32.h |   21 +++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   26 ++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 284a6e8..30f0318 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -280,4 +280,25 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 
 #endif
 
+#define xadd(ptr, inc)							\
+	do {								\
+		switch (sizeof(*(ptr))) {				\
+		case 1:							\
+			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 2:							\
+			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 4:							\
+			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		}							\
+	} while (0)
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 423ae58..62da1ff 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -151,4 +151,30 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
+#define xadd(ptr, inc)							\
+	do {								\
+		switch (sizeof(*(ptr))) {				\
+		case 1:							\
+			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 2:							\
+			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 4:							\
+			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 8:							\
+			asm volatile (LOCK_PREFIX "xaddq %q0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		}							\
+	} while (0)
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
-- 
1.7.5.4
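
As a stand-alone illustration of the exchange-and-add semantics the
macro wraps, here is a user-space sketch (x86 only; the names are
mine, not the patch's):

#include <stdint.h>
#include <stdio.h>

static inline uint16_t xadd16(volatile uint16_t *ptr, uint16_t inc)
{
	asm volatile("lock xaddw %w0, %1"
		     : "+r" (inc), "+m" (*ptr)
		     : : "memory", "cc");
	return inc;	/* xadd leaves the pre-increment value in inc */
}

int main(void)
{
	uint16_t val = 41;
	uint16_t old = xadd16(&val, 1);

	printf("old=%u new=%u\n", old, val);	/* old=41 new=42 */
	return 0;
}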


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 8/8] x86/ticketlock: use xadd helper
  2011-06-24  1:19 ` [PATCH 1/8] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
                     ` (5 preceding siblings ...)
  2011-06-24  1:19   ` [PATCH 7/8] x86: add xadd helper macro Jeremy Fitzhardinge
@ 2011-06-24  1:19   ` Jeremy Fitzhardinge
  2011-07-22 19:58     ` [tip:x86/spinlocks] x86, ticketlock: Use " tip-bot for Jeremy Fitzhardinge
  2011-07-22 19:55   ` [tip:x86/spinlocks] x86, ticketlock: Clean up types and accessors tip-bot for Jeremy Fitzhardinge
  7 siblings, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-06-24  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index dac6fc6..2d14e7c 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -71,14 +71,7 @@ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spin
 {
 	register struct __raw_tickets tickets = { .tail = 1 };
 
-	if (sizeof(lock->tickets.head) == sizeof(u8))
-		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-			      : "+r" (tickets), "+m" (lock->tickets)
-			      : : "memory", "cc");
-	else
-		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
-			     : "+r" (tickets), "+m" (lock->tickets)
-			     : : "memory", "cc");
+	xadd(&lock->tickets, tickets);
 
 	return tickets;
 }
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-16 21:40 [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code Jeremy Fitzhardinge
                   ` (9 preceding siblings ...)
  2011-06-24  1:19 ` [PATCH 1/8] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
@ 2011-06-24 21:50 ` H. Peter Anvin
  2011-06-24 22:42   ` Jeremy Fitzhardinge
  2011-06-25 10:11 ` Ingo Molnar
  2011-06-29 20:44 ` Andi Kleen
  12 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2011-06-24 21:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/23/2011 06:19 PM, Jeremy Fitzhardinge wrote:
> 
> I've been running with this code in place for several months on 4 core
> systems without any problems.
> 
> I couldn't measure a consistent performance difference between the two
> implementations; there seemed to be +/- ~1%, which is the level of
> variation I see from simply recompiling the kernel with slightly
> different code alignment.
> 
> Overall, I think the large reduction in code size is a big win.
> 

Could you give us the delta in *compiled* code size?

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-06-24  1:19   ` [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
@ 2011-06-24 21:52     ` H. Peter Anvin
  2011-06-24 22:41       ` Jeremy Fitzhardinge
  2011-07-22 19:56       ` [tip:x86/spinlocks] x86, ticketlock: Use asm volatile for __ticket_unlock_release() tip-bot for H. Peter Anvin
  2011-07-22 19:56     ` [tip:x86/spinlocks] x86, ticketlock: Use C for __ticket_spin_unlock tip-bot for Jeremy Fitzhardinge
  1 sibling, 2 replies; 49+ messages in thread
From: H. Peter Anvin @ 2011-06-24 21:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/23/2011 06:19 PM, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> If we don't need to use a locked inc for unlock, then implement it in C.
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  arch/x86/include/asm/spinlock.h |   33 ++++++++++++++++++---------------
>  1 files changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index f48a6e3..704b0c3 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -33,9 +33,21 @@
>   * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
>   * (PPro errata 66, 92)
>   */
> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
> +{
> +	if (sizeof(lock->tickets.head) == sizeof(u8))
> +		asm (LOCK_PREFIX "incb %0"
> +		     : "+m" (lock->tickets.head) : : "memory");
> +	else
> +		asm (LOCK_PREFIX "incw %0"
> +		     : "+m" (lock->tickets.head) : : "memory");
> +

These should be asm volatile, I believe.

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-06-24 21:52     ` H. Peter Anvin
@ 2011-06-24 22:41       ` Jeremy Fitzhardinge
  2011-07-22 18:32         ` H. Peter Anvin
  2011-07-22 19:56       ` [tip:x86/spinlocks] x86, ticketlock: Use asm volatile for __ticket_unlock_release() tip-bot for H. Peter Anvin
  1 sibling, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-06-24 22:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/24/2011 02:52 PM, H. Peter Anvin wrote:
> These should be asm volatile, I believe.

OK.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-24 21:50 ` [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code H. Peter Anvin
@ 2011-06-24 22:42   ` Jeremy Fitzhardinge
  2011-06-25  3:15     ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-06-24 22:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/24/2011 02:50 PM, H. Peter Anvin wrote:
> On 06/23/2011 06:19 PM, Jeremy Fitzhardinge wrote:
>> I've been running with this code in place for several months on 4 core
>> systems without any problems.
>>
>> I couldn't measure a consistent performance difference between the two
>> implementations; there seemed to be +/- ~1%, which is the level of
>> variation I see from simply recompiling the kernel with slightly
>> different code alignment.
>>
>> Overall, I think the large reduction in code size is a big win.
>>
> Could you give us the delta in *compiled* code size?

Sure.  Do you mean for the individual lock sequences, or for the overall
kernel?

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-24 22:42   ` Jeremy Fitzhardinge
@ 2011-06-25  3:15     ` H. Peter Anvin
  2011-07-15 17:24       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2011-06-25  3:15 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

Jeremy Fitzhardinge <jeremy@goop.org> wrote:

>On 06/24/2011 02:50 PM, H. Peter Anvin wrote:
>> On 06/23/2011 06:19 PM, Jeremy Fitzhardinge wrote:
>>> I've been running with this code in place for several months on 4 core
>>> systems without any problems.
>>>
>>> I couldn't measure a consistent performance difference between the two
>>> implementations; there seemed to be +/- ~1%, which is the level of
>>> variation I see from simply recompiling the kernel with slightly
>>> different code alignment.
>>>
>>> Overall, I think the large reduction in code size is a big win.
>>>
>> Could you give us the delta in *compiled* code size?
>
>Sure.  Do you mean for the individual lock sequences, or for the
>overall kernel?
>
>    J

Overall is fine.
-- 
Sent from my mobile phone. Please excuse my brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-16 21:40 [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code Jeremy Fitzhardinge
                   ` (10 preceding siblings ...)
  2011-06-24 21:50 ` [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code H. Peter Anvin
@ 2011-06-25 10:11 ` Ingo Molnar
  2011-06-29 20:44 ` Andi Kleen
  12 siblings, 0 replies; 49+ messages in thread
From: Ingo Molnar @ 2011-06-25 10:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

>  2. With NR_CPUS < 256 the ticket size is 8 bits.  The compiler doesn't
>     use the same trick as the hand-coded asm to directly compare the high
>     and low bytes in the word, but does a bit of extra shuffling around.
>     However, the Intel optimisation guide and several x86 experts have
>     opined that it's best to avoid the high-byte operations anyway, since
>     they will cause a partial word stall, and the gcc-generated code should
>     be better.
> 
>     Overall the compiler-generated code is very similar to the hand-coded
>     versions, with the partial byte operations being the only significant
>     difference. (Curiously, gcc does generate a high-byte compare for me
>     in trylock, so it can if it wants to.)
> 
> I've been running with this code in place for several months on 4 core
> systems without any problems.

Please do measurements both in terms of disassembly based instruction 
count(s) in the fastpath(s) (via looking at the before/after 
disassembly) and actual cycle, instruction and branch counts (via 
perf measurements).

> I couldn't measure a consistent performance difference between the two
> implementations; there seemed to be +/- ~1%, which is the level of
> variation I see from simply recompiling the kernel with slightly
> different code alignment.

Then you've done the micro-cost measurements the wrong way - we can 
and do detect much finer effects than 1%, see the methods used in 
this commit for example:

  c8b281161dfa: sched: Increase SCHED_LOAD_SCALE resolution

Please also ensure that the cold-cache behavior is fairly measured 
via hot-cache benchmarks (that is not always guaranteed).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-16 21:40 [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code Jeremy Fitzhardinge
                   ` (11 preceding siblings ...)
  2011-06-25 10:11 ` Ingo Molnar
@ 2011-06-29 20:44 ` Andi Kleen
  2011-07-21 23:33   ` Jeremy Fitzhardinge
  2011-09-07 23:25   ` Jeremy Fitzhardinge
  12 siblings, 2 replies; 49+ messages in thread
From: Andi Kleen @ 2011-06-29 20:44 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

Jeremy Fitzhardinge <jeremy@goop.org> writes:
>
> I couldn't measure a consistent performance difference between the two
> implementations; there seemed to be +/- ~1%, which is the level of
> variation I see from simply recompiling the kernel with slightly
> different code alignment.

I ran your new locks in my lock tester and I have a similar experience.
There's some variation, but it seems to be in the usual variance.
In some cases the C locks were actually faster.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-25  3:15     ` H. Peter Anvin
@ 2011-07-15 17:24       ` Jeremy Fitzhardinge
  2011-07-15 23:06         ` H. Peter Anvin
  0 siblings, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-07-15 17:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/24/2011 08:15 PM, H. Peter Anvin wrote:
>>> Could you give us the delta in *compiled* code size?
>> Sure.  Do you mean for the individual lock sequences, or for the
>> overall
>> kernel?
>>
>>    J
> Overall is fine.

Here's both ;)

For the NR_CPUS < 256 case, it shrinks the kernel text by a little over
1k (and a bit of a data reduction too?):

   text	   data	    bss	    dec	    hex	filename
7287009	1915524	2347008	11549541	 b03b65	vmlinux-spin-base
7285777	1915412	2347008	11548197	 b03625	vmlinux-cleantick

A comparison of spin_lock before:

<do_raw_spin_lock>:
        55                      push   %rbp
        48 89 e5                mov    %rsp,%rbp
        e8 6f fa 3d 00          callq  <mcount>
        b8 00 01 00 00          mov    $0x100,%eax
        f0 66 0f c1 07          lock xadd %ax,(%rdi)
1:      38 e0                   cmp    %ah,%al
        74 06                   je     2f
        f3 90                   pause  
        8a 07                   mov    (%rdi),%al
        eb f6                   jmp    1b
2:      5d                      pop    %rbp
        c3                      retq   

and after:

<do_raw_spin_lock>:
        55                      push   %rbp
        48 89 e5                mov    %rsp,%rbp
        e8 1f f6 3d 00          callq  <mcount>
        b8 00 01 00 00          mov    $0x100,%eax
        f0 66 0f c1 07          lock xadd %ax,(%rdi)
        0f b6 d4                movzbl %ah,%edx
1:      38 d0                   cmp    %dl,%al
        74 06                   je     2f
        f3 90                   pause  
        8a 07                   mov    (%rdi),%al
        eb f6                   jmp    1b
2:      5d                      pop    %rbp
        c3                      retq   

IOW the generated code is identical except that the original has "cmp %ah,%al" and
the compiler generates
	movzbl %ah,%edx
	cmp    %dl,%al

I posted a bug about gcc not generating cmp %ah, %al in this case, but the general consensus was that
it's not a good choice on current Intel chips, because it causes a partial word stall, and that the
current generated code is better.  (And gcc can generate "cmp %Xh, %Xl" if it wants to - see below.)

Likewise trylock before:

<do_raw_spin_trylock>:
        55                      push   %rbp
        48 89 e5                mov    %rsp,%rbp
        e8 50 fa 3d 00          callq  <mcount>
        0f b7 07                movzwl (%rdi),%eax
        38 e0                   cmp    %ah,%al
        8d 90 00 01 00 00       lea    0x100(%rax),%edx
        75 05                   jne    1f
        f0 66 0f b1 17          lock cmpxchg %dx,(%rdi)
1:      0f 94 c2                sete   %dl
        0f b6 c2                movzbl %dl,%eax
        5d                      pop    %rbp
        c3                      retq   

and after:

<do_raw_spin_trylock>:
        55                      push   %rbp
        48 89 e5                mov    %rsp,%rbp
        e8 fd f5 3d 00          callq  <mcount>
        31 c0                   xor    %eax,%eax
        66 8b 17                mov    (%rdi),%dx
        38 d6                   cmp    %dl,%dh
        75 16                   jne    1f
        8d 8a 00 01 00 00       lea    0x100(%rdx),%ecx
        89 d0                   mov    %edx,%eax
        f0 66 0f b1 0f          lock cmpxchg %cx,(%rdi)
        66 39 d0                cmp    %dx,%ax
        0f 94 c0                sete   %al
        0f b6 c0                movzbl %al,%eax
1:      5d                      pop    %rbp
        c3                      retq   


The differences here are a little more extensive, but the code is
broadly comparable.   Some interesting points on the gcc code:

    * it pre-loads the failure return value, so it doesn't need to use
      sete unless it's actually doing the cmpxchg - whether this is good
      or not depends on how often trylock fails vs succeeds, but is
      pretty minor either way
    * it generates a 16-bit load, rather than using a zero-extending
      32-bit load
    * it *can* generate a "cmp %dl,%dh" if it wants to
    * it ends up re-comparing the "old" and "new" values after cmpxchg
      rather than reusing the flags it sets.  This could be fixed by
      having a cmpxchg() variant which returns a flag rather than old,
      which would be generally useful since most cmpxchg() callers end
      up doing that comparison anyway (a sketch of such a variant
      follows below).
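
A hypothetical sketch of such a flag-returning variant, for
illustration only (no such helper exists in the tree at this point):

static inline int cmpxchg_flag(volatile unsigned int *ptr,
			       unsigned int old, unsigned int new)
{
	unsigned char ok;

	asm volatile(LOCK_PREFIX "cmpxchgl %3, %1\n\t"
		     "sete %0"		/* ZF: 1 iff the exchange happened */
		     : "=q" (ok), "+m" (*ptr), "+a" (old)
		     : "r" (new)
		     : "memory", "cc");
	return ok;
}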

spin_unlock is a little trickier to compare because it's inlined, but
I'm guessing that's the source of the code shrinkage.

Likewise, for NR_CPUS=512, there's a ~900 byte kernel shrinkage with the
new code:

   text	   data	    bss	    dec	    hex	filename
7296571	1945380	2424832	11666783	 b2055f	vmlinux-spin-base-big
7295610	1945412	2424832	11665854	 b201be	vmlinux-cleantick-big

The generated code for do_raw_spin_lock is even closer:

Before:

<do_raw_spin_lock>:
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       e8 cf 0a 3e 00          callq  <mcount>
       b8 00 00 01 00          mov    $0x10000,%eax
       f0 0f c1 07             lock xadd %eax,(%rdi)
       0f b7 d0                movzwl %ax,%edx
       c1 e8 10                shr    $0x10,%eax
1:     39 c2                   cmp    %eax,%edx
       74 07                   je     2f
       f3 90                   pause  
       0f b7 17                movzwl (%rdi),%edx
       eb f5                   jmp    1b
2:     5d                      pop    %rbp
       c3                      retq   

After:

<do_raw_spin_lock>:
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       e8 63 07 3e 00          callq  <mcount>
       b8 00 00 01 00          mov    $0x10000,%eax
       f0 0f c1 07             lock xadd %eax,(%rdi)
       89 c2                   mov    %eax,%edx
       c1 ea 10                shr    $0x10,%edx
1:     66 39 d0                cmp    %dx,%ax
       74 07                   je     2f
       f3 90                   pause  
       66 8b 07                mov    (%rdi),%ax
       eb f4                   jmp    1b
2:     5d                      pop    %rbp
       c3                      retq   

In other words, identical aside from using 16 bit regs rather than 32
bit regs and zero extend.

Trylock:

Before:

<do_raw_spin_trylock>:
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       e8 aa 0a 3e 00          callq  <mcount>
       8b 07                   mov    (%rdi),%eax
       89 c2                   mov    %eax,%edx
       c1 c0 10                rol    $0x10,%eax
       39 c2                   cmp    %eax,%edx
       8d 90 00 00 01 00       lea    0x10000(%rax),%edx
       75 04                   jne    1f
       f0 0f b1 17             lock cmpxchg %edx,(%rdi)
1:     0f 94 c2                sete   %dl
       0f b6 c2                movzbl %dl,%eax
       5d                      pop    %rbp
       c3                      retq   

After:

<do_raw_spin_trylock>:
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       e8 3e 07 3e 00          callq  <mcount>
       31 c0                   xor    %eax,%eax
       8b 17                   mov    (%rdi),%edx
       89 d1                   mov    %edx,%ecx
       c1 e9 10                shr    $0x10,%ecx
       66 39 ca                cmp    %cx,%dx
       75 14                   jne    1f
       8d 8a 00 00 01 00       lea    0x10000(%rdx),%ecx
       89 d0                   mov    %edx,%eax
       f0 0f b1 0f             lock cmpxchg %ecx,(%rdi)
       39 d0                   cmp    %edx,%eax
       0f 94 c0                sete   %al
       0f b6 c0                movzbl %al,%eax
1:     5d                      pop    %rbp
       c3                      retq   

The differences here are similar to the < 256 CPU case:

    * gcc generates a straightforward shift and 16-bit compare to
      compare the head and tail, rather than the rol version (which I
      guess keeps everything 32b)
    * same pre-loading the failure return value rather than reusing sete
    * same redundant compare after cmpxchg


So conclusion:

    * overall kernel code size reduction
    * the spinlock code is very similar
    * the trylock code could be improved a little, but it's unclear to me
      that it would make much difference (since there's a big fat locked
      cmpxchg in the middle of any interesting code path)
    * unlock is inlined, so I couldn't evaluate that, but since it's just
      an unlocked inc, it's hard to see how that could go wrong.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-07-15 17:24       ` Jeremy Fitzhardinge
@ 2011-07-15 23:06         ` H. Peter Anvin
  2011-07-16  0:14           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2011-07-15 23:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge, H.J. Lu

On 07/15/2011 10:24 AM, Jeremy Fitzhardinge wrote:
> 
> I posted a bug about gcc not generating cmp %ah, %al in this case, but the general consensus was that
> it's not a good choice on current Intel chips, because it causes a partial word stall, and that the
> current generated code is better.  (And gcc can generate "cmp %Xh, %Xl" if it wants to - see below.)
> 

Which version of gcc is this, and also, do you have the gcc bug report
number?  I was talking to H.J. and he said he thought it sounded strange.

	-hpa

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-07-15 23:06         ` H. Peter Anvin
@ 2011-07-16  0:14           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-07-16  0:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge, H.J. Lu

On 07/15/2011 04:06 PM, H. Peter Anvin wrote:
> On 07/15/2011 10:24 AM, Jeremy Fitzhardinge wrote:
>> I posted a bug about gcc not generating cmp %ah, %al in this case, but the general consensus was that
>> it's not a good choice on current Intel chips, because it causes a partial word stall, and that the
>> current generated code is better.  (And gcc can generate "cmp %Xh, %Xl" if it wants to - see below.)
>>
> Which version of gcc is this, and also, do you have the gcc bug report
> number?  I was talking to H.J. and he said he thought it sounded strange.

gcc version 4.6.0 20110530 (Red Hat 4.6.0-9) (GCC)

It's bug 47556.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-29 20:44 ` Andi Kleen
@ 2011-07-21 23:33   ` Jeremy Fitzhardinge
  2011-07-22 16:25     ` Andi Kleen
  2011-09-07 23:25   ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-07-21 23:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/29/2011 01:44 PM, Andi Kleen wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>> I couldn't measure a consistent performance difference between the two
>> implementations; there seemed to be +/- ~1%, which is the level of
>> variation I see from simply recompiling the kernel with slightly
>> different code alignment.
> I ran your new locks in my lock tester and I have a similar experience.
> There's some variation, but it seems to be in the usual variance.
> In some cases the C locks were actually faster.

Yes, I observed cases where they were faster.  What is your lock
tester?  Does it test the lock code in isolation, or in situ?

Thanks,
    J

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-07-21 23:33   ` Jeremy Fitzhardinge
@ 2011-07-22 16:25     ` Andi Kleen
  0 siblings, 0 replies; 49+ messages in thread
From: Andi Kleen @ 2011-07-22 16:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

Jeremy Fitzhardinge <jeremy@goop.org> writes:
>
> Yes, I observed cases where they were faster.  What is your lock
> tester?  Does it test the lock code in isolation, or in situ?

User space harness to test a couple of lock scenarios with different
numbers of CPUs competing.

It tests the lock code in isolation.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-06-24 22:41       ` Jeremy Fitzhardinge
@ 2011-07-22 18:32         ` H. Peter Anvin
  2011-07-22 19:28           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2011-07-22 18:32 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/24/2011 03:41 PM, Jeremy Fitzhardinge wrote:
> On 06/24/2011 02:52 PM, H. Peter Anvin wrote:
>> These should be asm volatile, I believe.
> 
> OK.
> 
>     J

You still owe me a new patch here, I believe...

	-hpa


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-07-22 18:32         ` H. Peter Anvin
@ 2011-07-22 19:28           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-07-22 19:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 07/22/2011 11:32 AM, H. Peter Anvin wrote:
> On 06/24/2011 03:41 PM, Jeremy Fitzhardinge wrote:
>> On 06/24/2011 02:52 PM, H. Peter Anvin wrote:
>>> These should be asm volatile, I believe.
>> OK.
>>
>>     J
> You still owe me a new patch here, I believe...

Ah, yes, though I don't think it can make a difference in practice,
since the caller contains barrier()s on either side.  And the non-asm
non-locked variant has no equivalent.  But it can't hurt.

    J

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Fri, 22 Jul 2011 12:27:14 -0700
Subject: [PATCH] x86/ticketlocks: make asms volatile to be extra sure

Make the locked-unlock asms volatile, just to be extra sure they won't wander away.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 2d14e7c..05fd59e 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -36,11 +36,11 @@
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
 	if (sizeof(lock->tickets.head) == sizeof(u8))
-		asm (LOCK_PREFIX "incb %0"
-		     : "+m" (lock->tickets.head) : : "memory");
+		asm volatile (LOCK_PREFIX "incb %0"
+			      : "+m" (lock->tickets.head) : : "memory");
 	else
-		asm (LOCK_PREFIX "incw %0"
-		     : "+m" (lock->tickets.head) : : "memory");
+		asm volatile (LOCK_PREFIX "incw %0"
+			      : "+m" (lock->tickets.head) : : "memory");
 
 }
 #else



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Clean up types and accessors
  2011-06-24  1:19 ` [PATCH 1/8] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
                     ` (6 preceding siblings ...)
  2011-06-24  1:19   ` [PATCH 8/8] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
@ 2011-07-22 19:55   ` tip-bot for Jeremy Fitzhardinge
  7 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  699f3dd4e4f8fba6ded83a3dc21e3db960323fcc
Gitweb:     http://git.kernel.org/tip/699f3dd4e4f8fba6ded83a3dc21e3db960323fcc
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:13 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:13:17 -0700

x86, ticketlock: Clean up types and accessors

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/05552e1c97739f589cb76c20b5a6565ccd506636.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h       |   24 ++++++++++--------------
 arch/x86/include/asm/spinlock_types.h |   20 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	unsigned short inc = 1 << TICKET_SHIFT;
 
 	asm volatile (
 		LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp, new;
+	unsigned int tmp, new;
 
 	asm volatile("movzwl %2, %0\n\t"
 		     "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	int inc = 0x00010000;
-	int tmp;
+	unsigned inc = 1 << TICKET_SHIFT;
+	unsigned tmp;
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 		     "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp;
-	int new;
+	unsigned tmp;
+	unsigned new;
 
 	asm volatile("movl %2,%0\n\t"
 		     "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
+#define TICKET_MASK	((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-	unsigned int slock;
+	union {
+		unsigned int slock;
+		struct __raw_tickets {
+			__ticket_t head, tail;
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
 
 typedef struct {
 	unsigned int lock;
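
For intuition about the TICKET_MASK arithmetic above, here is a small
user-space check (my sketch) that the tail - head difference stays
correct when the 8-bit ticket wraps around:

#include <stdint.h>
#include <stdio.h>

typedef uint8_t ticket_t;
#define TICKET_SHIFT	(sizeof(ticket_t) * 8)
#define TICKET_MASK	((ticket_t)((1 << TICKET_SHIFT) - 1))

int main(void)
{
	ticket_t head = 0xfe;	/* ticket currently being served */
	ticket_t tail = 0x01;	/* next free ticket, wrapped past 0xff */
	unsigned int waiting = (tail - head) & TICKET_MASK;

	printf("tickets outstanding: %u\n", waiting);	/* 3: 0xfe, 0xff, 0x00 */
	return 0;
}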

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Convert spin loop to C
  2011-06-24  1:19   ` [PATCH 2/8] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
@ 2011-07-22 19:55     ` tip-bot for Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  79e661791e9681d31fe32dfebe6d2fcf4938b64b
Gitweb:     http://git.kernel.org/tip/79e661791e9681d31fe32dfebe6d2fcf4938b64b
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:14 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:13:26 -0700

x86, ticketlock: Convert spin loop to C

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/f04120629b8b1cfa1c306373e34320687305a518.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h |   58 +++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned short inc = 1 << TICKET_SHIFT;
-
-	asm volatile (
-		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
-		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
-		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
-		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
-		:
-		: "memory", "cc");
+	register union {
+		struct __raw_tickets tickets;
+		unsigned short slock;
+	} inc = { .slock = 1 << TICKET_SHIFT };
+
+	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+	for (;;) {
+		if (inc.tickets.head == inc.tickets.tail)
+			goto out;
+		cpu_relax();
+		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned inc = 1 << TICKET_SHIFT;
-	unsigned tmp;
+	__ticket_t tmp;
 
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-		     "movzwl %w0, %2\n\t"
-		     "shrl $16, %0\n\t"
-		     "1:\t"
-		     "cmpl %0, %2\n\t"
-		     "je 2f\n\t"
-		     "rep ; nop\n\t"
-		     "movzwl %1, %2\n\t"
-		     /* don't need lfence here, because loads are in-order */
-		     "jmp 1b\n"
-		     "2:"
-		     : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-		     :
-		     : "memory", "cc");
+	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+		     : "+r" (inc), "+m" (lock->slock)
+		     : : "memory", "cc");
+
+	tmp = inc;
+	inc >>= TICKET_SHIFT;
+
+	for (;;) {
+		if ((__ticket_t)inc == tmp)
+			goto out;
+		cpu_relax();
+		tmp = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
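
For reference, barrier() here is the kernel's compiler-only fence: it
emits no instructions, but its "memory" clobber stops gcc from moving
memory accesses across it.  Its definition (from
include/linux/compiler-gcc.h) is simply:

#define barrier() __asm__ __volatile__("": : :"memory")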

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Use C for __ticket_spin_unlock
  2011-06-24  1:19   ` [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
  2011-06-24 21:52     ` H. Peter Anvin
@ 2011-07-22 19:56     ` tip-bot for Jeremy Fitzhardinge
  1 sibling, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  68302ae430d388a0118a8ce2a1abb5f87b737ef6
Gitweb:     http://git.kernel.org/tip/68302ae430d388a0118a8ce2a1abb5f87b737ef6
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:15 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:13:44 -0700

x86, ticketlock: Use C for __ticket_spin_unlock

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/aa0d82c6e7a1c22a941da0df3c4ba58fea375074.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h |   33 ++++++++++++++++++---------------
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..704b0c3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm (LOCK_PREFIX "incb %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm (LOCK_PREFIX "incw %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	lock->tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,14 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	barrier();		/* prevent reordering out of locked region */
+	__ticket_unlock_release(lock);
+	barrier();		/* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
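
For intuition, a user-space model of the resulting unlock fastpath (my
sketch; like the kernel code it relies on only the lock holder ever
writing head, so a plain non-locked increment suffices on non-PPro
hardware):

#include <stdint.h>

typedef struct {
	volatile uint8_t head, tail;	/* ticket pair, as in arch_spinlock */
} tlock_model_t;

static void tlock_unlock_model(tlock_model_t *l)
{
	__asm__ __volatile__("": : :"memory");	/* keep the locked region's
						   accesses before the release */
	l->head++;				/* pass the lock to the next ticket */
	__asm__ __volatile__("": : :"memory");	/* keep later accesses after it */
}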

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Use asm volatile for __ticket_unlock_release()
  2011-06-24 21:52     ` H. Peter Anvin
  2011-06-24 22:41       ` Jeremy Fitzhardinge
@ 2011-07-22 19:56       ` tip-bot for H. Peter Anvin
  1 sibling, 0 replies; 49+ messages in thread
From: tip-bot for H. Peter Anvin @ 2011-07-22 19:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  26056c822f9428ddc95fd7ac3008bbb0032d1086
Gitweb:     http://git.kernel.org/tip/26056c822f9428ddc95fd7ac3008bbb0032d1086
Author:     H. Peter Anvin <hpa@linux.intel.com>
AuthorDate: Fri, 22 Jul 2011 11:15:20 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:15:20 -0700

x86, ticketlock: Use asm volatile for __ticket_unlock_release()

__ticket_unlock_release() really should have barrier semantics, so use
"asm volatile" there.

Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/4E050704.3070409@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 704b0c3..f8d51dc 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -36,11 +36,11 @@
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
 	if (sizeof(lock->tickets.head) == sizeof(u8))
-		asm (LOCK_PREFIX "incb %0"
-		     : "+m" (lock->tickets.head) : : "memory");
+		asm volatile(LOCK_PREFIX "incb %0"
+			     : "+m" (lock->tickets.head) : : "memory");
 	else
-		asm (LOCK_PREFIX "incw %0"
-		     : "+m" (lock->tickets.head) : : "memory");
+		asm volatile(LOCK_PREFIX "incw %0"
+			     : "+m" (lock->tickets.head) : : "memory");
 
 }
 #else

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Make large and small ticket versions of spin_lock the same
  2011-06-24  1:19   ` [PATCH 4/8] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
@ 2011-07-22 19:57     ` tip-bot for Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  dc126c03ca25de516eaa8e575a98b81382d6d4b4
Gitweb:     http://git.kernel.org/tip/dc126c03ca25de516eaa8e575a98b81382d6d4b4
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:16 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:18:37 -0700

x86, ticketlock: Make large and small ticket versions of spin_lock the same

Make the bulk of __ticket_spin_lock look identical for large and small
numbers of cpus.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/9eea42d16aabc7624488eb62213a763dbbacef38.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h |   23 ++++++++---------------
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f8d51dc..078319e 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	register union {
-		struct __raw_tickets tickets;
-		unsigned short slock;
-	} inc = { .slock = 1 << TICKET_SHIFT };
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
 	for (;;) {
-		if (inc.tickets.head == inc.tickets.tail)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned inc = 1 << TICKET_SHIFT;
-	__ticket_t tmp;
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->slock)
+		     : "+r" (inc), "+m" (lock->tickets)
 		     : : "memory", "cc");
 
-	tmp = inc;
-	inc >>= TICKET_SHIFT;
-
 	for (;;) {
-		if ((__ticket_t)inc == tmp)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		tmp = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Make __ticket_spin_lock common
  2011-06-24  1:19   ` [PATCH 5/8] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
@ 2011-07-22 19:57     ` tip-bot for Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  65bd26616b53c5dfab162f18c34ec57ddb682a00
Gitweb:     http://git.kernel.org/tip/65bd26616b53c5dfab162f18c34ec57ddb682a00
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:17 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:18:44 -0700

x86, ticketlock: Make __ticket_spin_lock common

Aside from the particular form of the xadd instruction, the large- and
small-ticket versions of __ticket_spin_lock are identical. So factor
out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/11b6893703912054024bdb60251ed944f31cbafe.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h |   42 ++++++++++++++++++--------------------
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 078319e..a0e26c8 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets tickets = { .tail = 1 };
+
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+			      : "+r" (tickets), "+m" (lock->tickets)
+			      : : "memory", "cc");
+	else
+		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+			     : "+r" (tickets), "+m" (lock->tickets)
+			     : : "memory", "cc");
 
-	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+	return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+	register struct __raw_tickets inc;
+
+	inc = __ticket_spin_claim(lock);
 
 	for (;;) {
 		if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-	register struct __raw_tickets inc = { .tail = 1 };
-
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->tickets)
-		     : : "memory", "cc");
-
-	for (;;) {
-		if (inc.head == inc.tail)
-			goto out;
-		cpu_relax();
-		inc.head = ACCESS_ONCE(lock->tickets.head);
-	}
-out:	barrier();		/* make sure nothing creeps before the lock is taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned tmp;
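
The sizeof() test in __ticket_spin_claim is a compile-time constant, so
gcc emits only the matching xadd; the branch not taken is dead code and
disappears. A small stand-alone demonstration of that dispatch pattern
(hypothetical helper names, not kernel code):

/* Demonstration of sizeof()-based compile-time dispatch, the trick
 * __ticket_spin_claim uses to pick xaddw vs. xaddl.  The untaken
 * branch is eliminated at -O1 and above.  Hypothetical, not kernel
 * code. */
#include <stdio.h>
#include <stdint.h>

struct small_tickets { uint8_t  head, tail; };	/* NR_CPUS < 256  */
struct large_tickets { uint16_t head, tail; };	/* NR_CPUS >= 256 */

#define describe_xadd(lock)						\
	do {								\
		if (sizeof((lock)->head) == sizeof(uint8_t))		\
			puts("8-bit tickets -> 16-bit xaddw");		\
		else							\
			puts("16-bit tickets -> 32-bit xaddl");		\
	} while (0)

int main(void)
{
	struct small_tickets s;
	struct large_tickets l;

	describe_xadd(&s);	/* prints the xaddw case */
	describe_xadd(&l);	/* prints the xaddl case */
	return 0;
}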

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Make __ticket_spin_trylock common
  2011-06-24  1:19   ` [PATCH 6/8] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-07-22 19:57     ` tip-bot for Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  f5b00da7b60cc444b5e6caf2f08b0eebcebff75e
Gitweb:     http://git.kernel.org/tip/f5b00da7b60cc444b5e6caf2f08b0eebcebff75e
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:18 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:18:53 -0700

x86, ticketlock: Make __ticket_spin_trylock common

Make the trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/9ef9d6f621bda6e9052d908a39a9831345c34296.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h       |   49 +++++++--------------------------
 arch/x86/include/asm/spinlock_types.h |    6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index a0e26c8..e07dc41 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp, new;
-
-	asm volatile("movzwl %2, %0\n\t"
-		     "cmpb %h0,%b0\n\t"
-		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned tmp;
-	unsigned new;
-
-	asm volatile("movl %2,%0\n\t"
-		     "movl %0,%1\n\t"
-		     "roll $16, %0\n\t"
-		     "cmpl %0,%1\n\t"
-		     "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
+	arch_spinlock_t old, new;
+
+	old.tickets = ACCESS_ONCE(lock->tickets);
+	if (old.tickets.head != old.tickets.tail)
+		return 0;
+
+	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+	/* cmpxchg is a full barrier, so nothing can move before it */
+	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
 	union {
-		unsigned int slock;
+		__ticketpair_t head_tail;
 		struct __raw_tickets {
 			__ticket_t head, tail;
 		} tickets;
 	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	unsigned int lock;
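
The common trylock works because head_tail and the tickets pair alias
in the union: adding 1 << TICKET_SHIFT to the combined word bumps only
the tail half. A user-space sketch that checks this, assuming the
NR_CPUS < 256 layout on a little-endian machine such as x86:

/* Sketch of the union arithmetic behind the common trylock, assuming
 * the NR_CPUS < 256 layout (u8 tickets, u16 pair) on a little-endian
 * machine, as on x86.  Illustrative only, not kernel code. */
#include <assert.h>
#include <stdint.h>

typedef uint8_t  __ticket_t;
typedef uint16_t __ticketpair_t;

#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)

typedef struct arch_spinlock {
	union {
		__ticketpair_t head_tail;
		struct __raw_tickets {
			__ticket_t head, tail;
		} tickets;
	};
} arch_spinlock_t;

int main(void)
{
	arch_spinlock_t old = { .tickets = { .head = 3, .tail = 3 } };
	arch_spinlock_t new;

	/* Uncontended: head == tail, so trylock would proceed. */
	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);

	/* Only the tail half moved; head is untouched. */
	assert(new.tickets.head == 3);
	assert(new.tickets.tail == 4);
	return 0;
}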

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86: Add xadd helper macro
  2011-06-24  1:19   ` [PATCH 7/8] x86: add xadd helper macro Jeremy Fitzhardinge
@ 2011-07-22 19:58     ` tip-bot for Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  847c73e8042e565a2cc4934c84103ab82e0eac42
Gitweb:     http://git.kernel.org/tip/847c73e8042e565a2cc4934c84103ab82e0eac42
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:19 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:18:58 -0700

x86: Add xadd helper macro

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/ce03e48f4b70a2a31accf32c8b41b781674e57c3.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/cmpxchg_32.h |   21 +++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   26 ++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index 284a6e8..30f0318 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -280,4 +280,25 @@ static inline unsigned long cmpxchg_386(volatile void *ptr, unsigned long old,
 
 #endif
 
+#define xadd(ptr, inc)							\
+	do {								\
+		switch (sizeof(*(ptr))) {				\
+		case 1:							\
+			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 2:							\
+			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 4:							\
+			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		}							\
+	} while (0)
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index 423ae58..62da1ff 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -151,4 +151,30 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
+#define xadd(ptr, inc)							\
+	do {								\
+		switch (sizeof(*(ptr))) {				\
+		case 1:							\
+			asm volatile (LOCK_PREFIX "xaddb %b0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 2:							\
+			asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 4:							\
+			asm volatile (LOCK_PREFIX "xaddl %0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		case 8:							\
+			asm volatile (LOCK_PREFIX "xaddq %q0, %1\n"	\
+				      : "+r" (inc), "+m" (*(ptr))	\
+				      : : "memory", "cc");		\
+			break;						\
+		}							\
+	} while (0)
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
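
For anyone unfamiliar with xadd's calling convention: inc goes in
carrying the addend and comes back holding the value the location had
before the add. A behavioural sketch using a GCC builtin in place of
the locked asm (hypothetical helper name, integer operands only, not
the kernel macro):

/* Behavioural sketch of xadd(ptr, inc): atomically *ptr += inc, with
 * inc receiving the old value of *ptr.  A GCC builtin stands in for
 * the LOCK_PREFIX asm. */
#include <assert.h>
#include <stdint.h>

#define xadd_sketch(ptr, inc)						\
	do {								\
		(inc) = __atomic_fetch_add((ptr), (inc), __ATOMIC_SEQ_CST); \
	} while (0)

int main(void)
{
	uint16_t lock_word = 0x0303;	/* head == tail == 3: unlocked */
	uint16_t ticket = 1 << 8;	/* claim one tail ticket */

	xadd_sketch(&lock_word, ticket);

	assert(ticket == 0x0303);	/* old value comes back in ticket */
	assert(lock_word == 0x0403);	/* tail bumped to 4, head untouched */
	return 0;
}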

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/spinlocks] x86, ticketlock: Use xadd helper
  2011-06-24  1:19   ` [PATCH 8/8] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
@ 2011-07-22 19:58     ` tip-bot for Jeremy Fitzhardinge
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Jeremy Fitzhardinge @ 2011-07-22 19:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, tglx, hpa, jeremy.fitzhardinge

Commit-ID:  599c3e7724ab4520c6f11857eb953271d365f842
Gitweb:     http://git.kernel.org/tip/599c3e7724ab4520c6f11857eb953271d365f842
Author:     Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
AuthorDate: Thu, 23 Jun 2011 18:19:20 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Fri, 22 Jul 2011 11:19:05 -0700

x86, ticketlock: Use xadd helper

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Link: http://lkml.kernel.org/r/aa7c94ecfbee05a6b8e455271ba0bb467bf33b2f.1308878118.git.jeremy.fitzhardinge@citrix.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/spinlock.h |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index e07dc41..da196f1 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -71,14 +71,7 @@ static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spin
 {
 	register struct __raw_tickets tickets = { .tail = 1 };
 
-	if (sizeof(lock->tickets.head) == sizeof(u8))
-		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-			      : "+r" (tickets), "+m" (lock->tickets)
-			      : : "memory", "cc");
-	else
-		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
-			     : "+r" (tickets), "+m" (lock->tickets)
-			     : : "memory", "cc");
+	xadd(&lock->tickets, tickets);
 
 	return tickets;
 }

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
  2011-06-29 20:44 ` Andi Kleen
  2011-07-21 23:33   ` Jeremy Fitzhardinge
@ 2011-09-07 23:25   ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-07 23:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 06/29/2011 01:44 PM, Andi Kleen wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>> I couldn't measure a consistent performance difference between the two
>> implementations; there seemed to be +/- ~1%, which is the level of
>> variation I see from simply recompiling the kernel with slightly
>> different code alignment.
> I ran your new locks in my lock tester and had a similar experience.
> There's some variation, but it seems to be within the usual variance.
> In some cases the C locks were actually faster.

Is your test harness available?

    J



^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2011-09-07 23:25 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-16 21:40 [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code Jeremy Fitzhardinge
2011-06-16 21:40 ` [PATCH 1/7] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
2011-06-16 21:40 ` [PATCH 2/7] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
2011-06-16 21:40 ` [PATCH 3/7] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
2011-06-16 21:40 ` [PATCH 4/7] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
2011-06-16 21:40 ` [PATCH 5/7] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
2011-06-16 21:40 ` [PATCH 6/7] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
2011-06-16 21:40 ` [PATCH 7/7] x86/ticketlock: prevent memory accesses from reordered out of lock region Jeremy Fitzhardinge
2011-06-21 14:01 ` [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code Peter Zijlstra
2011-06-21 17:54   ` Jeremy Fitzhardinge
2011-06-22 19:21   ` Jeremy Fitzhardinge
2011-06-22 20:19     ` H. Peter Anvin
2011-06-22 20:59       ` Jeremy Fitzhardinge
2011-06-22 21:07         ` H. Peter Anvin
2011-06-22 21:35           ` Jeremy Fitzhardinge
2011-06-22 23:16             ` H. Peter Anvin
2011-06-21 14:18 ` Konrad Rzeszutek Wilk
2011-06-24  1:19 ` [PATCH 1/8] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
2011-06-24  1:19   ` [PATCH 2/8] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
2011-07-22 19:55     ` [tip:x86/spinlocks] x86, ticketlock: Convert " tip-bot for Jeremy Fitzhardinge
2011-06-24  1:19   ` [PATCH 3/8] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
2011-06-24 21:52     ` H. Peter Anvin
2011-06-24 22:41       ` Jeremy Fitzhardinge
2011-07-22 18:32         ` H. Peter Anvin
2011-07-22 19:28           ` Jeremy Fitzhardinge
2011-07-22 19:56       ` [tip:x86/spinlocks] x86, ticketlock: Use asm volatile for __ticket_unlock_release() tip-bot for H. Peter Anvin
2011-07-22 19:56     ` [tip:x86/spinlocks] x86, ticketlock: Use C for __ticket_spin_unlock tip-bot for Jeremy Fitzhardinge
2011-06-24  1:19   ` [PATCH 4/8] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
2011-07-22 19:57     ` [tip:x86/spinlocks] x86, ticketlock: Make " tip-bot for Jeremy Fitzhardinge
2011-06-24  1:19   ` [PATCH 5/8] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
2011-07-22 19:57     ` [tip:x86/spinlocks] x86, ticketlock: Make " tip-bot for Jeremy Fitzhardinge
2011-06-24  1:19   ` [PATCH 6/8] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
2011-07-22 19:57     ` [tip:x86/spinlocks] x86, ticketlock: Make " tip-bot for Jeremy Fitzhardinge
2011-06-24  1:19   ` [PATCH 7/8] x86: add xadd helper macro Jeremy Fitzhardinge
2011-07-22 19:58     ` [tip:x86/spinlocks] x86: Add " tip-bot for Jeremy Fitzhardinge
2011-06-24  1:19   ` [PATCH 8/8] x86/ticketlock: use xadd helper Jeremy Fitzhardinge
2011-07-22 19:58     ` [tip:x86/spinlocks] x86, ticketlock: Use " tip-bot for Jeremy Fitzhardinge
2011-07-22 19:55   ` [tip:x86/spinlocks] x86, ticketlock: Clean up types and accessors tip-bot for Jeremy Fitzhardinge
2011-06-24 21:50 ` [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code H. Peter Anvin
2011-06-24 22:42   ` Jeremy Fitzhardinge
2011-06-25  3:15     ` H. Peter Anvin
2011-07-15 17:24       ` Jeremy Fitzhardinge
2011-07-15 23:06         ` H. Peter Anvin
2011-07-16  0:14           ` Jeremy Fitzhardinge
2011-06-25 10:11 ` Ingo Molnar
2011-06-29 20:44 ` Andi Kleen
2011-07-21 23:33   ` Jeremy Fitzhardinge
2011-07-22 16:25     ` Andi Kleen
2011-09-07 23:25   ` Jeremy Fitzhardinge
