* [PATCH 0/6] Clean up ticketlock implementation
@ 2011-01-24 23:41 Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 1/6] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Hi all,

This series cleans up the x86 ticketlock implementation by converting
a large proportion of it to C.  This eliminates the need for having
separate implementations for "large" (NR_CPUS >= 256) and "small"
(NR_CPUS < 256) ticket locks.

This also lays the groundwork for future changes to the ticketlock
implementation.

Of course, the big question when converting from assembler to C is
what the compiler will do to the code.  In general, the results are
very similar.

For example, the original hand-coded small-ticket ticket_lock is:
      movl   $256, %eax
      lock xadd %ax,(%rdi)
   1: cmp    %ah,%al
      je     2f
      pause  
      mov    (%rdi),%al
      jmp    1b
   2:

The C version, compiled by gcc 4.5.1 is:
        movl   $256, %eax
        lock; xaddw %ax, (%rdi)
        movzbl  %ah, %edx
.L3:    cmpb    %dl, %al
        je      .L2
        rep; nop
        movb    (%rdi), %al     # lock_1(D)->D.5949.tickets.head, inc$head
        jmp     .L3     #
.L2:

So very similar, except the compiler misses directly comparing
%ah to %al.
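
For reference, the C that produces this is essentially the small-ticket
__ticket_spin_lock introduced in patch 2 of this series:

	static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
	{
		register union {
			struct __raw_tickets tickets;
			unsigned short slock;
		} inc = { .slock = 1 << TICKET_SHIFT };

		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
			      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");

		for (;;) {
			if (inc.tickets.head == inc.tickets.tail)
				goto out;
			cpu_relax();
			inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
		}
	out:	barrier();	/* make sure nothing creeps before the lock is taken */
	}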

With big tickets, which is what distros are typically compiled with,
the results are:

hand-coded:
        movl    $65536, %eax    #, inc
        lock; xaddl %eax, (%rdi)        # inc, lock_2(D)->slock
	movzwl %ax, %edx        # inc, tmp
        shrl $16, %eax  # inc
1:      cmpl %eax, %edx # inc, tmp
        je 2f
        rep ; nop
        movzwl (%rdi), %edx     # lock_2(D)->slock, tmp
        jmp 1b
2:

Compiled C:
        movl    $65536, %eax    #, tickets
        lock; xaddl %eax, (%rdi)        # tickets, lock_1(D)->D.5952.tickets
        movl    %eax, %edx      # tickets,
        shrl    $16, %edx       #,
.L3:    cmpw    %dx, %ax        # tickets$tail, inc$head
        je      .L2     #,
        rep; nop
        movw    (%rdi), %ax     # lock_1(D)->D.5952.tickets.head, inc$head
        jmp     .L3     #
.L2:

In this case the code is pretty much identical except for slight
variations in where the 32-bit values are truncated to 16.

So overall, I think this change will have negligible performance
impact.

Thanks,
	J


Jeremy Fitzhardinge (6):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
    same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common

 arch/x86/include/asm/spinlock.h       |  146 ++++++++++++---------------------
 arch/x86/include/asm/spinlock_types.h |   22 +++++-
 2 files changed, 73 insertions(+), 95 deletions(-)

-- 
1.7.3.4


* [PATCH 1/6] x86/ticketlock: clean up types and accessors
  2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 2/6] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

A few cleanups to the way spinlocks are defined and accessed:
 - Define __ticket_t, which is the size of a spinlock ticket (i.e. enough
   bits to hold all the CPUs).
 - Define struct arch_spinlock as a union containing the plain slock and
   the head and tail tickets.
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants.

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   24 ++++++++++--------------
 arch/x86/include/asm/spinlock_types.h |   20 ++++++++++++++++++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	unsigned short inc = 1 << TICKET_SHIFT;
 
 	asm volatile (
 		LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp, new;
+	unsigned int tmp, new;
 
 	asm volatile("movzwl %2, %0\n\t"
 		     "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 		     : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	int inc = 0x00010000;
-	int tmp;
+	unsigned inc = 1 << TICKET_SHIFT;
+	unsigned tmp;
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 		     "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	int tmp;
-	int new;
+	unsigned tmp;
+	unsigned new;
 
 	asm volatile("movl %2,%0\n\t"
 		     "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-	int tmp = ACCESS_ONCE(lock->slock);
+	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include <linux/types.h>
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
+#define TICKET_MASK	((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-	unsigned int slock;
+	union {
+		unsigned int slock;
+		struct __raw_tickets {
+			__ticket_t head, tail;
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.3.4


* [PATCH 2/6] x86/ticketlock: convert spin loop to C
  2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 1/6] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8-bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However, benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   58 +++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned short inc = 1 << TICKET_SHIFT;
-
-	asm volatile (
-		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
-		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
-		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
-		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
-		:
-		: "memory", "cc");
+	register union {
+		struct __raw_tickets tickets;
+		unsigned short slock;
+	} inc = { .slock = 1 << TICKET_SHIFT };
+
+	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+	for (;;) {
+		if (inc.tickets.head == inc.tickets.tail)
+			goto out;
+		cpu_relax();
+		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned inc = 1 << TICKET_SHIFT;
-	unsigned tmp;
+	__ticket_t tmp;
 
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-		     "movzwl %w0, %2\n\t"
-		     "shrl $16, %0\n\t"
-		     "1:\t"
-		     "cmpl %0, %2\n\t"
-		     "je 2f\n\t"
-		     "rep ; nop\n\t"
-		     "movzwl %1, %2\n\t"
-		     /* don't need lfence here, because loads are in-order */
-		     "jmp 1b\n"
-		     "2:"
-		     : "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-		     :
-		     : "memory", "cc");
+	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+		     : "+r" (inc), "+m" (lock->slock)
+		     : : "memory", "cc");
+
+	tmp = inc;
+	inc >>= TICKET_SHIFT;
+
+	for (;;) {
+		if ((__ticket_t)inc == tmp)
+			goto out;
+		cpu_relax();
+		tmp = ACCESS_ONCE(lock->tickets.head);
+	}
+out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.3.4


* [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 1/6] x86/ticketlock: clean up types and accessors Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 2/6] x86/ticketlock: convert spin loop to C Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
  2011-01-25  1:13   ` Nick Piggin
  2011-01-24 23:41 ` [PATCH 4/6] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   32 +++++++++++++++++---------------
 1 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..0170ba9 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm (LOCK_PREFIX "incb %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm (LOCK_PREFIX "incw %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	lock->tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	__ticket_unlock_release(lock);
+	barrier();		/* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.3.4


* [PATCH 4/6] x86/ticketlock: make large and small ticket versions of spin_lock the same
  2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2011-01-24 23:41 ` [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 5/6] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make the bulk of __ticket_spin_lock look identical for large and small
numbers of CPUs.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   23 ++++++++---------------
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 0170ba9..1b81809 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	register union {
-		struct __raw_tickets tickets;
-		unsigned short slock;
-	} inc = { .slock = 1 << TICKET_SHIFT };
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
 	for (;;) {
-		if (inc.tickets.head == inc.tickets.tail)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	unsigned inc = 1 << TICKET_SHIFT;
-	__ticket_t tmp;
+	register struct __raw_tickets inc = { .tail = 1 };
 
 	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->slock)
+		     : "+r" (inc), "+m" (lock->tickets)
 		     : : "memory", "cc");
 
-	tmp = inc;
-	inc >>= TICKET_SHIFT;
-
 	for (;;) {
-		if ((__ticket_t)inc == tmp)
+		if (inc.head == inc.tail)
 			goto out;
 		cpu_relax();
-		tmp = ACCESS_ONCE(lock->tickets.head);
+		inc.head = ACCESS_ONCE(lock->tickets.head);
 	}
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
-- 
1.7.3.4


* [PATCH 5/6] x86/ticketlock: make __ticket_spin_lock common
  2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
                   ` (3 preceding siblings ...)
  2011-01-24 23:41 ` [PATCH 4/6] x86/ticketlock: make large and small ticket versions of spin_lock the same Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
  2011-01-24 23:41 ` [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
  2011-01-25  1:08 ` [PATCH 0/6] Clean up ticketlock implementation Nick Piggin
  6 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h |   42 ++++++++++++++++++--------------------
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 1b81809..f722f96 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets tickets = { .tail = 1 };
+
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+			      : "+r" (tickets), "+m" (lock->tickets)
+			      : : "memory", "cc");
+	else
+		asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+			     : "+r" (tickets), "+m" (lock->tickets)
+			     : : "memory", "cc");
 
-	asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
-		      : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+	return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+	register struct __raw_tickets inc;
+
+	inc = __ticket_spin_claim(lock);
 
 	for (;;) {
 		if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-	register struct __raw_tickets inc = { .tail = 1 };
-
-	asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-		     : "+r" (inc), "+m" (lock->tickets)
-		     : : "memory", "cc");
-
-	for (;;) {
-		if (inc.head == inc.tail)
-			goto out;
-		cpu_relax();
-		inc.head = ACCESS_ONCE(lock->tickets.head);
-	}
-out:	barrier();		/* make sure nothing creeps before the lock is taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned tmp;
-- 
1.7.3.4


* [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
  2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
                   ` (4 preceding siblings ...)
  2011-01-24 23:41 ` [PATCH 5/6] x86/ticketlock: make __ticket_spin_lock common Jeremy Fitzhardinge
@ 2011-01-24 23:41 ` Jeremy Fitzhardinge
  2011-01-25  1:16   ` Nick Piggin
  2011-01-25  1:08 ` [PATCH 0/6] Clean up ticketlock implementation Nick Piggin
  6 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-24 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Nick Piggin, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/spinlock.h       |   49 +++++++--------------------------
 arch/x86/include/asm/spinlock_types.h |    6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f722f96..3afb1a7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();		/* make sure nothing creeps before the lock is taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-	unsigned int tmp, new;
-
-	asm volatile("movzwl %2, %0\n\t"
-		     "cmpb %h0,%b0\n\t"
-		     "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned tmp;
-	unsigned new;
-
-	asm volatile("movl %2,%0\n\t"
-		     "movl %0,%1\n\t"
-		     "roll $16, %0\n\t"
-		     "cmpl %0,%1\n\t"
-		     "leal 0x00010000(%" REG_PTR_MODE "0), %1\n\t"
-		     "jne 1f\n\t"
-		     LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-		     "1:"
-		     "sete %b1\n\t"
-		     "movzbl %b1,%0\n\t"
-		     : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-
-	return tmp;
+	arch_spinlock_t old, new;
+
+	old.tickets = ACCESS_ONCE(lock->tickets);
+	if (old.tickets.head != old.tickets.tail)
+		return 0;
+
+	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+	/* cmpxchg is a full barrier, so nothing can move before it */
+	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
 	union {
-		unsigned int slock;
+		__ticketpair_t head_tail;
 		struct __raw_tickets {
 			__ticket_t head, tail;
 		} tickets;
 	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	unsigned int lock;
-- 
1.7.3.4


* Re: [PATCH 0/6] Clean up ticketlock implementation
  2011-01-24 23:41 [PATCH 0/6] Clean up ticketlock implementation Jeremy Fitzhardinge
                   ` (5 preceding siblings ...)
  2011-01-24 23:41 ` [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-01-25  1:08 ` Nick Piggin
  2011-01-31 21:46   ` Jeremy Fitzhardinge
  6 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25  1:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Hi all,
>
> This series cleans up the x86 ticketlock implementation by converting
> a large proportion of it to C.  This eliminates the need for having
> separate implementations for "large" (NR_CPUS >= 256) and "small"
> (NR_CPUS < 256) ticket locks.
>
> This also lays the groundwork for future changes to the ticketlock
> implementation.
>
> Of course, the big question when converting from assembler to C is
> what the compiler will do to the code.  In general, the results are
> very similar.
>
> For example, the original hand-coded small-ticket ticket_lock is:
>      movl   $256, %eax
>      lock xadd %ax,(%rdi)
>   1: cmp    %ah,%al
>      je     2f
>      pause
>      mov    (%rdi),%al
>      jmp    1b
>   2:
>
> The C version, compiled by gcc 4.5.1 is:
>        movl   $256, %eax
>        lock; xaddw %ax, (%rdi)
>        movzbl  %ah, %edx
> .L3:    cmpb    %dl, %al
>        je      .L2
>        rep; nop
>        movb    (%rdi), %al     # lock_1(D)->D.5949.tickets.head, inc$head
>        jmp     .L3     #
> .L2:
>
> So very similar, except the compiler misses directly comparing
> %ah to %al.

Oh :(

Have you filed a bug with gcc?

* Re: [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-01-24 23:41 ` [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock Jeremy Fitzhardinge
@ 2011-01-25  1:13   ` Nick Piggin
  2011-01-25  1:42     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25  1:13 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> If we don't need to use a locked inc for unlock, then implement it in C.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  arch/x86/include/asm/spinlock.h |   32 +++++++++++++++++---------------
>  1 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index f48a6e3..0170ba9 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -33,9 +33,21 @@
>  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
>  * (PPro errata 66, 92)
>  */
> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
> +{
> +       if (sizeof(lock->tickets.head) == sizeof(u8))
> +               asm (LOCK_PREFIX "incb %0"
> +                    : "+m" (lock->tickets.head) : : "memory");
> +       else
> +               asm (LOCK_PREFIX "incw %0"
> +                    : "+m" (lock->tickets.head) : : "memory");
> +
> +}
>  #else
> -# define UNLOCK_LOCK_PREFIX
> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
> +{
> +       lock->tickets.head++;
> +}
>  #endif
>
>  /*
> @@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>
>        return tmp;
>  }
> -
> -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
> -{
> -       asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
> -                    : "+m" (lock->slock)
> -                    :
> -                    : "memory", "cc");
> -}
>  #else
>  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>  {
> @@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>
>        return tmp;
>  }
> +#endif
>
>  static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>  {
> -       asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
> -                    : "+m" (lock->slock)
> -                    :
> -                    : "memory", "cc");
> +       __ticket_unlock_release(lock);
> +       barrier();              /* prevent reordering into locked region */
>  }
> -#endif

The barrier is wrong.

What makes me a tiny bit uneasy is that gcc is allowed to implement
this any way it wishes. OK there may be a NULL intersection of possible
valid assembly which is a buggy unlock... but relying on gcc to implement
lock primitives is scary. Does this really help in a way that can't be done
with the assembly versions?

* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
  2011-01-24 23:41 ` [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common Jeremy Fitzhardinge
@ 2011-01-25  1:16   ` Nick Piggin
  2011-01-25  1:42     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25  1:16 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> Make trylock code common regardless of ticket size.

What's the asm for this look like?

* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
  2011-01-25  1:16   ` Nick Piggin
@ 2011-01-25  1:42     ` Jeremy Fitzhardinge
  2011-01-25  1:58       ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-25  1:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 01/24/2011 05:16 PM, Nick Piggin wrote:
> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
>> Make trylock code common regardless of ticket size.
> What's the asm for this look like?

Asm:

        movzwl (%rdi), %eax     # lock_1(D)->slock, tmp
        cmpb %ah,%al    # tmp
        leal 0x100(%rax), %edx  # tmp, new
        jne 1f
        lock; cmpxchgw %dx,(%rdi)       # new, lock_1(D)->slock
1:	sete %dl      # new
        movzbl %dl,%eax # new, tmp



C:

        movw    (%rdi), %dx     # lock_2(D)->D.5949.tickets, old
        xorl    %eax, %eax      # D.13954
        movzbl  %dh, %ecx       # old, tmp70
        cmpb    %dl, %cl        # old, tmp70
        jne     .L5     #,
        leal    256(%rdx), %ecx #, D.13956
        movl    %edx, %eax      # old, __ret
        lock; cmpxchgw %cx,(%rdi)       # D.13956,* lock
        cmpw    %dx, %ax        # old, __ret
        sete    %al     #, D.13954
        movzbl  %al, %eax       # D.13954, D.13954
.L5:


The C version can't take advantage of the fact that the cmpxchg directly
sets the flags, so it ends up re-comparing the old and swapped-out
values to set the return.  And it doesn't re-use the same sete to set
the return value in the quick failed-to-acquire path.

It might be worth having a generic cmpxchg() variant which returns a
succeed/fail flag rather than the fetched value, to avoid comparison in
this case - since many (most?) cmpxchg() callers end up doing that
comparison.
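
A minimal sketch of what such a variant could look like for the 16-bit case
(hypothetical name and interface, not an existing kernel API; the point is
that sete consumes the ZF that cmpxchg already set, so no separate compare
is needed):

	/* Hypothetical sketch only. */
	static __always_inline bool __cmpxchg_flag_u16(volatile u16 *ptr,
						       u16 old, u16 new)
	{
		bool success;

		asm volatile(LOCK_PREFIX "cmpxchgw %w3, %1\n\t"
			     "sete %0"
			     : "=q" (success), "+m" (*ptr), "+a" (old)
			     : "r" (new)
			     : "memory", "cc");
		return success;
	}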

How performance critical is trylock?  I guess the ones in fs/dcache.c
are the ones looming large in your mind.

    J

* Re: [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-01-25  1:13   ` Nick Piggin
@ 2011-01-25  1:42     ` Jeremy Fitzhardinge
  2011-01-25  1:49       ` Nick Piggin
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-25  1:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 01/24/2011 05:13 PM, Nick Piggin wrote:
> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
>> If we don't need to use a locked inc for unlock, then implement it in C.
>>
>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> ---
>>  arch/x86/include/asm/spinlock.h |   32 +++++++++++++++++---------------
>>  1 files changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
>> index f48a6e3..0170ba9 100644
>> --- a/arch/x86/include/asm/spinlock.h
>> +++ b/arch/x86/include/asm/spinlock.h
>> @@ -33,9 +33,21 @@
>>  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
>>  * (PPro errata 66, 92)
>>  */
>> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>> +{
>> +       if (sizeof(lock->tickets.head) == sizeof(u8))
>> +               asm (LOCK_PREFIX "incb %0"
>> +                    : "+m" (lock->tickets.head) : : "memory");
>> +       else
>> +               asm (LOCK_PREFIX "incw %0"
>> +                    : "+m" (lock->tickets.head) : : "memory");
>> +
>> +}
>>  #else
>> -# define UNLOCK_LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>> +{
>> +       lock->tickets.head++;
>> +}
>>  #endif
>>
>>  /*
>> @@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>
>>        return tmp;
>>  }
>> -
>> -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>> -{
>> -       asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
>> -                    : "+m" (lock->slock)
>> -                    :
>> -                    : "memory", "cc");
>> -}
>>  #else
>>  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>>  {
>> @@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>
>>        return tmp;
>>  }
>> +#endif
>>
>>  static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>>  {
>> -       asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
>> -                    : "+m" (lock->slock)
>> -                    :
>> -                    : "memory", "cc");
>> +       __ticket_unlock_release(lock);
>> +       barrier();              /* prevent reordering into locked region */
>>  }
>> -#endif
> The barrier is wrong.

In what way?  Do you think it should be on the other side?

> What makes me a tiny bit uneasy is that gcc is allowed to implement
> this any way it wishes. OK there may be a NULL intersection of possible
> valid assembly which is a buggy unlock... but relying on gcc to implement
> lock primitives is scary. Does this really help in a way that can't be done
> with the assembly versions?

We rely on C/gcc for plenty of other subtle ordering things.  Spinlocks
aren't particularly special in this regard.
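
ACCESS_ONCE, which this series already leans on, is an example of exactly
that kind of reliance -- it's just a volatile cast that the compiler is
trusted to turn into a single access:

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))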

    J


* Re: [PATCH 3/6] x86/ticketlock: Use C for __ticket_spin_unlock
  2011-01-25  1:42     ` Jeremy Fitzhardinge
@ 2011-01-25  1:49       ` Nick Piggin
  0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2011-01-25  1:49 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, Jan 25, 2011 at 12:42 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 01/24/2011 05:13 PM, Nick Piggin wrote:
>> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>
>>> If we don't need to use a locked inc for unlock, then implement it in C.
>>>
>>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>> ---
>>>  arch/x86/include/asm/spinlock.h |   32 +++++++++++++++++---------------
>>>  1 files changed, 17 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
>>> index f48a6e3..0170ba9 100644
>>> --- a/arch/x86/include/asm/spinlock.h
>>> +++ b/arch/x86/include/asm/spinlock.h
>>> @@ -33,9 +33,21 @@
>>>  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
>>>  * (PPro errata 66, 92)
>>>  */
>>> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
>>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>>> +{
>>> +       if (sizeof(lock->tickets.head) == sizeof(u8))
>>> +               asm (LOCK_PREFIX "incb %0"
>>> +                    : "+m" (lock->tickets.head) : : "memory");
>>> +       else
>>> +               asm (LOCK_PREFIX "incw %0"
>>> +                    : "+m" (lock->tickets.head) : : "memory");
>>> +
>>> +}
>>>  #else
>>> -# define UNLOCK_LOCK_PREFIX
>>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>>> +{
>>> +       lock->tickets.head++;
>>> +}
>>>  #endif
>>>
>>>  /*
>>> @@ -93,14 +105,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>>
>>>        return tmp;
>>>  }
>>> -
>>> -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>>> -{
>>> -       asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
>>> -                    : "+m" (lock->slock)
>>> -                    :
>>> -                    : "memory", "cc");
>>> -}
>>>  #else
>>>  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>>>  {
>>> @@ -144,15 +148,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
>>>
>>>        return tmp;
>>>  }
>>> +#endif
>>>
>>>  static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
>>>  {
>>> -       asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
>>> -                    : "+m" (lock->slock)
>>> -                    :
>>> -                    : "memory", "cc");
>>> +       __ticket_unlock_release(lock);
>>> +       barrier();              /* prevent reordering into locked region */
>>>  }
>>> -#endif
>> The barrier is wrong.
>
> In what way?  Do you think it should be on the other side?

Well the other side is where the locked region is :)
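
i.e. something along these lines, with the compiler barrier on the side
the critical section is actually on, so its accesses can't drift past the
releasing store (sketch only):

	static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
	{
		barrier();	/* keep the locked region above the release */
		__ticket_unlock_release(lock);
	}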


>> What makes me a tiny bit uneasy is that gcc is allowed to implement
>> this any way it wishes. OK there may be a NULL intersection of possible
>> valid assembly which is a buggy unlock... but relying on gcc to implement
>> lock primitives is scary. Does this really help in a way that can't be done
>> with the assembly versions?
>
> We rely on C/gcc for plenty of other subtle ordering things.  Spinlocks
> aren't particularly special in this regard.

Well, probably not orderings, but we do rely on it to do atomic
stores and loads to <= sizeof(long) data types, unfortunately.
We also rely on it not to speculatively store into data on the not-taken
side of branches, and things like that.
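
A contrived illustration of that second point -- the kind of rewrite we
trust the compiler never to make for shared data (purely hypothetical; gcc
does not do this here):

	struct shared { int val; };

	/* Source form: the store happens only on the taken path. */
	void update(struct shared *s, int need_update, int newval)
	{
		if (need_update)
			s->val = newval;
	}

	/* A speculative rewrite that invents a store on the not-taken path;
	 * another CPU could observe the transient value of s->val. */
	void update_speculative(struct shared *s, int need_update, int newval)
	{
		int old = s->val;

		s->val = newval;
		if (!need_update)
			s->val = old;
	}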

I guess it's OK, but it makes me cringe a little bit to see unlock just
do head++

* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
  2011-01-25  1:42     ` Jeremy Fitzhardinge
@ 2011-01-25  1:58       ` Nick Piggin
  2011-01-27 23:53         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2011-01-25  1:58 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On Tue, Jan 25, 2011 at 12:42 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 01/24/2011 05:16 PM, Nick Piggin wrote:
>> On Tue, Jan 25, 2011 at 10:41 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>
>>> Make trylock code common regardless of ticket size.
>> What's the asm for this look like?
>
> Asm:
>
>        movzwl (%rdi), %eax     # lock_1(D)->slock, tmp
>        cmpb %ah,%al    # tmp
>        leal 0x100(%rax), %edx  # tmp, new
>        jne 1f
>        lock; cmpxchgw %dx,(%rdi)       # new, lock_1(D)->slock
> 1:      sete %dl      # new
>        movzbl %dl,%eax # new, tmp
>
>
>
> C:
>
>        movw    (%rdi), %dx     # lock_2(D)->D.5949.tickets, old
>        xorl    %eax, %eax      # D.13954
>        movzbl  %dh, %ecx       # old, tmp70
>        cmpb    %dl, %cl        # old, tmp70
>        jne     .L5     #,
>        leal    256(%rdx), %ecx #, D.13956
>        movl    %edx, %eax      # old, __ret
>        lock; cmpxchgw %cx,(%rdi)       # D.13956,* lock
>        cmpw    %dx, %ax        # old, __ret
>        sete    %al     #, D.13954
>        movzbl  %al, %eax       # D.13954, D.13954
> .L5:
>
>
> The C version can't take advantage of the fact that the cmpxchg directly
> sets the flags, so it ends up re-comparing the old and swapped-out
> values to set the return.  And it doesn't re-use the same sete to set
> the return value in the quick failed-to-acquire path.

Hm.


> It might be worth having a generic cmpxchg() variant which returns a
> succeed/fail flag rather than the fetched value, to avoid comparison in
> this case - since many (most?) cmpxchg() callers end up doing that
> comparison.
>
> How performance critical is trylock?  I guess the ones in fs/dcache.c
> are the ones looming large in your mind.

Well, they are on the reclaim/free path rather than the _hottest_
paths, but yes, they are performance critical.

* Re: [PATCH 6/6] x86/ticketlock: make __ticket_spin_trylock common
  2011-01-25  1:58       ` Nick Piggin
@ 2011-01-27 23:53         ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-27 23:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 01/24/2011 05:58 PM, Nick Piggin wrote:
>>
>> The C version can't take advantage of the fact that the cmpxchg directly
>> sets the flags, so it ends up re-comparing the old and swapped-out
>> values to set the return.  And it doesn't re-use the same sete to set
>> the return value in the quick failed-to-acquire path.
> Hm.

Adding a "cmpxchg_flag" which does its own sete and returns a boolean
"success" flag, the whole thing goes to:

__ticket_spin_trylock:
        movw    (%rdi), %ax     # lock_2(D)->D.5950.tickets, old
        xorl    %edx, %edx      # D.13949
        movzbl  %ah, %ecx       # old, tmp69
        cmpb    %al, %cl        # old, tmp69
        jne     .L5     #,
        leal    256(%rax), %edx #, D.13951
        lock; cmpxchgw %dx,(%rdi); sete %al     # D.13951,* lock, __ret
        movzbl  %al, %edx       # __ret, D.13949
.L5:
        movl    %edx, %eax      # D.13949,
        ret




The eax/edx shuffle is a bit unfortunate, but I can't see it hurting very much.
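
For completeness, the C behind that output is just the trylock from patch 6
with the plain cmpxchg swapped for the flag-returning variant, roughly:

	static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
	{
		arch_spinlock_t old, new;

		old.tickets = ACCESS_ONCE(lock->tickets);
		if (old.tickets.head != old.tickets.tail)
			return 0;

		new.head_tail = old.head_tail + (1 << TICKET_SHIFT);

		/* cmpxchg_flag assumed to return true iff the swap happened */
		return cmpxchg_flag(&lock->head_tail, old.head_tail, new.head_tail);
	}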


>> It might be worth having a generic cmpxchg() variant which returns a
>> succeed/fail flag rather than the fetched value, to avoid comparison in
>> this case - since many (most?) cmpxchg() callers end up doing that
>> comparison.
>>
>> How performance critical is trylock?  I guess the ones in fs/dcache.c
>> are the ones looming large in your mind.
> Well they are on on the reclaim/free path rather than the _hottest_
> paths, but yes they are performance critical.

I think that code looks pretty reasonable.

    J


* Re: [PATCH 0/6] Clean up ticketlock implementation
  2011-01-25  1:08 ` [PATCH 0/6] Clean up ticketlock implementation Nick Piggin
@ 2011-01-31 21:46   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-01-31 21:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar,
	the arch/x86 maintainers, Linux Kernel Mailing List, Nick Piggin,
	Jeremy Fitzhardinge

On 01/24/2011 05:08 PM, Nick Piggin wrote:
>> So very similar, except the compiler misses directly comparing
>> %ah to %al.
> Oh :(
>
> Have you filed a bug with gcc?

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47556

    J

