linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC 0/4] Paravirtual spinlocks
@ 2008-07-07 19:07 Jeremy Fitzhardinge
  2008-07-07 19:07 ` [PATCH RFC 1/4] x86/paravirt: add hooks for spinlock operations Jeremy Fitzhardinge
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-07 19:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel

At the most recent Xen Summit, Thomas Friebel presented a paper
("Preventing Guests from Spinning Around",
http://xen.org/files/xensummitboston08/LHP.pdf) investigating the
interactions between spinlocks and virtual machines.  Specifically, he
looked at what happens when a lock-holding VCPU gets involuntarily
preempted.

The obvious first-order effect is that while the VCPU is not running,
the effective critical region time goes from microseconds to
milliseconds, until it gets scheduled again.  This increases the
chance that there will be contention, and that the contending VCPUs
will waste time spinning.

This is a measurable effect, but not terribly serious.  After all,
since Linux tends to hold locks for very short periods of time,
the likelihood of being preempted while holding a lock is low.

The real eye-opener is the set of secondary effects specific to ticket locks.

Clearly ticket locks suffer the same problem as all spinlocks.  But
when the lock holder releases the lock, the real fun begins.

By design, ticket locks are strictly fair, imposing a FIFO order on
lock holders.  The micro-architectural effect of this is that the
lock cache line will bounce around between the contending CPUs until
it finds the next in line, which then takes the lock and carries on.

When running in a virtual machine, a similar effect happens at the
VCPU level.  If not all of the contending VCPUs are currently running
on real CPUs, the VCPU scheduler will run some arbitrary subset of
them.  If it isn't a given VCPU's turn to take the lock, it will spin,
burning a VCPU timeslice.  Eventually the next-in-line will get
scheduled, take the lock, release it, and the remaining contending
VCPUs will repeat the process until the next in line is scheduled.

This means that the effective contention time of the lock is not
merely the time it takes the original lock-holder to take and release
the lock - including any preemption it may suffer - but also the
spin-scheduling storm that follows to schedule the right VCPU to next
take the lock.  This can happen even if the original contention was
not the result of preemption, but just normal spinlock-level
contention.
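
As a purely illustrative sketch (C11 atomics standing in for the
kernel's actual asm implementation), a ticket lock looks roughly like
this; the strict FIFO hand-off in ticket_unlock() is what turns a
preempted next-in-line VCPU into a stall for every later ticket holder:

	#include <stdatomic.h>

	struct ticket_lock {
		atomic_ushort next;	/* next ticket to hand out */
		atomic_ushort owner;	/* ticket currently being served */
	};

	static void ticket_lock(struct ticket_lock *lock)
	{
		/* atomically take the next ticket */
		unsigned short me = atomic_fetch_add(&lock->next, 1);

		/* spin until it's our turn; in a guest this burns a whole
		 * vcpu timeslice whenever the owner (or the next-in-line
		 * vcpu) isn't running on a physical cpu */
		while (atomic_load(&lock->owner) != me)
			;	/* rep;nop ("pause") in the real code */
	}

	static void ticket_unlock(struct ticket_lock *lock)
	{
		/* strictly FIFO: only the holder of the next ticket may
		 * proceed, no matter which contending vcpu runs first */
		atomic_fetch_add(&lock->owner, 1);
	}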

One of the results Thomas presents is a kernbench run which normally
takes less than a minute instead taking 45 minutes, with 99+% of the
time spent in ticket lock contention.  I've reproduced similar
results.

This series has:
 - a paravirt_ops spinlock interface, which defaults to the standard
   ticket lock algorithm,
 - a second spinlock implementation based on the pre-ticket-lock
   "lock-byte" algorithm,
 - And a Xen-specific spinlock algorithm which voluntarily preempts a
   VCPU if it spins for too long. [FOR REFERENCE ONLY: will not apply
   to a current git tree.]

When running on native hardware, the overhead of enabling
CONFIG_PARAVIRT is an extra direct call/return on the lock/unlock
paths; the paravirt-ops patching machinery eliminates any indirect
calls.  With a small amount of restructuring, this overhead could be
eliminated (by making spin_lock()/unlock() inline functions,
containing calls to __raw_spin_lock/unlock).
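
Roughly, that restructuring would amount to a pair of inline wrappers
like the sketch below (ignoring the lockdep and irq-flags variants);
after pv-ops patching, __raw_spin_lock/unlock become direct calls, so
the lock and unlock paths would still see a single call each:

	/* Sketch only: spin_lock()/unlock() as inline wrappers around the
	 * (patched) paravirt entry points. */
	static __always_inline void spin_lock(spinlock_t *lock)
	{
		preempt_disable();
		__raw_spin_lock(&lock->raw_lock);	/* direct call after patching */
	}

	static __always_inline void spin_unlock(spinlock_t *lock)
	{
		__raw_spin_unlock(&lock->raw_lock);	/* direct call after patching */
		preempt_enable();
	}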

My experiments show that using a Xen-specific lock helps guest
performance a bit (reduction in elapsed and system time in a kernbench
run), but most significantly, reduces overall physical CPU consumption
by 10%, and so increases overall system scalability.

   J
-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFC 1/4] x86/paravirt: add hooks for spinlock operations
  2008-07-07 19:07 [PATCH RFC 0/4] Paravirtual spinlocks Jeremy Fitzhardinge
@ 2008-07-07 19:07 ` Jeremy Fitzhardinge
  2008-07-07 19:07 ` [PATCH RFC 2/4] paravirt: introduce a "lock-byte" spinlock implementation Jeremy Fitzhardinge
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-07 19:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Jeremy Fitzhardinge

[-- Attachment #1: paravirt-spinlocks.patch --]
[-- Type: text/plain, Size: 8933 bytes --]

Ticket spinlocks have absolutely ghastly worst-case performance
characteristics in a virtual environment.  If there is any contention
for physical CPUs (ie, there are more runnable vcpus than cpus), then
ticket locks can cause the system to end up spending 90+% of its time
spinning.

The problem is that (v)cpus waiting on a ticket spinlock will be
granted access to the lock in the strict order in which they got their
tickets.  If the hypervisor scheduler doesn't give the vcpus time in
that order, they will burn timeslices waiting for the scheduler to
give the right vcpu some time.  In the worst case it could take O(n^2)
vcpu scheduler timeslices for everyone waiting on the lock to get it
(for each handoff, every one of the n waiters may get scheduled and
spin uselessly before the right vcpu finally runs), not counting new
cpus trying to take the lock while the log-jam is sorted out.

These hooks allow a paravirt backend to replace the spinlock
implementation.

At the very least, this could revert the implementation to the old
lock algorithm, which allows whichever vcpu is scheduled next to take
the lock, and has basically good performance.

It also allows the spinlocks to take advantage of hypervisor features
to make locks more efficient (spin-and-block, for example).

The cost to native execution is an extra direct call when using a
spinlock function.  There's no overhead if CONFIG_PARAVIRT is turned
off.

The lock structure is fixed at a single "unsigned int", initialized to
zero, but the spinlock implementation can use it as it wishes.

Thanks to Thomas Friebel's Xen Summit talk "Preventing Guests from
Spinning Around" for pointing out this problem.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Thomas Friebel <thomas.friebel@amd.com>
---
 arch/x86/kernel/paravirt.c       |   10 ++++++
 include/asm-x86/paravirt.h       |   37 +++++++++++++++++++++++++
 include/asm-x86/spinlock.h       |   55 +++++++++++++++++++++++++++-----------
 include/asm-x86/spinlock_types.h |    2 -
 4 files changed, 88 insertions(+), 16 deletions(-)

===================================================================
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -123,6 +123,7 @@
 		.pv_irq_ops = pv_irq_ops,
 		.pv_apic_ops = pv_apic_ops,
 		.pv_mmu_ops = pv_mmu_ops,
+		.pv_lock_ops = pv_lock_ops,
 	};
 	return *((void **)&tmpl + type);
 }
@@ -445,6 +446,15 @@
 	.set_fixmap = native_set_fixmap,
 };
 
+struct pv_lock_ops pv_lock_ops = {
+	.spin_is_locked = __ticket_spin_is_locked,
+	.spin_is_contended = __ticket_spin_is_contended,
+
+	.spin_lock = __ticket_spin_lock,
+	.spin_trylock = __ticket_spin_trylock,
+	.spin_unlock = __ticket_spin_unlock,
+};
+
 EXPORT_SYMBOL_GPL(pv_time_ops);
 EXPORT_SYMBOL    (pv_cpu_ops);
 EXPORT_SYMBOL    (pv_mmu_ops);
===================================================================
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -329,6 +329,15 @@
 			   unsigned long phys, pgprot_t flags);
 };
 
+struct raw_spinlock;
+struct pv_lock_ops {
+	int (*spin_is_locked)(struct raw_spinlock *lock);
+	int (*spin_is_contended)(struct raw_spinlock *lock);
+	void (*spin_lock)(struct raw_spinlock *lock);
+	int (*spin_trylock)(struct raw_spinlock *lock);
+	void (*spin_unlock)(struct raw_spinlock *lock);
+};
+
 /* This contains all the paravirt structures: we get a convenient
  * number for each function using the offset which we use to indicate
  * what to patch. */
@@ -339,6 +348,7 @@
 	struct pv_irq_ops pv_irq_ops;
 	struct pv_apic_ops pv_apic_ops;
 	struct pv_mmu_ops pv_mmu_ops;
+	struct pv_lock_ops pv_lock_ops;
 };
 
 extern struct pv_info pv_info;
@@ -348,6 +358,7 @@
 extern struct pv_irq_ops pv_irq_ops;
 extern struct pv_apic_ops pv_apic_ops;
 extern struct pv_mmu_ops pv_mmu_ops;
+extern struct pv_lock_ops pv_lock_ops;
 
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
@@ -1377,6 +1388,31 @@
 void _paravirt_nop(void);
 #define paravirt_nop	((void *)_paravirt_nop)
 
+static inline int __raw_spin_is_locked(struct raw_spinlock *lock)
+{
+	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+}
+
+static inline int __raw_spin_is_contended(struct raw_spinlock *lock)
+{
+	return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
+}
+
+static __always_inline void __raw_spin_lock(struct raw_spinlock *lock)
+{
+	return PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
+}
+
+static __always_inline int __raw_spin_trylock(struct raw_spinlock *lock)
+{
+	return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
+}
+
+static __always_inline void __raw_spin_unlock(struct raw_spinlock *lock)
+{
+	return PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+}
+
 /* These all sit in the .parainstructions section to tell us what to patch. */
 struct paravirt_patch_site {
 	u8 *instr; 		/* original instructions */
@@ -1461,6 +1497,7 @@
 	return f;
 }
 
+
 /* Make sure as little as possible of this mess escapes. */
 #undef PARAVIRT_CALL
 #undef __PVOP_CALL
===================================================================
--- a/include/asm-x86/spinlock.h
+++ b/include/asm-x86/spinlock.h
@@ -6,7 +6,7 @@
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
-
+#include <asm/paravirt.h>
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -54,21 +54,21 @@
  * much between them in performance though, especially as locks are out of line.
  */
 #if (NR_CPUS < 256)
-static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
+static inline int __ticket_spin_is_locked(raw_spinlock_t *lock)
 {
 	int tmp = ACCESS_ONCE(lock->slock);
 
 	return (((tmp >> 8) & 0xff) != (tmp & 0xff));
 }
 
-static inline int __raw_spin_is_contended(raw_spinlock_t *lock)
+static inline int __ticket_spin_is_contended(raw_spinlock_t *lock)
 {
 	int tmp = ACCESS_ONCE(lock->slock);
 
 	return (((tmp >> 8) & 0xff) - (tmp & 0xff)) > 1;
 }
 
-static __always_inline void __raw_spin_lock(raw_spinlock_t *lock)
+static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock)
 {
 	short inc = 0x0100;
 
@@ -87,9 +87,7 @@
 		: "memory", "cc");
 }
 
-#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
-
-static __always_inline int __raw_spin_trylock(raw_spinlock_t *lock)
+static __always_inline int __ticket_spin_trylock(raw_spinlock_t *lock)
 {
 	int tmp;
 	short new;
@@ -110,7 +108,7 @@
 	return tmp;
 }
 
-static __always_inline void __raw_spin_unlock(raw_spinlock_t *lock)
+static __always_inline void __ticket_spin_unlock(raw_spinlock_t *lock)
 {
 	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
 		     : "+m" (lock->slock)
@@ -118,21 +116,21 @@
 		     : "memory", "cc");
 }
 #else
-static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
+static inline int __ticket_spin_is_locked(raw_spinlock_t *lock)
 {
 	int tmp = ACCESS_ONCE(lock->slock);
 
 	return (((tmp >> 16) & 0xffff) != (tmp & 0xffff));
 }
 
-static inline int __raw_spin_is_contended(raw_spinlock_t *lock)
+static inline int __ticket_spin_is_contended(raw_spinlock_t *lock)
 {
 	int tmp = ACCESS_ONCE(lock->slock);
 
 	return (((tmp >> 16) & 0xffff) - (tmp & 0xffff)) > 1;
 }
 
-static __always_inline void __raw_spin_lock(raw_spinlock_t *lock)
+static __always_inline void __ticket_spin_lock(raw_spinlock_t *lock)
 {
 	int inc = 0x00010000;
 	int tmp;
@@ -153,9 +151,7 @@
 		     : "memory", "cc");
 }
 
-#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
-
-static __always_inline int __raw_spin_trylock(raw_spinlock_t *lock)
+static __always_inline int __ticket_spin_trylock(raw_spinlock_t *lock)
 {
 	int tmp;
 	int new;
@@ -177,7 +173,7 @@
 	return tmp;
 }
 
-static __always_inline void __raw_spin_unlock(raw_spinlock_t *lock)
+static __always_inline void __ticket_spin_unlock(raw_spinlock_t *lock)
 {
 	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
 		     : "+m" (lock->slock)
@@ -185,6 +181,35 @@
 		     : "memory", "cc");
 }
 #endif
+
+#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
+
+#ifndef CONFIG_PARAVIRT
+static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
+{
+	return __ticket_spin_is_locked(lock);
+}
+
+static inline int __raw_spin_is_contended(raw_spinlock_t *lock)
+{
+	return __ticket_spin_is_contended(lock);
+}
+
+static __always_inline void __raw_spin_lock(raw_spinlock_t *lock)
+{
+	__ticket_spin_lock(lock);
+}
+
+static __always_inline int __raw_spin_trylock(raw_spinlock_t *lock)
+{
+	return __ticket_spin_trylock(lock);
+}
+
+static __always_inline void __raw_spin_unlock(raw_spinlock_t *lock)
+{
+	__ticket_spin_unlock(lock);
+}
+#endif	/* CONFIG_PARAVIRT */
 
 static inline void __raw_spin_unlock_wait(raw_spinlock_t *lock)
 {
===================================================================
--- a/include/asm-x86/spinlock_types.h
+++ b/include/asm-x86/spinlock_types.h
@@ -5,7 +5,7 @@
 # error "please don't include this file directly"
 #endif
 
-typedef struct {
+typedef struct raw_spinlock {
 	unsigned int slock;
 } raw_spinlock_t;
 

-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFC 2/4] paravirt: introduce a "lock-byte" spinlock implementation
  2008-07-07 19:07 [PATCH RFC 0/4] Paravirtual spinlocks Jeremy Fitzhardinge
  2008-07-07 19:07 ` [PATCH RFC 1/4] x86/paravirt: add hooks for spinlock operations Jeremy Fitzhardinge
@ 2008-07-07 19:07 ` Jeremy Fitzhardinge
  2008-07-07 19:07 ` [PATCH RFC 3/4] xen: use lock-byte " Jeremy Fitzhardinge
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-07 19:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Jeremy Fitzhardinge

[-- Attachment #1: paravirt-byte-spinlock.patch --]
[-- Type: text/plain, Size: 4062 bytes --]

Implement a version of the old spinlock algorithm, in which everyone
spins waiting for a lock byte.  In order to be compatible with the
ticket-lock's use of a zero initializer, this uses the convention of
'0' for unlocked and '1' for locked.

This algorithm is much better than ticket locks in a virtual
environment, because it doesn't interact badly with the vcpu scheduler.
If there are multiple vcpus spinning on a lock and the lock is
released, the next vcpu to be scheduled will take the lock, rather
than cycling around until the next ticketed vcpu gets it.

To use this, you must call paravirt_use_bytelocks() very early, before
any spinlocks have been taken.
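
For example (sketch only; the function name below is made up, and the
Xen hook-up in the next patch is the real user), a hypervisor port
would do something like:

	/* hypothetical early platform-setup hook, run before any spinlock
	 * is ever taken (e.g. from the guest's smp_prepare_boot_cpu) */
	static void __init myhv_init_locks(void)
	{
		paravirt_use_bytelocks();	/* switch pv_lock_ops to byte locks */
	}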

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Thomas Friebel <thomas.friebel@amd.com>
---
 arch/x86/kernel/paravirt.c |    9 ++++++
 include/asm-x86/paravirt.h |    2 +
 include/asm-x86/spinlock.h |   65 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 1 deletion(-)

===================================================================
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -269,6 +269,15 @@
 	return __get_cpu_var(paravirt_lazy_mode);
 }
 
+void __init paravirt_use_bytelocks(void)
+{
+	pv_lock_ops.spin_is_locked = __byte_spin_is_locked;
+	pv_lock_ops.spin_is_contended = __byte_spin_is_contended;
+	pv_lock_ops.spin_lock = __byte_spin_lock;
+	pv_lock_ops.spin_trylock = __byte_spin_trylock;
+	pv_lock_ops.spin_unlock = __byte_spin_unlock;
+}
+
 struct pv_info pv_info = {
 	.name = "bare hardware",
 	.paravirt_enabled = 0,
===================================================================
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -1388,6 +1388,8 @@
 void _paravirt_nop(void);
 #define paravirt_nop	((void *)_paravirt_nop)
 
+void paravirt_use_bytelocks(void);
+
 static inline int __raw_spin_is_locked(struct raw_spinlock *lock)
 {
 	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
===================================================================
--- a/include/asm-x86/spinlock.h
+++ b/include/asm-x86/spinlock.h
@@ -184,7 +184,70 @@
 
 #define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
 
-#ifndef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT
+/*
+ * Define virtualization-friendly old-style lock byte lock, for use in
+ * pv_lock_ops if desired.
+ *
+ * This differs from the pre-2.6.24 spinlock by always using xchgb
+ * rather than decb to take the lock; this allows it to use a
+ * zero-initialized lock structure.  It also maintains a 1-byte
+ * contention counter, so that we can implement
+ * __byte_spin_is_contended.
+ */
+struct __byte_spinlock {
+	s8 lock;
+	s8 spinners;
+};
+
+static inline int __byte_spin_is_locked(raw_spinlock_t *lock)
+{
+	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
+	return bl->lock != 0;
+}
+
+static inline int __byte_spin_is_contended(raw_spinlock_t *lock)
+{
+	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
+	return bl->spinners != 0;
+}
+
+static inline void __byte_spin_lock(raw_spinlock_t *lock)
+{
+	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
+	s8 val = 1;
+
+	asm("1: xchgb %1, %0\n"
+	    "   test %1,%1\n"
+	    "   jz 3f\n"
+	    "   " LOCK_PREFIX "incb %2\n"
+	    "2: rep;nop\n"
+	    "   cmpb $1, %0\n"
+	    "   je 2b\n"
+	    "   " LOCK_PREFIX "decb %2\n"
+	    "   jmp 1b\n"
+	    "3:"
+	    : "+m" (bl->lock), "+q" (val), "+m" (bl->spinners): : "memory");
+}
+
+static inline int __byte_spin_trylock(raw_spinlock_t *lock)
+{
+	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
+	u8 old = 1;
+
+	asm("xchgb %1,%0"
+	    : "+m" (bl->lock), "+q" (old) : : "memory");
+
+	return old == 0;
+}
+
+static inline void __byte_spin_unlock(raw_spinlock_t *lock)
+{
+	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
+	smp_wmb();
+	bl->lock = 0;
+}
+#else  /* !CONFIG_PARAVIRT */
 static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
 {
 	return __ticket_spin_is_locked(lock);

-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFC 3/4] xen: use lock-byte spinlock implementation
  2008-07-07 19:07 [PATCH RFC 0/4] Paravirtual spinlocks Jeremy Fitzhardinge
  2008-07-07 19:07 ` [PATCH RFC 1/4] x86/paravirt: add hooks for spinlock operations Jeremy Fitzhardinge
  2008-07-07 19:07 ` [PATCH RFC 2/4] paravirt: introduce a "lock-byte" spinlock implementation Jeremy Fitzhardinge
@ 2008-07-07 19:07 ` Jeremy Fitzhardinge
  2008-07-07 19:07 ` [PATCH RFC 4/4] xen: implement Xen-specific spinlocks Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-07 19:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Jeremy Fitzhardinge

[-- Attachment #1: xen-use-byte-locks.patch --]
[-- Type: text/plain, Size: 525 bytes --]

Switch to using the lock-byte spinlock implementation, to avoid the
worst of the performance hit from ticket locks.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Thomas Friebel <thomas.friebel@amd.com>
---
 arch/x86/xen/smp.c |    1 +
 1 file changed, 1 insertion(+)

===================================================================
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -430,4 +430,5 @@
 {
 	smp_ops = xen_smp_ops;
 	xen_fill_possible_map();
+	paravirt_use_bytelocks();
 }

-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFC 4/4] xen: implement Xen-specific spinlocks
  2008-07-07 19:07 [PATCH RFC 0/4] Paravirtual spinlocks Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2008-07-07 19:07 ` [PATCH RFC 3/4] xen: use lock-byte " Jeremy Fitzhardinge
@ 2008-07-07 19:07 ` Jeremy Fitzhardinge
  2008-07-08  6:37   ` Johannes Weiner
  2008-07-08  0:29 ` [PATCH RFC 0/4] Paravirtual spinlocks Rusty Russell
  2008-07-09 12:28 ` Ingo Molnar
  5 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-07 19:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Jeremy Fitzhardinge

[-- Attachment #1: xen-pv-spinlocks.patch --]
[-- Type: text/plain, Size: 8599 bytes --]

The standard ticket spinlocks are very expensive in a virtual
environment, because their performance depends on Xen's scheduler
giving vcpus time in the order that they're supposed to take the
spinlock.

This implements a Xen-specific spinlock, which should be much more
efficient.

The fast-path is essentially the old Linux-x86 locks, using a single
lock byte.  The locker decrements the byte; if the result is 0, then
it has the lock.  If the result is negative, then the locker must spin
until the lock byte is positive again.

When there's contention, the locker spins for 2^16[*] iterations
waiting to get the lock.  If it fails to get the lock in that time, it
adds itself to the contention count in the lock and blocks on a
per-cpu event channel.

When unlocking the spinlock, the unlocker looks to see if anyone is
blocked waiting for the lock by checking for a non-zero waiter count.
If there is a waiter, it traverses the per-cpu "lock_spinners"
variable, which records which lock each CPU is waiting on.  It picks
one CPU waiting on the lock and sends it an event to wake it up.

This keeps the fast-path spinlock operation efficient, while letting
spinning vcpus give up their processor time while waiting for a
contended lock.

[*] 2^16 iterations is the threshold at which 98% of locks have been
taken, according to Thomas Friebel's Xen Summit talk "Preventing
Guests from Spinning Around".  Therefore, we'd expect the lock and
unlock slow paths to be entered only 2% of the time.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Thomas Friebel <thomas.friebel@amd.com>
---
 arch/x86/xen/smp.c           |  172 +++++++++++++++++++++++++++++++++++++++++-
 drivers/xen/events.c         |   27 ++++++
 include/asm-x86/xen/events.h |    1 
 include/xen/events.h         |    7 +
 4 files changed, 206 insertions(+), 1 deletion(-)

===================================================================
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -15,6 +15,7 @@
  * This does not handle HOTPLUG_CPU yet.
  */
 #include <linux/sched.h>
+#include <linux/kernel_stat.h>
 #include <linux/err.h>
 #include <linux/smp.h>
 
@@ -34,6 +35,8 @@
 
 #include "xen-ops.h"
 #include "mmu.h"
+
+static void __cpuinit xen_init_lock_cpu(int cpu);
 
 cpumask_t xen_cpu_initialized_map;
 
@@ -179,6 +182,8 @@
 {
 	unsigned cpu;
 
+	xen_init_lock_cpu(0);
+
 	smp_store_cpu_info(0);
 	cpu_data(0).x86_max_cores = 1;
 	set_cpu_sibling_map(0);
@@ -301,6 +306,7 @@
 	clear_tsk_thread_flag(idle, TIF_FORK);
 #endif
 	xen_setup_timer(cpu);
+	xen_init_lock_cpu(cpu);
 
 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
 
@@ -413,6 +419,170 @@
 	return IRQ_HANDLED;
 }
 
+struct xen_spinlock {
+	unsigned char lock;		/* 0 -> free; 1 -> locked */
+	unsigned short spinners;	/* count of waiting cpus */
+};
+
+static int xen_spin_is_locked(struct raw_spinlock *lock)
+{
+	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
+
+	return xl->lock != 0;
+}
+
+static int xen_spin_is_contended(struct raw_spinlock *lock)
+{
+	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
+
+	/* Not strictly true; this is only the count of contended
+	   lock-takers entering the slow path. */
+	return xl->spinners != 0;
+}
+
+static int xen_spin_trylock(struct raw_spinlock *lock)
+{
+	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
+	u8 old = 1;
+
+	asm("xchgb %b0,%1"
+	    : "+q" (old), "+m" (xl->lock) : : "memory");
+
+	return old == 0;
+}
+
+static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
+
+static inline void spinning_lock(struct xen_spinlock *xl)
+{
+	__get_cpu_var(lock_spinners) = xl;
+	wmb();			/* set lock of interest before count */
+	asm(LOCK_PREFIX " incw %0"
+	    : "+m" (xl->spinners) : : "memory");
+}
+
+static inline void unspinning_lock(struct xen_spinlock *xl)
+{
+	asm(LOCK_PREFIX " decw %0"
+	    : "+m" (xl->spinners) : : "memory");
+	wmb();			/* decrement count before clearing lock */
+	__get_cpu_var(lock_spinners) = NULL;
+}
+
+static noinline int xen_spin_lock_slow(struct raw_spinlock *lock)
+{
+	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
+	int irq = __get_cpu_var(lock_kicker_irq);
+	int ret;
+
+	/* If kicker interrupts not initialized yet, just spin */
+	if (irq == -1)
+		return 0;
+
+	/* announce we're spinning */
+	spinning_lock(xl);
+
+	/* clear pending */
+	xen_clear_irq_pending(irq);
+
+	/* check again make sure it didn't become free while
+	   we weren't looking  */
+	ret = xen_spin_trylock(lock);
+	if (ret)
+		goto out;
+
+	/* block until irq becomes pending */
+	xen_poll_irq(irq);
+	kstat_this_cpu.irqs[irq]++;
+
+out:
+	unspinning_lock(xl);
+	return ret;
+}
+
+static void xen_spin_lock(struct raw_spinlock *lock)
+{
+	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
+	int timeout;
+	u8 oldval;
+
+	do {
+		timeout = 1 << 10;
+
+		asm("1: xchgb %1,%0\n"
+		    "   testb %1,%1\n"
+		    "   jz 3f\n"
+		    "2: rep;nop\n"
+		    "   cmpb $0,%0\n"
+		    "   je 1b\n"
+		    "   dec %2\n"
+		    "   jnz 2b\n"
+		    "3:\n"
+		    : "+m" (xl->lock), "=q" (oldval), "+r" (timeout)
+		    : "1" (1)
+		    : "memory");
+
+	} while (unlikely(oldval != 0 && !xen_spin_lock_slow(lock)));
+}
+
+static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		/* XXX should mix up next cpu selection */
+		if (per_cpu(lock_spinners, cpu) == xl) {
+			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+			break;
+		}
+	}
+}
+
+static void xen_spin_unlock(struct raw_spinlock *lock)
+{
+	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
+
+	smp_wmb();		/* make sure no writes get moved after unlock */
+	xl->lock = 0;		/* release lock */
+
+	/* make sure unlock happens before kick */
+	barrier();
+
+	if (unlikely(xl->spinners))
+		xen_spin_unlock_slow(xl);
+}
+
+static __cpuinit void xen_init_lock_cpu(int cpu)
+{
+	int irq;
+	const char *name;
+
+	name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
+	irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR,
+				     cpu,
+				     xen_reschedule_interrupt,
+				     IRQF_DISABLED|IRQF_PERCPU|IRQF_NOBALANCING,
+				     name,
+				     NULL);
+
+	if (irq >= 0) {
+		disable_irq(irq); /* make sure it's never delivered */
+		per_cpu(lock_kicker_irq, cpu) = irq;
+	}
+
+	printk("cpu %d spinlock event irq %d\n", cpu, irq);
+}
+
+static void __init xen_init_spinlocks(void)
+{
+	pv_lock_ops.spin_is_locked = xen_spin_is_locked;
+	pv_lock_ops.spin_is_contended = xen_spin_is_contended;
+	pv_lock_ops.spin_lock = xen_spin_lock;
+	pv_lock_ops.spin_trylock = xen_spin_trylock;
+	pv_lock_ops.spin_unlock = xen_spin_unlock;
+}
+
 static const struct smp_ops xen_smp_ops __initdata = {
 	.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu,
 	.smp_prepare_cpus = xen_smp_prepare_cpus,
@@ -430,5 +600,5 @@
 {
 	smp_ops = xen_smp_ops;
 	xen_fill_possible_map();
-	paravirt_use_bytelocks();
+	xen_init_spinlocks();
 }
===================================================================
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -741,6 +741,33 @@
 	}
 }
 
+/* Clear an irq's pending state, in preparation for polling on it */
+void xen_clear_irq_pending(int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		clear_evtchn(evtchn);
+}
+
+/* Poll waiting for an irq to become pending.  In the usual case, the
+   irq will be disabled so it won't deliver an interrupt. */
+void xen_poll_irq(int irq)
+{
+	evtchn_port_t evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn)) {
+		struct sched_poll poll;
+
+		poll.nr_ports = 1;
+		poll.timeout = 0;
+		poll.ports = &evtchn;
+
+		if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
+			BUG();
+	}
+}
+
 void xen_irq_resume(void)
 {
 	unsigned int cpu, irq, evtchn;
===================================================================
--- a/include/asm-x86/xen/events.h
+++ b/include/asm-x86/xen/events.h
@@ -5,6 +5,7 @@
 	XEN_RESCHEDULE_VECTOR,
 	XEN_CALL_FUNCTION_VECTOR,
 	XEN_CALL_FUNCTION_SINGLE_VECTOR,
+	XEN_SPIN_UNLOCK_VECTOR,
 
 	XEN_NR_IPIS,
 };
===================================================================
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -45,4 +45,11 @@
 extern void xen_irq_resume(void);
 extern void xen_evtchn_do_upcall(struct pt_regs *regs);
 
+/* Clear an irq's pending state, in preparation for polling on it */
+void xen_clear_irq_pending(int irq);
+
+/* Poll waiting for an irq to become pending.  In the usual case, the
+   irq will be disabled so it won't deliver an interrupt. */
+void xen_poll_irq(int irq);
+
 #endif	/* _XEN_EVENTS_H */

-- 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/4] Paravirtual spinlocks
  2008-07-07 19:07 [PATCH RFC 0/4] Paravirtual spinlocks Jeremy Fitzhardinge
                   ` (3 preceding siblings ...)
  2008-07-07 19:07 ` [PATCH RFC 4/4] xen: implement Xen-specific spinlocks Jeremy Fitzhardinge
@ 2008-07-08  0:29 ` Rusty Russell
  2008-07-08  0:37   ` Jeremy Fitzhardinge
  2008-07-08  4:51   ` Nick Piggin
  2008-07-09 12:28 ` Ingo Molnar
  5 siblings, 2 replies; 20+ messages in thread
From: Rusty Russell @ 2008-07-08  0:29 UTC (permalink / raw)
  To: virtualization
  Cc: Jeremy Fitzhardinge, Nick Piggin, Jens Axboe, Xen devel,
	Peter Zijlstra, Christoph Lameter, Petr Tesarik, LKML,
	Thomas Friebel, Ingo Molnar

On Tuesday 08 July 2008 05:07:49 Jeremy Fitzhardinge wrote:
> At the most recent Xen Summit, Thomas Friebel presented a paper
> ("Preventing Guests from Spinning Around",
> http://xen.org/files/xensummitboston08/LHP.pdf) investigating the
> interactions between spinlocks and virtual machines.  Specifically, he
> looked at what happens when a lock-holding VCPU gets involuntarily
> preempted.

I find it interesting that gang scheduling the guest was not suggested as an 
obvious solution.

Anyway, concept looks fine; lguest's solution is more elegant of course :)

A little disappointing that you can't patch your version inline.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/4] Paravirtual spinlocks
  2008-07-08  0:29 ` [PATCH RFC 0/4] Paravirtual spinlocks Rusty Russell
@ 2008-07-08  0:37   ` Jeremy Fitzhardinge
  2008-07-08  1:01     ` Rusty Russell
  2008-07-08  4:51   ` Nick Piggin
  1 sibling, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-08  0:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: virtualization, Nick Piggin, Jens Axboe, Xen devel,
	Peter Zijlstra, Christoph Lameter, Petr Tesarik, LKML,
	Thomas Friebel, Ingo Molnar

Rusty Russell wrote:
> On Tuesday 08 July 2008 05:07:49 Jeremy Fitzhardinge wrote:
>   
>> At the most recent Xen Summit, Thomas Friebel presented a paper
>> ("Preventing Guests from Spinning Around",
>> http://xen.org/files/xensummitboston08/LHP.pdf) investigating the
>> interactions between spinlocks and virtual machines.  Specifically, he
>> looked at what happens when a lock-holding VCPU gets involuntarily
>> preempted.
>>     
>
> I find it interesting that gang scheduling the guest was not suggested as an 
> obvious solution.
>   

It's an obvious answer, but not an obvious solution.  You trade off 
wasting time spinning vs wasting time waiting for N vcpus to be free for 
scheduling.  Or something; seems much more complex, particularly if you 
can do a small guest tweak to solve the problem.

> Anyway, concept looks fine; lguest's solution is more elegant of course :)
>   

You could remove all mutable state and call it "erlang".

> A little disappointing that you can't patch your version inline.

Spinlock code isn't inlined currently, so I hadn't considered it.  The 
fast path code for both lock and unlock is nearly small enough to 
consider it, but it seems a bit fiddly.

If the "spin_lock" and "spin_unlock" functions were inlined functions 
which called the out of line __raw_spin_lock/unlock functions, then 
after patching they would result in a direct call to the backend lock 
functions, which would be exactly equivalent to what happens now (since 
I hook __raw_spin_lock into calls via pv_lock_ops).

    J


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/4] Paravirtual spinlocks
  2008-07-08  0:37   ` Jeremy Fitzhardinge
@ 2008-07-08  1:01     ` Rusty Russell
  0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2008-07-08  1:01 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: virtualization, Nick Piggin, Jens Axboe, Xen devel,
	Peter Zijlstra, Christoph Lameter, Petr Tesarik, LKML,
	Thomas Friebel, Ingo Molnar

On Tuesday 08 July 2008 10:37:54 Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > On Tuesday 08 July 2008 05:07:49 Jeremy Fitzhardinge wrote:
> >> At the most recent Xen Summit, Thomas Friebel presented a paper
> >> ("Preventing Guests from Spinning Around",
> >> http://xen.org/files/xensummitboston08/LHP.pdf) investigating the
> >> interactions between spinlocks and virtual machines.  Specifically, he
> >> looked at what happens when a lock-holding VCPU gets involuntarily
> >> preempted.
> >
> > I find it interesting that gang scheduling the guest was not suggested as
> > an obvious solution.
>
> It's an obvious answer, but not an obvious solution.  You trade off
> wasting time spinning vs wasting time waiting for N vcpus to be free for
> scheduling.

Perhaps, but with huge numbers of cores (as The Future seems to promise) and 
significant overcommit, I'm not sure how bad this would be.

> Or something; seems much more complex, particularly if you
> can do a small guest tweak to solve the problem.

But AFAICT it's one of a related set of problems where all VCPUs are required 
for a task.  Hackbench comes to mind.  There's going to be a lot of 
ping-ponging and you'll approach gang scheduling to get decent performance.

> > A little disappointing that you can't patch your version inline.
>
> Spinlock code isn't inlined currently, so I hadn't considered it.  The
> fast path code for both lock and unlock is nearly small enough to
> consider it, but it seems a bit fiddly.

Yeah, OK.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/4] Paravirtual spinlocks
  2008-07-08  0:29 ` [PATCH RFC 0/4] Paravirtual spinlocks Rusty Russell
  2008-07-08  0:37   ` Jeremy Fitzhardinge
@ 2008-07-08  4:51   ` Nick Piggin
  2008-07-08  5:28     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2008-07-08  4:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: virtualization, Jeremy Fitzhardinge, Jens Axboe, Xen devel,
	Peter Zijlstra, Christoph Lameter, Petr Tesarik, LKML,
	Thomas Friebel, Ingo Molnar

On Tuesday 08 July 2008 10:29, Rusty Russell wrote:
> On Tuesday 08 July 2008 05:07:49 Jeremy Fitzhardinge wrote:
> > At the most recent Xen Summit, Thomas Friebel presented a paper
> > ("Preventing Guests from Spinning Around",
> > http://xen.org/files/xensummitboston08/LHP.pdf) investigating the
> > interactions between spinlocks and virtual machines.  Specifically, he
> > looked at what happens when a lock-holding VCPU gets involuntarily
> > preempted.
>
> I find it interesting that gang scheduling the guest was not suggested as
> an obvious solution.
>
> Anyway, concept looks fine; lguest's solution is more elegant of course :)

What's lguest's solution?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/4] Paravirtual spinlocks
  2008-07-08  4:51   ` Nick Piggin
@ 2008-07-08  5:28     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-08  5:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rusty Russell, virtualization, Jens Axboe, Xen devel,
	Peter Zijlstra, Christoph Lameter, Petr Tesarik, LKML,
	Thomas Friebel, Ingo Molnar

Nick Piggin wrote:
> What's lguest's solution?
>   

Uniprocessor only.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 4/4] xen: implement Xen-specific spinlocks
  2008-07-07 19:07 ` [PATCH RFC 4/4] xen: implement Xen-specific spinlocks Jeremy Fitzhardinge
@ 2008-07-08  6:37   ` Johannes Weiner
  2008-07-08  7:15     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Weiner @ 2008-07-08  6:37 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra,
	Christoph Lameter, Petr Tesarik, Virtualization, Xen devel,
	Thomas Friebel, Jeremy Fitzhardinge

Hi,

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> The standard ticket spinlocks are very expensive in a virtual
> environment, because their performance depends on Xen's scheduler
> giving vcpus time in the order that they're supposed to take the
> spinlock.
>
> This implements a Xen-specific spinlock, which should be much more
> efficient.
>
> The fast-path is essentially the old Linux-x86 locks, using a single
> lock byte.  The locker decrements the byte; if the result is 0, then
> they have the lock.  If the lock is negative, then locker must spin
> until the lock is positive again.
>
> When there's contention, the locker spin for 2^16[*] iterations waiting
> to get the lock.  If it fails to get the lock in that time, it adds
> itself to the contention count in the lock and blocks on a per-cpu
> event channel.
>
> When unlocking the spinlock, the locker looks to see if there's anyone
> blocked waiting for the lock by checking for a non-zero waiter count.
> If there's a waiter, it traverses the per-cpu "lock_spinners"
> variable, which contains which lock each CPU is waiting on.  It picks
> one CPU waiting on the lock and sends it an event to wake it up.
>
> This allows efficient fast-path spinlock operation, while allowing
> spinning vcpus to give up their processor time while waiting for a
> contended lock.
>
> [*] 2^16 iterations is threshold at which 98% locks have been taken
> according to Thomas Friebel's Xen Summit talk "Preventing Guests from
> Spinning Around".  Therefore, we'd expect the lock and unlock slow
> paths will only be entered 2% of the time.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Cc: Thomas Friebel <thomas.friebel@amd.com>
> ---
>  arch/x86/xen/smp.c           |  172 +++++++++++++++++++++++++++++++++++++++++-
>  drivers/xen/events.c         |   27 ++++++
>  include/asm-x86/xen/events.h |    1 
>  include/xen/events.h         |    7 +
>  4 files changed, 206 insertions(+), 1 deletion(-)
>
> ===================================================================
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -15,6 +15,7 @@
>   * This does not handle HOTPLUG_CPU yet.
>   */
>  #include <linux/sched.h>
> +#include <linux/kernel_stat.h>
>  #include <linux/err.h>
>  #include <linux/smp.h>
>  
> @@ -34,6 +35,8 @@
>  
>  #include "xen-ops.h"
>  #include "mmu.h"
> +
> +static void __cpuinit xen_init_lock_cpu(int cpu);
>  
>  cpumask_t xen_cpu_initialized_map;
>  
> @@ -179,6 +182,8 @@
>  {
>  	unsigned cpu;
>  
> +	xen_init_lock_cpu(0);
> +
>  	smp_store_cpu_info(0);
>  	cpu_data(0).x86_max_cores = 1;
>  	set_cpu_sibling_map(0);
> @@ -301,6 +306,7 @@
>  	clear_tsk_thread_flag(idle, TIF_FORK);
>  #endif
>  	xen_setup_timer(cpu);
> +	xen_init_lock_cpu(cpu);
>  
>  	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
>  
> @@ -413,6 +419,170 @@
>  	return IRQ_HANDLED;
>  }
>  
> +struct xen_spinlock {
> +	unsigned char lock;		/* 0 -> free; 1 -> locked */
> +	unsigned short spinners;	/* count of waiting cpus */
> +};
> +
> +static int xen_spin_is_locked(struct raw_spinlock *lock)
> +{
> +	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
> +
> +	return xl->lock != 0;
> +}
> +
> +static int xen_spin_is_contended(struct raw_spinlock *lock)
> +{
> +	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
> +
> +	/* Not strictly true; this is only the count of contended
> +	   lock-takers entering the slow path. */
> +	return xl->spinners != 0;
> +}
> +
> +static int xen_spin_trylock(struct raw_spinlock *lock)
> +{
> +	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
> +	u8 old = 1;
> +
> +	asm("xchgb %b0,%1"
> +	    : "+q" (old), "+m" (xl->lock) : : "memory");
> +
> +	return old == 0;
> +}
> +
> +static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
> +static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);

The plural is a bit misleading, as this is a single pointer per CPU.

> +static inline void spinning_lock(struct xen_spinlock *xl)
> +{
> +	__get_cpu_var(lock_spinners) = xl;
> +	wmb();			/* set lock of interest before count */
> +	asm(LOCK_PREFIX " incw %0"
> +	    : "+m" (xl->spinners) : : "memory");
> +}
> +
> +static inline void unspinning_lock(struct xen_spinlock *xl)
> +{
> +	asm(LOCK_PREFIX " decw %0"
> +	    : "+m" (xl->spinners) : : "memory");
> +	wmb();			/* decrement count before clearing lock */
> +	__get_cpu_var(lock_spinners) = NULL;
> +}
> +
> +static noinline int xen_spin_lock_slow(struct raw_spinlock *lock)
> +{
> +	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
> +	int irq = __get_cpu_var(lock_kicker_irq);
> +	int ret;
> +
> +	/* If kicker interrupts not initialized yet, just spin */
> +	if (irq == -1)
> +		return 0;
> +
> +	/* announce we're spinning */
> +	spinning_lock(xl);
> +
> +	/* clear pending */
> +	xen_clear_irq_pending(irq);
> +
> +	/* check again make sure it didn't become free while
> +	   we weren't looking  */
> +	ret = xen_spin_trylock(lock);
> +	if (ret)
> +		goto out;
> +
> +	/* block until irq becomes pending */
> +	xen_poll_irq(irq);
> +	kstat_this_cpu.irqs[irq]++;
> +
> +out:
> +	unspinning_lock(xl);
> +	return ret;
> +}
> +
> +static void xen_spin_lock(struct raw_spinlock *lock)
> +{
> +	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
> +	int timeout;
> +	u8 oldval;
> +
> +	do {
> +		timeout = 1 << 10;
> +
> +		asm("1: xchgb %1,%0\n"
> +		    "   testb %1,%1\n"
> +		    "   jz 3f\n"
> +		    "2: rep;nop\n"
> +		    "   cmpb $0,%0\n"
> +		    "   je 1b\n"
> +		    "   dec %2\n"
> +		    "   jnz 2b\n"
> +		    "3:\n"
> +		    : "+m" (xl->lock), "=q" (oldval), "+r" (timeout)
> +		    : "1" (1)
> +		    : "memory");
> +
> +	} while (unlikely(oldval != 0 && !xen_spin_lock_slow(lock)));
> +}
> +
> +static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
> +{
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {

Would it be feasible to have a bitmap for the spinning CPUs in order to
do a for_each_spinning_cpu() here instead?  Or is setting a bit in
spinning_lock() and unsetting it in unspinning_lock() more overhead than
going over all CPUs here?

> +		/* XXX should mix up next cpu selection */
> +		if (per_cpu(lock_spinners, cpu) == xl) {
> +			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> +			break;
> +		}
> +	}
> +}
> +
> +static void xen_spin_unlock(struct raw_spinlock *lock)
> +{
> +	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
> +
> +	smp_wmb();		/* make sure no writes get moved after unlock */
> +	xl->lock = 0;		/* release lock */
> +
> +	/* make sure unlock happens before kick */
> +	barrier();
> +
> +	if (unlikely(xl->spinners))
> +		xen_spin_unlock_slow(xl);
> +}
> +
> +static __cpuinit void xen_init_lock_cpu(int cpu)
> +{
> +	int irq;
> +	const char *name;
> +
> +	name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
> +	irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR,
> +				     cpu,
> +				     xen_reschedule_interrupt,
> +				     IRQF_DISABLED|IRQF_PERCPU|IRQF_NOBALANCING,
> +				     name,
> +				     NULL);
> +
> +	if (irq >= 0) {
> +		disable_irq(irq); /* make sure it's never delivered */
> +		per_cpu(lock_kicker_irq, cpu) = irq;
> +	}
> +
> +	printk("cpu %d spinlock event irq %d\n", cpu, irq);
> +}
> +
> +static void __init xen_init_spinlocks(void)
> +{
> +	pv_lock_ops.spin_is_locked = xen_spin_is_locked;
> +	pv_lock_ops.spin_is_contended = xen_spin_is_contended;
> +	pv_lock_ops.spin_lock = xen_spin_lock;
> +	pv_lock_ops.spin_trylock = xen_spin_trylock;
> +	pv_lock_ops.spin_unlock = xen_spin_unlock;
> +}
> +
>  static const struct smp_ops xen_smp_ops __initdata = {
>  	.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu,
>  	.smp_prepare_cpus = xen_smp_prepare_cpus,
> @@ -430,5 +600,5 @@
>  {
>  	smp_ops = xen_smp_ops;
>  	xen_fill_possible_map();
> -	paravirt_use_bytelocks();
> +	xen_init_spinlocks();
>  }
> ===================================================================
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -741,6 +741,33 @@
>  	}
>  }
>  
> +/* Clear an irq's pending state, in preparation for polling on it */
> +void xen_clear_irq_pending(int irq)
> +{
> +	int evtchn = evtchn_from_irq(irq);
> +
> +	if (VALID_EVTCHN(evtchn))
> +		clear_evtchn(evtchn);
> +}
> +
> +/* Poll waiting for an irq to become pending.  In the usual case, the
> +   irq will be disabled so it won't deliver an interrupt. */
> +void xen_poll_irq(int irq)
> +{
> +	evtchn_port_t evtchn = evtchn_from_irq(irq);
> +
> +	if (VALID_EVTCHN(evtchn)) {
> +		struct sched_poll poll;
> +
> +		poll.nr_ports = 1;
> +		poll.timeout = 0;
> +		poll.ports = &evtchn;
> +
> +		if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
> +			BUG();
> +	}
> +}
> +
>  void xen_irq_resume(void)
>  {
>  	unsigned int cpu, irq, evtchn;
> ===================================================================
> --- a/include/asm-x86/xen/events.h
> +++ b/include/asm-x86/xen/events.h
> @@ -5,6 +5,7 @@
>  	XEN_RESCHEDULE_VECTOR,
>  	XEN_CALL_FUNCTION_VECTOR,
>  	XEN_CALL_FUNCTION_SINGLE_VECTOR,
> +	XEN_SPIN_UNLOCK_VECTOR,
>  
>  	XEN_NR_IPIS,
>  };
> ===================================================================
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -45,4 +45,11 @@
>  extern void xen_irq_resume(void);
>  extern void xen_evtchn_do_upcall(struct pt_regs *regs);
>  
> +/* Clear an irq's pending state, in preparation for polling on it */
> +void xen_clear_irq_pending(int irq);
> +
> +/* Poll waiting for an irq to become pending.  In the usual case, the
> +   irq will be disabled so it won't deliver an interrupt. */
> +void xen_poll_irq(int irq);
> +
>  #endif	/* _XEN_EVENTS_H */

	Hannes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 4/4] xen: implement Xen-specific spinlocks
  2008-07-08  6:37   ` Johannes Weiner
@ 2008-07-08  7:15     ` Jeremy Fitzhardinge
  2008-07-08  7:30       ` Johannes Weiner
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-08  7:15 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Nick Piggin, LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra,
	Christoph Lameter, Petr Tesarik, Virtualization, Xen devel,
	Thomas Friebel, Jeremy Fitzhardinge

Johannes Weiner wrote:
>> +static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
>> +static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
>>     
>
> The plural is a bit misleading, as this is a single pointer per CPU.
>   

Yeah.  And it's wrong because it's specifically *not* spinning, but 
blocking.

>> +static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>> +{
>> +	int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>>     
>
> Would it be feasible to have a bitmap for the spinning CPUs in order to
> do a for_each_spinning_cpu() here instead?  Or is setting a bit in
> spinning_lock() and unsetting it in unspinning_lock() more overhead than
> going over all CPUs here?
>   

Not worthwhile, I think.  This is a very rare path: it will only happen 
if 1) there's lock contention that 2) wasn't resolved within the 
timeout.  In practice, this gets called a few thousand times per cpu 
over a kernbench, which is nothing.

My very original version of this code kept a bitmask of interested CPUs 
within the lock, but there's only space for 24 cpus if we still use a 
byte for the lock itself.  It all turned out fairly awkward, and this 
version is a marked improvement.
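
For illustration only, the sort of layout that implies: one byte of
lock state in a 32-bit lock word leaves just 24 bits of waiter
bitmask, so anything beyond 24 cpus would have needed a bigger lock
word:

	struct xen_spinlock_bitmask {		/* hypothetical, not in this series */
		unsigned char lock;		/* 0 -> free; 1 -> locked */
		unsigned char waiters[3];	/* bitmask of spinning cpus 0..23 */
	};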

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 4/4] xen: implement Xen-specific spinlocks
  2008-07-08  7:15     ` Jeremy Fitzhardinge
@ 2008-07-08  7:30       ` Johannes Weiner
  0 siblings, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2008-07-08  7:30 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Ingo Molnar, Jens Axboe, Peter Zijlstra,
	Christoph Lameter, Petr Tesarik, Virtualization, Xen devel,
	Thomas Friebel, Jeremy Fitzhardinge

Hi,

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Johannes Weiner wrote:
>>> +static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
>>> +static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
>>>     
>>
>> The plural is a bit misleading, as this is a single pointer per CPU.
>>   
>
> Yeah.  And it's wrong because it's specifically *not* spinning, but
> blocking.

I thought of it as `virtually spinning', so had no problems with the
naming itself :)

>>> +static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>>> +{
>>> +	int cpu;
>>> +
>>> +	for_each_online_cpu(cpu) {
>>>     
>>
>> Would it be feasible to have a bitmap for the spinning CPUs in order to
>> do a for_each_spinning_cpu() here instead?  Or is setting a bit in
>> spinning_lock() and unsetting it in unspinning_lock() more overhead than
>> going over all CPUs here?
>>   
>
> Not worthwhile, I think.  This is a very rare path: it will only
> happen if 1) there's lock contention, that 2) wasn't resolved within
> the timeout.  In practice, this gets called a few thousand times per
> cpu over a kernbench, which is nothing.

Okay, I agree.

	Hannes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/4] Paravirtual spinlocks
  2008-07-07 19:07 [PATCH RFC 0/4] Paravirtual spinlocks Jeremy Fitzhardinge
                   ` (4 preceding siblings ...)
  2008-07-08  0:29 ` [PATCH RFC 0/4] Paravirtual spinlocks Rusty Russell
@ 2008-07-09 12:28 ` Ingo Molnar
  2008-07-09 12:35   ` [patch] x86: paravirt spinlocks, !CONFIG_SMP build fixes (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks) Ingo Molnar
                     ` (2 more replies)
  5 siblings, 3 replies; 20+ messages in thread
From: Ingo Molnar @ 2008-07-09 12:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Avi Kivity


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> My experiments show that using a Xen-specific lock helps guest 
> performance a bit (reduction in elapsed and system time in a kernbench 
> run), but most significantly, reduces overall physical CPU consumption 
> by 10%, and so increases overall system scalability.

that's rather impressive and looks nice, considering the fairly low 
impact.

as there were no fundamental objections in this thread i've created a 
tip/x86/paravirt-spinlocks topic branch for these patches and started 
testing them.

i based the topic branch on tip/xen-64bit, so you should be able to get 
the latest code by doing:

  git-merge tip/x86/paravirt-spinlocks

on tip/master.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch] x86: paravirt spinlocks, !CONFIG_SMP build fixes (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks)
  2008-07-09 12:28 ` Ingo Molnar
@ 2008-07-09 12:35   ` Ingo Molnar
  2008-07-09 12:39   ` [patch] x86: paravirt spinlocks, modular build fix " Ingo Molnar
  2008-07-09 13:33   ` [PATCH RFC 0/4] Paravirtual spinlocks Ingo Molnar
  2 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2008-07-09 12:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Avi Kivity


* Ingo Molnar <mingo@elte.hu> wrote:

> as there were no fundamental objections in this thread i've created a 
> tip/x86/paravirt-spinlocks topic branch for these patches and started 
> testing them.

small UP build fixes below.

	Ingo

----------------->
commit c77bc635bb0486b0ddbe03012a8662b399ae9cdb
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Jul 9 14:33:33 2008 +0200

    x86: paravirt spinlocks, !CONFIG_SMP build fixes
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index bba4041..6aa8aed 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -270,11 +270,13 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 
 void __init paravirt_use_bytelocks(void)
 {
+#ifdef CONFIG_SMP
 	pv_lock_ops.spin_is_locked = __byte_spin_is_locked;
 	pv_lock_ops.spin_is_contended = __byte_spin_is_contended;
 	pv_lock_ops.spin_lock = __byte_spin_lock;
 	pv_lock_ops.spin_trylock = __byte_spin_trylock;
 	pv_lock_ops.spin_unlock = __byte_spin_unlock;
+#endif
 }
 
 struct pv_info pv_info = {
@@ -461,12 +463,14 @@ struct pv_mmu_ops pv_mmu_ops = {
 };
 
 struct pv_lock_ops pv_lock_ops = {
+#ifdef CONFIG_SMP
 	.spin_is_locked = __ticket_spin_is_locked,
 	.spin_is_contended = __ticket_spin_is_contended,
 
 	.spin_lock = __ticket_spin_lock,
 	.spin_trylock = __ticket_spin_trylock,
 	.spin_unlock = __ticket_spin_unlock,
+#endif
 };
 
 EXPORT_SYMBOL_GPL(pv_time_ops);
diff --git a/include/asm-x86/paravirt.h b/include/asm-x86/paravirt.h
index 65ed02c..b2aba8f 100644
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -1387,6 +1387,8 @@ void _paravirt_nop(void);
 
 void paravirt_use_bytelocks(void);
 
+#ifdef CONFIG_SMP
+
 static inline int __raw_spin_is_locked(struct raw_spinlock *lock)
 {
 	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
@@ -1412,6 +1414,8 @@ static __always_inline void __raw_spin_unlock(struct raw_spinlock *lock)
 	return PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
 }
 
+#endif
+
 /* These all sit in the .parainstructions section to tell us what to patch. */
 struct paravirt_patch_site {
 	u8 *instr; 		/* original instructions */


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch] x86: paravirt spinlocks, modular build fix  (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks)
  2008-07-09 12:28 ` Ingo Molnar
  2008-07-09 12:35   ` [patch] x86: paravirt spinlocks, !CONFIG_SMP build fixes (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks) Ingo Molnar
@ 2008-07-09 12:39   ` Ingo Molnar
  2008-07-09 13:33   ` [PATCH RFC 0/4] Paravirtual spinlocks Ingo Molnar
  2 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2008-07-09 12:39 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Avi Kivity


another small fixlet.

	Ingo

----------------->
commit 0ea0f032767f41fb0a9586ab6acb5c1408ef16c5
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Jul 9 14:39:15 2008 +0200

    x86: paravirt spinlocks, modular build fix
    
    fix:
    
      MODPOST 408 modules
    ERROR: "pv_lock_ops" [net/dccp/dccp.ko] undefined!
    ERROR: "pv_lock_ops" [fs/jbd2/jbd2.ko] undefined!
    ERROR: "pv_lock_ops" [drivers/media/common/saa7146_vv.ko] undefined!
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 6aa8aed..3edfd7a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -472,6 +472,7 @@ struct pv_lock_ops pv_lock_ops = {
 	.spin_unlock = __ticket_spin_unlock,
 #endif
 };
+EXPORT_SYMBOL_GPL(pv_lock_ops);
 
 EXPORT_SYMBOL_GPL(pv_time_ops);
 EXPORT_SYMBOL    (pv_cpu_ops);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/4] Paravirtual spinlocks
  2008-07-09 12:28 ` Ingo Molnar
  2008-07-09 12:35   ` [patch] x86: paravirt spinlocks, !CONFIG_SMP build fixes (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks) Ingo Molnar
  2008-07-09 12:39   ` [patch] x86: paravirt spinlocks, modular build fix " Ingo Molnar
@ 2008-07-09 13:33   ` Ingo Molnar
  2008-07-09 13:49     ` [patch] x86, paravirt-spinlocks: fix boot hang (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks) Ingo Molnar
  2 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2008-07-09 13:33 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Avi Kivity


the patches caused a boot hang with this config:

 http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad

i have bisected it down to:

  commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1
  Author: Jeremy Fitzhardinge <jeremy@goop.org>
  Date:   Mon Jul 7 12:07:53 2008 -0700

      xen: implement Xen-specific spinlocks

i.e. applying that patch alone causes the hang. The hang happens in the 
ftrace self-test:

  initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs
  calling  init_sched_switch_trace+0x0/0x4c
  Testing tracer sched_switch: PASSED
  initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs
  calling  init_function_trace+0x0/0x12
  Testing tracer ftrace: 
  [hard hang]

it should have continued like this:

  Testing tracer ftrace: PASSED
  initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs
  calling  init_irqsoff_tracer+0x0/0x14
  Testing tracer irqsoff: PASSED
  initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs
  calling  init_mmio_trace+0x0/0x12
  initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs

note: ftrace is a user of raw spinlocks. Another peculiarity of the 
config is that it has paravirt enabled (obviously), and that it also has 
lock debugging enabled.
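
One plausible way to read the hang (the code below is purely an
illustration, not taken from the report): the tracer itself takes raw
spinlocks, so if the spinlock path is compiled with -pg, every lock
acquisition re-enters the tracer before the lock is even taken, and the
chain never terminates. Roughly:

struct sketch_lock { int taken; };

static struct sketch_lock trace_buffer_lock;	/* hypothetical lock inside the tracer */

static void sketch_spin_lock(struct sketch_lock *lock);

/* rough stand-in for the ftrace callback that mcount ends up invoking */
static void sketch_tracer_record(void *ip)
{
	sketch_spin_lock(&trace_buffer_lock);	/* the tracer needs a raw spinlock ... */
	/* ... record the entry, unlock, and so on */
}

/* imagine this is the lock primitive, built in an object with -pg */
static void sketch_spin_lock(struct sketch_lock *lock)
{
	sketch_tracer_record((void *)sketch_spin_lock);	/* the entry hook fires ... */
	while (__sync_lock_test_and_set(&lock->taken, 1))	/* ... before we ever spin */
		;
}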

	Ingo


* [patch] x86, paravirt-spinlocks: fix boot hang (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks)
  2008-07-09 13:33   ` [PATCH RFC 0/4] Paravirtual spinlocks Ingo Molnar
@ 2008-07-09 13:49     ` Ingo Molnar
  2008-07-09 15:55       ` [patch] x86, paravirt-spinlocks: fix boot hang Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2008-07-09 13:49 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Avi Kivity, the arch/x86 maintainers


* Ingo Molnar <mingo@elte.hu> wrote:

> the patches caused a boot hang with this config:
> 
>  http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad

ok, fixed it via the patch below.

	Ingo

-------------->
commit 6412367fe22dc54cc727e7bec5bd65c36c1a0754
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Jul 9 15:42:09 2008 +0200

    x86, paravirt-spinlocks: fix boot hang
    
    the paravirt-spinlock patches caused a boot hang with this config:
    
     http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad
    
    i have bisected it down to:
    
    |  commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1
    |  Author: Jeremy Fitzhardinge <jeremy@goop.org>
    |  Date:   Mon Jul 7 12:07:53 2008 -0700
    |
    |      xen: implement Xen-specific spinlocks
    
    i.e. applying that patch alone causes the hang. The hang happens in the
    ftrace self-test:
    
      initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs
      calling  init_sched_switch_trace+0x0/0x4c
      Testing tracer sched_switch: PASSED
      initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs
      calling  init_function_trace+0x0/0x12
      Testing tracer ftrace:
      [hard hang]
    
    it should have continued like this:
    
      Testing tracer ftrace: PASSED
      initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs
      calling  init_irqsoff_tracer+0x0/0x14
      Testing tracer irqsoff: PASSED
      initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs
      calling  init_mmio_trace+0x0/0x12
      initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs
    
    the problem is that such lowlevel primitives as spinlocks should never
    be built with -pg (which ftrace does). Marking paravirt.o as non-pg and
    marking all spinlock ops as always-inline solve the hang.
    
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2dd1ccd..faba669 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -7,10 +7,11 @@ extra-y                := head_$(BITS).o head$(BITS).o head.o init_task.o vmlinu
 CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
 
 ifdef CONFIG_FTRACE
-# Do not profile debug utilities
+# Do not profile debug and lowlevel utilities
 CFLAGS_REMOVE_tsc_64.o = -pg
 CFLAGS_REMOVE_tsc_32.o = -pg
 CFLAGS_REMOVE_rtc.o = -pg
+CFLAGS_REMOVE_paravirt.o = -pg
 endif
 
 #
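
For the "always-inline" half of the fix, here is a small sketch of the
two standard annotations involved (the function names are made up; only
__always_inline and notrace from <linux/compiler.h> are real):

#include <linux/compiler.h>

struct raw_spinlock;				/* illustrative forward declaration */
extern void hypothetical_lock_op(struct raw_spinlock *lock);

/* forced inline: the body lands in the caller, so no separate,
 * mcount-instrumented copy of the wrapper is ever emitted */
static __always_inline void sketch_lock_fastpath(struct raw_spinlock *lock)
{
	hypothetical_lock_op(lock);
}

/* notrace: compiled without the -pg entry hook even when the rest
 * of the object file is instrumented */
static notrace void sketch_lock_slowpath(struct raw_spinlock *lock)
{
	hypothetical_lock_op(lock);
}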


* Re: [patch] x86, paravirt-spinlocks: fix boot hang
  2008-07-09 13:49     ` [patch] x86, paravirt-spinlocks: fix boot hang (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks) Ingo Molnar
@ 2008-07-09 15:55       ` Jeremy Fitzhardinge
  2008-07-09 19:26         ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-09 15:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nick Piggin, LKML, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Avi Kivity, the arch/x86 maintainers

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>   
>> the patches caused a boot hang with this config:
>>
>>  http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad
>>     
>
> ok, fixed it via the patch below.
>
> 	Ingo
>
> -------------->
> commit 6412367fe22dc54cc727e7bec5bd65c36c1a0754
> Author: Ingo Molnar <mingo@elte.hu>
> Date:   Wed Jul 9 15:42:09 2008 +0200
>
>     x86, paravirt-spinlocks: fix boot hang
>     
>     the paravirt-spinlock patches caused a boot hang with this config:
>     
>      http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad
>     
>     i have bisected it down to:
>     
>     |  commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1
>     |  Author: Jeremy Fitzhardinge <jeremy@goop.org>
>     |  Date:   Mon Jul 7 12:07:53 2008 -0700
>     |
>     |      xen: implement Xen-specific spinlocks
>     
>     i.e. applying that patch alone causes the hang. The hang happens in the
>     ftrace self-test:
>     
>       initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs
>       calling  init_sched_switch_trace+0x0/0x4c
>       Testing tracer sched_switch: PASSED
>       initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs
>       calling  init_function_trace+0x0/0x12
>       Testing tracer ftrace:
>       [hard hang]
>     
>     it should have continued like this:
>     
>       Testing tracer ftrace: PASSED
>       initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs
>       calling  init_irqsoff_tracer+0x0/0x14
>       Testing tracer irqsoff: PASSED
>       initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs
>       calling  init_mmio_trace+0x0/0x12
>       initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs
>     
>     the problem is that such lowlevel primitives as spinlocks should never
>     be built with -pg (which ftrace does). Marking paravirt.o as non-pg and
>     marking all spinlock ops as always-inline solve the hang.
>    

Thanks Ingo, that would have taken me a while to work out.

Presumably that means xen/smp.o should also be built without -pg.  In 
fact, should I move the spinlock code into its own file, so that ftrace 
can cover as much as possible?

    J


* Re: [patch] x86, paravirt-spinlocks: fix boot hang
  2008-07-09 15:55       ` [patch] x86, paravirt-spinlocks: fix boot hang Jeremy Fitzhardinge
@ 2008-07-09 19:26         ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2008-07-09 19:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, LKML, Jens Axboe, Peter Zijlstra, Christoph Lameter,
	Petr Tesarik, Virtualization, Xen devel, Thomas Friebel,
	Avi Kivity, the arch/x86 maintainers


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

>> should never
>>     be built with -pg (which ftrace does). Marking paravirt.o as 
>>     non-pg and marking all spinlock ops as always-inline solve the 
>>     hang.
>>    
>
> Thanks Ingo, that would have taken me a while to work out.

yeah, i figured i'd be the better candidate to debug ftrace-related 
problems ;-)

> Presumably that means that xen/smp.o should also be built without -pg.  
> In fact, should I move the spinlock code into its own file, so that 
> ftrace can cover as much as it can?

hm, yeah. Anything that can be called via the lowest-level spinlock 
functions should either be notrace or, better, live in a -pg-excluded 
.o file. Putting it all into a separate file would cover it nicely, i 
think.
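
A hypothetical kbuild fragment for that arrangement (the file split and
names are illustrative; the CFLAGS_REMOVE convention is the same one
used in the arch/x86/kernel/Makefile hunk earlier in the thread):

obj-$(CONFIG_SMP)		+= smp.o spinlock.o

ifdef CONFIG_FTRACE
# keep only the lowlevel lock code out of mcount instrumentation,
# so ftrace still covers the rest of the Xen code
CFLAGS_REMOVE_spinlock.o	= -pg
endif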

	Ingo


end of thread

Thread overview: 20+ messages
2008-07-07 19:07 [PATCH RFC 0/4] Paravirtual spinlocks Jeremy Fitzhardinge
2008-07-07 19:07 ` [PATCH RFC 1/4] x86/paravirt: add hooks for spinlock operations Jeremy Fitzhardinge
2008-07-07 19:07 ` [PATCH RFC 2/4] paravirt: introduce a "lock-byte" spinlock implementation Jeremy Fitzhardinge
2008-07-07 19:07 ` [PATCH RFC 3/4] xen: use lock-byte " Jeremy Fitzhardinge
2008-07-07 19:07 ` [PATCH RFC 4/4] xen: implement Xen-specific spinlocks Jeremy Fitzhardinge
2008-07-08  6:37   ` Johannes Weiner
2008-07-08  7:15     ` Jeremy Fitzhardinge
2008-07-08  7:30       ` Johannes Weiner
2008-07-08  0:29 ` [PATCH RFC 0/4] Paravirtual spinlocks Rusty Russell
2008-07-08  0:37   ` Jeremy Fitzhardinge
2008-07-08  1:01     ` Rusty Russell
2008-07-08  4:51   ` Nick Piggin
2008-07-08  5:28     ` Jeremy Fitzhardinge
2008-07-09 12:28 ` Ingo Molnar
2008-07-09 12:35   ` [patch] x86: paravirt spinlocks, !CONFIG_SMP build fixes (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks) Ingo Molnar
2008-07-09 12:39   ` [patch] x86: paravirt spinlocks, modular build fix " Ingo Molnar
2008-07-09 13:33   ` [PATCH RFC 0/4] Paravirtual spinlocks Ingo Molnar
2008-07-09 13:49     ` [patch] x86, paravirt-spinlocks: fix boot hang (was: Re: [PATCH RFC 0/4] Paravirtual spinlocks) Ingo Molnar
2008-07-09 15:55       ` [patch] x86, paravirt-spinlocks: fix boot hang Jeremy Fitzhardinge
2008-07-09 19:26         ` Ingo Molnar
