* [PATCHv3 0/4] Use ticket locks for spinlocks
@ 2015-04-21 10:11 David Vrabel
  2015-04-21 10:11 ` [PATCHv3 1/4] x86: provide xadd() David Vrabel
                   ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: David Vrabel @ 2015-04-21 10:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich, Jennifer Herbert

Use ticket locks for spin locks instead of the current byte locks.
Ticket locks are fair.  This is an important property for hypervisor
locks.

Note that spin_lock_irq() and spin_lock_irqsave() now spin with irqs
disabled (previously, they would spin with irqs enabled if possible).
This is required to prevent deadlocks when the irq handler tries to
take the same lock with a higher ticket.
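
For illustration, the deadlock being avoided looks roughly like this
(a sketch with a hypothetical lock 'lk', not code from this series):

    /* task context on CPU X, spinning with irqs still enabled: */
    _spin_lock(&lk);      /* xadd gives us ticket N; we wait for head */
        /* an irq arrives on CPU X and its handler also takes lk:    */
        _spin_lock(&lk);  /* gets ticket N+1 and spins forever: head
                           * cannot advance past N until the interrupted
                           * context (ticket N) releases the lock, which
                           * it cannot do while the handler runs here. */

With byte locks the handler could simply take the lock whenever it
became free, so spinning with irqs enabled was safe; with ticket locks
it is not.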

We have analysed the effect of this series on interrupt latency (by
measuring the duration of irq disable/enable regions) and there is no
significant impact.

http://xenbits.xen.org/people/dvrabel/bar2_comp.png

Interestingly, the biggest offenders for long critical sections are
bare irq disable/enable regions, and some of these are really bad (10s
of ms)!

http://xenbits.xen.org/people/dvrabel/bar_normal_busy.png

Thanks to Jennifer Herbert for performing this analysis.

Performance results (using an earlier prototype) for aggregate network
throughput between 20 VIF pairs show a good improvement.

http://xenbits.xen.org/people/dvrabel/3336.png

This benchmark is limited by contention on the map track lock.

Changes in v3:
- xadd() implementation for arm32 and arm64.
- Add barriers required on arm.
- Only wait for the current lock holder in _spin_barrier().

Changes in v2:
- Reimplemented in C, requiring only an arch-specific xadd() op
- Optimize spin_lock_recursive() to call spin_lock() since trylock
  loops are bad with ticket locks.

David


* [PATCHv3 1/4] x86: provide xadd()
  2015-04-21 10:11 [PATCHv3 0/4] Use ticket locks for spinlocks David Vrabel
@ 2015-04-21 10:11 ` David Vrabel
  2015-04-21 10:36   ` Jan Beulich
  2015-04-21 10:11 ` [PATCHv3 2/4] arm: " David Vrabel
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-21 10:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich, Jennifer Herbert

xadd() atomically adds a value and returns the previous value.  This
is needed to implement ticket locks.
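
For illustration only (hypothetical variables, not part of the patch),
the intended usage is:

    u16 ticket_tail = 5;
    u16 mine = xadd(&ticket_tail, 1);  /* mine == 5, ticket_tail == 6 */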

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/include/asm-x86/system.h |   47 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 7111329..f244c8d 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -118,6 +118,53 @@ static always_inline unsigned long __cmpxchg(
 })
 
 /*
+ * Undefined symbol to cause link failure if a wrong size is used with
+ * xadd().
+ */
+extern unsigned long __bad_xadd_size(void);
+
+static always_inline unsigned long __xadd(
+    volatile void *ptr, unsigned long v, int size)
+{
+    switch ( size )
+    {
+    case 1:
+        asm volatile ( "lock; xaddb %b0,%1"
+                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
+                       :: "memory");
+        return v;
+    case 2:
+        asm volatile ( "lock; xaddw %w0,%1"
+                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
+                       :: "memory");
+        return v;
+    case 4:
+        asm volatile ( "lock; xaddl %k0,%1"
+                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
+                       :: "memory");
+        return v;
+    case 8:
+        asm volatile ( "lock; xaddq %q0,%1"
+                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
+                       :: "memory");
+
+        return v;
+    default:
+        return __bad_xadd_size();
+    }
+}
+
+/*
+ * Atomically add @v to the 1, 2, 4, or 8 byte value at @ptr.  Returns
+ * the previous value.
+ *
+ * This is a full memory barrier.
+ */
+#define xadd(ptr, v) ({                                         \
+            __xadd((ptr), (unsigned long)(v), sizeof(*(ptr)));  \
+        })
+
+/*
  * Both Intel and AMD agree that, from a programmer's viewpoint:
  *  Loads cannot be reordered relative to other loads.
  *  Stores cannot be reordered relative to other stores.
-- 
1.7.10.4


* [PATCHv3 2/4] arm: provide xadd()
  2015-04-21 10:11 [PATCHv3 0/4] Use ticket locks for spinlocks David Vrabel
  2015-04-21 10:11 ` [PATCHv3 1/4] x86: provide xadd() David Vrabel
@ 2015-04-21 10:11 ` David Vrabel
  2015-04-21 10:11 ` [PATCHv3 3/4] xen: use ticket locks for spin locks David Vrabel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: David Vrabel @ 2015-04-21 10:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich, Jennifer Herbert

xadd() atomically adds a value and returns the previous value.  This
is needed to implement ticket locks.

This generic arm implementation uses the GCC __sync_fetch_and_add()
builtin.  This builtin resulted in suitable inlined asm for GCC 4.8.3
(arm64) and GCC 4.6.3 (arm32).
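
For reference, the builtin's documented semantics are equivalent to the
following performed atomically, and it acts as a full memory barrier
(a sketch, not the generated code):

    /* __sync_fetch_and_add(ptr, v): */
    old = *ptr;
    *ptr = old + v;
    return old;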

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/system.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index ce3d38a..367af1d 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -51,6 +51,8 @@
 # error "unknown ARM variant"
 #endif
 
+#define xadd(x, v) __sync_fetch_and_add(x, v)
+
 extern struct vcpu *__context_switch(struct vcpu *prev, struct vcpu *next);
 
 #endif
-- 
1.7.10.4


* [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-21 10:11 [PATCHv3 0/4] Use ticket locks for spinlocks David Vrabel
  2015-04-21 10:11 ` [PATCHv3 1/4] x86: provide xadd() David Vrabel
  2015-04-21 10:11 ` [PATCHv3 2/4] arm: " David Vrabel
@ 2015-04-21 10:11 ` David Vrabel
  2015-04-23 11:58   ` Jan Beulich
  2015-04-23 12:03   ` Tim Deegan
  2015-04-21 10:11 ` [PATCHv3 4/4] x86, arm: remove asm/spinlock.h from all architectures David Vrabel
  2015-04-21 10:22 ` [PATCHv3 0/4] Use ticket locks for spinlocks Jan Beulich
  4 siblings, 2 replies; 36+ messages in thread
From: David Vrabel @ 2015-04-21 10:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich, Jennifer Herbert

Replace the byte locks with ticket locks.  Ticket locks are: a) fair;
and b) perform better when contended, since waiters spin without an
atomic operation.

The lock is split into two ticket values: head and tail.  A locker
acquires a ticket by (atomically) increasing tail and using the
previous tail value.  A CPU holds the lock if its ticket == head.  The
lock is released by increasing head.

Architectures need only provide an xadd() implementation.
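
In outline, a condensed sketch of the _spin_lock()/_spin_unlock() bodies
in the diff below (debug checks, lock profiling and preemption
bookkeeping omitted):

    spinlock_tickets_t t = { .tail = 1 };

    /* lock: atomically bump tail; the old tail is our ticket. */
    t.head_tail = xadd(&lock->tickets.head_tail, t.head_tail);
    while ( t.tail != observe_head(&lock->tickets) )
        cpu_relax();                      /* not our turn yet */

    /* unlock: advance head so the next ticket holder may proceed. */
    lock->tickets.head++;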

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/spinlock.c      |  110 ++++++++++++++++++++++++--------------------
 xen/include/xen/spinlock.h |   16 +++++--
 2 files changed, 72 insertions(+), 54 deletions(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 5fd8b1c..a2170d2 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -115,16 +115,32 @@ void spin_debug_disable(void)
 
 #endif
 
+static always_inline spinlock_tickets_t observe_lock(spinlock_tickets_t *t)
+{
+    spinlock_tickets_t v;
+
+    smp_rmb();
+    v.head_tail = read_atomic(&t->head_tail);
+    return v;
+}
+
+static always_inline u16 observe_head(spinlock_tickets_t *t)
+{
+    smp_rmb();
+    return read_atomic(&t->head);
+}
+
 void _spin_lock(spinlock_t *lock)
 {
+    spinlock_tickets_t tickets = { .tail = 1, };
     LOCK_PROFILE_VAR;
 
     check_lock(&lock->debug);
-    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
+    tickets.head_tail = xadd(&lock->tickets.head_tail, tickets.head_tail);
+    while ( tickets.tail != observe_head(&lock->tickets) )
     {
         LOCK_PROFILE_BLOCK;
-        while ( likely(_raw_spin_is_locked(&lock->raw)) )
-            cpu_relax();
+        cpu_relax();
     }
     LOCK_PROFILE_GOT;
     preempt_disable();
@@ -132,76 +148,58 @@ void _spin_lock(spinlock_t *lock)
 
 void _spin_lock_irq(spinlock_t *lock)
 {
-    LOCK_PROFILE_VAR;
-
     ASSERT(local_irq_is_enabled());
     local_irq_disable();
-    check_lock(&lock->debug);
-    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
-    {
-        LOCK_PROFILE_BLOCK;
-        local_irq_enable();
-        while ( likely(_raw_spin_is_locked(&lock->raw)) )
-            cpu_relax();
-        local_irq_disable();
-    }
-    LOCK_PROFILE_GOT;
-    preempt_disable();
+    _spin_lock(lock);
 }
 
 unsigned long _spin_lock_irqsave(spinlock_t *lock)
 {
     unsigned long flags;
-    LOCK_PROFILE_VAR;
 
     local_irq_save(flags);
-    check_lock(&lock->debug);
-    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
-    {
-        LOCK_PROFILE_BLOCK;
-        local_irq_restore(flags);
-        while ( likely(_raw_spin_is_locked(&lock->raw)) )
-            cpu_relax();
-        local_irq_disable();
-    }
-    LOCK_PROFILE_GOT;
-    preempt_disable();
+    _spin_lock(lock);
     return flags;
 }
 
 void _spin_unlock(spinlock_t *lock)
 {
+    smp_mb();
     preempt_enable();
     LOCK_PROFILE_REL;
-    _raw_spin_unlock(&lock->raw);
+    lock->tickets.head++;
 }
 
 void _spin_unlock_irq(spinlock_t *lock)
 {
-    preempt_enable();
-    LOCK_PROFILE_REL;
-    _raw_spin_unlock(&lock->raw);
+    _spin_unlock(lock);
     local_irq_enable();
 }
 
 void _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
 {
-    preempt_enable();
-    LOCK_PROFILE_REL;
-    _raw_spin_unlock(&lock->raw);
+    _spin_unlock(lock);
     local_irq_restore(flags);
 }
 
 int _spin_is_locked(spinlock_t *lock)
 {
     check_lock(&lock->debug);
-    return _raw_spin_is_locked(&lock->raw);
+    return lock->tickets.head != lock->tickets.tail;
 }
 
 int _spin_trylock(spinlock_t *lock)
 {
+    spinlock_tickets_t old, new;
+
     check_lock(&lock->debug);
-    if ( !_raw_spin_trylock(&lock->raw) )
+    old = observe_lock(&lock->tickets);
+    if ( old.head != old.tail )
+        return 0;
+    new = old;
+    new.tail++;
+    if ( cmpxchg(&lock->tickets.head_tail, old.head_tail, new.head_tail)
+         != old.head_tail )
         return 0;
 #ifdef LOCK_PROFILE
     if (lock->profile)
@@ -213,27 +211,32 @@ int _spin_trylock(spinlock_t *lock)
 
 void _spin_barrier(spinlock_t *lock)
 {
+    spinlock_tickets_t sample;
 #ifdef LOCK_PROFILE
     s_time_t block = NOW();
-    u64      loop = 0;
+#endif
 
     check_barrier(&lock->debug);
-    do { smp_mb(); loop++;} while ( _raw_spin_is_locked(&lock->raw) );
-    if ((loop > 1) && lock->profile)
+    sample = observe_lock(&lock->tickets);
+    if (sample.head != sample.tail)
     {
-        lock->profile->time_block += NOW() - block;
-        lock->profile->block_cnt++;
-    }
-#else
-    check_barrier(&lock->debug);
-    do { smp_mb(); } while ( _raw_spin_is_locked(&lock->raw) );
+        while (observe_head(&lock->tickets) != sample.tail)
+        {
+#ifdef LOCK_PROFILE
+            if (lock->profile)
+            {
+                lock->profile->time_block += NOW() - block;
+                lock->profile->block_cnt++;
+            }
 #endif
+        }
+    }
     smp_mb();
 }
 
 int _spin_trylock_recursive(spinlock_t *lock)
 {
-    int cpu = smp_processor_id();
+    unsigned int cpu = smp_processor_id();
 
     /* Don't allow overflow of recurse_cpu field. */
     BUILD_BUG_ON(NR_CPUS > 0xfffu);
@@ -256,8 +259,17 @@ int _spin_trylock_recursive(spinlock_t *lock)
 
 void _spin_lock_recursive(spinlock_t *lock)
 {
-    while ( !spin_trylock_recursive(lock) )
-        cpu_relax();
+    unsigned int cpu = smp_processor_id();
+
+    if ( likely(lock->recurse_cpu != cpu) )
+    {
+        spin_lock(lock);
+        lock->recurse_cpu = cpu;
+    }
+
+    /* We support only fairly shallow recursion, else the counter overflows. */
+    ASSERT(lock->recurse_cnt < 0xfu);
+    lock->recurse_cnt++;
 }
 
 void _spin_unlock_recursive(spinlock_t *lock)
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index eda9b2e..bafbc74 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -80,8 +80,7 @@ struct lock_profile_qhead {
     static struct lock_profile *__lock_profile_##name                         \
     __used_section(".lockprofile.data") =                                     \
     &__lock_profile_data_##name
-#define _SPIN_LOCK_UNLOCKED(x) { _RAW_SPIN_LOCK_UNLOCKED, 0xfffu, 0,          \
-                                 _LOCK_DEBUG, x }
+#define _SPIN_LOCK_UNLOCKED(x) { { 0 }, 0xfffu, 0, _LOCK_DEBUG, x }
 #define SPIN_LOCK_UNLOCKED _SPIN_LOCK_UNLOCKED(NULL)
 #define DEFINE_SPINLOCK(l)                                                    \
     spinlock_t l = _SPIN_LOCK_UNLOCKED(NULL);                                 \
@@ -117,8 +116,7 @@ extern void spinlock_profile_reset(unsigned char key);
 
 struct lock_profile_qhead { };
 
-#define SPIN_LOCK_UNLOCKED                                                    \
-    { _RAW_SPIN_LOCK_UNLOCKED, 0xfffu, 0, _LOCK_DEBUG }
+#define SPIN_LOCK_UNLOCKED { { 0 }, 0xfffu, 0, _LOCK_DEBUG }
 #define DEFINE_SPINLOCK(l) spinlock_t l = SPIN_LOCK_UNLOCKED
 
 #define spin_lock_init_prof(s, l) spin_lock_init(&((s)->l))
@@ -127,8 +125,16 @@ struct lock_profile_qhead { };
 
 #endif
 
+typedef union {
+    u32 head_tail;
+    struct {
+        u16 head;
+        u16 tail;
+    };
+} spinlock_tickets_t;
+
 typedef struct spinlock {
-    raw_spinlock_t raw;
+    spinlock_tickets_t tickets;
     u16 recurse_cpu:12;
     u16 recurse_cnt:4;
     struct lock_debug debug;
-- 
1.7.10.4


* [PATCHv3 4/4] x86, arm: remove asm/spinlock.h from all architectures
  2015-04-21 10:11 [PATCHv3 0/4] Use ticket locks for spinlocks David Vrabel
                   ` (2 preceding siblings ...)
  2015-04-21 10:11 ` [PATCHv3 3/4] xen: use ticket locks for spin locks David Vrabel
@ 2015-04-21 10:11 ` David Vrabel
  2015-04-21 10:22 ` [PATCHv3 0/4] Use ticket locks for spinlocks Jan Beulich
  4 siblings, 0 replies; 36+ messages in thread
From: David Vrabel @ 2015-04-21 10:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, David Vrabel, Jan Beulich, Jennifer Herbert

Now that all architectures use a common ticket lock implementation for
spinlocks, remove the architecture-specific byte lock implementations.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/arm/README.LinuxPrimitives  |   28 ---------------
 xen/include/asm-arm/arm32/spinlock.h |   66 ----------------------------------
 xen/include/asm-arm/arm64/spinlock.h |   63 --------------------------------
 xen/include/asm-arm/spinlock.h       |   23 ------------
 xen/include/asm-x86/spinlock.h       |   37 -------------------
 xen/include/xen/spinlock.h           |    1 -
 6 files changed, 218 deletions(-)
 delete mode 100644 xen/include/asm-arm/arm32/spinlock.h
 delete mode 100644 xen/include/asm-arm/arm64/spinlock.h
 delete mode 100644 xen/include/asm-arm/spinlock.h
 delete mode 100644 xen/include/asm-x86/spinlock.h

diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
index 7f33fc7..3115f51 100644
--- a/xen/arch/arm/README.LinuxPrimitives
+++ b/xen/arch/arm/README.LinuxPrimitives
@@ -25,16 +25,6 @@ linux/arch/arm64/include/asm/atomic.h   xen/include/asm-arm/arm64/atomic.h
 
 ---------------------------------------------------------------------
 
-spinlocks: last sync @ v3.16-rc6 (last commit: 95c4189689f9)
-
-linux/arch/arm64/include/asm/spinlock.h xen/include/asm-arm/arm64/spinlock.h
-
-Skipped:
-  5686b06 arm64: lockref: add support for lockless lockrefs using cmpxchg
-  52ea2a5 arm64: locks: introduce ticket-based spinlock implementation
-
----------------------------------------------------------------------
-
 mem*: last sync @ v3.16-rc6 (last commit: d875c9b37240)
 
 linux/arch/arm64/lib/memchr.S           xen/arch/arm/arm64/lib/memchr.S
@@ -103,24 +93,6 @@ linux/arch/arm/include/asm/atomic.h     xen/include/asm-arm/arm32/atomic.h
 
 ---------------------------------------------------------------------
 
-spinlocks: last sync: 15e7e5c1ebf5
-
-linux/arch/arm/include/asm/spinlock.h   xen/include/asm-arm/arm32/spinlock.h
-
-*** Linux has switched to ticket locks but we still use bitlocks.
-
-resync to v3.14-rc7:
-
-  7c8746a ARM: 7955/1: spinlock: ensure we have a compiler barrier before sev
-  0cbad9c ARM: 7854/1: lockref: add support for lockless lockrefs using cmpxchg64
-  9bb17be ARM: locks: prefetch the destination word for write prior to strex
-  27a8479 ARM: smp_on_up: move inline asm ALT_SMP patching macro out of spinlock.
-  00efaa0 ARM: 7812/1: rwlocks: retry trylock operation if strex fails on free lo
-  afa31d8 ARM: 7811/1: locks: use early clobber in arch_spin_trylock
-  73a6fdc ARM: spinlock: use inner-shareable dsb variant prior to sev instruction
-
----------------------------------------------------------------------
-
 mem*: last sync @ v3.16-rc6 (last commit: d98b90ea22b0)
 
 linux/arch/arm/lib/copy_template.S      xen/arch/arm/arm32/lib/copy_template.S
diff --git a/xen/include/asm-arm/arm32/spinlock.h b/xen/include/asm-arm/arm32/spinlock.h
deleted file mode 100644
index bc0343c..0000000
--- a/xen/include/asm-arm/arm32/spinlock.h
+++ /dev/null
@@ -1,66 +0,0 @@
-#ifndef __ASM_ARM32_SPINLOCK_H
-#define __ASM_ARM32_SPINLOCK_H
-
-static inline void dsb_sev(void)
-{
-    __asm__ __volatile__ (
-        "dsb\n"
-        "sev\n"
-        );
-}
-
-typedef struct {
-    volatile unsigned int lock;
-} raw_spinlock_t;
-
-#define _RAW_SPIN_LOCK_UNLOCKED { 0 }
-
-#define _raw_spin_is_locked(x)          ((x)->lock != 0)
-
-static always_inline void _raw_spin_unlock(raw_spinlock_t *lock)
-{
-    ASSERT(_raw_spin_is_locked(lock));
-
-    smp_mb();
-
-    __asm__ __volatile__(
-"   str     %1, [%0]\n"
-    :
-    : "r" (&lock->lock), "r" (0)
-    : "cc");
-
-    dsb_sev();
-}
-
-static always_inline int _raw_spin_trylock(raw_spinlock_t *lock)
-{
-    unsigned long contended, res;
-
-    do {
-        __asm__ __volatile__(
-    "   ldrex   %0, [%2]\n"
-    "   teq     %0, #0\n"
-    "   strexeq %1, %3, [%2]\n"
-    "   movne   %1, #0\n"
-        : "=&r" (contended), "=r" (res)
-        : "r" (&lock->lock), "r" (1)
-        : "cc");
-    } while (res);
-
-    if (!contended) {
-        smp_mb();
-        return 1;
-    } else {
-        return 0;
-    }
-}
-
-#endif /* __ASM_SPINLOCK_H */
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/xen/include/asm-arm/arm64/spinlock.h b/xen/include/asm-arm/arm64/spinlock.h
deleted file mode 100644
index 5ae034d..0000000
--- a/xen/include/asm-arm/arm64/spinlock.h
+++ /dev/null
@@ -1,63 +0,0 @@
-/*
- * Derived from Linux arch64 spinlock.h which is:
- * Copyright (C) 2012 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef __ASM_ARM64_SPINLOCK_H
-#define __ASM_ARM64_SPINLOCK_H
-
-typedef struct {
-    volatile unsigned int lock;
-} raw_spinlock_t;
-
-#define _RAW_SPIN_LOCK_UNLOCKED { 0 }
-
-#define _raw_spin_is_locked(x)          ((x)->lock != 0)
-
-static always_inline void _raw_spin_unlock(raw_spinlock_t *lock)
-{
-    ASSERT(_raw_spin_is_locked(lock));
-
-    asm volatile(
-        "       stlr    %w1, %0\n"
-        : "=Q" (lock->lock) : "r" (0) : "memory");
-}
-
-static always_inline int _raw_spin_trylock(raw_spinlock_t *lock)
-{
-    unsigned int tmp;
-
-    asm volatile(
-        "2:     ldaxr   %w0, %1\n"
-        "       cbnz    %w0, 1f\n"
-        "       stxr    %w0, %w2, %1\n"
-        "       cbnz    %w0, 2b\n"
-        "1:\n"
-        : "=&r" (tmp), "+Q" (lock->lock)
-        : "r" (1)
-        : "cc", "memory");
-
-    return !tmp;
-}
-
-#endif /* __ASM_SPINLOCK_H */
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/xen/include/asm-arm/spinlock.h b/xen/include/asm-arm/spinlock.h
deleted file mode 100644
index a064f73..0000000
--- a/xen/include/asm-arm/spinlock.h
+++ /dev/null
@@ -1,23 +0,0 @@
-#ifndef __ASM_SPINLOCK_H
-#define __ASM_SPINLOCK_H
-
-#include <xen/config.h>
-#include <xen/lib.h>
-
-#if defined(CONFIG_ARM_32)
-# include <asm/arm32/spinlock.h>
-#elif defined(CONFIG_ARM_64)
-# include <asm/arm64/spinlock.h>
-#else
-# error "unknown ARM variant"
-#endif
-
-#endif /* __ASM_SPINLOCK_H */
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/xen/include/asm-x86/spinlock.h b/xen/include/asm-x86/spinlock.h
deleted file mode 100644
index 757e20b..0000000
--- a/xen/include/asm-x86/spinlock.h
+++ /dev/null
@@ -1,37 +0,0 @@
-#ifndef __ASM_SPINLOCK_H
-#define __ASM_SPINLOCK_H
-
-#include <xen/config.h>
-#include <xen/lib.h>
-#include <asm/atomic.h>
-
-typedef struct {
-    volatile s16 lock;
-} raw_spinlock_t;
-
-#define _RAW_SPIN_LOCK_UNLOCKED /*(raw_spinlock_t)*/ { 1 }
-
-#define _raw_spin_is_locked(x) ((x)->lock <= 0)
-
-static always_inline void _raw_spin_unlock(raw_spinlock_t *lock)
-{
-    ASSERT(_raw_spin_is_locked(lock));
-    asm volatile (
-        "movw $1,%0" 
-        : "=m" (lock->lock) : : "memory" );
-}
-
-static always_inline int _raw_spin_trylock(raw_spinlock_t *lock)
-{
-    s16 oldval;
-    asm volatile (
-        "xchgw %w0,%1"
-        :"=r" (oldval), "=m" (lock->lock)
-        :"0" ((s16)0) : "memory" );
-    return (oldval > 0);
-}
-
-#define _raw_read_unlock(l) \
-    asm volatile ( "lock; dec%z0 %0" : "+m" ((l)->lock) :: "memory" )
-
-#endif /* __ASM_SPINLOCK_H */
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index bafbc74..311685a 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -2,7 +2,6 @@
 #define __SPINLOCK_H__
 
 #include <asm/system.h>
-#include <asm/spinlock.h>
 
 #ifndef NDEBUG
 struct lock_debug {
-- 
1.7.10.4


* Re: [PATCHv3 0/4] Use ticket locks for spinlocks
  2015-04-21 10:11 [PATCHv3 0/4] Use ticket locks for spinlocks David Vrabel
                   ` (3 preceding siblings ...)
  2015-04-21 10:11 ` [PATCHv3 4/4] x86, arm: remove asm/spinlock.h from all architectures David Vrabel
@ 2015-04-21 10:22 ` Jan Beulich
  2015-04-21 11:24   ` Jennifer Herbert
  2015-04-23 12:42   ` David Vrabel
  4 siblings, 2 replies; 36+ messages in thread
From: Jan Beulich @ 2015-04-21 10:22 UTC (permalink / raw)
  To: David Vrabel, Jennifer Herbert
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel

>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
> We have analysed the effect of this series on interrupt latency (by
> measuring the duration of irq disable/enable regions) and there is no
> significant impact.
> 
> http://xenbits.xen.org/people/dvrabel/bar2_comp.png 

Thanks for doing this!

> Interestingly, the biggest offenders for long critical sections are
> bare irq disable/enable regions, and some of these are really bad (10s
> of ms)!
> 
> http://xenbits.xen.org/people/dvrabel/bar_normal_busy.png 

Despite both pictures saying micro-seconds at the respective
axis (and hence the problem not being _as bad_) - did the data
collection reveal where these IRQ disable regions are, so we
could look into eliminating them? (ISTR there being some open
coded IRQ-disable, lock, unlock, IRQ-restore/enable sequences,
but I'm not sure whether they got eliminated already.)

Jan


* Re: [PATCHv3 1/4] x86: provide xadd()
  2015-04-21 10:11 ` [PATCHv3 1/4] x86: provide xadd() David Vrabel
@ 2015-04-21 10:36   ` Jan Beulich
  2015-04-21 12:36     ` David Vrabel
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2015-04-21 10:36 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Tim Deegan, xen-devel

>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
> +static always_inline unsigned long __xadd(
> +    volatile void *ptr, unsigned long v, int size)
> +{
> +    switch ( size )
> +    {
> +    case 1:
> +        asm volatile ( "lock; xaddb %b0,%1"
> +                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
> +                       :: "memory");
> +        return v;

This doesn't seem to guarantee to return the old value: When the
passed in v has more than 8 significant bits (which will get ignored
as input), nothing will zap those bits from the register. Same for
the 16-bit case obviously.
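
A worked example of the issue (hypothetical values): if *ptr is a u8
holding 0x10 and the caller passes v = 0x0105, xaddb only operates on
the low byte of the register, so *ptr becomes 0x15 and the low byte of
v becomes 0x10, but bits 8..63 of v still hold 0x01; __xadd() therefore
returns 0x0110 rather than the old value 0x10.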

> +#define xadd(ptr, v) ({                                         \
> +            __xadd((ptr), (unsigned long)(v), sizeof(*(ptr)));  \
> +        })

Assuming only xadd() is supposed to be used directly, perhaps
the easiest would be to cast v to typeof(*(ptr)) (instead of
unsigned long) here?

Also note that there's still a superfluous pair of parentheses
left here.

Jan


* Re: [PATCHv3 0/4] Use ticket locks for spinlocks
  2015-04-21 10:22 ` [PATCHv3 0/4] Use ticket locks for spinlocks Jan Beulich
@ 2015-04-21 11:24   ` Jennifer Herbert
  2015-04-23 12:42   ` David Vrabel
  1 sibling, 0 replies; 36+ messages in thread
From: Jennifer Herbert @ 2015-04-21 11:24 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel


On 21/04/15 11:22, Jan Beulich wrote:
> Despite both pictures saying micro-seconds at the respective axis (and 
> hence the problem not being _as bad_) - did the data collection reveal 
> where these IRQ disable regions are, so we could look into eliminating 
> them? (ISTR there being some open coded IRQ-disable, lock, unlock, 
> IRQ-restore/enable sequences, but I'm not sure whether they got 
> eliminated already.) Jan 

The location of the locks was not collected (other than whether or not
it was a spinlock), but I'm certainly considering doing this.  Another
graph, of duration against cumulative time, shows that the majority of
time spent with interrupts disabled is in code that keeps them disabled
for long contiguous periods, all using raw (not spinlock) interrupt
disable.  (There are only a few occurrences, but they are very long
when they happen.)  It would certainly be interesting to know where
these come from.

-jenny


* Re: [PATCHv3 1/4] x86: provide xadd()
  2015-04-21 10:36   ` Jan Beulich
@ 2015-04-21 12:36     ` David Vrabel
  2015-04-21 13:12       ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-21 12:36 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, Jennifer Herbert, xen-devel

On 21/04/15 11:36, Jan Beulich wrote:
>>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
>> +static always_inline unsigned long __xadd(
>> +    volatile void *ptr, unsigned long v, int size)
>> +{
>> +    switch ( size )
>> +    {
>> +    case 1:
>> +        asm volatile ( "lock; xaddb %b0,%1"
>> +                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
>> +                       :: "memory");
>> +        return v;
> 
> This doesn't seem to guarantee to return the old value: When the
> passed in v has more than 8 significant bits (which will get ignored
> as input), nothing will zap those bits from the register. Same for
> the 16-bit case obviously.
> 
>> +#define xadd(ptr, v) ({                                         \
>> +            __xadd((ptr), (unsigned long)(v), sizeof(*(ptr)));  \
>> +        })
> 
> Assuming only xadd() is supposed to be used directly, perhaps
> the easiest would be to cast v to typeof(*(ptr)) (instead of
> unsigned long) here?

I don't see how this helps.  Did you perhaps mean cast the result?

#define xadd(ptr, v) ({                                    \
            (typeof *(ptr))__xadd(ptr, (unsigned long)(v), \
                                  sizeof(*(ptr)));         \
        })

David


* Re: [PATCHv3 1/4] x86: provide xadd()
  2015-04-21 12:36     ` David Vrabel
@ 2015-04-21 13:12       ` Jan Beulich
  2015-04-21 13:15         ` David Vrabel
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2015-04-21 13:12 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Tim Deegan, xen-devel

>>> On 21.04.15 at 14:36, <david.vrabel@citrix.com> wrote:
> On 21/04/15 11:36, Jan Beulich wrote:
>>>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
>>> +static always_inline unsigned long __xadd(
>>> +    volatile void *ptr, unsigned long v, int size)
>>> +{
>>> +    switch ( size )
>>> +    {
>>> +    case 1:
>>> +        asm volatile ( "lock; xaddb %b0,%1"
>>> +                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
>>> +                       :: "memory");
>>> +        return v;
>> 
>> This doesn't seem to guarantee to return the old value: When the
>> passed in v has more than 8 significant bits (which will get ignored
>> as input), nothing will zap those bits from the register. Same for
>> the 16-bit case obviously.
>> 
>>> +#define xadd(ptr, v) ({                                         \
>>> +            __xadd((ptr), (unsigned long)(v), sizeof(*(ptr)));  \
>>> +        })
>> 
>> Assuming only xadd() is supposed to be used directly, perhaps
>> the easiest would be to cast v to typeof(*(ptr)) (instead of
>> unsigned long) here?
> 
> I don't see how this helps.  Did you perhaps mean cast the result?
> 
> #define xadd(ptr, v) ({                                    \
>             (typeof *(ptr))__xadd(ptr, (unsigned long)(v), \
>                                   sizeof(*(ptr)));         \
>         })

Casting the result would work too; casting the input would have
the same effect because (as said) the actual xadd doesn't alter
bits 8...63 (or 16...63 in the 16-bit case), i.e. whether zero
extension happens before or after doing the xadd doesn't matter.

Jan


* Re: [PATCHv3 1/4] x86: provide xadd()
  2015-04-21 13:12       ` Jan Beulich
@ 2015-04-21 13:15         ` David Vrabel
  2015-04-21 13:20           ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-21 13:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Tim Deegan, xen-devel

On 21/04/15 14:12, Jan Beulich wrote:
>>>> On 21.04.15 at 14:36, <david.vrabel@citrix.com> wrote:
>> On 21/04/15 11:36, Jan Beulich wrote:
>>>>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
>>>> +static always_inline unsigned long __xadd(
>>>> +    volatile void *ptr, unsigned long v, int size)
>>>> +{
>>>> +    switch ( size )
>>>> +    {
>>>> +    case 1:
>>>> +        asm volatile ( "lock; xaddb %b0,%1"
>>>> +                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
>>>> +                       :: "memory");
>>>> +        return v;
>>>
>>> This doesn't seem to guarantee to return the old value: When the
>>> passed in v has more than 8 significant bits (which will get ignored
>>> as input), nothing will zap those bits from the register. Same for
>>> the 16-bit case obviously.
>>>
>>>> +#define xadd(ptr, v) ({                                         \
>>>> +            __xadd((ptr), (unsigned long)(v), sizeof(*(ptr)));  \
>>>> +        })
>>>
>>> Assuming only xadd() is supposed to be used directly, perhaps
>>> the easiest would be to cast v to typeof(*(ptr)) (instead of
>>> unsigned long) here?
>>
>> I don't see how this helps.  Did you perhaps mean cast the result?
>>
>> #define xadd(ptr, v) ({                                    \
>>             (typeof *(ptr))__xadd(ptr, (unsigned long)(v), \
>>                                   sizeof(*(ptr)));         \
>>         })
> 
> Casting the result would work too; casting the input would have
> the same effect because (as said) the actual xadd doesn't alter
> bits 8...63 (or 16...63 in the 16-bit case), i.e. whether zero
> extension happens before or after doing the xadd doesn't matter.

Oh yes, of course.  Any preference to which method?

David


* Re: [PATCHv3 1/4] x86: provide xadd()
  2015-04-21 13:15         ` David Vrabel
@ 2015-04-21 13:20           ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-04-21 13:20 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Tim Deegan, xen-devel

>>> On 21.04.15 at 15:15, <david.vrabel@citrix.com> wrote:
> On 21/04/15 14:12, Jan Beulich wrote:
>>>>> On 21.04.15 at 14:36, <david.vrabel@citrix.com> wrote:
>>> On 21/04/15 11:36, Jan Beulich wrote:
>>>>>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
>>>>> +static always_inline unsigned long __xadd(
>>>>> +    volatile void *ptr, unsigned long v, int size)
>>>>> +{
>>>>> +    switch ( size )
>>>>> +    {
>>>>> +    case 1:
>>>>> +        asm volatile ( "lock; xaddb %b0,%1"
>>>>> +                       : "+r" (v), "+m" (*__xg((volatile void *)ptr))
>>>>> +                       :: "memory");
>>>>> +        return v;
>>>>
>>>> This doesn't seem to guarantee to return the old value: When the
>>>> passed in v has more than 8 significant bits (which will get ignored
>>>> as input), nothing will zap those bits from the register. Same for
>>>> the 16-bit case obviously.
>>>>
>>>>> +#define xadd(ptr, v) ({                                         \
>>>>> +            __xadd((ptr), (unsigned long)(v), sizeof(*(ptr)));  \
>>>>> +        })
>>>>
>>>> Assuming only xadd() is supposed to be used directly, perhaps
>>>> the easiest would be to cast v to typeof(*(ptr)) (instead of
>>>> unsigned long) here?
>>>
>>> I don't see how this helps.  Did you perhaps mean cast the result?
>>>
>>> #define xadd(ptr, v) ({                                    \
>>>             (typeof *(ptr))__xadd(ptr, (unsigned long)(v), \
>>>                                   sizeof(*(ptr)));         \
>>>         })
>> 
>> Casting the result would work too; casting the input would have
>> the same effect because (as said) the actual xadd doesn't alter
>> bits 8...63 (or 16...63 in the 16-bit case), i.e. whether zero
>> extension happens before or after doing the xadd doesn't matter.
> 
> Oh yes, of course.  Any preference to which method?

Not really - I guess the way you proposed it is the more obvious
one.

Jan


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-21 10:11 ` [PATCHv3 3/4] xen: use ticket locks for spin locks David Vrabel
@ 2015-04-23 11:58   ` Jan Beulich
  2015-04-28 15:56     ` David Vrabel
  2015-04-23 12:03   ` Tim Deegan
  1 sibling, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2015-04-23 11:58 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Tim Deegan, xen-devel

>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
> @@ -213,27 +211,32 @@ int _spin_trylock(spinlock_t *lock)
>  
>  void _spin_barrier(spinlock_t *lock)
>  {
> +    spinlock_tickets_t sample;
>  #ifdef LOCK_PROFILE
>      s_time_t block = NOW();
> -    u64      loop = 0;
> +#endif
>  
>      check_barrier(&lock->debug);
> -    do { smp_mb(); loop++;} while ( _raw_spin_is_locked(&lock->raw) );
> -    if ((loop > 1) && lock->profile)
> +    sample = observe_lock(&lock->tickets);
> +    if (sample.head != sample.tail)

Even if the old code didn't conform to coding style, please make sure
the new code does (there are a few more instances further down).

> @@ -127,8 +125,16 @@ struct lock_profile_qhead { };
>  
>  #endif
>  
> +typedef union {
> +    u32 head_tail;
> +    struct {
> +        u16 head;
> +        u16 tail;
> +    };
> +} spinlock_tickets_t;
> +
>  typedef struct spinlock {
> -    raw_spinlock_t raw;
> +    spinlock_tickets_t tickets;

At least for x86 this means a growth of this and hence various
other structures - did you examine the effects thereof? Of
course otoh getting the lock size uniform across architectures
is a good thing.

Jan


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-21 10:11 ` [PATCHv3 3/4] xen: use ticket locks for spin locks David Vrabel
  2015-04-23 11:58   ` Jan Beulich
@ 2015-04-23 12:03   ` Tim Deegan
  2015-04-23 13:43     ` David Vrabel
  2015-04-23 13:54     ` Jan Beulich
  1 sibling, 2 replies; 36+ messages in thread
From: Tim Deegan @ 2015-04-23 12:03 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Jan Beulich, xen-devel

Hi,

At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
>  void _spin_lock(spinlock_t *lock)
>  {
> +    spinlock_tickets_t tickets = { .tail = 1, };
>      LOCK_PROFILE_VAR;
>  
>      check_lock(&lock->debug);
> -    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
> +    tickets.head_tail = xadd(&lock->tickets.head_tail, tickets.head_tail);
> +    while ( tickets.tail != observe_head(&lock->tickets) )
>      {
>          LOCK_PROFILE_BLOCK;
> -        while ( likely(_raw_spin_is_locked(&lock->raw)) )
> -            cpu_relax();
> +        cpu_relax();
>      }
>      LOCK_PROFILE_GOT;
>      preempt_disable();

I think you need an smp_mb() here to make sure that locked accesses
don't get hoisted past the wait-for-my-ticket loop by an out-of-order
(ARM) cpu.

>  void _spin_unlock(spinlock_t *lock)
>  {
> +    smp_mb();
>      preempt_enable();
>      LOCK_PROFILE_REL;
> -    _raw_spin_unlock(&lock->raw);
> +    lock->tickets.head++;

This needs to be done with an explicit atomic (though not locked)
write; otherwise the compiler might use some unsuitable operation that
clobbers .tail as well.  (For some reason I convinced myself last
time that this was OK, but I can't remember why. :))
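
A sketch of the two changes being suggested, assuming Xen's
write_atomic() (the counterpart of the read_atomic() the patch already
uses); whether the barrier can be weaker on x86 is discussed in the
follow-ups:

    void _spin_lock(spinlock_t *lock)
    {
        ...
        while ( tickets.tail != observe_head(&lock->tickets) )
        {
            LOCK_PROFILE_BLOCK;
            cpu_relax();
        }
        smp_mb();      /* keep the critical section's accesses after
                        * the wait-for-my-ticket loop */
        LOCK_PROFILE_GOT;
        preempt_disable();
    }

    void _spin_unlock(spinlock_t *lock)
    {
        smp_mb();
        preempt_enable();
        LOCK_PROFILE_REL;
        /* single-copy atomic store: the compiler cannot widen this
         * into a read-modify-write that also rewrites .tail */
        write_atomic(&lock->tickets.head, lock->tickets.head + 1);
    }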

Cheers,

Tim.


* Re: [PATCHv3 0/4] Use ticket locks for spinlocks
  2015-04-21 10:22 ` [PATCHv3 0/4] Use ticket locks for spinlocks Jan Beulich
  2015-04-21 11:24   ` Jennifer Herbert
@ 2015-04-23 12:42   ` David Vrabel
  2015-04-30 15:44     ` David Vrabel
  1 sibling, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-23 12:42 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel, Jennifer Herbert
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel

On 21/04/15 11:22, Jan Beulich wrote:
>>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
>> We have analysed the effect of this series on interrupt latency (by
>> measuring the duration of irq disable/enable regions) and there is no
>> significant impact.
>>
>> http://xenbits.xen.org/people/dvrabel/bar2_comp.png 
> 
> Thanks for doing this!
> 
>> Interestingly, the biggest offenders for long critical sections are
>> bare irq disable/enable regions, and some of these are really bad (10s
>> of ms)!
>>
>> http://xenbits.xen.org/people/dvrabel/bar_normal_busy.png 
> 
> Despite both pictures saying micro-seconds at the respective
> axis (and hence the problem not being _as bad_)

This particular graph doesn't show it very well but there are small
blips around 4, 10, and 31 ms.

David


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 12:03   ` Tim Deegan
@ 2015-04-23 13:43     ` David Vrabel
  2015-04-23 13:45       ` Andrew Cooper
  2015-04-23 14:24       ` Tim Deegan
  2015-04-23 13:54     ` Jan Beulich
  1 sibling, 2 replies; 36+ messages in thread
From: David Vrabel @ 2015-04-23 13:43 UTC (permalink / raw)
  To: Tim Deegan, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Jan Beulich, xen-devel

On 23/04/15 13:03, Tim Deegan wrote:
> Hi,
> 
> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
>>  void _spin_lock(spinlock_t *lock)
>>  {
>> +    spinlock_tickets_t tickets = { .tail = 1, };
>>      LOCK_PROFILE_VAR;
>>  
>>      check_lock(&lock->debug);
>> -    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
>> +    tickets.head_tail = xadd(&lock->tickets.head_tail, tickets.head_tail);
>> +    while ( tickets.tail != observe_head(&lock->tickets) )
>>      {
>>          LOCK_PROFILE_BLOCK;
>> -        while ( likely(_raw_spin_is_locked(&lock->raw)) )
>> -            cpu_relax();
>> +        cpu_relax();
>>      }
>>      LOCK_PROFILE_GOT;
>>      preempt_disable();
> 
> I think you need an smp_mb() here to make sure that locked accesses
> don't get hoisted past the wait-for-my-ticket loop by an out-of-order
> (ARM) cpu.

Ok, but smp_mb() turns into an mfence on x86.  Is this a
problem/sub-optimal?

David


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 13:43     ` David Vrabel
@ 2015-04-23 13:45       ` Andrew Cooper
  2015-04-23 14:58         ` Tim Deegan
  2015-04-23 14:24       ` Tim Deegan
  1 sibling, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2015-04-23 13:45 UTC (permalink / raw)
  To: David Vrabel, Tim Deegan
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Jennifer Herbert,
	Jan Beulich, xen-devel

On 23/04/15 14:43, David Vrabel wrote:
> On 23/04/15 13:03, Tim Deegan wrote:
>> Hi,
>>
>> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
>>>  void _spin_lock(spinlock_t *lock)
>>>  {
>>> +    spinlock_tickets_t tickets = { .tail = 1, };
>>>      LOCK_PROFILE_VAR;
>>>  
>>>      check_lock(&lock->debug);
>>> -    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
>>> +    tickets.head_tail = xadd(&lock->tickets.head_tail, tickets.head_tail);
>>> +    while ( tickets.tail != observe_head(&lock->tickets) )
>>>      {
>>>          LOCK_PROFILE_BLOCK;
>>> -        while ( likely(_raw_spin_is_locked(&lock->raw)) )
>>> -            cpu_relax();
>>> +        cpu_relax();
>>>      }
>>>      LOCK_PROFILE_GOT;
>>>      preempt_disable();
>> I think you need an smp_mb() here to make sure that locked accesses
>> don't get hoisted past the wait-for-my-ticket loop by an out-of-order
>> (ARM) cpu.
> Ok, but smp_mb() turns into an mfence on x86.  Is this a
> problem/sub-optimal?

That can probably change back to a plain compiler barrier as Xen only
supports 64bit and doesn't need PPRO ordering errata workarounds.

~Andrew


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 12:03   ` Tim Deegan
  2015-04-23 13:43     ` David Vrabel
@ 2015-04-23 13:54     ` Jan Beulich
  2015-04-23 14:43       ` Tim Deegan
  1 sibling, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2015-04-23 13:54 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, David Vrabel, xen-devel

>>> On 23.04.15 at 14:03, <tim@xen.org> wrote:
> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
>>  void _spin_unlock(spinlock_t *lock)
>>  {
>> +    smp_mb();
>>      preempt_enable();
>>      LOCK_PROFILE_REL;
>> -    _raw_spin_unlock(&lock->raw);
>> +    lock->tickets.head++;
> 
> This needs to be done with an explicit atomic (though not locked)
> write; otherwise the compiler might use some unsuitable operation that
> clobbers .tail as well.

How do you imagine that to happen? An increment of one
structure member surely won't modify any others.

Jan


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 13:43     ` David Vrabel
  2015-04-23 13:45       ` Andrew Cooper
@ 2015-04-23 14:24       ` Tim Deegan
  2015-04-23 14:51         ` Tim Deegan
  1 sibling, 1 reply; 36+ messages in thread
From: Tim Deegan @ 2015-04-23 14:24 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Jan Beulich, xen-devel

At 14:43 +0100 on 23 Apr (1429800229), David Vrabel wrote:
> On 23/04/15 13:03, Tim Deegan wrote:
> > Hi,
> > 
> > At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
> >>  void _spin_lock(spinlock_t *lock)
> >>  {
> >> +    spinlock_tickets_t tickets = { .tail = 1, };
> >>      LOCK_PROFILE_VAR;
> >>  
> >>      check_lock(&lock->debug);
> >> -    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
> >> +    tickets.head_tail = xadd(&lock->tickets.head_tail, tickets.head_tail);
> >> +    while ( tickets.tail != observe_head(&lock->tickets) )
> >>      {
> >>          LOCK_PROFILE_BLOCK;
> >> -        while ( likely(_raw_spin_is_locked(&lock->raw)) )
> >> -            cpu_relax();
> >> +        cpu_relax();
> >>      }
> >>      LOCK_PROFILE_GOT;
> >>      preempt_disable();
> > 
> > I think you need an smp_mb() here to make sure that locked accesses
> > don't get hoisted past the wait-for-my-ticket loop by an out-of-order
> > (ARM) cpu.
> 
> Ok, but smp_mb() turns into an mfence on x86.  Is this a
> problem/sub-optimal?

So, having chased this around my head for a while, I think you're
right.  Expanding this code a bit, I think the important ops are:

(in observe_head())
    smp_rmb() (== barrier())
    [ POINT A ]
    read lock and see that we have acquired it
(in preempt_disable())    
    increment disable count (this is both a read and a write)
    barrier();
    [ POINT B ]

A read after point B can't see unlocked state because of the second
compiler barrier and the fact that x86 won't reorder it past the read
in observe_head().

A write after point B can't be observed before we have the lock
because
1) the second barrier stops the compiler reordering before the increment;
2) x86 won't make it visible before the write half of the increment;
3) the write half of the increment can't happen before the read half; and
4) the read half can't happen before the read in observe_head().

I'm not 100% sure about (3), as it happens; maybe either the compiler
or the CPU might do something unexpected there?

So probably, on x86, we don't need the mfence.  A bit fragile,
though, relying on the internals of preempt_disable().

Tim.


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 13:54     ` Jan Beulich
@ 2015-04-23 14:43       ` Tim Deegan
  2015-04-23 14:58         ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tim Deegan @ 2015-04-23 14:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, David Vrabel, xen-devel

At 14:54 +0100 on 23 Apr (1429800874), Jan Beulich wrote:
> >>> On 23.04.15 at 14:03, <tim@xen.org> wrote:
> > At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
> >>  void _spin_unlock(spinlock_t *lock)
> >>  {
> >> +    smp_mb();
> >>      preempt_enable();
> >>      LOCK_PROFILE_REL;
> >> -    _raw_spin_unlock(&lock->raw);
> >> +    lock->tickets.head++;
> > 
> > This needs to be done with an explicit atomic (though not locked)
> > write; otherwise the compiler might use some unsuitable operation that
> > clobbers .tail as well.
> 
> How do you imagine that to happen? An increment of one
> structure member surely won't modify any others.

AIUI, the '++' could end up as a word-size read, modify, and word-size
write.  If another CPU updates .tail in parallel, that update could get
lost.
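
Spelled out, the interleaving that could lose an update (a sketch; the
compiler would have to pick a 32-bit read-modify-write for the
increment, and 0x10000 is .tail = 1 in the little-endian layout):

    CPU0: _spin_unlock()              CPU1: _spin_lock()
      t = lock->tickets.head_tail;
                                        xadd(&lock->tickets.head_tail,
                                             0x10000);      /* tail++ */
      t += 1;                           /* head++ on the stale copy */
      lock->tickets.head_tail = t;      /* CPU1's tail update is lost */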

Tim.


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 14:24       ` Tim Deegan
@ 2015-04-23 14:51         ` Tim Deegan
  0 siblings, 0 replies; 36+ messages in thread
From: Tim Deegan @ 2015-04-23 14:51 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Jan Beulich, xen-devel

At 15:24 +0100 on 23 Apr (1429802696), Tim Deegan wrote:
> At 14:43 +0100 on 23 Apr (1429800229), David Vrabel wrote:
> > On 23/04/15 13:03, Tim Deegan wrote:
> > > Hi,
> > > 
> > > At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
> > >>  void _spin_lock(spinlock_t *lock)
> > >>  {
> > >> +    spinlock_tickets_t tickets = { .tail = 1, };
> > >>      LOCK_PROFILE_VAR;
> > >>  
> > >>      check_lock(&lock->debug);
> > >> -    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
> > >> +    tickets.head_tail = xadd(&lock->tickets.head_tail, tickets.head_tail);
> > >> +    while ( tickets.tail != observe_head(&lock->tickets) )
> > >>      {
> > >>          LOCK_PROFILE_BLOCK;
> > >> -        while ( likely(_raw_spin_is_locked(&lock->raw)) )
> > >> -            cpu_relax();
> > >> +        cpu_relax();
> > >>      }
> > >>      LOCK_PROFILE_GOT;
> > >>      preempt_disable();
> > > 
> > > I think you need an smp_mb() here to make sure that locked accesses
> > > don't get hoisted past the wait-for-my-ticket loop by an out-of-order
> > > (ARM) cpu.
> > 
> > Ok, but smp_mb() turns into an mfence on x86.  Is this a
> > problem/sub-optimal?
> 
> So, having chased this around my head for a while, I think you're
> right.  Expanding this code a bit, I think the important ops are:
> 
> (in observe_head())
>     smp_rmb() (== barrier())
>     [ POINT A ]
>     read lock and see that we have acquired it
> (in preempt_disable())    
>     increment disable count (this is both a read and a write)
>     barrier();
>     [ POINT B ]
> 
> A read after point B can't see unlocked state because of the second
> compiler barrier and the fact that x86 won't reorder it past the read
> in observe_head().
> 
> A write after point B can't be observed before we have the lock
> because
> 1) the second barrier stops the compiler reordering before the increment;
> 2) x86 won't make it visible before the write half of the increment;
> 3) the write half of the increment can't happen before the read half; and
> 4) the read half can't happen before the read in observe_head().
> 
> I'm not 100% sure about (3), as it happens; maybe either the compiler
> or the CPU might do something unexpected there?
> 
> So probably, on x86, we don't need the mfence.  A bit fragile,
> though, relying on the internals of preempt_disable().

Andrew pointed me at the SDM (vol 3, 8.2.2.) that says that writes
are not reordered with older reads.  So the write case is the same as
the read case: the read in observe_head() is the important one and all
we need here is a barrier().

I guess you could invent an arch_lock_acquire_barrier() or similar
to go here, that expands to barrier() on x86 and dmb() on ARM...
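
Something along these lines, perhaps (hypothetical names following the
suggestion above; not an existing Xen interface at this point):

    /* Ordering between acquiring a ticket lock and the accesses
     * inside the critical section. */
    #if defined(CONFIG_X86)
    /* x86 does not reorder loads with older loads, nor stores with
     * older loads, so a compiler barrier suffices. */
    #define arch_lock_acquire_barrier() barrier()
    #else
    #define arch_lock_acquire_barrier() smp_mb()   /* dmb on ARM */
    #endif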

Tim.


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 14:43       ` Tim Deegan
@ 2015-04-23 14:58         ` Jan Beulich
  2015-04-29 15:36           ` David Vrabel
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2015-04-23 14:58 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, David Vrabel, xen-devel

>>> On 23.04.15 at 16:43, <tim@xen.org> wrote:
> At 14:54 +0100 on 23 Apr (1429800874), Jan Beulich wrote:
>> >>> On 23.04.15 at 14:03, <tim@xen.org> wrote:
>> > At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
>> >>  void _spin_unlock(spinlock_t *lock)
>> >>  {
>> >> +    smp_mb();
>> >>      preempt_enable();
>> >>      LOCK_PROFILE_REL;
>> >> -    _raw_spin_unlock(&lock->raw);
>> >> +    lock->tickets.head++;
>> > 
>> > This needs to be done with an explicit atomic (though not locked)
>> > write; otherwise the compiler might use some unsuitable operation that
>> > clobbers .tail as well.
>> 
>> How do you imagine that to happen? An increment of one
>> structure member surely won't modify any others.
> 
> AIUI, the '++' could end up as a word-size read, modify, and word-size
> write.  If another CPU updates .tail in parallel, that update could get
> lost.

Ah, right, compilers are allowed to do that, although they normally
wouldn't unless the architecture has no suitable loads/stores.

Jan


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 13:45       ` Andrew Cooper
@ 2015-04-23 14:58         ` Tim Deegan
  0 siblings, 0 replies; 36+ messages in thread
From: Tim Deegan @ 2015-04-23 14:58 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Jennifer Herbert,
	David Vrabel, Jan Beulich, xen-devel

At 14:45 +0100 on 23 Apr (1429800338), Andrew Cooper wrote:
> On 23/04/15 14:43, David Vrabel wrote:
> > On 23/04/15 13:03, Tim Deegan wrote:
> >> Hi,
> >>
> >> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
> >>>  void _spin_lock(spinlock_t *lock)
> >>>  {
> >>> +    spinlock_tickets_t tickets = { .tail = 1, };
> >>>      LOCK_PROFILE_VAR;
> >>>  
> >>>      check_lock(&lock->debug);
> >>> -    while ( unlikely(!_raw_spin_trylock(&lock->raw)) )
> >>> +    tickets.head_tail = xadd(&lock->tickets.head_tail, tickets.head_tail);
> >>> +    while ( tickets.tail != observe_head(&lock->tickets) )
> >>>      {
> >>>          LOCK_PROFILE_BLOCK;
> >>> -        while ( likely(_raw_spin_is_locked(&lock->raw)) )
> >>> -            cpu_relax();
> >>> +        cpu_relax();
> >>>      }
> >>>      LOCK_PROFILE_GOT;
> >>>      preempt_disable();
> >> I think you need an smp_mb() here to make sure that locked accesses
> >> don't get hoisted past the wait-for-my-ticket loop by an out-of-order
> >> (ARM) cpu.
> > Ok, but smp_mb() turns into an mfence on x86.  Is this a
> > problem/sub-optimal?
> 
> That can probably change back to a plain compiler barrier as Xen only
> supports 64bit and doesn't need PPRO ordering errata workarounds.

And for the archive: smp_mb() needs to be an mfence in the general
case, otherwise "WRITE; smp_mb(); READ;" can be reordered, if the write
and read are to different addresses.
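
The case in question is the classic store-buffering pattern (a sketch;
x and y both start as 0):

    /* CPU0 */                  /* CPU1 */
    x = 1;                      y = 1;
    smp_mb();                   smp_mb();
    r0 = y;                     r1 = x;

    /* With only a compiler barrier, x86 may let each load pass the
     * older store to a different address, so r0 == r1 == 0 is
     * possible; an mfence rules it out. */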

Tim.


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 11:58   ` Jan Beulich
@ 2015-04-28 15:56     ` David Vrabel
  2015-04-28 23:15       ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-28 15:56 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, Jennifer Herbert, xen-devel

On 23/04/15 12:58, Jan Beulich wrote:
>
>> +typedef union {
>> +    u32 head_tail;
>> +    struct {
>> +        u16 head;
>> +        u16 tail;
>> +    };
>> +} spinlock_tickets_t;
>> +
>>  typedef struct spinlock {
>> -    raw_spinlock_t raw;
>> +    spinlock_tickets_t tickets;
> 
> At least for x86 this means a growth of this and hence various
> other structures - did you examine the effects thereof? Of
> course otoh getting the lock size uniform across architectures
> is a good thing.

I've not looked.

Are there any structures whose size you're particularly concerned about?

David


* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-28 15:56     ` David Vrabel
@ 2015-04-28 23:15       ` Jan Beulich
  2015-04-29 15:21         ` David Vrabel
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2015-04-28 23:15 UTC (permalink / raw)
  To: david.vrabel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	jennifer.herbert, tim, xen-devel

>>> David Vrabel <david.vrabel@citrix.com> 04/28/15 6:16 PM >>>
>On 23/04/15 12:58, Jan Beulich wrote:
>>> +typedef union {
>>> +    u32 head_tail;
>>> +    struct {
>>> +        u16 head;
>>> +        u16 tail;
>>> +    };
>>> +} spinlock_tickets_t;
>>> +
>>>  typedef struct spinlock {
>>> -    raw_spinlock_t raw;
>>> +    spinlock_tickets_t tickets;
>> 
>> At least for x86 this means a growth of this and hence various
>> other structures - did you examine the effects thereof? Of
>> course otoh getting the lock size uniform across architectures
>> is a good thing.
>
>I've not looked.
>
>Are there any structures whose size you're particularly concerned about?

No specific ones (but of course structures with an inherent size constraint
- like struct domain and struct vcpu - are, with all of their sub-structures,
primary candidates). I just recall that in some cases (and this may no longer
apply due to later additions) structures got arranged specifically with
the 2-byte size of the locks in mind, and the growth here may thus mean a
structure size growth of more than just two bytes.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-28 23:15       ` Jan Beulich
@ 2015-04-29 15:21         ` David Vrabel
  2015-04-29 23:56           ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-29 15:21 UTC (permalink / raw)
  To: Jan Beulich, david.vrabel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3, tim,
	jennifer.herbert, xen-devel

On 29/04/15 00:15, Jan Beulich wrote:
>>>> David Vrabel <david.vrabel@citrix.com> 04/28/15 6:16 PM >>>
>> On 23/04/15 12:58, Jan Beulich wrote:
>>>> +typedef union {
>>>> +    u32 head_tail;
>>>> +    struct {
>>>> +        u16 head;
>>>> +        u16 tail;
>>>> +    };
>>>> +} spinlock_tickets_t;
>>>> +
>>>>  typedef struct spinlock {
>>>> -    raw_spinlock_t raw;
>>>> +    spinlock_tickets_t tickets;
>>>
>>> At least for x86 this means a growth of this and hence various
>>> other structures - did you examine the effects thereof? Of
>>> course otoh getting the lock size uniform across architectures
>>> is a good thing.
>>
>> I've not looked.
>>
>> Are there any structures whose size you're particularly concerned about?
> 
> No specific ones (but of course structures with an inherent size constraint
> - like struct domain and struct vcpu - are, with all of their sub-structures,
> primary candidates). I just recall that in some cases (and this may no longer
> apply due to later additions) structures got arranged specifically taking in
> mind the 2-byte size of the locks, and the growth here may thus mean a
> structure size growth of more than just two bytes.

Spin locks are currently 4 bytes (2 bytes for the lock, plus 2 bytes for
the recurse_{cnt,cpu}), and ticket locks are now 8 bytes (an increase of
4 bytes).

struct vcpu does not increase in size (the padding before the arch
field shrinks from 56 to 48 bytes).

struct domain increases from 3480 to 3544 bytes.

There's no change to any event channel structure.

struct grant_table increases from 64 to 74 bytes, but in combination
with the pending grant table locking patches this can be made 64 bytes
again.

David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-23 14:58         ` Jan Beulich
@ 2015-04-29 15:36           ` David Vrabel
  2015-04-29 16:56             ` Tim Deegan
  2015-04-29 23:48             ` Jan Beulich
  0 siblings, 2 replies; 36+ messages in thread
From: David Vrabel @ 2015-04-29 15:36 UTC (permalink / raw)
  To: Jan Beulich, Tim Deegan
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, xen-devel

On 23/04/15 15:58, Jan Beulich wrote:
>>>> On 23.04.15 at 16:43, <tim@xen.org> wrote:
>> At 14:54 +0100 on 23 Apr (1429800874), Jan Beulich wrote:
>>>>>> On 23.04.15 at 14:03, <tim@xen.org> wrote:
>>>> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
>>>>>  void _spin_unlock(spinlock_t *lock)
>>>>>  {
>>>>> +    smp_mb();
>>>>>      preempt_enable();
>>>>>      LOCK_PROFILE_REL;
>>>>> -    _raw_spin_unlock(&lock->raw);
>>>>> +    lock->tickets.head++;
>>>>
>>>> This needs to be done with an explicit atomic (though not locked)
>>>> write; otherwise the compiler might use some unsuitable operation that
>>>> clobbers .tail as well.
>>>
>>> How do you imagine that to happen? An increment of one
>>> structure member surely won't modify any others.
>>
>> AIUI, the '++' could end up as a word-size read, modify, and word-size
>> write.  If another CPU updates .tail in parallel, that update could get
>> lost.
> 
> Ah, right, compilers are allowed to do that, albeit normally wouldn't
> unless the architecture has no suitable loads/stores.

lock->tickets.head++;

  7b:   66 83 07 01             addw   $0x1,(%rdi)

write_atomic(&lock->tickets.head, lock->tickets.head + 1);

  7b:   0f b7 07                movzwl (%rdi),%eax
  7e:   83 c0 01                add    $0x1,%eax
  81:   66 89 07                mov    %ax,(%rdi)

Do you want a new add_atomic() operation? e.g.,

#define add_atomic(ptr, inc) \
        asm volatile ("addw %1,%0" \
            : "+m" (*(ptr)) : "ri" (inc) : "memory")

(but obviously handling all the different sizes.)
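
As a sketch, the 16-bit instance of such a helper (the only width the
ticket lock needs) could look like this; other sizes would follow the
same shape, with the usual link-failure trick for an unsupported size.
The name is illustrative, not an existing Xen interface:

    static inline void add_sized_u16(volatile u16 *addr, u16 inc)
    {
        /*
         * Unlocked 16-bit add: not an atomic RMW against other CPUs,
         * but it only ever touches the two bytes of the addressed
         * field, so an adjacent field (e.g. .tail) cannot be
         * clobbered.
         */
        asm volatile ( "addw %1,%0"
                       : "+m" (*addr)
                       : "ri" (inc)
                       : "memory" );
    }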

David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-29 15:36           ` David Vrabel
@ 2015-04-29 16:56             ` Tim Deegan
  2015-04-29 17:00               ` David Vrabel
  2015-04-29 23:48             ` Jan Beulich
  1 sibling, 1 reply; 36+ messages in thread
From: Tim Deegan @ 2015-04-29 16:56 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Jan Beulich, xen-devel

At 16:36 +0100 on 29 Apr (1430325362), David Vrabel wrote:
> On 23/04/15 15:58, Jan Beulich wrote:
> >>>> On 23.04.15 at 16:43, <tim@xen.org> wrote:
> >> At 14:54 +0100 on 23 Apr (1429800874), Jan Beulich wrote:
> >>>>>> On 23.04.15 at 14:03, <tim@xen.org> wrote:
> >>>> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
> >>>>>  void _spin_unlock(spinlock_t *lock)
> >>>>>  {
> >>>>> +    smp_mb();
> >>>>>      preempt_enable();
> >>>>>      LOCK_PROFILE_REL;
> >>>>> -    _raw_spin_unlock(&lock->raw);
> >>>>> +    lock->tickets.head++;
> >>>>
> >>>> This needs to be done with an explicit atomic (though not locked)
> >>>> write; otherwise the compiler might use some unsuitable operation that
> >>>> clobbers .tail as well.
> >>>
> >>> How do you imagine that to happen? An increment of one
> >>> structure member surely won't modify any others.
> >>
> >> AIUI, the '++' could end up as a word-size read, modify, and word-size
> >> write.  If another CPU updates .tail in parallel, that update could get
> >> lost.
> > 
> > Ah, right, compilers are allowed to do that, albeit normally wouldn't
> > unless the architecture has no suitable loads/stores.
> 
> lock->tickets.head++;
> 
>   7b:   66 83 07 01             addw   $0x1,(%rdi)
> 
> write_atomic(&lock->tickets.head, lock->tickets.head + 1);
> 
>   7b:   0f b7 07                movzwl (%rdi),%eax
>   7e:   83 c0 01                add    $0x1,%eax
>   81:   66 89 07                mov    %ax,(%rdi)

:(

> Do you want a new add_atomic() operation? e.g.,
> 
> #define add_atomic(ptr, inc) \
>         asm volatile ("addw %1,%w" \
>             : "+m" (*(ptr)) : "ri" (inc) : "memory")
> 
> (but obviously handling all the different sizes.)

I guess so.  An equivalent 'inc' operation would be even shorter,
but maybe GCC has its reasons for using addw + immediate?
(Ah, it's in the optimization manual: addw $1 is preferred because it
sets all the flags, whereas inc sets only some, so the inc has a
dependence on the previous instruction to set flags.)

It needs some careful naming -- this series will add two
new add operations, currently xadd() and add_atomic(), where xadd() is
the more atomic of the two, IYSWIM.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-29 16:56             ` Tim Deegan
@ 2015-04-29 17:00               ` David Vrabel
  2015-04-30  9:00                 ` Tim Deegan
  0 siblings, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-29 17:00 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Jan Beulich, xen-devel

On 29/04/15 17:56, Tim Deegan wrote:
> At 16:36 +0100 on 29 Apr (1430325362), David Vrabel wrote:
>> On 23/04/15 15:58, Jan Beulich wrote:
>>>>>> On 23.04.15 at 16:43, <tim@xen.org> wrote:
>>>> At 14:54 +0100 on 23 Apr (1429800874), Jan Beulich wrote:
>>>>>>>> On 23.04.15 at 14:03, <tim@xen.org> wrote:
>>>>>> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
>>>>>>>  void _spin_unlock(spinlock_t *lock)
>>>>>>>  {
>>>>>>> +    smp_mb();
>>>>>>>      preempt_enable();
>>>>>>>      LOCK_PROFILE_REL;
>>>>>>> -    _raw_spin_unlock(&lock->raw);
>>>>>>> +    lock->tickets.head++;
>>>>>>
>>>>>> This needs to be done with an explicit atomic (though not locked)
>>>>>> write; otherwise the compiler might use some unsuitable operation that
>>>>>> clobbers .tail as well.
>>>>>
>>>>> How do you imagine that to happen? An increment of one
>>>>> structure member surely won't modify any others.
>>>>
>>>> AIUI, the '++' could end up as a word-size read, modify, and word-size
>>>> write.  If another CPU updates .tail in parallel, that update could get
>>>> lost.
>>>
>>> Ah, right, compilers are allowed to do that, albeit normally wouldn't
>>> unless the architecture has no suitable loads/stores.
>>
>> lock->tickets.head++;
>>
>>   7b:   66 83 07 01             addw   $0x1,(%rdi)
>>
>> write_atomic(&lock->tickets.head, lock->tickets.head + 1);
>>
>>   7b:   0f b7 07                movzwl (%rdi),%eax
>>   7e:   83 c0 01                add    $0x1,%eax
>>   81:   66 89 07                mov    %ax,(%rdi)
> 
> :(
> 
>> Do you want a new add_atomic() operation? e.g.,
>>
>> #define add_atomic(ptr, inc) \
>>         asm volatile ("addw %1,%w" \
>>             : "+m" (*(ptr)) : "ri" (inc) : "memory")
>>
>> (but obviously handling all the different sizes.)
> 
> I guess so.  An equivalent 'inc' operation would be even shorter,
> but maybe GCC has its reasons for using addw + immediate?
> (Ah, it's in the optimization manual: addw $1 is preferred because it
> sets all the flags, whereas inc sets only some, so the inc has a
> dependence on the previous instruction to set flags.)
> 
> It needs some careful naming -- this series will add two
> new add operations, currently xadd() and add_atomic(), where xadd() is
> the more atomic of the two, IYSWIM.

Should I rename xadd() to arch_fetch_and_add() to match the GCC builtin
name?

David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-29 15:36           ` David Vrabel
  2015-04-29 16:56             ` Tim Deegan
@ 2015-04-29 23:48             ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-04-29 23:48 UTC (permalink / raw)
  To: david.vrabel, tim
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	jennifer.herbert, xen-devel

>>> David Vrabel <david.vrabel@citrix.com> 04/29/15 5:39 PM >>>
>On 23/04/15 15:58, Jan Beulich wrote:
>>>>> On 23.04.15 at 16:43, <tim@xen.org> wrote:
>>> At 14:54 +0100 on 23 Apr (1429800874), Jan Beulich wrote:
>>> AIUI, the '++' could end up as a word-size read, modify, and word-size
>>> write.  If another CPU updates .tail in parallel, that update could get
>>> lost.
>> 
>> Ah, right, compilers are allowed to do that, albeit normally wouldn't
>> unless the architecture has no suitable loads/stores.
>
>lock->tickets.head++;
>
>  7b:   66 83 07 01             addw   $0x1,(%rdi)
>
>write_atomic(&lock->tickets.head, lock->tickets.head + 1);
>
>  7b:   0f b7 07                movzwl (%rdi),%eax
>  7e:   83 c0 01                add    $0x1,%eax
>  81:   66 89 07                mov    %ax,(%rdi)
>
>Do you want a new add_atomic() operation? e.g.,
>
>#define add_atomic(ptr, inc) \
>        asm volatile ("addw %1,%w" \
>            : "+m" (*(ptr)) : "ri" (inc) : "memory")
>
>(but obviously handling all the different sizes.)

I think that would be desirable.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-29 15:21         ` David Vrabel
@ 2015-04-29 23:56           ` Jan Beulich
  2015-04-30 10:09             ` Tim Deegan
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2015-04-29 23:56 UTC (permalink / raw)
  To: david.vrabel
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	jennifer.herbert, tim, xen-devel

>>> David Vrabel <david.vrabel@citrix.com> 04/29/15 5:28 PM >>>
>On 29/04/15 00:15, Jan Beulich wrote:
>>>>> David Vrabel <david.vrabel@citrix.com> 04/28/15 6:16 PM >>>
>>> Are there any structures whose size you're particularly concerned about?
>> 
>> No specific ones (but of course structures with an inherent size constraint
>> - like struct domain and struct vcpu - are, with all of their sub-structures,
>> primary candidates). I just recall that in some cases (and this may no longer
>> apply due to later additions) structures got arranged specifically taking in
>> mind the 2-byte size of the locks, and the growth here may thus mean a
>> structure size growth of more than just two bytes.
>
>Spin locks are currently 4 bytes (2 bytes for the lock, plus 2 bytes for
>the recurse_{cnt,cpu}), and ticket locks are now 8 bytes (an increase in
>4 bytes).
>
>struct vcpu does not increase in size (there is 56, now 48 bytes, of
>padding before the arch field).
>
>struct domain increases from 3480 to 3544 bytes.

Odd - this suggests there's still some 64-byte alignment somewhere, but I
can't immediately spot it. Only struct vcpu has an obvious 64-byte aligned
field (struct pi_desc) afaics.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-29 17:00               ` David Vrabel
@ 2015-04-30  9:00                 ` Tim Deegan
  0 siblings, 0 replies; 36+ messages in thread
From: Tim Deegan @ 2015-04-30  9:00 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Jennifer Herbert, Jan Beulich, xen-devel

At 18:00 +0100 on 29 Apr (1430330400), David Vrabel wrote:
> On 29/04/15 17:56, Tim Deegan wrote:
> > At 16:36 +0100 on 29 Apr (1430325362), David Vrabel wrote:
> >> On 23/04/15 15:58, Jan Beulich wrote:
> >>>>>> On 23.04.15 at 16:43, <tim@xen.org> wrote:
> >>>> At 14:54 +0100 on 23 Apr (1429800874), Jan Beulich wrote:
> >>>>>>>> On 23.04.15 at 14:03, <tim@xen.org> wrote:
> >>>>>> At 11:11 +0100 on 21 Apr (1429614687), David Vrabel wrote:
> >>>>>>>  void _spin_unlock(spinlock_t *lock)
> >>>>>>>  {
> >>>>>>> +    smp_mb();
> >>>>>>>      preempt_enable();
> >>>>>>>      LOCK_PROFILE_REL;
> >>>>>>> -    _raw_spin_unlock(&lock->raw);
> >>>>>>> +    lock->tickets.head++;
> >>>>>>
> >>>>>> This needs to be done with an explicit atomic (though not locked)
> >>>>>> write; otherwise the compiler might use some unsuitable operation that
> >>>>>> clobbers .tail as well.
> >>>>>
> >>>>> How do you imagine that to happen? An increment of one
> >>>>> structure member surely won't modify any others.
> >>>>
> >>>> AIUI, the '++' could end up as a word-size read, modify, and word-size
> >>>> write.  If another CPU updates .tail in parallel, that update could get
> >>>> lost.
> >>>
> >>> Ah, right, compilers are allowed to do that, albeit normally wouldn't
> >>> unless the architecture has no suitable loads/stores.
> >>
> >> lock->tickets.head++;
> >>
> >>   7b:   66 83 07 01             addw   $0x1,(%rdi)
> >>
> >> write_atomic(&lock->tickets.head, lock->tickets.head + 1);
> >>
> >>   7b:   0f b7 07                movzwl (%rdi),%eax
> >>   7e:   83 c0 01                add    $0x1,%eax
> >>   81:   66 89 07                mov    %ax,(%rdi)
> > 
> > :(
> > 
> >> Do you want a new add_atomic() operation? e.g.,
> >>
> >> #define add_atomic(ptr, inc) \
> >>         asm volatile ("addw %1,%w" \
> >>             : "+m" (*(ptr)) : "ri" (inc) : "memory")
> >>
> >> (but obviously handling all the different sizes.)
> > 
> > I guess so.  An equivalent 'inc' operation would be even shorter,
> > but maybe GCC has its reasons for using addw + immediate?
> > (Ah, it's in the optimization manual: addw $1 is preferred because it
> > sets all the flags, whereas inc sets only some, so the inc has a
> > dependence on the previous instruction to set flags.)
> > 
> > It needs some careful naming -- this series will add two
> > new add operations, currently xadd() and add_atomic(), where xadd() is
> > the more atomic of the two, IYSWIM.
> 
> Should I rename xadd() to arch_fetch_and_add() to match the GCC builtin
> name?

Yes, that sounds good.  And maybe the new one could be add_sized() or
similar?  Just to avoid anyone using it when they wanted a locked op.
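
Under that naming, the split in the ticket-lock paths would be roughly
(sketch only, mirroring the hunks quoted earlier in the thread):

    /* Acquire: needs the locked fetch-and-add. */
    tickets.head_tail = arch_fetch_and_add(&lock->tickets.head_tail,
                                           tickets.head_tail);

    /* Release: only needs a correctly-sized, unlocked add. */
    add_sized(&lock->tickets.head, 1);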

Tim.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-29 23:56           ` Jan Beulich
@ 2015-04-30 10:09             ` Tim Deegan
  2015-04-30 11:55               ` David Vrabel
  0 siblings, 1 reply; 36+ messages in thread
From: Tim Deegan @ 2015-04-30 10:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	jennifer.herbert, david.vrabel, xen-devel

At 00:56 +0100 on 30 Apr (1430355366), Jan Beulich wrote:
> >>> David Vrabel <david.vrabel@citrix.com> 04/29/15 5:28 PM >>>
> >On 29/04/15 00:15, Jan Beulich wrote:
> >>>>> David Vrabel <david.vrabel@citrix.com> 04/28/15 6:16 PM >>>
> >>> Are there any structures whose size you're particularly concerned about?
> >> 
> >> No specific ones (but of course structures with an inherent size constraint
> >> - like struct domain and struct vcpu - are, with all of their sub-structures,
> >> primary candidates). I just recall that in some cases (and this may no longer
> >> apply due to later additions) structures got arranged specifically taking in
> >> mind the 2-byte size of the locks, and the growth here may thus mean a
> >> structure size growth of more than just two bytes.
> >
> >Spin locks are currently 4 bytes (2 bytes for the lock, plus 2 bytes for
> >the recurse_{cnt,cpu}), and ticket locks are now 8 bytes (an increase in
> >4 bytes).
> >
> >struct vcpu does not increase in size (there is 56, now 48 bytes, of
> >padding before the arch field).
> >
> >struct domain increases from 3480 to 3544 bytes.
> 
> Odd - this suggests there's still some 64-byte alignment somewhere, but I
> can't immediately spot it. Only struct vcpu has an obvious 64-byte aligned
> field (struct pi_desc) afaics.

I got slightly different numbers to David (I suspect just starting
from a different baseline) but the biggest hole came from struct
arch_domain being __cacheline_aligned (i.e. 128-byte aligned).

There are some holes above it in struct domain which should pack in
nicely to counteract the larger spinlocks, e.g. this was enough to
naturally align arch_domain again in my tree:

@@ -334,9 +334,9 @@ struct domain
     spinlock_t       rangesets_lock;
 
     /* Event channel information. */
+    unsigned int     max_evtchns;
     struct evtchn   *evtchn;                         /* first bucket only */
     struct evtchn  **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */
-    unsigned int     max_evtchns;
     unsigned int     max_evtchn_port;
     spinlock_t       event_lock;
     const struct evtchn_port_ops *evtchn_port_ops;

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 3/4] xen: use ticket locks for spin locks
  2015-04-30 10:09             ` Tim Deegan
@ 2015-04-30 11:55               ` David Vrabel
  0 siblings, 0 replies; 36+ messages in thread
From: David Vrabel @ 2015-04-30 11:55 UTC (permalink / raw)
  To: Tim Deegan, Jan Beulich
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	jennifer.herbert, david.vrabel, xen-devel

On 30/04/15 11:09, Tim Deegan wrote:
> At 00:56 +0100 on 30 Apr (1430355366), Jan Beulich wrote:
>>>>> David Vrabel <david.vrabel@citrix.com> 04/29/15 5:28 PM >>>
>>> On 29/04/15 00:15, Jan Beulich wrote:
>>>>>>> David Vrabel <david.vrabel@citrix.com> 04/28/15 6:16 PM >>>
>>>>> Are there any structures whose size you're particularly concerned about?
>>>>
>>>> No specific ones (but of course structures with an inherent size constraint
>>>> - like struct domain and struct vcpu - are, with all of their sub-structures,
>>>> primary candidates). I just recall that in some cases (and this may no longer
>>>> apply due to later additions) structures got arranged specifically taking in
>>>> mind the 2-byte size of the locks, and the growth here may thus mean a
>>>> structure size growth of more than just two bytes.
>>>
>>> Spin locks are currently 4 bytes (2 bytes for the lock, plus 2 bytes for
>>> the recurse_{cnt,cpu}), and ticket locks are now 8 bytes (an increase in
>>> 4 bytes).
>>>
>>> struct vcpu does not increase in size (there is 56, now 48 bytes, of
>>> padding before the arch field).
>>>
>>> struct domain increases from 3480 to 3544 bytes.
>>
>> Odd - this suggests there's still some 64-byte alignment somewhere, but I
>> can't immediately spot it. Only struct vcpu has an obvious 64-byte aligned
>> field (struct pi_desc) afaics.
> 
> I got slightly different numbers to David (I suspect just starting
> from a different baseline) but the biggest hole came from struct
> arch_domain being __cacheline_aligned (i.e. 128-byte aligned).
> 
> There are some holes above it in struct domain which should pack in
> nicely to counteract the larger spinlocks, e.g. this was enough to
> naturally align arch_domain again in my tree:
> 
> @@ -334,9 +334,9 @@ struct domain
>      spinlock_t       rangesets_lock;
>  
>      /* Event channel information. */
> +    unsigned int     max_evtchns;
>      struct evtchn   *evtchn;                         /* first bucket only */
>      struct evtchn  **evtchn_group[NR_EVTCHN_GROUPS]; /* all other buckets */
> -    unsigned int     max_evtchns;
>      unsigned int     max_evtchn_port;
>      spinlock_t       event_lock;
>      const struct evtchn_port_ops *evtchn_port_ops;

This change makes struct domain 8 bytes bigger with ticket locks.

On x86 there are 12 spinlocks in struct domain, which adds 48 bytes.
I've clawed back 16 bytes by repacking struct paging_domain and struct
hvm_domain.

David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 0/4] Use ticket locks for spinlocks
  2015-04-23 12:42   ` David Vrabel
@ 2015-04-30 15:44     ` David Vrabel
  2015-05-04  7:18       ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: David Vrabel @ 2015-04-30 15:44 UTC (permalink / raw)
  To: David Vrabel, Jan Beulich, Jennifer Herbert
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel

On 23/04/15 13:42, David Vrabel wrote:
> On 21/04/15 11:22, Jan Beulich wrote:
>>>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
>>> We have analysed the affect of this series on interrupt latency (by
>>> measuring the duration of irq disable/enable regions) and there is no
>>> signficant impact.
>>>
>>> http://xenbits.xen.org/people/dvrabel/bar2_comp.png 
>>
>> Thanks for doing this!
>>
>>> Interestingly, the biggest offenders for long critical sections are
>>> bare irq disable/enable regions, and some of these are really bad (10s
>>> of ms)!
>>>
>>> http://xenbits.xen.org/people/dvrabel/bar_normal_busy.png 
>>
>> Despite both pictures saying micro-seconds at the respective
>> axis (and hence the problem not being _as bad_)
> 
> This particular graph doesn't show it very well but there are small
> blips around 4, 10, and 31 ms.

This first set of analysis incorrectly included some bogus data.

1. The mwait idle driver has interrupts masked for the mwait instruction
(with the wake-on-interrupt bit enabled), thus the very long times were
measuring idle time.

2. do_IRQ() calls spin_unlock_irq() without a corresponding
spin_lock_irq(), so the values for this lock were incorrect.

Here's an updated graph (Thanks to Jennifer).

http://xenbits.xen.org/people/dvrabel/ticket-locks/busy_byte.png

The durations are all more reasonable now.

David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCHv3 0/4] Use ticket locks for spinlocks
  2015-04-30 15:44     ` David Vrabel
@ 2015-05-04  7:18       ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-05-04  7:18 UTC (permalink / raw)
  To: David Vrabel, Jennifer Herbert
  Cc: Keir Fraser, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel

>>> On 30.04.15 at 17:44, <david.vrabel@citrix.com> wrote:
> On 23/04/15 13:42, David Vrabel wrote:
>> On 21/04/15 11:22, Jan Beulich wrote:
>>>>>> On 21.04.15 at 12:11, <david.vrabel@citrix.com> wrote:
>>>> We have analysed the affect of this series on interrupt latency (by
>>>> measuring the duration of irq disable/enable regions) and there is no
>>>> signficant impact.
>>>>
>>>> http://xenbits.xen.org/people/dvrabel/bar2_comp.png 
>>>
>>> Thanks for doing this!
>>>
>>>> Interestingly, the biggest offenders for long critical sections are
>>>> bare irq disable/enable regions, and some of these are really bad (10s
>>>> of ms)!
>>>>
>>>> http://xenbits.xen.org/people/dvrabel/bar_normal_busy.png 
>>>
>>> Despite both pictures saying micro-seconds at the respective
>>> axis (and hence the problem not being _as bad_)
>> 
>> This particular graph doesn't show it very well but there are small
>> blips around 4, 10, and 31 ms.
> 
> This first set of analysis incorrect included some bogus data.
> 
> 1. The mwait idle driver has interrupts masked for the mwait instruction
> (with the wake-on-interrupt bit enabled), thus the very long times were
> measuring idle time.
> 
> 2. do_IRQ() calls spin_unlock_irq() without a corresponding
> spin_lock_irq() thus the values for this lock were incorrect.
> 
> Here's an updated graph (Thanks to Jennifer).
> 
> http://xenbits.xen.org/people/dvrabel/ticket-locks/busy_byte.png 
> 
> The durations are all more reasonable now.

Thanks for doing this! I think this makes clear that there's no
immediate need for any further investigation then (IRQ-disabled
sections of a couple of microseconds don't seem to be a problem for
anything other than true real-time uses).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-05-04  7:18 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-21 10:11 [PATCHv3 0/4] Use ticket locks for spinlocks David Vrabel
2015-04-21 10:11 ` [PATCHv3 1/4] x86: provide xadd() David Vrabel
2015-04-21 10:36   ` Jan Beulich
2015-04-21 12:36     ` David Vrabel
2015-04-21 13:12       ` Jan Beulich
2015-04-21 13:15         ` David Vrabel
2015-04-21 13:20           ` Jan Beulich
2015-04-21 10:11 ` [PATCHv3 2/4] arm: " David Vrabel
2015-04-21 10:11 ` [PATCHv3 3/4] xen: use ticket locks for spin locks David Vrabel
2015-04-23 11:58   ` Jan Beulich
2015-04-28 15:56     ` David Vrabel
2015-04-28 23:15       ` Jan Beulich
2015-04-29 15:21         ` David Vrabel
2015-04-29 23:56           ` Jan Beulich
2015-04-30 10:09             ` Tim Deegan
2015-04-30 11:55               ` David Vrabel
2015-04-23 12:03   ` Tim Deegan
2015-04-23 13:43     ` David Vrabel
2015-04-23 13:45       ` Andrew Cooper
2015-04-23 14:58         ` Tim Deegan
2015-04-23 14:24       ` Tim Deegan
2015-04-23 14:51         ` Tim Deegan
2015-04-23 13:54     ` Jan Beulich
2015-04-23 14:43       ` Tim Deegan
2015-04-23 14:58         ` Jan Beulich
2015-04-29 15:36           ` David Vrabel
2015-04-29 16:56             ` Tim Deegan
2015-04-29 17:00               ` David Vrabel
2015-04-30  9:00                 ` Tim Deegan
2015-04-29 23:48             ` Jan Beulich
2015-04-21 10:11 ` [PATCHv3 4/4] x86, arm: remove asm/spinlock.h from all architectures David Vrabel
2015-04-21 10:22 ` [PATCHv3 0/4] Use ticket locks for spinlocks Jan Beulich
2015-04-21 11:24   ` Jennifer Herbert
2015-04-23 12:42   ` David Vrabel
2015-04-30 15:44     ` David Vrabel
2015-05-04  7:18       ` Jan Beulich
