* [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter
2017-02-01 18:26 [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock Waiman Long
@ 2017-02-01 18:26 ` Waiman Long
2017-02-01 18:26 ` [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention Waiman Long
1 sibling, 0 replies; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long
When the debug spinlock code detects a lockup, it prints an error
message as well as the backtraces of all the CPUs. However, if more
than one CPU is waiting on that lock, multiple lockup messages will be
printed, leading to garbled output.

To reduce clutter in the console log, only one of the lock waiters is
now allowed to print the lockup message and the CPU backtraces.
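
The gate is just an atomic test-and-set on a new per-lock flag: every
waiter that times out tries to claim it, but only the first one
succeeds and prints. A minimal userspace sketch of the same pattern,
with C11 atomics and pthreads standing in for the kernel's xchg()
(an illustration only, not the patched kernel code):

#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

static atomic_int lockup;	/* plays the role of lock->lockup */

static void *waiter(void *arg)
{
	/*
	 * Every thread believes it detected a lockup, but only the one
	 * that flips the flag from 0 to 1 gets to print.
	 */
	if (!atomic_exchange(&lockup, 1))
		printf("thread %ld: lockup suspected\n", (long)arg);
	return NULL;
}

int main(void)
{
	pthread_t tid[8];
	long i;

	for (i = 0; i < 8; i++)
		pthread_create(&tid[i], NULL, waiter, (void *)i);
	for (i = 0; i < 8; i++)
		pthread_join(tid[i], NULL);
	return 0;	/* exactly one "lockup suspected" line is printed */
}

In the patch itself the flag lives in the raw_spinlock_t and is
cleared again in debug_spin_unlock(), so a later stall on the same
lock can still be reported.
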
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/spinlock_types.h | 3 ++-
kernel/locking/spinlock_debug.c | 26 +++++++++++++++++++++-----
2 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 73548eb..ef28ce5 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -23,7 +23,7 @@
unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
- unsigned int magic, owner_cpu;
+ unsigned int magic, owner_cpu, lockup;
void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -45,6 +45,7 @@
# define SPIN_DEBUG_INIT(lockname) \
.magic = SPINLOCK_MAGIC, \
.owner_cpu = -1, \
+ .lockup = 0, \
.owner = SPINLOCK_OWNER_INIT,
#else
# define SPIN_DEBUG_INIT(lockname)
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0374a59..0f880a8 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -27,6 +27,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
lock->magic = SPINLOCK_MAGIC;
lock->owner = SPINLOCK_OWNER_INIT;
lock->owner_cpu = -1;
+ lock->lockup = 0;
}
EXPORT_SYMBOL(__raw_spin_lock_init);
@@ -101,6 +102,24 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
lock, "wrong CPU");
lock->owner = SPINLOCK_OWNER_INIT;
lock->owner_cpu = -1;
+ lock->lockup = 0;
+}
+
+static inline void __spin_lockup(raw_spinlock_t *lock)
+{
+ /*
+ * lockup suspected:
+ *
+ * Only one of the lock waiters will be allowed to print the lockup
+ * message in order to avoid an avalanche of lockup and backtrace
+ * messages from different lock waiters of the same lock.
+ */
+ if (!xchg(&lock->lockup, 1)) {
+ spin_dump(lock, "lockup suspected");
+#ifdef CONFIG_SMP
+ trigger_all_cpu_backtrace();
+#endif
+ }
}
static void __spin_lock_debug(raw_spinlock_t *lock)
@@ -113,11 +132,8 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
return;
__delay(1);
}
- /* lockup suspected: */
- spin_dump(lock, "lockup suspected");
-#ifdef CONFIG_SMP
- trigger_all_cpu_backtrace();
-#endif
+
+ __spin_lockup(lock);
/*
* The trylock above was causing a livelock. Give the lower level arch
--
1.8.3.1
* [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention
2017-02-01 18:26 [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock Waiman Long
2017-02-01 18:26 ` [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter Waiman Long
@ 2017-02-01 18:26 ` Waiman Long
2017-02-01 19:43 ` kbuild test robot
1 sibling, 1 reply; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long
The debug spinlock code is a basic TATAS (test-and-test-and-set)
unfair lock, irrespective of what the underlying architecture-specific
spinlock implementation is. As a result, it is sometimes possible to
trigger a false positive "lockup suspected" warning complete with all
the CPU backtraces.

This patch re-implements the debug spinlock as a fair MCS lock. This
reduces the chance of false positive warning messages. At the same
time, it also improves performance by reducing lock cacheline
contention.

Because there is a trylock before entering the MCS queue, this new
debug spinlock code also performs reasonably well in a virtual machine
even if its vCPUs are over-committed.
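
The queueing itself follows the usual MCS scheme: each waiter spins on
a flag in its own queue node, and the node ahead of it hands the head
position over when it is done. A rough userspace analogue of the
structure used here, with C11 atomics and a test-and-set flag standing
in for arch_spin_trylock() (a sketch of the idea only, not the kernel
implementation):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
	struct mcs_node *_Atomic next;
	atomic_bool locked;			/* set by the predecessor on handoff */
};

struct debug_lock {
	atomic_flag raw;			/* stands in for the arch spinlock */
	struct mcs_node *_Atomic tail;		/* MCS queue tail, NULL when empty */
};

/* Usage: struct debug_lock dl = { .raw = ATOMIC_FLAG_INIT, .tail = NULL }; */

static void debug_lock_acquire(struct debug_lock *l, struct mcs_node *node)
{
	struct mcs_node *prev, *next, *expected;

	atomic_store(&node->next, NULL);
	atomic_store(&node->locked, false);

	/* Join the queue; if there was a predecessor, wait for its handoff. */
	prev = atomic_exchange(&l->tail, node);
	if (prev) {
		atomic_store(&prev->next, node);
		while (!atomic_load(&node->locked))
			;	/* the patch checks for lockups while spinning here */
	}

	/* At the head of the queue: now take the underlying lock. */
	while (atomic_flag_test_and_set(&l->raw))
		;		/* and here */

	/* Leave the queue, passing the head position to a successor, if any. */
	expected = node;
	if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
		return;		/* nobody queued behind us */
	while (!(next = atomic_load(&node->next)))
		;		/* successor is still linking itself in */
	atomic_store(&next->locked, true);
}

static void debug_lock_release(struct debug_lock *l)
{
	atomic_flag_clear(&l->raw);	/* queue handoff already happened in acquire */
}

The handoff at the end mirrors what __spin_lock_debug() does after it
has taken the raw lock: detach if nobody else is queued, otherwise
wake the next node so that waiters reach the trylock loop in FIFO
order.
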
On a 4-socket 32-core 64-thread system, the performance of a locking
microbenchmark (locking rate and standard deviation) on a 4.9.6-based
debug kernel with and without the patch was as follows:
32 locking threads:

  Kernel        Locking Rate    SD (execution time)
  ------        ------------    -------------------
  w/o patch      263.1 Mop/s    1.39s
  with patch     917.6 Mop/s    0.07s

64 locking threads:

  Kernel        Locking Rate    SD (execution time)
  ------        ------------    -------------------
  w/o patch      368.3 Mop/s    6.88s
  with patch     733.0 Mop/s    0.09s
On a 2-socket 24-core 48-thread system, the performance of the same
locking microbenchmark (# of locking threads = # of vCPUs) in a KVM
guest is as follows:
24 vCPUs:

  Kernel        Locking Rate    SD (execution time)
  ------        ------------    -------------------
  w/o patch      746.4 Mop/s    1.07s
  with patch    1323.6 Mop/s    0.20s

48 vCPUs:

  Kernel        Locking Rate    SD (execution time)
  ------        ------------    -------------------
  w/o patch     1077.8 Mop/s    3.34s
  with patch    1090.4 Mop/s    0.29s

72 vCPUs:

  Kernel        Locking Rate    SD (execution time)
  ------        ------------    -------------------
  w/o patch      944.5 Mop/s    3.96s
  with patch    1176.7 Mop/s    0.44s

96 vCPUs:

  Kernel        Locking Rate    SD (execution time)
  ------        ------------    -------------------
  w/o patch      878.0 Mop/s    5.19s
  with patch    1017.0 Mop/s    0.83s
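
The microbenchmark itself is not part of this posting. Purely as a
hypothetical illustration of how a locking-rate figure of this kind
can be obtained (fixed run time, per-thread lock/unlock loop, total
operations divided by elapsed time), a userspace equivalent built on
pthread spinlocks might look like the following; it is not the
benchmark used for the numbers above and does not compute the SD
column:

#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>
#include <unistd.h>

#define NTHREADS	32
#define RUNTIME_SEC	10

static pthread_spinlock_t lock;
static atomic_int stop;
static unsigned long counts[NTHREADS];

static void *worker(void *arg)
{
	long id = (long)arg;
	unsigned long n = 0;

	while (!atomic_load(&stop)) {
		pthread_spin_lock(&lock);
		/* trivial critical section */
		pthread_spin_unlock(&lock);
		n++;
	}
	counts[id] = n;
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	unsigned long total = 0;
	long i;

	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	sleep(RUNTIME_SEC);
	atomic_store(&stop, 1);
	for (i = 0; i < NTHREADS; i++) {
		pthread_join(tid[i], NULL);
		total += counts[i];
	}
	printf("%.1f Mop/s\n", total / (double)RUNTIME_SEC / 1e6);
	return 0;
}
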
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/spinlock_types.h | 5 ++--
kernel/locking/spinlock_debug.c | 53 +++++++++++++++++++++++++++--------------
2 files changed, 38 insertions(+), 20 deletions(-)
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index ef28ce5..895da96 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -24,7 +24,7 @@
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu, lockup;
- void *owner;
+ void *owner, *tail;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
@@ -46,7 +46,8 @@
.magic = SPINLOCK_MAGIC, \
.owner_cpu = -1, \
.lockup = 0, \
- .owner = SPINLOCK_OWNER_INIT,
+ .owner = SPINLOCK_OWNER_INIT, \
+ .tail = NULL,
#else
# define SPIN_DEBUG_INIT(lockname)
#endif
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0f880a8..c58b61f 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -12,6 +12,7 @@
#include <linux/debug_locks.h>
#include <linux/delay.h>
#include <linux/export.h>
+#include "mcs_spinlock.h"
void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
struct lock_class_key *key)
@@ -26,6 +27,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
lock->magic = SPINLOCK_MAGIC;
lock->owner = SPINLOCK_OWNER_INIT;
+ lock->tail = NULL;
lock->owner_cpu = -1;
lock->lockup = 0;
}
@@ -105,7 +107,7 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
lock->lockup = 0;
}
-static inline void __spin_lockup(raw_spinlock_t *lock)
+static inline void __spin_chk_lockup(raw_spinlock_t *lock, u64 loops)
{
/*
* lockup suspected:
@@ -113,37 +115,52 @@ static inline void __spin_lockup(raw_spinlock_t *lock)
* Only one of the lock waiters will be allowed to print the lockup
* message in order to avoid an avalanche of lockup and backtrace
* messages from different lock waiters of the same lock.
+ *
+ * With the original __delay(1) call, a lockup can happen when both
+ * threads of a hyperthreaded CPU core contend on the same lock. So
+ * cpu_relax() is used here instead.
*/
- if (!xchg(&lock->lockup, 1)) {
+ if (unlikely(!loops && !xchg(&lock->lockup, 1))) {
spin_dump(lock, "lockup suspected");
#ifdef CONFIG_SMP
trigger_all_cpu_backtrace();
#endif
}
+ cpu_relax();
}
+/*
+ * The lock waiters are put into an MCS queue to maintain lock fairness
+ * as well as to avoid excessive contention on the lock cacheline. It
+ * also helps to reduce false positives caused by unfairness rather
+ * than a real lockup.
+ *
+ * The trylock before entering the MCS queue makes this code perform
+ * reasonably well in a virtual machine where some of the lock waiters
+ * may have their vCPUs preempted.
+ */
static void __spin_lock_debug(raw_spinlock_t *lock)
{
- u64 i;
u64 loops = loops_per_jiffy * HZ;
-
- for (i = 0; i < loops; i++) {
- if (arch_spin_trylock(&lock->raw_lock))
- return;
- __delay(1);
+ struct mcs_spinlock node, *prev;
+
+ node.next = NULL;
+ node.locked = 0;
+ prev = xchg(&lock->tail, &node);
+ if (prev) {
+ WRITE_ONCE(prev->next, &node);
+ while (!READ_ONCE(node.locked))
+ __spin_chk_lockup(lock, loops--);
}
- __spin_lockup(lock);
+ while (!arch_spin_trylock(&lock->raw_lock))
+ __spin_chk_lockup(lock, loops--);
- /*
- * The trylock above was causing a livelock. Give the lower level arch
- * specific lock code a chance to acquire the lock. We have already
- * printed a warning/backtrace at this point. The non-debug arch
- * specific code might actually succeed in acquiring the lock. If it is
- * not successful, the end-result is the same - there is no forward
- * progress.
- */
- arch_spin_lock(&lock->raw_lock);
+ if (cmpxchg(&lock->tail, &node, NULL) == &node)
+ return;
+ while (!READ_ONCE(node.next))
+ cpu_relax();
+ WRITE_ONCE(node.next->locked, 1);
}
void do_raw_spin_lock(raw_spinlock_t *lock)
--
1.8.3.1