* [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock
@ 2017-02-01 18:26 Waiman Long
2017-02-01 18:26 ` [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter Waiman Long
2017-02-01 18:26 ` [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention Waiman Long
0 siblings, 2 replies; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long
The current debug spinlock implementation is an unfair TATAS
(test-and-test-and-set) lock. This can occasionally lead to system
lockup with a debug kernel because of the unfairness of the lock
rather than an inherent locking problem.
This patch set changes the debug spinlock implementation to a
mostly fair spinlock based on the MCS lock, similar to what is done
in qspinlock.
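For context, a minimal userspace sketch of the TATAS locking pattern the cover letter refers to is shown below, using C11 atomics. The names and structure are illustrative only, not the kernel's code; the point is that whichever spinning waiter's exchange happens to land first wins, with no queueing, which is the source of the unfairness.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct tatas_lock { atomic_bool locked; };

static bool tatas_trylock(struct tatas_lock *l)
{
	/* "Test" first to avoid bouncing the cacheline while the lock is
	 * held, then "test-and-set" with an atomic exchange. */
	return !atomic_load_explicit(&l->locked, memory_order_relaxed) &&
	       !atomic_exchange_explicit(&l->locked, true,
					 memory_order_acquire);
}

static void tatas_lock_acquire(struct tatas_lock *l)
{
	while (!tatas_trylock(l))
		;	/* spin; no ordering among waiters => unfair */
}

static void tatas_unlock(struct tatas_lock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Under heavy contention a waiter can lose the exchange race indefinitely, which is how a debug kernel can report a "lockup" without any real deadlock.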
Waiman Long (2):
locking/spinlock_debug: Reduce lockup suspected message clutter
locking/spinlock_debug: Reduce lock cacheline contention
include/linux/spinlock_types.h | 8 +++--
kernel/locking/spinlock_debug.c | 73 ++++++++++++++++++++++++++++++-----------
2 files changed, 58 insertions(+), 23 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter
2017-02-01 18:26 [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock Waiman Long
@ 2017-02-01 18:26 ` Waiman Long
2017-02-01 18:26 ` [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention Waiman Long
1 sibling, 0 replies; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long
When the debug spinlock code detects a lockup, it will print out an
error message as well as the backtraces of all the CPUs. However, if
more than one CPU is waiting on that lock, multiple lockup messages
will be printed, leading to garbled output.
To reduce clutter in the console log, only one of the lock waiters
is now allowed to print out the CPU backtraces.
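The "only one waiter reports" idea rests on the fact that xchg() returns the previous value of the flag, so exactly one caller observes the 0-to-1 transition. A hedged userspace sketch of that pattern (illustrative names, not the kernel code):

```c
#include <stdatomic.h>
#include <stdio.h>

static atomic_int lockup;	/* 0 = lockup not yet reported */
static int dumps;		/* counts how often we actually dumped */

static void report_lockup(void)
{
	/* atomic_exchange returns the old value, like the kernel's
	 * xchg(); only the first caller sees 0 and does the reporting. */
	if (atomic_exchange(&lockup, 1) == 0) {
		dumps++;	/* stands in for spin_dump() + backtraces */
		fprintf(stderr, "BUG: spinlock lockup suspected\n");
	}
}
```

In the patch the flag is cleared again in debug_spin_unlock(), so a later, genuinely separate lockup on the same lock can still be reported.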
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/spinlock_types.h | 3 ++-
kernel/locking/spinlock_debug.c | 26 +++++++++++++++++++++-----
2 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 73548eb..ef28ce5 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -23,7 +23,7 @@
unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
- unsigned int magic, owner_cpu;
+ unsigned int magic, owner_cpu, lockup;
void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -45,6 +45,7 @@
# define SPIN_DEBUG_INIT(lockname) \
.magic = SPINLOCK_MAGIC, \
.owner_cpu = -1, \
+ .lockup = 0, \
.owner = SPINLOCK_OWNER_INIT,
#else
# define SPIN_DEBUG_INIT(lockname)
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0374a59..0f880a8 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -27,6 +27,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
lock->magic = SPINLOCK_MAGIC;
lock->owner = SPINLOCK_OWNER_INIT;
lock->owner_cpu = -1;
+ lock->lockup = 0;
}
EXPORT_SYMBOL(__raw_spin_lock_init);
@@ -101,6 +102,24 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
lock, "wrong CPU");
lock->owner = SPINLOCK_OWNER_INIT;
lock->owner_cpu = -1;
+ lock->lockup = 0;
+}
+
+static inline void __spin_lockup(raw_spinlock_t *lock)
+{
+ /*
+ * lockup suspected:
+ *
+ * Only one of the lock waiters will be allowed to print the lockup
+ * message in order to avoid an avalanche of lockup and backtrace
+ * messages from different lock waiters of the same lock.
+ */
+ if (!xchg(&lock->lockup, 1)) {
+ spin_dump(lock, "lockup suspected");
+#ifdef CONFIG_SMP
+ trigger_all_cpu_backtrace();
+#endif
+ }
}
static void __spin_lock_debug(raw_spinlock_t *lock)
@@ -113,11 +132,8 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
return;
__delay(1);
}
- /* lockup suspected: */
- spin_dump(lock, "lockup suspected");
-#ifdef CONFIG_SMP
- trigger_all_cpu_backtrace();
-#endif
+
+ __spin_lockup(lock);
/*
* The trylock above was causing a livelock. Give the lower level arch
--
1.8.3.1
* [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention
2017-02-01 18:26 [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock Waiman Long
2017-02-01 18:26 ` [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter Waiman Long
@ 2017-02-01 18:26 ` Waiman Long
2017-02-01 19:43 ` kbuild test robot
1 sibling, 1 reply; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long
The debug spinlock code is a basic unfair TATAS lock irrespective
of the underlying architecture-specific spinlock implementation. As
a result, it is sometimes possible to trigger a false positive
"lockup suspected" warning complete with all the CPU backtraces.
This patch re-implements the debug spinlock as a fair MCS lock. This
reduces the chance of false positive warning messages. At the
same time, it also improves performance by reducing lock cacheline
contention.
Because there is a trylock before entering the MCS queue, this new
debug spinlock code also performs reasonably well in a virtual
machine even if its vCPUs are over-committed.
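For readers unfamiliar with MCS locks, a rough userspace sketch of the queueing mechanism the patch borrows is shown below in C11 atomics. Names and details are illustrative, not the kernel's mcs_spinlock code; the key property is that each waiter spins on a flag in its own queue node, so the lock cacheline is not hammered and handoff is FIFO.

```c
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
	struct mcs_node *_Atomic next;
	atomic_int locked;	/* 0 = still waiting, 1 = lock handed over */
};

struct mcs_lock { struct mcs_node *_Atomic tail; };

static void mcs_acquire(struct mcs_lock *l, struct mcs_node *node)
{
	struct mcs_node *prev;

	atomic_store(&node->next, NULL);
	atomic_store(&node->locked, 0);
	/* Swing the tail to ourselves; whoever was there before is our
	 * predecessor and will hand the lock to us in FIFO order. */
	prev = atomic_exchange(&l->tail, node);
	if (prev) {
		atomic_store(&prev->next, node);
		while (!atomic_load(&node->locked))
			;	/* spin on our own node's cacheline only */
	}
}

static void mcs_release(struct mcs_lock *l, struct mcs_node *node)
{
	struct mcs_node *expected = node, *succ;

	/* No successor? Try to reset the tail; if that fails, a new
	 * waiter is mid-enqueue, so wait for it to link itself in. */
	if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
		return;
	while (!(succ = atomic_load(&node->next)))
		;
	atomic_store(&succ->locked, 1);
}
```

The patch keeps an arch_spin_trylock() in front of this queue, which is what lets a waiter whose predecessor's vCPU is preempted still make progress in a guest.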
On a 4-socket 32-core 64-thread system, the performance of a locking
microbenchmark (locking rate and standard deviation) on a 4.9.6 based
debug kernel with and without the patch was as follows:
32 locking threads:
Kernel Locking Rate SD (execution time)
------ ------------ -------------------
w/o patch 263.1 Mop/s 1.39s
with patch 917.6 Mop/s 0.07s
64 locking threads:
Kernel Locking Rate SD (execution time)
------ ------------ -------------------
w/o patch 368.3 Mop/s 6.88s
with patch 733.0 Mop/s 0.09s
On a 2-socket 24-core 48-thread system, the performance of the same
locking microbenchmark (# of locking threads = # of vCPUs) on a KVM
guest is as follows:
24 vCPUs:
Kernel Locking Rate SD (execution time)
------ ------------ -------------------
w/o patch 746.4 Mop/s 1.07s
with patch 1323.6 Mop/s 0.20s
48 vCPUs:
Kernel Locking Rate SD (execution time)
------ ------------ -------------------
w/o patch 1077.8 Mop/s 3.34s
with patch 1090.4 Mop/s 0.29s
72 vCPUs:
Kernel Locking Rate SD (execution time)
------ ------------ -------------------
w/o patch 944.5 Mop/s 3.96s
with patch 1176.7 Mop/s 0.44s
96 vCPUs:
Kernel Locking Rate SD (execution time)
------ ------------ -------------------
w/o patch 878.0 Mop/s 5.19s
with patch 1017.0 Mop/s 0.83s
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/spinlock_types.h | 5 ++--
kernel/locking/spinlock_debug.c | 53 +++++++++++++++++++++++++++--------------
2 files changed, 38 insertions(+), 20 deletions(-)
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index ef28ce5..895da96 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -24,7 +24,7 @@
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu, lockup;
- void *owner;
+ void *owner, *tail;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
@@ -46,7 +46,8 @@
.magic = SPINLOCK_MAGIC, \
.owner_cpu = -1, \
.lockup = 0, \
- .owner = SPINLOCK_OWNER_INIT,
+ .owner = SPINLOCK_OWNER_INIT, \
+ .tail = NULL,
#else
# define SPIN_DEBUG_INIT(lockname)
#endif
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0f880a8..c58b61f 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -12,6 +12,7 @@
#include <linux/debug_locks.h>
#include <linux/delay.h>
#include <linux/export.h>
+#include "mcs_spinlock.h"
void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
struct lock_class_key *key)
@@ -26,6 +27,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
lock->magic = SPINLOCK_MAGIC;
lock->owner = SPINLOCK_OWNER_INIT;
+ lock->tail = NULL;
lock->owner_cpu = -1;
lock->lockup = 0;
}
@@ -105,7 +107,7 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
lock->lockup = 0;
}
-static inline void __spin_lockup(raw_spinlock_t *lock)
+static inline void __spin_chk_lockup(raw_spinlock_t *lock, u64 loops)
{
/*
* lockup suspected:
@@ -113,37 +115,52 @@ static inline void __spin_lockup(raw_spinlock_t *lock)
* Only one of the lock waiters will be allowed to print the lockup
* message in order to avoid an avalanche of lockup and backtrace
* messages from different lock waiters of the same lock.
+ *
+ * With the original __delay(1) call, lockup can happen when both
+ * threads of a hyperthreaded CPU core contend on the same lock. So
+ * cpu_relax() is used here instead.
*/
- if (!xchg(&lock->lockup, 1)) {
+ if (unlikely(!loops && !xchg(&lock->lockup, 1))) {
spin_dump(lock, "lockup suspected");
#ifdef CONFIG_SMP
trigger_all_cpu_backtrace();
#endif
}
+ cpu_relax();
}
+/*
+ * The lock waiters are put into an MCS queue to maintain lock fairness
+ * as well as to avoid excessive contention on the lock cacheline. It
+ * also helps to reduce false positives caused by unfairness rather
+ * than a real lockup.
+ *
+ * The trylock before entering the MCS queue makes this code perform
+ * reasonably well in a virtual machine where some of the lock waiters
+ * may have their vCPUs preempted.
+ */
static void __spin_lock_debug(raw_spinlock_t *lock)
{
- u64 i;
u64 loops = loops_per_jiffy * HZ;
-
- for (i = 0; i < loops; i++) {
- if (arch_spin_trylock(&lock->raw_lock))
- return;
- __delay(1);
+ struct mcs_spinlock node, *prev;
+
+ node.next = NULL;
+ node.locked = 0;
+ prev = xchg(&lock->tail, &node);
+ if (prev) {
+ WRITE_ONCE(prev->next, &node);
+ while (!READ_ONCE(node.locked))
+ __spin_chk_lockup(lock, loops--);
}
- __spin_lockup(lock);
+ while (!arch_spin_trylock(&lock->raw_lock))
+ __spin_chk_lockup(lock, loops--);
- /*
- * The trylock above was causing a livelock. Give the lower level arch
- * specific lock code a chance to acquire the lock. We have already
- * printed a warning/backtrace at this point. The non-debug arch
- * specific code might actually succeed in acquiring the lock. If it is
- * not successful, the end-result is the same - there is no forward
- * progress.
- */
- arch_spin_lock(&lock->raw_lock);
+ if (cmpxchg(&lock->tail, &node, NULL) == &node)
+ return;
+ while (!READ_ONCE(node.next))
+ cpu_relax();
+ WRITE_ONCE(node.next->locked, 1);
}
void do_raw_spin_lock(raw_spinlock_t *lock)
--
1.8.3.1
* Re: [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention
2017-02-01 18:26 ` [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention Waiman Long
@ 2017-02-01 19:43 ` kbuild test robot
0 siblings, 0 replies; 4+ messages in thread
From: kbuild test robot @ 2017-02-01 19:43 UTC (permalink / raw)
To: Waiman Long
Cc: kbuild-all, Peter Zijlstra, Ingo Molnar, linux-kernel, Waiman Long
Hi Waiman,
[auto build test WARNING on tip/locking/core]
[also build test WARNING on v4.10-rc6 next-20170201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Waiman-Long/locking-spinlock_debug-Change-it-to-a-mostly-fair-lock/20170202-024626
config: s390-default_defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=s390
All warnings (new ones prefixed by >>):
block/blk-cgroup.c: In function 'blkg_destroy':
>> block/blk-cgroup.c:353:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^
--
fs/lockd/svcsubs.c: In function 'nlmsvc_mark_resources':
>> fs/lockd/svcsubs.c:376:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^
vim +353 block/blk-cgroup.c
e8989fae Tejun Heo 2012-03-05 337 list_del_init(&blkg->q_node);
9f13ef67 Tejun Heo 2012-03-05 338 hlist_del_init_rcu(&blkg->blkcg_node);
03aa264a Tejun Heo 2012-03-05 339
03aa264a Tejun Heo 2012-03-05 340 /*
a637120e Tejun Heo 2012-04-19 341 * Both setting lookup hint to and clearing it from @blkg are done
a637120e Tejun Heo 2012-04-19 342 * under queue_lock. If it's not pointing to @blkg now, it never
a637120e Tejun Heo 2012-04-19 343 * will. Hint assignment itself can race safely.
a637120e Tejun Heo 2012-04-19 344 */
ec6c676a Paul E. McKenney 2014-02-17 345 if (rcu_access_pointer(blkcg->blkg_hint) == blkg)
a637120e Tejun Heo 2012-04-19 346 rcu_assign_pointer(blkcg->blkg_hint, NULL);
a637120e Tejun Heo 2012-04-19 347
a637120e Tejun Heo 2012-04-19 348 /*
03aa264a Tejun Heo 2012-03-05 349 * Put the reference taken at the time of creation so that when all
03aa264a Tejun Heo 2012-03-05 350 * queues are gone, group can be destroyed.
03aa264a Tejun Heo 2012-03-05 351 */
03aa264a Tejun Heo 2012-03-05 352 blkg_put(blkg);
03aa264a Tejun Heo 2012-03-05 @353 }
03aa264a Tejun Heo 2012-03-05 354
9f13ef67 Tejun Heo 2012-03-05 355 /**
9f13ef67 Tejun Heo 2012-03-05 356 * blkg_destroy_all - destroy all blkgs associated with a request_queue
9f13ef67 Tejun Heo 2012-03-05 357 * @q: request_queue of interest
9f13ef67 Tejun Heo 2012-03-05 358 *
3c96cb32 Tejun Heo 2012-04-13 359 * Destroy all blkgs associated with @q.
9f13ef67 Tejun Heo 2012-03-05 360 */
3c96cb32 Tejun Heo 2012-04-13 361 static void blkg_destroy_all(struct request_queue *q)
:::::: The code at line 353 was first introduced by commit
:::::: 03aa264ac15637b6f98374270bcdf31400965505 blkcg: let blkcg core manage per-queue blkg list and counter
:::::: TO: Tejun Heo <tj@kernel.org>
:::::: CC: Jens Axboe <axboe@kernel.dk>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 16710 bytes --]