* [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock
@ 2017-02-01 18:26 Waiman Long
  2017-02-01 18:26 ` [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter Waiman Long
  2017-02-01 18:26 ` [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention Waiman Long
  0 siblings, 2 replies; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long

The current debug spinlock implementation is an unfair TATAS
(test-and-test-and-set) lock. This can occasionally lead to a system
lockup with a debug kernel because of the unfairness of the lock
rather than an inherent locking problem.

This patch set changes the debug spinlock implementation to a mostly
fair spinlock based on an MCS lock, similar to what is done in
qspinlock.
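
For reference (not part of the patches themselves), a TATAS lock is
essentially the following, sketched here in plain C with GCC builtins
rather than kernel primitives. All waiters spin on the same word, so
the CPU that already holds the cacheline usually wins again when the
lock is released, and a remote waiter can lose indefinitely; that is
the unfairness referred to above.

  /* Illustrative TATAS lock -- not the kernel implementation. */
  struct tatas_lock { volatile int locked; };

  static void tatas_acquire(struct tatas_lock *l)
  {
          for (;;) {
                  /* "test": spin read-only until the lock looks free */
                  while (l->locked)
                          ;
                  /* "test-and-set": try to take it atomically */
                  if (!__sync_lock_test_and_set(&l->locked, 1))
                          return;
          }
  }

  static void tatas_release(struct tatas_lock *l)
  {
          __sync_lock_release(&l->locked);
  }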

Waiman Long (2):
  locking/spinlock_debug: Reduce lockup suspected message clutter
  locking/spinlock_debug: Reduce lock cacheline contention

 include/linux/spinlock_types.h  |  8 +++--
 kernel/locking/spinlock_debug.c | 73 ++++++++++++++++++++++++++++++-----------
 2 files changed, 58 insertions(+), 23 deletions(-)

-- 
1.8.3.1


* [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter
  2017-02-01 18:26 [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock Waiman Long
@ 2017-02-01 18:26 ` Waiman Long
  2017-02-01 18:26 ` [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention Waiman Long
  1 sibling, 0 replies; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long

When the debug spinlock code detects a lockup, it prints an error
message as well as the backtraces of all the CPUs. However, if more
than one CPU is waiting on the lock, multiple lockup messages will be
printed, leading to garbled output.

To reduce clutter in the console log, only one of the lock waiters is
now allowed to print out the CPU backtraces.
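
The mechanism, distilled from the diff below, is a one-shot reporting
gate: xchg() returns the old flag value, so only the first waiter
sees 0 and does the expensive reporting; the flag is cleared again
when the lock is released.

  /* Simplified from the patch below; not a standalone function. */
  if (!xchg(&lock->lockup, 1)) {
          spin_dump(lock, "lockup suspected");
          trigger_all_cpu_backtrace();
  }
  /* lock->lockup is reset to 0 in debug_spin_unlock(). */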

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/spinlock_types.h  |  3 ++-
 kernel/locking/spinlock_debug.c | 26 +++++++++++++++++++++-----
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 73548eb..ef28ce5 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -23,7 +23,7 @@
 	unsigned int break_lock;
 #endif
 #ifdef CONFIG_DEBUG_SPINLOCK
-	unsigned int magic, owner_cpu;
+	unsigned int magic, owner_cpu, lockup;
 	void *owner;
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -45,6 +45,7 @@
 # define SPIN_DEBUG_INIT(lockname)		\
 	.magic = SPINLOCK_MAGIC,		\
 	.owner_cpu = -1,			\
+	.lockup = 0,				\
 	.owner = SPINLOCK_OWNER_INIT,
 #else
 # define SPIN_DEBUG_INIT(lockname)
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0374a59..0f880a8 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -27,6 +27,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 	lock->magic = SPINLOCK_MAGIC;
 	lock->owner = SPINLOCK_OWNER_INIT;
 	lock->owner_cpu = -1;
+	lock->lockup = 0;
 }
 
 EXPORT_SYMBOL(__raw_spin_lock_init);
@@ -101,6 +102,24 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
 							lock, "wrong CPU");
 	lock->owner = SPINLOCK_OWNER_INIT;
 	lock->owner_cpu = -1;
+	lock->lockup = 0;
+}
+
+static inline void __spin_lockup(raw_spinlock_t *lock)
+{
+	/*
+	 * lockup suspected:
+	 *
+	 * Only one of the lock waiters will be allowed to print the lockup
+	 * message in order to avoid an avalanche of lockup and backtrace
+	 * messages from different lock waiters of the same lock.
+	 */
+	if (!xchg(&lock->lockup, 1)) {
+		spin_dump(lock, "lockup suspected");
+#ifdef CONFIG_SMP
+		trigger_all_cpu_backtrace();
+#endif
+	}
 }
 
 static void __spin_lock_debug(raw_spinlock_t *lock)
@@ -113,11 +132,8 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
 			return;
 		__delay(1);
 	}
-	/* lockup suspected: */
-	spin_dump(lock, "lockup suspected");
-#ifdef CONFIG_SMP
-	trigger_all_cpu_backtrace();
-#endif
+
+	__spin_lockup(lock);
 
 	/*
 	 * The trylock above was causing a livelock.  Give the lower level arch
-- 
1.8.3.1


* [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention
  2017-02-01 18:26 [PATCH 0/2] locking/spinlock_debug: Change it to a mostly fair lock Waiman Long
  2017-02-01 18:26 ` [PATCH 1/2] locking/spinlock_debug: Reduce lockup suspected message clutter Waiman Long
@ 2017-02-01 18:26 ` Waiman Long
  2017-02-01 19:43   ` kbuild test robot
  1 sibling, 1 reply; 4+ messages in thread
From: Waiman Long @ 2017-02-01 18:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Waiman Long

The debug spinlock code is a basic unfair TATAS lock irrespective of
what the underlying architecture-specific spinlock implementation
is. As a result, it is sometimes possible to trigger a false positive
"lockup suspected" warning together with all the CPU backtraces.

This patch re-implements the debug spinlock as a fair MCS lock. This
reduces the chance of false positive warning messages. At the
same time, it also improves performance by reducing lock cacheline
contention.

Because there is a trylock before entering the MCS queue, this new
debug spinlock code also performs reasonably well in a virtual
machine even if its vCPUs are over-committed.
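
In outline, the queued slow path added by the diff below works as
follows (lockup checking omitted for brevity):

  /* Simplified restatement of __spin_lock_debug() from this patch. */
  struct mcs_spinlock node = { .next = NULL, .locked = 0 };
  struct mcs_spinlock *prev;

  prev = xchg(&lock->tail, &node);             /* join the waiter queue  */
  if (prev) {                                  /* someone queued ahead   */
          WRITE_ONCE(prev->next, &node);
          while (!READ_ONCE(node.locked))      /* spin on own cacheline  */
                  cpu_relax();
  }

  while (!arch_spin_trylock(&lock->raw_lock))  /* take the real lock     */
          cpu_relax();

  if (cmpxchg(&lock->tail, &node, NULL) != &node) {  /* hand off to next */
          while (!READ_ONCE(node.next))
                  cpu_relax();
          WRITE_ONCE(node.next->locked, 1);
  }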

On a 4-socket 32-core 64-thread system, the performance of a locking
microbenchmark (locking rate and standard deviation) on a 4.9.6-based
debug kernel with and without the patch was as follows:

  32 locking threads:

  Kernel       Locking Rate    SD (execution time)
  ------       ------------    -------------------
  w/o patch     263.1 Mop/s         1.39s
  with patch    917.6 Mop/s         0.07s

  64 locking threads:

  Kernel       Locking Rate    SD (execution time)
  ------       ------------    -------------------
  w/o patch     368.3 Mop/s         6.88s
  with patch    733.0 Mop/s         0.09s

On a 2-socket 24-core 48-thread system, the performance of the same
locking microbenchmark (# of locking threads = # of vCPUs) on a KVM
guest is as follows:

  24 vCPUs:

  Kernel       Locking Rate    SD (execution time)
  ------       ------------    -------------------
  w/o patch     746.4 Mop/s         1.07s
  with patch   1323.6 Mop/s         0.20s

  48 vCPUs:

  Kernel       Locking Rate    SD (execution time)
  ------       ------------    -------------------
  w/o patch    1077.8 Mop/s         3.34s
  with patch   1090.4 Mop/s         0.29s

  72 vCPUs:

  Kernel       Locking Rate    SD (execution time)
  ------       ------------    -------------------
  w/o patch     944.5 Mop/s         3.96s
  with patch   1176.7 Mop/s         0.44s

  96 vCPUs:

  Kernel       Locking Rate    SD (execution time)
  ------       ------------    -------------------
  w/o patch     878.0 Mop/s         5.19s
  with patch   1017.0 Mop/s         0.83s
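
The microbenchmark itself is not included in this posting; the
numbers above are consistent with N kernel threads repeatedly taking
and releasing one shared raw spinlock with an empty or near-empty
critical section. A hypothetical sketch of the per-thread loop (names
are illustrative only, not from the patch set):

  /* Hypothetical benchmark loop, assuming one kthread per locking thread.
   * Locking rate = total lock/unlock pairs divided by elapsed time;
   * "SD (execution time)" is presumably the standard deviation of the
   * measured execution times, i.e. a rough fairness indicator.
   */
  static DEFINE_RAW_SPINLOCK(bench_lock);

  static int bench_thread(void *data)
  {
          unsigned long *nr_ops = data;        /* per-thread op counter */

          while (!kthread_should_stop()) {
                  raw_spin_lock(&bench_lock);
                  /* empty critical section */
                  raw_spin_unlock(&bench_lock);
                  (*nr_ops)++;
          }
          return 0;
  }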

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/spinlock_types.h  |  5 ++--
 kernel/locking/spinlock_debug.c | 53 +++++++++++++++++++++++++++--------------
 2 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index ef28ce5..895da96 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -24,7 +24,7 @@
 #endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	unsigned int magic, owner_cpu, lockup;
-	void *owner;
+	void *owner, *tail;
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map dep_map;
@@ -46,7 +46,8 @@
 	.magic = SPINLOCK_MAGIC,		\
 	.owner_cpu = -1,			\
 	.lockup = 0,				\
-	.owner = SPINLOCK_OWNER_INIT,
+	.owner = SPINLOCK_OWNER_INIT,		\
+	.tail = NULL,
 #else
 # define SPIN_DEBUG_INIT(lockname)
 #endif
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0f880a8..c58b61f 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -12,6 +12,7 @@
 #include <linux/debug_locks.h>
 #include <linux/delay.h>
 #include <linux/export.h>
+#include "mcs_spinlock.h"
 
 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 			  struct lock_class_key *key)
@@ -26,6 +27,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 	lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 	lock->magic = SPINLOCK_MAGIC;
 	lock->owner = SPINLOCK_OWNER_INIT;
+	lock->tail = NULL;
 	lock->owner_cpu = -1;
 	lock->lockup = 0;
 }
@@ -105,7 +107,7 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
 	lock->lockup = 0;
 }
 
-static inline void __spin_lockup(raw_spinlock_t *lock)
+static inline void __spin_chk_lockup(raw_spinlock_t *lock, u64 loops)
 {
 	/*
 	 * lockup suspected:
@@ -113,37 +115,52 @@ static inline void __spin_lockup(raw_spinlock_t *lock)
 	 * Only one of the lock waiters will be allowed to print the lockup
 	 * message in order to avoid an avalanche of lockup and backtrace
 	 * messages from different lock waiters of the same lock.
+	 *
+	 * With the original __delay(1) call, a lockup can happen when both
+	 * threads of a hyperthreaded CPU core contend on the same lock. So
+	 * cpu_relax() is used here instead.
 	 */
-	if (!xchg(&lock->lockup, 1)) {
+	if (unlikely(!loops && !xchg(&lock->lockup, 1))) {
 		spin_dump(lock, "lockup suspected");
 #ifdef CONFIG_SMP
 		trigger_all_cpu_backtrace();
 #endif
 	}
+	cpu_relax();
 }
 
+/*
+ * The lock waiters are put into an MCS queue to maintain lock fairness
+ * and to avoid excessive contention on the lock cacheline. It also
+ * helps to reduce false positives that are caused by unfairness rather
+ * than by a real lockup.
+ *
+ * The trylock before entering the MCS queue makes this code perform
+ * reasonably well in a virtual machine where some of the lock waiters
+ * may have their vCPUs preempted.
+ */
 static void __spin_lock_debug(raw_spinlock_t *lock)
 {
-	u64 i;
 	u64 loops = loops_per_jiffy * HZ;
-
-	for (i = 0; i < loops; i++) {
-		if (arch_spin_trylock(&lock->raw_lock))
-			return;
-		__delay(1);
+	struct mcs_spinlock node, *prev;
+
+	node.next = NULL;
+	node.locked = 0;
+	prev = xchg(&lock->tail, &node);
+	if (prev) {
+		WRITE_ONCE(prev->next, &node);
+		while (!READ_ONCE(node.locked))
+			__spin_chk_lockup(lock, loops--);
 	}
 
-	__spin_lockup(lock);
+	while (!arch_spin_trylock(&lock->raw_lock))
+		__spin_chk_lockup(lock, loops--);
 
-	/*
-	 * The trylock above was causing a livelock.  Give the lower level arch
-	 * specific lock code a chance to acquire the lock. We have already
-	 * printed a warning/backtrace at this point. The non-debug arch
-	 * specific code might actually succeed in acquiring the lock.  If it is
-	 * not successful, the end-result is the same - there is no forward
-	 * progress.
-	 */
-	arch_spin_lock(&lock->raw_lock);
+	if (cmpxchg(&lock->tail, &node, NULL) == &node)
+		return;
+	while (!READ_ONCE(node.next))
+		cpu_relax();
+	WRITE_ONCE(node.next->locked, 1);
 }
 
 void do_raw_spin_lock(raw_spinlock_t *lock)
-- 
1.8.3.1


* Re: [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention
  2017-02-01 18:26 ` [PATCH 2/2] locking/spinlock_debug: Reduce lock cacheline contention Waiman Long
@ 2017-02-01 19:43   ` kbuild test robot
  0 siblings, 0 replies; 4+ messages in thread
From: kbuild test robot @ 2017-02-01 19:43 UTC (permalink / raw)
  To: Waiman Long
  Cc: kbuild-all, Peter Zijlstra, Ingo Molnar, linux-kernel, Waiman Long

[-- Attachment #1: Type: text/plain, Size: 3351 bytes --]

Hi Waiman,

[auto build test WARNING on tip/locking/core]
[also build test WARNING on v4.10-rc6 next-20170201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/locking-spinlock_debug-Change-it-to-a-mostly-fair-lock/20170202-024626
config: s390-default_defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

All warnings (new ones prefixed by >>):

   block/blk-cgroup.c: In function 'blkg_destroy':
>> block/blk-cgroup.c:353:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^
--
   fs/lockd/svcsubs.c: In function 'nlmsvc_mark_resources':
>> fs/lockd/svcsubs.c:376:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^

vim +353 block/blk-cgroup.c

e8989fae Tejun Heo        2012-03-05  337  	list_del_init(&blkg->q_node);
9f13ef67 Tejun Heo        2012-03-05  338  	hlist_del_init_rcu(&blkg->blkcg_node);
03aa264a Tejun Heo        2012-03-05  339  
03aa264a Tejun Heo        2012-03-05  340  	/*
a637120e Tejun Heo        2012-04-19  341  	 * Both setting lookup hint to and clearing it from @blkg are done
a637120e Tejun Heo        2012-04-19  342  	 * under queue_lock.  If it's not pointing to @blkg now, it never
a637120e Tejun Heo        2012-04-19  343  	 * will.  Hint assignment itself can race safely.
a637120e Tejun Heo        2012-04-19  344  	 */
ec6c676a Paul E. McKenney 2014-02-17  345  	if (rcu_access_pointer(blkcg->blkg_hint) == blkg)
a637120e Tejun Heo        2012-04-19  346  		rcu_assign_pointer(blkcg->blkg_hint, NULL);
a637120e Tejun Heo        2012-04-19  347  
a637120e Tejun Heo        2012-04-19  348  	/*
03aa264a Tejun Heo        2012-03-05  349  	 * Put the reference taken at the time of creation so that when all
03aa264a Tejun Heo        2012-03-05  350  	 * queues are gone, group can be destroyed.
03aa264a Tejun Heo        2012-03-05  351  	 */
03aa264a Tejun Heo        2012-03-05  352  	blkg_put(blkg);
03aa264a Tejun Heo        2012-03-05 @353  }
03aa264a Tejun Heo        2012-03-05  354  
9f13ef67 Tejun Heo        2012-03-05  355  /**
9f13ef67 Tejun Heo        2012-03-05  356   * blkg_destroy_all - destroy all blkgs associated with a request_queue
9f13ef67 Tejun Heo        2012-03-05  357   * @q: request_queue of interest
9f13ef67 Tejun Heo        2012-03-05  358   *
3c96cb32 Tejun Heo        2012-04-13  359   * Destroy all blkgs associated with @q.
9f13ef67 Tejun Heo        2012-03-05  360   */
3c96cb32 Tejun Heo        2012-04-13  361  static void blkg_destroy_all(struct request_queue *q)

:::::: The code at line 353 was first introduced by commit
:::::: 03aa264ac15637b6f98374270bcdf31400965505 blkcg: let blkcg core manage per-queue blkg list and counter

:::::: TO: Tejun Heo <tj@kernel.org>
:::::: CC: Jens Axboe <axboe@kernel.dk>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 16710 bytes --]


