linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v6 0/5] Add NUMA-awareness to qspinlock
@ 2019-11-07 17:46 Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 1/5] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic Alex Kogan
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Alex Kogan @ 2019-11-07 17:46 UTC (permalink / raw)
  To: linux, peterz, mingo, will.deacon, arnd, longman, linux-arch,
	linux-arm-kernel, linux-kernel, tglx, bp, hpa, x86, guohanjun,
	jglauber
  Cc: alex.kogan, dave.dice, rahul.x.yadav, steven.sistare, daniel.m.jordan

Minor changes from v5, mainly based on feedback from Longman:
-------------------------------------------------------------

- Make the intra-node handoff threshold a configurable parameter
via the new kernel boot command-line option "numa_spinlock_threshold".

- Add documentation of new command-line options in kernel-parameters.txt.

- Small fix in cna_try_change_tail() (use cmpxchg_relaxed()).

- Small fix in cna_init_nodes() (return 0).

- Minor changes in cna_pass_lock(). 


Summary
-------

Lock throughput can be increased by handing a lock to a waiter on the
same NUMA node as the lock holder, provided care is taken to avoid
starvation of waiters on other NUMA nodes. This patch introduces CNA
(compact NUMA-aware lock) as the slow path for qspinlock. It is
enabled through a configuration option (NUMA_AWARE_SPINLOCKS).

CNA is a NUMA-aware version of the MCS lock. Spinning threads are
organized in two queues, a main queue for threads running on the same
node as the current lock holder, and a secondary queue for threads
running on other nodes. Threads store the ID of the node on which
they are running in their queue nodes. After acquiring the MCS lock and
before acquiring the spinlock, the lock holder scans the main queue
looking for a thread running on the same node (pre-scan). If found (call
it thread T), all threads in the main queue between the current lock
holder and T are moved to the end of the secondary queue. If no such T
is found, we make another scan of the main queue when unlocking the MCS
lock (post-scan), after the spinlock has been acquired, starting at the
node where the pre-scan stopped. If both scans fail to find such a T, the
MCS lock is passed to the first thread in the secondary queue. If the
secondary queue is empty, the MCS lock is passed to the next thread in the
main queue. To avoid starvation of threads in the secondary queue, those
threads are moved back to the head of the main queue after a certain
number of intra-node lock hand-offs.

More details are available at https://arxiv.org/abs/1810.05600.
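
To make the handoff logic easier to follow, here is a simplified sketch of
the decision made when passing the MCS lock. The names below are
illustrative only; the actual implementation is cna_pass_lock() (and
cna_try_change_tail()) in patch 3 of this series:

        /*
         * Illustrative sketch of the CNA handoff decision; not the real code.
         */
        if (same_node_successor_found) {        /* by pre-scan or post-scan */
                /* keep the lock on this node */
                pass_mcs_lock(same_node_successor);
        } else if (!secondary_queue_empty) {
                /*
                 * Splice the secondary queue back into the main queue and
                 * hand the lock off across nodes to its first waiter.
                 */
                splice_secondary_queue_onto_main();
                pass_mcs_lock(former_secondary_queue_head);
        } else {
                /* fall back to plain MCS behavior */
                pass_mcs_lock(next_main_queue_waiter);
        }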

We have done some performance evaluation with the locktorture module
as well as with several benchmarks from the will-it-scale repo.
The following locktorture results are from an Oracle X5-4 server
(four Intel Xeon E7-8895 v3 @ 2.60GHz sockets with 18 hyperthreaded
cores each). Each number represents an average (over 25 runs) of the
total number of ops (x10^7) reported at the end of each run. The
standard deviation is also reported in parentheses, and is generally
about 3% of the average. The 'stock' kernel is v5.4.0-rc5,
commit 7c5e136a02ba, compiled in the default configuration.
'patch-CNA' is the modified kernel with NUMA_AWARE_SPINLOCKS set;
the speedup is calculated by dividing 'patch-CNA' by 'stock'.

#thr  	 stock        patch-CNA   speedup (patch-CNA/stock)
  1  2.726 (0.107)  2.729 (0.096)  1.001
  2  2.656 (0.113)  2.666 (0.116)  1.004
  4  4.147 (0.085)  4.255 (0.135)  1.026
  8  5.388 (0.146)  6.642 (0.155)  1.233
 16  6.688 (0.152)  8.035 (0.162)  1.202
 32  7.389 (0.203)  8.751 (0.192)  1.184
 36  7.420 (0.179)  8.818 (0.173)  1.188
 72  6.489 (0.122)  9.403 (0.252)  1.449
108  6.163 (0.078)  9.504 (0.177)  1.542
142  5.736 (0.105)  9.371 (0.181)  1.634

The following tables contain throughput results (ops/us) from the same
setup for will-it-scale/open1_threads: 

#thr  	 stock        patch-CNA   speedup (patch-CNA/stock)
  1  0.533 (0.001)  0.534 (0.002)  1.003
  2  0.787 (0.020)  0.801 (0.022)  1.017
  4  1.418 (0.031)  1.421 (0.022)  1.002
  8  1.745 (0.112)  1.736 (0.104)  0.995
 16  1.779 (0.104)  1.696 (0.090)  0.953
 32  0.923 (0.060)  1.634 (0.109)  1.771
 36  0.899 (0.087)  1.636 (0.108)  1.821
 72  0.837 (0.038)  1.615 (0.086)  1.928
108  0.841 (0.044)  1.715 (0.087)  2.041
142  0.802 (0.040)  1.734 (0.085)  2.163

and will-it-scale/lock2_threads:

#thr  	 stock        patch-CNA   speedup (patch-CNA/stock)
  1  1.590 (0.013)  1.583 (0.010)  0.995
  2  2.714 (0.054)  2.697 (0.051)  0.994
  4  5.251 (0.311)  5.252 (0.217)  1.000
  8  4.358 (0.301)  4.309 (0.305)  0.989
 16  4.219 (0.140)  4.161 (0.114)  0.986
 32  2.547 (0.117)  4.134 (0.084)  1.623
 36  2.560 (0.071)  4.127 (0.122)  1.612
 72  1.982 (0.086)  4.097 (0.106)  2.067
108  2.114 (0.089)  4.082 (0.105)  1.930
142  1.923 (0.100)  4.024 (0.086)  2.093

Our evaluation shows that CNA also improves performance of user 
applications that have hot pthread mutexes. Those mutexes are 
blocking, and waiting threads park and unpark via the futex 
mechanism in the kernel. Given that kernel futex chains, which
are hashed by the mutex address, are each protected by a 
chain-specific spinlock, the contention on a user-mode mutex
translates into contention on a kernel-level spinlock.
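
As a rough, self-contained illustration (this is not one of the benchmarks
used above), a user-space program with a single hot mutex looks like the
sketch below; every contended lock/unlock ends up in futex_wait()/futex_wake(),
which serialize on the in-kernel hash-bucket spinlock:

        #include <pthread.h>
        #include <stdio.h>

        #define NTHREADS 16
        #define NITERS   1000000

        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        static long counter;

        static void *worker(void *arg)
        {
                (void)arg;
                for (int i = 0; i < NITERS; i++) {
                        pthread_mutex_lock(&lock);   /* contended -> futex_wait() */
                        counter++;                   /* short critical section */
                        pthread_mutex_unlock(&lock); /* may call futex_wake() */
                }
                return NULL;
        }

        int main(void)
        {
                pthread_t t[NTHREADS];
                int i;

                for (i = 0; i < NTHREADS; i++)
                        pthread_create(&t[i], NULL, worker, NULL);
                for (i = 0; i < NTHREADS; i++)
                        pthread_join(t[i], NULL);
                printf("%ld\n", counter);
                return 0;
        }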

Here are the results for the leveldb 'readrandom' benchmark:

#thr  	 stock        patch-CNA   speedup (patch-CNA/stock)
  1  0.533 (0.014)  0.533 (0.016)  1.001
  2  0.667 (0.029)  0.669 (0.027)  1.003
  4  0.699 (0.018)  0.714 (0.026)  1.021
  8  0.692 (0.020)  0.696 (0.026)  1.005
 16  0.730 (0.029)  0.733 (0.027)  1.004
 32  0.726 (0.034)  0.978 (0.118)  1.348
 36  0.740 (0.042)  1.099 (0.111)  1.485
 72  0.671 (0.033)  1.167 (0.021)  1.739
108  0.633 (0.017)  1.161 (0.028)  1.834
142  0.606 (0.016)  1.144 (0.018)  1.887

Additional performance numbers are available in previous revisions
of the series.

Further comments are welcome and appreciated.

Alex Kogan (5):
  locking/qspinlock: Rename mcs lock/unlock macros and make them more
    generic
  locking/qspinlock: Refactor the qspinlock slow path
  locking/qspinlock: Introduce CNA into the slow path of qspinlock
  locking/qspinlock: Introduce starvation avoidance into CNA
  locking/qspinlock: Introduce the shuffle reduction optimization into
    CNA

 Documentation/admin-guide/kernel-parameters.txt |  18 ++
 arch/arm/include/asm/mcs_spinlock.h             |   6 +-
 arch/x86/Kconfig                                |  19 ++
 arch/x86/include/asm/qspinlock.h                |   4 +
 arch/x86/kernel/alternative.c                   |  70 ++++++
 include/asm-generic/mcs_spinlock.h              |   4 +-
 kernel/locking/mcs_spinlock.h                   |  20 +-
 kernel/locking/qspinlock.c                      |  77 +++++-
 kernel/locking/qspinlock_cna.h                  | 315 ++++++++++++++++++++++++
 kernel/locking/qspinlock_paravirt.h             |   2 +-
 10 files changed, 512 insertions(+), 23 deletions(-)
 create mode 100644 kernel/locking/qspinlock_cna.h

-- 
2.11.0 (Apple Git-81)



* [PATCH v6 1/5] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic
  2019-11-07 17:46 [PATCH v6 0/5] Add NUMA-awareness to qspinlock Alex Kogan
@ 2019-11-07 17:46 ` Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 2/5] locking/qspinlock: Refactor the qspinlock slow path Alex Kogan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Alex Kogan @ 2019-11-07 17:46 UTC (permalink / raw)
  To: linux, peterz, mingo, will.deacon, arnd, longman, linux-arch,
	linux-arm-kernel, linux-kernel, tglx, bp, hpa, x86, guohanjun,
	jglauber
  Cc: alex.kogan, dave.dice, rahul.x.yadav, steven.sistare, daniel.m.jordan

The mcs unlock macro (arch_mcs_pass_lock) should accept the value to be
stored into the lock argument as another argument. This allows using the
same macro in cases where the value to be stored when passing the lock is
different from 1.
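
For illustration, the classic MCS handoff keeps passing 1, while a caller
that needs to hand over extra state (as the CNA patch later in this series
does with the encoded tail it keeps in @locked) can pass any non-zero value;
encoded_tail below is just a placeholder name:

        arch_mcs_pass_lock(&next->locked, 1);            /* classic MCS handoff */
        arch_mcs_pass_lock(&next->locked, encoded_tail); /* pass a non-trivial value */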

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 arch/arm/include/asm/mcs_spinlock.h |  6 +++---
 include/asm-generic/mcs_spinlock.h  |  4 ++--
 kernel/locking/mcs_spinlock.h       | 18 +++++++++---------
 kernel/locking/qspinlock.c          |  4 ++--
 kernel/locking/qspinlock_paravirt.h |  2 +-
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/arm/include/asm/mcs_spinlock.h b/arch/arm/include/asm/mcs_spinlock.h
index 529d2cf4d06f..693fe6ce3c43 100644
--- a/arch/arm/include/asm/mcs_spinlock.h
+++ b/arch/arm/include/asm/mcs_spinlock.h
@@ -6,7 +6,7 @@
 #include <asm/spinlock.h>
 
 /* MCS spin-locking. */
-#define arch_mcs_spin_lock_contended(lock)				\
+#define arch_mcs_spin_lock(lock)				\
 do {									\
 	/* Ensure prior stores are observed before we enter wfe. */	\
 	smp_mb();							\
@@ -14,9 +14,9 @@ do {									\
 		wfe();							\
 } while (0)								\
 
-#define arch_mcs_spin_unlock_contended(lock)				\
+#define arch_mcs_pass_lock(lock, val)					\
 do {									\
-	smp_store_release(lock, 1);					\
+	smp_store_release((lock), (val));				\
 	dsb_sev();							\
 } while (0)
 
diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..868da43dba7c 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -4,8 +4,8 @@
 /*
  * Architectures can define their own:
  *
- *   arch_mcs_spin_lock_contended(l)
- *   arch_mcs_spin_unlock_contended(l)
+ *   arch_mcs_spin_lock(l)
+ *   arch_mcs_pass_lock(l, val)
  *
  * See kernel/locking/mcs_spinlock.c.
  */
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 5e10153b4d3c..52d06ec6f525 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -21,7 +21,7 @@ struct mcs_spinlock {
 	int count;  /* nesting count, see qspinlock.c */
 };
 
-#ifndef arch_mcs_spin_lock_contended
+#ifndef arch_mcs_spin_lock
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
  * required so that subsequent operations happen after the
@@ -29,20 +29,20 @@ struct mcs_spinlock {
  * ARM64 would like to do spin-waiting instead of purely
  * spinning, and smp_cond_load_acquire() provides that behavior.
  */
-#define arch_mcs_spin_lock_contended(l)					\
-do {									\
-	smp_cond_load_acquire(l, VAL);					\
+#define arch_mcs_spin_lock(l)					\
+do {								\
+	smp_cond_load_acquire(l, VAL);				\
 } while (0)
 #endif
 
-#ifndef arch_mcs_spin_unlock_contended
+#ifndef arch_mcs_spin_unlock
 /*
  * smp_store_release() provides a memory barrier to ensure all
  * operations in the critical section has been completed before
  * unlocking.
  */
-#define arch_mcs_spin_unlock_contended(l)				\
-	smp_store_release((l), 1)
+#define arch_mcs_pass_lock(l, val)				\
+	smp_store_release((l), (val))
 #endif
 
 /*
@@ -91,7 +91,7 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 	WRITE_ONCE(prev->next, node);
 
 	/* Wait until the lock holder passes the lock down. */
-	arch_mcs_spin_lock_contended(&node->locked);
+	arch_mcs_spin_lock(&node->locked);
 }
 
 /*
@@ -115,7 +115,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 	}
 
 	/* Pass lock to next waiter. */
-	arch_mcs_spin_unlock_contended(&next->locked);
+	arch_mcs_pass_lock(&next->locked, 1);
 }
 
 #endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2473f10c6956..804c0fbd6328 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -470,7 +470,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 		WRITE_ONCE(prev->next, node);
 
 		pv_wait_node(node, prev);
-		arch_mcs_spin_lock_contended(&node->locked);
+		arch_mcs_spin_lock(&node->locked);
 
 		/*
 		 * While waiting for the MCS lock, the next pointer may have
@@ -549,7 +549,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	if (!next)
 		next = smp_cond_load_relaxed(&node->next, (VAL));
 
-	arch_mcs_spin_unlock_contended(&next->locked);
+	arch_mcs_pass_lock(&next->locked, 1);
 	pv_kick_node(lock, next);
 
 release:
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index e84d21aa0722..e98079414671 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -368,7 +368,7 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
 	 *
 	 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
 	 *
-	 * The write to next->locked in arch_mcs_spin_unlock_contended()
+	 * The write to next->locked in arch_mcs_pass_lock()
 	 * must be ordered before the read of pn->state in the cmpxchg()
 	 * below for the code to work correctly. To guarantee full ordering
 	 * irrespective of the success or failure of the cmpxchg(),
-- 
2.11.0 (Apple Git-81)



* [PATCH v6 2/5] locking/qspinlock: Refactor the qspinlock slow path
  2019-11-07 17:46 [PATCH v6 0/5] Add NUMA-awareness to qspinlock Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 1/5] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic Alex Kogan
@ 2019-11-07 17:46 ` Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock Alex Kogan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Alex Kogan @ 2019-11-07 17:46 UTC (permalink / raw)
  To: linux, peterz, mingo, will.deacon, arnd, longman, linux-arch,
	linux-arm-kernel, linux-kernel, tglx, bp, hpa, x86, guohanjun,
	jglauber
  Cc: alex.kogan, dave.dice, rahul.x.yadav, steven.sistare, daniel.m.jordan

Move some of the code manipulating the spin lock into separate functions.
This allows easier integration of alternative ways to manipulate
that lock.
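
The intent is that an alternative slow path can be generated by redefining
these hooks and re-including qspinlock.c, following the pattern already used
for the paravirt variant; the CNA patch later in this series does exactly
that, roughly as follows:

        #undef try_clear_tail
        #define try_clear_tail			cna_try_change_tail

        #undef mcs_pass_lock
        #define mcs_pass_lock			cna_pass_lock

        #undef  queued_spin_lock_slowpath
        #define queued_spin_lock_slowpath	__cna_queued_spin_lock_slowpath

        #include "qspinlock_cna.h"
        #include "qspinlock.c"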

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 kernel/locking/qspinlock.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 804c0fbd6328..c06d1e8075d9 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -288,6 +288,34 @@ static __always_inline u32  __pv_wait_head_or_lock(struct qspinlock *lock,
 #define queued_spin_lock_slowpath	native_queued_spin_lock_slowpath
 #endif
 
+/*
+ * __try_clear_tail - try to clear tail by setting the lock value to
+ * _Q_LOCKED_VAL.
+ * @lock: Pointer to the queued spinlock structure
+ * @val: Current value of the lock
+ * @node: Pointer to the MCS node of the lock holder
+ */
+static __always_inline bool __try_clear_tail(struct qspinlock *lock,
+						   u32 val,
+						   struct mcs_spinlock *node)
+{
+	return atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL);
+}
+
+/*
+ * __mcs_pass_lock - pass the MCS lock to the next waiter
+ * @node: Pointer to the MCS node of the lock holder
+ * @next: Pointer to the MCS node of the first waiter in the MCS queue
+ */
+static __always_inline void __mcs_pass_lock(struct mcs_spinlock *node,
+					    struct mcs_spinlock *next)
+{
+	arch_mcs_pass_lock(&next->locked, 1);
+}
+
+#define try_clear_tail	__try_clear_tail
+#define mcs_pass_lock		__mcs_pass_lock
+
 #endif /* _GEN_PV_LOCK_SLOWPATH */
 
 /**
@@ -532,7 +560,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 *       PENDING will make the uncontended transition fail.
 	 */
 	if ((val & _Q_TAIL_MASK) == tail) {
-		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+		if (try_clear_tail(lock, val, node))
 			goto release; /* No contention */
 	}
 
@@ -549,7 +577,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	if (!next)
 		next = smp_cond_load_relaxed(&node->next, (VAL));
 
-	arch_mcs_pass_lock(&next->locked, 1);
+	mcs_pass_lock(node, next);
 	pv_kick_node(lock, next);
 
 release:
@@ -574,6 +602,12 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 #undef pv_kick_node
 #undef pv_wait_head_or_lock
 
+#undef try_clear_tail
+#define try_clear_tail		__try_clear_tail
+
+#undef mcs_pass_lock
+#define mcs_pass_lock			__mcs_pass_lock
+
 #undef  queued_spin_lock_slowpath
 #define queued_spin_lock_slowpath	__pv_queued_spin_lock_slowpath
 
-- 
2.11.0 (Apple Git-81)



* [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-07 17:46 [PATCH v6 0/5] Add NUMA-awareness to qspinlock Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 1/5] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 2/5] locking/qspinlock: Refactor the qspinlock slow path Alex Kogan
@ 2019-11-07 17:46 ` Alex Kogan
  2019-11-10 21:30   ` kbuild test robot
  2019-11-20 15:16   ` kbuild test robot
  2019-11-07 17:46 ` [PATCH v6 4/5] locking/qspinlock: Introduce starvation avoidance into CNA Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 5/5] locking/qspinlock: Introduce the shuffle reduction optimization " Alex Kogan
  4 siblings, 2 replies; 13+ messages in thread
From: Alex Kogan @ 2019-11-07 17:46 UTC (permalink / raw)
  To: linux, peterz, mingo, will.deacon, arnd, longman, linux-arch,
	linux-arm-kernel, linux-kernel, tglx, bp, hpa, x86, guohanjun,
	jglauber
  Cc: alex.kogan, dave.dice, rahul.x.yadav, steven.sistare, daniel.m.jordan

In CNA, spinning threads are organized in two queues, a main queue for
threads running on the same node as the current lock holder, and a
secondary queue for threads running on other nodes. After acquiring the
MCS lock and before acquiring the spinlock, the lock holder scans the
main queue looking for a thread running on the same node (pre-scan). If
found (call it thread T), all threads in the main queue between the
current lock holder and T are moved to the end of the secondary queue.
If such T is not found, we make another scan of the main queue when
unlocking the MCS lock (post-scan), starting at the position where
pre-scan stopped. If both scans fail to find such T, the MCS lock is
passed to the first thread in the secondary queue. If the secondary queue
is empty, the lock is passed to the next thread in the main queue.
For more details, see https://arxiv.org/abs/1810.05600.

Note that this variant of CNA may introduce starvation by continuously
passing the lock to threads running on the same node. This issue
will be addressed later in the series.

Enabling CNA is controlled via a new configuration option
(NUMA_AWARE_SPINLOCKS). By default, the CNA variant is patched in at
boot time only if we run on a multi-node machine in a native environment and
the new config is enabled. (For the time being, the patching requires
CONFIG_PARAVIRT_SPINLOCKS to be enabled as well. However, this should be
resolved once static_call() is available.) This default behavior can be
overridden with the new kernel boot command-line option
"numa_spinlock=on/off" (default is "auto").

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  10 +
 arch/x86/Kconfig                                |  19 ++
 arch/x86/include/asm/qspinlock.h                |   4 +
 arch/x86/kernel/alternative.c                   |  43 ++++
 kernel/locking/mcs_spinlock.h                   |   2 +-
 kernel/locking/qspinlock.c                      |  34 +++-
 kernel/locking/qspinlock_cna.h                  | 260 ++++++++++++++++++++++++
 7 files changed, 367 insertions(+), 5 deletions(-)
 create mode 100644 kernel/locking/qspinlock_cna.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f8881e..a2fd0c669dba 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3148,6 +3148,16 @@
 
 	nox2apic	[X86-64,APIC] Do not enable x2APIC mode.
 
+	numa_spinlock=	[NUMA, PV_OPS] Select the NUMA-aware variant
+			of spinlock. The options are:
+			auto - Enable this variant if running on a multi-node
+			machine in native environment.
+			on  - Unconditionally enable this variant.
+			off - Unconditionally disable this variant.
+
+			Not specifying this option is equivalent to
+			numa_spinlock=auto.
+
 	cpu0_hotplug	[X86] Turn on CPU0 hotplug feature when
 			CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
 			Some features depend on CPU0. Known dependencies are:
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d6e1faa28c58..1d480f190def 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1573,6 +1573,25 @@ config NUMA
 
 	  Otherwise, you should say N.
 
+config NUMA_AWARE_SPINLOCKS
+	bool "Numa-aware spinlocks"
+	depends on NUMA
+	depends on QUEUED_SPINLOCKS
+	# For now, we depend on PARAVIRT_SPINLOCKS to make the patching work.
+	# This is awkward, but hopefully would be resolved once static_call()
+	# is available.
+	depends on PARAVIRT_SPINLOCKS
+	default y
+	help
+	  Introduce NUMA (Non Uniform Memory Access) awareness into
+	  the slow path of spinlocks.
+
+	  In this variant of qspinlock, the kernel will try to keep the lock
+	  on the same node, thus reducing the number of remote cache misses,
+	  while trading some of the short term fairness for better performance.
+
+	  Say N if you want absolute first-come, first-served fairness.
+
 config AMD_NUMA
 	def_bool y
 	prompt "Old style AMD Opteron NUMA detection"
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 444d6fd9a6d8..6fa8fcc5c7af 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -27,6 +27,10 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo
 	return val;
 }
 
+#ifdef CONFIG_NUMA_AWARE_SPINLOCKS
+extern void __cna_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+#endif
+
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __pv_init_lock_hash(void);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 9d3a971ea364..6a4ccbf4e09c 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -698,6 +698,33 @@ static void __init int3_selftest(void)
 	unregister_die_notifier(&int3_exception_nb);
 }
 
+#if defined(CONFIG_NUMA_AWARE_SPINLOCKS)
+/*
+ * Constant (boot-param configurable) flag selecting the NUMA-aware variant
+ * of spinlock.  Possible values: -1 (off) / 0 (auto, default) / 1 (on).
+ */
+static int numa_spinlock_flag;
+
+static int __init numa_spinlock_setup(char *str)
+{
+	if (!strcmp(str, "auto")) {
+		numa_spinlock_flag = 0;
+		return 1;
+	} else if (!strcmp(str, "on")) {
+		numa_spinlock_flag = 1;
+		return 1;
+	} else if (!strcmp(str, "off")) {
+		numa_spinlock_flag = -1;
+		return 1;
+	}
+
+	return 0;
+}
+
+__setup("numa_spinlock=", numa_spinlock_setup);
+
+#endif
+
 void __init alternative_instructions(void)
 {
 	int3_selftest();
@@ -738,6 +765,22 @@ void __init alternative_instructions(void)
 	}
 #endif
 
+#if defined(CONFIG_NUMA_AWARE_SPINLOCKS)
+	/*
+	 * By default, switch to the NUMA-friendly slow path for
+	 * spinlocks when we have multiple NUMA nodes in native environment.
+	 */
+	if ((numa_spinlock_flag == 1) ||
+	    (numa_spinlock_flag == 0 && nr_node_ids > 1 &&
+		    pv_ops.lock.queued_spin_lock_slowpath ==
+			native_queued_spin_lock_slowpath)) {
+		pv_ops.lock.queued_spin_lock_slowpath =
+		    __cna_queued_spin_lock_slowpath;
+
+		pr_info("Enabling CNA spinlock\n");
+	}
+#endif
+
 	apply_paravirt(__parainstructions, __parainstructions_end);
 
 	restart_nmi();
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 52d06ec6f525..e40b9538b79f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -17,7 +17,7 @@
 
 struct mcs_spinlock {
 	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
+	unsigned int locked; /* 1 if lock acquired */
 	int count;  /* nesting count, see qspinlock.c */
 };
 
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index c06d1e8075d9..6d8c4a52e44e 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -11,7 +11,7 @@
  *          Peter Zijlstra <peterz@infradead.org>
  */
 
-#ifndef _GEN_PV_LOCK_SLOWPATH
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && !defined(_GEN_CNA_LOCK_SLOWPATH)
 
 #include <linux/smp.h>
 #include <linux/bug.h>
@@ -70,7 +70,8 @@
 /*
  * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
  * size and four of them will fit nicely in one 64-byte cacheline. For
- * pvqspinlock, however, we need more space for extra data. To accommodate
+ * pvqspinlock, however, we need more space for extra data. The same also
+ * applies for the NUMA-aware variant of spinlocks (CNA). To accommodate
  * that, we insert two more long words to pad it up to 32 bytes. IOW, only
  * two of them can fit in a cacheline in this case. That is OK as it is rare
  * to have more than 2 levels of slowpath nesting in actual use. We don't
@@ -79,7 +80,7 @@
  */
 struct qnode {
 	struct mcs_spinlock mcs;
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#if defined(CONFIG_PARAVIRT_SPINLOCKS) || defined(CONFIG_NUMA_AWARE_SPINLOCKS)
 	long reserved[2];
 #endif
 };
@@ -103,6 +104,8 @@ struct qnode {
  * Exactly fits one 64-byte cacheline on a 64-bit architecture.
  *
  * PV doubles the storage and uses the second cacheline for PV state.
+ * CNA also doubles the storage and uses the second cacheline for
+ * CNA-specific state.
  */
 static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]);
 
@@ -316,7 +319,7 @@ static __always_inline void __mcs_pass_lock(struct mcs_spinlock *node,
 #define try_clear_tail	__try_clear_tail
 #define mcs_pass_lock		__mcs_pass_lock
 
-#endif /* _GEN_PV_LOCK_SLOWPATH */
+#endif /* _GEN_PV_LOCK_SLOWPATH && _GEN_CNA_LOCK_SLOWPATH */
 
 /**
  * queued_spin_lock_slowpath - acquire the queued spinlock
@@ -589,6 +592,29 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 EXPORT_SYMBOL(queued_spin_lock_slowpath);
 
 /*
+ * Generate the code for NUMA-aware spinlocks
+ */
+#if !defined(_GEN_CNA_LOCK_SLOWPATH) && defined(CONFIG_NUMA_AWARE_SPINLOCKS)
+#define _GEN_CNA_LOCK_SLOWPATH
+
+#undef pv_wait_head_or_lock
+#define pv_wait_head_or_lock		cna_pre_scan
+
+#undef try_clear_tail
+#define try_clear_tail			cna_try_change_tail
+
+#undef mcs_pass_lock
+#define mcs_pass_lock			cna_pass_lock
+
+#undef  queued_spin_lock_slowpath
+#define queued_spin_lock_slowpath	__cna_queued_spin_lock_slowpath
+
+#include "qspinlock_cna.h"
+#include "qspinlock.c"
+
+#endif
+
+/*
  * Generate the paravirt code for queued_spin_unlock_slowpath().
  */
 #if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
new file mode 100644
index 000000000000..0cf5a6a2709c
--- /dev/null
+++ b/kernel/locking/qspinlock_cna.h
@@ -0,0 +1,260 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _GEN_CNA_LOCK_SLOWPATH
+#error "do not include this file"
+#endif
+
+#include <linux/topology.h>
+
+/*
+ * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
+ *
+ * In CNA, spinning threads are organized in two queues, a main queue for
+ * threads running on the same NUMA node as the current lock holder, and a
+ * secondary queue for threads running on other nodes. Schematically, it
+ * looks like this:
+ *
+ *    cna_node
+ *   +----------+    +--------+        +--------+
+ *   |mcs:next  | -> |mcs:next| -> ... |mcs:next| -> NULL      [Main queue]
+ *   |mcs:locked| -+ +--------+        +--------+
+ *   +----------+  |
+ *                 +----------------------+
+ *                                        \/
+ *                 +--------+         +--------+
+ *                 |mcs:next| -> ...  |mcs:next|          [Secondary queue]
+ *                 +--------+         +--------+
+ *                     ^                    |
+ *                     +--------------------+
+ *
+ * N.B. locked = 1 if secondary queue is absent. Otherwise, it contains the
+ * encoded pointer to the tail of the secondary queue, which is organized as a
+ * circular list.
+ *
+ * After acquiring the MCS lock and before acquiring the spinlock, the lock
+ * holder scans the main queue looking for a thread running on the same node
+ * (pre-scan). If found (call it thread T), all threads in the main queue
+ * between the current lock holder and T are moved to the end of the secondary
+ * queue.  If such T is not found, we make another scan of the main queue when
+ * unlocking the MCS lock (post-scan), starting at the node where pre-scan
+ * stopped. If both scans fail to find such T, the MCS lock is passed to the
+ * first thread in the secondary queue. If the secondary queue is empty, the
+ * lock is passed to the next thread in the main queue.
+ *
+ * For more details, see https://arxiv.org/abs/1810.05600.
+ *
+ * Authors: Alex Kogan <alex.kogan@oracle.com>
+ *          Dave Dice <dave.dice@oracle.com>
+ */
+
+struct cna_node {
+	struct mcs_spinlock	mcs;
+	int			numa_node;
+	u32			encoded_tail;
+	u32			pre_scan_result; /* 0 or encoded tail */
+};
+
+static void __init cna_init_nodes_per_cpu(unsigned int cpu)
+{
+	struct mcs_spinlock *base = per_cpu_ptr(&qnodes[0].mcs, cpu);
+	int numa_node = cpu_to_node(cpu);
+	int i;
+
+	for (i = 0; i < MAX_NODES; i++) {
+		struct cna_node *cn = (struct cna_node *)grab_mcs_node(base, i);
+
+		cn->numa_node = numa_node;
+		cn->encoded_tail = encode_tail(cpu, i);
+		/*
+		 * @encoded_tail has to be larger than 1, so we do not confuse
+		 * it with other valid values for @locked or @pre_scan_result
+		 * (0 or 1)
+		 */
+		WARN_ON(cn->encoded_tail <= 1);
+	}
+}
+
+static int __init cna_init_nodes(void)
+{
+	unsigned int cpu;
+
+	BUILD_BUG_ON(sizeof(struct cna_node) > sizeof(struct qnode));
+	/* we store an encoded tail word in the node's @locked field */
+	BUILD_BUG_ON(sizeof(u32) > sizeof(unsigned int));
+
+	for_each_possible_cpu(cpu)
+		cna_init_nodes_per_cpu(cpu);
+
+	return 0;
+}
+early_initcall(cna_init_nodes);
+
+static inline bool cna_try_change_tail(struct qspinlock *lock, u32 val,
+				       struct mcs_spinlock *node)
+{
+	struct mcs_spinlock *head_2nd, *tail_2nd;
+	u32 new;
+
+	/* If the secondary queue is empty, do what MCS does. */
+	if (node->locked <= 1)
+		return __try_clear_tail(lock, val, node);
+
+	/*
+	 * Try to update the tail value to the last node in the secondary queue.
+	 * If successful, pass the lock to the first thread in the secondary
+	 * queue. Doing those two actions effectively moves all nodes from the
+	 * secondary queue into the main one.
+	 */
+	tail_2nd = decode_tail(node->locked);
+	head_2nd = tail_2nd->next;
+	new = ((struct cna_node *)tail_2nd)->encoded_tail + _Q_LOCKED_VAL;
+
+	if (atomic_try_cmpxchg_relaxed(&lock->val, &val, new)) {
+		/*
+		 * Try to reset @next in tail_2nd to NULL, but no need to check
+		 * the result - if failed, a new successor has updated it.
+		 */
+		cmpxchg_relaxed(&tail_2nd->next, head_2nd, NULL);
+		arch_mcs_pass_lock(&head_2nd->locked, 1);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * cna_splice_tail -- splice nodes in the main queue between [first, last]
+ * onto the secondary queue.
+ */
+static void cna_splice_tail(struct mcs_spinlock *node,
+			    struct mcs_spinlock *first,
+			    struct mcs_spinlock *last)
+{
+	/* remove [first,last] */
+	node->next = last->next;
+
+	/* stick [first,last] on the secondary queue tail */
+	if (node->locked <= 1) { /* if secondary queue is empty */
+		/* create secondary queue */
+		last->next = first;
+	} else {
+		/* add to the tail of the secondary queue */
+		struct mcs_spinlock *tail_2nd = decode_tail(node->locked);
+		struct mcs_spinlock *head_2nd = tail_2nd->next;
+
+		tail_2nd->next = first;
+		last->next = head_2nd;
+	}
+
+	node->locked = ((struct cna_node *)last)->encoded_tail;
+}
+
+/*
+ * cna_scan_main_queue - scan the main waiting queue looking for the first
+ * thread running on the same NUMA node as the lock holder. If found (call it
+ * thread T), move all threads in the main queue between the lock holder and
+ * T to the end of the secondary queue and return 0; otherwise, return the
+ * encoded pointer of the last scanned node in the primary queue (so a
+ * subsequent scan can be resumed from that node)
+ *
+ * Schematically, this may look like the following (nn stands for numa_node and
+ * et stands for encoded_tail).
+ *
+ *   when cna_scan_main_queue() is called (the secondary queue is empty):
+ *
+ *  A+------------+   B+--------+   C+--------+   T+--------+
+ *   |mcs:next    | -> |mcs:next| -> |mcs:next| -> |mcs:next| -> NULL
+ *   |mcs:locked=1|    |cna:nn=0|    |cna:nn=2|    |cna:nn=1|
+ *   |cna:nn=1    |    +--------+    +--------+    +--------+
+ *   +----------- +
+ *
+ *   when cna_scan_main_queue() returns (the secondary queue contains B and C):
+ *
+ *  A+----------------+    T+--------+
+ *   |mcs:next        | ->  |mcs:next| -> NULL
+ *   |mcs:locked=C.et | -+  |cna:nn=1|
+ *   |cna:nn=1        |  |  +--------+
+ *   +--------------- +  +-----+
+ *                             \/
+ *          B+--------+   C+--------+
+ *           |mcs:next| -> |mcs:next| -+
+ *           |cna:nn=0|    |cna:nn=2|  |
+ *           +--------+    +--------+  |
+ *               ^                     |
+ *               +---------------------+
+ *
+ * The worst case complexity of the scan is O(n), where n is the number
+ * of current waiters. However, the amortized complexity is close to O(1),
+ * as the immediate successor is likely to be running on the same node once
+ * threads from other nodes are moved to the secondary queue.
+ */
+static u32 cna_scan_main_queue(struct mcs_spinlock *node,
+			       struct mcs_spinlock *pred_start)
+{
+	struct cna_node *cn = (struct cna_node *)node;
+	struct cna_node *cni = (struct cna_node *)READ_ONCE(pred_start->next);
+	struct cna_node *last;
+	int my_numa_node = cn->numa_node;
+
+	/* find any next waiter on 'our' NUMA node */
+	for (last = cn;
+	     cni && cni->numa_node != my_numa_node;
+	     last = cni, cni = (struct cna_node *)READ_ONCE(cni->mcs.next))
+		;
+
+	/* if found, splice any skipped waiters onto the secondary queue */
+	if (cni) {
+		if (last != cn)	/* did we skip any waiters? */
+			cna_splice_tail(node, node->next,
+					(struct mcs_spinlock *)last);
+		return 0;
+	}
+
+	return last->encoded_tail;
+}
+
+__always_inline u32 cna_pre_scan(struct qspinlock *lock,
+				  struct mcs_spinlock *node)
+{
+	struct cna_node *cn = (struct cna_node *)node;
+
+	cn->pre_scan_result = cna_scan_main_queue(node, node);
+
+	return 0;
+}
+
+static inline void cna_pass_lock(struct mcs_spinlock *node,
+				 struct mcs_spinlock *next)
+{
+	struct cna_node *cn = (struct cna_node *)node;
+	struct mcs_spinlock *next_holder = next, *tail_2nd;
+	u32 val = 1;
+
+	u32 scan = cn->pre_scan_result;
+
+	/*
+	 * check if a successor from the same numa node has not been found in
+	 * pre-scan, and if so, try to find it in post-scan starting from the
+	 * node where pre-scan stopped (stored in @pre_scan_result)
+	 */
+	if (scan > 0)
+		scan = cna_scan_main_queue(node, decode_tail(scan));
+
+	if (!scan) { /* if found a successor from the same numa node */
+		next_holder = node->next;
+		/*
+		 * we unlock successor by passing a non-zero value,
+		 * so set @val to 1 iff @locked is 0, which will happen
+		 * if we acquired the MCS lock when its queue was empty
+		 */
+		val = node->locked ? node->locked : 1;
+	} else if (node->locked > 1) {	  /* if secondary queue is not empty */
+		/* next holder will be the first node in the secondary queue */
+		tail_2nd = decode_tail(node->locked);
+		/* @tail_2nd->next points to the head of the secondary queue */
+		next_holder = tail_2nd->next;
+		/* splice the secondary queue onto the head of the main queue */
+		tail_2nd->next = next;
+	}
+
+	arch_mcs_pass_lock(&next_holder->locked, val);
+}
-- 
2.11.0 (Apple Git-81)



* [PATCH v6 4/5] locking/qspinlock: Introduce starvation avoidance into CNA
  2019-11-07 17:46 [PATCH v6 0/5] Add NUMA-awareness to qspinlock Alex Kogan
                   ` (2 preceding siblings ...)
  2019-11-07 17:46 ` [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock Alex Kogan
@ 2019-11-07 17:46 ` Alex Kogan
  2019-11-07 17:46 ` [PATCH v6 5/5] locking/qspinlock: Introduce the shuffle reduction optimization " Alex Kogan
  4 siblings, 0 replies; 13+ messages in thread
From: Alex Kogan @ 2019-11-07 17:46 UTC (permalink / raw)
  To: linux, peterz, mingo, will.deacon, arnd, longman, linux-arch,
	linux-arm-kernel, linux-kernel, tglx, bp, hpa, x86, guohanjun,
	jglauber
  Cc: alex.kogan, dave.dice, rahul.x.yadav, steven.sistare, daniel.m.jordan

Keep track of the number of intra-node lock handoffs, and force
inter-node handoff once this number reaches a preset threshold.
The default value for the threshold can be overridden with
the new kernel boot command-line option "numa_spinlock_threshold".
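
For example, the default corresponds to numa_spinlock_threshold=16, i.e.
1 << 16 = 65536 consecutive intra-node handoffs before an inter-node handoff
is forced; booting with numa_spinlock_threshold=8 would lower that to
1 << 8 = 256 handoffs.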

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
 arch/x86/kernel/alternative.c                   | 27 +++++++++++++++++++++++++
 kernel/locking/qspinlock.c                      |  3 +++
 kernel/locking/qspinlock_cna.h                  | 27 ++++++++++++++++++++++---
 4 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a2fd0c669dba..07913c7da351 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3158,6 +3158,14 @@
 			Not specifying this option is equivalent to
 			numa_spinlock=auto.
 
+	numa_spinlock_threshold=	[NUMA, PV_OPS]
+			Set the threshold for the number of intra-node
+			lock hand-offs before the NUMA-aware spinlock
+			is forced to be passed to a thread on another NUMA node.
+			Valid values are in the [0..31] range. Smaller values
+			result in a more fair, but less performant spinlock, and
+			vice versa. The default value is 16.
+
 	cpu0_hotplug	[X86] Turn on CPU0 hotplug feature when
 			CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
 			Some features depend on CPU0. Known dependencies are:
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 6a4ccbf4e09c..28552e0491b5 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -723,6 +723,33 @@ static int __init numa_spinlock_setup(char *str)
 
 __setup("numa_spinlock=", numa_spinlock_setup);
 
+/*
+ * Controls the threshold for the number of intra-node lock hand-offs before
+ * the NUMA-aware variant of spinlock is forced to be passed to a thread on
+ * another NUMA node. By default, the chosen value provides reasonable
+ * long-term fairness without sacrificing performance compared to a lock
+ * that does not have any fairness guarantees.
+ */
+int intra_node_handoff_threshold = 1 << 16;
+
+static int __init numa_spinlock_threshold_setup(char *str)
+{
+	int new_threshold_param;
+
+	if (get_option(&str, &new_threshold_param)) {
+		/* valid value is between 0 and 31 */
+		if (new_threshold_param < 0 || new_threshold_param > 31)
+			return 0;
+
+		intra_node_handoff_threshold = 1 << new_threshold_param;
+		return 1;
+	}
+
+	return 0;
+}
+
+__setup("numa_spinlock_threshold=", numa_spinlock_threshold_setup);
+
 #endif
 
 void __init alternative_instructions(void)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 6d8c4a52e44e..1d0d884308ef 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -597,6 +597,9 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 #if !defined(_GEN_CNA_LOCK_SLOWPATH) && defined(CONFIG_NUMA_AWARE_SPINLOCKS)
 #define _GEN_CNA_LOCK_SLOWPATH
 
+#undef pv_init_node
+#define pv_init_node			cna_init_node
+
 #undef pv_wait_head_or_lock
 #define pv_wait_head_or_lock		cna_pre_scan
 
diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index 0cf5a6a2709c..916c9083e8ec 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -50,9 +50,16 @@ struct cna_node {
 	struct mcs_spinlock	mcs;
 	int			numa_node;
 	u32			encoded_tail;
-	u32			pre_scan_result; /* 0 or encoded tail */
+	u32			pre_scan_result; /* 0, 1 or encoded tail */
+	u32			intra_count;
 };
 
+/*
+ * Controls the threshold for the number of intra-node lock hand-offs.
+ * See arch/x86/kernel/alternative.c for details.
+ */
+extern int intra_node_handoff_threshold;
+
 static void __init cna_init_nodes_per_cpu(unsigned int cpu)
 {
 	struct mcs_spinlock *base = per_cpu_ptr(&qnodes[0].mcs, cpu);
@@ -88,6 +95,11 @@ static int __init cna_init_nodes(void)
 }
 early_initcall(cna_init_nodes);
 
+static __always_inline void cna_init_node(struct mcs_spinlock *node)
+{
+	((struct cna_node *)node)->intra_count = 0;
+}
+
 static inline bool cna_try_change_tail(struct qspinlock *lock, u32 val,
 				       struct mcs_spinlock *node)
 {
@@ -217,7 +229,13 @@ __always_inline u32 cna_pre_scan(struct qspinlock *lock,
 {
 	struct cna_node *cn = (struct cna_node *)node;
 
-	cn->pre_scan_result = cna_scan_main_queue(node, node);
+	/*
+	 * setting @pre_scan_result to 1 indicates that no post-scan
+	 * should be made in cna_pass_lock()
+	 */
+	cn->pre_scan_result =
+		cn->intra_count == intra_node_handoff_threshold ?
+			1 : cna_scan_main_queue(node, node);
 
 	return 0;
 }
@@ -236,7 +254,7 @@ static inline void cna_pass_lock(struct mcs_spinlock *node,
 	 * pre-scan, and if so, try to find it in post-scan starting from the
 	 * node where pre-scan stopped (stored in @pre_scan_result)
 	 */
-	if (scan > 0)
+	if (scan > 1)
 		scan = cna_scan_main_queue(node, decode_tail(scan));
 
 	if (!scan) { /* if found a successor from the same numa node */
@@ -247,6 +265,9 @@ static inline void cna_pass_lock(struct mcs_spinlock *node,
 		 * if we acquired the MCS lock when its queue was empty
 		 */
 		val = node->locked ? node->locked : 1;
+		/* inc @intra_count if the secondary queue is not empty */
+		((struct cna_node *)next_holder)->intra_count =
+			cn->intra_count + (node->locked > 1);
 	} else if (node->locked > 1) {	  /* if secondary queue is not empty */
 		/* next holder will be the first node in the secondary queue */
 		tail_2nd = decode_tail(node->locked);
-- 
2.11.0 (Apple Git-81)



* [PATCH v6 5/5] locking/qspinlock: Introduce the shuffle reduction optimization into CNA
  2019-11-07 17:46 [PATCH v6 0/5] Add NUMA-awareness to qspinlock Alex Kogan
                   ` (3 preceding siblings ...)
  2019-11-07 17:46 ` [PATCH v6 4/5] locking/qspinlock: Introduce starvation avoidance into CNA Alex Kogan
@ 2019-11-07 17:46 ` Alex Kogan
  4 siblings, 0 replies; 13+ messages in thread
From: Alex Kogan @ 2019-11-07 17:46 UTC (permalink / raw)
  To: linux, peterz, mingo, will.deacon, arnd, longman, linux-arch,
	linux-arm-kernel, linux-kernel, tglx, bp, hpa, x86, guohanjun,
	jglauber
  Cc: alex.kogan, dave.dice, rahul.x.yadav, steven.sistare, daniel.m.jordan

This optimization reduces the probability threads will be shuffled between
the main and secondary queues when the secondary queue is empty.
It is helpful when the lock is only lightly contended.
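
To put the chosen value in perspective: with SHUFFLE_REDUCTION_PROB_ARG set
to 7 in this patch, probably() returns false with probability 1/2^7, so when
the secondary queue is empty the pre-scan (and hence any shuffling) runs on
only about 1 in 128 lock acquisitions; once the secondary queue is non-empty,
the scan is always performed.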

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 kernel/locking/qspinlock_cna.h | 50 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index 916c9083e8ec..0549ce40dc75 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -4,6 +4,7 @@
 #endif
 
 #include <linux/topology.h>
+#include <linux/random.h>
 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
@@ -50,7 +51,7 @@ struct cna_node {
 	struct mcs_spinlock	mcs;
 	int			numa_node;
 	u32			encoded_tail;
-	u32			pre_scan_result; /* 0, 1 or encoded tail */
+	u32			pre_scan_result; /* 0, 1, 2 or encoded tail */
 	u32			intra_count;
 };
 
@@ -60,6 +61,34 @@ struct cna_node {
  */
 extern int intra_node_handoff_threshold;
 
+/*
+ * Controls the probability for enabling the scan of the main queue when
+ * the secondary queue is empty. The chosen value reduces the amount of
+ * unnecessary shuffling of threads between the two waiting queues when
+ * the contention is low, while responding fast enough and enabling
+ * the shuffling when the contention is high.
+ */
+#define SHUFFLE_REDUCTION_PROB_ARG  (7)
+
+/* Per-CPU pseudo-random number seed */
+static DEFINE_PER_CPU(u32, seed);
+
+/*
+ * Return false with probability 1 / 2^@num_bits.
+ * Intuitively, the larger @num_bits the less likely false is to be returned.
+ * @num_bits must be a number between 0 and 31.
+ */
+static bool probably(unsigned int num_bits)
+{
+	u32 s;
+
+	s = this_cpu_read(seed);
+	s = next_pseudo_random32(s);
+	this_cpu_write(seed, s);
+
+	return s & ((1 << num_bits) - 1);
+}
+
 static void __init cna_init_nodes_per_cpu(unsigned int cpu)
 {
 	struct mcs_spinlock *base = per_cpu_ptr(&qnodes[0].mcs, cpu);
@@ -72,11 +101,11 @@ static void __init cna_init_nodes_per_cpu(unsigned int cpu)
 		cn->numa_node = numa_node;
 		cn->encoded_tail = encode_tail(cpu, i);
 		/*
-		 * @encoded_tail has to be larger than 1, so we do not confuse
+		 * @encoded_tail has to be larger than 2, so we do not confuse
 		 * it with other valid values for @locked or @pre_scan_result
-		 * (0 or 1)
+		 * (0, 1 or 2)
 		 */
-		WARN_ON(cn->encoded_tail <= 1);
+		WARN_ON(cn->encoded_tail <= 2);
 	}
 }
 
@@ -230,12 +259,13 @@ __always_inline u32 cna_pre_scan(struct qspinlock *lock,
 	struct cna_node *cn = (struct cna_node *)node;
 
 	/*
-	 * setting @pre_scan_result to 1 indicates that no post-scan
+	 * setting @pre_scan_result to 1 or 2 indicates that no post-scan
 	 * should be made in cna_pass_lock()
 	 */
 	cn->pre_scan_result =
-		cn->intra_count == intra_node_handoff_threshold ?
-			1 : cna_scan_main_queue(node, node);
+		(node->locked <= 1 && probably(SHUFFLE_REDUCTION_PROB_ARG)) ?
+			1 : cn->intra_count == intra_node_handoff_threshold ?
+			2 : cna_scan_main_queue(node, node);
 
 	return 0;
 }
@@ -249,12 +279,15 @@ static inline void cna_pass_lock(struct mcs_spinlock *node,
 
 	u32 scan = cn->pre_scan_result;
 
+	if (scan == 1)
+		goto pass_lock;
+
 	/*
 	 * check if a successor from the same numa node has not been found in
 	 * pre-scan, and if so, try to find it in post-scan starting from the
 	 * node where pre-scan stopped (stored in @pre_scan_result)
 	 */
-	if (scan > 1)
+	if (scan > 2)
 		scan = cna_scan_main_queue(node, decode_tail(scan));
 
 	if (!scan) { /* if found a successor from the same numa node */
@@ -277,5 +310,6 @@ static inline void cna_pass_lock(struct mcs_spinlock *node,
 		tail_2nd->next = next;
 	}
 
+pass_lock:
 	arch_mcs_pass_lock(&next_holder->locked, val);
 }
-- 
2.11.0 (Apple Git-81)



* Re: [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-07 17:46 ` [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock Alex Kogan
@ 2019-11-10 21:30   ` kbuild test robot
  2019-11-14 20:57     ` Alex Kogan
  2019-11-20 15:16   ` kbuild test robot
  1 sibling, 1 reply; 13+ messages in thread
From: kbuild test robot @ 2019-11-10 21:30 UTC (permalink / raw)
  To: Alex Kogan
  Cc: linux-arch, guohanjun, kbuild-all, arnd, peterz, dave.dice,
	jglauber, x86, will.deacon, linux, steven.sistare, linux-kernel,
	rahul.x.yadav, mingo, bp, hpa, alex.kogan, longman, tglx,
	daniel.m.jordan, linux-arm-kernel

Hi Alex,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.4-rc6 next-20191108]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Alex-Kogan/locking-qspinlock-Rename-mcs-lock-unlock-macros-and-make-them-more-generic/20191109-180535
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0058b0a506e40d9a2c62015fe92eb64a44d78cd9
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-21-gb31adac-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   kernel/locking/qspinlock.c:450:14: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *[assigned] node @@    got ct mcs_spinlock *[assigned] node @@
   kernel/locking/qspinlock.c:450:14: sparse:    expected struct mcs_spinlock *[assigned] node
   kernel/locking/qspinlock.c:450:14: sparse:    got struct mcs_spinlock [pure] *
   kernel/locking/qspinlock.c:498:22: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *prev @@    got struct struct mcs_spinlock *prev @@
   kernel/locking/qspinlock.c:498:22: sparse:    expected struct mcs_spinlock *prev
   kernel/locking/qspinlock.c:498:22: sparse:    got struct mcs_spinlock [pure] *
>> kernel/locking/qspinlock_cna.h:141:60: sparse: sparse: incorrect type in initializer (different modifiers) @@    expected struct mcs_spinlock *tail_2nd @@    got struct struct mcs_spinlock *tail_2nd @@
>> kernel/locking/qspinlock_cna.h:141:60: sparse:    expected struct mcs_spinlock *tail_2nd
>> kernel/locking/qspinlock_cna.h:141:60: sparse:    got struct mcs_spinlock [pure] *
   kernel/locking/qspinlock.c:450:14: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *[assigned] node @@    got ct mcs_spinlock *[assigned] node @@
   kernel/locking/qspinlock.c:450:14: sparse:    expected struct mcs_spinlock *[assigned] node
   kernel/locking/qspinlock.c:450:14: sparse:    got struct mcs_spinlock [pure] *
   kernel/locking/qspinlock.c:498:22: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *prev @@    got struct struct mcs_spinlock *prev @@
   kernel/locking/qspinlock.c:498:22: sparse:    expected struct mcs_spinlock *prev
   kernel/locking/qspinlock.c:498:22: sparse:    got struct mcs_spinlock [pure] *
>> kernel/locking/qspinlock_cna.h:107:18: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *tail_2nd @@    got struct struct mcs_spinlock *tail_2nd @@
   kernel/locking/qspinlock_cna.h:107:18: sparse:    expected struct mcs_spinlock *tail_2nd
   kernel/locking/qspinlock_cna.h:107:18: sparse:    got struct mcs_spinlock [pure] *
>> kernel/locking/qspinlock_cna.h:240:61: sparse: sparse: incorrect type in argument 2 (different modifiers) @@    expected struct mcs_spinlock *pred_start @@    got struct struct mcs_spinlock *pred_start @@
>> kernel/locking/qspinlock_cna.h:240:61: sparse:    expected struct mcs_spinlock *pred_start
   kernel/locking/qspinlock_cna.h:240:61: sparse:    got struct mcs_spinlock [pure] *
   kernel/locking/qspinlock_cna.h:252:26: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *tail_2nd @@    got struct struct mcs_spinlock *tail_2nd @@
   kernel/locking/qspinlock_cna.h:252:26: sparse:    expected struct mcs_spinlock *tail_2nd
   kernel/locking/qspinlock_cna.h:252:26: sparse:    got struct mcs_spinlock [pure] *
   kernel/locking/qspinlock.c:450:14: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *[assigned] node @@    got ct mcs_spinlock *[assigned] node @@
   kernel/locking/qspinlock.c:450:14: sparse:    expected struct mcs_spinlock *[assigned] node
   kernel/locking/qspinlock.c:450:14: sparse:    got struct mcs_spinlock [pure] *
   kernel/locking/qspinlock.c:498:22: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *prev @@    got struct struct mcs_spinlock *prev @@
   kernel/locking/qspinlock.c:498:22: sparse:    expected struct mcs_spinlock *prev
   kernel/locking/qspinlock.c:498:22: sparse:    got struct mcs_spinlock [pure] *

vim +141 kernel/locking/qspinlock_cna.h

    90	
    91	static inline bool cna_try_change_tail(struct qspinlock *lock, u32 val,
    92					       struct mcs_spinlock *node)
    93	{
    94		struct mcs_spinlock *head_2nd, *tail_2nd;
    95		u32 new;
    96	
    97		/* If the secondary queue is empty, do what MCS does. */
    98		if (node->locked <= 1)
    99			return __try_clear_tail(lock, val, node);
   100	
   101		/*
   102		 * Try to update the tail value to the last node in the secondary queue.
   103		 * If successful, pass the lock to the first thread in the secondary
   104		 * queue. Doing those two actions effectively moves all nodes from the
   105		 * secondary queue into the main one.
   106		 */
 > 107		tail_2nd = decode_tail(node->locked);
   108		head_2nd = tail_2nd->next;
   109		new = ((struct cna_node *)tail_2nd)->encoded_tail + _Q_LOCKED_VAL;
   110	
   111		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, new)) {
   112			/*
   113			 * Try to reset @next in tail_2nd to NULL, but no need to check
   114			 * the result - if failed, a new successor has updated it.
   115			 */
   116			cmpxchg_relaxed(&tail_2nd->next, head_2nd, NULL);
   117			arch_mcs_pass_lock(&head_2nd->locked, 1);
   118			return true;
   119		}
   120	
   121		return false;
   122	}
   123	
   124	/*
   125	 * cna_splice_tail -- splice nodes in the main queue between [first, last]
   126	 * onto the secondary queue.
   127	 */
   128	static void cna_splice_tail(struct mcs_spinlock *node,
   129				    struct mcs_spinlock *first,
   130				    struct mcs_spinlock *last)
   131	{
   132		/* remove [first,last] */
   133		node->next = last->next;
   134	
   135		/* stick [first,last] on the secondary queue tail */
   136		if (node->locked <= 1) { /* if secondary queue is empty */
   137			/* create secondary queue */
   138			last->next = first;
   139		} else {
   140			/* add to the tail of the secondary queue */
 > 141			struct mcs_spinlock *tail_2nd = decode_tail(node->locked);
   142			struct mcs_spinlock *head_2nd = tail_2nd->next;
   143	
   144			tail_2nd->next = first;
   145			last->next = head_2nd;
   146		}
   147	
   148		node->locked = ((struct cna_node *)last)->encoded_tail;
   149	}
   150	
   151	/*
   152	 * cna_scan_main_queue - scan the main waiting queue looking for the first
   153	 * thread running on the same NUMA node as the lock holder. If found (call it
   154	 * thread T), move all threads in the main queue between the lock holder and
   155	 * T to the end of the secondary queue and return 0; otherwise, return the
   156	 * encoded pointer of the last scanned node in the primary queue (so a
   157	 * subsequent scan can be resumed from that node)
   158	 *
   159	 * Schematically, this may look like the following (nn stands for numa_node and
   160	 * et stands for encoded_tail).
   161	 *
   162	 *   when cna_scan_main_queue() is called (the secondary queue is empty):
   163	 *
   164	 *  A+------------+   B+--------+   C+--------+   T+--------+
   165	 *   |mcs:next    | -> |mcs:next| -> |mcs:next| -> |mcs:next| -> NULL
   166	 *   |mcs:locked=1|    |cna:nn=0|    |cna:nn=2|    |cna:nn=1|
   167	 *   |cna:nn=1    |    +--------+    +--------+    +--------+
   168	 *   +----------- +
   169	 *
   170	 *   when cna_scan_main_queue() returns (the secondary queue contains B and C):
   171	 *
   172	 *  A+----------------+    T+--------+
   173	 *   |mcs:next        | ->  |mcs:next| -> NULL
   174	 *   |mcs:locked=C.et | -+  |cna:nn=1|
   175	 *   |cna:nn=1        |  |  +--------+
   176	 *   +--------------- +  +-----+
   177	 *                             \/
   178	 *          B+--------+   C+--------+
   179	 *           |mcs:next| -> |mcs:next| -+
   180	 *           |cna:nn=0|    |cna:nn=2|  |
   181	 *           +--------+    +--------+  |
   182	 *               ^                     |
   183	 *               +---------------------+
   184	 *
   185	 * The worst case complexity of the scan is O(n), where n is the number
   186	 * of current waiters. However, the amortized complexity is close to O(1),
   187	 * as the immediate successor is likely to be running on the same node once
   188	 * threads from other nodes are moved to the secondary queue.
   189	 */
   190	static u32 cna_scan_main_queue(struct mcs_spinlock *node,
   191				       struct mcs_spinlock *pred_start)
   192	{
   193		struct cna_node *cn = (struct cna_node *)node;
   194		struct cna_node *cni = (struct cna_node *)READ_ONCE(pred_start->next);
   195		struct cna_node *last;
   196		int my_numa_node = cn->numa_node;
   197	
   198		/* find any next waiter on 'our' NUMA node */
   199		for (last = cn;
   200		     cni && cni->numa_node != my_numa_node;
   201		     last = cni, cni = (struct cna_node *)READ_ONCE(cni->mcs.next))
   202			;
   203	
   204		/* if found, splice any skipped waiters onto the secondary queue */
   205		if (cni) {
   206			if (last != cn)	/* did we skip any waiters? */
   207				cna_splice_tail(node, node->next,
   208						(struct mcs_spinlock *)last);
   209			return 0;
   210		}
   211	
   212		return last->encoded_tail;
   213	}
   214	
   215	__always_inline u32 cna_pre_scan(struct qspinlock *lock,
   216					  struct mcs_spinlock *node)
   217	{
   218		struct cna_node *cn = (struct cna_node *)node;
   219	
   220		cn->pre_scan_result = cna_scan_main_queue(node, node);
   221	
   222		return 0;
   223	}
   224	
   225	static inline void cna_pass_lock(struct mcs_spinlock *node,
   226					 struct mcs_spinlock *next)
   227	{
   228		struct cna_node *cn = (struct cna_node *)node;
   229		struct mcs_spinlock *next_holder = next, *tail_2nd;
   230		u32 val = 1;
   231	
   232		u32 scan = cn->pre_scan_result;
   233	
   234		/*
   235		 * check if a successor from the same numa node has not been found in
   236		 * pre-scan, and if so, try to find it in post-scan starting from the
   237		 * node where pre-scan stopped (stored in @pre_scan_result)
   238		 */
   239		if (scan > 0)
 > 240			scan = cna_scan_main_queue(node, decode_tail(scan));

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-10 21:30   ` kbuild test robot
@ 2019-11-14 20:57     ` Alex Kogan
  2019-11-15  0:38       ` Luc Van Oostenryck
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Kogan @ 2019-11-14 20:57 UTC (permalink / raw)
  To: kbuild test robot, linux-sparse
  Cc: linux-arch, guohanjun, kbuild-all, arnd, Peter Zijlstra,
	dave.dice, jglauber, x86, will.deacon, linux, steven.sistare,
	linux-kernel, rahul.x.yadav, mingo, bp, hpa, longman, tglx,
	daniel.m.jordan, linux-arm-kernel

+ linux-sparse mailing list

It seems like a bug in the way sparse handles “pure” functions that return
a pointer.

One of the pure functions in question is defined as follows:
	static inline __pure
	struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx)
	{
        	return &((struct qnode *)base + idx)->mcs;
	}

and the corresponding variable definition and the assignment statement that
produce a warning (in kernel/locking/qspinlock.c) are:
	struct mcs_spinlock *prev, *next, *node;
	…
	node = grab_mcs_node(node, idx);

The issue can be reproduced without my patch with:
	# sparse version: v0.6.1
	make ARCH=x86_64 allmodconfig
	make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' kernel/locking/qspinlock.o


The warnings can be eliminated by adding an explicit cast, e.g.:

	node = (struct mcs_spinlock *)grab_mcs_node(node, idx);

but this seems wrong (unnecessary) to me.
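
For the sparse folks, here is a sketch of what I think a minimal out-of-tree
reproducer would look like (untested in exactly this form; note that the
kernel's __pure is just __attribute__((pure))):

	/* pure-ptr.c -- check with: sparse pure-ptr.c */
	struct mcs_spinlock {
		struct mcs_spinlock *next;
		int locked;
		int count;
	};

	static inline __attribute__((pure))
	struct mcs_spinlock *grab(struct mcs_spinlock *base, int idx)
	{
		return base + idx;	/* pure function returning a pointer */
	}

	void foo(struct mcs_spinlock *node)
	{
		/*
		 * The [pure] modifier appears to leak into the pointer type
		 * here, producing the "incorrect type in assignment" warning.
		 */
		node = grab(node, 1);
	}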

Regards,
— Alex

> On Nov 10, 2019, at 4:30 PM, kbuild test robot <lkp@intel.com> wrote:
> 
> Hi Alex,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [cannot apply to v5.4-rc6 next-20191108]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
> 
> url:    https://github.com/0day-ci/linux/commits/Alex-Kogan/locking-qspinlock-Rename-mcs-lock-unlock-macros-and-make-them-more-generic/20191109-180535
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0058b0a506e40d9a2c62015fe92eb64a44d78cd9
> reproduce:
>        # apt-get install sparse
>        # sparse version: v0.6.1-21-gb31adac-dirty
>        make ARCH=x86_64 allmodconfig
>        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> 
> sparse warnings: (new ones prefixed by >>)
> 
>   kernel/locking/qspinlock.c:450:14: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *[assigned] node @@    got ct mcs_spinlock *[assigned] node @@
>   kernel/locking/qspinlock.c:450:14: sparse:    expected struct mcs_spinlock *[assigned] node
>   kernel/locking/qspinlock.c:450:14: sparse:    got struct mcs_spinlock [pure] *
>   kernel/locking/qspinlock.c:498:22: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *prev @@    got struct struct mcs_spinlock *prev @@
>   kernel/locking/qspinlock.c:498:22: sparse:    expected struct mcs_spinlock *prev
>   kernel/locking/qspinlock.c:498:22: sparse:    got struct mcs_spinlock [pure] *
>>> kernel/locking/qspinlock_cna.h:141:60: sparse: sparse: incorrect type in initializer (different modifiers) @@    expected struct mcs_spinlock *tail_2nd @@    got struct struct mcs_spinlock *tail_2nd @@
>>> kernel/locking/qspinlock_cna.h:141:60: sparse:    expected struct mcs_spinlock *tail_2nd
>>> kernel/locking/qspinlock_cna.h:141:60: sparse:    got struct mcs_spinlock [pure] *
>   kernel/locking/qspinlock.c:450:14: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *[assigned] node @@    got ct mcs_spinlock *[assigned] node @@
>   kernel/locking/qspinlock.c:450:14: sparse:    expected struct mcs_spinlock *[assigned] node
>   kernel/locking/qspinlock.c:450:14: sparse:    got struct mcs_spinlock [pure] *
>   kernel/locking/qspinlock.c:498:22: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *prev @@    got struct struct mcs_spinlock *prev @@
>   kernel/locking/qspinlock.c:498:22: sparse:    expected struct mcs_spinlock *prev
>   kernel/locking/qspinlock.c:498:22: sparse:    got struct mcs_spinlock [pure] *
>>> kernel/locking/qspinlock_cna.h:107:18: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *tail_2nd @@    got struct struct mcs_spinlock *tail_2nd @@
>   kernel/locking/qspinlock_cna.h:107:18: sparse:    expected struct mcs_spinlock *tail_2nd
>   kernel/locking/qspinlock_cna.h:107:18: sparse:    got struct mcs_spinlock [pure] *
>>> kernel/locking/qspinlock_cna.h:240:61: sparse: sparse: incorrect type in argument 2 (different modifiers) @@    expected struct mcs_spinlock *pred_start @@    got struct struct mcs_spinlock *pred_start @@
>>> kernel/locking/qspinlock_cna.h:240:61: sparse:    expected struct mcs_spinlock *pred_start
>   kernel/locking/qspinlock_cna.h:240:61: sparse:    got struct mcs_spinlock [pure] *
>   kernel/locking/qspinlock_cna.h:252:26: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *tail_2nd @@    got struct struct mcs_spinlock *tail_2nd @@
>   kernel/locking/qspinlock_cna.h:252:26: sparse:    expected struct mcs_spinlock *tail_2nd
>   kernel/locking/qspinlock_cna.h:252:26: sparse:    got struct mcs_spinlock [pure] *
>   kernel/locking/qspinlock.c:450:14: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *[assigned] node @@    got ct mcs_spinlock *[assigned] node @@
>   kernel/locking/qspinlock.c:450:14: sparse:    expected struct mcs_spinlock *[assigned] node
>   kernel/locking/qspinlock.c:450:14: sparse:    got struct mcs_spinlock [pure] *
>   kernel/locking/qspinlock.c:498:22: sparse: sparse: incorrect type in assignment (different modifiers) @@    expected struct mcs_spinlock *prev @@    got struct struct mcs_spinlock *prev @@
>   kernel/locking/qspinlock.c:498:22: sparse:    expected struct mcs_spinlock *prev
>   kernel/locking/qspinlock.c:498:22: sparse:    got struct mcs_spinlock [pure] *
> 
> vim +141 kernel/locking/qspinlock_cna.h
> 
>    90	
>    91	static inline bool cna_try_change_tail(struct qspinlock *lock, u32 val,
>    92					       struct mcs_spinlock *node)
>    93	{
>    94		struct mcs_spinlock *head_2nd, *tail_2nd;
>    95		u32 new;
>    96	
>    97		/* If the secondary queue is empty, do what MCS does. */
>    98		if (node->locked <= 1)
>    99			return __try_clear_tail(lock, val, node);
>   100	
>   101		/*
>   102		 * Try to update the tail value to the last node in the secondary queue.
>   103		 * If successful, pass the lock to the first thread in the secondary
>   104		 * queue. Doing those two actions effectively moves all nodes from the
>   105		 * secondary queue into the main one.
>   106		 */
>> 107		tail_2nd = decode_tail(node->locked);
>   108		head_2nd = tail_2nd->next;
>   109		new = ((struct cna_node *)tail_2nd)->encoded_tail + _Q_LOCKED_VAL;
>   110	
>   111		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, new)) {
>   112			/*
>   113			 * Try to reset @next in tail_2nd to NULL, but no need to check
>   114			 * the result - if failed, a new successor has updated it.
>   115			 */
>   116			cmpxchg_relaxed(&tail_2nd->next, head_2nd, NULL);
>   117			arch_mcs_pass_lock(&head_2nd->locked, 1);
>   118			return true;
>   119		}
>   120	
>   121		return false;
>   122	}
>   123	
>   124	/*
>   125	 * cna_splice_tail -- splice nodes in the main queue between [first, last]
>   126	 * onto the secondary queue.
>   127	 */
>   128	static void cna_splice_tail(struct mcs_spinlock *node,
>   129				    struct mcs_spinlock *first,
>   130				    struct mcs_spinlock *last)
>   131	{
>   132		/* remove [first,last] */
>   133		node->next = last->next;
>   134	
>   135		/* stick [first,last] on the secondary queue tail */
>   136		if (node->locked <= 1) { /* if secondary queue is empty */
>   137			/* create secondary queue */
>   138			last->next = first;
>   139		} else {
>   140			/* add to the tail of the secondary queue */
>> 141			struct mcs_spinlock *tail_2nd = decode_tail(node->locked);
>   142			struct mcs_spinlock *head_2nd = tail_2nd->next;
>   143	
>   144			tail_2nd->next = first;
>   145			last->next = head_2nd;
>   146		}
>   147	
>   148		node->locked = ((struct cna_node *)last)->encoded_tail;
>   149	}
>   150	
>   151	/*
>   152	 * cna_scan_main_queue - scan the main waiting queue looking for the first
>   153	 * thread running on the same NUMA node as the lock holder. If found (call it
>   154	 * thread T), move all threads in the main queue between the lock holder and
>   155	 * T to the end of the secondary queue and return 0; otherwise, return the
>   156	 * encoded pointer of the last scanned node in the primary queue (so a
>   157	 * subsequent scan can be resumed from that node)
>   158	 *
>   159	 * Schematically, this may look like the following (nn stands for numa_node and
>   160	 * et stands for encoded_tail).
>   161	 *
>   162	 *   when cna_scan_main_queue() is called (the secondary queue is empty):
>   163	 *
>   164	 *  A+------------+   B+--------+   C+--------+   T+--------+
>   165	 *   |mcs:next    | -> |mcs:next| -> |mcs:next| -> |mcs:next| -> NULL
>   166	 *   |mcs:locked=1|    |cna:nn=0|    |cna:nn=2|    |cna:nn=1|
>   167	 *   |cna:nn=1    |    +--------+    +--------+    +--------+
>   168	 *   +----------- +
>   169	 *
>   170	 *   when cna_scan_main_queue() returns (the secondary queue contains B and C):
>   171	 *
>   172	 *  A+----------------+    T+--------+
>   173	 *   |mcs:next        | ->  |mcs:next| -> NULL
>   174	 *   |mcs:locked=C.et | -+  |cna:nn=1|
>   175	 *   |cna:nn=1        |  |  +--------+
>   176	 *   +--------------- +  +-----+
>   177	 *                             \/
>   178	 *          B+--------+   C+--------+
>   179	 *           |mcs:next| -> |mcs:next| -+
>   180	 *           |cna:nn=0|    |cna:nn=2|  |
>   181	 *           +--------+    +--------+  |
>   182	 *               ^                     |
>   183	 *               +---------------------+
>   184	 *
>   185	 * The worst case complexity of the scan is O(n), where n is the number
>   186	 * of current waiters. However, the amortized complexity is close to O(1),
>   187	 * as the immediate successor is likely to be running on the same node once
>   188	 * threads from other nodes are moved to the secondary queue.
>   189	 */
>   190	static u32 cna_scan_main_queue(struct mcs_spinlock *node,
>   191				       struct mcs_spinlock *pred_start)
>   192	{
>   193		struct cna_node *cn = (struct cna_node *)node;
>   194		struct cna_node *cni = (struct cna_node *)READ_ONCE(pred_start->next);
>   195		struct cna_node *last;
>   196		int my_numa_node = cn->numa_node;
>   197	
>   198		/* find any next waiter on 'our' NUMA node */
>   199		for (last = cn;
>   200		     cni && cni->numa_node != my_numa_node;
>   201		     last = cni, cni = (struct cna_node *)READ_ONCE(cni->mcs.next))
>   202			;
>   203	
>   204		/* if found, splice any skipped waiters onto the secondary queue */
>   205		if (cni) {
>   206			if (last != cn)	/* did we skip any waiters? */
>   207				cna_splice_tail(node, node->next,
>   208						(struct mcs_spinlock *)last);
>   209			return 0;
>   210		}
>   211	
>   212		return last->encoded_tail;
>   213	}
>   214	
>   215	__always_inline u32 cna_pre_scan(struct qspinlock *lock,
>   216					  struct mcs_spinlock *node)
>   217	{
>   218		struct cna_node *cn = (struct cna_node *)node;
>   219	
>   220		cn->pre_scan_result = cna_scan_main_queue(node, node);
>   221	
>   222		return 0;
>   223	}
>   224	
>   225	static inline void cna_pass_lock(struct mcs_spinlock *node,
>   226					 struct mcs_spinlock *next)
>   227	{
>   228		struct cna_node *cn = (struct cna_node *)node;
>   229		struct mcs_spinlock *next_holder = next, *tail_2nd;
>   230		u32 val = 1;
>   231	
>   232		u32 scan = cn->pre_scan_result;
>   233	
>   234		/*
>   235		 * check if a successor from the same numa node has not been found in
>   236		 * pre-scan, and if so, try to find it in post-scan starting from the
>   237		 * node where pre-scan stopped (stored in @pre_scan_result)
>   238		 */
>   239		if (scan > 0)
>> 240			scan = cna_scan_main_queue(node, decode_tail(scan));
> 
> ---
> 0-DAY kernel test infrastructure                 Open Source Technology Center
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-14 20:57     ` Alex Kogan
@ 2019-11-15  0:38       ` Luc Van Oostenryck
  0 siblings, 0 replies; 13+ messages in thread
From: Luc Van Oostenryck @ 2019-11-15  0:38 UTC (permalink / raw)
  To: Alex Kogan
  Cc: Peter Zijlstra, hpa, will.deacon, steven.sistare, guohanjun,
	linux-arch, kbuild test robot, x86, linux, daniel.m.jordan,
	linux-sparse, mingo, longman, arnd, jglauber, rahul.x.yadav, bp,
	tglx, linux-arm-kernel, kbuild-all, dave.dice, linux-kernel

On Thu, Nov 14, 2019 at 03:57:34PM -0500, Alex Kogan wrote:
> + linux-sparse mailing list
> 
> It seems like a bug in the way sparse handles “pure” functions that return
> a pointer.

Yes, it's a bug in sparse.
 
> The warnings can be eliminated by adding an explicit cast, e.g.:
> 
> 	node = (struct mcs_spinlock *)grab_mcs_node(node, idx);
> 
> but this seems wrong (unnecessary) to me.

Indeed, it would be wrong.

Thanks for analyzing and reporting this,
-- Luc 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-07 17:46 ` [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock Alex Kogan
  2019-11-10 21:30   ` kbuild test robot
@ 2019-11-20 15:16   ` kbuild test robot
  2019-11-22 18:28     ` Alex Kogan
  1 sibling, 1 reply; 13+ messages in thread
From: kbuild test robot @ 2019-11-20 15:16 UTC (permalink / raw)
  To: Alex Kogan
  Cc: linux-arch, guohanjun, kbuild-all, arnd, peterz, dave.dice,
	jglauber, x86, will.deacon, linux, steven.sistare, linux-kernel,
	rahul.x.yadav, mingo, bp, hpa, alex.kogan, longman, tglx,
	daniel.m.jordan, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 4394 bytes --]

Hi Alex,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.4-rc8 next-20191120]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Alex-Kogan/locking-qspinlock-Rename-mcs-lock-unlock-macros-and-make-them-more-generic/20191109-180535
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0058b0a506e40d9a2c62015fe92eb64a44d78cd9
config: i386-randconfig-f003-20191120 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-14) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/export.h:42:0,
                    from include/linux/linkage.h:7,
                    from include/linux/kernel.h:8,
                    from include/linux/list.h:9,
                    from include/linux/smp.h:12,
                    from kernel/locking/qspinlock.c:16:
   kernel/locking/qspinlock_cna.h: In function 'cna_init_nodes':
>> include/linux/compiler.h:350:38: error: call to '__compiletime_assert_80' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct cna_node) > sizeof(struct qnode)
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:331:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                        ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
     ^~~~~~~~~~~~~~~~
>> kernel/locking/qspinlock_cna.h:80:2: note: in expansion of macro 'BUILD_BUG_ON'
     BUILD_BUG_ON(sizeof(struct cna_node) > sizeof(struct qnode));
     ^~~~~~~~~~~~

vim +/__compiletime_assert_80 +350 include/linux/compiler.h

9a8ab1c39970a4 Daniel Santos 2013-02-21  336  
9a8ab1c39970a4 Daniel Santos 2013-02-21  337  #define _compiletime_assert(condition, msg, prefix, suffix) \
9a8ab1c39970a4 Daniel Santos 2013-02-21  338  	__compiletime_assert(condition, msg, prefix, suffix)
9a8ab1c39970a4 Daniel Santos 2013-02-21  339  
9a8ab1c39970a4 Daniel Santos 2013-02-21  340  /**
9a8ab1c39970a4 Daniel Santos 2013-02-21  341   * compiletime_assert - break build and emit msg if condition is false
9a8ab1c39970a4 Daniel Santos 2013-02-21  342   * @condition: a compile-time constant condition to check
9a8ab1c39970a4 Daniel Santos 2013-02-21  343   * @msg:       a message to emit if condition is false
9a8ab1c39970a4 Daniel Santos 2013-02-21  344   *
9a8ab1c39970a4 Daniel Santos 2013-02-21  345   * In tradition of POSIX assert, this macro will break the build if the
9a8ab1c39970a4 Daniel Santos 2013-02-21  346   * supplied condition is *false*, emitting the supplied error message if the
9a8ab1c39970a4 Daniel Santos 2013-02-21  347   * compiler has support to do so.
9a8ab1c39970a4 Daniel Santos 2013-02-21  348   */
9a8ab1c39970a4 Daniel Santos 2013-02-21  349  #define compiletime_assert(condition, msg) \
9a8ab1c39970a4 Daniel Santos 2013-02-21 @350  	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
9a8ab1c39970a4 Daniel Santos 2013-02-21  351  

:::::: The code at line 350 was first introduced by commit
:::::: 9a8ab1c39970a4938a72d94e6fd13be88a797590 bug.h, compiler.h: introduce compiletime_assert & BUILD_BUG_ON_MSG

:::::: TO: Daniel Santos <daniel.santos@pobox.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37669 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-20 15:16   ` kbuild test robot
@ 2019-11-22 18:28     ` Alex Kogan
  2019-11-22 19:29       ` Waiman Long
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Kogan @ 2019-11-22 18:28 UTC (permalink / raw)
  To: kbuild test robot
  Cc: linux-arch, guohanjun, kbuild-all, arnd, peterz, dave.dice,
	jglauber, x86, will.deacon, linux, steven.sistare, linux-kernel,
	rahul.x.yadav, mingo, bp, hpa, longman, tglx, daniel.m.jordan,
	linux-arm-kernel



> On Nov 20, 2019, at 10:16 AM, kbuild test robot <lkp@intel.com> wrote:
> 
> Hi Alex,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v5.4-rc8 next-20191120]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
> 
> url:    https://github.com/0day-ci/linux/commits/Alex-Kogan/locking-qspinlock-Rename-mcs-lock-unlock-macros-and-make-them-more-generic/20191109-180535
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0058b0a506e40d9a2c62015fe92eb64a44d78cd9
> config: i386-randconfig-f003-20191120 (attached as .config)
> compiler: gcc-7 (Debian 7.4.0-14) 7.4.0
> reproduce:
>        # save the attached .config to linux build tree
>        make ARCH=i386 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> All error/warnings (new ones prefixed by >>):
> 
>   In file included from include/linux/export.h:42:0,
>                    from include/linux/linkage.h:7,
>                    from include/linux/kernel.h:8,
>                    from include/linux/list.h:9,
>                    from include/linux/smp.h:12,
>                    from kernel/locking/qspinlock.c:16:
>   kernel/locking/qspinlock_cna.h: In function 'cna_init_nodes':
>>> include/linux/compiler.h:350:38: error: call to '__compiletime_assert_80' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct cna_node) > sizeof(struct qnode)
>     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>                                         ^
>   include/linux/compiler.h:331:4: note: in definition of macro '__compiletime_assert'
>       prefix ## suffix();    \
>       ^~~~~~
>   include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
>     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>     ^~~~~~~~~~~~~~~~~~~
>   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
>    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>                                        ^~~~~~~~~~~~~~~~~~
>   include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
>     BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>     ^~~~~~~~~~~~~~~~
>>> kernel/locking/qspinlock_cna.h:80:2: note: in expansion of macro 'BUILD_BUG_ON'
>     BUILD_BUG_ON(sizeof(struct cna_node) > sizeof(struct qnode));
>     ^~~~~~~~~~~~

Consider the following definition of qnode:

struct qnode {
	struct mcs_spinlock mcs;
#if defined(CONFIG_PARAVIRT_SPINLOCKS) || defined(CONFIG_NUMA_AWARE_SPINLOCKS)
	long reserved[2];
#endif
};

and this is how cna_node is defined:

struct cna_node {
	struct mcs_spinlock	mcs;
	int			numa_node;
	u32			encoded_tail;
	u32			pre_scan_result; /* 0, 1, 2 or encoded tail */
	u32			intra_count;
};

Since long is only 32 bits wide on i386, we get the compilation error above.
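
To make the arithmetic concrete: on i386 the two reserved longs give only 8
bytes, while the four CNA-specific fields need 16. Below is a stand-alone
sketch (user-space, not kernel code; the structs simply mirror the definitions
above, with the #ifdef around @reserved assumed enabled, plus the mcs_spinlock
layout from kernel/locking/mcs_spinlock.h) comparing the sizes on the two ABIs:

	/*
	 * size-check.c -- build with "gcc -m32 size-check.c" vs.
	 * "gcc -m64 size-check.c" and run to compare the struct sizes.
	 */
	#include <stdio.h>

	struct mcs_spinlock {
		struct mcs_spinlock *next;
		int locked;
		int count;
	};

	struct qnode {
		struct mcs_spinlock mcs;
		long reserved[2];
	};

	struct cna_node {
		struct mcs_spinlock mcs;
		int numa_node;
		unsigned int encoded_tail;
		unsigned int pre_scan_result;
		unsigned int intra_count;
	};

	int main(void)
	{
		/*
		 * i386:   qnode = 20 bytes, cna_node = 28 -> BUILD_BUG_ON fires
		 * x86-64: both are 32 bytes               -> the assertion holds
		 */
		printf("qnode=%zu cna_node=%zu\n",
		       sizeof(struct qnode), sizeof(struct cna_node));
		return 0;
	}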

We could try to squeeze the CNA-specific fields into 64 bits on i386 (or any
32-bit architecture, for that matter). Note that an encoded tail pointer
requires up to 24 bits, and we have two of those. We would need different
field encodings for 32-bit vs. 64-bit architectures, and that would all be
quite ugly.

So instead we should probably either change the definition of @reserved in
qnode to long long, or disable CNA on 32-bit architectures altogether.
I would certainly prefer the former, especially as it requires the fewest
code/config changes.

Any objections / thoughts?

Thanks,
— Alex


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-22 18:28     ` Alex Kogan
@ 2019-11-22 19:29       ` Waiman Long
  2019-11-22 19:52         ` Alex Kogan
  0 siblings, 1 reply; 13+ messages in thread
From: Waiman Long @ 2019-11-22 19:29 UTC (permalink / raw)
  To: Alex Kogan, kbuild test robot
  Cc: linux-arch, guohanjun, kbuild-all, arnd, peterz, dave.dice,
	jglauber, x86, will.deacon, linux, linux-kernel, rahul.x.yadav,
	mingo, bp, hpa, steven.sistare, tglx, daniel.m.jordan,
	linux-arm-kernel

On 11/22/19 1:28 PM, Alex Kogan wrote:
>
>> On Nov 20, 2019, at 10:16 AM, kbuild test robot <lkp@intel.com> wrote:
>>
>> Hi Alex,
>>
>> Thank you for the patch! Yet something to improve:
>>
>> [auto build test ERROR on linus/master]
>> [also build test ERROR on v5.4-rc8 next-20191120]
>> [if your patch is applied to the wrong git tree, please drop us a note to help
>> improve the system. BTW, we also suggest to use '--base' option to specify the
>> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>>
>> url:    https://github.com/0day-ci/linux/commits/Alex-Kogan/locking-qspinlock-Rename-mcs-lock-unlock-macros-and-make-them-more-generic/20191109-180535
>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0058b0a506e40d9a2c62015fe92eb64a44d78cd9
>> config: i386-randconfig-f003-20191120 (attached as .config)
>> compiler: gcc-7 (Debian 7.4.0-14) 7.4.0
>> reproduce:
>>        # save the attached .config to linux build tree
>>        make ARCH=i386 
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kbuild test robot <lkp@intel.com>
>>
>> All error/warnings (new ones prefixed by >>):
>>
>>   In file included from include/linux/export.h:42:0,
>>                    from include/linux/linkage.h:7,
>>                    from include/linux/kernel.h:8,
>>                    from include/linux/list.h:9,
>>                    from include/linux/smp.h:12,
>>                    from kernel/locking/qspinlock.c:16:
>>   kernel/locking/qspinlock_cna.h: In function 'cna_init_nodes':
>>>> include/linux/compiler.h:350:38: error: call to '__compiletime_assert_80' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct cna_node) > sizeof(struct qnode)
>>     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>>                                         ^
>>   include/linux/compiler.h:331:4: note: in definition of macro '__compiletime_assert'
>>       prefix ## suffix();    \
>>       ^~~~~~
>>   include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
>>     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>>     ^~~~~~~~~~~~~~~~~~~
>>   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
>>    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>>                                        ^~~~~~~~~~~~~~~~~~
>>   include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
>>     BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>>     ^~~~~~~~~~~~~~~~
>>>> kernel/locking/qspinlock_cna.h:80:2: note: in expansion of macro 'BUILD_BUG_ON'
>>     BUILD_BUG_ON(sizeof(struct cna_node) > sizeof(struct qnode));
>>     ^~~~~~~~~~~~
> Consider the following definition of qnode:
>
> struct qnode {
> 	struct mcs_spinlock mcs;
> #if defined(CONFIG_PARAVIRT_SPINLOCKS) || defined(CONFIG_NUMA_AWARE_SPINLOCKS)
> 	long reserved[2];
> #endif
> };
>
> and this is how cna_node is defined:
>
> struct cna_node {
> 	struct mcs_spinlock	mcs;
> 	int			numa_node;
> 	u32			encoded_tail;
> 	u32			pre_scan_result; /* 0, 1, 2 or encoded tail */
> 	u32			intra_count;
> };
>
> Since long is only 32 bits wide on i386, we get the compilation error above.
>
> We could try to squeeze the CNA-specific fields into 64 bits on i386 (or any
> 32-bit architecture, for that matter). Note that an encoded tail pointer
> requires up to 24 bits, and we have two of those. We would need different
> field encodings for 32-bit vs. 64-bit architectures, and that would all be
> quite ugly.
>
> So instead we should probably either change the definition of @reserved in
> qnode to long long, or disable CNA on 32-bit architectures altogether.
> I would certainly prefer the former, especially as it requires the fewest
> code/config changes.
>
> Any objections / thoughts?
>
> Thanks,
> — Alex
>
The easy way out is to restrict NUMA qspinlock to 64-bit only. There
aren't that many 32-bit NUMA systems out there that we have to worry about.

Just add "depends on 64BIT" to the config entry.

Cheers,
Longman



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
  2019-11-22 19:29       ` Waiman Long
@ 2019-11-22 19:52         ` Alex Kogan
  0 siblings, 0 replies; 13+ messages in thread
From: Alex Kogan @ 2019-11-22 19:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-arch, guohanjun, kbuild-all, kbuild test robot, arnd,
	peterz, dave.dice, jglauber, x86, will.deacon, linux,
	linux-kernel, rahul.x.yadav, mingo, bp, hpa, steven.sistare,
	tglx, daniel.m.jordan, linux-arm-kernel



> On Nov 22, 2019, at 2:29 PM, Waiman Long <longman@redhat.com> wrote:
> 
> On 11/22/19 1:28 PM, Alex Kogan wrote:
>> 
>>> On Nov 20, 2019, at 10:16 AM, kbuild test robot <lkp@intel.com> wrote:
>>> 
>>> Hi Alex,
>>> 
>>> Thank you for the patch! Yet something to improve:
>>> 
>>> [auto build test ERROR on linus/master]
>>> [also build test ERROR on v5.4-rc8 next-20191120]
>>> [if your patch is applied to the wrong git tree, please drop us a note to help
>>> improve the system. BTW, we also suggest to use '--base' option to specify the
>>> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>>> 
>>> url:    https://github.com/0day-ci/linux/commits/Alex-Kogan/locking-qspinlock-Rename-mcs-lock-unlock-macros-and-make-them-more-generic/20191109-180535
>>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 0058b0a506e40d9a2c62015fe92eb64a44d78cd9
>>> config: i386-randconfig-f003-20191120 (attached as .config)
>>> compiler: gcc-7 (Debian 7.4.0-14) 7.4.0
>>> reproduce:
>>>       # save the attached .config to linux build tree
>>>       make ARCH=i386 
>>> 
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kbuild test robot <lkp@intel.com>
>>> 
>>> All error/warnings (new ones prefixed by >>):
>>> 
>>>  In file included from include/linux/export.h:42:0,
>>>                   from include/linux/linkage.h:7,
>>>                   from include/linux/kernel.h:8,
>>>                   from include/linux/list.h:9,
>>>                   from include/linux/smp.h:12,
>>>                   from kernel/locking/qspinlock.c:16:
>>>  kernel/locking/qspinlock_cna.h: In function 'cna_init_nodes':
>>>>> include/linux/compiler.h:350:38: error: call to '__compiletime_assert_80' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct cna_node) > sizeof(struct qnode)
>>>    _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>>>                                        ^
>>>  include/linux/compiler.h:331:4: note: in definition of macro '__compiletime_assert'
>>>      prefix ## suffix();    \
>>>      ^~~~~~
>>>  include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
>>>    _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>>>    ^~~~~~~~~~~~~~~~~~~
>>>  include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
>>>   #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>>>                                       ^~~~~~~~~~~~~~~~~~
>>>  include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
>>>    BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>>>    ^~~~~~~~~~~~~~~~
>>>>> kernel/locking/qspinlock_cna.h:80:2: note: in expansion of macro 'BUILD_BUG_ON'
>>>    BUILD_BUG_ON(sizeof(struct cna_node) > sizeof(struct qnode));
>>>    ^~~~~~~~~~~~
>> Consider the following definition of qnode:
>> 
>> struct qnode {
>> 	struct mcs_spinlock mcs;
>> #if defined(CONFIG_PARAVIRT_SPINLOCKS) || defined(CONFIG_NUMA_AWARE_SPINLOCKS)
>> 	long reserved[2];
>> #endif
>> };
>> 
>> and this is how cna_node is defined:
>> 
>> struct cna_node {
>> 	struct mcs_spinlock	mcs;
>> 	int			numa_node;
>> 	u32			encoded_tail;
>> 	u32			pre_scan_result; /* 0, 1, 2 or encoded tail */
>> 	u32			intra_count;
>> };
>> 
>> Since long is only 32 bits wide on i386, we get the compilation error above.
>> 
>> We could try to squeeze the CNA-specific fields into 64 bits on i386 (or any
>> 32-bit architecture, for that matter). Note that an encoded tail pointer
>> requires up to 24 bits, and we have two of those. We would need different
>> field encodings for 32-bit vs. 64-bit architectures, and that would all be
>> quite ugly.
>> 
>> So instead we should probably either change the definition of @reserved in
>> qnode to long long, or disable CNA on 32-bit architectures altogether.
>> I would certainly prefer the former, especially as it requires the fewest
>> code/config changes.
>> 
>> Any objections / thoughts?
>> 
>> Thanks,
>> — Alex
>> 
> The easy way out is to restrict NUMA qspinlock to 64-bit only. There
> aren't that many 32-bit NUMA systems out there that we have to worry about.
> 
> Just add "depends on 64BIT" to the config entry.
Ok, will do.

Thanks,
— Alex
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-11-22 19:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-07 17:46 [PATCH v6 0/5] Add NUMA-awareness to qspinlock Alex Kogan
2019-11-07 17:46 ` [PATCH v6 1/5] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic Alex Kogan
2019-11-07 17:46 ` [PATCH v6 2/5] locking/qspinlock: Refactor the qspinlock slow path Alex Kogan
2019-11-07 17:46 ` [PATCH v6 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock Alex Kogan
2019-11-10 21:30   ` kbuild test robot
2019-11-14 20:57     ` Alex Kogan
2019-11-15  0:38       ` Luc Van Oostenryck
2019-11-20 15:16   ` kbuild test robot
2019-11-22 18:28     ` Alex Kogan
2019-11-22 19:29       ` Waiman Long
2019-11-22 19:52         ` Alex Kogan
2019-11-07 17:46 ` [PATCH v6 4/5] locking/qspinlock: Introduce starvation avoidance into CNA Alex Kogan
2019-11-07 17:46 ` [PATCH v6 5/5] locking/qspinlock: Introduce the shuffle reduction optimization " Alex Kogan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).