linux-mm.kvack.org archive mirror
* [PATCH v6 0/6] rwsem: performance optimizations
From: Tim Chen @ 2013-09-25 22:10 UTC
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

For version 6 of this patchset, we added a missing file and fixed
various style issues. We would like to have it merged if there are
no objections.

In this patchset, we introduce two categories of optimizations to the
read-write semaphore.  The first four patches, from Alex Shi, reduce
cache bouncing of the sem->count field by pre-reading sem->count and
avoiding the cmpxchg when it cannot succeed.
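
For illustration, here is a minimal sketch of the pre-read pattern (a
simplified rendering of what patch 1/6 below does; try_write_lock() is
a hypothetical name used for exposition, not an actual kernel function):

	static inline int try_write_lock(struct rw_semaphore *sem)
	{
		/*
		 * Pre-read: if the sem is not free, fail right away
		 * without the cmpxchg, so we avoid pulling the cacheline
		 * over in exclusive state just to fail the update.
		 */
		if (sem->count != RWSEM_UNLOCKED_VALUE)
			return 0;

		/* The expensive cmpxchg now runs only when likely to win. */
		return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
			       RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;
	}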

The last two patches introduce optimistic spinning logic, similar to
that in the mutex code, for writer lock acquisition of the rwsem. This
addresses the general 'mutexes outperform writer-rwsems' situation that
has been seen in more than one case.  Users now need not worry about
performance issues when choosing between these two locking mechanisms.
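
The gist of the spinning, in simplified pseudo-C (the helper names are
placeholders for exposition, not kernel APIs; patch 6/6 below contains
the real logic):

	/* Writer slow path: spin while the lock owner is running on a CPU. */
	while (sem_owner_running_on_cpu(sem)) {
		if (try_write_lock_unqueued(sem))
			return;		/* stole the lock without sleeping */
		cpu_relax();		/* give the owner time to release */
	}
	/* Owner not on a CPU (likely blocked): fall back to sleeping. */
	queue_writer_and_sleep(sem);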

Without these optimizations, Davidlohr Bueso saw an 8% regression in
aim7's shared and high_systime workloads when he switched i_mmap_mutex
to an rwsem.  Tests were run on an 8-socket, 80-core system.  With the
patchset, he got significant improvements in the aim7 suite instead of
regressions:

alltests (+16.3%), custom (+20%), disk (+19.5%), high_systime (+7%),
shared (+18.4%) and short (+6.3%).

Tim Chen also got a +5% improvement in the exim mail server workload on
a 40-core system.

Thanks to Ingo Molnar, Peter Hurley and Peter Zijlstra for reviewing
this patchset.

Regards,
Tim Chen

Changelog:

v6:
1. Added the missing mcslock.h file.
2. Fixed various code style issues.

v5:
1. Try optimistic spinning before we put the writer on the wait queue,
to avoid bottlenecking at the wait queue.  This provides a 5% boost to
the exim workload and a 2% to 8% boost to aim7.
2. Put the MCS locking code into its own mcslock.h file for better
reuse between mutex.c and rwsem.c.
3. Remove the RWSEM_SPIN_ON_WRITE_OWNER config option and make
optimistic spinning the default, per Ingo's suggestion.

v4:
1. Fixed a bug in the task_struct definition in rwsem_can_spin_on_owner.
2. Fixed another typo in the RWSEM_SPIN_ON_WRITE_OWNER config option.

v3:
1. Added ACCESS_ONCE to the sem->count access in rwsem_can_spin_on_owner.
2. Fixed a typo in the RWSEM_SPIN_ON_WRITE_OWNER option in init/Kconfig.

v2:
1. Reorganized changes to down_write_trylock and do_wake into 4 patches,
   and fixed a bug that referenced &sem->count where sem->count was intended.
2. Fixed an unsafe sem->owner dereference in rwsem_can_spin_on_owner,
   and defaulted the spinning option to on for more seasoning (it can be
   turned off should it prove detrimental).
3. Various updates to patch comments.

Alex Shi (4):
  rwsem: check the lock before cmpxchg in down_write_trylock
  rwsem: remove 'out' label in do_wake
  rwsem: remove try_reader_grant label in do_wake
  rwsem/wake: check lock before doing atomic update

Tim Chen (2):
  MCS Lock: Restructure the MCS lock defines and locking code into its
    own file
  rwsem: do optimistic spinning for writer lock acquisition

 include/asm-generic/rwsem.h |    8 +-
 include/linux/mcslock.h     |   58 +++++++++++
 include/linux/rwsem.h       |    6 +-
 kernel/mutex.c              |   58 ++----------
 kernel/rwsem.c              |   19 ++++-
 lib/rwsem.c                 |  228 +++++++++++++++++++++++++++++++++++++-----
 6 files changed, 292 insertions(+), 85 deletions(-)
 create mode 100644 include/linux/mcslock.h

-- 
1.7.4.4



* [PATCH v6 1/6] rwsem: check the lock before cmpxchg in down_write_trylock
From: Tim Chen @ 2013-09-25 22:10 UTC
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

Cmpxchg will cause cacheline bouncing when doing the value check,
which causes a scalability issue on a large machine (like an 80-core
box).

So a pre-read of the lock can relieve this contention.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 include/asm-generic/rwsem.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index bb1e2cd..5ba80e7 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -70,11 +70,11 @@ static inline void __down_write(struct rw_semaphore *sem)
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
+	if (unlikely(sem->count != RWSEM_UNLOCKED_VALUE))
+		return 0;
 
-	tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
+	return cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
+		      RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_UNLOCKED_VALUE;
 }
 
 /*
-- 
1.7.4.4




* [PATCH v6 2/6] rwsem: remove 'out' label in do_wake
From: Tim Chen @ 2013-09-25 22:10 UTC
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

That makes the code simpler and more readable.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 lib/rwsem.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..42f1b1a 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -75,7 +75,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			 * will block as they will notice the queued writer.
 			 */
 			wake_up_process(waiter->task);
-		goto out;
+		return sem;
 	}
 
 	/* Writers might steal the lock before we grant it to the next reader.
@@ -91,7 +91,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			/* A writer stole the lock. Undo our reader grant. */
 			if (rwsem_atomic_update(-adjustment, sem) &
 						RWSEM_ACTIVE_MASK)
-				goto out;
+				return sem;
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;
 		}
@@ -136,7 +136,6 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	sem->wait_list.next = next;
 	next->prev = &sem->wait_list;
 
- out:
 	return sem;
 }
 
-- 
1.7.4.4




* [PATCH v6 3/6] rwsem: remove try_reader_grant label in do_wake
From: Tim Chen @ 2013-09-25 22:10 UTC
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

That makes the code simpler and more readable.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 lib/rwsem.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 42f1b1a..a8055cf 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -85,15 +85,17 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	adjustment = 0;
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
 		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
-		oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/* A writer stole the lock. Undo our reader grant. */
+		while (1) {
+			oldcount = rwsem_atomic_update(adjustment, sem)
+								- adjustment;
+			if (likely(oldcount >= RWSEM_WAITING_BIAS))
+				break;
+
+			 /* A writer stole the lock.  Undo our reader grant. */
 			if (rwsem_atomic_update(-adjustment, sem) &
 						RWSEM_ACTIVE_MASK)
 				return sem;
 			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
 		}
 	}
 
-- 
1.7.4.4




* [PATCH v6 4/6] rwsem/wake: check lock before doing atomic update
From: Tim Chen @ 2013-09-25 22:10 UTC
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

Atomically updating the lock and rolling back will cause cache bouncing
on a large machine. A pre-read of the lock status can relieve this
problem.

Suggested-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Suggested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 lib/rwsem.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/lib/rwsem.c b/lib/rwsem.c
index a8055cf..1d6e6e8 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -64,7 +64,7 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	struct rwsem_waiter *waiter;
 	struct task_struct *tsk;
 	struct list_head *next;
-	long oldcount, woken, loop, adjustment;
+	long woken, loop, adjustment;
 
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
 	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
@@ -86,6 +86,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
 		adjustment = RWSEM_ACTIVE_READ_BIAS;
 		while (1) {
+			long oldcount;
+
+			/* A writer stole the lock. */
+			if (unlikely(sem->count < RWSEM_WAITING_BIAS))
+				return sem;
+
 			oldcount = rwsem_atomic_update(adjustment, sem)
 								- adjustment;
 			if (likely(oldcount >= RWSEM_WAITING_BIAS))
-- 
1.7.4.4




* [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Tim Chen @ 2013-09-25 22:10 UTC
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

We will need the MCS lock code for doing optimistic spinning for rwsem.
Extracting the MCS code from mutex.c and putting it into its own file
allows us to reuse this code easily for rwsem.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/mutex.c          |   58 +++++-----------------------------------------
 2 files changed, 65 insertions(+), 51 deletions(-)
 create mode 100644 include/linux/mcslock.h

diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
new file mode 100644
index 0000000..20fd3f0
--- /dev/null
+++ b/include/linux/mcslock.h
@@ -0,0 +1,58 @@
+/*
+ * MCS lock defines
+ *
+ * This file contains the main data structure and API definitions of MCS lock.
+ */
+#ifndef __LINUX_MCSLOCK_H
+#define __LINUX_MCSLOCK_H
+
+struct mcs_spin_node {
+	struct mcs_spin_node *next;
+	int		  locked;	/* 1 if lock acquired */
+};
+
+/*
+ * We don't inline mcs_spin_lock() so that perf can correctly account for the
+ * time spent in this lock function.
+ */
+static noinline
+void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
+{
+	struct mcs_spin_node *prev;
+
+	/* Init node */
+	node->locked = 0;
+	node->next   = NULL;
+
+	prev = xchg(lock, node);
+	if (likely(prev == NULL)) {
+		/* Lock acquired */
+		node->locked = 1;
+		return;
+	}
+	ACCESS_ONCE(prev->next) = node;
+	smp_wmb();
+	/* Wait until the lock holder passes the lock down */
+	while (!ACCESS_ONCE(node->locked))
+		arch_mutex_cpu_relax();
+}
+
+static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
+{
+	struct mcs_spin_node *next = ACCESS_ONCE(node->next);
+
+	if (likely(!next)) {
+		/*
+		 * Release the lock by setting it to NULL
+		 */
+		if (cmpxchg(lock, node, NULL) == node)
+			return;
+		/* Wait until the next pointer is set */
+		while (!(next = ACCESS_ONCE(node->next)))
+			arch_mutex_cpu_relax();
+	}
+	ACCESS_ONCE(next->locked) = 1;
+	smp_wmb();
+}
+
+#endif
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 6d647ae..1b6ba3f 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -25,6 +25,7 @@
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/debug_locks.h>
+#include <linux/mcslock.h>
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for mutexes,
@@ -111,54 +112,9 @@ EXPORT_SYMBOL(mutex_lock);
  * more or less simultaneously, the spinners need to acquire a MCS lock
  * first before spinning on the owner field.
  *
- * We don't inline mspin_lock() so that perf can correctly account for the
- * time spent in this lock function.
  */
-struct mspin_node {
-	struct mspin_node *next ;
-	int		  locked;	/* 1 if lock acquired */
-};
-#define	MLOCK(mutex)	((struct mspin_node **)&((mutex)->spin_mlock))
 
-static noinline
-void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
-{
-	struct mspin_node *prev;
-
-	/* Init node */
-	node->locked = 0;
-	node->next   = NULL;
-
-	prev = xchg(lock, node);
-	if (likely(prev == NULL)) {
-		/* Lock acquired */
-		node->locked = 1;
-		return;
-	}
-	ACCESS_ONCE(prev->next) = node;
-	smp_wmb();
-	/* Wait until the lock holder passes the lock down */
-	while (!ACCESS_ONCE(node->locked))
-		arch_mutex_cpu_relax();
-}
-
-static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node)
-{
-	struct mspin_node *next = ACCESS_ONCE(node->next);
-
-	if (likely(!next)) {
-		/*
-		 * Release the lock by setting it to NULL
-		 */
-		if (cmpxchg(lock, node, NULL) == node)
-			return;
-		/* Wait until the next pointer is set */
-		while (!(next = ACCESS_ONCE(node->next)))
-			arch_mutex_cpu_relax();
-	}
-	ACCESS_ONCE(next->locked) = 1;
-	smp_wmb();
-}
+#define	MLOCK(mutex)	((struct mcs_spin_node **)&((mutex)->spin_mlock))
 
 /*
  * Mutex spinning code migrated from kernel/sched/core.c
@@ -448,7 +404,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 
 	for (;;) {
 		struct task_struct *owner;
-		struct mspin_node  node;
+		struct mcs_spin_node  node;
 
 		if (!__builtin_constant_p(ww_ctx == NULL) && ww_ctx->acquired > 0) {
 			struct ww_mutex *ww;
@@ -470,10 +426,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		mspin_lock(MLOCK(lock), &node);
+		mcs_spin_lock(MLOCK(lock), &node);
 		owner = ACCESS_ONCE(lock->owner);
 		if (owner && !mutex_spin_on_owner(lock, owner)) {
-			mspin_unlock(MLOCK(lock), &node);
+			mcs_spin_unlock(MLOCK(lock), &node);
 			goto slowpath;
 		}
 
@@ -488,11 +444,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			}
 
 			mutex_set_owner(lock);
-			mspin_unlock(MLOCK(lock), &node);
+			mcs_spin_unlock(MLOCK(lock), &node);
 			preempt_enable();
 			return 0;
 		}
-		mspin_unlock(MLOCK(lock), &node);
+		mcs_spin_unlock(MLOCK(lock), &node);
 
 		/*
 		 * When there's no owner, we might have preempted between the
-- 
1.7.4.4




* [PATCH v6 6/6] rwsem: do optimistic spinning for writer lock acquisition
From: Tim Chen @ 2013-09-25 22:10 UTC
  To: Ingo Molnar, Andrew Morton
  Cc: Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, Tim Chen, linux-kernel, linux-mm

We want to add optimistic spinning to rwsems because the writer rwsem
does not perform as well as mutexes. Tim noticed this for exim (mail
server) workloads when reverting commit 4fc3f1d6, and Davidlohr noticed
it when converting the i_mmap_mutex to an rwsem in some aim7 workloads.
The biggest difference is that when we fail to acquire a mutex in the
fastpath, optimistic spinning comes into play and we can avoid a large
amount of unnecessary sleeping and the overhead of moving tasks in and
out of the wait queue.

Allowing optimistic spinning before putting the writer on the wait queue
reduces wait queue contention and provides a greater chance for the
rwsem to be acquired. With these changes, rwsem is on par with mutex.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 include/linux/rwsem.h |    6 +-
 kernel/rwsem.c        |   19 +++++-
 lib/rwsem.c           |  203 ++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 207 insertions(+), 21 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 0616ffe..ef5a83a 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -26,6 +26,8 @@ struct rw_semaphore {
 	long			count;
 	raw_spinlock_t		wait_lock;
 	struct list_head	wait_list;
+	struct task_struct	*owner; /* write owner */
+	void			*spin_mlock;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
 #endif
@@ -58,7 +60,9 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 #define __RWSEM_INITIALIZER(name)			\
 	{ RWSEM_UNLOCKED_VALUE,				\
 	  __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock),	\
-	  LIST_HEAD_INIT((name).wait_list)		\
+	  LIST_HEAD_INIT((name).wait_list),		\
+	  NULL,						\
+	  NULL						\
 	  __RWSEM_DEP_MAP_INIT(name) }
 
 #define DECLARE_RWSEM(name) \
diff --git a/kernel/rwsem.c b/kernel/rwsem.c
index cfff143..d74d1c9 100644
--- a/kernel/rwsem.c
+++ b/kernel/rwsem.c
@@ -12,6 +12,16 @@
 
 #include <linux/atomic.h>
 
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+	sem->owner = current;
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+	sem->owner = NULL;
+}
+
 /*
  * lock for reading
  */
@@ -48,6 +58,7 @@ void __sched down_write(struct rw_semaphore *sem)
 	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write);
@@ -59,8 +70,10 @@ int down_write_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_write_trylock(sem);
 
-	if (ret == 1)
+	if (ret == 1) {
 		rwsem_acquire(&sem->dep_map, 0, 1, _RET_IP_);
+		rwsem_set_owner(sem);
+	}
 	return ret;
 }
 
@@ -86,6 +99,7 @@ void up_write(struct rw_semaphore *sem)
 	rwsem_release(&sem->dep_map, 1, _RET_IP_);
 
 	__up_write(sem);
+	rwsem_clear_owner(sem);
 }
 
 EXPORT_SYMBOL(up_write);
@@ -100,6 +114,7 @@ void downgrade_write(struct rw_semaphore *sem)
 	 * dependency.
 	 */
 	__downgrade_write(sem);
+	rwsem_clear_owner(sem);
 }
 
 EXPORT_SYMBOL(downgrade_write);
@@ -122,6 +137,7 @@ void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 	rwsem_acquire_nest(&sem->dep_map, 0, 0, nest, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(_down_write_nest_lock);
@@ -141,6 +157,7 @@ void down_write_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write_nested);
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 1d6e6e8..9535ef7 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -10,6 +10,8 @@
 #include <linux/sched.h>
 #include <linux/init.h>
 #include <linux/export.h>
+#include <linux/sched/rt.h>
+#include <linux/mcslock.h>
 
 /*
  * Initialize an rwsem:
@@ -27,6 +29,8 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 	sem->count = RWSEM_UNLOCKED_VALUE;
 	raw_spin_lock_init(&sem->wait_lock);
 	INIT_LIST_HEAD(&sem->wait_list);
+	sem->owner = NULL;
+	sem->spin_mlock = NULL;
 }
 
 EXPORT_SYMBOL(__init_rwsem);
@@ -194,14 +198,179 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 	return sem;
 }
 
+static inline int rwsem_try_write_lock(long count, struct rw_semaphore *sem)
+{
+	if (!(count & RWSEM_ACTIVE_MASK)) {
+		/* Try acquiring the write lock. */
+		if (sem->count == RWSEM_WAITING_BIAS &&
+		    cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
+			    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
+			if (!list_is_singular(&sem->wait_list))
+				rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Try to acquire write lock before the writer has been put on wait queue.
+ */
+static inline int rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
+{
+	long count;
+
+	count = ACCESS_ONCE(sem->count);
+retry:
+	if (count == RWSEM_WAITING_BIAS) {
+		count = cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
+			    RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS);
+		/* allow write lock stealing, try acquiring the write lock. */
+		if (count == RWSEM_WAITING_BIAS)
+			goto acquired;
+		else if (count == 0)
+			goto retry;
+	} else if (count == 0) {
+		count = cmpxchg(&sem->count, 0, RWSEM_ACTIVE_WRITE_BIAS);
+		if (count == 0)
+			goto acquired;
+		else if (count == RWSEM_WAITING_BIAS)
+			goto retry;
+	}
+	return 0;
+
+acquired:
+	return 1;
+}
+
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+	int retval;
+	struct task_struct *owner;
+
+	rcu_read_lock();
+	owner = ACCESS_ONCE(sem->owner);
+
+	/* Spin only if active writer running */
+	if (owner)
+		retval = owner->on_cpu;
+	else
+		retval = false;
+
+	rcu_read_unlock();
+	/*
+	 * if lock->owner is not set, the sem owner may have just acquired
+	 * it and not set the owner yet, or the sem has been released, or
+	 * reader active.
+	 */
+	return retval;
+}
+
+static inline bool owner_running(struct rw_semaphore *lock,
+				struct task_struct *owner)
+{
+	if (lock->owner != owner)
+		return false;
+
+	/*
+	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
+	 * lock->owner still matches owner, if that fails, owner might
+	 * point to free()d memory, if it still matches, the rcu_read_lock()
+	 * ensures the memory stays valid.
+	 */
+	barrier();
+
+	return owner->on_cpu;
+}
+
+static noinline
+int rwsem_spin_on_owner(struct rw_semaphore *lock, struct task_struct *owner)
+{
+	rcu_read_lock();
+	while (owner_running(lock, owner)) {
+		if (need_resched())
+			break;
+
+		arch_mutex_cpu_relax();
+	}
+	rcu_read_unlock();
+
+	/*
+	 * We break out the loop above on need_resched() or when the
+	 * owner changed, which is a sign for heavy contention. Return
+	 * success only when lock->owner is NULL.
+	 */
+	return lock->owner == NULL;
+}
+
+#define MLOCK(rwsem)    ((struct mcs_spin_node **)&((rwsem)->spin_mlock))
+
+int rwsem_optimistic_spin(struct rw_semaphore *sem)
+{
+	struct task_struct *owner;
+	int ret = 0;
+
+	/* sem->wait_lock should not be held when doing optimistic spinning */
+	if (!rwsem_can_spin_on_owner(sem))
+		return ret;
+
+	preempt_disable();
+	for (;;) {
+		struct mcs_spin_node node;
+
+		mcs_spin_lock(MLOCK(sem), &node);
+		owner = ACCESS_ONCE(sem->owner);
+		if (owner && !rwsem_spin_on_owner(sem, owner)) {
+			mcs_spin_unlock(MLOCK(sem), &node);
+			break;
+		}
+
+		/* wait_lock will be acquired if write_lock is obtained */
+		if (rwsem_try_write_lock_unqueued(sem)) {
+			mcs_spin_unlock(MLOCK(sem), &node);
+			ret = 1;
+			break;
+		}
+		mcs_spin_unlock(MLOCK(sem), &node);
+
+		/*
+		 * When there's no owner, we might have preempted between the
+		 * owner acquiring the lock and setting the owner field. If
+		 * we're an RT task that will live-lock because we won't let
+		 * the owner complete.
+		 */
+		if (!owner && (need_resched() || rt_task(current)))
+			break;
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		arch_mutex_cpu_relax();
+	}
+	preempt_enable();
+
+	return ret;
+}
+
 /*
  * wait until we successfully acquire the write lock
  */
 struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 {
-	long count, adjustment = -RWSEM_ACTIVE_WRITE_BIAS;
+	long count;
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk = current;
+	bool waiting = true;
+
+	/* undo write bias from down_write operation, stop active locking */
+	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
+
+	/* do optimistic spinning and steal lock if possible */
+	if (rwsem_optimistic_spin(sem))
+		goto done;
 
 	/* set up my own style of waitqueue */
 	waiter.task = tsk;
@@ -209,33 +378,28 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
+		waiting = false;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	count = rwsem_atomic_update(adjustment, sem);
+	if (waiting)
+		count = ACCESS_ONCE(sem->count);
+	else
+		count = rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
 
-	/* If there were already threads queued before us and there are no
+	/*
+	 * If there were already threads queued before us and there are no
 	 * active writers, the lock must be read owned; so we try to wake
-	 * any read locks that were queued ahead of us. */
-	if (count > RWSEM_WAITING_BIAS &&
-	    adjustment == -RWSEM_ACTIVE_WRITE_BIAS)
+	 * any read locks that were queued ahead of us.
+	 */
+	if ((count > RWSEM_WAITING_BIAS) && waiting)
 		sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
 
 	/* wait until we successfully acquire the lock */
 	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-	while (true) {
-		if (!(count & RWSEM_ACTIVE_MASK)) {
-			/* Try acquiring the write lock. */
-			count = RWSEM_ACTIVE_WRITE_BIAS;
-			if (!list_is_singular(&sem->wait_list))
-				count += RWSEM_WAITING_BIAS;
-
-			if (sem->count == RWSEM_WAITING_BIAS &&
-			    cmpxchg(&sem->count, RWSEM_WAITING_BIAS, count) ==
-							RWSEM_WAITING_BIAS)
-				break;
-		}
+	for (;;) {
+		if (rwsem_try_write_lock(count, sem))
+			break;
 
 		raw_spin_unlock_irq(&sem->wait_lock);
 
@@ -250,6 +414,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
+done:
 	tsk->state = TASK_RUNNING;
 
 	return sem;
-- 
1.7.4.4



* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Ingo Molnar @ 2013-09-26  6:46 UTC
  To: Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> We will need the MCS lock code for doing optimistic spinning for rwsem.
> Extracting the MCS code from mutex.c and putting it into its own file
> allows us to reuse this code easily for rwsem.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> ---
>  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/mutex.c          |   58 +++++-----------------------------------------
>  2 files changed, 65 insertions(+), 51 deletions(-)
>  create mode 100644 include/linux/mcslock.h
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> new file mode 100644
> index 0000000..20fd3f0
> --- /dev/null
> +++ b/include/linux/mcslock.h
> @@ -0,0 +1,58 @@
> +/*
> + * MCS lock defines
> + *
> + * This file contains the main data structure and API definitions of MCS lock.

A (very) short blurb about what an MCS lock is would be nice here.

> + */
> +#ifndef __LINUX_MCSLOCK_H
> +#define __LINUX_MCSLOCK_H
> +
> +struct mcs_spin_node {
> +	struct mcs_spin_node *next;
> +	int		  locked;	/* 1 if lock acquired */
> +};

The vertical alignment looks broken here.

> +
> +/*
> + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> + * time spent in this lock function.
> + */
> +static noinline
> +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> +{
> +	struct mcs_spin_node *prev;
> +
> +	/* Init node */
> +	node->locked = 0;
> +	node->next   = NULL;
> +
> +	prev = xchg(lock, node);
> +	if (likely(prev == NULL)) {
> +		/* Lock acquired */
> +		node->locked = 1;
> +		return;
> +	}
> +	ACCESS_ONCE(prev->next) = node;
> +	smp_wmb();
> +	/* Wait until the lock holder passes the lock down */
> +	while (!ACCESS_ONCE(node->locked))
> +		arch_mutex_cpu_relax();
> +}
> +
> +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> +{
> +	struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> +
> +	if (likely(!next)) {
> +		/*
> +		 * Release the lock by setting it to NULL
> +		 */
> +		if (cmpxchg(lock, node, NULL) == node)
> +			return;
> +		/* Wait until the next pointer is set */
> +		while (!(next = ACCESS_ONCE(node->next)))
> +			arch_mutex_cpu_relax();
> +	}
> +	ACCESS_ONCE(next->locked) = 1;
> +	smp_wmb();
> +}
> +
> +#endif

We typically close header guards not via a plain #endif but like this:

#endif /* __LINUX_SPINLOCK_H */

#endif /* __LINUX_SPINLOCK_TYPES_H */

etc.

Thanks,

	Ingo


* Re: [PATCH v6 6/6] rwsem: do optimistic spinning for writer lock acquisition
From: Ingo Molnar @ 2013-09-26  6:53 UTC
  To: Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm, Linus Torvalds


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> We want to add optimistic spinning to rwsems because the writer rwsem
> does not perform as well as mutexes. Tim noticed this for exim (mail
> server) workloads when reverting commit 4fc3f1d6, and Davidlohr noticed
> it when converting the i_mmap_mutex to an rwsem in some aim7 workloads.
> The biggest difference is that when we fail to acquire a mutex in the
> fastpath, optimistic spinning comes into play and we can avoid a large
> amount of unnecessary sleeping and the overhead of moving tasks in and
> out of the wait queue.
> 
> Allowing optimistic spinning before putting the writer on the wait queue
> reduces wait queue contention and provides a greater chance for the
> rwsem to be acquired. With these changes, rwsem is on par with mutex.
> 
> Reviewed-by: Ingo Molnar <mingo@elte.hu>
> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> ---
>  include/linux/rwsem.h |    6 +-
>  kernel/rwsem.c        |   19 +++++-
>  lib/rwsem.c           |  203 ++++++++++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 207 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 0616ffe..ef5a83a 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -26,6 +26,8 @@ struct rw_semaphore {
>  	long			count;
>  	raw_spinlock_t		wait_lock;
>  	struct list_head	wait_list;
> +	struct task_struct	*owner; /* write owner */
> +	void			*spin_mlock;

> +#define MLOCK(rwsem)    ((struct mcs_spin_node **)&((rwsem)->spin_mlock))

> +		mcs_spin_lock(MLOCK(sem), &node);

> +			mcs_spin_unlock(MLOCK(sem), &node);

> +			mcs_spin_unlock(MLOCK(sem), &node);

> +		mcs_spin_unlock(MLOCK(sem), &node);

That forced type casting is ugly and fragile.

To avoid having to include mcslock.h into rwsem.h just add a forward 
struct declaration, before the struct rw_semaphore definition:

struct mcs_spin_node;

Then define spin_mlock with the right type:

	struct mcs_spin_node		*spin_mlock;

I'd also suggest renaming 'spin_mlock', to reduce unnecessary variants. If 
the lock type name is 'struct mcs_spin_node' then 'mcs_lock' would be a 
perfect field name, right?

While at it, renaming mcs_spin_node to mcs_spinlock might be wise as well, 
and the include file would be named mcs_spinlock.h.
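
Putting those suggestions together, the declarations would look roughly
like this (an illustrative sketch assuming the proposed renames, not
code from the patch):

	struct mcs_spinlock;			/* forward declaration */

	struct rw_semaphore {
		long			count;
		raw_spinlock_t		wait_lock;
		struct list_head	wait_list;
		struct task_struct	*owner;		/* write owner */
		struct mcs_spinlock	*mcs_lock;	/* was: void *spin_mlock */
	#ifdef CONFIG_DEBUG_LOCK_ALLOC
		struct lockdep_map	dep_map;
	#endif
	};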

Thanks,

	Ingo


* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Peter Zijlstra @ 2013-09-26  8:40 UTC
  To: Ingo Molnar
  Cc: Tim Chen, Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Thu, Sep 26, 2013 at 08:46:29AM +0200, Ingo Molnar wrote:
> > +/*
> > + * MCS lock defines
> > + *
> > + * This file contains the main data structure and API definitions of MCS lock.
> 
> A (very) short blurb about what an MCS lock is would be nice here.

A while back I suggested including a link to something like:

http://www.cise.ufl.edu/tr/DOC/REP-1992-71.pdf

It's a fairly concise write-up of the idea; only 6 pages. The sad part
about linking to the web is that links tend to go dead after a while.


* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Ingo Molnar @ 2013-09-26  9:37 UTC
  To: Peter Zijlstra
  Cc: Tim Chen, Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 26, 2013 at 08:46:29AM +0200, Ingo Molnar wrote:
> > > +/*
> > > + * MCS lock defines
> > > + *
> > > + * This file contains the main data structure and API definitions of MCS lock.
> > 
> > A (very) short blurb about what an MCS lock is would be nice here.
> 
> A while back I suggested including a link to something like:
> 
> http://www.cise.ufl.edu/tr/DOC/REP-1992-71.pdf
> 
> It's a fairly concise write-up of the idea; only 6 pages. The sad part 
> about linking to the web is that links tend to go dead after a while.

So what I wanted to see was to add just a few sentences summing up the 
concept - so that people blundering into this file in include/linux/ have 
an idea what it's all about!

Thanks,

	Ingo


* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Tim Chen @ 2013-09-26 18:18 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Ingo Molnar, Andrew Morton, Andrea Arcangeli,
	Alex Shi, Andi Kleen, Michel Lespinasse, Davidlohr Bueso,
	Matthew R Wilcox, Dave Hansen, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Thu, 2013-09-26 at 10:40 +0200, Peter Zijlstra wrote:
> On Thu, Sep 26, 2013 at 08:46:29AM +0200, Ingo Molnar wrote:
> > > +/*
> > > + * MCS lock defines
> > > + *
> > > + * This file contains the main data structure and API definitions of MCS lock.
> > 
> > A (very) short blurb about what an MCS lock is would be nice here.
> 
> A while back I suggested including a link to something like:
> 
> http://www.cise.ufl.edu/tr/DOC/REP-1992-71.pdf
> 
> It's a fairly concise write-up of the idea; only 6 pages. The sad part
> about linking to the web is that links tend to go dead after a while.

Link rot is a problem.  If I provide a few details about the MCS lock,
I think people should be able to google for it.

Tim




* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Jason Low @ 2013-09-26 19:27 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> We will need the MCS lock code for doing optimistic spinning for rwsem.
> Extracting the MCS code from mutex.c and putting it into its own file
> allows us to reuse this code easily for rwsem.
>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> ---
>  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/mutex.c          |   58 +++++-----------------------------------------
>  2 files changed, 65 insertions(+), 51 deletions(-)
>  create mode 100644 include/linux/mcslock.h
>
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> new file mode 100644
> index 0000000..20fd3f0
> --- /dev/null
> +++ b/include/linux/mcslock.h
> @@ -0,0 +1,58 @@
> +/*
> + * MCS lock defines
> + *
> + * This file contains the main data structure and API definitions of MCS lock.
> + */
> +#ifndef __LINUX_MCSLOCK_H
> +#define __LINUX_MCSLOCK_H
> +
> +struct mcs_spin_node {
> +       struct mcs_spin_node *next;
> +       int               locked;       /* 1 if lock acquired */
> +};
> +
> +/*
> + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> + * time spent in this lock function.
> + */
> +static noinline
> +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> +{
> +       struct mcs_spin_node *prev;
> +
> +       /* Init node */
> +       node->locked = 0;
> +       node->next   = NULL;
> +
> +       prev = xchg(lock, node);
> +       if (likely(prev == NULL)) {
> +               /* Lock acquired */
> +               node->locked = 1;

If we don't spin on the local node, is it necessary to set this variable?

> +               return;
> +       }
> +       ACCESS_ONCE(prev->next) = node;
> +       smp_wmb();
> +       /* Wait until the lock holder passes the lock down */
> +       while (!ACCESS_ONCE(node->locked))
> +               arch_mutex_cpu_relax();
> +}
> +
> +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> +{
> +       struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> +
> +       if (likely(!next)) {
> +               /*
> +                * Release the lock by setting it to NULL
> +                */
> +               if (cmpxchg(lock, node, NULL) == node)

And can we make this check likely()?


* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Davidlohr Bueso @ 2013-09-26 20:06 UTC
  To: Jason Low
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and putting it into its own file
> > allows us to reuse this code easily for rwsem.
> >
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > ---
> >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/mutex.c          |   58 +++++-----------------------------------------
> >  2 files changed, 65 insertions(+), 51 deletions(-)
> >  create mode 100644 include/linux/mcslock.h
> >
> > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > new file mode 100644
> > index 0000000..20fd3f0
> > --- /dev/null
> > +++ b/include/linux/mcslock.h
> > @@ -0,0 +1,58 @@
> > +/*
> > + * MCS lock defines
> > + *
> > + * This file contains the main data structure and API definitions of MCS lock.
> > + */
> > +#ifndef __LINUX_MCSLOCK_H
> > +#define __LINUX_MCSLOCK_H
> > +
> > +struct mcs_spin_node {
> > +       struct mcs_spin_node *next;
> > +       int               locked;       /* 1 if lock acquired */
> > +};
> > +
> > +/*
> > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > + * time spent in this lock function.
> > + */
> > +static noinline
> > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > +{
> > +       struct mcs_spin_node *prev;
> > +
> > +       /* Init node */
> > +       node->locked = 0;
> > +       node->next   = NULL;
> > +
> > +       prev = xchg(lock, node);
> > +       if (likely(prev == NULL)) {
> > +               /* Lock acquired */
> > +               node->locked = 1;
> 
> If we don't spin on the local node, is it necessary to set this variable?

I don't follow, the whole idea is to spin on the local variable.


* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Jason Low @ 2013-09-26 20:23 UTC
  To: Davidlohr Bueso, Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote:
> On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > Extracting the MCS code from mutex.c and putting it into its own file
> > > allows us to reuse this code easily for rwsem.
> > >
> > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > ---
> > >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  kernel/mutex.c          |   58 +++++-----------------------------------------
> > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > >  create mode 100644 include/linux/mcslock.h
> > >
> > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > new file mode 100644
> > > index 0000000..20fd3f0
> > > --- /dev/null
> > > +++ b/include/linux/mcslock.h
> > > @@ -0,0 +1,58 @@
> > > +/*
> > > + * MCS lock defines
> > > + *
> > > + * This file contains the main data structure and API definitions of MCS lock.
> > > + */
> > > +#ifndef __LINUX_MCSLOCK_H
> > > +#define __LINUX_MCSLOCK_H
> > > +
> > > +struct mcs_spin_node {
> > > +       struct mcs_spin_node *next;
> > > +       int               locked;       /* 1 if lock acquired */
> > > +};
> > > +
> > > +/*
> > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > + * time spent in this lock function.
> > > + */
> > > +static noinline
> > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > +{
> > > +       struct mcs_spin_node *prev;
> > > +
> > > +       /* Init node */
> > > +       node->locked = 0;
> > > +       node->next   = NULL;
> > > +
> > > +       prev = xchg(lock, node);
> > > +       if (likely(prev == NULL)) {
> > > +               /* Lock acquired */
> > > +               node->locked = 1;
> > 
> > If we don't spin on the local node, is it necessary to set this variable?
> 
> I don't follow, the whole idea is to spin on the local variable.

If prev == NULL, doesn't that mean it won't proceed to spin on the
variable because the lock is already free and we call return? In that
case where we directly acquire the lock, I was wondering if it is
necessary to set node->locked = 1.


* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Davidlohr Bueso @ 2013-09-26 20:40 UTC
  To: Jason Low
  Cc: Tim Chen, Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote:
> On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote:
> > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > > Extracting the MCS code from mutex.c and putting it into its own file
> > > > allows us to reuse this code easily for rwsem.
> > > >
> > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > > ---
> > > >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> > > >  kernel/mutex.c          |   58 +++++-----------------------------------------
> > > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > > >  create mode 100644 include/linux/mcslock.h
> > > >
> > > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > > new file mode 100644
> > > > index 0000000..20fd3f0
> > > > --- /dev/null
> > > > +++ b/include/linux/mcslock.h
> > > > @@ -0,0 +1,58 @@
> > > > +/*
> > > > + * MCS lock defines
> > > > + *
> > > > + * This file contains the main data structure and API definitions of MCS lock.
> > > > + */
> > > > +#ifndef __LINUX_MCSLOCK_H
> > > > +#define __LINUX_MCSLOCK_H
> > > > +
> > > > +struct mcs_spin_node {
> > > > +       struct mcs_spin_node *next;
> > > > +       int               locked;       /* 1 if lock acquired */
> > > > +};
> > > > +
> > > > +/*
> > > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > > + * time spent in this lock function.
> > > > + */
> > > > +static noinline
> > > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > > +{
> > > > +       struct mcs_spin_node *prev;
> > > > +
> > > > +       /* Init node */
> > > > +       node->locked = 0;
> > > > +       node->next   = NULL;
> > > > +
> > > > +       prev = xchg(lock, node);
> > > > +       if (likely(prev == NULL)) {
> > > > +               /* Lock acquired */
> > > > +               node->locked = 1;
> > > 
> > > If we don't spin on the local node, is it necessary to set this variable?
> > 
> > I don't follow, the whole idea is to spin on the local variable.
> 
> If prev == NULL, doesn't that mean it won't proceed to spin on the
> variable because the lock is already free and we call return? In that
> case where we directly acquire the lock, I was wondering if it is
> necessary to set node->locked = 1.

Yes, that's true, but we need to flag the lock as acquired (the node's
lock is initially set to unlocked), otherwise others trying to acquire
the lock can spin forever:

	/* Wait until the lock holder passes the lock down */
	while (!ACCESS_ONCE(node->locked))
		arch_mutex_cpu_relax();

The ->locked variable in this implementation refers to whether the lock
is acquired, and *not* to whether busy-waiting is necessary.

Thanks,
Davidlohr



* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
From: Jason Low @ 2013-09-26 21:09 UTC
  To: Davidlohr Bueso
  Cc: Tim Chen, Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote:
> On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote:
> > On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote:
> > > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> > > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > > > Extracting the MCS code from mutex.c and putting it into its own file
> > > > > allows us to reuse this code easily for rwsem.
> > > > >
> > > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > > > ---
> > > > >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  kernel/mutex.c          |   58 +++++-----------------------------------------
> > > > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > > > >  create mode 100644 include/linux/mcslock.h
> > > > >
> > > > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > > > new file mode 100644
> > > > > index 0000000..20fd3f0
> > > > > --- /dev/null
> > > > > +++ b/include/linux/mcslock.h
> > > > > @@ -0,0 +1,58 @@
> > > > > +/*
> > > > > + * MCS lock defines
> > > > > + *
> > > > > + * This file contains the main data structure and API definitions of MCS lock.
> > > > > + */
> > > > > +#ifndef __LINUX_MCSLOCK_H
> > > > > +#define __LINUX_MCSLOCK_H
> > > > > +
> > > > > +struct mcs_spin_node {
> > > > > +       struct mcs_spin_node *next;
> > > > > +       int               locked;       /* 1 if lock acquired */
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > > > + * time spent in this lock function.
> > > > > + */
> > > > > +static noinline
> > > > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > > > +{
> > > > > +       struct mcs_spin_node *prev;
> > > > > +
> > > > > +       /* Init node */
> > > > > +       node->locked = 0;
> > > > > +       node->next   = NULL;
> > > > > +
> > > > > +       prev = xchg(lock, node);
> > > > > +       if (likely(prev == NULL)) {
> > > > > +               /* Lock acquired */
> > > > > +               node->locked = 1;
> > > > 
> > > > If we don't spin on the local node, is it necessary to set this variable?
> > > 
> > > I don't follow, the whole idea is to spin on the local variable.
> > 
> > If prev == NULL, doesn't that mean it won't proceed to spin on the
> > variable because the lock is already free and we call return? In that
> > case where we directly acquire the lock, I was wondering if it is
> > necessary to set node->locked = 1.
> 
> Yes, that's true, but we need to flag the lock as acquired (the node's
> lock is initially set to unlocked), otherwise others trying to acquire
> the lock can spin forever:
> 
> 	/* Wait until the lock holder passes the lock down */
> 	while (!ACCESS_ONCE(node->locked))
> 		arch_mutex_cpu_relax();
> 
> > The ->locked variable in this implementation refers to whether the lock
> > is acquired, and *not* to whether busy-waiting is necessary.

Hmm, other threads acquiring the lock will be spinning on their own
local nodes, not this node's node->locked. And if prev == NULL, the
current thread won't be reading its node->locked either, since we return.
So what other thread is going to be reading this node's node->locked?

Thanks,
Jason


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-26 21:09           ` Jason Low
@ 2013-09-26 21:41             ` Tim Chen
  2013-09-26 22:42               ` Jason Low
  2013-09-26 22:22             ` Davidlohr Bueso
  1 sibling, 1 reply; 64+ messages in thread
From: Tim Chen @ 2013-09-26 21:41 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Ingo Molnar, Andrew Morton, Andrea Arcangeli,
	Alex Shi, Andi Kleen, Michel Lespinasse, Davidlohr Bueso,
	Matthew R Wilcox, Dave Hansen, Peter Zijlstra, Rik van Riel,
	Peter Hurley, linux-kernel, linux-mm

On Thu, 2013-09-26 at 14:09 -0700, Jason Low wrote:
> On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote:
> > On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote:
> > > On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> > > > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > > > [ patch description and diff snipped down to the hunk under discussion ]
> > > > > > +       prev = xchg(lock, node);
> > > > > > +       if (likely(prev == NULL)) {
> > > > > > +               /* Lock acquired */
> > > > > > +               node->locked = 1;
> > > > > 
> > > > > If we don't spin on the local node, is it necessary to set this variable?
> > > > 
> > > > I don't follow, the whole idea is to spin on the local variable.
> > > 
> > > If prev == NULL, doesn't that mean it won't proceed to spin on the
> > > variable because the lock is already free and we call return? In that
> > > case where we directly acquire the lock, I was wondering if it is
> > > necessary to set node->locked = 1.
> > 
> > Yes, that's true, but we need to flag the lock as acquired (the node's
> > lock is initially set to unlocked), otherwise others trying to acquire
> > the lock can spin forever:
> > 
> > 	/* Wait until the lock holder passes the lock down */
> > 	while (!ACCESS_ONCE(node->locked))
> > 		arch_mutex_cpu_relax();
> > 
> > > The ->locked variable in this implementation refers to whether the lock
> > > is acquired, and *not* to whether busy-waiting is necessary.
> 
> Hmm, other threads acquiring the lock will be spinning on their own
> local nodes, not this node's node->locked. And if prev == NULL, the
> current thread won't be reading its node->locked either, since we return.
> So what other thread is going to be reading this node's node->locked?
> 
> Thanks,
> Jason

I think setting node->locked = 1 for the prev==NULL case is not
necessary functionally, but was done for semantic consistency.

Tim


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-26 21:09           ` Jason Low
  2013-09-26 21:41             ` Tim Chen
@ 2013-09-26 22:22             ` Davidlohr Bueso
  1 sibling, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2013-09-26 22:22 UTC (permalink / raw)
  To: Jason Low
  Cc: Tim Chen, Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Thu, 2013-09-26 at 14:09 -0700, Jason Low wrote:
> On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote:
> > On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote:
> > > On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> > > > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > > [ patch description and diff snipped down to the hunk under discussion ]
> > > > > > +       prev = xchg(lock, node);
> > > > > > +       if (likely(prev == NULL)) {
> > > > > > +               /* Lock acquired */
> > > > > > +               node->locked = 1;
> > > > > 
> > > > > If we don't spin on the local node, is it necessary to set this variable?
> > > > 
> > > > I don't follow, the whole idea is to spin on the local variable.
> > > 
> > > If prev == NULL, doesn't that mean it won't proceed to spin on the
> > > variable because the lock is already free and we call return? In that
> > > case where we directly acquire the lock, I was wondering if it is
> > > necessary to set node->locked = 1.
> > 
> > Yes, that's true, but we need to flag the lock as acquired (the node's
> > lock is initially set to unlocked), otherwise others trying to acquire
> > the lock can spin forever:
> > 
> > 	/* Wait until the lock holder passes the lock down */
> > 	while (!ACCESS_ONCE(node->locked))
> > 		arch_mutex_cpu_relax();
> > 
> > The ->locked variable in this implementation refers to whether the lock
> > is acquired, and *not* to whether busy-waiting is necessary.
> 
> Hmm, other threads acquiring the lock will be spinning on their own
> local nodes, not this node's node->locked. 

But we're xchg'ing memory locations, not values - so after the xchg()
call, when we modify node->locked, we're also changing lock, which isn't
local.

> And if prev == NULL, the
> current thread won't be reading its node->locked either, since we return.
> So what other thread is going to be reading this node's node->locked?

> 
> Thanks,
> Jason
> 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-26 21:41             ` Tim Chen
@ 2013-09-26 22:42               ` Jason Low
  2013-09-26 22:57                 ` Tim Chen
  2013-10-02 19:19                 ` Waiman Long
  0 siblings, 2 replies; 64+ messages in thread
From: Jason Low @ 2013-09-26 22:42 UTC (permalink / raw)
  To: Tim Chen
  Cc: Davidlohr Bueso, Ingo Molnar, Andrew Morton, Andrea Arcangeli,
	Alex Shi, Andi Kleen, Michel Lespinasse, Davidlohr Bueso,
	Matthew R Wilcox, Dave Hansen, Peter Zijlstra, Rik van Riel,
	Peter Hurley, linux-kernel, linux-mm

On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote:
> On Thu, 2013-09-26 at 14:09 -0700, Jason Low wrote:
> > On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote:
> > > On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote:
> > > > On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote:
> > > > > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> > > > > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > > > [ patch description and diff snipped down to the hunk under discussion ]
> > > > > > > +       prev = xchg(lock, node);
> > > > > > > +       if (likely(prev == NULL)) {
> > > > > > > +               /* Lock acquired */
> > > > > > > +               node->locked = 1;
> > > > > > 
> > > > > > If we don't spin on the local node, is it necessary to set this variable?
> > > > > 
> > > > > I don't follow, the whole idea is to spin on the local variable.
> > > > 
> > > > If prev == NULL, doesn't that mean it won't proceed to spin on the
> > > > variable because the lock is already free and we call return? In that
> > > > case where we directly acquire the lock, I was wondering if it is
> > > > necessary to set node->locked = 1.
> > > 
> > > Yes, that's true, but we need to flag the lock as acquired (the node's
> > > lock is initially set to unlocked), otherwise others trying to acquire
> > > the lock can spin forever:
> > > 
> > > 	/* Wait until the lock holder passes the lock down */
> > > 	while (!ACCESS_ONCE(node->locked))
> > > 		arch_mutex_cpu_relax();
> > > 
> > > The ->locked variable in this implementation refers to whether the lock
> > > is acquired, and *not* to whether busy-waiting is necessary.
> > 
> > Hmm, other threads acquiring the lock will be spinning on their own
> > local nodes, not this node's node->locked. And if prev == NULL, the
> > current thread won't be reading its node->locked either, since we return.
> > So what other thread is going to be reading this node's node->locked?
> > 
> > Thanks,
> > Jason
> 
> I think setting node->locked = 1 for the prev==NULL case is not
> necessary functionally, but was done for semantic consistency.

Okay, that makes sense for consistency, because we always
first set node->locked = 0 at the top of the function.

If we prefer to optimize this a bit though, perhaps we can
first move the node->locked = 0 so that it gets executed after the
"if (likely(prev == NULL)) {}" code block and then delete
"node->locked = 1" inside the code block.

static noinline
void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
{
       struct mcs_spin_node *prev;

       /* Init node */
       node->next   = NULL;

       prev = xchg(lock, node);
       if (likely(prev == NULL)) {
               /* Lock acquired */
               return;
       }
       node->locked = 0;
       ACCESS_ONCE(prev->next) = node;
       smp_wmb();
       /* Wait until the lock holder passes the lock down */
       while (!ACCESS_ONCE(node->locked))
               arch_mutex_cpu_relax();
}


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-26 22:42               ` Jason Low
@ 2013-09-26 22:57                 ` Tim Chen
  2013-09-27  6:02                   ` Ingo Molnar
  2013-10-02 19:19                 ` Waiman Long
  1 sibling, 1 reply; 64+ messages in thread
From: Tim Chen @ 2013-09-26 22:57 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Ingo Molnar, Andrew Morton, Andrea Arcangeli,
	Alex Shi, Andi Kleen, Michel Lespinasse, Davidlohr Bueso,
	Matthew R Wilcox, Dave Hansen, Peter Zijlstra, Rik van Riel,
	Peter Hurley, linux-kernel, linux-mm

On Thu, 2013-09-26 at 15:42 -0700, Jason Low wrote:
> On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote:
> > On Thu, 2013-09-26 at 14:09 -0700, Jason Low wrote:
> > > On Thu, 2013-09-26 at 13:40 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2013-09-26 at 13:23 -0700, Jason Low wrote:
> > > > > On Thu, 2013-09-26 at 13:06 -0700, Davidlohr Bueso wrote:
> > > > > > On Thu, 2013-09-26 at 12:27 -0700, Jason Low wrote:
> > > > > > > On Wed, Sep 25, 2013 at 3:10 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > > > > > > > [ patch description and diff snipped down to the hunk under discussion ]
> > > > > > > > +       prev = xchg(lock, node);
> > > > > > > > +       if (likely(prev == NULL)) {
> > > > > > > > +               /* Lock acquired */
> > > > > > > > +               node->locked = 1;
> > > > > > > 
> > > > > > > If we don't spin on the local node, is it necessary to set this variable?
> > > > > > 
> > > > > > I don't follow, the whole idea is to spin on the local variable.
> > > > > 
> > > > > If prev == NULL, doesn't that mean it won't proceed to spin on the
> > > > > variable because the lock is already free and we call return? In that
> > > > > case where we directly acquire the lock, I was wondering if it is
> > > > > necessary to set node->locked = 1.
> > > > 
> > > > Yes, that's true, but we need to flag the lock as acquired (the node's
> > > > lock is initially set to unlocked), otherwise others trying to acquire
> > > > the lock can spin forever:
> > > > 
> > > > 	/* Wait until the lock holder passes the lock down */
> > > > 	while (!ACCESS_ONCE(node->locked))
> > > > 		arch_mutex_cpu_relax();
> > > > 
> > > > The ->locked variable in this implementation refers to whether the lock
> > > > is acquired, and *not* to whether busy-waiting is necessary.
> > > 
> > > Hmm, other threads acquiring the lock will be spinning on their own
> > > local nodes, not this node's node->locked. And if prev == NULL, the
> > > current thread won't be reading its node->locked either, since we return.
> > > So what other thread is going to be reading this node's node->locked?
> > > 
> > > Thanks,
> > > Jason
> > 
> > I think setting node->locked = 1 for the prev==NULL case is not
> > necessary functionally, but was done for semantic consistency.
> 
> Okay, that makes sense for consistency, because we always
> first set node->locked = 0 at the top of the function.
> 
> If we prefer to optimize this a bit though, perhaps we can
> first move the node->locked = 0 so that it gets executed after the
> "if (likely(prev == NULL)) {}" code block and then delete
> "node->locked = 1" inside the code block.

I suppose we can save one single assignment.
The gain is probably not noticeable, as once we set node->next to NULL,
node->locked is likely in the local cache line and the assignment
operation is cheap.

Regards,
Tim

> 
> static noinline
> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> {
>        struct mcs_spin_node *prev;
> 
>        /* Init node */
>        node->next   = NULL;
> 
>        prev = xchg(lock, node);
>        if (likely(prev == NULL)) {
>                /* Lock acquired */
>                return;
>        }
>        node->locked = 0;
>        ACCESS_ONCE(prev->next) = node;
>        smp_wmb();
>        /* Wait until the lock holder passes the lock down */
>        while (!ACCESS_ONCE(node->locked))
>                arch_mutex_cpu_relax();
> }
> 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-26 22:57                 ` Tim Chen
@ 2013-09-27  6:02                   ` Ingo Molnar
  2013-09-27  6:26                     ` Jason Low
                                       ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Ingo Molnar @ 2013-09-27  6:02 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> > If we prefer to optimize this a bit though, perhaps we can first move
> > the node->locked = 0 so that it gets executed after the "if (likely(prev
> > == NULL)) {}" code block and then delete "node->locked = 1" inside the
> > code block.
> 
> I suppose we can save one single assignment. The gain is probably not
> noticeable, as once we set node->next to NULL, node->locked is likely in
> the local cache line and the assignment operation is cheap.

Would be nice to have this as a separate, add-on patch. Every single 
instruction removal that has no downside is an upside!

You can add a comment that explains it.

Thanks,

	Ingo
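
A minimal sketch of what such an add-on patch could look like, assuming the
node->locked = 0 initialization stays before the xchg() (so it cannot race
with a successor's handoff) and only the redundant store in the uncontended
path is dropped:

	static noinline
	void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
	{
		struct mcs_spin_node *prev;

		/* Init node */
		node->locked = 0;
		node->next   = NULL;

		prev = xchg(lock, node);
		if (likely(prev == NULL)) {
			/*
			 * Lock acquired; no need to set node->locked to 1
			 * here, as only a queued waiter ever reads its own
			 * node->locked, and we return without queueing.
			 */
			return;
		}
		ACCESS_ONCE(prev->next) = node;
		smp_wmb();
		/* Wait until the lock holder passes the lock down */
		while (!ACCESS_ONCE(node->locked))
			arch_mutex_cpu_relax();
	}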


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27  6:02                   ` Ingo Molnar
@ 2013-09-27  6:26                     ` Jason Low
  2013-09-27 11:23                     ` Peter Zijlstra
  2013-09-27 16:12                     ` [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Jason Low
  2 siblings, 0 replies; 64+ messages in thread
From: Jason Low @ 2013-09-27  6:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Chen, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 08:02 +0200, Ingo Molnar wrote:
> * Tim Chen <tim.c.chen@linux.intel.com> wrote:
> 
> > > If we prefer to optimize this a bit though, perhaps we can first move
> > > the node->locked = 0 so that it gets executed after the "if (likely(prev
> > > == NULL)) {}" code block and then delete "node->locked = 1" inside the
> > > code block.
> > 
> > I suppose we can save one single assignment. The gain is probably not
> > noticeable, as once we set node->next to NULL, node->locked is likely in
> > the local cache line and the assignment operation is cheap.
> 
> Would be nice to have this as a separate, add-on patch. Every single 
> instruction removal that has no downside is an upside!
> 
> You can add a comment that explains it.

Yup, especially for a spin lock (and one that I have found to be used
very frequently when running workloads on big machines).


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27  6:02                   ` Ingo Molnar
  2013-09-27  6:26                     ` Jason Low
@ 2013-09-27 11:23                     ` Peter Zijlstra
  2013-09-27 13:44                       ` Joe Perches
  2013-09-27 16:12                     ` [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Jason Low
  2 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2013-09-27 11:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Rik van Riel,
	Peter Hurley, linux-kernel, linux-mm, Joe Perches

On Fri, Sep 27, 2013 at 08:02:13AM +0200, Ingo Molnar wrote:
> Would be nice to have this as a separate, add-on patch. Every single 
> instruction removal that has no downside is an upside!
> 
> You can add a comment that explains it.

If someone is going to do add-on patches to the mcslock.h file, please
also consider doing a patch that adds comments to the memory barriers in
there.

Also, checkpatch.pl should really warn about that; and it appears there
is code in there for that; however:

# grep -C3 smp_mb scripts/checkpatch.pl 
                        }
                }
# check for memory barriers without a comment.
                if ($line =~ /\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/) {
                        if (!ctx_has_comment($first_line, $linenr)) {
                                CHK("MEMORY_BARRIER",
                                    "memory barrier without comment\n" . $herecurr);
# grep -C3 smp_wmb kernel/mutex.c
                return;
        }
        ACCESS_ONCE(prev->next) = node;
        smp_wmb();
        /* Wait until the lock holder passes the lock down */
        while (!ACCESS_ONCE(node->locked))
                arch_mutex_cpu_relax();
--
                        arch_mutex_cpu_relax();
        }
        ACCESS_ONCE(next->locked) = 1;
        smp_wmb();
}

/*
# scripts/checkpatch.pl -f kernel/mutex.c 2>&1 | grep memory
#

so that appears to be completely broken :/

Joe, any clue what's up with that?


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 11:23                     ` Peter Zijlstra
@ 2013-09-27 13:44                       ` Joe Perches
  2013-09-27 13:48                         ` Peter Zijlstra
  0 siblings, 1 reply; 64+ messages in thread
From: Joe Perches @ 2013-09-27 13:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 13:23 +0200, Peter Zijlstra wrote:
> checkpatch.pl should really warn about that; and it appears there
> code in there for that; however:
> 
> # grep -C3 smp_mb scripts/checkpatch.pl 
[]
> # check for memory barriers without a comment.
>                 if ($line =~ /\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/) {
>                         if (!ctx_has_comment($first_line, $linenr)) {
>                                 CHK("MEMORY_BARRIER",
>                                     "memory barrier without comment\n" . $herecurr);
[]
> # scripts/checkpatch.pl -f kernel/mutex.c 2>&1 | grep memory
> #
> 
> so that appears to be completely broken :/
> 
> Joe, any clue what's up with that?

It's a CHK test, so it's only tested with --strict

$ scripts/checkpatch.pl -f --strict kernel/mutex.c 2>&1 | grep memory
CHECK: memory barrier without comment
CHECK: memory barrier without comment

It could be changed to WARN so it's always on.



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 13:44                       ` Joe Perches
@ 2013-09-27 13:48                         ` Peter Zijlstra
  2013-09-27 14:05                           ` Joe Perches
  2013-09-27 14:14                           ` [PATCH] checkpatch: Make the memory barrier test noisier Joe Perches
  0 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2013-09-27 13:48 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 06:44:55AM -0700, Joe Perches wrote:
> It's a CHK test, so it's only tested with --strict
> 
> $ scripts/checkpatch.pl -f --strict kernel/mutex.c 2>&1 | grep memory
> CHECK: memory barrier without comment
> CHECK: memory barrier without comment
> 
> It could be changed to WARN so it's always on.

Yes please, we can't be too careful with memory barriers.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 13:48                         ` Peter Zijlstra
@ 2013-09-27 14:05                           ` Joe Perches
  2013-09-27 14:18                             ` Peter Zijlstra
  2013-09-27 14:14                           ` [PATCH] checkpatch: Make the memory barrier test noisier Joe Perches
  1 sibling, 1 reply; 64+ messages in thread
From: Joe Perches @ 2013-09-27 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 15:48 +0200, Peter Zijlstra wrote:
> On Fri, Sep 27, 2013 at 06:44:55AM -0700, Joe Perches wrote:
> > It's a CHK test, so it's only tested with --strict
> > 
> > $ scripts/checkpatch.pl -f --strict kernel/mutex.c 2>&1 | grep memory
> > CHECK: memory barrier without comment
> > CHECK: memory barrier without comment
> > 
> > It could be changed to WARN so it's always on.
> 
> Yes please, we can't be too careful with memory barriers.

I'll send the patch separately.

It seems a pretty noisy test.
There are 13 hits just in arch/x86/kernel/

$ ./scripts/checkpatch.pl -f arch/x86/kernel/*.c | grep -A3 "memory barrier"
WARNING: memory barrier without comment
#685: FILE: x86/kernel/alternative.c:685:
+	smp_wmb();

--
WARNING: memory barrier without comment
#401: FILE: x86/kernel/kvm.c:401:
+		rmb();

WARNING: memory barrier without comment
#403: FILE: x86/kernel/kvm.c:403:
+		rmb();

--
WARNING: memory barrier without comment
#702: FILE: x86/kernel/kvm.c:702:
+	smp_wmb();

WARNING: memory barrier without comment
#704: FILE: x86/kernel/kvm.c:704:
+	smp_wmb();

--
WARNING: memory barrier without comment
#62: FILE: x86/kernel/ldt.c:62:
+	wmb();

WARNING: memory barrier without comment
#64: FILE: x86/kernel/ldt.c:64:
+	wmb();

--
WARNING: memory barrier without comment
#204: FILE: x86/kernel/smpboot.c:204:
+	wmb();

WARNING: memory barrier without comment
#265: FILE: x86/kernel/smpboot.c:265:
+	wmb();

--
WARNING: memory barrier without comment
#557: FILE: x86/kernel/smpboot.c:557:
+	mb();

--
WARNING: memory barrier without comment
#1065: FILE: x86/kernel/smpboot.c:1065:
+	mb();

--
WARNING: memory barrier without comment
#1321: FILE: x86/kernel/smpboot.c:1321:
+	mb();

WARNING: memory barrier without comment
#1399: FILE: x86/kernel/smpboot.c:1399:
+		mb();




^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 13:48                         ` Peter Zijlstra
  2013-09-27 14:05                           ` Joe Perches
@ 2013-09-27 14:14                           ` Joe Perches
  2013-09-27 14:26                             ` Peter Zijlstra
  1 sibling, 1 reply; 64+ messages in thread
From: Joe Perches @ 2013-09-27 14:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

Peter Zijlstra prefers that comments be required near uses
of memory barriers.

Change the message level for memory barrier uses from a
--strict test only to a normal WARN so it's always emitted.

This might produce false positives around insertions of
memory barriers when a comment is outside the patch context
block.

And checkpatch is still stupid, it only looks for existence
of any comment, not at the comment content.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joe Perches <joe@perches.com>
---
 scripts/checkpatch.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index c03e427..bd4103a 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3816,8 +3816,8 @@ sub string_find_replace {
 # check for memory barriers without a comment.
 		if ($line =~ /\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/) {
 			if (!ctx_has_comment($first_line, $linenr)) {
-				CHK("MEMORY_BARRIER",
-				    "memory barrier without comment\n" . $herecurr);
+				WARN("MEMORY_BARRIER",
+				     "memory barrier without comment\n" . $herecurr);
 			}
 		}
 # check of hardware specific defines

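With this applied, the earlier check should fire without --strict; the
expected output (assuming the two kernel/mutex.c barriers are still
uncommented) would be:

$ scripts/checkpatch.pl -f kernel/mutex.c 2>&1 | grep memory
WARNING: memory barrier without comment
WARNING: memory barrier without comment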



^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 14:05                           ` Joe Perches
@ 2013-09-27 14:18                             ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2013-09-27 14:18 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 07:05:00AM -0700, Joe Perches wrote:
> On Fri, 2013-09-27 at 15:48 +0200, Peter Zijlstra wrote:
> > On Fri, Sep 27, 2013 at 06:44:55AM -0700, Joe Perches wrote:
> > > It's a CHK test, so it's only tested with --strict
> > > 
> > > $ scripts/checkpatch.pl -f --strict kernel/mutex.c 2>&1 | grep memory
> > > CHECK: memory barrier without comment
> > > CHECK: memory barrier without comment
> > > 
> > > It could be changed to WARN so it's always on.
> > 
> > Yes please, we can't be too careful with memory barriers.
> 
> I'll send the patch separately.
> 
> It seems a pretty noisy test.
> There are 13 hits just in arch/x86/kernel/

Urgh, that wants fixing. We really need to stop getting more of these,
and that's what checkpatch is good at.

At very very bare minimum the comment should mention where the pairing
barrier is; but ideally it should describe the actual ordering.
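
For the mcslock.h handoff, such a comment might look something like this
(a sketch only, written against the unlock-side barrier placement that is
discussed later in this thread):

	/*
	 * Make the critical section's stores visible before granting the
	 * lock; pairs with the successor's spin on node->locked in
	 * mcs_spin_lock().
	 */
	smp_wmb();
	ACCESS_ONCE(next->locked) = 1;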


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 14:14                           ` [PATCH] checkpatch: Make the memory barrier test noisier Joe Perches
@ 2013-09-27 14:26                             ` Peter Zijlstra
  2013-09-27 14:34                               ` Joe Perches
  0 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2013-09-27 14:26 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 07:14:17AM -0700, Joe Perches wrote:
> Peter Zijlstra prefers that comments be required near uses
> of memory barriers.
> 
> Change the message level for memory barrier uses from a
> --strict test only to a normal WARN so it's always emitted.
> 
> This might produce false positives around insertions of
> memory barriers when a comment is outside the patch context
> block.

One would argue that in that case they're too far away in any case :-)

> And checkpatch is still stupid, it only looks for existence
> of any comment, not at the comment content.

Could we try and alleviate this by giving a slightly more verbose
warning?

Maybe something like:

 memory barrier without comment; please refer to the pairing barrier and
 describe the ordering requirements.

> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Joe Perches <joe@perches.com>

Acked-by: Peter Zijlstra <peterz@infradead.org>


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 14:26                             ` Peter Zijlstra
@ 2013-09-27 14:34                               ` Joe Perches
  2013-09-27 14:50                                 ` Peter Zijlstra
  0 siblings, 1 reply; 64+ messages in thread
From: Joe Perches @ 2013-09-27 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 16:26 +0200, Peter Zijlstra wrote:
> On Fri, Sep 27, 2013 at 07:14:17AM -0700, Joe Perches wrote:
> > Peter Zijlstra prefers that comments be required near uses
> > of memory barriers.
> > 
> > Change the message level for memory barrier uses from a
> > --strict test only to a normal WARN so it's always emitted.
> > 
> > This might produce false positives around insertions of
> > memory barriers when a comment is outside the patch context
> > block.
> 
> One would argue that in that case they're too far away in any case :-)
> 
> > And checkpatch is still stupid, it only looks for existence
> > of any comment, not at the comment content.
> 
> Could we try and alleviate this by giving a slightly more verbose
> warning?

> Maybe something like:
> 
>  memory barrier without comment; please refer to the pairing barrier and
>  describe the ordering requirements.

That would make it seem as if all barriers are SMP, no?

Maybe just refer to Documentation/memory-barriers.txt
and/or say something like "please document appropriately"



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 14:34                               ` Joe Perches
@ 2013-09-27 14:50                                 ` Peter Zijlstra
  2013-09-27 15:17                                   ` Paul E. McKenney
  2013-09-27 23:40                                   ` Oliver Neukum
  0 siblings, 2 replies; 64+ messages in thread
From: Peter Zijlstra @ 2013-09-27 14:50 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso, Ingo Molnar,
	Andrew Morton, Andrea Arcangeli, Alex Shi, Andi Kleen,
	Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 07:34:55AM -0700, Joe Perches wrote:
> That would make it seem as if all barriers are SMP, no?

I would think any memory barrier is ordering against someone else; if
not smp then a device/hardware -- like for instance the hardware page
table walker.

Barriers are fundamentally about order; and order only makes sense if
there's more than 1 party to the game.

> Maybe just refer to Documentation/memory-barriers.txt
> and/or say something like "please document appropriately"

Documentation/memory-barriers.txt is always good; appropriately doesn't
seem to quantify anything much at all. Someone might think:

/*  */
smp_mb();

appropriate... 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 14:50                                 ` Peter Zijlstra
@ 2013-09-27 15:17                                   ` Paul E. McKenney
  2013-09-27 15:34                                     ` Peter Zijlstra
  2013-09-27 23:40                                   ` Oliver Neukum
  1 sibling, 1 reply; 64+ messages in thread
From: Paul E. McKenney @ 2013-09-27 15:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joe Perches, Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso,
	Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 04:50:07PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 27, 2013 at 07:34:55AM -0700, Joe Perches wrote:
> > That would make it seem as if all barriers are SMP, no?
> 
> I would think any memory barrier is ordering against someone else; if
> not smp then a device/hardware -- like for instance the hardware page
> table walker.
> 
> Barriers are fundamentally about order; and order only makes sense if
> there's more than 1 party to the game.

Oddly enough, there is one exception that proves the rule...  On Itanium,
suppose we have the following code, with x initially equal to zero:

CPU 1: ACCESS_ONCE(x) = 1;

CPU 2: r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(x);

Itanium architects have told me that it really is possible for CPU 2 to
see r1==1 and r2==0.  Placing a memory barrier between CPU 2's pair of
fetches prevents this, but without any other memory barrier to pair with.
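
In kernel terms, the fix on CPU 2 would be something like this (sketch):

	r1 = ACCESS_ONCE(x);
	smp_mb();	/* keeps Itanium from satisfying the reads out of order */
	r2 = ACCESS_ONCE(x);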

> > Maybe just refer to Documentation/memory-barriers.txt
> > and/or say something like "please document appropriately"
> 
> Documentation/memory-barriers.txt is always good; appropriately doesn't
> seem to quantify anything much at all. Someone might think:
> 
> /*  */
> smp_mb();
> 
> appropriate... 

I end up doing this:

/* */
smp_mb(); /* See above block comment. */

But it would be nice for the prior comment to be recognized as belonging
to the memory barrier without the additional "See above" comment.

In any case, please feel free to add:

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

to the original checkpatch.pl patch.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-25 22:10 ` [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
  2013-09-26  6:46   ` Ingo Molnar
  2013-09-26 19:27   ` Jason Low
@ 2013-09-27 15:29   ` Paul E. McKenney
  2013-09-27 18:09     ` Tim Chen
  2013-09-27 19:38     ` Tim Chen
  2 siblings, 2 replies; 64+ messages in thread
From: Paul E. McKenney @ 2013-09-27 15:29 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> We will need the MCS lock code for doing optimistic spinning for rwsem.
> Extracting the MCS code from mutex.c and putting it into its own file
> allows us to reuse this code easily for rwsem.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> ---
>  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/mutex.c          |   58 +++++-----------------------------------------
>  2 files changed, 65 insertions(+), 51 deletions(-)
>  create mode 100644 include/linux/mcslock.h
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> new file mode 100644
> index 0000000..20fd3f0
> --- /dev/null
> +++ b/include/linux/mcslock.h
> @@ -0,0 +1,58 @@
> +/*
> + * MCS lock defines
> + *
> + * This file contains the main data structure and API definitions of MCS lock.
> + */
> +#ifndef __LINUX_MCSLOCK_H
> +#define __LINUX_MCSLOCK_H
> +
> +struct mcs_spin_node {
> +	struct mcs_spin_node *next;
> +	int		  locked;	/* 1 if lock acquired */
> +};
> +
> +/*
> + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> + * time spent in this lock function.
> + */
> +static noinline
> +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> +{
> +	struct mcs_spin_node *prev;
> +
> +	/* Init node */
> +	node->locked = 0;
> +	node->next   = NULL;
> +
> +	prev = xchg(lock, node);
> +	if (likely(prev == NULL)) {
> +		/* Lock acquired */
> +		node->locked = 1;
> +		return;
> +	}
> +	ACCESS_ONCE(prev->next) = node;
> +	smp_wmb();
> +	/* Wait until the lock holder passes the lock down */
> +	while (!ACCESS_ONCE(node->locked))
> +		arch_mutex_cpu_relax();
> +}
> +
> +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> +{
> +	struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> +
> +	if (likely(!next)) {
> +		/*
> +		 * Release the lock by setting it to NULL
> +		 */
> +		if (cmpxchg(lock, node, NULL) == node)
> +			return;
> +		/* Wait until the next pointer is set */
> +		while (!(next = ACCESS_ONCE(node->next)))
> +			arch_mutex_cpu_relax();
> +	}
> +	ACCESS_ONCE(next->locked) = 1;
> +	smp_wmb();

Shouldn't the memory barrier precede the "ACCESS_ONCE(next->locked) = 1;"?
Maybe in an "else" clause of the prior "if" statement, given that the
cmpxchg() does it otherwise.

Otherwise, in the case where the "if" condition is false, the critical
section could bleed out past the unlock.
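
That is, something along these lines (a sketch only):

	if (likely(!next)) {
		/*
		 * The cmpxchg() acts as a full barrier on this path, so no
		 * extra barrier is needed before the handoff below.
		 */
		if (cmpxchg(lock, node, NULL) == node)
			return;
		/* Wait until the next pointer is set */
		while (!(next = ACCESS_ONCE(node->next)))
			arch_mutex_cpu_relax();
	} else
		smp_wmb();	/* order the critical section before the handoff */
	ACCESS_ONCE(next->locked) = 1;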

							Thanx, Paul

> +}
> +
> +#endif
> diff --git a/kernel/mutex.c b/kernel/mutex.c
> index 6d647ae..1b6ba3f 100644
> --- a/kernel/mutex.c
> +++ b/kernel/mutex.c
> @@ -25,6 +25,7 @@
>  #include <linux/spinlock.h>
>  #include <linux/interrupt.h>
>  #include <linux/debug_locks.h>
> +#include <linux/mcslock.h>
>  
>  /*
>   * In the DEBUG case we are using the "NULL fastpath" for mutexes,
> @@ -111,54 +112,9 @@ EXPORT_SYMBOL(mutex_lock);
>   * more or less simultaneously, the spinners need to acquire a MCS lock
>   * first before spinning on the owner field.
>   *
> - * We don't inline mspin_lock() so that perf can correctly account for the
> - * time spent in this lock function.
>   */
> -struct mspin_node {
> -	struct mspin_node *next ;
> -	int		  locked;	/* 1 if lock acquired */
> -};
> -#define	MLOCK(mutex)	((struct mspin_node **)&((mutex)->spin_mlock))
>  
> -static noinline
> -void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
> -{
> -	struct mspin_node *prev;
> -
> -	/* Init node */
> -	node->locked = 0;
> -	node->next   = NULL;
> -
> -	prev = xchg(lock, node);
> -	if (likely(prev == NULL)) {
> -		/* Lock acquired */
> -		node->locked = 1;
> -		return;
> -	}
> -	ACCESS_ONCE(prev->next) = node;
> -	smp_wmb();
> -	/* Wait until the lock holder passes the lock down */
> -	while (!ACCESS_ONCE(node->locked))
> -		arch_mutex_cpu_relax();
> -}
> -
> -static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node)
> -{
> -	struct mspin_node *next = ACCESS_ONCE(node->next);
> -
> -	if (likely(!next)) {
> -		/*
> -		 * Release the lock by setting it to NULL
> -		 */
> -		if (cmpxchg(lock, node, NULL) == node)
> -			return;
> -		/* Wait until the next pointer is set */
> -		while (!(next = ACCESS_ONCE(node->next)))
> -			arch_mutex_cpu_relax();
> -	}
> -	ACCESS_ONCE(next->locked) = 1;
> -	smp_wmb();
> -}
> +#define	MLOCK(mutex)	((struct mcs_spin_node **)&((mutex)->spin_mlock))
>  
>  /*
>   * Mutex spinning code migrated from kernel/sched/core.c
> @@ -448,7 +404,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  
>  	for (;;) {
>  		struct task_struct *owner;
> -		struct mspin_node  node;
> +		struct mcs_spin_node  node;
>  
>  		if (!__builtin_constant_p(ww_ctx == NULL) && ww_ctx->acquired > 0) {
>  			struct ww_mutex *ww;
> @@ -470,10 +426,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		 * If there's an owner, wait for it to either
>  		 * release the lock or go to sleep.
>  		 */
> -		mspin_lock(MLOCK(lock), &node);
> +		mcs_spin_lock(MLOCK(lock), &node);
>  		owner = ACCESS_ONCE(lock->owner);
>  		if (owner && !mutex_spin_on_owner(lock, owner)) {
> -			mspin_unlock(MLOCK(lock), &node);
> +			mcs_spin_unlock(MLOCK(lock), &node);
>  			goto slowpath;
>  		}
>  
> @@ -488,11 +444,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  			}
>  
>  			mutex_set_owner(lock);
> -			mspin_unlock(MLOCK(lock), &node);
> +			mcs_spin_unlock(MLOCK(lock), &node);
>  			preempt_enable();
>  			return 0;
>  		}
> -		mspin_unlock(MLOCK(lock), &node);
> +		mcs_spin_unlock(MLOCK(lock), &node);
>  
>  		/*
>  		 * When there's no owner, we might have preempted between the
> -- 
> 1.7.4.4
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 15:17                                   ` Paul E. McKenney
@ 2013-09-27 15:34                                     ` Peter Zijlstra
  2013-09-27 16:04                                       ` Paul E. McKenney
  0 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2013-09-27 15:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joe Perches, Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso,
	Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 08:17:50AM -0700, Paul E. McKenney wrote:
> > Barriers are fundamentally about order; and order only makes sense if
> > there's more than 1 party to the game.
> 
> Oddly enough, there is one exception that proves the rule...  On Itanium,
> suppose we have the following code, with x initially equal to zero:
> 
> CPU 1: ACCESS_ONCE(x) = 1;
> 
> CPU 2: r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(x);
> 
> Itanium architects have told me that it really is possible for CPU 2 to
> see r1==1 and r2==0.  Placing a memory barrier between CPU 2's pair of
> fetches prevents this, but without any other memory barrier to pair with.

Oh man... it's really past time to sink that Itanic already.

I suppose it allows the CPU to reorder the reads in its pipeline, and the
memory barrier disallows this. Curious: does our memory-barriers.txt
file mention this 'fun' fact?
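
For concreteness, here is the scenario sketched in kernel C; r1, r2 and
x are the names from Paul's example, and the smp_rmb() between the two
fetches is my reading of the fix he describes, not code posted in this
thread:

	int x;				/* initially 0 */

	/* CPU 1 */
	ACCESS_ONCE(x) = 1;

	/* CPU 2 */
	r1 = ACCESS_ONCE(x);
	smp_rmb();			/* without this, Itanium may see r1 == 1 && r2 == 0 */
	r2 = ACCESS_ONCE(x);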


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 15:34                                     ` Peter Zijlstra
@ 2013-09-27 16:04                                       ` Paul E. McKenney
  0 siblings, 0 replies; 64+ messages in thread
From: Paul E. McKenney @ 2013-09-27 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joe Perches, Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso,
	Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm,
	tony.luck, fenghua.yu, linux-ia64

On Fri, Sep 27, 2013 at 05:34:34PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 27, 2013 at 08:17:50AM -0700, Paul E. McKenney wrote:
> > > Barriers are fundamentally about order; and order only makes sense if
> > > there's more than 1 party to the game.
> > 
> > Oddly enough, there is one exception that proves the rule...  On Itanium,
> > suppose we have the following code, with x initially equal to zero:
> > 
> > CPU 1: ACCESS_ONCE(x) = 1;
> > 
> > CPU 2: r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(x);
> > 
> > Itanium architects have told me that it really is possible for CPU 2 to
> > see r1==1 and r2==0.  Placing a memory barrier between CPU 2's pair of
> > fetches prevents this, but without any other memory barrier to pair with.
> 
> Oh man... it's really past time to sink that Itanic already.
> 
> I suppose it allows the CPU to reorder the reads in its pipeline, and the
> memory barrier disallows this. Curious: does our memory-barriers.txt
> file mention this 'fun' fact?

Probably not.  I was recently reminded of it by some people on the C++
standards committee.  I had first heard of it about 5 years ago, but
hadn't heard definitively until quite recently.

I defer to the Itanium maintainers to actually make the required changes,
should they choose to do so.  I suppose that one way to handle it in the
Linux kernel would be to make ACCESS_ONCE() architecture-specific,
with Itanium placing a memory barrier either before or after -- either
would work.  But since Itanium seems to run Linux reliably, I am guessing
that the probability of misordering is quite low.  But again, the ball
is firmly in the Itanium maintainers' court, and I have added them on CC.
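
To illustrate, a purely hypothetical sketch of that approach -- nothing
like this exists in the tree, and since ACCESS_ONCE() is also used as
an lvalue a real override would need more care than a read-side helper:

	/* hypothetical ia64-only ordered load; other arches keep the plain form */
	#define ACCESS_ONCE_ORDERED(x)						\
	({									\
		typeof(x) ___val = (*(volatile typeof(x) *)&(x));		\
		smp_mb();	/* a barrier before the load would work too */	\
		___val;								\
	})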

							Thanx, Paul


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27  6:02                   ` Ingo Molnar
  2013-09-27  6:26                     ` Jason Low
  2013-09-27 11:23                     ` Peter Zijlstra
@ 2013-09-27 16:12                     ` Jason Low
  2013-09-27 16:19                       ` Tim Chen
  2 siblings, 1 reply; 64+ messages in thread
From: Jason Low @ 2013-09-27 16:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Chen, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 08:02 +0200, Ingo Molnar wrote:
> Would be nice to have this as a separate, add-on patch. Every single 
> instruction removal that has no downside is an upside!

Okay, so here is a patch. Tim, would you like to add this to v7?

...
Subject: MCS lock: Remove and reorder unnecessary assignments in mcs_spin_lock()

In mcs_spin_lock(), if (likely(prev == NULL)) is true, then the lock is free
and we won't spin on the local node. In that case, we don't have to assign
node->locked because it won't be used. We can also move the node->locked = 0
assignment so that it occurs after the if (likely(prev == NULL)) check.

This might also make it clearer how the node->locked variable
is used in MCS locks.

Signed-off-by: Jason Low <jason.low2@hp.com>
---
 include/linux/mcslock.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
index 20fd3f0..1167d57 100644
--- a/include/linux/mcslock.h
+++ b/include/linux/mcslock.h
@@ -21,15 +21,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
 	struct mcs_spin_node *prev;
 
 	/* Init node */
-	node->locked = 0;
 	node->next   = NULL;
 
 	prev = xchg(lock, node);
 	if (likely(prev == NULL)) {
 		/* Lock acquired */
-		node->locked = 1;
 		return;
 	}
+	node->locked = 0;
 	ACCESS_ONCE(prev->next) = node;
 	smp_wmb();
 	/* Wait until the lock holder passes the lock down */
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 16:12                     ` [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Jason Low
@ 2013-09-27 16:19                       ` Tim Chen
  0 siblings, 0 replies; 64+ messages in thread
From: Tim Chen @ 2013-09-27 16:19 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 09:12 -0700, Jason Low wrote:
> On Fri, 2013-09-27 at 08:02 +0200, Ingo Molnar wrote:
> > Would be nice to have this as a separate, add-on patch. Every single 
> > instruction removal that has no downside is an upside!
> 
> Okay, so here is a patch. Tim, would you like to add this to v7?

Okay.  Will do.

Tim

> 
> ...
> Subject: MCS lock: Remove and reorder unnecessary assignments in mcs_spin_lock()
> 
> In mcs_spin_lock(), if (likely(prev == NULL)) is true, then the lock is free
> and we won't spin on the local node. In that case, we don't have to assign
> node->locked because it won't be used. We can also move the node->locked = 0
> assignment so that it occurs after the if (likely(prev == NULL)) check.
> 
> This might also make it clearer how the node->locked variable
> is used in MCS locks.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>
> ---
>  include/linux/mcslock.h |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> index 20fd3f0..1167d57 100644
> --- a/include/linux/mcslock.h
> +++ b/include/linux/mcslock.h
> @@ -21,15 +21,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>  	struct mcs_spin_node *prev;
>  
>  	/* Init node */
> -	node->locked = 0;
>  	node->next   = NULL;
>  
>  	prev = xchg(lock, node);
>  	if (likely(prev == NULL)) {
>  		/* Lock acquired */
> -		node->locked = 1;
>  		return;
>  	}
> +	node->locked = 0;
>  	ACCESS_ONCE(prev->next) = node;
>  	smp_wmb();
>  	/* Wait until the lock holder passes the lock down */



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 15:29   ` Paul E. McKenney
@ 2013-09-27 18:09     ` Tim Chen
  2013-09-28  2:58       ` Waiman Long
  2013-09-27 19:38     ` Tim Chen
  1 sibling, 1 reply; 64+ messages in thread
From: Tim Chen @ 2013-09-27 18:09 UTC (permalink / raw)
  To: paulmck, Waiman Long
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and putting it into its own file allows us
> > to reuse this code easily for rwsem.
> > 
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > ---
> >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/mutex.c          |   58 +++++-----------------------------------------
> >  2 files changed, 65 insertions(+), 51 deletions(-)
> >  create mode 100644 include/linux/mcslock.h
> > 
> > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > new file mode 100644
> > index 0000000..20fd3f0
> > --- /dev/null
> > +++ b/include/linux/mcslock.h
> > @@ -0,0 +1,58 @@
> > +/*
> > + * MCS lock defines
> > + *
> > + * This file contains the main data structure and API definitions of MCS lock.
> > + */
> > +#ifndef __LINUX_MCSLOCK_H
> > +#define __LINUX_MCSLOCK_H
> > +
> > +struct mcs_spin_node {
> > +	struct mcs_spin_node *next;
> > +	int		  locked;	/* 1 if lock acquired */
> > +};
> > +
> > +/*
> > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > + * time spent in this lock function.
> > + */
> > +static noinline
> > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > +{
> > +	struct mcs_spin_node *prev;
> > +
> > +	/* Init node */
> > +	node->locked = 0;
> > +	node->next   = NULL;
> > +
> > +	prev = xchg(lock, node);
> > +	if (likely(prev == NULL)) {
> > +		/* Lock acquired */
> > +		node->locked = 1;
> > +		return;
> > +	}
> > +	ACCESS_ONCE(prev->next) = node;
> > +	smp_wmb();
> > +	/* Wait until the lock holder passes the lock down */
> > +	while (!ACCESS_ONCE(node->locked))
> > +		arch_mutex_cpu_relax();
> > +}
> > +
> > +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > +{
> > +	struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> > +
> > +	if (likely(!next)) {
> > +		/*
> > +		 * Release the lock by setting it to NULL
> > +		 */
> > +		if (cmpxchg(lock, node, NULL) == node)
> > +			return;
> > +		/* Wait until the next pointer is set */
> > +		while (!(next = ACCESS_ONCE(node->next)))
> > +			arch_mutex_cpu_relax();
> > +	}
> > +	ACCESS_ONCE(next->locked) = 1;
> > +	smp_wmb();
> 
> Shouldn't the memory barrier precede the "ACCESS_ONCE(next->locked) = 1;"?
> Maybe in an "else" clause of the prior "if" statement, given that the
> cmpxchg() does it otherwise.
> 
> Otherwise, in the case where the "if" conditionn is false, the critical
> section could bleed out past the unlock.

Yes, I agree with you that the smp_wmb should be moved before the
ACCESS_ONCE to prevent the critical section from bleeding out.  Copying Waiman,
who is the original author of the MCS code, to see if he has any comments
on things we may have missed.

Tim


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 15:29   ` Paul E. McKenney
  2013-09-27 18:09     ` Tim Chen
@ 2013-09-27 19:38     ` Tim Chen
  2013-09-27 20:16       ` Jason Low
  2013-09-27 20:38       ` Paul E. McKenney
  1 sibling, 2 replies; 64+ messages in thread
From: Tim Chen @ 2013-09-27 19:38 UTC (permalink / raw)
  To: paulmck, Waiman Long
  Cc: Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > Extracting the MCS code from mutex.c and putting it into its own file allows us
> > to reuse this code easily for rwsem.
> > 
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > ---
> >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/mutex.c          |   58 +++++-----------------------------------------
> >  2 files changed, 65 insertions(+), 51 deletions(-)
> >  create mode 100644 include/linux/mcslock.h
> > 
> > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > new file mode 100644
> > index 0000000..20fd3f0
> > --- /dev/null
> > +++ b/include/linux/mcslock.h
> > @@ -0,0 +1,58 @@
> > +/*
> > + * MCS lock defines
> > + *
> > + * This file contains the main data structure and API definitions of MCS lock.
> > + */
> > +#ifndef __LINUX_MCSLOCK_H
> > +#define __LINUX_MCSLOCK_H
> > +
> > +struct mcs_spin_node {
> > +	struct mcs_spin_node *next;
> > +	int		  locked;	/* 1 if lock acquired */
> > +};
> > +
> > +/*
> > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > + * time spent in this lock function.
> > + */
> > +static noinline
> > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > +{
> > +	struct mcs_spin_node *prev;
> > +
> > +	/* Init node */
> > +	node->locked = 0;
> > +	node->next   = NULL;
> > +
> > +	prev = xchg(lock, node);
> > +	if (likely(prev == NULL)) {
> > +		/* Lock acquired */
> > +		node->locked = 1;
> > +		return;
> > +	}
> > +	ACCESS_ONCE(prev->next) = node;
> > +	smp_wmb();

BTW, is the above memory barrier necessary?  It seems like the xchg
instruction already provides a memory barrier.

Now if we made the changes that Jason suggested:


        /* Init node */
-       node->locked = 0;
        node->next   = NULL;
 
        prev = xchg(lock, node);
        if (likely(prev == NULL)) {
                /* Lock acquired */
-               node->locked = 1;
                return;
        }
+       node->locked = 0;
        ACCESS_ONCE(prev->next) = node;
        smp_wmb();

We are probably still okay as other CPUs do not read the value of
node->locked, which is a local variable.

Tim

> > +	/* Wait until the lock holder passes the lock down */
> > +	while (!ACCESS_ONCE(node->locked))
> > +		arch_mutex_cpu_relax();
> > +}
> > +
> > +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > +{
> > +	struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> > +
> > +	if (likely(!next)) {
> > +		/*
> > +		 * Release the lock by setting it to NULL
> > +		 */
> > +		if (cmpxchg(lock, node, NULL) == node)
> > +			return;
> > +		/* Wait until the next pointer is set */
> > +		while (!(next = ACCESS_ONCE(node->next)))
> > +			arch_mutex_cpu_relax();
> > +	}
> > +	ACCESS_ONCE(next->locked) = 1;
> > +	smp_wmb();
> 
> Shouldn't the memory barrier precede the "ACCESS_ONCE(next->locked) = 1;"?
> Maybe in an "else" clause of the prior "if" statement, given that the
> cmpxchg() does it otherwise.
> 
> Otherwise, in the case where the "if" conditionn is false, the critical
> section could bleed out past the unlock.
> 
> 							Thanx, Paul
> 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 19:38     ` Tim Chen
@ 2013-09-27 20:16       ` Jason Low
  2013-09-27 20:38       ` Paul E. McKenney
  1 sibling, 0 replies; 64+ messages in thread
From: Jason Low @ 2013-09-27 20:16 UTC (permalink / raw)
  To: Tim Chen
  Cc: Paul McKenney, Waiman Long, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 12:38 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:

> BTW, is the above memory barrier necessary?  It seems like the xchg
> instruction already provides a memory barrier.
>
> Now if we made the changes that Jason suggested:
>
>
>         /* Init node */
> -       node->locked = 0;
>         node->next   = NULL;
>
>         prev = xchg(lock, node);
>         if (likely(prev == NULL)) {
>                 /* Lock acquired */
> -               node->locked = 1;
>                 return;
>         }
> +       node->locked = 0;
>         ACCESS_ONCE(prev->next) = node;
>         smp_wmb();
>
> We are probably still okay as other CPUs do not read the value of
> node->locked, which is a local variable.

Similarly, I was wondering if we should also move the smp_wmb() so that it
is before the ACCESS_ONCE(prev->next) = node and after the
node->locked = 0. Would we want to guarantee that node->locked
gets set before the node is added to the linked list, where a previous thread
calling mcs_spin_unlock() could otherwise modify node->locked?
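
Spelled out, the ordering I'm asking about would look like this (just a
sketch of the idea, not a posted patch):

	node->next = NULL;

	prev = xchg(lock, node);
	if (likely(prev == NULL))
		return;			/* lock acquired */
	node->locked = 0;
	smp_wmb();			/* init node->locked before publishing the node */
	ACCESS_ONCE(prev->next) = node;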


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 19:38     ` Tim Chen
  2013-09-27 20:16       ` Jason Low
@ 2013-09-27 20:38       ` Paul E. McKenney
  2013-09-27 22:46         ` Tim Chen
  1 sibling, 1 reply; 64+ messages in thread
From: Paul E. McKenney @ 2013-09-27 20:38 UTC (permalink / raw)
  To: Tim Chen
  Cc: Waiman Long, Ingo Molnar, Andrew Morton, Andrea Arcangeli,
	Alex Shi, Andi Kleen, Michel Lespinasse, Davidlohr Bueso,
	Matthew R Wilcox, Dave Hansen, Peter Zijlstra, Rik van Riel,
	Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > Extracting the MCS code from mutex.c and putting it into its own file allows us
> > > to reuse this code easily for rwsem.
> > > 
> > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > ---
> > >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  kernel/mutex.c          |   58 +++++-----------------------------------------
> > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > >  create mode 100644 include/linux/mcslock.h
> > > 
> > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > new file mode 100644
> > > index 0000000..20fd3f0
> > > --- /dev/null
> > > +++ b/include/linux/mcslock.h
> > > @@ -0,0 +1,58 @@
> > > +/*
> > > + * MCS lock defines
> > > + *
> > > + * This file contains the main data structure and API definitions of MCS lock.
> > > + */
> > > +#ifndef __LINUX_MCSLOCK_H
> > > +#define __LINUX_MCSLOCK_H
> > > +
> > > +struct mcs_spin_node {
> > > +	struct mcs_spin_node *next;
> > > +	int		  locked;	/* 1 if lock acquired */
> > > +};
> > > +
> > > +/*
> > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > + * time spent in this lock function.
> > > + */
> > > +static noinline
> > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > +{
> > > +	struct mcs_spin_node *prev;
> > > +
> > > +	/* Init node */
> > > +	node->locked = 0;
> > > +	node->next   = NULL;
> > > +
> > > +	prev = xchg(lock, node);
> > > +	if (likely(prev == NULL)) {
> > > +		/* Lock acquired */
> > > +		node->locked = 1;
> > > +		return;
> > > +	}
> > > +	ACCESS_ONCE(prev->next) = node;
> > > +	smp_wmb();
> 
> BTW, is the above memory barrier necessary?  It seems like the xchg
> instruction already provides a memory barrier.
> 
> Now if we made the changes that Jason suggested:
> 
> 
>         /* Init node */
> -       node->locked = 0;
>         node->next   = NULL;
> 
>         prev = xchg(lock, node);
>         if (likely(prev == NULL)) {
>                 /* Lock acquired */
> -               node->locked = 1;
>                 return;
>         }
> +       node->locked = 0;
>         ACCESS_ONCE(prev->next) = node;
>         smp_wmb();
> 
> We are probably still okay as other CPUs do not read the value of
> node->locked, which is a local variable.

I don't immediately see the need for the smp_wmb() in either case.

> Tim
> 
> > > +	/* Wait until the lock holder passes the lock down */
> > > +	while (!ACCESS_ONCE(node->locked))
> > > +		arch_mutex_cpu_relax();

However, you do need a full memory barrier here in order to ensure that
you see the effects of the previous lock holder's critical section.

							Thanx, Paul

> > > +}
> > > +
> > > +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > +{
> > > +	struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> > > +
> > > +	if (likely(!next)) {
> > > +		/*
> > > +		 * Release the lock by setting it to NULL
> > > +		 */
> > > +		if (cmpxchg(lock, node, NULL) == node)
> > > +			return;
> > > +		/* Wait until the next pointer is set */
> > > +		while (!(next = ACCESS_ONCE(node->next)))
> > > +			arch_mutex_cpu_relax();
> > > +	}
> > > +	ACCESS_ONCE(next->locked) = 1;
> > > +	smp_wmb();
> > 
> > Shouldn't the memory barrier precede the "ACCESS_ONCE(next->locked) = 1;"?
> > Maybe in an "else" clause of the prior "if" statement, given that the
> > cmpxchg() does it otherwise.
> > 
> > Otherwise, in the case where the "if" conditionn is false, the critical
> > section could bleed out past the unlock.
> > 
> > 							Thanx, Paul
> > 
> 
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 20:38       ` Paul E. McKenney
@ 2013-09-27 22:46         ` Tim Chen
  2013-09-27 23:01           ` Paul E. McKenney
  0 siblings, 1 reply; 64+ messages in thread
From: Tim Chen @ 2013-09-27 22:46 UTC (permalink / raw)
  To: paulmck, Jason Low
  Cc: Waiman Long, Ingo Molnar, Andrew Morton, Andrea Arcangeli,
	Alex Shi, Andi Kleen, Michel Lespinasse, Davidlohr Bueso,
	Matthew R Wilcox, Dave Hansen, Peter Zijlstra, Rik van Riel,
	Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 13:38 -0700, Paul E. McKenney wrote:
> On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> > On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > > Extracting the MCS code from mutex.c and putting it into its own file allows us
> > > > to reuse this code easily for rwsem.
> > > > 
> > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > > ---
> > > >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> > > >  kernel/mutex.c          |   58 +++++-----------------------------------------
> > > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > > >  create mode 100644 include/linux/mcslock.h
> > > > 
> > > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > > new file mode 100644
> > > > index 0000000..20fd3f0
> > > > --- /dev/null
> > > > +++ b/include/linux/mcslock.h
> > > > @@ -0,0 +1,58 @@
> > > > +/*
> > > > + * MCS lock defines
> > > > + *
> > > > + * This file contains the main data structure and API definitions of MCS lock.
> > > > + */
> > > > +#ifndef __LINUX_MCSLOCK_H
> > > > +#define __LINUX_MCSLOCK_H
> > > > +
> > > > +struct mcs_spin_node {
> > > > +	struct mcs_spin_node *next;
> > > > +	int		  locked;	/* 1 if lock acquired */
> > > > +};
> > > > +
> > > > +/*
> > > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > > + * time spent in this lock function.
> > > > + */
> > > > +static noinline
> > > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > > +{
> > > > +	struct mcs_spin_node *prev;
> > > > +
> > > > +	/* Init node */
> > > > +	node->locked = 0;
> > > > +	node->next   = NULL;
> > > > +
> > > > +	prev = xchg(lock, node);
> > > > +	if (likely(prev == NULL)) {
> > > > +		/* Lock acquired */
> > > > +		node->locked = 1;
> > > > +		return;
> > > > +	}
> > > > +	ACCESS_ONCE(prev->next) = node;
> > > > +	smp_wmb();
> > 
> > BTW, is the above memory barrier necessary?  It seems like the xchg
> > instruction already provides a memory barrier.
> > 
> > Now if we made the changes that Jason suggested:
> > 
> > 
> >         /* Init node */
> > -       node->locked = 0;
> >         node->next   = NULL;
> > 
> >         prev = xchg(lock, node);
> >         if (likely(prev == NULL)) {
> >                 /* Lock acquired */
> > -               node->locked = 1;
> >                 return;
> >         }
> > +       node->locked = 0;
> >         ACCESS_ONCE(prev->next) = node;
> >         smp_wmb();
> > 
> > We are probably still okay as other CPUs do not read the value of
> > node->locked, which is a local variable.
> 
> I don't immediately see the need for the smp_wmb() in either case.


Thinking a bit more, the following could happen in Jason's
initial patch proposal.  In this case, the variable "prev" referenced
by CPU 1 points to the "node" referenced by CPU 2:

	CPU 1 (calling lock)			CPU 2 (calling unlock)
	ACCESS_ONCE(prev->next) = node
						*next = ACCESS_ONCE(node->next);
						ACCESS_ONCE(next->locked) = 1;
	node->locked = 0;

Then we will be spinning forever on CPU 1, as we overwrite the lock passed
from CPU 2 before we check it.  The original code, which assigns
"node->locked = 0" before the xchg, does not have this issue.
Making the following change, which moves the smp_wmb immediately
after the node->locked assignment (as suggested by Jason),

	node->locked = 0;
	smp_wmb();
	ACCESS_ONCE(prev->next) = node;

could avoid the problem, but it will need closer scrutiny to see if
there are other pitfalls when the wmb happens before
	
	ACCESS_ONCE(prev->next) = node;


> > 
> > > > +	/* Wait until the lock holder passes the lock down */
> > > > +	while (!ACCESS_ONCE(node->locked))
> > > > +		arch_mutex_cpu_relax();
> 
> However, you do need a full memory barrier here in order to ensure that
> you see the effects of the previous lock holder's critical section.

Is it necessary to add a memory barrier after acquiring
the lock if the previous lock holder executed an smp_wmb before passing
the lock?

Thanks.

Tim



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 22:46         ` Tim Chen
@ 2013-09-27 23:01           ` Paul E. McKenney
  2013-09-27 23:54             ` Jason Low
  0 siblings, 1 reply; 64+ messages in thread
From: Paul E. McKenney @ 2013-09-27 23:01 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Waiman Long, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 03:46:45PM -0700, Tim Chen wrote:
> On Fri, 2013-09-27 at 13:38 -0700, Paul E. McKenney wrote:
> > On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> > > On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > > > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > > > Extracting the MCS code from mutex.c and putting it into its own file allows us
> > > > > to reuse this code easily for rwsem.
> > > > > 
> > > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > > > ---
> > > > >  include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  kernel/mutex.c          |   58 +++++-----------------------------------------
> > > > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > > > >  create mode 100644 include/linux/mcslock.h
> > > > > 
> > > > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > > > new file mode 100644
> > > > > index 0000000..20fd3f0
> > > > > --- /dev/null
> > > > > +++ b/include/linux/mcslock.h
> > > > > @@ -0,0 +1,58 @@
> > > > > +/*
> > > > > + * MCS lock defines
> > > > > + *
> > > > > + * This file contains the main data structure and API definitions of MCS lock.
> > > > > + */
> > > > > +#ifndef __LINUX_MCSLOCK_H
> > > > > +#define __LINUX_MCSLOCK_H
> > > > > +
> > > > > +struct mcs_spin_node {
> > > > > +	struct mcs_spin_node *next;
> > > > > +	int		  locked;	/* 1 if lock acquired */
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > > > + * time spent in this lock function.
> > > > > + */
> > > > > +static noinline
> > > > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > > > +{
> > > > > +	struct mcs_spin_node *prev;
> > > > > +
> > > > > +	/* Init node */
> > > > > +	node->locked = 0;
> > > > > +	node->next   = NULL;
> > > > > +
> > > > > +	prev = xchg(lock, node);
> > > > > +	if (likely(prev == NULL)) {
> > > > > +		/* Lock acquired */
> > > > > +		node->locked = 1;
> > > > > +		return;
> > > > > +	}
> > > > > +	ACCESS_ONCE(prev->next) = node;
> > > > > +	smp_wmb();
> > > 
> > > BTW, is the above memory barrier necessary?  It seems like the xchg
> > > instruction already provides a memory barrier.
> > > 
> > > Now if we made the changes that Jason suggested:
> > > 
> > > 
> > >         /* Init node */
> > > -       node->locked = 0;
> > >         node->next   = NULL;
> > > 
> > >         prev = xchg(lock, node);
> > >         if (likely(prev == NULL)) {
> > >                 /* Lock acquired */
> > > -               node->locked = 1;
> > >                 return;
> > >         }
> > > +       node->locked = 0;
> > >         ACCESS_ONCE(prev->next) = node;
> > >         smp_wmb();
> > > 
> > > We are probably still okay as other CPUs do not read the value of
> > > node->locked, which is a local variable.
> > 
> > I don't immediately see the need for the smp_wmb() in either case.
> 
> 
> Thinking a bit more, the following could happen in Jason's
> initial patch proposal.  In this case, the variable "prev" referenced
> by CPU 1 points to the "node" referenced by CPU 2:
> 
> 	CPU 1 (calling lock)			CPU 2 (calling unlock)
> 	ACCESS_ONCE(prev->next) = node
> 						*next = ACCESS_ONCE(node->next);
> 						ACCESS_ONCE(next->locked) = 1;
> 	node->locked = 0;
> 
> Then we will be spinning forever on CPU 1, as we overwrite the lock passed
> from CPU 2 before we check it.  The original code, which assigns
> "node->locked = 0" before the xchg, does not have this issue.
> Making the following change, which moves the smp_wmb immediately
> after the node->locked assignment (as suggested by Jason),
> 
> 	node->locked = 0;
> 	smp_wmb();
> 	ACCESS_ONCE(prev->next) = node;
> 
> could avoid the problem, but it will need closer scrutiny to see if
> there are other pitfalls when the wmb happens before
> 	
> 	ACCESS_ONCE(prev->next) = node;

I could believe that an smp_wmb() might be needed before the
"ACCESS_ONCE(prev->next) = node;", just not after.

> > > > > +	/* Wait until the lock holder passes the lock down */
> > > > > +	while (!ACCESS_ONCE(node->locked))
> > > > > +		arch_mutex_cpu_relax();
> > 
> > However, you do need a full memory barrier here in order to ensure that
> > you see the effects of the previous lock holder's critical section.
> 
> Is it necessary to add a memory barrier after acquiring
> the lock if the previous lock holder executed an smp_wmb before passing
> the lock?

Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
or the CPU from reordering things for the new lock holder.  They could for
example reorder the critical section to precede the node->locked check,
which would be very bad.
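
To sketch the bad case (an illustrative fragment; "shared_data" stands
in for data protected by the lock and is not from the series):

	while (!ACCESS_ONCE(node->locked))
		arch_mutex_cpu_relax();
	smp_mb();		/* full barrier gives the needed acquire ordering */
	r = shared_data;	/* without the smp_mb(), this load could be
				 * satisfied before node->locked was seen set,
				 * returning a stale value */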

							Thanx, Paul


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 14:50                                 ` Peter Zijlstra
  2013-09-27 15:17                                   ` Paul E. McKenney
@ 2013-09-27 23:40                                   ` Oliver Neukum
  2013-09-28  7:54                                     ` Peter Zijlstra
  1 sibling, 1 reply; 64+ messages in thread
From: Oliver Neukum @ 2013-09-27 23:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joe Perches, Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso,
	Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 16:50 +0200, Peter Zijlstra wrote:
> On Fri, Sep 27, 2013 at 07:34:55AM -0700, Joe Perches wrote:
> > That would make it seem as if all barriers are SMP no?
> 
> I would think any memory barrier is ordering against someone else; if
> not smp then a device/hardware -- like for instance the hardware page
> table walker.
> 
> Barriers are fundamentally about order; and order only makes sense if
> there's more than 1 party to the game.

But not necessarily more than one kind of party. It is perfectly
possible to have a barrier against other threads running the same
function.

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 23:01           ` Paul E. McKenney
@ 2013-09-27 23:54             ` Jason Low
  2013-09-28  0:02               ` Davidlohr Bueso
  2013-09-28  2:19               ` Paul E. McKenney
  0 siblings, 2 replies; 64+ messages in thread
From: Jason Low @ 2013-09-27 23:54 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Tim Chen, Jason Low, Waiman Long, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
> or the CPU from reordering things for the new lock holder.  They could for
> example reorder the critical section to precede the node->locked check,
> which would be very bad.

Paul, Tim, Longman,

How would you like the proposed changes below?

---
Subject: [PATCH] MCS: optimizations and barrier corrections

Delete the node->locked = 1 assignment if the lock is free as it won't be used.

Delete the smp_wmb() in mcs_spin_lock() and add a full memory barrier at the
end of the mcs_spin_lock() function. As Paul McKenney suggested, "you do need a
full memory barrier here in order to ensure that you see the effects of the
previous lock holder's critical section." And in the mcs_spin_unlock(), move the
memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 include/linux/mcslock.h |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
index 20fd3f0..edd57d2 100644
--- a/include/linux/mcslock.h
+++ b/include/linux/mcslock.h
@@ -26,15 +26,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)

        prev = xchg(lock, node);
        if (likely(prev == NULL)) {
-               /* Lock acquired */
-               node->locked = 1;
+               /* Lock acquired. No need to set node->locked since it won't be used */
                return;
        }
        ACCESS_ONCE(prev->next) = node;
-       smp_wmb();
        /* Wait until the lock holder passes the lock down */
        while (!ACCESS_ONCE(node->locked))
                arch_mutex_cpu_relax();
+       smp_mb();
 }

 static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
@@ -51,8 +50,8 @@ static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *n
                while (!(next = ACCESS_ONCE(node->next)))
                        arch_mutex_cpu_relax();
        }
-       ACCESS_ONCE(next->locked) = 1;
        smp_wmb();
+       ACCESS_ONCE(next->locked) = 1;
 }

 #endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 23:54             ` Jason Low
@ 2013-09-28  0:02               ` Davidlohr Bueso
  2013-09-28  2:19               ` Paul E. McKenney
  1 sibling, 0 replies; 64+ messages in thread
From: Davidlohr Bueso @ 2013-09-28  0:02 UTC (permalink / raw)
  To: Jason Low
  Cc: Paul McKenney, Tim Chen, Waiman Long, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 16:54 -0700, Jason Low wrote:
> On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
> > or the CPU from reordering things for the new lock holder.  They could for
> > example reorder the critical section to precede the node->locked check,
> > which would be very bad.
> 
> Paul, Tim, Longman,
> 
> How would you like the proposed changes below?
> 
> ---
> Subject: [PATCH] MCS: optimizations and barrier corrections

We *really* need to comment those barriers - explicitly that is :)

> 
> Delete the node->locked = 1 assignment if the lock is free as it won't be used.
> 
> Delete the smp_wmb() in mcs_spin_lock() and add a full memory barrier at the
> end of the mcs_spin_lock() function. As Paul McKenney suggested, "you do need a
> full memory barrier here in order to ensure that you see the effects of the
> previous lock holder's critical section." And in the mcs_spin_unlock(), move the
> memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  include/linux/mcslock.h |    7 +++----
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> index 20fd3f0..edd57d2 100644
> --- a/include/linux/mcslock.h
> +++ b/include/linux/mcslock.h
> @@ -26,15 +26,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> 
>         prev = xchg(lock, node);
>         if (likely(prev == NULL)) {
> -               /* Lock acquired */
> -               node->locked = 1;
> +               /* Lock acquired. No need to set node->locked since it won't be used */

Then we need to explain/comment the relationship between this
situation and the locked flag being set in mcs_spin_unlock(), passing the lock
down the list.

>                 return;
>         }
>         ACCESS_ONCE(prev->next) = node;
> -       smp_wmb();
>         /* Wait until the lock holder passes the lock down */
>         while (!ACCESS_ONCE(node->locked))
>                 arch_mutex_cpu_relax();
> +       smp_mb();
>  }
> 
>  static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> @@ -51,8 +50,8 @@ static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *n
>                 while (!(next = ACCESS_ONCE(node->next)))
>                         arch_mutex_cpu_relax();
>         }
> -       ACCESS_ONCE(next->locked) = 1;
>         smp_wmb();
> +       ACCESS_ONCE(next->locked) = 1;
>  }
> 
>  #endif



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 23:54             ` Jason Low
  2013-09-28  0:02               ` Davidlohr Bueso
@ 2013-09-28  2:19               ` Paul E. McKenney
  2013-09-28  4:34                 ` Jason Low
  2013-09-30 16:28                 ` Tim Chen
  1 sibling, 2 replies; 64+ messages in thread
From: Paul E. McKenney @ 2013-09-28  2:19 UTC (permalink / raw)
  To: Jason Low
  Cc: Tim Chen, Waiman Long, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 04:54:06PM -0700, Jason Low wrote:
> On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
> > or the CPU from reordering things for the new lock holder.  They could for
> > example reorder the critical section to precede the node->locked check,
> > which would be very bad.
> 
> Paul, Tim, Longman,
> 
> How would you like the proposed changes below?

Could you point me at what this applies to?  I can find flaws looking
at random pieces, given a little luck, but at some point I need to look
at the whole thing.  ;-)

							Thanx, Paul

> ---
> Subject: [PATCH] MCS: optimizations and barrier corrections
> 
> Delete the node->locked = 1 assignment if the lock is free as it won't be used.
> 
> Delete the smp_wmb() in mcs_spin_lock() and add a full memory barrier at the
> end of the mcs_spin_lock() function. As Paul McKenney suggested, "you do need a
> full memory barrier here in order to ensure that you see the effects of the
> previous lock holder's critical section." And in the mcs_spin_unlock(), move the
> memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  include/linux/mcslock.h |    7 +++----
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> index 20fd3f0..edd57d2 100644
> --- a/include/linux/mcslock.h
> +++ b/include/linux/mcslock.h
> @@ -26,15 +26,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> 
>         prev = xchg(lock, node);
>         if (likely(prev == NULL)) {
> -               /* Lock acquired */
> -               node->locked = 1;
> +               /* Lock acquired. No need to set node->locked since it won't be used */
>                 return;
>         }
>         ACCESS_ONCE(prev->next) = node;
> -       smp_wmb();
>         /* Wait until the lock holder passes the lock down */
>         while (!ACCESS_ONCE(node->locked))
>                 arch_mutex_cpu_relax();
> +       smp_mb();
>  }
> 
>  static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> @@ -51,8 +50,8 @@ static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *n
>                 while (!(next = ACCESS_ONCE(node->next)))
>                         arch_mutex_cpu_relax();
>         }
> -       ACCESS_ONCE(next->locked) = 1;
>         smp_wmb();
> +       ACCESS_ONCE(next->locked) = 1;
>  }
> 
>  #endif
> -- 
> 1.7.1
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-27 18:09     ` Tim Chen
@ 2013-09-28  2:58       ` Waiman Long
  0 siblings, 0 replies; 64+ messages in thread
From: Waiman Long @ 2013-09-28  2:58 UTC (permalink / raw)
  To: Tim Chen
  Cc: paulmck, Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Peter Zijlstra, Rik van Riel, Peter Hurley,
	linux-kernel, linux-mm

On 09/27/2013 02:09 PM, Tim Chen wrote:
> On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
>> On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
>>> We will need the MCS lock code for doing optimistic spinning for rwsem.
>>> Extracting the MCS code from mutex.c and putting it into its own file allows us
>>> to reuse this code easily for rwsem.
>>>
>>> Signed-off-by: Tim Chen<tim.c.chen@linux.intel.com>
>>> Signed-off-by: Davidlohr Bueso<davidlohr@hp.com>
>>> ---
>>>   include/linux/mcslock.h |   58 +++++++++++++++++++++++++++++++++++++++++++++++
>>>   kernel/mutex.c          |   58 +++++-----------------------------------------
>>>   2 files changed, 65 insertions(+), 51 deletions(-)
>>>   create mode 100644 include/linux/mcslock.h
>>>
>>> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
>>> new file mode 100644
>>> index 0000000..20fd3f0
>>> --- /dev/null
>>> +++ b/include/linux/mcslock.h
>>> @@ -0,0 +1,58 @@
>>> +/*
>>> + * MCS lock defines
>>> + *
>>> + * This file contains the main data structure and API definitions of MCS lock.
>>> + */
>>> +#ifndef __LINUX_MCSLOCK_H
>>> +#define __LINUX_MCSLOCK_H
>>> +
>>> +struct mcs_spin_node {
>>> +	struct mcs_spin_node *next;
>>> +	int		  locked;	/* 1 if lock acquired */
>>> +};
>>> +
>>> +/*
>>> + * We don't inline mcs_spin_lock() so that perf can correctly account for the
>>> + * time spent in this lock function.
>>> + */
>>> +static noinline
>>> +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>>> +{
>>> +	struct mcs_spin_node *prev;
>>> +
>>> +	/* Init node */
>>> +	node->locked = 0;
>>> +	node->next   = NULL;
>>> +
>>> +	prev = xchg(lock, node);
>>> +	if (likely(prev == NULL)) {
>>> +		/* Lock acquired */
>>> +		node->locked = 1;
>>> +		return;
>>> +	}
>>> +	ACCESS_ONCE(prev->next) = node;
>>> +	smp_wmb();
>>> +	/* Wait until the lock holder passes the lock down */
>>> +	while (!ACCESS_ONCE(node->locked))
>>> +		arch_mutex_cpu_relax();
>>> +}
>>> +
>>> +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>>> +{
>>> +	struct mcs_spin_node *next = ACCESS_ONCE(node->next);
>>> +
>>> +	if (likely(!next)) {
>>> +		/*
>>> +		 * Release the lock by setting it to NULL
>>> +		 */
>>> +		if (cmpxchg(lock, node, NULL) == node)
>>> +			return;
>>> +		/* Wait until the next pointer is set */
>>> +		while (!(next = ACCESS_ONCE(node->next)))
>>> +			arch_mutex_cpu_relax();
>>> +	}
>>> +	ACCESS_ONCE(next->locked) = 1;
>>> +	smp_wmb();
>> Shouldn't the memory barrier precede the "ACCESS_ONCE(next->locked) = 1;"?
>> Maybe in an "else" clause of the prior "if" statement, given that the
>> cmpxchg() does it otherwise.
>>
>> Otherwise, in the case where the "if" conditionn is false, the critical
>> section could bleed out past the unlock.
> Yes, I agree with you that the smp_wmb should be moved before the
> ACCESS_ONCE to prevent the critical section from bleeding out.  Copying Waiman,
> who is the original author of the MCS code, to see if he has any comments
> on things we may have missed.
>
> Tim

As a more general lock/unlock mechanism, I also agree that we should
move the smp_wmb() before the ACCESS_ONCE(). For the mutex case, it is used as a
queuing mechanism rather than for guarding a critical section, so it doesn't
really matter.

Regards,
Longman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-28  2:19               ` Paul E. McKenney
@ 2013-09-28  4:34                 ` Jason Low
  2013-09-30 15:51                   ` Waiman Long
  2013-09-30 16:28                 ` Tim Chen
  1 sibling, 1 reply; 64+ messages in thread
From: Jason Low @ 2013-09-28  4:34 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Jason Low, Tim Chen, Waiman Long, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, Sep 27, 2013 at 7:19 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Fri, Sep 27, 2013 at 04:54:06PM -0700, Jason Low wrote:
>> On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
>> > or the CPU from reordering things for the new lock holder.  They could for
>> > example reorder the critical section to precede the node->locked check,
>> > which would be very bad.
>>
>> Paul, Tim, Longman,
>>
>> How would you like the proposed changes below?
>
> Could you point me at what this applies to?  I can find flaws looking
> at random pieces, given a little luck, but at some point I need to look
> at the whole thing.  ;-)

Sure. Here is a link to the patch we are trying to modify:
https://lkml.org/lkml/2013/9/25/532

Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
functions would look like after applying the proposed changes.

static noinline
void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
{
        struct mcs_spin_node *prev;

        /* Init node */
        node->locked = 0;
        node->next   = NULL;

        prev = xchg(lock, node);
        if (likely(prev == NULL)) {
                /* Lock acquired. No need to set node->locked since it won't be used */
                return;
        }
        ACCESS_ONCE(prev->next) = node;
        /* Wait until the lock holder passes the lock down */
        while (!ACCESS_ONCE(node->locked))
                arch_mutex_cpu_relax();
        smp_mb();
}

static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
{
        struct mcs_spin_node *next = ACCESS_ONCE(node->next);

        if (likely(!next)) {
                /*
                 * Release the lock by setting it to NULL
                 */
                if (cmpxchg(lock, node, NULL) == node)
                        return;
                /* Wait until the next pointer is set */
                while (!(next = ACCESS_ONCE(node->next)))
                        arch_mutex_cpu_relax();
        }
        smp_wmb();
        ACCESS_ONCE(next->locked) = 1;
}
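
For reference, a caller would use the pair with the node on its own
stack, roughly like this (the names here are made up for illustration):

	struct mcs_spin_node *demo_lock = NULL;	/* tail pointer, NULL when free */

	void demo_critical_section(void)
	{
		struct mcs_spin_node node;	/* queue node lives on this stack */

		mcs_spin_lock(&demo_lock, &node);
		/* ... critical section ... */
		mcs_spin_unlock(&demo_lock, &node);
	}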


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] checkpatch: Make the memory barrier test noisier
  2013-09-27 23:40                                   ` Oliver Neukum
@ 2013-09-28  7:54                                     ` Peter Zijlstra
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Zijlstra @ 2013-09-28  7:54 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Joe Perches, Ingo Molnar, Tim Chen, Jason Low, Davidlohr Bueso,
	Ingo Molnar, Andrew Morton, Andrea Arcangeli, Alex Shi,
	Andi Kleen, Michel Lespinasse, Davidlohr Bueso, Matthew R Wilcox,
	Dave Hansen, Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Sat, Sep 28, 2013 at 01:40:27AM +0200, Oliver Neukum wrote:
> On Fri, 2013-09-27 at 16:50 +0200, Peter Zijlstra wrote:
> > On Fri, Sep 27, 2013 at 07:34:55AM -0700, Joe Perches wrote:
> > > That would make it seem as if all barriers are SMP no?
> > 
> > I would think any memory barrier is ordering against someone else; if
> > not smp then a device/hardware -- like for instance the hardware page
> > table walker.
> > 
> > Barriers are fundamentally about order; and order only makes sense if
> > there's more than 1 party to the game.
> 
> But not necessarily more than one kind of party. It is perfectly
> possible to have a barrier against other threads running the same
> function.

Then that makes a good comment ;-)


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-28  4:34                 ` Jason Low
@ 2013-09-30 15:51                   ` Waiman Long
  2013-09-30 16:10                     ` Jason Low
  0 siblings, 1 reply; 64+ messages in thread
From: Waiman Long @ 2013-09-30 15:51 UTC (permalink / raw)
  To: Jason Low
  Cc: Paul McKenney, Tim Chen, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On 09/28/2013 12:34 AM, Jason Low wrote:
>> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
>> functions would look like after applying the proposed changes.
>>
>> static noinline
>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>> {
>>          struct mcs_spin_node *prev;
>>
>>          /* Init node */
>>          node->locked = 0;
>>          node->next   = NULL;
>>
>>          prev = xchg(lock, node);
>>          if (likely(prev == NULL)) {
>>                  /* Lock acquired. No need to set node->locked since it won't be used */
>>                  return;
>>          }
>>          ACCESS_ONCE(prev->next) = node;
>>          /* Wait until the lock holder passes the lock down */
>>          while (!ACCESS_ONCE(node->locked))
>>                  arch_mutex_cpu_relax();
>>          smp_mb();

I wonder if a memory barrier is really needed here.

>> }
>>
>> static void mcs_spin_unlock(struct mcs_spin_node **lock, struct
>> mcs_spin_node *node)
>> {
>>          struct mcs_spin_node *next = ACCESS_ONCE(node->next);
>>
>>          if (likely(!next)) {
>>                  /*
>>                   * Release the lock by setting it to NULL
>>                   */
>>                  if (cmpxchg(lock, node, NULL) == node)
>>                          return;
>>                  /* Wait until the next pointer is set */
>>                  while (!(next = ACCESS_ONCE(node->next)))
>>                          arch_mutex_cpu_relax();
>>          }
>>          smp_wmb();
>>          ACCESS_ONCE(next->locked) = 1;
>> }

Instead, I think what we need may be:

if (likely(!next)) {
     ....
} else
     smp_mb();
ACCESS_ONCE(next->locked) = 1;

That will ensure a memory barrier in the unlock path.

Regards,
Longman
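
Folding that suggestion back into the full function — an editorial
reading of the snippet above, not a posted patch — gives roughly:

static void mcs_spin_unlock(struct mcs_spin_node **lock,
                            struct mcs_spin_node *node)
{
        struct mcs_spin_node *next = ACCESS_ONCE(node->next);

        if (likely(!next)) {
                /*
                 * Release the lock by setting it to NULL; on this
                 * path the cmpxchg() itself implies a full barrier.
                 */
                if (cmpxchg(lock, node, NULL) == node)
                        return;
                /* Wait until the next pointer is set */
                while (!(next = ACCESS_ONCE(node->next)))
                        arch_mutex_cpu_relax();
        } else
                smp_mb();       /* order the critical section before the handoff */
        ACCESS_ONCE(next->locked) = 1;
}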


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-30 15:51                   ` Waiman Long
@ 2013-09-30 16:10                     ` Jason Low
  2013-09-30 16:36                       ` Waiman Long
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Low @ 2013-09-30 16:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Paul McKenney, Tim Chen, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
> On 09/28/2013 12:34 AM, Jason Low wrote:
> >> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
> >> functions would look like after applying the proposed changes.
> >>
> >> static noinline
> >> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> >> {
> >>          struct mcs_spin_node *prev;
> >>
> >>          /* Init node */
> >>          node->locked = 0;
> >>          node->next   = NULL;
> >>
> >>          prev = xchg(lock, node);
> >>          if (likely(prev == NULL)) {
> >>                  /* Lock acquired. No need to set node->locked since it
> >> won't be used */
> >>                  return;
> >>          }
> >>          ACCESS_ONCE(prev->next) = node;
> >>          /* Wait until the lock holder passes the lock down */
> >>          while (!ACCESS_ONCE(node->locked))
> >>                  arch_mutex_cpu_relax();
> >>          smp_mb();
> 
> I wonder if a memory barrier is really needed here.

If the compiler can reorder the while (!ACCESS_ONCE(node->locked)) check
so that the check occurs after an instruction in the critical section,
then the barrier may be necessary.
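
In portable C11 terms — an editorial aside, with illustrative names —
the two concerns (compiler and CPU reordering) are addressed at once
by giving the spin load acquire semantics, which is roughly what the
proposed barrier after the loop buys:

#include <stdatomic.h>

extern atomic_int locked;       /* stands in for node->locked */
extern int shared_data;         /* stands in for critical-section data */

int critical_read_relaxed(void)
{
        /* Relaxed spin: neither the compiler nor the CPU is obliged
         * to keep the shared_data load below the loop exit. */
        while (!atomic_load_explicit(&locked, memory_order_relaxed))
                ;
        return shared_data;     /* may have been satisfied early */
}

int critical_read_acquire(void)
{
        /* Acquire spin: the load that observes locked == 1 orders
         * all later memory accesses after it; no extra barrier. */
        while (!atomic_load_explicit(&locked, memory_order_acquire))
                ;
        return shared_data;
}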


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-28  2:19               ` Paul E. McKenney
  2013-09-28  4:34                 ` Jason Low
@ 2013-09-30 16:28                 ` Tim Chen
  1 sibling, 0 replies; 64+ messages in thread
From: Tim Chen @ 2013-09-30 16:28 UTC (permalink / raw)
  To: paulmck
  Cc: Jason Low, Waiman Long, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Fri, 2013-09-27 at 19:19 -0700, Paul E. McKenney wrote:
> On Fri, Sep 27, 2013 at 04:54:06PM -0700, Jason Low wrote:
> > On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
> > > or the CPU from reordering things for the new lock holder.  They could for
> > > example reorder the critical section to precede the node->locked check,
> > > which would be very bad.
> > 
> > Paul, Tim, Longman,
> > 
> > How would you like the proposed changes below?
> 
> Could you point me at what this applies to?  I can find flaws looking
> at random pieces, given a little luck, but at some point I need to look
> at the whole thing.  ;-)
> 
> 							Thanx, Paul

Jason's patch is on top of the following patchset:

https://lkml.org/lkml/2013/9/26/674

Thanks.

Tim

> 
> > ---
> > Subject: [PATCH] MCS: optimizations and barrier corrections
> > 
> > Delete the node->locked = 1 assignment if the lock is free as it won't be used.
> > 
> > Delete the smp_wmb() in mcs_spin_lock() and add a full memory barrier at the
> > end of the mcs_spin_lock() function. As Paul McKenney suggested, "you do need a
> > full memory barrier here in order to ensure that you see the effects of the
> > previous lock holder's critical section." And in the mcs_spin_unlock(), move the
> > memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".
> > 
> > Signed-off-by: Jason Low <jason.low2@hp.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  include/linux/mcslock.h |    7 +++----
> >  1 files changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > index 20fd3f0..edd57d2 100644
> > --- a/include/linux/mcslock.h
> > +++ b/include/linux/mcslock.h
> > @@ -26,15 +26,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock,
> > struct mcs_spin_node *node)
> > 
> >         prev = xchg(lock, node);
> >         if (likely(prev == NULL)) {
> > -               /* Lock acquired */
> > -               node->locked = 1;
> > +               /* Lock acquired. No need to set node->locked since it
> > won't be used */
> >                 return;
> >         }
> >         ACCESS_ONCE(prev->next) = node;
> > -       smp_wmb();
> >         /* Wait until the lock holder passes the lock down */
> >         while (!ACCESS_ONCE(node->locked))
> >                 arch_mutex_cpu_relax();
> > +       smp_mb();
> >  }
> > 
> >  static void mcs_spin_unlock(struct mcs_spin_node **lock, struct
> > mcs_spin_node *node)
> > @@ -51,8 +50,8 @@ static void mcs_spin_unlock(struct mcs_spin_node
> > **lock, struct mcs_spin_node *n
> >                 while (!(next = ACCESS_ONCE(node->next)))
> >                         arch_mutex_cpu_relax();
> >         }
> > -       ACCESS_ONCE(next->locked) = 1;
> >         smp_wmb();
> > +       ACCESS_ONCE(next->locked) = 1;
> >  }
> > 
> >  #endif
> > -- 
> > 1.7.1
> > 
> 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-30 16:10                     ` Jason Low
@ 2013-09-30 16:36                       ` Waiman Long
  2013-10-01 16:48                         ` Tim Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Waiman Long @ 2013-09-30 16:36 UTC (permalink / raw)
  To: Jason Low
  Cc: Paul McKenney, Tim Chen, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On 09/30/2013 12:10 PM, Jason Low wrote:
> On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
>> On 09/28/2013 12:34 AM, Jason Low wrote:
>>>> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
>>>> functions would look like after applying the proposed changes.
>>>>
>>>> static noinline
>>>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>>>> {
>>>>           struct mcs_spin_node *prev;
>>>>
>>>>           /* Init node */
>>>>           node->locked = 0;
>>>>           node->next   = NULL;
>>>>
>>>>           prev = xchg(lock, node);
>>>>           if (likely(prev == NULL)) {
>>>>                   /* Lock acquired. No need to set node->locked since it
>>>> won't be used */
>>>>                   return;
>>>>           }
>>>>           ACCESS_ONCE(prev->next) = node;
>>>>           /* Wait until the lock holder passes the lock down */
>>>>           while (!ACCESS_ONCE(node->locked))
>>>>                   arch_mutex_cpu_relax();
>>>>           smp_mb();
>> I wonder if a memory barrier is really needed here.
> If the compiler can reorder the while (!ACCESS_ONCE(node->locked)) check
> so that the check occurs after an instruction in the critical section,
> then the barrier may be necessary.
>

In that case, just a barrier() call should be enough.

-Longman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-30 16:36                       ` Waiman Long
@ 2013-10-01 16:48                         ` Tim Chen
  2013-10-01 20:01                           ` Waiman Long
  0 siblings, 1 reply; 64+ messages in thread
From: Tim Chen @ 2013-10-01 16:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jason Low, Paul McKenney, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Mon, 2013-09-30 at 12:36 -0400, Waiman Long wrote:
> On 09/30/2013 12:10 PM, Jason Low wrote:
> > On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
> >> On 09/28/2013 12:34 AM, Jason Low wrote:
> >>>> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
> >>>> functions would look like after applying the proposed changes.
> >>>>
> >>>> static noinline
> >>>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> >>>> {
> >>>>           struct mcs_spin_node *prev;
> >>>>
> >>>>           /* Init node */
> >>>>           node->locked = 0;
> >>>>           node->next   = NULL;
> >>>>
> >>>>           prev = xchg(lock, node);
> >>>>           if (likely(prev == NULL)) {
> >>>>                   /* Lock acquired. No need to set node->locked since it
> >>>> won't be used */
> >>>>                   return;
> >>>>           }
> >>>>           ACCESS_ONCE(prev->next) = node;
> >>>>           /* Wait until the lock holder passes the lock down */
> >>>>           while (!ACCESS_ONCE(node->locked))
> >>>>                   arch_mutex_cpu_relax();
> >>>>           smp_mb();
> >> I wonder if a memory barrier is really needed here.
> > If the compiler can reorder the while (!ACCESS_ONCE(node->locked)) check
> > so that the check occurs after an instruction in the critical section,
> > then the barrier may be necessary.
> >
> 
> In that case, just a barrier() call should be enough.

The cpu could still be executing out-of-order load instructions from the
critical section before checking node->locked?  Probably smp_mb() is
still needed.

Tim



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-10-01 16:48                         ` Tim Chen
@ 2013-10-01 20:01                           ` Waiman Long
  2013-10-01 21:16                             ` Tim Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Waiman Long @ 2013-10-01 20:01 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Paul McKenney, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On 10/01/2013 12:48 PM, Tim Chen wrote:
> On Mon, 2013-09-30 at 12:36 -0400, Waiman Long wrote:
>> On 09/30/2013 12:10 PM, Jason Low wrote:
>>> On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
>>>> On 09/28/2013 12:34 AM, Jason Low wrote:
>>>>>> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
>>>>>> functions would look like after applying the proposed changes.
>>>>>>
>>>>>> static noinline
>>>>>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>>>>>> {
>>>>>>            struct mcs_spin_node *prev;
>>>>>>
>>>>>>            /* Init node */
>>>>>>            node->locked = 0;
>>>>>>            node->next   = NULL;
>>>>>>
>>>>>>            prev = xchg(lock, node);
>>>>>>            if (likely(prev == NULL)) {
>>>>>>                    /* Lock acquired. No need to set node->locked since it
>>>>>> won't be used */
>>>>>>                    return;
>>>>>>            }
>>>>>>            ACCESS_ONCE(prev->next) = node;
>>>>>>            /* Wait until the lock holder passes the lock down */
>>>>>>            while (!ACCESS_ONCE(node->locked))
>>>>>>                    arch_mutex_cpu_relax();
>>>>>>            smp_mb();
>>>> I wonder if a memory barrier is really needed here.
>>> If the compiler can reorder the while (!ACCESS_ONCE(node->locked)) check
>>> so that the check occurs after an instruction in the critical section,
>>> then the barrier may be necessary.
>>>
>> In that case, just a barrier() call should be enough.
> The cpu could still be executing out-of-order load instructions from the
> critical section before checking node->locked?  Probably smp_mb() is
> still needed.
>
> Tim

But this is the lock function, a barrier() call should be enough to 
prevent the critical section from creeping up there. We certainly need 
some kind of memory barrier at the end of the unlock function.

-Longman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-10-01 20:01                           ` Waiman Long
@ 2013-10-01 21:16                             ` Tim Chen
  2013-10-02  1:25                               ` Waiman Long
  0 siblings, 1 reply; 64+ messages in thread
From: Tim Chen @ 2013-10-01 21:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jason Low, Paul McKenney, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Tue, 2013-10-01 at 16:01 -0400, Waiman Long wrote:
> On 10/01/2013 12:48 PM, Tim Chen wrote:
> > On Mon, 2013-09-30 at 12:36 -0400, Waiman Long wrote:
> >> On 09/30/2013 12:10 PM, Jason Low wrote:
> >>> On Mon, 2013-09-30 at 11:51 -0400, Waiman Long wrote:
> >>>> On 09/28/2013 12:34 AM, Jason Low wrote:
> >>>>>> Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
> >>>>>> functions would look like after applying the proposed changes.
> >>>>>>
> >>>>>> static noinline
> >>>>>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> >>>>>> {
> >>>>>>            struct mcs_spin_node *prev;
> >>>>>>
> >>>>>>            /* Init node */
> >>>>>>            node->locked = 0;
> >>>>>>            node->next   = NULL;
> >>>>>>
> >>>>>>            prev = xchg(lock, node);
> >>>>>>            if (likely(prev == NULL)) {
> >>>>>>                    /* Lock acquired. No need to set node->locked since it
> >>>>>> won't be used */
> >>>>>>                    return;
> >>>>>>            }
> >>>>>>            ACCESS_ONCE(prev->next) = node;
> >>>>>>            /* Wait until the lock holder passes the lock down */
> >>>>>>            while (!ACCESS_ONCE(node->locked))
> >>>>>>                    arch_mutex_cpu_relax();
> >>>>>>            smp_mb();
> >>>> I wonder if a memory barrier is really needed here.
> >>> If the compiler can reorder the while (!ACCESS_ONCE(node->locked)) check
> >>> so that the check occurs after an instruction in the critical section,
> >>> then the barrier may be necessary.
> >>>
> >> In that case, just a barrier() call should be enough.
> > The cpu could still be executing out-of-order load instructions from the
> > critical section before checking node->locked?  Probably smp_mb() is
> > still needed.
> >
> > Tim
> 
> But this is the lock function, a barrier() call should be enough to 
> prevent the critical section from creeping up there. We certainly need 
> some kind of memory barrier at the end of the unlock function.

I may be missing something.  My understanding is that barrier only
prevents the compiler from rearranging instructions, but not for cpu out
of order execution (as in smp_mb). So cpu could read memory in the next
critical section, before node->locked is true, (i.e. unlock has been
completed).  If we only have a simple barrier at end of mcs_lock, then
say the code on CPU1 is

	mcs_lock
	x = 1;
	...
	x = 2;
	mcs_unlock

and CPU 2 is

	mcs_lock
	y = x;
	...
	mcs_unlock

We expect y to be 2 after the "y = x" assignment.  But we
may execute the code as

	CPU1		CPU2
		
	x = 1;
	...		y = x;  ( y=1, out of order load)
	x = 2
	mcs_unlock
			Check node->locked==true
			continue executing critical section (y=1 when we expect y=2)

So we get y to be 1 when we expect that it should be 2.  Adding smp_mb
after the node->locked check in lock code

           ACCESS_ONCE(prev->next) = node;
           /* Wait until the lock holder passes the lock down */
           while (!ACCESS_ONCE(node->locked))
                    arch_mutex_cpu_relax();
           smp_mb();

should prevent this scenario.  

Thanks.
Tim
			
> 
> -Longman
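
Tim's scenario reduces to a classic message-passing litmus test.  A
userspace C11 rendering (an editorial sketch — cpu1/cpu2 and the
variable names are illustrative, and the relaxed variant is
deliberately racy to show the bug):

#include <stdatomic.h>

int x;                  /* critical-section data protected by the lock */
atomic_int locked;      /* stands in for the next waiter's node->locked */

void cpu1(void)         /* current lock holder */
{
        x = 1;
        x = 2;
        /* unlock: smp_wmb(); ACCESS_ONCE(next->locked) = 1; */
        atomic_store_explicit(&locked, 1, memory_order_release);
}

int cpu2(void)          /* next waiter */
{
        /* lock: while (!ACCESS_ONCE(node->locked)) relax(); */
        while (!atomic_load_explicit(&locked, memory_order_relaxed))
                ;
        /* Without a barrier here the load below may have been
         * satisfied before the spin observed locked == 1, giving
         * y == 1 (or even 0).  Upgrading the spin load to
         * memory_order_acquire — the analogue of adding smp_mb() or
         * smp_rmb() after the loop — forbids that outcome. */
        return x;       /* y */
}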



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-10-01 21:16                             ` Tim Chen
@ 2013-10-02  1:25                               ` Waiman Long
  2013-10-02 18:43                                 ` Tim Chen
  0 siblings, 1 reply; 64+ messages in thread
From: Waiman Long @ 2013-10-02  1:25 UTC (permalink / raw)
  To: Tim Chen
  Cc: Jason Low, Paul McKenney, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On 10/01/2013 05:16 PM, Tim Chen wrote:
> On Tue, 2013-10-01 at 16:01 -0400, Waiman Long wrote:
>>>
>>> The cpu could still be executing out-of-order load instructions from the
>>> critical section before checking node->locked?  Probably smp_mb() is
>>> still needed.
>>>
>>> Tim
>> But this is the lock function, a barrier() call should be enough to
>> prevent the critical section from creeping up there. We certainly need
>> some kind of memory barrier at the end of the unlock function.
> I may be missing something.  My understanding is that barrier only
> prevents the compiler from rearranging instructions, but not for cpu out
> of order execution (as in smp_mb). So cpu could read memory in the next
> critical section, before node->locked is true, (i.e. unlock has been
> completed).  If we only have a simple barrier at end of mcs_lock, then
> say the code on CPU1 is
>
> 	mcs_lock
> 	x = 1;
> 	...
> 	x = 2;
> 	mcs_unlock
>
> and CPU 2 is
>
> 	mcs_lock
> 	y = x;
> 	...
> 	mcs_unlock
>
> We expect y to be 2 after the "y = x" assignment.  But we
> may execute the code as
>
> 	CPU1		CPU2
> 		
> 	x = 1;
> 	...		y = x;  ( y=1, out of order load)
> 	x = 2
> 	mcs_unlock
> 			Check node->locked==true
> 			continue executing critical section (y=1 when we expect y=2)
>
> So we get y to be 1 when we expect that it should be 2.  Adding smp_mb
> after the node->locked check in lock code
>
>             ACCESS_ONCE(prev->next) = node;
>             /* Wait until the lock holder passes the lock down */
>             while (!ACCESS_ONCE(node->locked))
>                      arch_mutex_cpu_relax();
>             smp_mb();
>
> should prevent this scenario.
>
> Thanks.
> Tim

If the lock and unlock functions are done right, there should be no 
overlap of critical sections. So it is the job of the lock/unlock functions 
to make sure that critical section code won't leak out. There should be 
some kind of memory barrier at the beginning of the lock function and 
the end of the unlock function.

The critical section is also likely to have branches. The CPU may 
speculatively execute code on the 2 branches, but one of them will be 
discarded once the branch condition is known. Also 
arch_mutex_cpu_relax() is a compiler barrier by itself. So we may not 
need a barrier() after all. The while statement is a branch instruction, 
any code after that can only be speculatively executed and cannot be 
committed until the branch is done.

In x86, the smp_mb() function translated to a mfence instruction which 
cost time. That is why I try to get rid of it if it is not necessary.

Regards,
Longman


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-10-02  1:25                               ` Waiman Long
@ 2013-10-02 18:43                                 ` Tim Chen
  2013-10-02 19:32                                   ` Waiman Long
  0 siblings, 1 reply; 64+ messages in thread
From: Tim Chen @ 2013-10-02 18:43 UTC (permalink / raw)
  To: Waiman Long, Paul McKenney
  Cc: Jason Low, Ingo Molnar, Andrew Morton, Andrea Arcangeli,
	Alex Shi, Andi Kleen, Michel Lespinasse, Davidlohr Bueso,
	Matthew R Wilcox, Dave Hansen, Peter Zijlstra, Rik van Riel,
	Peter Hurley, linux-kernel, linux-mm

On Tue, 2013-10-01 at 21:25 -0400, Waiman Long wrote: 
> On 10/01/2013 05:16 PM, Tim Chen wrote:
> > On Tue, 2013-10-01 at 16:01 -0400, Waiman Long wrote:
> >>>
> >>> The cpu could still be executing out-of-order load instructions from the
> >>> critical section before checking node->locked?  Probably smp_mb() is
> >>> still needed.
> >>>
> >>> Tim
> >> But this is the lock function, a barrier() call should be enough to
> >> prevent the critical section from creeping up there. We certainly need
> >> some kind of memory barrier at the end of the unlock function.
> > I may be missing something.  My understanding is that barrier only
> > prevents the compiler from rearranging instructions, but not for cpu out
> > of order execution (as in smp_mb). So cpu could read memory in the next
> > critical section, before node->locked is true, (i.e. unlock has been
> > completed).  If we only have a simple barrier at end of mcs_lock, then
> > say the code on CPU1 is
> >
> > 	mcs_lock
> > 	x = 1;
> > 	...
> > 	x = 2;
> > 	mcs_unlock
> >
> > and CPU 2 is
> >
> > 	mcs_lock
> > 	y = x;
> > 	...
> > 	mcs_unlock
> >
> > We expect y to be 2 after the "y = x" assignment.  But we
> > may execute the code as
> >
> > 	CPU1		CPU2
> > 		
> > 	x = 1;
> > 	...		y = x;  ( y=1, out of order load)
> > 	x = 2
> > 	mcs_unlock
> > 			Check node->locked==true
> > 			continue executing critical section (y=1 when we expect y=2)
> >
> > So we get y to be 1 when we expect that it should be 2.  Adding smp_mb
> > after the node->locked check in lock code
> >
> >             ACCESS_ONCE(prev->next) = node;
> >             /* Wait until the lock holder passes the lock down */
> >             while (!ACCESS_ONCE(node->locked))
> >                      arch_mutex_cpu_relax();
> >             smp_mb();
> >
> > should prevent this scenario.
> >
> > Thanks.
> > Tim
> 
> If the lock and unlock functions are done right, there should be no 
> overlap of critical sections. So it is the job of the lock/unlock functions 
> to make sure that critical section code won't leak out. There should be 
> some kind of memory barrier at the beginning of the lock function and 
> the end of the unlock function.
> 
> The critical section is also likely to have branches. The CPU may 
> speculatively execute code on the 2 branches, but one of them will be 
> discarded once the branch condition is known. Also 
> arch_mutex_cpu_relax() is a compiler barrier by itself. So we may not 
> need a barrier() after all. The while statement is a branch instruction, 
> any code after that can only be speculatively executed and cannot be 
> committed until the branch is done.

But the condition may only be checked after the speculative execution
has already happened?  The condition may not be true during the
speculative execution and only turn true by the time we check it and
take that branch?

The thing that bothers me is that without a memory barrier after the
while statement, we could speculatively execute loads before affirming
the lock is in the acquired state.  Then when we check the lock, the
lock has been set to the acquired state in the meantime.  We could be
loading some memory entry *before* node->locked has been set true.  I
think a smp_rmb (if not a smp_mb) should be placed after the while
statement.

At first I was also thinking that the memory barrier is not 
necessary but Paul convinced me otherwise in a previous email. 
https://lkml.org/lkml/2013/9/27/523 

> 
> In x86, the smp_mb() function translated to a mfence instruction which 
> cost time. That is why I try to get rid of it if it is not necessary.
> 

I also hope that the memory barrier is not necessary and I am missing
something obvious.  But I haven't been able to persuade myself.

Tim 
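
Concretely, the placement being argued for here is (a sketch of the
lock-path tail only; whether smp_rmb() suffices or a full smp_mb() is
required is exactly what this subthread is debating):

        ACCESS_ONCE(prev->next) = node;
        /* Wait until the lock holder passes the lock down */
        while (!ACCESS_ONCE(node->locked))
                arch_mutex_cpu_relax();
        /*
         * Pairs with the smp_wmb() before the unlock's handoff store;
         * keeps critical-section loads from being satisfied before
         * the spin exits.
         */
        smp_rmb();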




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-09-26 22:42               ` Jason Low
  2013-09-26 22:57                 ` Tim Chen
@ 2013-10-02 19:19                 ` Waiman Long
  2013-10-02 19:30                   ` Jason Low
  1 sibling, 1 reply; 64+ messages in thread
From: Waiman Long @ 2013-10-02 19:19 UTC (permalink / raw)
  To: Jason Low
  Cc: Tim Chen, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On 09/26/2013 06:42 PM, Jason Low wrote:
> On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote:
>> Okay, that would make sense for consistency because we always
>> first set node->lock = 0 at the top of the function.
>>
>> If we prefer to optimize this a bit though, perhaps we can
>> first move the node->lock = 0 so that it gets executed after the
>> "if (likely(prev == NULL)) {}" code block and then delete
>> "node->lock = 1" inside the code block.
>>
>> static noinline
>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>> {
>>         struct mcs_spin_node *prev;
>>
>>         /* Init node */
>>         node->next   = NULL;
>>
>>         prev = xchg(lock, node);
>>         if (likely(prev == NULL)) {
>>                 /* Lock acquired */
>>                 return;
>>         }
>>         node->locked = 0;

You can remove the locked flag setting statement inside if (prev == 
NULL), but you can't clear the locked flag after xchg(). In the interval 
between xchg() and locked=0, the previous lock owner may come in and set 
the flag. Now if you clear it, the thread will loop forever. You have 
to clear it before xchg().

-Longman
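
To make the window concrete, here is an editorial annotation of the
lock path under discussion (kernel-style pseudocode from the thread;
the point is that nothing orders a plain node->locked store against
its neighbours, so it can land in the window even if it appears
earlier in the source):

        node->next = NULL;

        prev = xchg(lock, node);        /* we are now the queue tail */
        if (likely(prev == NULL))
                return;                 /* uncontended: lock acquired */
        ACCESS_ONCE(prev->next) = node;
        /*
         * Window: once we are visible to the predecessor, its unlock
         * may set node->locked = 1 at any moment ...
         */
        node->locked = 0;               /* ... and this store wipes the handoff out */
        while (!ACCESS_ONCE(node->locked))
                arch_mutex_cpu_relax(); /* spins forever */

This is why the safe placement is before the xchg(), whose full
barrier pins the store in place.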


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-10-02 19:19                 ` Waiman Long
@ 2013-10-02 19:30                   ` Jason Low
  2013-10-02 19:37                     ` Waiman Long
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Low @ 2013-10-02 19:30 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jason Low, Tim Chen, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On Wed, Oct 2, 2013 at 12:19 PM, Waiman Long <waiman.long@hp.com> wrote:
> On 09/26/2013 06:42 PM, Jason Low wrote:
>>
>> On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote:
>>>
>>> Okay, that would make sense for consistency because we always
>>> first set node->lock = 0 at the top of the function.
>>>
>>> If we prefer to optimize this a bit though, perhaps we can
>>> first move the node->lock = 0 so that it gets executed after the
>>> "if (likely(prev == NULL)) {}" code block and then delete
>>> "node->lock = 1" inside the code block.
>>>
>>> static noinline
>>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node
>>> *node)
>>> {
>>>         struct mcs_spin_node *prev;
>>>
>>>         /* Init node */
>>>         node->next   = NULL;
>>>
>>>         prev = xchg(lock, node);
>>>         if (likely(prev == NULL)) {
>>>                 /* Lock acquired */
>>>                 return;
>>>         }
>>>         node->locked = 0;
>
>
> You can remove the locked flag setting statement inside if (prev == NULL),
> but you can't clear the locked flag after xchg(). In the interval between
> xchg() and locked=0, the previous lock owner may come in and set the flag.
> Now if you clear it, the thread will loop forever. You have to clear it
> before xchg().

Yes, in my most recent version, I left locked = 0 in its original
place so that the xchg() can act as a barrier for it.

The other option would have been to put another barrier after locked =
0. I went with leaving locked = 0 in its original place so that we
don't need that extra barrier.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-10-02 18:43                                 ` Tim Chen
@ 2013-10-02 19:32                                   ` Waiman Long
  0 siblings, 0 replies; 64+ messages in thread
From: Waiman Long @ 2013-10-02 19:32 UTC (permalink / raw)
  To: Tim Chen
  Cc: Paul McKenney, Jason Low, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On 10/02/2013 02:43 PM, Tim Chen wrote:
> On Tue, 2013-10-01 at 21:25 -0400, Waiman Long wrote:
>>
>> If the lock and unlock functions are done right, there should be no
>> overlap of critical sections. So it is the job of the lock/unlock functions
>> to make sure that critical section code won't leak out. There should be
>> some kind of memory barrier at the beginning of the lock function and
>> the end of the unlock function.
>>
>> The critical section is also likely to have branches. The CPU may
>> speculatively execute code on the 2 branches, but one of them will be
>> discarded once the branch condition is known. Also
>> arch_mutex_cpu_relax() is a compiler barrier by itself. So we may not
>> need a barrier() after all. The while statement is a branch instruction,
>> any code after that can only be speculatively executed and cannot be
>> committed until the branch is done.
> But the condition code may be checked after speculative execution?
> The condition may not be true during speculative execution and only
> turns true when we check the condition, and take that branch?
>
> The thing that bothers me is without memory barrier after the while
> statement, we could speculatively execute before affirming the lock is
> in acquired state. Then when we check the lock, the lock is set
> to acquired state in the mean time.
> We could be loading some memory entry *before*
> the node->locked has been set true.  I think a smp_rmb (if not a
> smp_mb) should be set after the while statement.

Yes, I think a smp_rmb() makes sense here to correspond to the smp_wmb() 
in the unlock path.

BTW, you need to move the node->locked = 0; statement before xchg() if 
you haven't done so.

-Longman



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file
  2013-10-02 19:30                   ` Jason Low
@ 2013-10-02 19:37                     ` Waiman Long
  0 siblings, 0 replies; 64+ messages in thread
From: Waiman Long @ 2013-10-02 19:37 UTC (permalink / raw)
  To: Jason Low
  Cc: Tim Chen, Davidlohr Bueso, Ingo Molnar, Andrew Morton,
	Andrea Arcangeli, Alex Shi, Andi Kleen, Michel Lespinasse,
	Davidlohr Bueso, Matthew R Wilcox, Dave Hansen, Peter Zijlstra,
	Rik van Riel, Peter Hurley, linux-kernel, linux-mm

On 10/02/2013 03:30 PM, Jason Low wrote:
> On Wed, Oct 2, 2013 at 12:19 PM, Waiman Long<waiman.long@hp.com>  wrote:
>> On 09/26/2013 06:42 PM, Jason Low wrote:
>>> On Thu, 2013-09-26 at 14:41 -0700, Tim Chen wrote:
>>>> Okay, that would make sense for consistency because we always
>>>> first set node->lock = 0 at the top of the function.
>>>>
>>>> If we prefer to optimize this a bit though, perhaps we can
>>>> first move the node->lock = 0 so that it gets executed after the
>>>> "if (likely(prev == NULL)) {}" code block and then delete
>>>> "node->lock = 1" inside the code block.
>>>>
>>>> static noinline
>>>> void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node
>>>> *node)
>>>> {
>>>>          struct mcs_spin_node *prev;
>>>>
>>>>          /* Init node */
>>>>          node->next   = NULL;
>>>>
>>>>          prev = xchg(lock, node);
>>>>          if (likely(prev == NULL)) {
>>>>                  /* Lock acquired */
>>>>                  return;
>>>>          }
>>>>          node->locked = 0;
>>
>> You can remove the locked flag setting statement inside if (prev == NULL),
>> but you can't clear the locked flag after xchg(). In the interval between
>> xchg() and locked=0, the previous lock owner may come in and set the flag.
> Now if you clear it, the thread will loop forever. You have to clear it
>> before xchg().
> Yes, in my most recent version, I left locked = 0 in its original
> place so that the xchg() can act as a barrier for it.
>
> The other option would have been to put another barrier after locked =
> 0. I went with leaving locked = 0 in its original place so that we
> don't need that extra barrier.

I don't think putting another barrier after locked=0 will work. 
Chronologically, the flag must be cleared before the node address is 
saved in the lock field. There is no way to guarantee that except by 
putting the locked=0 before xchg().

-Longman


^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2013-10-02 19:37 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1380144003.git.tim.c.chen@linux.intel.com>
2013-09-25 22:10 ` [PATCH v6 0/6] rwsem: performance optimizations Tim Chen
2013-09-25 22:10 ` [PATCH v6 1/6] rwsem: check the lock before cpmxchg in down_write_trylock Tim Chen
2013-09-25 22:10 ` [PATCH v6 2/6] rwsem: remove 'out' label in do_wake Tim Chen
2013-09-25 22:10 ` [PATCH v6 3/6] rwsem: remove try_reader_grant label do_wake Tim Chen
2013-09-25 22:10 ` [PATCH v6 4/6] rwsem/wake: check lock before do atomic update Tim Chen
2013-09-25 22:10 ` [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
2013-09-26  6:46   ` Ingo Molnar
2013-09-26  8:40     ` Peter Zijlstra
2013-09-26  9:37       ` Ingo Molnar
2013-09-26 18:18       ` Tim Chen
2013-09-26 19:27   ` Jason Low
2013-09-26 20:06     ` Davidlohr Bueso
2013-09-26 20:23       ` Jason Low
2013-09-26 20:40         ` Davidlohr Bueso
2013-09-26 21:09           ` Jason Low
2013-09-26 21:41             ` Tim Chen
2013-09-26 22:42               ` Jason Low
2013-09-26 22:57                 ` Tim Chen
2013-09-27  6:02                   ` Ingo Molnar
2013-09-27  6:26                     ` Jason Low
2013-09-27 11:23                     ` Peter Zijlstra
2013-09-27 13:44                       ` Joe Perches
2013-09-27 13:48                         ` Peter Zijlstra
2013-09-27 14:05                           ` Joe Perches
2013-09-27 14:18                             ` Peter Zijlstra
2013-09-27 14:14                           ` [PATCH] checkpatch: Make the memory barrier test noisier Joe Perches
2013-09-27 14:26                             ` Peter Zijlstra
2013-09-27 14:34                               ` Joe Perches
2013-09-27 14:50                                 ` Peter Zijlstra
2013-09-27 15:17                                   ` Paul E. McKenney
2013-09-27 15:34                                     ` Peter Zijlstra
2013-09-27 16:04                                       ` Paul E. McKenney
2013-09-27 23:40                                   ` Oliver Neukum
2013-09-28  7:54                                     ` Peter Zijlstra
2013-09-27 16:12                     ` [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file Jason Low
2013-09-27 16:19                       ` Tim Chen
2013-10-02 19:19                 ` Waiman Long
2013-10-02 19:30                   ` Jason Low
2013-10-02 19:37                     ` Waiman Long
2013-09-26 22:22             ` Davidlohr Bueso
2013-09-27 15:29   ` Paul E. McKenney
2013-09-27 18:09     ` Tim Chen
2013-09-28  2:58       ` Waiman Long
2013-09-27 19:38     ` Tim Chen
2013-09-27 20:16       ` Jason Low
2013-09-27 20:38       ` Paul E. McKenney
2013-09-27 22:46         ` Tim Chen
2013-09-27 23:01           ` Paul E. McKenney
2013-09-27 23:54             ` Jason Low
2013-09-28  0:02               ` Davidlohr Bueso
2013-09-28  2:19               ` Paul E. McKenney
2013-09-28  4:34                 ` Jason Low
2013-09-30 15:51                   ` Waiman Long
2013-09-30 16:10                     ` Jason Low
2013-09-30 16:36                       ` Waiman Long
2013-10-01 16:48                         ` Tim Chen
2013-10-01 20:01                           ` Waiman Long
2013-10-01 21:16                             ` Tim Chen
2013-10-02  1:25                               ` Waiman Long
2013-10-02 18:43                                 ` Tim Chen
2013-10-02 19:32                                   ` Waiman Long
2013-09-30 16:28                 ` Tim Chen
2013-09-25 22:10 ` [PATCH v6 6/6] rwsem: do optimistic spinning for writer lock acquisition Tim Chen
2013-09-26  6:53   ` Ingo Molnar
