All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@fluxnic.net>
To: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Chris Mason <chris.mason@fusionio.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: RFC: mutex: hung tasks on SMP platforms with asm-generic/mutex-xchg.h
Date: Thu, 09 Aug 2012 01:12:15 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.2.02.1208090029460.5231@xanadu.home> (raw)
In-Reply-To: <20120807115647.GA12828@mudshark.cambridge.arm.com>

On Tue, 7 Aug 2012, Will Deacon wrote:

> Hello,
> 
> ARM recently moved to asm-generic/mutex-xchg.h for its mutex implementation
> after our previous implementation was found to be missing some crucial
> memory barriers. However, I'm seeing some problems running hackbench on
> SMP platforms due to the way in which the MUTEX_SPIN_ON_OWNER code operates.
> 
> The symptoms are that a bunch of hackbench tasks are left waiting on an
> unlocked mutex and therefore never get woken up to claim it. I think this
> boils down to the following sequence:
> 
> 
>         Task A        Task B        Task C        Lock value
> 0                                                     1
> 1       lock()                                        0
> 2                     lock()                          0
> 3                     spin(A)                         0
> 4       unlock()                                      1
> 5                                   lock()            0
> 6                     cmpxchg(1,0)                    0
> 7                     contended()                    -1
> 8       lock()                                        0
> 9       spin(C)                                       0
> 10                                  unlock()          1
> 11      cmpxchg(1,0)                                  0
> 12      unlock()                                      1
> 
> 
> At this point, the lock is unlocked, but Task B is in an uninterruptible
> sleep with nobody to wake it up.
> 
> The following patch fixes the problem by ensuring we put the lock into
> the contended state if we acquire it from the spin loop on the slowpath
> but I'd like to be sure that this won't cause problems with other mutex
> implementations:
> 
> 
> diff --git a/kernel/mutex.c b/kernel/mutex.c
> index a307cc9..27b7887 100644
> --- a/kernel/mutex.c
> +++ b/kernel/mutex.c
> @@ -170,7 +170,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>                 if (owner && !mutex_spin_on_owner(lock, owner))
>                         break;
>  
> -               if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
> +               if (atomic_cmpxchg(&lock->count, 1, -1) == 1) {
>                         lock_acquired(&lock->dep_map, ip);
>                         mutex_set_owner(lock);
>                         preempt_enable();
> 
> 
> All comments welcome.

Well... after thinking about this for a while, I came to the conclusion 
that the mutex_spin_on_owner code is indeed breaking the contract 
between the xchg lock fast path and the slow path.  The xchg fast path 
will always set the count to 0 and rely on the slow path to restore a 
possible pre-existing negative count.  So the above change would be 
needed for correctness in the xchg case, even if it always forces the 
unlock into the slow path.

As I mentioned previously, we don't want to force the slow path in all 
cases though.  The atomic decrement based fast path doesn't need that, 
so I'd suggest this fix instead:

diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
index 580a6d35c7..60964844c8 100644
--- a/include/asm-generic/mutex-xchg.h
+++ b/include/asm-generic/mutex-xchg.h
@@ -108,4 +108,6 @@ __mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
 	return prev;
 }
 
+#define __MUTEX_XCHG_FAST_PATH
+
 #endif
diff --git a/kernel/mutex.c b/kernel/mutex.c
index a307cc9c95..c6a26a4f1c 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -161,6 +161,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 
 	for (;;) {
 		struct task_struct *owner;
+		int locked_val;
 
 		/*
 		 * If there's an owner, wait for it to either
@@ -170,7 +171,19 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		if (owner && !mutex_spin_on_owner(lock, owner))
 			break;
 
-		if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
+#ifdef __MUTEX_XCHG_FAST_PATH
+		/*
+		 * The fast path based on xchg sets a transient 0 count,
+		 * relying on the slow path to restore a possible
+		 * pre-existing contended count.  Without checking the
+		 * waiters' list we must presume possible contension here.
+		 */
+		locked_val = -1;
+#else
+		locked_val = 0;
+#endif
+
+		if (atomic_cmpxchg(&lock->count, 1, locked_val) == 1) {
 			lock_acquired(&lock->dep_map, ip);
 			mutex_set_owner(lock);
 			preempt_enable();

That would be needed for the stable tree as well.

A further cleanup could remove all definitions of 
__mutex_slowpath_needs_to_unlock() given that they're all set to 1 
except for the xchg fast path, in which case __MUTEX_XCHG_FAST_PATH 
could be reused instead.

Of course that might tilt the balance towards using mutex-dec.h on ARM 
v6 and above instead of mutex-xchg.h.  But that is an orthogonal issue, 
and that doesn't remove the need for fixing the xchg based case for 
correctness.


Nicolas

WARNING: multiple messages have this Message-ID (diff)
From: nico@fluxnic.net (Nicolas Pitre)
To: linux-arm-kernel@lists.infradead.org
Subject: RFC: mutex: hung tasks on SMP platforms with asm-generic/mutex-xchg.h
Date: Thu, 09 Aug 2012 01:12:15 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.2.02.1208090029460.5231@xanadu.home> (raw)
In-Reply-To: <20120807115647.GA12828@mudshark.cambridge.arm.com>

On Tue, 7 Aug 2012, Will Deacon wrote:

> Hello,
> 
> ARM recently moved to asm-generic/mutex-xchg.h for its mutex implementation
> after our previous implementation was found to be missing some crucial
> memory barriers. However, I'm seeing some problems running hackbench on
> SMP platforms due to the way in which the MUTEX_SPIN_ON_OWNER code operates.
> 
> The symptoms are that a bunch of hackbench tasks are left waiting on an
> unlocked mutex and therefore never get woken up to claim it. I think this
> boils down to the following sequence:
> 
> 
>         Task A        Task B        Task C        Lock value
> 0                                                     1
> 1       lock()                                        0
> 2                     lock()                          0
> 3                     spin(A)                         0
> 4       unlock()                                      1
> 5                                   lock()            0
> 6                     cmpxchg(1,0)                    0
> 7                     contended()                    -1
> 8       lock()                                        0
> 9       spin(C)                                       0
> 10                                  unlock()          1
> 11      cmpxchg(1,0)                                  0
> 12      unlock()                                      1
> 
> 
> At this point, the lock is unlocked, but Task B is in an uninterruptible
> sleep with nobody to wake it up.
> 
> The following patch fixes the problem by ensuring we put the lock into
> the contended state if we acquire it from the spin loop on the slowpath
> but I'd like to be sure that this won't cause problems with other mutex
> implementations:
> 
> 
> diff --git a/kernel/mutex.c b/kernel/mutex.c
> index a307cc9..27b7887 100644
> --- a/kernel/mutex.c
> +++ b/kernel/mutex.c
> @@ -170,7 +170,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>                 if (owner && !mutex_spin_on_owner(lock, owner))
>                         break;
>  
> -               if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
> +               if (atomic_cmpxchg(&lock->count, 1, -1) == 1) {
>                         lock_acquired(&lock->dep_map, ip);
>                         mutex_set_owner(lock);
>                         preempt_enable();
> 
> 
> All comments welcome.

Well... after thinking about this for a while, I came to the conclusion 
that the mutex_spin_on_owner code is indeed breaking the contract 
between the xchg lock fast path and the slow path.  The xchg fast path 
will always set the count to 0 and rely on the slow path to restore a 
possible pre-existing negative count.  So the above change would be 
needed for correctness in the xchg case, even if it always forces the 
unlock into the slow path.

As I mentioned previously, we don't want to force the slow path in all 
cases though.  The atomic decrement based fast path doesn't need that, 
so I'd suggest this fix instead:

diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
index 580a6d35c7..60964844c8 100644
--- a/include/asm-generic/mutex-xchg.h
+++ b/include/asm-generic/mutex-xchg.h
@@ -108,4 +108,6 @@ __mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
 	return prev;
 }
 
+#define __MUTEX_XCHG_FAST_PATH
+
 #endif
diff --git a/kernel/mutex.c b/kernel/mutex.c
index a307cc9c95..c6a26a4f1c 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -161,6 +161,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 
 	for (;;) {
 		struct task_struct *owner;
+		int locked_val;
 
 		/*
 		 * If there's an owner, wait for it to either
@@ -170,7 +171,19 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		if (owner && !mutex_spin_on_owner(lock, owner))
 			break;
 
-		if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
+#ifdef __MUTEX_XCHG_FAST_PATH
+		/*
+		 * The fast path based on xchg sets a transient 0 count,
+		 * relying on the slow path to restore a possible
+		 * pre-existing contended count.  Without checking the
+		 * waiters' list we must presume possible contension here.
+		 */
+		locked_val = -1;
+#else
+		locked_val = 0;
+#endif
+
+		if (atomic_cmpxchg(&lock->count, 1, locked_val) == 1) {
 			lock_acquired(&lock->dep_map, ip);
 			mutex_set_owner(lock);
 			preempt_enable();

That would be needed for the stable tree as well.

A further cleanup could remove all definitions of 
__mutex_slowpath_needs_to_unlock() given that they're all set to 1 
except for the xchg fast path, in which case __MUTEX_XCHG_FAST_PATH 
could be reused instead.

Of course that might tilt the balance towards using mutex-dec.h on ARM 
v6 and above instead of mutex-xchg.h.  But that is an orthogonal issue, 
and that doesn't remove the need for fixing the xchg based case for 
correctness.


Nicolas

  parent reply	other threads:[~2012-08-09  5:12 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-07 11:56 RFC: mutex: hung tasks on SMP platforms with asm-generic/mutex-xchg.h Will Deacon
2012-08-07 11:56 ` Will Deacon
2012-08-07 13:48 ` Peter Zijlstra
2012-08-07 13:48   ` Peter Zijlstra
2012-08-07 14:04   ` Will Deacon
2012-08-07 14:04     ` Will Deacon
2012-08-07 17:14 ` Nicolas Pitre
2012-08-07 17:14   ` Nicolas Pitre
2012-08-07 17:33   ` Will Deacon
2012-08-07 17:33     ` Will Deacon
2012-08-07 17:38     ` Will Deacon
2012-08-07 17:38       ` Will Deacon
2012-08-07 18:28     ` Nicolas Pitre
2012-08-07 18:28       ` Nicolas Pitre
2012-08-09  5:12 ` Nicolas Pitre [this message]
2012-08-09  5:12   ` Nicolas Pitre
2012-08-09 14:49   ` Will Deacon
2012-08-09 14:49     ` Will Deacon
2012-08-09 16:17     ` Nicolas Pitre
2012-08-09 16:17       ` Nicolas Pitre
2012-08-09 16:57       ` Nicolas Pitre
2012-08-09 16:57         ` Nicolas Pitre
2012-08-09 17:50         ` Will Deacon
2012-08-09 17:50           ` Will Deacon
2012-08-09 18:09           ` Nicolas Pitre
2012-08-09 18:09             ` Nicolas Pitre
2012-08-09 18:17             ` Will Deacon
2012-08-09 18:17               ` Will Deacon
2012-08-09 20:05               ` Nicolas Pitre
2012-08-09 20:05                 ` Nicolas Pitre
2012-08-13  8:15         ` Peter Zijlstra
2012-08-13  8:15           ` Peter Zijlstra
2012-08-13  9:13           ` Will Deacon
2012-08-13  9:13             ` Will Deacon
2012-08-13 13:35           ` Nicolas Pitre
2012-08-13 13:35             ` Nicolas Pitre
2012-08-13 14:05             ` Peter Zijlstra
2012-08-13 14:05               ` Peter Zijlstra
2012-08-13 14:11               ` Will Deacon
2012-08-13 14:11                 ` Will Deacon
2012-08-13 14:45                 ` Peter Zijlstra
2012-08-13 14:45                   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.02.1208090029460.5231@xanadu.home \
    --to=nico@fluxnic.net \
    --cc=a.p.zijlstra@chello.nl \
    --cc=arnd@arndb.de \
    --cc=chris.mason@fusionio.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.