From: boqun.feng@gmail.com (Boqun Feng)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
Date: Fri, 11 Dec 2015 16:09:11 +0800
Message-ID: <20151211080911.GD1072@fixme-laptop.cn.ibm.com>
In-Reply-To: <20151207002601.GB13845@fixme-laptop.cn.ibm.com>

On Mon, Dec 07, 2015 at 08:26:01AM +0800, Boqun Feng wrote:
> On Sun, Dec 06, 2015 at 11:27:34AM -0800, Paul E. McKenney wrote:
> > On Sun, Dec 06, 2015 at 04:16:17PM +0800, Boqun Feng wrote:
> > > Hi Paul,
> > > 
> > > On Thu, Dec 03, 2015 at 09:22:07AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Dec 03, 2015 at 04:32:43PM +0000, Will Deacon wrote:
> > > > > Hi Peter, Paul,
> > > > > 
> > > > > Firstly, thanks for writing that up. I agree that you have something
> > > > > that can work in theory, but see below.
> > > > > 
> > > > > On Thu, Dec 03, 2015 at 02:28:39PM +0100, Peter Zijlstra wrote:
> > > > > > On Wed, Dec 02, 2015 at 04:11:41PM -0800, Paul E. McKenney wrote:
> > > > > > > This looks architecture-agnostic to me:
> > > > > > > 
> > > > > > > a.	TSO systems have smp_mb__after_unlock_lock() be a no-op, and
> > > > > > > 	have a read-only implementation for spin_unlock_wait().
> > > > > > > 
> > > > > > > b.	Small-scale weakly ordered systems can also have
> > > > > > > 	smp_mb__after_unlock_lock() be a no-op, but must instead
> > > > > > > 	have spin_unlock_wait() acquire the lock and immediately 
> > > > > > > 	release it, or some optimized implementation of this.
> > > > > > > 
> > > > > > > c.	Large-scale weakly ordered systems are required to define
> > > > > > > 	smp_mb__after_unlock_lock() as smp_mb(), but can have a
> > > > > > > 	read-only implementation of spin_unlock_wait().
> > > > > > 
> > > > > > This would still require all relevant spin_lock() sites to be annotated
> > > > > > with smp_mb__after_unlock_lock(), which is going to be a painful (no
> > > > > > warning when done wrong) exercise and expensive (added MBs all over the
> > > > > > place).
> > > > 
> > > > On the lack of warning, agreed, but please see below.  On the added MBs,
> > > > the only alternative I have been able to come up with has even more MBs,
> > > > as in on every lock acquisition.  If I am missing something, please do
> > > > not keep it a secret!
> > > > 
> > > 
> > > Maybe we can treat this problem as a problem of data accesses rather than
> > > one of locks?
> > > 
> > > Take the example of tsk->flags in do_exit() and tsk->pi_lock: we
> > > don't need to add a full barrier for every lock acquisition of
> > > ->pi_lock, because some ->pi_lock critical sections don't access the
> > > PF_EXITING bit of ->flags at all. All we need is to add a full
> > > barrier before reading the PF_EXITING bit in a ->pi_lock critical
> > > section. To achieve this, we could introduce a primitive like
> > > smp_load_in_lock():
> > > 
> > > (on PPC and ARM64v8)
> > > 
> > > 	#define smp_load_in_lock(x, lock) 		\
> > > 		({ 					\
> > > 			smp_mb();			\
> > > 			READ_ONCE(x);			\
> > > 		})
> > > 
> > > (on other archs)
> > > 	
> > > 	#define smp_load_in_lock(x, lock) READ_ONCE(x)
> > > 
> > > 
> > > And we would call it every time we read data which is not protected by
> > > the current lock critical section but whose updaters synchronize with
> > > that critical section via spin_unlock_wait().
> > > 
> > > I admit the name may be bad, and the second parameter @lock is meant for
> > > some way of diagnosing the usage, which I haven't come up with yet ;-)
> > > 
> > > Thoughts?
> > 
> > In other words, dispense with smp_mb__after_unlock_lock() in those cases,
> > and use smp_load_in_lock() to get the desired effect?
> > 
> 
> Exactly.
> 
> > If so, one concern is how to check for proper use of smp_load_in_lock().
> 
> I also propose that on the updaters' side, we merge the STORE and the
> smp_mb() into another primitive, maybe smp_store_out_of_lock(). After
> that we make sure an smp_store_out_of_lock() plus a spin_unlock_wait()
> pairs with a spin_lock() plus an smp_load_in_lock() in the following way:
> 
> 	CPU 0				CPU 1
> 	==============================================================
> 	smp_store_out_of_lock(o, NULL, lock);
> 	<other stores or reads>
> 	spin_unlock_wait(lock);		spin_lock(lock);
> 					<other stores or reads>
> 					obj = smp_load_in_lock(o, lock);
> 
> Their names and this pairing pattern could help us check their usage.
> And we can also try to come up with a way to use lockdep to check their
> usage automatically. Anyway, I don't think that is more difficult than
> checking the usage of smp_mb__after_unlock_lock() for the same purpose
> of ordering "Prior Write" with "Current Read" ;-)
> 
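(To make the updater side concrete: a minimal sketch of what I have in
mind for smp_store_out_of_lock() on PPC and ARM64v8, mirroring the
smp_load_in_lock() definition above; the name and the details here are
only illustrative:

	#define smp_store_out_of_lock(x, v, lock)	\
		do {					\
			WRITE_ONCE(x, v);		\
			smp_mb();			\
		} while (0)

i.e. the plain store followed by a full barrier, so that the store is
ordered before the read of the lock word in the following
spin_unlock_wait(). Whether other archs could get away with something
weaker needs more thought.)
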
> > Another concern is redundant smp_mb() instances in case of multiple
> > accesses to the data under a given critical section.
> > 
> 
> First, I don't think there would be many cases in which a lock critical
> section needs to access multiple variables that are modified outside
> the critical section and synchronized via spin_unlock_wait(), because
> using spin_unlock_wait() to synchronize with lock critical sections is
> itself a very unusual usage (you could just take the lock).
> 
> Second, even if we have redundant smp_mb()s, we avoid having to:
> 
> 1.	use an ll/sc loop on the updaters' side, as Will proposed
> 
> or
> 
> 2.	put a full barrier *just* after spin_lock() as you proposed,
> 	which would forbid unrelated data accesses from being moved
> 	before the store part of spin_lock().
> 
> Whether these two perform better than redundant smp_mb()s in a lock
> critical section is uncertain, right?
> 
> Third, even if this performs worse than Will's or your proposal, we
> wouldn't need to maintain two quite different ways of solving the same
> problem for PPC and ARM64v8; that's one benefit of this approach.
> 

Perhaps it's better to look into the use cases before we make a move,
so I have gone through all the uses of spin_unlock_wait() and friends,
and there was a lot of fun ;-)

Cscope tells me there are 7 uses of spin_unlock_wait() and friends. And
AFAICS, a fix is really needed only if a spin_unlock_wait() with an
smp_mb() preceding it wants to synchronize the memory accesses before
the smp_mb() with the memory accesses in the next lock critical section.
So of these 7 uses, 3 surely don't need a fix, namely
(according to Linus and Peter:
http://marc.info/?l=linux-kernel&m=144768943410858):

*	spin_unlock_wait() in sem_wait_array()
*	spin_unlock_wait() in exit_sem()
*	spin_unlock_wait() in completion_done()

And of the remaining four, I think one doesn't need a fix either:

*	spin_unlock_wait() in ata_scsi_cmd_error_handler()

as there is no smp_mb() before it, and the logic here seems to be simply
waiting for the erring host to release its lock so that the error handler
can begin, though I'm not 100% sure because I have zero knowledge of the
ata code.

For the remaining three, the related lock critical sections and
variables are as follows:

1.	raw_spin_unlock_wait() after exit_signals() in do_exit()

	wants to synchronize the STORE to the PF_EXITING bit in
	task->flags with the LOAD from the PF_EXITING bit in task->flags
	in the task->pi_lock critical section in attach_to_pi_owner()
	(a sketch of this use with the proposed primitives follows below)

2.	raw_spin_unlock_wait() after exit_rcu() in do_exit()

	wants to synchronize the STORE to task->state
	with the LOAD from task->state in the critical section of
	task->pi_lock in try_to_wake_up()

3.	raw_spin_unlock_wait() in task_work_run()

	wants to synchronize the STORE to task->task_works
	with the LOAD from task->task_works in the critical section of
	task->pi_lock in task_work_cancel()

(One interesting thing is that in use #3, the ->pi_lock critical
section protects nothing but task->task_works, and other operations on
task->task_works don't take a lock at all, which at least indicates we
could use a different lock there.)
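
To illustrate with use #1 (just a sketch of how the pairing might look,
not a tested patch; the call sites are simplified and the flag update
is shown conceptually):

	/*
	 * Updater side: the PF_EXITING store currently done in
	 * exit_signals(), followed by smp_mb() + raw_spin_unlock_wait()
	 * in do_exit(), would conceptually become:
	 */
	smp_store_out_of_lock(tsk->flags, tsk->flags | PF_EXITING,
			      &tsk->pi_lock);
	raw_spin_unlock_wait(&tsk->pi_lock);

	/*
	 * Reader side: inside the ->pi_lock critical section in
	 * attach_to_pi_owner(), the load of ->flags pays for the full
	 * barrier, instead of every acquisition of ->pi_lock:
	 */
	raw_spin_lock_irq(&p->pi_lock);
	if (unlikely(smp_load_in_lock(p->flags, &p->pi_lock) & PF_EXITING)) {
		/* the task is exiting, don't attach to it */
		raw_spin_unlock_irq(&p->pi_lock);
		return -EAGAIN;	/* illustrative error handling only */
	}

This way only the critical sections that actually read ->flags pay for
the extra smp_mb(), which is the whole point of the proposal.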

In conclusion, more than half of the uses already work well, and each
of the ones that needs a fix has only one related critical section and
only one related data access in it. So on PPC, I think my proposal
would not require more smp_mb() instances to fix all current use cases
than adding smp_mb__after_unlock_lock() after the lock acquisition in
each related lock critical section would.

Of course, my proposal needs buy-in from both PPC and ARM64v8, so Paul
and Will, what do you think? ;-)

Regards,
Boqun