linux-kernel.vger.kernel.org archive mirror
* [patches] seqlock: add barrier-less special cases for seqcounts
@ 2010-11-11  8:00 Nick Piggin
  2010-11-12 16:39 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2010-11-11  8:00 UTC
  To: Linus Torvalds, linux-kernel

Add branch annotations for the seqlock read fastpath, and introduce
__read_seqcount_begin and __read_seqcount_retry functions, which can
avoid the smp_rmb() when the caller provides some other barrier. Read
barriers have non-trivial cost on many architectures.

These will be used by the store-free path walking algorithm, where
performance is critical and seqlocks are widely used.

Also expand and kerneldocify comments for seqlock functions in general.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 include/linux/seqlock.h |   67 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 7 deletions(-)

Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h	2010-10-21 23:29:46.000000000 +1100
+++ linux-2.6/include/linux/seqlock.h	2010-10-21 23:29:59.000000000 +1100
@@ -107,7 +107,7 @@ static __always_inline int read_seqretry
 {
 	smp_rmb();
 
-	return (sl->sequence != start);
+	return unlikely(sl->sequence != start);
 }
 
 
@@ -125,14 +125,25 @@ typedef struct seqcount {
 #define SEQCNT_ZERO { 0 }
 #define seqcount_init(x)	do { *(x) = (seqcount_t) SEQCNT_ZERO; } while (0)
 
-/* Start of read using pointer to a sequence counter only.  */
-static inline unsigned read_seqcount_begin(const seqcount_t *s)
+/**
+ * __read_seqcount_begin - begin a seq-read critical section (without barrier)
+ * @s: pointer to seqcount_t
+ * Returns: count to be passed to read_seqcount_retry
+ *
+ * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
+ * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
+ * provided before actually loading any of the variables that are to be
+ * protected in this critical section.
+ *
+ * Use carefully, only in critical code, and comment how the barrier is
+ * provided.
+ */
+static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret;
 
 repeat:
 	ret = s->sequence;
-	smp_rmb();
 	if (unlikely(ret & 1)) {
 		cpu_relax();
 		goto repeat;
@@ -140,14 +151,56 @@ static inline unsigned read_seqcount_beg
 	return ret;
 }
 
-/*
- * Test if reader processed invalid data because sequence number has changed.
+/**
+ * read_seqcount_begin - begin a seq-read critical section
+ * @s: pointer to seqcount_t
+ * Returns: count to be passed to read_seqcount_retry
+ *
+ * read_seqcount_begin opens a read critical section of the given seqcount.
+ * The validity of the critical section is tested at the end by calling
+ * read_seqcount_retry.
+ */
+static inline unsigned read_seqcount_begin(const seqcount_t *s)
+{
+	unsigned ret = __read_seqcount_begin(s);
+	smp_rmb();
+	return ret;
+}
+
+/**
+ * __read_seqcount_retry - end a seq-read critical section (without barrier)
+ * @s: pointer to seqcount_t
+ * @start: count, from read_seqcount_begin
+ * Returns: 1 if retry is required, else 0
+ *
+ * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
+ * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
+ * provided after loading the variables protected in this critical section
+ * and before this final sequence check.
+ *
+ * Use carefully, only in critical code, and comment how the barrier is
+ * provided.
+ */
+static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
+{
+	return unlikely(s->sequence != start);
+}
+
+/**
+ * read_seqcount_retry - end a seq-read critical section
+ * @s: pointer to seqcount_t
+ * @start: count, from read_seqcount_begin
+ * Returns: 1 if retry is required, else 0
+ *
+ * read_seqcount_retry closes a read critical section of the given seqcount.
+ * If the critical section was invalid, it must be ignored (and typically
+ * retried).
  */
 static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
 	smp_rmb();
 
-	return s->sequence != start;
+	return __read_seqcount_retry(s, start);
 }
 
 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [patches] seqlock: add barrier-less special cases for seqcounts
  2010-11-11  8:00 [patches] seqlock: add barrier-less special cases for seqcounts Nick Piggin
@ 2010-11-12 16:39 ` Linus Torvalds
  2010-11-12 23:06   ` Nick Piggin
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2010-11-12 16:39 UTC
  To: Nick Piggin; +Cc: linux-kernel, Al Viro, Ingo Molnar, Thomas Gleixner

On Thu, Nov 11, 2010 at 12:00 AM, Nick Piggin <npiggin@kernel.dk> wrote:
> Add branch annotations for seqlock read fastpath, and introduce
> __read_seqcount_begin and __read_seqcount_retry functions, that can avoid
> the smp_rmb() if it is provided with some other barrier. Read barriers
> have non trivial cost on many architectures.
>
> These will be used by store-free path walking algorithm, where
> performance is critical and seqlocks are widely used.

A couple of questions:

 - what are the barriers in question? IOW, describe some normal use.

 - do we really want the "repeat until seqlock is even" code in the
__read_seqcount_begin() code for those kinds of internal cases?

That second one is very much a question for the use-case like the
pathname walk where you have a fall-back that uses "real" locking
rather than the optimistic sequence locks. I have a suspicion that if
seq_locks are used as an "optimistic lockless path with a locking
fallback", then if we see an odd value at the beginning we should
consider it a hint that the sequence lock is contended and the
optimistic path should be aborted early.

In other words, I kind of suspect that anybody that wants to use some
internal sequence lock function like __read_seqcount_begin() would
also want to do its own decision about what happens when the seqlock
is already in the middle of having an active writer.

So the interface seems a bit broken: if we really want to expose these
kinds of internal helper functions, then I suspect not only the
smp_rmb(), but also the whole "loop until even" should be in the
normal "read_seqcount_begin()" function, and __read_seqcount_begin()
would _literally_ just do the single sequence counter access.

I dunno. Just a gut feel. Added Al, Ingo and Thomas to the Cc - the
whole "loop in begin" was added by Ingo and Thomas a few years ago to
avoid a live-lock, but that live-lock issue really isn't an issue if
you end up falling back on a locking algorithm and have an "early
failure" case for the __read_seqcount_begin() the same way we have the
final failure case for [__]read_seqcount_retry().

             Linus


* Re: [patches] seqlock: add barrier-less special cases for seqcounts
  2010-11-12 16:39 ` Linus Torvalds
@ 2010-11-12 23:06   ` Nick Piggin
  2010-11-12 23:52     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2010-11-12 23:06 UTC
  To: Linus Torvalds
  Cc: Nick Piggin, linux-kernel, Al Viro, Ingo Molnar, Thomas Gleixner

On Fri, Nov 12, 2010 at 08:39:17AM -0800, Linus Torvalds wrote:
> On Thu, Nov 11, 2010 at 12:00 AM, Nick Piggin <npiggin@kernel.dk> wrote:
> > Add branch annotations for seqlock read fastpath, and introduce
> > __read_seqcount_begin and __read_seqcount_retry functions, that can avoid
> > the smp_rmb() if it is provided with some other barrier. Read barriers
> > have non trivial cost on many architectures.
> >
> > These will be used by store-free path walking algorithm, where
> > performance is critical and seqlocks are widely used.
> 
> A couple of questions:
> 
>  - what are the barriers in question? IOW, describe some normal use.

OK, anything that provides smp_rmb() or stronger. I really prefer not
to have a "normal" usage for these things, only *really* carefully
controlled and critical parts. I came up with these after doing some
testing on a POWER7 to really shave off cycles.

In the case of rcu-walk, we basically have a chain of dentries to walk
down, and so we need to take seqlocks as we go. So the pattern goes:

dentry = cwd;
seq = read_seqcount_begin(&dentry->d_seq);
/* do path walk */
child = d_lookup(dentry, name);
seq2 = read_seqcount_begin(&child->d_seq);
if (read_seqcount_retry(&dentry->d_seq, seq))
  /* bail out */

So we have to have this inter-linked chain of seqlocks covering the
walk. As such, the smp_rmb tends to get repeated in each one, whereas
we don't actually have to issue the smp_rmb for the child until after
we verify the parent's sequence (because we don't load anything from
the child until after that).

I really don't anticipate many other users, but perhaps similar case
like walking down nodes of a tree or something.


>  - do we really want the "repeat until seqlock is even" code in the
> __read_seqcount_begin() code for those kinds of internal cases?
> 
> That second one is very much a question for the use-case like the
> pathname walk where you have a fall-back that uses "real" locking
> rather than the optimistic sequence locks. I have a suspicion that if
> seq_locks are used as an "optimistic lockless path with a locking
> fallback", then if we see an odd value at the beginning we should
> consider it a hint that the sequence lock is contended and the
> optimistic path should be aborted early.
> 
> In other words, I kind of suspect that anybody that wants to use some
> internal sequence lock function like __read_seqcount_begin() would
> also want to do its own decision about what happens when the seqlock
> is already in the middle of having an active writer.
> 
> So the interface seems a bit broken: if we really want to expose these
> kinds of internal helper functions, then I suspect not only the
> smp_rmb(), but also the whole "loop until even" should be in the
> normal "read_seqcount_begin()" function, and __read_seqcount_begin()
> would _literally_ just do the single sequence counter access.
> 
> I dunno. Just a gut feel. Added Al, Ingo and Thomas to the Cc - the
> whole "loop in begin" was added by Ingo and Thomas a few years ago to
> avoid a live-lock, but that live-lock issue really isn't an issue if
> you end up falling back on a locking algorithm and have a "early
> failure" case for the __read_seqcount_begin() the same way we have the
> final failure case for [__]read_seqcount_retry().

Possibly, you're right. Now the fallback case is obviously suboptimal
and heavyweight, so we do want to avoid it if we can. Also not having
an error to handle in seqcount_begin is just one less thing to worry
about.

I mean, we can just fall out immediately if we want to, but is there
much advantage in doing so? The write side critical sections on these
things are very small -- pretty much only when the ->d_inode goes
away or ->d_name changes.



* Re: [patches] seqlock: add barrier-less special cases for seqcounts
  2010-11-12 23:06   ` Nick Piggin
@ 2010-11-12 23:52     ` Linus Torvalds
  2010-11-15  3:49       ` Nick Piggin
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2010-11-12 23:52 UTC
  To: Nick Piggin; +Cc: linux-kernel, Al Viro, Ingo Molnar, Thomas Gleixner

On Fri, Nov 12, 2010 at 3:06 PM, Nick Piggin <npiggin@kernel.dk> wrote:
> [...]
> seq2 = read_seqcount_begin(&child->d_seq);
> if (read_seqcount_retry(&dentry->d_seq, seq))
>  /* bail out */

So the only issue is that this particular back-to-back sequence with
these kinds of "take one seqlock and release the previous one" where
you currently end up having basically one smp_rmb() at the end of
"read_seqlock_begin()", only to be followed immediately by another one
starting out the "read_seqcount_retry()"?

If so, I think we should make _that_ operation ("move from one seqlock
to another") be the special one, because it smells like in general,
using the special non-locking versions is going to be a very subtle
interface.

                      Linus


* Re: [patches] seqlock: add barrier-less special cases for seqcounts
  2010-11-12 23:52     ` Linus Torvalds
@ 2010-11-15  3:49       ` Nick Piggin
  0 siblings, 0 replies; 5+ messages in thread
From: Nick Piggin @ 2010-11-15  3:49 UTC
  To: Linus Torvalds
  Cc: Nick Piggin, linux-kernel, Al Viro, Ingo Molnar, Thomas Gleixner

On Fri, Nov 12, 2010 at 03:52:55PM -0800, Linus Torvalds wrote:
> On Fri, Nov 12, 2010 at 3:06 PM, Nick Piggin <npiggin@kernel.dk> wrote:
> [...]
> > seq2 = read_seqcount_begin(&child->d_seq);
> > if (read_seqcount_retry(&dentry->d_seq, seq))
> >  /* bail out */
> 
> So the only issue is that this particular back-to-back sequence with
> these kinds of "take one seqlock and release the previous one" where
> you currently end up having basically one smp_rmb() at the end of
> "read_seqlock_begin()", only to be followed immediately by another one
> starting out the "read_seqcount_retry()"?

I think basically yes, I'll have to take another look at the code.


> If so, I think we should make _that_ operation ("move from one seqlock
> to another") be the special one, because it smells like in general,
> using the special non-locking versions is going to be a very subtle
> interface.

OK, that sounds like a good idea. I'll see if that's applicable.


