linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@tv-sign.ru>, Ingo Molnar <mingo@elte.hu>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/7] barrier: a scalable synchonisation barrier
Date: Wed, 31 Jan 2007 16:48:50 -0800	[thread overview]
Message-ID: <20070201004849.GS2574@linux.vnet.ibm.com> (raw)
In-Reply-To: <1170288190.10924.108.camel@lappy>

On Thu, Feb 01, 2007 at 01:03:09AM +0100, Peter Zijlstra wrote:
> On Wed, 2007-01-31 at 15:32 -0800, Paul E. McKenney wrote:
> 
> > The wakeup in barrier_sync() would mean that the counter was zero
> > at some point in the past.  The counter would then be rechecked, and
> > if it were still zero, barrier_sync() would invoke finish_wait() and
> > then return -- but the counter might well become non-zero in the
> > meantime, right?
> > 
> > So given that barrier_sync() is permitted to return after the counter
> > becomes non-zero, why can't it just rely on the fact that barrier_unlock()
> > saw it as zero not long in the past?
> > 
> > > >  It looks like barrier_sync() is more a
> > > > rw semaphore biased to readers.
> > > 
> > > Indeed, the locked sections are designed to be the rare case.
> > 
> > OK -- but barrier_sync() just waits for readers, it doesn't exclude them.
> > 
> > If all barrier_sync() needs to do is to wait until all pre-existing
> > barrier_lock()/barrier_unlock() pairs to complete, it seems to me to
> > be compatible with qrcu's semantics.
> > 
> > So what am I missing here?
> 
> I might be the one missing stuff, I'll have a hard look at qrcu.
> 
> The intent was that barrier_sync() would not write to memory when there
> are no active locked sections, so that the cacheline can stay shared,
> thus keeping is fast.
> 
> If qrcu does exactly this, then yes we have a match.

QRCU as currently written (http://lkml.org/lkml/2006/11/29/330) doesn't
do what you want, as it acquires the lock unconditionally.  I am proposing
that synchronize_qrcu() change to something like the following:

	void synchronize_qrcu(struct qrcu_struct *qp)
	{
		int idx;
	
		smp_mb();
	
		if (atomic_read(qp->ctr[0]) + atomic_read(qp->ctr[1]) <= 1) {
			smp_rmb();
			if (atomic_read(qp->ctr[0]) +
			    atomic_read(qp->ctr[1]) <= 1)
				goto out;
		}
	
		mutex_lock(&qp->mutex);
		idx = qp->completed & 0x1;
		atomic_inc(qp->ctr + (idx ^ 0x1));
		/* Reduce the likelihood that qrcu_read_lock() will loop */
		smp_mb__after_atomic_inc();
		qp->completed++;
	
		atomic_dec(qp->ctr + idx);
		__wait_event(qp->wq, !atomic_read(qp->ctr + idx));
		mutex_unlock(&qp->mutex);
	out:
		smp_mb();
	}

For the first "if" to give a false positive, a concurrent switch had
to have happened.  For example, qp->ctr[0] was zero and qp->ctr[1]
was two at the time of the first atomic_read(), but then qp->completed
switched so that both qp->ctr[0] and qp->ctr[1] were one at the time
of the second atomic_read.  The only way the second "if" can give us a
false positive is if there was another change to qp->completed in the
meantime -- but that means that all of the pre-existing qrcu_read_lock()
holders must have gotten done, otherwise the second switch could not
have happened.  Yes, you do incur three memory barriers on the fast
path, but the best you could hope for with your approach was two of them
(unless I am confused about how you were using barrier_sync()).

Oleg, does this look safe?

Ugly at best, I know, but I do very much sympathize with Christoph's
desire to keep the number of synchronization primitives down to a
dull roar.  ;-)

							Thanx, Paul

  reply	other threads:[~2007-02-01  0:49 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-28 11:51 [PATCH 0/7] breaking the global file_list_lock Peter Zijlstra
2007-01-28 11:51 ` [PATCH 1/7] lockdep: lock_set_subclass - reset a held locks subclass Peter Zijlstra
2007-01-28 11:51 ` [PATCH 3/7] barrier: a scalable synchonisation barrier Peter Zijlstra
2007-01-28 14:39   ` Christoph Hellwig
2007-01-28 15:24     ` Ingo Molnar
2007-01-28 15:34       ` Christoph Hellwig
2007-01-31 19:12       ` Paul E. McKenney
2007-01-31 21:13         ` Oleg Nesterov
2007-01-31 21:30           ` Oleg Nesterov
2007-01-31 21:48           ` Peter Zijlstra
2007-01-31 23:32             ` Paul E. McKenney
2007-02-01  0:03               ` Peter Zijlstra
2007-02-01  0:48                 ` Paul E. McKenney [this message]
2007-02-01 16:00                   ` Oleg Nesterov
2007-02-01 21:38                     ` Paul E. McKenney
2007-02-02 11:56                       ` Oleg Nesterov
2007-02-02 12:01                         ` Peter Zijlstra
2007-02-02 17:13                         ` Paul E. McKenney
2007-02-03 16:38                   ` Oleg Nesterov
2007-02-04  0:23                     ` Paul E. McKenney
2007-02-04  3:24                       ` Alan Stern
2007-02-04  5:46                         ` Paul E. McKenney
2007-01-28 11:51 ` [PATCH 4/7] fs: break the file_list_lock for sb->s_files Peter Zijlstra
2007-01-28 14:43   ` Christoph Hellwig
2007-01-28 15:21     ` Ingo Molnar
2007-01-28 15:30       ` Christoph Hellwig
2007-01-28 15:32         ` Peter Zijlstra
2007-01-28 15:36           ` Christoph Hellwig
2007-01-28 15:44         ` Ingo Molnar
2007-01-28 16:25         ` Bill Huey
2007-01-28 11:51 ` [PATCH 5/7] fs: restore previous sb->s_files iteration semantics Peter Zijlstra
2007-01-28 11:51 ` [PATCH 6/7] schedule_on_each_cpu_wq() Peter Zijlstra
2007-01-28 11:51 ` [PATCH 7/7] fs: fixup filevec_add_drain_all Peter Zijlstra
2007-01-28 12:16 ` [PATCH 8/7] fs: free_write_pipe() fix Peter Zijlstra
2007-01-28 14:43 ` [PATCH 0/7] breaking the global file_list_lock Christoph Hellwig
2007-01-28 15:11   ` Christoph Hellwig
2007-01-28 15:29     ` Peter Zijlstra
2007-01-28 15:33       ` Christoph Hellwig
2007-01-29 13:32     ` Stephen Smalley
2007-01-29 18:02       ` Christoph Hellwig
2007-01-28 15:24   ` Ingo Molnar
2007-01-28 16:52     ` Martin J. Bligh
2007-01-28 17:04       ` lockmeter Christoph Hellwig
2007-01-28 17:38         ` lockmeter Martin J. Bligh
2007-01-28 18:01           ` lockmeter Bill Huey
2007-01-28 19:26             ` lockmeter Ingo Molnar
2007-01-28 21:17             ` lockmeter Ingo Molnar
2007-01-29  5:27               ` lockmeter Bill Huey
2007-01-29 10:26                 ` lockmeter Bill Huey
2007-01-29  1:08         ` lockmeter Arjan van de Ven
2007-01-29  1:12           ` lockmeter Martin J. Bligh
2007-01-28 18:41   ` [PATCH 0/7] breaking the global file_list_lock Ingo Molnar
2007-01-28 20:38     ` Christoph Hellwig
2007-01-28 21:05       ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070201004849.GS2574@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@tv-sign.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).