All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Anton Arapov <anton@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND v2 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily
Date: Thu, 8 Nov 2012 19:23:10 -0800	[thread overview]
Message-ID: <20121109032310.GA2438@linux.vnet.ibm.com> (raw)
In-Reply-To: <20121109004136.GH2519@linux.vnet.ibm.com>

On Thu, Nov 08, 2012 at 04:41:36PM -0800, Paul E. McKenney wrote:
> On Thu, Nov 08, 2012 at 06:41:10PM -0500, Mikulas Patocka wrote:
> > 
> > 
> > On Thu, 8 Nov 2012, Paul E. McKenney wrote:
> > 
> > > On Thu, Nov 08, 2012 at 12:07:00PM -0800, Andrew Morton wrote:
> > > > On Thu, 8 Nov 2012 14:48:49 +0100
> > > > Oleg Nesterov <oleg@redhat.com> wrote:
> > > > 
> > > > > Currently the writer does msleep() plus synchronize_sched() 3 times
> > > > > to acquire/release the semaphore, and during this time the readers
> > > > > are blocked completely. Even if the "write" section was not actually
> > > > > started or if it was already finished.
> > > > > 
> > > > > With this patch down_write/up_write does synchronize_sched() twice
> > > > > and down_read/up_read are still possible during this time, just they
> > > > > use the slow path.
> > > > > 
> > > > > percpu_down_write() first forces the readers to use rw_semaphore and
> > > > > increment the "slow" counter to take the lock for reading, then it
> > > > > takes that rw_semaphore for writing and blocks the readers.
> > > > > 
> > > > > Also. With this patch the code relies on the documented behaviour of
> > > > > synchronize_sched(), it doesn't try to pair synchronize_sched() with
> > > > > barrier.
> > > > > 
> > > > > ...
> > > > >
> > > > >  include/linux/percpu-rwsem.h |   83 +++++------------------------
> > > > >  lib/Makefile                 |    2 +-
> > > > >  lib/percpu-rwsem.c           |  123 ++++++++++++++++++++++++++++++++++++++++++
> > > > 
> > > > The patch also uninlines everything.
> > > > 
> > > > And it didn't export the resulting symbols to modules, so it isn't an
> > > > equivalent.  We can export thing later if needed I guess.
> > > > 
> > > > It adds percpu-rwsem.o to lib-y, so the CONFIG_BLOCK=n kernel will
> > > > avoid including the code altogether, methinks?
> > > > 
> > > > >
> > > > > ...
> > > > >
> > > > > --- /dev/null
> > > > > +++ b/lib/percpu-rwsem.c
> > > > > @@ -0,0 +1,123 @@
> > > > 
> > > > That was nice and terse ;)
> > > > 
> > > > > +#include <linux/percpu-rwsem.h>
> > > > > +#include <linux/rcupdate.h>
> > > > > +#include <linux/sched.h>
> > > > 
> > > > This list is nowhere near sufficient to support this file's
> > > > requirements.  atomic.h, percpu.h, rwsem.h, wait.h, errno.h and plenty
> > > > more.  IOW, if it compiles, it was sheer luck.
> > > > 
> > > > > +int percpu_init_rwsem(struct percpu_rw_semaphore *brw)
> > > > > +{
> > > > > +	brw->fast_read_ctr = alloc_percpu(int);
> > > > > +	if (unlikely(!brw->fast_read_ctr))
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	mutex_init(&brw->writer_mutex);
> > > > > +	init_rwsem(&brw->rw_sem);
> > > > > +	atomic_set(&brw->slow_read_ctr, 0);
> > > > > +	init_waitqueue_head(&brw->write_waitq);
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +void percpu_free_rwsem(struct percpu_rw_semaphore *brw)
> > > > > +{
> > > > > +	free_percpu(brw->fast_read_ctr);
> > > > > +	brw->fast_read_ctr = NULL; /* catch use after free bugs */
> > > > > +}
> > > > > +
> > > > > +static bool update_fast_ctr(struct percpu_rw_semaphore *brw, unsigned int val)
> > > > > +{
> > > > > +	bool success = false;
> > > > > +
> > > > > +	preempt_disable();
> > > > > +	if (likely(!mutex_is_locked(&brw->writer_mutex))) {
> > > > > +		__this_cpu_add(*brw->fast_read_ctr, val);
> > > > > +		success = true;
> > > > > +	}
> > > > > +	preempt_enable();
> > > > > +
> > > > > +	return success;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Like the normal down_read() this is not recursive, the writer can
> > > > > + * come after the first percpu_down_read() and create the deadlock.
> > > > > + */
> > > > > +void percpu_down_read(struct percpu_rw_semaphore *brw)
> > > > > +{
> > > > > +	if (likely(update_fast_ctr(brw, +1)))
> > > > > +		return;
> > > > > +
> > > > > +	down_read(&brw->rw_sem);
> > > > > +	atomic_inc(&brw->slow_read_ctr);
> > > > > +	up_read(&brw->rw_sem);
> > > > > +}
> > > > > +
> > > > > +void percpu_up_read(struct percpu_rw_semaphore *brw)
> > > > > +{
> > > > > +	if (likely(update_fast_ctr(brw, -1)))
> > > > > +		return;
> > > > > +
> > > > > +	/* false-positive is possible but harmless */
> > > > > +	if (atomic_dec_and_test(&brw->slow_read_ctr))
> > > > > +		wake_up_all(&brw->write_waitq);
> > > > > +}
> > > > > +
> > > > > +static int clear_fast_ctr(struct percpu_rw_semaphore *brw)
> > > > > +{
> > > > > +	unsigned int sum = 0;
> > > > > +	int cpu;
> > > > > +
> > > > > +	for_each_possible_cpu(cpu) {
> > > > > +		sum += per_cpu(*brw->fast_read_ctr, cpu);
> > > > > +		per_cpu(*brw->fast_read_ctr, cpu) = 0;
> > > > > +	}
> > > > > +
> > > > > +	return sum;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * A writer takes ->writer_mutex to exclude other writers and to force the
> > > > > + * readers to switch to the slow mode, note the mutex_is_locked() check in
> > > > > + * update_fast_ctr().
> > > > > + *
> > > > > + * After that the readers can only inc/dec the slow ->slow_read_ctr counter,
> > > > > + * ->fast_read_ctr is stable. Once the writer moves its sum into the slow
> > > > > + * counter it represents the number of active readers.
> > > > > + *
> > > > > + * Finally the writer takes ->rw_sem for writing and blocks the new readers,
> > > > > + * then waits until the slow counter becomes zero.
> > > > > + */
> > > > 
> > > > Some overview of how fast/slow_read_ctr are supposed to work would be
> > > > useful.  This comment seems to assume that the reader already knew
> > > > that.
> > > > 
> > > > > +void percpu_down_write(struct percpu_rw_semaphore *brw)
> > > > > +{
> > > > > +	/* also blocks update_fast_ctr() which checks mutex_is_locked() */
> > > > > +	mutex_lock(&brw->writer_mutex);
> > > > > +
> > > > > +	/*
> > > > > +	 * 1. Ensures mutex_is_locked() is visible to any down_read/up_read
> > > > > +	 *    so that update_fast_ctr() can't succeed.
> > > > > +	 *
> > > > > +	 * 2. Ensures we see the result of every previous this_cpu_add() in
> > > > > +	 *    update_fast_ctr().
> > > > > +	 *
> > > > > +	 * 3. Ensures that if any reader has exited its critical section via
> > > > > +	 *    fast-path, it executes a full memory barrier before we return.
> > > > > +	 */
> > > > > +	synchronize_sched();
> > > > 
> > > > Here's where I get horridly confused.  Your patch completely deRCUifies
> > > > this code, yes?  Yet here we're using an RCU primitive.  And we seem to
> > > > be using it not as an RCU primitive but as a handy thing which happens
> > > > to have desirable side-effects.  But the implementation of
> > > > synchronize_sched() differs considerably according to which rcu
> > > > flavor-of-the-minute you're using.
> > > 
> > > The trick is that the preempt_disable() call in update_fast_ctr()
> > > acts as an RCU read-side critical section WRT synchronize_sched().
> > > 
> > > The algorithm would work given rcu_read_lock()/rcu_read_unlock() and
> > > synchronize_rcu() in place of preempt_disable()/preempt_enable() and
> > > synchronize_sched().  The real-time guys would prefer the change
> > > to rcu_read_lock()/rcu_read_unlock() and synchronize_rcu(), now that
> > > you mention it.
> > > 
> > > Oleg, Mikulas, any reason not to move to rcu_read_lock()/rcu_read_unlock()
> > > and synchronize_rcu()?
> > 
> > preempt_disable/preempt_enable is faster than 
> > rcu_read_lock/rcu_read_unlock for preemptive kernels.
> 
> Significantly faster in this case?  Can you measure the difference
> from a user-mode test?
> 
> Hmmm.  I have been avoiding moving the preemptible-RCU state from
> task_struct to thread_info, but if the difference really matters,
> perhaps that needs to be done.

Actually, the fact that __this_cpu_add() will malfunction on some
architectures is preemption is not disabled seems a more compelling
reason to keep preempt_enable() than any performance improvement.  ;-)

							Thanx, Paul

> > Regarding real-time response - the region blocked with 
> > preempt_disable/preempt_enable contains a few instructions (one test for 
> > mutex_is_locked and one increment of percpu variable), so it isn't any 
> > threat to real time response. There are plenty of longer regions in the 
> > kernel that are executed with interrupts or preemption disabled.
> 
> Careful.  The real-time guys might take the same every-little-bit approach
> to latency that you seem to be taking for CPU cycles.  ;-)
> 
> 							Thanx, Paul
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


  reply	other threads:[~2012-11-09  3:23 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-15 19:09 [RFC PATCH 0/2] uprobes: register/unregister can race with fork Oleg Nesterov
2012-10-15 19:10 ` [PATCH 1/2] brw_mutex: big read-write mutex Oleg Nesterov
2012-10-15 23:28   ` Paul E. McKenney
2012-10-16 15:56     ` Oleg Nesterov
2012-10-16 18:58       ` Paul E. McKenney
2012-10-17 16:37         ` Oleg Nesterov
2012-10-17 22:28           ` Paul E. McKenney
2012-10-16 19:56   ` Linus Torvalds
2012-10-17 16:59     ` Oleg Nesterov
2012-10-17 22:44       ` Paul E. McKenney
2012-10-18 16:24         ` Oleg Nesterov
2012-10-18 16:38           ` Paul E. McKenney
2012-10-18 17:57             ` Oleg Nesterov
2012-10-18 19:28               ` Mikulas Patocka
2012-10-19 12:38                 ` Peter Zijlstra
2012-10-19 15:32                   ` Mikulas Patocka
2012-10-19 17:40                     ` Peter Zijlstra
2012-10-19 17:57                       ` Oleg Nesterov
2012-10-19 22:54                       ` Mikulas Patocka
2012-10-24  3:08                         ` Dave Chinner
2012-10-25 14:09                           ` Mikulas Patocka
2012-10-25 23:40                             ` Dave Chinner
2012-10-26 12:06                               ` Oleg Nesterov
2012-10-26 13:22                                 ` Mikulas Patocka
2012-10-26 14:12                                   ` Oleg Nesterov
2012-10-26 15:23                                     ` mark_files_ro && sb_end_write Oleg Nesterov
2012-10-26 16:09                                     ` [PATCH 1/2] brw_mutex: big read-write mutex Mikulas Patocka
2012-10-19 17:49                     ` Oleg Nesterov
2012-10-22 23:09                       ` Mikulas Patocka
2012-10-23 15:12                         ` Oleg Nesterov
2012-10-19 19:28               ` Paul E. McKenney
2012-10-22 23:36                 ` [PATCH 0/2] fix and improvements for percpu-rw-semaphores (was: brw_mutex: big read-write mutex) Mikulas Patocka
2012-10-22 23:37                   ` [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers Mikulas Patocka
2012-10-22 23:39                     ` [PATCH 2/2] percpu-rw-semaphores: use rcu_read_lock_sched Mikulas Patocka
2012-10-24 16:16                       ` Paul E. McKenney
2012-10-24 17:18                         ` Oleg Nesterov
2012-10-24 18:20                           ` Paul E. McKenney
2012-10-24 18:43                             ` Oleg Nesterov
2012-10-24 19:43                               ` Paul E. McKenney
2012-10-25 14:54                         ` Mikulas Patocka
2012-10-25 15:07                           ` Paul E. McKenney
2012-10-25 16:15                             ` Mikulas Patocka
2012-10-23 16:59                     ` [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers Oleg Nesterov
2012-10-23 18:05                       ` Paul E. McKenney
2012-10-23 18:27                         ` Oleg Nesterov
2012-10-23 18:41                         ` Oleg Nesterov
2012-10-23 20:29                           ` Paul E. McKenney
2012-10-23 20:32                             ` Paul E. McKenney
2012-10-23 21:39                               ` Mikulas Patocka
2012-10-24 16:23                                 ` Paul E. McKenney
2012-10-24 20:22                                   ` Mikulas Patocka
2012-10-24 20:36                                     ` Paul E. McKenney
2012-10-24 20:44                                       ` Mikulas Patocka
2012-10-24 23:57                                         ` Paul E. McKenney
2012-10-25 12:39                                           ` Paul E. McKenney
2012-10-25 13:48                                           ` Mikulas Patocka
2012-10-23 19:23                       ` Oleg Nesterov
2012-10-23 20:45                         ` Peter Zijlstra
2012-10-23 20:57                         ` Peter Zijlstra
2012-10-24 15:11                           ` Oleg Nesterov
2012-10-23 21:26                         ` Mikulas Patocka
2012-10-23 20:32                     ` Peter Zijlstra
2012-10-30 18:48                   ` [PATCH 0/2] fix and improvements for percpu-rw-semaphores (was: brw_mutex: big read-write mutex) Oleg Nesterov
2012-10-31 19:41                     ` [PATCH 0/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily Oleg Nesterov
2012-10-31 19:41                       ` [PATCH 1/1] " Oleg Nesterov
2012-11-01 15:10                         ` Linus Torvalds
2012-11-01 15:34                           ` Oleg Nesterov
2012-11-02 18:06                           ` [PATCH v2 0/1] " Oleg Nesterov
2012-11-02 18:06                             ` [PATCH v2 1/1] " Oleg Nesterov
2012-11-07 17:04                               ` [PATCH v3 " Mikulas Patocka
2012-11-07 17:47                                 ` Oleg Nesterov
2012-11-07 19:17                                   ` Mikulas Patocka
2012-11-08 13:42                                     ` Oleg Nesterov
2012-11-08  1:23                                 ` Paul E. McKenney
2012-11-08  1:16                               ` [PATCH v2 " Paul E. McKenney
2012-11-08 13:33                                 ` Oleg Nesterov
2012-11-08 16:27                                   ` Paul E. McKenney
2012-11-08 13:48                             ` [PATCH RESEND v2 0/1] " Oleg Nesterov
2012-11-08 13:48                               ` [PATCH RESEND v2 1/1] " Oleg Nesterov
2012-11-08 20:07                                 ` Andrew Morton
2012-11-08 21:08                                   ` Paul E. McKenney
2012-11-08 23:41                                     ` Mikulas Patocka
2012-11-09  0:41                                       ` Paul E. McKenney
2012-11-09  3:23                                         ` Paul E. McKenney [this message]
2012-11-09 16:35                                           ` Oleg Nesterov
2012-11-09 16:59                                             ` Paul E. McKenney
2012-11-09 12:47                                   ` Mikulas Patocka
2012-11-09 15:46                                   ` Oleg Nesterov
2012-11-09 17:01                                     ` Paul E. McKenney
2012-11-09 18:10                                       ` Oleg Nesterov
2012-11-09 18:19                                         ` Oleg Nesterov
2012-11-10  0:55                                         ` Paul E. McKenney
2012-11-11 15:45                                           ` Oleg Nesterov
2012-11-12 18:38                                             ` Paul E. McKenney
2012-11-11 18:27                                   ` [PATCH -mm] percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessari ly.fix Oleg Nesterov
2012-11-12 18:31                                     ` Paul E. McKenney
2012-11-16 23:22                                     ` Andrew Morton
2012-11-18 19:32                                       ` Oleg Nesterov
2012-11-01 15:43                         ` [PATCH 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily Paul E. McKenney
2012-11-01 18:33                           ` Oleg Nesterov
2012-11-02 16:18                             ` Oleg Nesterov
2012-10-15 19:10 ` [PATCH 2/2] uprobes: Use brw_mutex to fix register/unregister vs dup_mmap() race Oleg Nesterov
2012-10-18  7:03   ` Srikar Dronamraju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121109032310.GA2438@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=anton@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mpatocka@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.