All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Anton Arapov <anton@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND v2 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily
Date: Thu, 8 Nov 2012 13:08:43 -0800	[thread overview]
Message-ID: <20121108210843.GF2519@linux.vnet.ibm.com> (raw)
In-Reply-To: <20121108120700.42d438f2.akpm@linux-foundation.org>

On Thu, Nov 08, 2012 at 12:07:00PM -0800, Andrew Morton wrote:
> On Thu, 8 Nov 2012 14:48:49 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > Currently the writer does msleep() plus synchronize_sched() 3 times
> > to acquire/release the semaphore, and during this time the readers
> > are blocked completely. Even if the "write" section was not actually
> > started or if it was already finished.
> > 
> > With this patch down_write/up_write does synchronize_sched() twice
> > and down_read/up_read are still possible during this time, just they
> > use the slow path.
> > 
> > percpu_down_write() first forces the readers to use rw_semaphore and
> > increment the "slow" counter to take the lock for reading, then it
> > takes that rw_semaphore for writing and blocks the readers.
> > 
> > Also. With this patch the code relies on the documented behaviour of
> > synchronize_sched(), it doesn't try to pair synchronize_sched() with
> > barrier.
> > 
> > ...
> >
> >  include/linux/percpu-rwsem.h |   83 +++++------------------------
> >  lib/Makefile                 |    2 +-
> >  lib/percpu-rwsem.c           |  123 ++++++++++++++++++++++++++++++++++++++++++
> 
> The patch also uninlines everything.
> 
> And it didn't export the resulting symbols to modules, so it isn't an
> equivalent.  We can export thing later if needed I guess.
> 
> It adds percpu-rwsem.o to lib-y, so the CONFIG_BLOCK=n kernel will
> avoid including the code altogether, methinks?
> 
> >
> > ...
> >
> > --- /dev/null
> > +++ b/lib/percpu-rwsem.c
> > @@ -0,0 +1,123 @@
> 
> That was nice and terse ;)
> 
> > +#include <linux/percpu-rwsem.h>
> > +#include <linux/rcupdate.h>
> > +#include <linux/sched.h>
> 
> This list is nowhere near sufficient to support this file's
> requirements.  atomic.h, percpu.h, rwsem.h, wait.h, errno.h and plenty
> more.  IOW, if it compiles, it was sheer luck.
> 
> > +int percpu_init_rwsem(struct percpu_rw_semaphore *brw)
> > +{
> > +	brw->fast_read_ctr = alloc_percpu(int);
> > +	if (unlikely(!brw->fast_read_ctr))
> > +		return -ENOMEM;
> > +
> > +	mutex_init(&brw->writer_mutex);
> > +	init_rwsem(&brw->rw_sem);
> > +	atomic_set(&brw->slow_read_ctr, 0);
> > +	init_waitqueue_head(&brw->write_waitq);
> > +	return 0;
> > +}
> > +
> > +void percpu_free_rwsem(struct percpu_rw_semaphore *brw)
> > +{
> > +	free_percpu(brw->fast_read_ctr);
> > +	brw->fast_read_ctr = NULL; /* catch use after free bugs */
> > +}
> > +
> > +static bool update_fast_ctr(struct percpu_rw_semaphore *brw, unsigned int val)
> > +{
> > +	bool success = false;
> > +
> > +	preempt_disable();
> > +	if (likely(!mutex_is_locked(&brw->writer_mutex))) {
> > +		__this_cpu_add(*brw->fast_read_ctr, val);
> > +		success = true;
> > +	}
> > +	preempt_enable();
> > +
> > +	return success;
> > +}
> > +
> > +/*
> > + * Like the normal down_read() this is not recursive, the writer can
> > + * come after the first percpu_down_read() and create the deadlock.
> > + */
> > +void percpu_down_read(struct percpu_rw_semaphore *brw)
> > +{
> > +	if (likely(update_fast_ctr(brw, +1)))
> > +		return;
> > +
> > +	down_read(&brw->rw_sem);
> > +	atomic_inc(&brw->slow_read_ctr);
> > +	up_read(&brw->rw_sem);
> > +}
> > +
> > +void percpu_up_read(struct percpu_rw_semaphore *brw)
> > +{
> > +	if (likely(update_fast_ctr(brw, -1)))
> > +		return;
> > +
> > +	/* false-positive is possible but harmless */
> > +	if (atomic_dec_and_test(&brw->slow_read_ctr))
> > +		wake_up_all(&brw->write_waitq);
> > +}
> > +
> > +static int clear_fast_ctr(struct percpu_rw_semaphore *brw)
> > +{
> > +	unsigned int sum = 0;
> > +	int cpu;
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		sum += per_cpu(*brw->fast_read_ctr, cpu);
> > +		per_cpu(*brw->fast_read_ctr, cpu) = 0;
> > +	}
> > +
> > +	return sum;
> > +}
> > +
> > +/*
> > + * A writer takes ->writer_mutex to exclude other writers and to force the
> > + * readers to switch to the slow mode, note the mutex_is_locked() check in
> > + * update_fast_ctr().
> > + *
> > + * After that the readers can only inc/dec the slow ->slow_read_ctr counter,
> > + * ->fast_read_ctr is stable. Once the writer moves its sum into the slow
> > + * counter it represents the number of active readers.
> > + *
> > + * Finally the writer takes ->rw_sem for writing and blocks the new readers,
> > + * then waits until the slow counter becomes zero.
> > + */
> 
> Some overview of how fast/slow_read_ctr are supposed to work would be
> useful.  This comment seems to assume that the reader already knew
> that.
> 
> > +void percpu_down_write(struct percpu_rw_semaphore *brw)
> > +{
> > +	/* also blocks update_fast_ctr() which checks mutex_is_locked() */
> > +	mutex_lock(&brw->writer_mutex);
> > +
> > +	/*
> > +	 * 1. Ensures mutex_is_locked() is visible to any down_read/up_read
> > +	 *    so that update_fast_ctr() can't succeed.
> > +	 *
> > +	 * 2. Ensures we see the result of every previous this_cpu_add() in
> > +	 *    update_fast_ctr().
> > +	 *
> > +	 * 3. Ensures that if any reader has exited its critical section via
> > +	 *    fast-path, it executes a full memory barrier before we return.
> > +	 */
> > +	synchronize_sched();
> 
> Here's where I get horridly confused.  Your patch completely deRCUifies
> this code, yes?  Yet here we're using an RCU primitive.  And we seem to
> be using it not as an RCU primitive but as a handy thing which happens
> to have desirable side-effects.  But the implementation of
> synchronize_sched() differs considerably according to which rcu
> flavor-of-the-minute you're using.

The trick is that the preempt_disable() call in update_fast_ctr()
acts as an RCU read-side critical section WRT synchronize_sched().

The algorithm would work given rcu_read_lock()/rcu_read_unlock() and
synchronize_rcu() in place of preempt_disable()/preempt_enable() and
synchronize_sched().  The real-time guys would prefer the change
to rcu_read_lock()/rcu_read_unlock() and synchronize_rcu(), now that
you mention it.

Oleg, Mikulas, any reason not to move to rcu_read_lock()/rcu_read_unlock()
and synchronize_rcu()?

							Thanx, Paul

> And part 3 talks about the reader's critical section.  The only
> critical sections I can see on the reader side are already covered by
> mutex_lock() and preempt_diable().
> 
> I get this feeling I don't have clue what's going on here and I think
> I'll just retire hurt now.  If this code isn't as brain damaged as it
> initially appears then please, go easy on us simpletons in the next
> version?
> 
> > +	/* nobody can use fast_read_ctr, move its sum into slow_read_ctr */
> > +	atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);
> > +
> > +	/* block the new readers completely */
> > +	down_write(&brw->rw_sem);
> > +
> > +	/* wait for all readers to complete their percpu_up_read() */
> > +	wait_event(brw->write_waitq, !atomic_read(&brw->slow_read_ctr));
> > +}
> > +
> > +void percpu_up_write(struct percpu_rw_semaphore *brw)
> > +{
> > +	/* allow the new readers, but only the slow-path */
> > +	up_write(&brw->rw_sem);
> > +
> > +	/* insert the barrier before the next fast-path in down_read */
> > +	synchronize_sched();
> > +
> > +	mutex_unlock(&brw->writer_mutex);
> > +}
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


  reply	other threads:[~2012-11-08 21:16 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-15 19:09 [RFC PATCH 0/2] uprobes: register/unregister can race with fork Oleg Nesterov
2012-10-15 19:10 ` [PATCH 1/2] brw_mutex: big read-write mutex Oleg Nesterov
2012-10-15 23:28   ` Paul E. McKenney
2012-10-16 15:56     ` Oleg Nesterov
2012-10-16 18:58       ` Paul E. McKenney
2012-10-17 16:37         ` Oleg Nesterov
2012-10-17 22:28           ` Paul E. McKenney
2012-10-16 19:56   ` Linus Torvalds
2012-10-17 16:59     ` Oleg Nesterov
2012-10-17 22:44       ` Paul E. McKenney
2012-10-18 16:24         ` Oleg Nesterov
2012-10-18 16:38           ` Paul E. McKenney
2012-10-18 17:57             ` Oleg Nesterov
2012-10-18 19:28               ` Mikulas Patocka
2012-10-19 12:38                 ` Peter Zijlstra
2012-10-19 15:32                   ` Mikulas Patocka
2012-10-19 17:40                     ` Peter Zijlstra
2012-10-19 17:57                       ` Oleg Nesterov
2012-10-19 22:54                       ` Mikulas Patocka
2012-10-24  3:08                         ` Dave Chinner
2012-10-25 14:09                           ` Mikulas Patocka
2012-10-25 23:40                             ` Dave Chinner
2012-10-26 12:06                               ` Oleg Nesterov
2012-10-26 13:22                                 ` Mikulas Patocka
2012-10-26 14:12                                   ` Oleg Nesterov
2012-10-26 15:23                                     ` mark_files_ro && sb_end_write Oleg Nesterov
2012-10-26 16:09                                     ` [PATCH 1/2] brw_mutex: big read-write mutex Mikulas Patocka
2012-10-19 17:49                     ` Oleg Nesterov
2012-10-22 23:09                       ` Mikulas Patocka
2012-10-23 15:12                         ` Oleg Nesterov
2012-10-19 19:28               ` Paul E. McKenney
2012-10-22 23:36                 ` [PATCH 0/2] fix and improvements for percpu-rw-semaphores (was: brw_mutex: big read-write mutex) Mikulas Patocka
2012-10-22 23:37                   ` [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers Mikulas Patocka
2012-10-22 23:39                     ` [PATCH 2/2] percpu-rw-semaphores: use rcu_read_lock_sched Mikulas Patocka
2012-10-24 16:16                       ` Paul E. McKenney
2012-10-24 17:18                         ` Oleg Nesterov
2012-10-24 18:20                           ` Paul E. McKenney
2012-10-24 18:43                             ` Oleg Nesterov
2012-10-24 19:43                               ` Paul E. McKenney
2012-10-25 14:54                         ` Mikulas Patocka
2012-10-25 15:07                           ` Paul E. McKenney
2012-10-25 16:15                             ` Mikulas Patocka
2012-10-23 16:59                     ` [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers Oleg Nesterov
2012-10-23 18:05                       ` Paul E. McKenney
2012-10-23 18:27                         ` Oleg Nesterov
2012-10-23 18:41                         ` Oleg Nesterov
2012-10-23 20:29                           ` Paul E. McKenney
2012-10-23 20:32                             ` Paul E. McKenney
2012-10-23 21:39                               ` Mikulas Patocka
2012-10-24 16:23                                 ` Paul E. McKenney
2012-10-24 20:22                                   ` Mikulas Patocka
2012-10-24 20:36                                     ` Paul E. McKenney
2012-10-24 20:44                                       ` Mikulas Patocka
2012-10-24 23:57                                         ` Paul E. McKenney
2012-10-25 12:39                                           ` Paul E. McKenney
2012-10-25 13:48                                           ` Mikulas Patocka
2012-10-23 19:23                       ` Oleg Nesterov
2012-10-23 20:45                         ` Peter Zijlstra
2012-10-23 20:57                         ` Peter Zijlstra
2012-10-24 15:11                           ` Oleg Nesterov
2012-10-23 21:26                         ` Mikulas Patocka
2012-10-23 20:32                     ` Peter Zijlstra
2012-10-30 18:48                   ` [PATCH 0/2] fix and improvements for percpu-rw-semaphores (was: brw_mutex: big read-write mutex) Oleg Nesterov
2012-10-31 19:41                     ` [PATCH 0/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily Oleg Nesterov
2012-10-31 19:41                       ` [PATCH 1/1] " Oleg Nesterov
2012-11-01 15:10                         ` Linus Torvalds
2012-11-01 15:34                           ` Oleg Nesterov
2012-11-02 18:06                           ` [PATCH v2 0/1] " Oleg Nesterov
2012-11-02 18:06                             ` [PATCH v2 1/1] " Oleg Nesterov
2012-11-07 17:04                               ` [PATCH v3 " Mikulas Patocka
2012-11-07 17:47                                 ` Oleg Nesterov
2012-11-07 19:17                                   ` Mikulas Patocka
2012-11-08 13:42                                     ` Oleg Nesterov
2012-11-08  1:23                                 ` Paul E. McKenney
2012-11-08  1:16                               ` [PATCH v2 " Paul E. McKenney
2012-11-08 13:33                                 ` Oleg Nesterov
2012-11-08 16:27                                   ` Paul E. McKenney
2012-11-08 13:48                             ` [PATCH RESEND v2 0/1] " Oleg Nesterov
2012-11-08 13:48                               ` [PATCH RESEND v2 1/1] " Oleg Nesterov
2012-11-08 20:07                                 ` Andrew Morton
2012-11-08 21:08                                   ` Paul E. McKenney [this message]
2012-11-08 23:41                                     ` Mikulas Patocka
2012-11-09  0:41                                       ` Paul E. McKenney
2012-11-09  3:23                                         ` Paul E. McKenney
2012-11-09 16:35                                           ` Oleg Nesterov
2012-11-09 16:59                                             ` Paul E. McKenney
2012-11-09 12:47                                   ` Mikulas Patocka
2012-11-09 15:46                                   ` Oleg Nesterov
2012-11-09 17:01                                     ` Paul E. McKenney
2012-11-09 18:10                                       ` Oleg Nesterov
2012-11-09 18:19                                         ` Oleg Nesterov
2012-11-10  0:55                                         ` Paul E. McKenney
2012-11-11 15:45                                           ` Oleg Nesterov
2012-11-12 18:38                                             ` Paul E. McKenney
2012-11-11 18:27                                   ` [PATCH -mm] percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessari ly.fix Oleg Nesterov
2012-11-12 18:31                                     ` Paul E. McKenney
2012-11-16 23:22                                     ` Andrew Morton
2012-11-18 19:32                                       ` Oleg Nesterov
2012-11-01 15:43                         ` [PATCH 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily Paul E. McKenney
2012-11-01 18:33                           ` Oleg Nesterov
2012-11-02 16:18                             ` Oleg Nesterov
2012-10-15 19:10 ` [PATCH 2/2] uprobes: Use brw_mutex to fix register/unregister vs dup_mmap() race Oleg Nesterov
2012-10-18  7:03   ` Srikar Dronamraju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121108210843.GF2519@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=anton@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mpatocka@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.