All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Anton Arapov <anton@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND v2 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily
Date: Fri, 9 Nov 2012 09:01:07 -0800	[thread overview]
Message-ID: <20121109170107.GB2419@linux.vnet.ibm.com> (raw)
In-Reply-To: <20121109154656.GA26134@redhat.com>

On Fri, Nov 09, 2012 at 04:46:56PM +0100, Oleg Nesterov wrote:
> On 11/08, Andrew Morton wrote:
> >
> > On Thu, 8 Nov 2012 14:48:49 +0100
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > >
> > >  include/linux/percpu-rwsem.h |   83 +++++------------------------
> > >  lib/Makefile                 |    2 +-
> > >  lib/percpu-rwsem.c           |  123 ++++++++++++++++++++++++++++++++++++++++++
> >
> > The patch also uninlines everything.
> >
> > And it didn't export the resulting symbols to modules, so it isn't an
> > equivalent.  We can export thing later if needed I guess.
> 
> Yes, currently it is only used by block_dev.c
> 
> > It adds percpu-rwsem.o to lib-y, so the CONFIG_BLOCK=n kernel will
> > avoid including the code altogether, methinks?
> 
> I am going to add another user (uprobes), this was my motivation for
> this patch. And perhaps it will have more users.
> 
> But I agree, CONFIG_PERCPU_RWSEM makes sense at least now, I'll send
> the patch.
> 
> > > +#include <linux/percpu-rwsem.h>
> > > +#include <linux/rcupdate.h>
> > > +#include <linux/sched.h>
> >
> > This list is nowhere near sufficient to support this file's
> > requirements.  atomic.h, percpu.h, rwsem.h, wait.h, errno.h and plenty
> > more.  IOW, if it compiles, it was sheer luck.
> 
> OK, thanks, I'll send
> send percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessarily.fix
> 
> > > +/*
> > > + * A writer takes ->writer_mutex to exclude other writers and to force the
> > > + * readers to switch to the slow mode, note the mutex_is_locked() check in
> > > + * update_fast_ctr().
> > > + *
> > > + * After that the readers can only inc/dec the slow ->slow_read_ctr counter,
> > > + * ->fast_read_ctr is stable. Once the writer moves its sum into the slow
> > > + * counter it represents the number of active readers.
> > > + *
> > > + * Finally the writer takes ->rw_sem for writing and blocks the new readers,
> > > + * then waits until the slow counter becomes zero.
> > > + */
> >
> > Some overview of how fast/slow_read_ctr are supposed to work would be
> > useful.  This comment seems to assume that the reader already knew
> > that.
> 
> I hate to say this, but I'll try to update this comment too ;)
> 
> > > +void percpu_down_write(struct percpu_rw_semaphore *brw)
> > > +{
> > > +	/* also blocks update_fast_ctr() which checks mutex_is_locked() */
> > > +	mutex_lock(&brw->writer_mutex);
> > > +
> > > +	/*
> > > +	 * 1. Ensures mutex_is_locked() is visible to any down_read/up_read
> > > +	 *    so that update_fast_ctr() can't succeed.
> > > +	 *
> > > +	 * 2. Ensures we see the result of every previous this_cpu_add() in
> > > +	 *    update_fast_ctr().
> > > +	 *
> > > +	 * 3. Ensures that if any reader has exited its critical section via
> > > +	 *    fast-path, it executes a full memory barrier before we return.
> > > +	 */
> > > +	synchronize_sched();
> >
> > Here's where I get horridly confused.  Your patch completely deRCUifies
> > this code, yes?  Yet here we're using an RCU primitive.  And we seem to
> > be using it not as an RCU primitive but as a handy thing which happens
> > to have desirable side-effects.  But the implementation of
> > synchronize_sched() differs considerably according to which rcu
> > flavor-of-the-minute you're using.
> 
> It is documented that synchronize_sched() should play well with
> preempt_disable/enable. From the comment:
> 
> 	Note that preempt_disable(),
> 	local_irq_disable(), and so on may be used in place of
> 	rcu_read_lock_sched().
> 
> But I guess this needs more discussion, I see other emails in this
> thread...
> 
> > And part 3 talks about the reader's critical section.  The only
> > critical sections I can see on the reader side are already covered by
> > mutex_lock() and preempt_diable().
> 
> Yes, but we need to ensure that if we take the lock for writing, we
> should see all memory modifications done under down_read/up_read().
> 
> IOW. Suppose that the reader does
> 
> 	percpu_down_read();
> 	STORE;
> 	percpu_up_read();	// no barriers in the fast path
> 
> The writer should see the result of that STORE under percpu_down_write().
> 
> Part 3 tries to say that at this point we should already see the result,
> so we should not worry about acquire/release semantics.
> 
> > If this code isn't as brain damaged as it
> > initially appears then please,
> 
> I hope ;)
> 
> > go easy on us simpletons in the next
> > version?
> 
> Well, I'll try to update the comments... but the code is simple, I do
> not think I can simplify it more. The nontrivial part is the barriers,
> but this is always nontrivial.
> 
> Contrary, I am going to try to add some complications later, so that
> it can have more users. In particular, I think it can replace
> get_online_cpus/cpu_hotplug_begin, just we need
> percpu_down_write_but_dont_deadlock_with_recursive_readers().

I must confess that I am a bit concerned about possible scalability
bottlenecks in the current get_online_cpus(), so +1 from me on this one.

							Thanx, Paul


  reply	other threads:[~2012-11-09 17:44 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-15 19:09 [RFC PATCH 0/2] uprobes: register/unregister can race with fork Oleg Nesterov
2012-10-15 19:10 ` [PATCH 1/2] brw_mutex: big read-write mutex Oleg Nesterov
2012-10-15 23:28   ` Paul E. McKenney
2012-10-16 15:56     ` Oleg Nesterov
2012-10-16 18:58       ` Paul E. McKenney
2012-10-17 16:37         ` Oleg Nesterov
2012-10-17 22:28           ` Paul E. McKenney
2012-10-16 19:56   ` Linus Torvalds
2012-10-17 16:59     ` Oleg Nesterov
2012-10-17 22:44       ` Paul E. McKenney
2012-10-18 16:24         ` Oleg Nesterov
2012-10-18 16:38           ` Paul E. McKenney
2012-10-18 17:57             ` Oleg Nesterov
2012-10-18 19:28               ` Mikulas Patocka
2012-10-19 12:38                 ` Peter Zijlstra
2012-10-19 15:32                   ` Mikulas Patocka
2012-10-19 17:40                     ` Peter Zijlstra
2012-10-19 17:57                       ` Oleg Nesterov
2012-10-19 22:54                       ` Mikulas Patocka
2012-10-24  3:08                         ` Dave Chinner
2012-10-25 14:09                           ` Mikulas Patocka
2012-10-25 23:40                             ` Dave Chinner
2012-10-26 12:06                               ` Oleg Nesterov
2012-10-26 13:22                                 ` Mikulas Patocka
2012-10-26 14:12                                   ` Oleg Nesterov
2012-10-26 15:23                                     ` mark_files_ro && sb_end_write Oleg Nesterov
2012-10-26 16:09                                     ` [PATCH 1/2] brw_mutex: big read-write mutex Mikulas Patocka
2012-10-19 17:49                     ` Oleg Nesterov
2012-10-22 23:09                       ` Mikulas Patocka
2012-10-23 15:12                         ` Oleg Nesterov
2012-10-19 19:28               ` Paul E. McKenney
2012-10-22 23:36                 ` [PATCH 0/2] fix and improvements for percpu-rw-semaphores (was: brw_mutex: big read-write mutex) Mikulas Patocka
2012-10-22 23:37                   ` [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers Mikulas Patocka
2012-10-22 23:39                     ` [PATCH 2/2] percpu-rw-semaphores: use rcu_read_lock_sched Mikulas Patocka
2012-10-24 16:16                       ` Paul E. McKenney
2012-10-24 17:18                         ` Oleg Nesterov
2012-10-24 18:20                           ` Paul E. McKenney
2012-10-24 18:43                             ` Oleg Nesterov
2012-10-24 19:43                               ` Paul E. McKenney
2012-10-25 14:54                         ` Mikulas Patocka
2012-10-25 15:07                           ` Paul E. McKenney
2012-10-25 16:15                             ` Mikulas Patocka
2012-10-23 16:59                     ` [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers Oleg Nesterov
2012-10-23 18:05                       ` Paul E. McKenney
2012-10-23 18:27                         ` Oleg Nesterov
2012-10-23 18:41                         ` Oleg Nesterov
2012-10-23 20:29                           ` Paul E. McKenney
2012-10-23 20:32                             ` Paul E. McKenney
2012-10-23 21:39                               ` Mikulas Patocka
2012-10-24 16:23                                 ` Paul E. McKenney
2012-10-24 20:22                                   ` Mikulas Patocka
2012-10-24 20:36                                     ` Paul E. McKenney
2012-10-24 20:44                                       ` Mikulas Patocka
2012-10-24 23:57                                         ` Paul E. McKenney
2012-10-25 12:39                                           ` Paul E. McKenney
2012-10-25 13:48                                           ` Mikulas Patocka
2012-10-23 19:23                       ` Oleg Nesterov
2012-10-23 20:45                         ` Peter Zijlstra
2012-10-23 20:57                         ` Peter Zijlstra
2012-10-24 15:11                           ` Oleg Nesterov
2012-10-23 21:26                         ` Mikulas Patocka
2012-10-23 20:32                     ` Peter Zijlstra
2012-10-30 18:48                   ` [PATCH 0/2] fix and improvements for percpu-rw-semaphores (was: brw_mutex: big read-write mutex) Oleg Nesterov
2012-10-31 19:41                     ` [PATCH 0/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily Oleg Nesterov
2012-10-31 19:41                       ` [PATCH 1/1] " Oleg Nesterov
2012-11-01 15:10                         ` Linus Torvalds
2012-11-01 15:34                           ` Oleg Nesterov
2012-11-02 18:06                           ` [PATCH v2 0/1] " Oleg Nesterov
2012-11-02 18:06                             ` [PATCH v2 1/1] " Oleg Nesterov
2012-11-07 17:04                               ` [PATCH v3 " Mikulas Patocka
2012-11-07 17:47                                 ` Oleg Nesterov
2012-11-07 19:17                                   ` Mikulas Patocka
2012-11-08 13:42                                     ` Oleg Nesterov
2012-11-08  1:23                                 ` Paul E. McKenney
2012-11-08  1:16                               ` [PATCH v2 " Paul E. McKenney
2012-11-08 13:33                                 ` Oleg Nesterov
2012-11-08 16:27                                   ` Paul E. McKenney
2012-11-08 13:48                             ` [PATCH RESEND v2 0/1] " Oleg Nesterov
2012-11-08 13:48                               ` [PATCH RESEND v2 1/1] " Oleg Nesterov
2012-11-08 20:07                                 ` Andrew Morton
2012-11-08 21:08                                   ` Paul E. McKenney
2012-11-08 23:41                                     ` Mikulas Patocka
2012-11-09  0:41                                       ` Paul E. McKenney
2012-11-09  3:23                                         ` Paul E. McKenney
2012-11-09 16:35                                           ` Oleg Nesterov
2012-11-09 16:59                                             ` Paul E. McKenney
2012-11-09 12:47                                   ` Mikulas Patocka
2012-11-09 15:46                                   ` Oleg Nesterov
2012-11-09 17:01                                     ` Paul E. McKenney [this message]
2012-11-09 18:10                                       ` Oleg Nesterov
2012-11-09 18:19                                         ` Oleg Nesterov
2012-11-10  0:55                                         ` Paul E. McKenney
2012-11-11 15:45                                           ` Oleg Nesterov
2012-11-12 18:38                                             ` Paul E. McKenney
2012-11-11 18:27                                   ` [PATCH -mm] percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessari ly.fix Oleg Nesterov
2012-11-12 18:31                                     ` Paul E. McKenney
2012-11-16 23:22                                     ` Andrew Morton
2012-11-18 19:32                                       ` Oleg Nesterov
2012-11-01 15:43                         ` [PATCH 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily Paul E. McKenney
2012-11-01 18:33                           ` Oleg Nesterov
2012-11-02 16:18                             ` Oleg Nesterov
2012-10-15 19:10 ` [PATCH 2/2] uprobes: Use brw_mutex to fix register/unregister vs dup_mmap() race Oleg Nesterov
2012-10-18  7:03   ` Srikar Dronamraju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121109170107.GB2419@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=anton@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mpatocka@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.