From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Anton Arapov <anton@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers
Date: Tue, 23 Oct 2012 11:05:58 -0700	[thread overview]
Message-ID: <20121023180558.GF2585@linux.vnet.ibm.com> (raw)
In-Reply-To: <20121023165912.GA18712@redhat.com>

On Tue, Oct 23, 2012 at 06:59:12PM +0200, Oleg Nesterov wrote:
> Not really a comment, but a question...
> 
> On 10/22, Mikulas Patocka wrote:
> >
> >  static inline void percpu_down_read(struct percpu_rw_semaphore *p)
> >  {
> >  	rcu_read_lock();
> > @@ -24,22 +27,12 @@ static inline void percpu_down_read(stru
> >  	}
> >  	this_cpu_inc(*p->counters);
> >  	rcu_read_unlock();
> > +	light_mb(); /* A, between read of p->locked and read of data, paired with D */
> >  }
> 
> rcu_read_unlock() (or even preempt_enable()) should have compiler-barrier
> semantics... But I agree, this adds more documentation for free.

Although rcu_read_lock() does have compiler-barrier semantics if
CONFIG_PREEMPT=y, it does not for CONFIG_PREEMPT=n.  So the
light_mb() (which appears to be barrier()) is needed in that case.
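
For reference, my reading of Mikulas's patch (a sketch from memory,
not the exact patch text) is that the two new barriers boil down to:

	/*
	 * Read side: a compiler barrier only; the real memory ordering
	 * is supplied by the write side's synchronize_sched().
	 */
	#define light_mb()	barrier()

	/*
	 * Write side: forces a full memory barrier on every CPU that
	 * recently ran an RCU-sched read-side critical section.
	 */
	#define heavy_mb()	synchronize_sched()
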

> >  static inline void percpu_up_read(struct percpu_rw_semaphore *p)
> >  {
> > -	/*
> > -	 * On X86, write operation in this_cpu_dec serves as a memory unlock
> > -	 * barrier (i.e. memory accesses may be moved before the write, but
> > -	 * no memory accesses are moved past the write).
> > -	 * On other architectures this may not be the case, so we need smp_mb()
> > -	 * there.
> > -	 */
> > -#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
> > -	barrier();
> > -#else
> > -	smp_mb();
> > -#endif
> > +	light_mb(); /* B, between read of the data and write to p->counter, paired with C */
> >  	this_cpu_dec(*p->counters);
> >  }
> >
> > @@ -61,11 +54,12 @@ static inline void percpu_down_write(str
> >  	synchronize_rcu();
> >  	while (__percpu_count(p->counters))
> >  		msleep(1);
> > -	smp_rmb(); /* paired with smp_mb() in percpu_sem_up_read() */
> > +	heavy_mb(); /* C, between read of p->counter and write to data, paired with B */
> 
> I _think_ this is correct.
> 
> 
> I am just wondering whether this is strictly correct in theory; I
> would really like to know what Paul thinks.

I need to take a closer look.

> Ignoring the current implementation, according to the documentation,
> synchronize_sched() has every right to return immediately if there is
> no active rcu_read_lock_sched() section.  If this were possible, then
> percpu_up_read() lacks the needed mb.

Even if there happen to be no RCU-sched read-side critical sections
at the current instant, synchronize_sched() is required to make sure
that everyone agrees that whatever code is executed by the caller after
synchronize_sched() returns happens after any of the preceding RCU
read-side critical sections.

So, if we have this, with x==0 initially:

	Task 0					Task 1

						rcu_read_lock_sched();
						x = 1;
						rcu_read_unlock_sched();
	synchronize_sched();
	r1 = x;

Then the value of r1 had better be one.

Of course, the above code fragment is doing absolutely nothing to ensure
that the synchronize_sched() really does start after Task 1's very strange
RCU read-side critical section, but if things did happen in that order,
synchronize_sched() would be required to make this guarantee.
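
If one did want to enforce that ordering, a flag would do the trick
(the flag "f" is my addition for illustration, not part of the
original example), with x==0 and f==0 initially:

	Task 0					Task 1

						rcu_read_lock_sched();
						x = 1;
						rcu_read_unlock_sched();
						ACCESS_ONCE(f) = 1;
	while (!ACCESS_ONCE(f))
		cpu_relax();
	synchronize_sched();
	r1 = x; /* guaranteed to be 1 */

Task 0 cannot invoke synchronize_sched() until it sees f==1, which
Task 1 sets only after its critical section ends, so the beginning of
that critical section is guaranteed to precede the call, and the
documented guarantee then forces r1==1.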

> So _perhaps_ it makes sense to document that synchronize_sched() also
> guarantees that all pending loads/stores on other CPUs have completed
> upon return?  Or did I misunderstand the patch?

Good point.  The current documentation implies that it does make that
guarantee, but it would be good for it to be explicit.  Queued for 3.8
is the following addition:

 * Note that this guarantee implies a further memory-ordering guarantee.
 * On systems with more than one CPU, when synchronize_sched() returns,
 * each CPU is guaranteed to have executed a full memory barrier since
 * the end of its last RCU read-side critical section whose beginning
 * preceded the call to synchronize_sched().  Note that this guarantee
 * includes CPUs that are offline, idle, or executing in user mode, as
 * well as CPUs that are executing in the kernel.  Furthermore, if CPU A
 * invoked synchronize_sched(), which returned to its caller on CPU B,
 * then both CPU A and CPU B are guaranteed to have executed a full memory
 * barrier during the execution of synchronize_sched().

The full comment block now reads:

/**
 * synchronize_sched - wait until an rcu-sched grace period has elapsed.
 *
 * Control will return to the caller some time after a full rcu-sched
 * grace period has elapsed, in other words after all currently executing
 * rcu-sched read-side critical sections have completed.   These read-side
 * critical sections are delimited by rcu_read_lock_sched() and
 * rcu_read_unlock_sched(), and may be nested.  Note that preempt_disable(),
 * local_irq_disable(), and so on may be used in place of
 * rcu_read_lock_sched().
 *
 * This means that all preempt_disable code sequences, including NMI and
 * hardware-interrupt handlers, in progress on entry will have completed
 * before this primitive returns.  However, this does not guarantee that
 * softirq handlers will have completed, since in some kernels, these
 * handlers can run in process context, and can block.
 *
 * Note that this guarantee implies a further memory-ordering guarantee.
 * On systems with more than one CPU, when synchronize_sched() returns,
 * each CPU is guaranteed to have executed a full memory barrier since
 * the end of its last RCU read-side critical section whose beginning
 * preceded the call to synchronize_sched().  Note that this guarantee
 * includes CPUs that are offline, idle, or executing in user mode, as
 * well as CPUs that are executing in the kernel.  Furthermore, if CPU A
 * invoked synchronize_sched(), which returned to its caller on CPU B,
 * then both CPU A and CPU B are guaranteed to have executed a full memory
 * barrier during the execution of synchronize_sched().
 *
 * This primitive provides the guarantees made by the (now removed)
 * synchronize_kernel() API.  In contrast, synchronize_rcu() only
 * guarantees that rcu_read_lock() sections will have completed.
 * In "classic RCU", these two guarantees happen to be one and
 * the same, but can differ in realtime RCU implementations.
 */
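
To connect this back to the patch, here is my understanding of the
pairing that this guarantee underwrites (a simplified sketch of
Mikulas's read and write paths, not his exact code):

	/* Reader side, simplified: */
	/* ... caller's reads of the protected data ... */
	light_mb();				/* B: compiler barrier only */
	this_cpu_dec(*p->counters);		/* tell the writer we are done */

	/* Writer side, simplified: */
	while (__percpu_count(p->counters))	/* wait for readers to drain */
		msleep(1);
	heavy_mb();				/* C: synchronize_sched() */
	/* ... writer's updates to the protected data ... */

The reader's light_mb() is nothing more than a compiler barrier, but
once the writer's synchronize_sched() at C returns, every CPU that ran
a reader has executed a full memory barrier since the end of its
critical section, which supplies the CPU-level ordering that the read
side deliberately omits.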

If this wording looks good to you, I will apply it to the other
grace-period primitives as well.

							Thanx, Paul

