linux-kernel.vger.kernel.org archive mirror
From: Boqun Feng <boqun.feng@gmail.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Alan Stern" <stern@rowland.harvard.edu>,
	"Will Deacon" <will.deacon@arm.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Andrea Parri" <parri.andrea@gmail.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	priyalee.kushwaha@intel.com,
	"Stanisław Drozd" <drozdziak1@gmail.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	ldr709@gmail.com, "Thomas Gleixner" <tglx@linutronix.de>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Nicolas Pitre" <nico@linaro.org>,
	"Krister Johansen" <kjlx@templeofstupid.com>,
	"Vegard Nossum" <vegard.nossum@oracle.com>,
	dcb314@hotmail.com, "Wu Fengguang" <fengguang.wu@intel.com>,
	"Frederic Weisbecker" <fweisbec@gmail.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Luc Maranget" <luc.maranget@inria.fr>,
	"Jade Alglave" <j.alglave@ucl.ac.uk>
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13
Date: Fri, 30 Jun 2017 13:16:54 +0800
Message-ID: <20170630051654.wsoog5nlwtmbh5y2@tardis>
In-Reply-To: <20170630040241.GR2393@linux.vnet.ibm.com>


On Thu, Jun 29, 2017 at 09:02:41PM -0700, Paul E. McKenney wrote:
[...]
> > > o	net/netfilter/nf_conntrack_core.c nf_conntrack_lock()
> > > 	This instance of spin_unlock_wait() interacts with
> > > 	nf_conntrack_all_lock()'s instance of spin_unlock_wait().
> > > 	Although nf_conntrack_all_lock() has an smp_mb(), which I
> > > 	believe provides release semantics given current implementations,
> > > 	nf_conntrack_lock() just has smp_rmb().
> > > 
> > > 	I believe that the smp_rmb() needs to be smp_mb().  Am I missing
> > > 	something here that makes the current code safe on x86?
> > > 
> > 
> > Actually, I think the smp_rmb(), or even the smp_rmb() together with
> > the spin_unlock_wait(), in nf_conntrack_lock() is not needed. We could
> > implement nf_conntrack_lock() as:
> > 
> > 	
> > 	void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
> > 	{
> > 		spin_lock(lock);
> > 		while (unlikely(smp_load_acquire(&nf_conntrack_locks_all))) {
> > 			spin_unlock(lock);
> > 			cpu_relax();
> > 			spin_lock(lock);
> > 		}
> > 	}
> > 
> > because in nf_conntrack_all_unlock(), we have:
> > 
> > 		smp_store_release(&nf_conntrack_locks_all, false);
> > 		spin_unlock(&nf_conntrack_locks_all_lock);
> > 
> > So if we exit the loop, which means we observed nf_conntrack_locks_all
> > being false, we hold the per-bucket lock and observe everything
> > before the smp_store_release(), which is the same as everything in the
> > critical section of nf_conntrack_locks_all_lock. Otherwise, we observe
> > nf_conntrack_locks_all being true, which means a global lock
> > critical section may be under way, so we simply drop the per-bucket lock
> > and check again a little later whether the global lock has been released.
> > 
> > So I think spin_unlock_wait() in the nf_conntrack_lock() just requires
> > acquire semantics, at least.
> > 
> > Maybe I am missing something?
> 
> Or perhaps I was being too paranoid.
> 
> But does the same analysis work in the case where an nf_conntrack_lock()
> races with an nf_conntrack_all_lock()?
> 

You mean the smp_mb()+spin_unlock_wait() in nf_conntrack_all_lock(),
right? I think it's different, because nf_conntrack_all_lock() relies
on this release-like operation to let all the subsequent critical sections
of the per-bucket locks observe nf_conntrack_locks_all=true. Otherwise
nf_conntrack_lock() would break out of the loop and access some data while
the global lock critical section is doing the same.

The variable @nf_conntrack_locks_all is used for synchronization between
the two kinds of locks and is set by nf_conntrack_all_lock(). I think this
makes things different.
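
To make the pairing between the two kinds of locks concrete, here is a
rough userspace sketch, not the kernel code. It assumes pthread mutexes
in place of spinlocks, sched_yield() in place of cpu_relax(), a seq_cst
store in place of the plain store plus smp_mb(), and a momentary
lock+unlock in place of spin_unlock_wait(). All names (bucket_lock(),
all_lock_acquire(), run_demo(), NBUCKETS, ITERS, ...) are made up for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NBUCKETS 4	/* hypothetical bucket count */
#define ITERS    1000	/* hypothetical iterations per thread */

static pthread_mutex_t bucket_locks[NBUCKETS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};
static pthread_mutex_t all_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool locks_all;	/* stand-in for nf_conntrack_locks_all */
static long data[NBUCKETS];	/* data protected by the bucket locks */

/* Per-bucket side: take the lock, back off while the global flag is set. */
static void bucket_lock(int i)
{
	pthread_mutex_lock(&bucket_locks[i]);
	while (atomic_load_explicit(&locks_all, memory_order_acquire)) {
		pthread_mutex_unlock(&bucket_locks[i]);
		sched_yield();
		pthread_mutex_lock(&bucket_locks[i]);
	}
}

static void bucket_unlock(int i)
{
	pthread_mutex_unlock(&bucket_locks[i]);
}

/* Global side: publish the flag, then wait out every current bucket holder. */
static void all_lock_acquire(void)
{
	pthread_mutex_lock(&all_lock);
	atomic_store_explicit(&locks_all, true, memory_order_seq_cst);
	for (int i = 0; i < NBUCKETS; i++) {
		/* lock+unlock used here instead of spin_unlock_wait() */
		pthread_mutex_lock(&bucket_locks[i]);
		pthread_mutex_unlock(&bucket_locks[i]);
	}
}

static void all_unlock(void)
{
	atomic_store_explicit(&locks_all, false, memory_order_release);
	pthread_mutex_unlock(&all_lock);
}

static void *bucket_worker(void *arg)
{
	int i = (int)(intptr_t)arg;

	for (int n = 0; n < ITERS; n++) {
		bucket_lock(i);
		data[i]++;
		bucket_unlock(i);
	}
	return NULL;
}

static void *global_worker(void *arg)
{
	(void)arg;
	for (int n = 0; n < ITERS; n++) {
		all_lock_acquire();
		for (int i = 0; i < NBUCKETS; i++)
			data[i]++;
		all_unlock();
	}
	return NULL;
}

/* Run one worker per bucket plus one global worker; return the total count. */
static long run_demo(void)
{
	pthread_t t[NBUCKETS + 1];
	long sum = 0;
	int i, rc;

	for (i = 0; i < NBUCKETS; i++)
		data[i] = 0;
	for (i = 0; i < NBUCKETS; i++) {
		rc = pthread_create(&t[i], NULL, bucket_worker, (void *)(intptr_t)i);
		assert(rc == 0);
	}
	rc = pthread_create(&t[NBUCKETS], NULL, global_worker, NULL);
	assert(rc == 0);
	(void)rc;
	for (i = 0; i <= NBUCKETS; i++)
		pthread_join(t[i], NULL);
	for (i = 0; i < NBUCKETS; i++)
		sum += data[i];
	return sum;
}
```

If the flag store in all_lock_acquire() were not ordered before the pass
over the bucket locks, a later bucket_lock() could still see the flag as
false and run concurrently with the global critical section, which is
the case I am worried about above.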

> > > 	I believe that this code could use spin_lock+spin_unlock without
> > > 	significant performance penalties -- I do not believe that
> > > 	nf_conntrack_locks_all_lock gets significant contention.
> > > 
> > > raw_spin_unlock_wait() (Courtesy of Andrea Parri with added commentary):
> > > 
> > > o	kernel/exit.c do_exit()
> > > 	Seems to rely on both acquire and release semantics. The
> > > 	raw_spin_unlock_wait() primitive is preceded by a smp_mb().
> > > 	But this is task exit doing spin_unlock_wait() on the task's
> > > 	lock, so spin_lock+spin_unlock should work fine here.
> > > 
> > > o	kernel/sched/core.c do_task_dead()
> > > 	Seems to rely on the acquire semantics only. The
> > > 	raw_spin_unlock_wait() primitive is preceded by an inexplicable
> > > 	smp_mb().  Again, this is task exit doing spin_unlock_wait() on
> > > 	the task's lock, so spin_lock+spin_unlock should work fine here.
> > > 
> > > o	kernel/task_work.c task_work_run()
> > > 	Seems to rely on the acquire semantics only.  This is to handle
> > 
> > I think this one needs the stronger semantics; the smp_mb() is just
> > hidden in the cmpxchg() before the raw_spin_unlock_wait() ;-)
> > 
> > cmpxchg() sets a special value to indicate the task_work has been taken,
> > and raw_spin_unlock_wait() must wait until the next critical section of
> > ->pi_lock (in task_work_cancel()) can observe this; otherwise we may
> > cancel a task_work while executing it.
> 
> But either way, replacing the spin_unlock_wait() with a spin_lock()
> immediately followed by a spin_unlock() should work correctly, right?
> 

Yep ;-) I was thinking about the case where we kept spin_unlock_wait()
with simpler acquire-only semantics; if so, we would actually have to
do the replacement here. But I saw your patchset removing it, so it
doesn't matter.
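
For reference, the task_work pattern with the lock+unlock replacement
looks roughly like the userspace sketch below. This is only an
illustration under assumptions: a pthread mutex stands in for ->pi_lock,
a seq_cst atomic_exchange() stands in for the fully ordered cmpxchg(),
and struct work, add(), take_all(), cancel() and unlock_wait() are
made-up names. The real code matches on the callback function and
differs in other details.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct work {
	_Atomic(struct work *) next;
	int id;				/* hypothetical key used by cancel() */
};

static _Atomic(struct work *) pending;	/* lock-free list of queued work */
static pthread_mutex_t pi_lock = PTHREAD_MUTEX_INITIALIZER;

/* Replacement for spin_unlock_wait(): momentarily acquire the lock. */
static void unlock_wait(pthread_mutex_t *lock)
{
	pthread_mutex_lock(lock);
	pthread_mutex_unlock(lock);
}

/* Queue a work item at the head of the list. */
static void add(struct work *w)
{
	struct work *head = atomic_load(&pending);

	do {
		atomic_store(&w->next, head);
	} while (!atomic_compare_exchange_weak(&pending, &head, w));
}

/*
 * Runner: claim the whole list with a fully ordered RMW, then wait for
 * any canceller that is still walking it. Cancellers that take pi_lock
 * after this point see the empty list.
 */
static struct work *take_all(void)
{
	struct work *head = atomic_exchange(&pending, NULL);

	unlock_wait(&pi_lock);
	return head;
}

/* Canceller: unlink the first item with a matching id, under pi_lock. */
static struct work *cancel(int id)
{
	_Atomic(struct work *) *pprev = &pending;
	struct work *w;

	pthread_mutex_lock(&pi_lock);
	while ((w = atomic_load(pprev)) != NULL) {
		if (w->id != id) {
			pprev = &w->next;
		} else {
			struct work *expected = w;

			if (atomic_compare_exchange_strong(pprev, &expected,
							   atomic_load(&w->next)))
				break;
		}
	}
	pthread_mutex_unlock(&pi_lock);
	return w;
}
```

Once take_all() returns, no canceller can still be looking at the items
the runner is about to execute, which is the guarantee I was describing
above.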

Regards,
Boqun

> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > > 	a race with task_work_cancel(), which appears to be quite rare.
> > > 	So the spin_lock+spin_unlock should work fine here.
> > > 
> > > spin_lock()/spin_unlock():
> > > 
> > > o	ipc/sem.c complexmode_enter()
> > > 	This used to be spin_unlock_wait(), but was changed to a
> > > 	spin_lock()/spin_unlock() pair by 27d7be1801a4 ("ipc/sem.c:
> > > 	avoid using spin_unlock_wait()").
> > > 
> > > Looks to me like we really can drop spin_unlock_wait() in favor of
> > > momentarily acquiring the lock.  There are so few use cases that I don't
> > > see a problem open-coding this.  I will put together yet another patch
> > > series for my spin_unlock_wait() collection of patch serieses.  ;-)
> > > 
> > > > As regards (2), I did a little digging.  spin_unlock_wait was
> > > > introduced in the 2.1.36 kernel, in mid-April 1997.  I wasn't able to
> > > > find a specific patch for it in the LKML archives.  At the time it
> > > > was used in only one place in the entire kernel (in kernel/exit.c):
> > > > 
> > > > void release(struct task_struct * p)
> > > > {
> > > > 	int i;
> > > > 
> > > > 	if (!p)
> > > > 		return;
> > > > 	if (p == current) {
> > > > 		printk("task releasing itself\n");
> > > > 		return;
> > > > 	}
> > > > 	for (i=1 ; i<NR_TASKS ; i++)
> > > > 		if (task[i] == p) {
> > > > #ifdef __SMP__
> > > > 			/* FIXME! Cheesy, but kills the window... -DaveM */
> > > > 			while(p->processor != NO_PROC_ID)
> > > > 				barrier();
> > > > 			spin_unlock_wait(&scheduler_lock);
> > > > #endif
> > > > 			nr_tasks--;
> > > > 			task[i] = NULL;
> > > > 			REMOVE_LINKS(p);
> > > > 			release_thread(p);
> > > > 			if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
> > > > 				printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
> > > > 			free_kernel_stack(p->kernel_stack_page);
> > > > 			current->cmin_flt += p->min_flt + p->cmin_flt;
> > > > 			current->cmaj_flt += p->maj_flt + p->cmaj_flt;
> > > > 			current->cnswap += p->nswap + p->cnswap;
> > > > 			free_task_struct(p);
> > > > 			return;
> > > > 		}
> > > > 	panic("trying to release non-existent task");
> > > > }
> > > > 
> > > > I'm not entirely clear on the point of this call.  It looks like it 
> > > > wanted to wait until p was guaranteed not to be running on any 
> > > > processor ever again.  (I don't see why it couldn't have just acquired 
> > > > the scheduler_lock -- was release() a particularly hot path?)
> > > > 
> > > > Although it doesn't matter now, this would mean that the original
> > > > semantics of spin_unlock_wait were different from what we are
> > > > discussing.  It apparently was meant to provide the release guarantee:
> > > > any future critical sections would see the values that were visible
> > > > before the call.  Ironic.
> > > 
> > > Cute!!!  ;-)
> > > 
> > > 							Thanx, Paul
> > > 
> 
> 
