All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	mingo@kernel.org, linux-kernel@vger.kernel.org, corbet@lwn.net,
	mhocko@kernel.org, dhowells@redhat.com,
	torvalds@linux-foundation.org, will.deacon@arm.com
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
Date: Thu, 12 Nov 2015 05:50:24 -0800	[thread overview]
Message-ID: <20151112135024.GQ3972@linux.vnet.ibm.com> (raw)
In-Reply-To: <20151111103456.GB6314@fixme-laptop.cn.ibm.com>

On Wed, Nov 11, 2015 at 06:34:56PM +0800, Boqun Feng wrote:
> On Wed, Nov 11, 2015 at 05:39:40PM +0800, Boqun Feng wrote:
> > Hi Oleg,
> > 
> > On Tue, Nov 03, 2015 at 06:59:58PM +0100, Oleg Nesterov wrote:
> > [snip]
> > > 
> > > Unfortunately this doesn't look exactly right...
> > > 
> > > spin_unlock_wait() is not equal to "while (locked) relax", the latter
> > > is live-lockable or at least sub-optimal: we do not really need to spin
> > 
> > Just be curious, should spin_unlock_wait() semantically be an ACQUIRE?
> 
> Hmm.. I guess I was wrong, it doesn't need to be an ACQUIRE, it needs
> only to use the control dependency to order the load of lock state and
> stores following it.

I must say that spin_unlock_wait() and friends are a bit tricky.
There are only a few uses in drivers/ata/libata-eh.c, ipc/sem.c,
kernel/exit.c, kernel/sched/completion.c, and kernel/task_work.c.
Almost all of these seem to assume that spin_unlock_wait() provides
no ordering.  We might have just barely enough uses to produce useful
abstractions, but my guess is that it would not hurt to wait.

> > Because spin_unlock_wait() is used for us to wait for a certain lock to
> > RELEASE so that we can do something *after* we observe the RELEASE.
> > Considering the follow example:
> > 
> > 	CPU 0 				CPU 1
> > 	============================	===========================
> > 	{ X = 0 }
> > 					WRITE_ONCE(X, 1);
> > 					spin_unlock(&lock);
> > 	spin_unlock_wait(&lock)
> > 	r1 = READ_ONCE(X);
> > 
> > If spin_unlock_wait() is not an ACQUIRE, r1 can be 0 in this case,
> > right? Am I missing something subtle here? Or spin_unlock_wait() itself
> > doesn't have the ACQUIRE semantics, but it should always come with a
> > smp_mb() *following* it to achieve the ACQUIRE semantics? However in
> > do_exit(), an smp_mb() is preceding raw_spin_unlock_wait() rather than
> > following, which makes me confused... could you explain that? Thank you
> > ;-)
> > 
> 
> But still, there is one suspicious use of smp_mb() in do_exit():
> 
> 	/*
> 	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
> 	 * when the following two conditions become true.
> 	 *   - There is race condition of mmap_sem (It is acquired by
> 	 *     exit_mm()), and
> 	 *   - SMI occurs before setting TASK_RUNINNG.
> 	 *     (or hypervisor of virtual machine switches to other guest)
> 	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
> 	 *
> 	 * To avoid it, we have to wait for releasing tsk->pi_lock which
> 	 * is held by try_to_wake_up()
> 	 */
> 	smp_mb();
> 	raw_spin_unlock_wait(&tsk->pi_lock);
> 
> 	/* causes final put_task_struct in finish_task_switch(). */
> 	tsk->state = TASK_DEAD;
> 	tsk->flags |= PF_NOFREEZE;	/* tell freezer to ignore us */
> 	schedule();
> 
> Seems like smp_mb() doesn't need here? Because the control dependency
> already orders load of tsk->pi_lock and store of tsk->state, and this
> control dependency order guarantee pairs with the spin_unlock(->pi_lock)
> in try_to_wake_up() to avoid data race on ->state.

The exit() path is pretty heavyweight, so I suspect that an extra smp_mb()
is down in the noise.  Or are you saying that this is somehow unsafe?

							Thanx, Paul

> Regards,
> Boqun
> 
> > Regards,
> > Boqun
> > 
> > > until we observe !spin_is_locked(), we only need to synchronize with the
> > > current owner of this lock. Once it drops the lock we can proceed, we
> > > do not care if another thread takes the same lock right after that.
> > > 
> > > Oleg.
> > > 
> 
> 



  parent reply	other threads:[~2015-11-12 13:50 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-02 13:29 [PATCH 0/4] scheduler ordering bits Peter Zijlstra
2015-11-02 13:29 ` [PATCH 1/4] sched: Better document the try_to_wake_up() barriers Peter Zijlstra
2015-12-04  0:09   ` Byungchul Park
2015-12-04  0:58   ` Byungchul Park
2015-11-02 13:29 ` [PATCH 2/4] sched: Document Program-Order guarantees Peter Zijlstra
2015-11-02 20:27   ` Paul Turner
2015-11-02 20:34     ` Peter Zijlstra
2015-11-02 22:09       ` Paul Turner
2015-11-02 22:12         ` Peter Zijlstra
2015-11-20 10:02     ` Peter Zijlstra
2015-11-20 14:08       ` Boqun Feng
2015-11-20 14:18         ` Peter Zijlstra
2015-11-20 14:21           ` Boqun Feng
2015-11-20 19:41             ` Peter Zijlstra
2015-11-02 13:29 ` [PATCH 3/4] sched: Fix a race in try_to_wake_up() vs schedule() Peter Zijlstra
2015-11-02 13:29 ` [PATCH 4/4] locking: Introduce smp_cond_acquire() Peter Zijlstra
2015-11-02 13:57   ` Peter Zijlstra
2015-11-02 17:43     ` Will Deacon
2015-11-03  1:14       ` Paul E. McKenney
2015-11-03  1:25         ` Linus Torvalds
2015-11-02 17:42   ` Will Deacon
2015-11-02 18:08   ` Linus Torvalds
2015-11-02 18:37     ` Will Deacon
2015-11-02 19:17       ` Linus Torvalds
2015-11-02 19:57         ` Will Deacon
2015-11-02 20:23           ` Peter Zijlstra
2015-11-02 21:56         ` Peter Zijlstra
2015-11-03  1:57         ` Paul E. McKenney
2015-11-03 19:40           ` Linus Torvalds
2015-11-04  3:57             ` Paul E. McKenney
2015-11-04  4:43               ` Linus Torvalds
2015-11-04 12:54                 ` Paul E. McKenney
2015-11-02 20:36   ` David Howells
2015-11-02 20:40     ` Peter Zijlstra
2015-11-02 21:11     ` Linus Torvalds
2015-11-03 17:59   ` Oleg Nesterov
2015-11-03 18:23     ` Peter Zijlstra
2015-11-11  9:39     ` Boqun Feng
2015-11-11 10:34       ` Boqun Feng
2015-11-11 19:53         ` Oleg Nesterov
2015-11-12 13:50         ` Paul E. McKenney [this message]
2015-11-11 12:12       ` Peter Zijlstra
2015-11-11 19:39         ` Oleg Nesterov
2015-11-11 21:23           ` Linus Torvalds
2015-11-12  7:14           ` Boqun Feng
2015-11-12 10:28             ` Peter Zijlstra
2015-11-12 15:00             ` Oleg Nesterov
2015-11-12 14:40               ` Paul E. McKenney
2015-11-12 14:49                 ` Boqun Feng
2015-11-12 15:02                   ` Paul E. McKenney
2015-11-12 21:53                     ` Will Deacon
2015-11-12 14:50                 ` Peter Zijlstra
2015-11-12 15:01                   ` Paul E. McKenney
2015-11-12 15:08                     ` Peter Zijlstra
2015-11-12 15:20                       ` Paul E. McKenney
2015-11-12 21:25                         ` Will Deacon
2015-11-12 15:18               ` Boqun Feng
2015-11-12 18:38                 ` Oleg Nesterov
2015-11-12 18:02                   ` Peter Zijlstra
2015-11-12 19:33                     ` Oleg Nesterov
2015-11-12 18:59                       ` Paul E. McKenney
2015-11-12 21:33                         ` Will Deacon
2015-11-12 23:43                           ` Paul E. McKenney
2015-11-16 13:58                             ` Will Deacon
2015-11-12 18:21             ` Linus Torvalds
2015-11-12 22:09               ` Will Deacon
2015-11-16 15:56               ` Peter Zijlstra
2015-11-16 16:04                 ` Peter Zijlstra
2015-11-16 16:24                   ` Will Deacon
2015-11-16 16:44                     ` Paul E. McKenney
2015-11-16 16:46                       ` Will Deacon
2015-11-16 17:15                         ` Paul E. McKenney
2015-11-16 21:58                     ` Linus Torvalds
2015-11-17 11:51                       ` Will Deacon
2015-11-17 21:01                         ` Paul E. McKenney
2015-11-18 11:25                           ` Will Deacon
2015-11-19 18:01                             ` Will Deacon
2015-11-20 10:09                               ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151112135024.GQ3972@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=boqun.feng@gmail.com \
    --cc=corbet@lwn.net \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.