From: Peter Zijlstra <peterz@infradead.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	manfred@colorfullife.com, Waiman.Long@hpe.com, mingo@kernel.org,
	torvalds@linux-foundation.org, ggherdovich@suse.com,
	mgorman@techsingularity.net, linux-kernel@vger.kernel.org,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Will Deacon <will.deacon@arm.com>
Subject: Re: sem_lock() vs qspinlocks
Date: Fri, 20 May 2016 17:21:49 +0200
Message-ID: <20160520152149.GH3193@twins.programming.kicks-ass.net>
In-Reply-To: <20160520140533.GA20726@insomnia>

On Fri, May 20, 2016 at 10:05:33PM +0800, Boqun Feng wrote:
> On Fri, May 20, 2016 at 01:58:19PM +0200, Peter Zijlstra wrote:
> > On Thu, May 19, 2016 at 10:39:26PM -0700, Davidlohr Bueso wrote:
> > > As such, the following restores the behavior of the ticket locks and 'fixes'
> > > (or hides?) the bug in sems. Naturally, it is an incorrect approach:
> > > 
> > > @@ -290,7 +290,8 @@ static void sem_wait_array(struct sem_array *sma)
> > > 
> > > 	for (i = 0; i < sma->sem_nsems; i++) {
> > > 		sem = sma->sem_base + i;
> > > -               spin_unlock_wait(&sem->lock);
> > > +               while (atomic_read(&sem->lock))
> > > +                       cpu_relax();
> > > 	}
> > > 	ipc_smp_acquire__after_spin_is_unlocked();
> > > }
> > 
> > The actual bug is clear_pending_set_locked() not having acquire
> > semantics. And the above 'fixes' things because it will observe the old
> > pending bit or the locked bit, so it doesn't matter if the store
> > flipping them is delayed.
> > 
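FWIW, in the generic qspinlock code that store is (roughly from memory,
for the _Q_PENDING_BITS == 8 layout):

	static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
	{
		struct __qspinlock *l = (void *)lock;

		/* plain store; nothing orders it against anyone else's loads */
		WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL);
	}
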
> > The comment in queued_spin_lock_slowpath() above the smp_cond_acquire()
> > states that the acquire there is sufficient, but this is incorrect in
> > the face of spin_is_locked()/spin_unlock_wait() usage that only looks
> > at the lock byte.
> > 
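The comment and wait loop in question being, roughly:

	/*
	 * ... this wait loop must use a load-acquire such that we match the
	 * store-release that clears the locked bit and create lock
	 * sequentiality ...
	 */
	smp_cond_acquire(!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK));

That acquire orders the waiter's own later accesses; it does nothing
for someone else merely looking at the lock word.
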
> > The problem is that clear_pending_set_locked() is an unordered store;
> > it can therefore be delayed, and is only guaranteed to be visible by
> > the time of the spin_unlock() (which orders against it through the
> > address dependency).
> > 
> > This opens numerous races; for example:
> > 
> > 	ipc_lock_object(&sma->sem_perm);
> > 	sem_wait_array(sma);
> > 
> > 				spin_is_locked(&sma->sem_perm.lock) -> false
> > 
> > is entirely possible, because sem_wait_array() consists of pure reads,
> > so the store can be reordered past all of them, even on x86.
> > 
> > The below 'hack' seems to solve the problem.
> > 
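The hack being, roughly -- reconstructed, so take it as a sketch rather
than the actual diff -- a full barrier after the unordered store in
queued_spin_lock_slowpath():

	 	clear_pending_set_locked(lock);
	+	smp_mb();
	 	goto release;
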
> > _However_ this also means the atomic_cmpxchg_relaxed() in the locked:
> > branch is equally wrong -- although not visible on x86. And note that
> > atomic_cmpxchg_acquire() would not in fact be sufficient either, since
> > the acquire is on the LOAD not the STORE of the LL/SC.
> > 
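To illustrate that last point, an LL/SC implementation of
cmpxchg_acquire() has the shape (hypothetical load_linked_acquire()/
store_conditional() primitives, not any particular architecture):

	u32 old_val;
	do {
		old_val = load_linked_acquire(&lock->val);	/* ACQUIRE is here, on the LOAD */
		if (old_val != old)
			break;
	} while (!store_conditional(&lock->val, new));		/* the STORE itself is unordered */

so everything after is ordered against the LOAD, but nothing keeps the
STORE from being delayed -- and the store is the very thing we need to
be visible.
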
> > I need a break of sorts, because after twisting my head around the sem
> > code and then the qspinlock code I'm wrecked. I'll try and make a proper
> > patch if people can indeed confirm my thinking here.
> > 
> 
> I think your analysis is right, however, the problem only exists if we
> have the following use pattern, right?
> 
> 	CPU 0			CPU 1
> 	====================	==================
> 	spin_lock(A);		spin_lock(B);
> 	spin_unlock_wait(B);	spin_unlock_wait(A);
> 	do_something();		do_something();

More or less, yes. The semaphore code looks like:

	spin_lock(A)		spin_lock(B)
	spin_unlock_wait(B)	spin_is_locked(A)

which shows that both spin_is_locked() and spin_unlock_wait() are in the
same class: both merely load the lock word, and neither orders the
caller's own earlier store.

> , which can end up with CPU 0 and CPU 1 both running do_something(). And
> actually this can be fixed simply by adding an smp_mb() between
> spin_lock() and spin_unlock_wait() on both CPUs, or by adding an smp_mb()
> in spin_unlock_wait() as PPC does in 51d7d5205d338 ("powerpc: Add
> smp_mb() to arch_spin_is_locked()").

Right, and arm64 does in d86b8da04dfa. Curiously, you only fixed
spin_is_locked() and Will only fixed spin_unlock_wait(), while AFAIU we
need to have _BOTH_ fixed.

Now, looking at the PPC code, spin_unlock_wait() as per
arch/powerpc/lib/locks.c actually does include the extra smp_mb().
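
Simplified (from memory; the real thing also has the SHARED_PROCESSOR
yield dance in the loop):

	void arch_spin_unlock_wait(arch_spinlock_t *lock)
	{
		smp_mb();	/* order the caller's prior stores against these loads */

		while (!arch_spin_value_unlocked(*lock))
			cpu_relax();

		smp_mb();	/* and the loads against whatever follows */
	}

and 51d7d5205d338 gave arch_spin_is_locked() the same leading smp_mb().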

> So if relaxed/acquire atomics and clear_pending_set_locked() work fine
> in other situations, a proper fix would be to fix
> spin_is_locked()/spin_unlock_wait() themselves, or their users?

Right; the relaxed stores work fine for the 'regular' mutually
exclusive critical-section usage of locks. And yes, I think only the
pattern you outlined cares about this.

Let me write a patch..
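
Something along these lines -- a sketch of the direction only, not the
actual patch:

	static inline void queued_spin_unlock_wait(struct qspinlock *lock)
	{
		/*
		 * Order the caller's own prior lock acquisition -- including
		 * the unordered clear_pending_set_locked() store -- against
		 * the loads below.
		 */
		smp_mb();

		while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
			cpu_relax();

		smp_rmb();	/* ctrl dependency + rmb gives us the ACQUIRE */
	}

modulo whatever the pending bit does to this, which needs more thought.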
