Date: Thu, 12 Nov 2015 10:21:39 -0800
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
From: Linus Torvalds
To: Boqun Feng
Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar,
    Linux Kernel Mailing List, Paul McKenney, Jonathan Corbet,
    Michal Hocko, David Howells, Will Deacon, Michael Ellerman,
    Benjamin Herrenschmidt, Paul Mackerras

On Wed, Nov 11, 2015 at 11:14 PM, Boqun Feng wrote:
>
> Hmm.. probably incorrect.. because the ACQUIRE semantics of
> spin_lock() only guarantees that the memory operations following
> spin_lock() can't be reordered before the *LOAD* part of
> spin_lock(), not the *STORE* part, i.e. the case below can happen
> (assuming the spin_lock() is implemented as an ll/sc loop)
>
>     spin_lock(&lock):
>       r1 = *lock;             // LL, r1 == 0
>       o = READ_ONCE(object);  // could be reordered here.
>       *lock = 1;              // SC

It may be worth noting that at least in theory, not only reads may
pass the store. If the spin-lock is done as

    r1 = *lock    // read-acquire, r1 == 0
    *lock = 1     // SC

then even *writes* inside the locked region might pass up through the
"*lock = 1". So other CPUs - ones that haven't taken the spinlock -
could see the modifications inside the critical region before they
actually see the lock itself change.

Now, the point of spin_unlock_wait() (and "spin_is_locked()") should
generally be that you have some external ordering guarantee that the
lock has been taken. For example, for the IPC semaphores, we do
either one of:

 (a) get the large lock, then - once you hold that lock - wait for
     each small lock

or

 (b) get a small lock, then - once you hold that lock - check that
     the large lock is unlocked

and that's the case we should really worry about. The other uses of
spin_unlock_wait() should have similar "I have other reasons to know
the lock was taken, or will never be taken after this, because XYZ"
guarantees.

This is why powerpc has a memory barrier in "arch_spin_is_locked()".
Exactly so that the "check that the other lock is unlocked" is
guaranteed to be ordered wrt the store that gets the first lock.

It looks like ARM64 gets this wrong and is fundamentally buggy wrt
"spin_is_locked()" (and, as a result, "spin_unlock_wait()").

BUT! And this is a big BUT: it should be noted that that is purely an
ARM64 bug. Not a bug in our users. If you have a spinlock where the
"get lock write" part of the lock can be delayed, then you have to
have an "arch_spin_is_locked()" that has the proper memory barriers.
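To make that concrete, here is a litmus-test-style sketch of how case
(b) can go wrong (the lock names are made up, standing in for the
ipc/sem.c array-wide and per-semaphore locks, and I'm assuming an
LL/SC spin_lock() whose store-conditional is release-only):

    spinlock_t small, large;

    /* CPU0: case (b) - take the small lock, check the large one */
    spin_lock(&small);             /* LL sees small == 0 ... */
    r0 = spin_is_locked(&large);   /* ... but this load can execute
                                      before the SC that publishes
                                      small = 1 */

    /* CPU1: case (a) - take the large lock, wait on the small one */
    spin_lock(&large);
    r1 = spin_is_locked(&small);   /* the spin_unlock_wait()-style
                                      check: CPU0's SC may not be
                                      visible yet, so this can still
                                      see small == 0 */

This is just the classic store-buffering pattern: a store followed by
a load of the other location on each side. Without a full barrier
between them, "r0 == 0 && r1 == 0" is a legal outcome, and then both
CPUs think they own the critical section.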
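And the fix is what powerpc does - roughly this, paraphrased from
memory rather than quoted from arch/powerpc/include/asm/spinlock.h,
so check the real source:

    static inline int arch_spin_is_locked(arch_spinlock_t *lock)
    {
            /* order the prior lock-acquiring store against the
               read of the lock word below */
            smp_mb();
            return !arch_spin_value_unlocked(*lock);
    }

The smp_mb() is the full barrier that a release-only SC doesn't give
you, so the "is the other lock held?" read can't float up past the
store that took our own lock.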
Of course, ARM still hides their architecture manuals in odd places,
so I can't double-check. But afaik, the ARM64 store-conditional is
"store exclusive with release", it has only release semantics, and so
ARM64 really does have the above bug.

On that note: can anybody point me to the latest ARM64 8.1
architecture manual in pdf form, without the "you have to register"
crap? I thought ARM released it, but all my googling just points to
the idiotic ARM service center that wants me to sign away something
just to see the docs. Which I don't do.

                  Linus