Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
From: Linus Torvalds
To: Will Deacon
Cc: Peter Zijlstra, Boqun Feng, Oleg Nesterov, Ingo Molnar,
    Linux Kernel Mailing List, Paul McKenney, Jonathan Corbet,
    Michal Hocko, David Howells, Michael Ellerman,
    Benjamin Herrenschmidt, Paul Mackerras
Date: Mon, 16 Nov 2015 13:58:49 -0800
In-Reply-To: <20151116162452.GD1999@arm.com>

On Mon, Nov 16, 2015 at 8:24 AM, Will Deacon wrote:
>
> ... or we upgrade spin_unlock_wait to a LOCK operation, which might be
> slightly cheaper than spin_lock()+spin_unlock().

So traditionally the real concern has been the cacheline ping-pong
part of spin_unlock_wait(). I think adding a memory barrier (one that
just forces ordering, not any exclusive cacheline states) to it is
fine, but I don't think we necessarily want it to have to get the
cacheline into exclusive state.

Because if spin_unlock_wait() ends up having to get the spinlock
cacheline anyway (for example, by writing the same value back with a
SC), I don't think spin_unlock_wait() will really be all that much
cheaper than just getting the spinlock, and in that case we shouldn't
play complicated ordering games.

On another issue: I'm also looking at the ARM documentation for stxr,
and the _documentation_ says that it has no stronger ordering than a
"store release", but I'm starting to wonder if that is actually true.
Because I do end up thinking that it does have the same "control
dependency" to all subsequent writes (but not reads). So reads after
the SC can percolate up, but I think writes are restricted.

Why? In order for the SC to be able to return success, the write
itself may not have actually been done yet, but the cacheline for the
write must have successfully been turned into exclusive ownership.
Agreed?

That means that by the time a SC returns success, no other CPU can
see the old value of the spinlock any more. So by the time any
subsequent stores in the locked region can be visible to any other
CPUs, the locked value of the lock itself has to be visible too.
Agreed?

So I think that in effect, when a spinlock is implemented with LL/SC,
the loads inside the locked region are only ordered wrt the acquire
on the LL, but the stores can be considered ordered wrt the SC. No?

So I think a _successful_ SC is still more ordered than just any
random store with release consistency.
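Roughly, in code: here's a sketch of the general shape of such an
LL/SC acquire loop on arm64, with the ordering argument in the
comments. The ll_sc_lock() helper is made up for illustration, this
is not the actual arch/arm64 locking code:

    /* Sketch only: 0 == unlocked, nonzero == held. */
    static inline void ll_sc_lock(int *lock)
    {
        int old, fail;

        asm volatile(
        /* LL with acquire semantics: this is what orders the
         * loads inside the locked region. */
        "1: ldaxr   %w0, [%2]\n"
        /* Lock word nonzero: somebody holds it, spin. */
        "   cbnz    %w0, 1b\n"
        /* SC of the locked value: no release semantics at all,
         * but success proves we held the line exclusively. */
        "   stxr    %w1, %w3, [%2]\n"
        /* SC failed (we lost exclusivity): retry from the LL. */
        "   cbnz    %w1, 1b\n"
        : "=&r" (old), "=&r" (fail)
        : "r" (lock), "r" (1)
        : "memory");
    }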
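And to make the spin_unlock_wait() tradeoff at the top concrete,
here's a minimal user-space sketch of the two options, with made-up
names (sketch_lock, unlock_wait_rdonly(), unlock_wait_locked()) and a
made-up lock word rather than anything the kernel actually does:

    #include <stdatomic.h>

    struct sketch_lock { atomic_int val; };  /* 0 == unlocked */

    /* Option 1: read-only wait plus ordering. Waiters only load
     * the lock word, so the cacheline can stay in shared state on
     * every CPU: we get ordering, but no ping-pong. */
    static void unlock_wait_rdonly(struct sketch_lock *l)
    {
        while (atomic_load_explicit(&l->val, memory_order_acquire))
            /* spin on a plain load */;
    }

    /* Option 2: "upgrade" the wait to a real LOCK operation. The
     * successful write forces exclusive ownership of the line,
     * which is exactly the cost in question. */
    static void unlock_wait_locked(struct sketch_lock *l)
    {
        int expected = 0;

        while (!atomic_compare_exchange_weak_explicit(&l->val,
                &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            expected = 0;  /* CAS overwrote it on failure */
        atomic_store_explicit(&l->val, 0, memory_order_release);
    }

The read-only version never dirties the line, which is its whole
attraction; the moment you have to write (CAS, SC, whatever), you've
paid for exclusive ownership anyway.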
Of course, I'm not sure that SC store ordering actually *helps* us,
because I think the problem tends to be loads in the locked region
moving up earlier than the actual store that sets the lock, but maybe
it makes some difference.

                 Linus