From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030749AbbD1SY0 (ORCPT );
	Tue, 28 Apr 2015 14:24:26 -0400
Received: from casper.infradead.org ([85.118.1.10]:54657 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1030476AbbD1SYV (ORCPT );
	Tue, 28 Apr 2015 14:24:21 -0400
Date: Tue, 28 Apr 2015 20:24:10 +0200
From: Peter Zijlstra
To: Chris Metcalf
Cc: "Paul E. McKenney" , Manfred Spraul , Oleg Nesterov ,
	Kirill Tkhai , linux-kernel@vger.kernel.org, Ingo Molnar ,
	Josh Poimboeuf
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles
Message-ID: <20150428182410.GM5029@twins.programming.kicks-ass.net>
References: <54E77CC0.5030401@colorfullife.com>
	<20150220184551.GQ2896@worktop.programming.kicks-ass.net>
	<20150425195602.GA26676@linux.vnet.ibm.com>
	<20150426105213.GA27191@linux.vnet.ibm.com>
	<20150428143357.GF23123@twins.programming.kicks-ass.net>
	<553FACF1.2020405@ezchip.com>
	<20150428164033.GJ5029@twins.programming.kicks-ass.net>
	<553FBC4F.3070104@ezchip.com>
	<20150428174323.GL5029@twins.programming.kicks-ass.net>
	<553FCAD0.9090403@ezchip.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <553FCAD0.9090403@ezchip.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 28, 2015 at 02:00:48PM -0400, Chris Metcalf wrote:
> Yes, tilepro can do 16-bit atomic load/stores. The reason we didn't use
> your approach (basically having tns provide locking for the head/tail)
> is just a perceived efficiency gain from rolling the tns lock into the head.
>
> The current tilepro arch_spin_lock() is just three mesh network transactions
> (tns, store, load). Your proposed spin lock is five (tns, load, store,
> store, load).
> Or, looking at it from a core-centric perspective, the current
> arch_spin_lock() only has to wait on requests from the mesh network twice
> (tns, load), basically once for each member of the lock structure; your
> proposed version is three (tns, load, load).
>
> I don't honestly know how critical this difference is, but that's why I
> designed it the way I did.

Makes sense. Good reason ;-)

> I think your goal with your proposed redesign is being able to atomically
> read head and tail together for arch_spin_unlock_wait(), but I don't see
> why that's better than just reading head, checking it's not equal to tail
> with a separate read, then spinning waiting for head to change.

Right, that should be perfectly fine indeed (a rough sketch of such an
unlock_wait is appended after the quoted code below).

A few questions:

> >static inline void arch_spin_lock(arch_spinlock_t *lock)
> >{
> >	unsigned short head, tail;
> >
> >	___tns_lock(&lock->lock); /* XXX does the TNS imply a ___sync? */

Does it? Something needs to provide the ACQUIRE semantics.

> >	head = lock->head;
> >	lock->head++;
> >	___tns_unlock(&lock->lock);
> >
> >	while (READ_ONCE(lock->tail) != head)
> >		cpu_relax();
> >}
> >
> >static inline void arch_spin_unlock(arch_spinlock_t *lock)
> >{
> >	/*
> >	 * can do with regular load/store because the lock owner
> >	 * is the only one going to do stores to the tail
> >	 */
> >	unsigned short tail = READ_ONCE(lock->tail);
> >	smp_mb(); /* MB is stronger than RELEASE */

Note that your code uses wmb(); wmb() is strictly speaking not correct,
as it's weaker than RELEASE.
_However_, it doesn't make any practical difference, since all three
barriers end up emitting __sync(), so it's not a bug per se.

> >	WRITE_ONCE(lock->tail, tail + 1);
> >}
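
For completeness, here is a rough sketch of the unlock_wait variant discussed
above, written against the same head/tail layout as the code quoted in this
mail. It is purely illustrative (not the existing tile code), it assumes the
usual READ_ONCE()/cpu_relax() kernel primitives, and it leaves aside the
barrier/ACQUIRE questions raised elsewhere in this thread. Note that with
this naming the counter to watch is tail, since that is the one the unlock
path advances:

/* Hypothetical layout, matching the proposal quoted above. */
typedef struct {
	unsigned short lock;	/* tns word guarding updates to head */
	unsigned short head;	/* next ticket to hand out */
	unsigned short tail;	/* ticket currently being served */
} arch_spinlock_t;

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	unsigned short tail = READ_ONCE(lock->tail);

	/* head == tail: nobody holds the lock, nothing to wait for. */
	if (READ_ONCE(lock->head) == tail)
		return;

	/*
	 * Someone holds (or just grabbed) the lock; wait until the
	 * serving counter moves past the value we sampled, i.e. until
	 * that owner has done arch_spin_unlock().
	 */
	while (READ_ONCE(lock->tail) == tail)
		cpu_relax();
}

Two separate reads are enough here because unlock_wait() only needs to know
that whoever held the lock when we looked has since released it; it does not
need a single consistent snapshot of head and tail.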