From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030749AbbD1SY0 (ORCPT );
	Tue, 28 Apr 2015 14:24:26 -0400
Received: from casper.infradead.org ([85.118.1.10]:54657 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1030476AbbD1SYV (ORCPT );
	Tue, 28 Apr 2015 14:24:21 -0400
Date: Tue, 28 Apr 2015 20:24:10 +0200
From: Peter Zijlstra
To: Chris Metcalf
Cc: "Paul E. McKenney" , Manfred Spraul , Oleg Nesterov ,
	Kirill Tkhai , linux-kernel@vger.kernel.org, Ingo Molnar ,
	Josh Poimboeuf
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles
Message-ID: <20150428182410.GM5029@twins.programming.kicks-ass.net>
References: <54E77CC0.5030401@colorfullife.com>
	<20150220184551.GQ2896@worktop.programming.kicks-ass.net>
	<20150425195602.GA26676@linux.vnet.ibm.com>
	<20150426105213.GA27191@linux.vnet.ibm.com>
	<20150428143357.GF23123@twins.programming.kicks-ass.net>
	<553FACF1.2020405@ezchip.com>
	<20150428164033.GJ5029@twins.programming.kicks-ass.net>
	<553FBC4F.3070104@ezchip.com>
	<20150428174323.GL5029@twins.programming.kicks-ass.net>
	<553FCAD0.9090403@ezchip.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <553FCAD0.9090403@ezchip.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 28, 2015 at 02:00:48PM -0400, Chris Metcalf wrote:
> Yes, tilepro can do 16-bit atomic load/stores. The reason we didn't use
> your approach (basically having tns provide locking for the head/tail)
> is just a perceived efficiency gain from rolling the tns lock into the head.
>
> The current tilepro arch_spin_lock() is just three mesh network transactions
> (tns, store, load). Your proposed spin lock is five (tns, load, store,
> store, load).
> Or, looking at it from a core-centric perspective, the current
> arch_spin_lock() only has to wait on requests from the mesh network twice
> (tns, load), basically once for each member of the lock structure; your
> proposed version is three (tns, load, load).
>
> I don't honestly know how critical this difference is, but that's why I
> designed it the way I did.

Makes sense. Good reason ;-)

> I think your goal with your proposed redesign is being able to atomically
> read head and tail together for arch_spin_unlock_wait(), but I don't see
> why that's better than just reading head, checking it's not equal to tail
> with a separate read, then spinning waiting for head to change.

Right, that should be perfectly fine indeed (a rough sketch of such an
unlock_wait is appended after the quoted code below).

A few questions:

> >static inline void arch_spin_lock(arch_spinlock_t *lock)
> >{
> >	unsigned short head, tail;
> >
> >	___tns_lock(&lock->lock); /* XXX does the TNS imply a ___sync? */

Does it? Something needs to provide the ACQUIRE semantics.

> >	head = lock->head;
> >	lock->head++;
> >	___tns_unlock(&lock->lock);
> >
> >	while (READ_ONCE(lock->tail) != head)
> >		cpu_relax();
> >}
> >
> >static inline void arch_spin_unlock(arch_spinlock_t *lock)
> >{
> >	/*
> >	 * can do with regular load/store because the lock owner
> >	 * is the only one going to do stores to the tail
> >	 */
> >	unsigned short tail = READ_ONCE(lock->tail);
> >	smp_mb(); /* MB is stronger than RELEASE */

Note that your code uses wmb(); wmb() is strictly speaking not correct,
as it's weaker than RELEASE.
_However_, it doesn't make any practical difference, since all three
barriers end up emitting __sync(), so it's not a bug per se.

> >	WRITE_ONCE(lock->tail, tail + 1);
> >}
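
For completeness, here is a rough sketch of the unlock_wait variant discussed
above, written against the same head/tail layout as the code quoted in this
mail. It is purely illustrative (not the existing tile code), it assumes the
usual READ_ONCE()/cpu_relax() kernel primitives, and it leaves aside the
barrier/ACQUIRE questions raised elsewhere in this thread. Note that with
this naming the counter to watch is tail, since that is the one the unlock
path advances:

/* Hypothetical layout, matching the proposal quoted above. */
typedef struct {
	unsigned short lock;	/* tns word guarding updates to head */
	unsigned short head;	/* next ticket to hand out */
	unsigned short tail;	/* ticket currently being served */
} arch_spinlock_t;

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	unsigned short tail = READ_ONCE(lock->tail);

	/* head == tail: nobody holds the lock, nothing to wait for. */
	if (READ_ONCE(lock->head) == tail)
		return;

	/*
	 * Someone holds (or just grabbed) the lock; wait until the
	 * serving counter moves past the value we sampled, i.e. until
	 * that owner has done arch_spin_unlock().
	 */
	while (READ_ONCE(lock->tail) == tail)
		cpu_relax();
}

Two separate reads are enough here because unlock_wait() only needs to know
that whoever held the lock when we looked has since released it; it does not
need a single consistent snapshot of head and tail.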