From: Måns Rullgård
To: Peter Zijlstra
Cc: ralf@linux-mips.org, ddaney@caviumnetworks.com, linux-kernel@vger.kernel.org,
    Paul McKenney, Will Deacon, torvalds@linux-foundation.org, boqun.feng@gmail.com
Subject: Re: [RFC][PATCH] mips: Fix arch_spin_unlock()
Date: Thu, 12 Nov 2015 13:31:11 +0000
References: <20151112123123.GZ17308@twins.programming.kicks-ass.net>
In-Reply-To: <20151112123123.GZ17308@twins.programming.kicks-ass.net>
    (Peter Zijlstra's message of "Thu, 12 Nov 2015 13:31:23 +0100")

Peter Zijlstra writes:

> Hi
>
> I think the MIPS arch_spin_unlock() is borken.
>
> spin_unlock() must have RELEASE semantics, these require that no LOADs
> nor STOREs leak out from the critical section.
>
> From what I know MIPS has a relaxed memory model which allows reads to
> pass stores, and as implemented arch_spin_unlock() only issues a wmb
> which doesn't order prior reads vs later stores.

This is correct.

> Therefore upgrade the wmb() to smp_mb().
>
> (Also, why the unconditional wmb, as opposed to smp_wmb() ?)

Good question.

The current MIPS asm/barrier.h uses a plain SYNC instruction for all
kinds of barriers (except on Cavium Octeon), which is a bit wasteful.
A MIPS implementation can optionally support partial barriers (load,
store, acquire, release), which all behave like a full barrier if not
implemented, so those really ought to be used.

> Maybe-Signed-off-by: Peter Zijlstra (Intel)
> ---
> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> index 40196bebe849..b2ca13f06152 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
>  	unsigned int serving_now = lock->h.serving_now + 1;
> -	wmb();
> +	smp_mb();
>  	lock->h.serving_now = (u16)serving_now;
>  	nudge_writes();
>  }

All this weirdness was added in commit 500c2e1f:

    MIPS: Optimize spinlocks.

    The current locking mechanism uses a ll/sc sequence to release a
    spinlock.  This is slower than a wmb() followed by a store to unlock.

    The branching forward to .subsection 2 on sc failure slows down the
    contended case.  So we get rid of that part too.

    Since we are now working on naturally aligned u16 values, we can get
    rid of a masking operation as the LHU already does the right thing.
    The ANDI are reversed for better scheduling on multi-issue CPUs.

    On a 12 CPU 750MHz Octeon cn5750 this patch improves ipv4 UDP packet
    forwarding rates from 3.58*10^6 PPS to 3.99*10^6 PPS, or about 11%.

    Signed-off-by: David Daney
    To: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/937/
    Signed-off-by: Ralf Baechle

-- 
Måns Rullgård
mans@mansr.com
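
To make the partial-barrier suggestion concrete, here is a minimal sketch of
what the unlock path might look like if it used the architecturally defined
SYNC "stype" encodings rather than a bare SYNC or smp_mb().  The names
__sync_stype, smp_release_barrier and arch_spin_unlock_sketch are made up for
illustration and are not proposed kernel interfaces; the stype values (0x11
acquire, 0x12 release) come from the MIPS32/MIPS64 Release 2 architecture
definition, where an unimplemented stype must behave as a full SYNC, so the
sketch is never weaker than a full barrier.

    /*
     * Illustrative sketch only, not the patch under discussion.
     * A release barrier (SYNC 0x12) orders all older loads and stores
     * before younger stores, which is exactly what the store to
     * serving_now needs; an unimplemented stype falls back to SYNC 0.
     */
    #define __sync_stype(stype)					\
    	__asm__ __volatile__(					\
    		".set	push\n\t"				\
    		".set	mips32r2\n\t"				\
    		"sync	" #stype "\n\t"				\
    		".set	pop"					\
    		: : : "memory")

    /* Hypothetical helper: release semantics for an unlock-by-store. */
    #define smp_release_barrier()	__sync_stype(0x12)

    static inline void arch_spin_unlock_sketch(arch_spinlock_t *lock)
    {
    	unsigned int serving_now = lock->h.serving_now + 1;

    	/*
    	 * Unlike wmb(), this also orders the loads done inside the
    	 * critical section before the serving_now store that releases
    	 * the lock, giving spin_unlock() its RELEASE semantics without
    	 * paying for a full SYNC on CPUs that implement stype 0x12.
    	 */
    	smp_release_barrier();
    	lock->h.serving_now = (u16)serving_now;
    	nudge_writes();
    }

The matching acquire side would use SYNC 0x11 after the ticket is obtained in
arch_spin_lock(); on CPUs that implement only stype 0, both collapse back to
the full barrier the code uses today.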