Date: Mon, 5 Sep 2016 03:37:14 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: Linus Torvalds, Will Deacon, Oleg Nesterov, Benjamin Herrenschmidt,
    Michael Ellerman, linux-kernel@vger.kernel.org, Nicholas Piggin,
    Ingo Molnar, Alan Stern
Subject: Re: Question on smp_mb__before_spinlock
Message-Id: <20160905103714.GZ3663@linux.vnet.ibm.com>
In-Reply-To: <20160905093753.GN10138@twins.programming.kicks-ass.net>

On Mon, Sep 05, 2016 at 11:37:53AM +0200, Peter Zijlstra wrote:
> Hi all,
> 
> So recently I've had two separate issues that touched upon
> smp_mb__before_spinlock().
> 
> Since its inception, our understanding of ACQUIRE, esp. as applied to
> spinlocks, has changed somewhat. Also, I wonder if, with a simple
> change, we cannot make it provide more.
> 
> The problem with the comment is that the STORE done by spin_lock isn't
> itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
> it and cross with any prior STORE, rendering the default WMB
> insufficient (pointed out by Alan).
> 
> Now, this is only really a problem on PowerPC and ARM64; the former
> already defines smp_mb__before_spinlock() as smp_mb(), the latter does
> not. Will?
> 
> The second issue I wondered about is spinlock transitivity. All except
> powerpc have RCsc locks, and since Power already does a full mb, would
> it not make sense to put it _after_ the spin_lock(), which would provide
> the same guarantee but also upgrade the section to RCsc?
> 
> That would make all schedule() calls fully transitive against one
> another.
> 
> That is, would something like the below make sense?

Looks to me like you have reinvented smp_mb__after_unlock_lock()...

							Thanx, Paul

> (does not deal with mm_types.h and overlayfs using
> smp_mb__before_spinlock().)
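
For reference, a minimal sketch of how that existing primitive is meant to be
used (the locks, variables and function name below are made up, this is not
from the patch): placed right after a lock acquisition, it promotes the prior
UNLOCK plus this LOCK to a full barrier, expanding to smp_mb() on powerpc and
to nothing elsewhere; IIRC it is private to RCU these days, and it only gives
full ordering when the UNLOCK and LOCK are on the same CPU or the same lock,
which holds here:

static DEFINE_SPINLOCK(lock_a);
static DEFINE_SPINLOCK(lock_b);
static int shared_a, shared_b;

static void unlock_lock_example(void)
{
	int r;

	spin_lock(&lock_a);
	WRITE_ONCE(shared_a, 1);
	spin_unlock(&lock_a);

	spin_lock(&lock_b);
	smp_mb__after_unlock_lock();	/* UNLOCK + LOCK now acts as smp_mb() */
	r = READ_ONCE(shared_b);	/* ordered after the store to shared_a */
	spin_unlock(&lock_b);
	(void)r;
}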
> 
> ---
>  arch/arm64/include/asm/barrier.h   |  2 ++
>  arch/powerpc/include/asm/barrier.h |  2 +-
>  include/linux/spinlock.h           | 41 +++++++++++++++++++++++++++++---------
>  kernel/sched/core.c                |  5 +++--
>  4 files changed, 38 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
> index 4eea7f618dce..d5cc8b58f942 100644
> --- a/arch/arm64/include/asm/barrier.h
> +++ b/arch/arm64/include/asm/barrier.h
> @@ -104,6 +104,8 @@ do {								\
>  	VAL;							\
>  })
> 
> +#define smp_mb__after_spinlock()	smp_mb()
> +
>  #include <asm-generic/barrier.h>
> 
>  #endif	/* __ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index c0deafc212b8..23d64d7196b7 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -74,7 +74,7 @@ do {								\
>  	___p1;							\
>  })
> 
> -#define smp_mb__before_spinlock()	smp_mb()
> +#define smp_mb__after_spinlock()	smp_mb()
> 
>  #include <asm-generic/barrier.h>
> 
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 47dd0cebd204..284616dad607 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -118,16 +118,39 @@ do {						\
>  #endif
> 
>  /*
> - * Despite its name it doesn't necessarily has to be a full barrier.
> - * It should only guarantee that a STORE before the critical section
> - * can not be reordered with LOADs and STOREs inside this section.
> - * spin_lock() is the one-way barrier, this LOAD can not escape out
> - * of the region. So the default implementation simply ensures that
> - * a STORE can not move into the critical section, smp_wmb() should
> - * serialize it with another STORE done by spin_lock().
> + * This barrier must provide two things:
> + *
> + *   - it must guarantee a STORE before the spin_lock() is ordered against a
> + *     LOAD after it, see the comments at its two usage sites.
> + *
> + *   - it must ensure the critical section is RCsc.
> + *
> + * The latter is important for cases where we observe values written by other
> + * CPUs in spin-loops, without barriers, while being subject to scheduling.
> + *
> + *   CPU0			CPU1			CPU2
> + *
> + *			for (;;) {
> + *			  if (READ_ONCE(X))
> + *			    break;
> + *			}
> + * X=1
> + *			<sched-out>
> + *						<sched-in>
> + *						r = X;
> + *
> + * without transitivity it could be that CPU1 observes X!=0, breaks the loop,
> + * we get migrated and CPU2 sees X==0.
> + *
> + * Since most load-store architectures implement ACQUIRE with an smp_mb() after
> + * the LL/SC loop, they need no further barriers. Similarly all our TSO
> + * architectures imply an smp_mb() for each atomic instruction and equally don't
> + * need more.
> + *
> + * Architectures that can implement ACQUIRE better need to take care.
>   */
> -#ifndef smp_mb__before_spinlock
> -#define smp_mb__before_spinlock()	smp_wmb()
> +#ifndef smp_mb__after_spinlock
> +#define smp_mb__after_spinlock()	do { } while (0)
>  #endif
> 
>  /**
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 556cb07ab1cf..b151a33d393b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2006,8 +2006,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	 * reordered with p->state check below. This pairs with mb() in
>  	 * set_current_state() the waiting thread does.
>  	 */
> -	smp_mb__before_spinlock();
>  	raw_spin_lock_irqsave(&p->pi_lock, flags);
> +	smp_mb__after_spinlock();
>  	if (!(p->state & state))
>  		goto out;
> 
> @@ -3332,8 +3332,9 @@ static void __sched notrace __schedule(bool preempt)
>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
>  	 * done by the caller to avoid the race with signal_wake_up().
>  	 */
> -	smp_mb__before_spinlock();
>  	raw_spin_lock(&rq->lock);
> +	smp_mb__after_spinlock();
> +
>  	cookie = lockdep_pin_lock(&rq->lock);
> 
>  	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
> 
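
To make the pairing at the two usage sites concrete, here is a hedged sketch
of the classic sleep/wakeup pattern (sleeper(), waker() and the cond flag are
made up for illustration): the waker's STORE to the condition must not be
reordered with its LOAD of p->state, which is what the full barrier right
after taking ->pi_lock provides, and it pairs with the smp_mb() implied by
set_current_state() on the sleeping side.

static int cond;

static void sleeper(void)
{
	set_current_state(TASK_INTERRUPTIBLE);	/* STORE p->state, then smp_mb() */
	if (!READ_ONCE(cond))			/* LOAD cond */
		schedule();
	__set_current_state(TASK_RUNNING);
}

static void waker(struct task_struct *p)
{
	WRITE_ONCE(cond, 1);	/* STORE cond ... */
	wake_up_process(p);	/* ... must be ordered before the LOAD of
				 * p->state done under ->pi_lock, hence the
				 * smp_mb__after_spinlock() in
				 * try_to_wake_up() above. */
}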
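
And, if I read the intent right, a sketch of the transitivity case from the
spinlock.h comment (X and the two functions are made up): a task busy-waits
for X, sees it become non-zero, schedules away and may resume on another CPU;
upgrading the rq->lock section to RCsc is what guarantees the new CPU also
observes the store to X.

static int X;

static void writer(void)		/* CPU0 */
{
	WRITE_ONCE(X, 1);
}

static void spinner(void)		/* starts on CPU1 */
{
	while (!READ_ONCE(X))
		cpu_relax();

	schedule();			/* may migrate us to CPU2 */

	/*
	 * With RCsc scheduler locks (smp_mb__after_spinlock() after
	 * raw_spin_lock(&rq->lock) in __schedule()), this is guaranteed
	 * to see X == 1; without transitivity it could still observe
	 * X == 0 on the new CPU.
	 */
	WARN_ON(READ_ONCE(X) != 1);
}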