Date: Thu, 2 Nov 2017 18:45:04 +0100
From: Andrea Parri
To: Alan Stern
Cc: Peter Zijlstra, "Reshetova, Elena", linux-kernel@vger.kernel.org,
	gregkh@linuxfoundation.org, keescook@chromium.org, tglx@linutronix.de,
	mingo@redhat.com, ishkamiel@gmail.com, Will Deacon, Paul McKenney,
	boqun.feng@gmail.com, dhowells@redhat.com, david@fromorbit.com
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
Message-ID: <20171102174504.GA19833@andrea>
References: <20171102160237.t2xkryg6joskf77y@hirez.programming.kicks-ass.net>

On Thu, Nov 02, 2017 at 01:08:52PM -0400, Alan Stern wrote:
> On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> 
> > On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > > 
> > > > > Lock functions such as refcount_dec_and_lock() &
> > > > > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > > > > their atomic counterparts.
> > > > 
> > > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > > refcount_dec_and_lock() merely orders all prior loads/stores against
> > > > all later loads/stores.
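[Editorial sketch, not from the thread: Peter's distinction can be made concrete in user space. The function below mimics the atomic_dec_and_lock() shape with C11 atomics and a pthread mutex; the names are mine. C11 seq_cst RMWs stand in for the full barrier that atomic_dec_and_lock() implies, and the comments mark where a refcount_dec_and_lock() analogue would use only memory_order_release, which is the weaker ordering under discussion.]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>

/* User-space sketch (illustrative only) of the dec_and_lock() shape.
 * atomic_dec_and_lock() implies smp_mb(); seq_cst RMWs play that role
 * here.  A refcount_dec_and_lock() analogue would use
 * memory_order_release on the decrements instead. */

static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

static bool dec_and_lock(atomic_int *cnt)
{
	int old = atomic_load_explicit(cnt, memory_order_relaxed);

	/* Fast path: decrement, unless this is the final reference. */
	while (old != 1) {
		if (atomic_compare_exchange_weak_explicit(
				cnt, &old, old - 1,
				memory_order_seq_cst,	/* full barrier; release for refcount_t */
				memory_order_relaxed))
			return false;	/* count still nonzero; lock not taken */
	}

	/* Final reference: take the lock, then drop the count to zero. */
	pthread_mutex_lock(&obj_lock);
	if (atomic_fetch_sub_explicit(cnt, 1,
				      memory_order_seq_cst) == 1)
		return true;		/* count hit 0; caller holds the lock */
	pthread_mutex_unlock(&obj_lock);
	return false;			/* raced with a new reference */
}
```

Note the decrement happens on both paths, matching the kernel semantics that the count drops regardless of the return value.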
> > 
> > > In fact there is no guaranteed ordering when refcount_dec_and_lock()
> > > returns false;
> > 
> > It should provide a release:
> > 
> >  - if !=1, dec_not_one will provide release
> > 
> >  - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
> >    dec_and_test will provide the release; even if the test fails and we
> >    unlock again, it should still dec.
> > 
> > The one exception is when the counter is saturated, but in that case
> > we'll never free the object and the ordering is moot in any case.
> 
> Also if the counter is 0, but that will never happen if the
> refcounting is correct.
> 
> > > it provides ordering only if the return value is true.
> > > In which case it provides acquire ordering (thanks to the spin_lock),
> > > and both release ordering and a control dependency (thanks to the
> > > refcount_dec_and_test).
> > 
> > > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > > have you got anything simple around?
> > 
> > > The combination of acquire + release is not the same as smp_mb, because
> > 
> > acquire+release is nothing; it's release+acquire that I meant, which
> > should order things locally, but now that you've got me looking at it
> > again, we don't in fact do that.
> > 
> > So refcount_dec_and_lock() will provide a release, irrespective of the
> > return value (assuming we're not saturated). If it returns true, it also
> > does an acquire for the lock.
> > 
> > But combined they're acquire+release, which is unfortunate.. it means
> > the lock section and the refcount stuff overlap, but I don't suppose
> > that's actually a problem. Need to consider more.
> 
> Right. To address your point: release + acquire isn't the same as a
> full barrier either.
> The SB pattern illustrates the difference:
> 
> 	P0		P1
> 	Write x=1	Write y=1
> 	Release a	smp_mb
> 	Acquire b	Read x=0
> 	Read y=0
> 
> This would not be allowed if the release + acquire sequence was
> replaced by smp_mb. But as it stands, this is allowed because nothing
> prevents the CPU from interchanging the order of the release and the
> acquire -- and then you're back to the acquire + release case.
> 
> However, there is one circumstance where this interchange isn't
> allowed: when the release and acquire access the same memory
> location. Thus:
> 
> 	P0(int *x, int *y, int *a)
> 	{
> 		int r0;
> 
> 		WRITE_ONCE(*x, 1);
> 		smp_store_release(a, 1);
> 		smp_load_acquire(a);
> 		r0 = READ_ONCE(*y);
> 	}
> 
> 	P1(int *x, int *y)
> 	{
> 		int r1;
> 
> 		WRITE_ONCE(*y, 1);
> 		smp_mb();
> 		r1 = READ_ONCE(*x);
> 	}
> 
> 	exists (0:r0=0 /\ 1:r1=0)
> 
> This is forbidden. It would remain forbidden even if the smp_mb in P1
> were replaced by a similar release/acquire pair for the same memory
> location.

Hopefully, the LKMM does not agree with this assessment... ;-)

> To see the difference between smp_mb and release/acquire requires three
> threads:
> 
> 	P0		P1		P2
> 	Write x=1	Read y=1	Read z=1
> 	Release a	data dep.	smp_rmb
> 	Acquire a	Write z=1	Read x=0
> 	Write y=1
> 
> The Linux Kernel Memory Model allows this execution, although as far as
> I know, no existing hardware will do it. But with smp_mb in P0, the
> execution would be forbidden.

Here's a two-thread example showing that "(w)mb is _not_ rfi-rel-acq":

C rfi-rel-acq-is-not-mb

{}

P0(int *x, int *y, int *a)
{
	int r1;

	WRITE_ONCE(*x, 1);
	smp_store_release(a, 1);
	r1 = smp_load_acquire(a);
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	smp_rmb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)

  Andrea

> None of this should be a problem for refcount_dec_and_lock, assuming it
> is used purely for reference counting.
> 
> Alan Stern
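[Editorial sketch, not from the thread: Alan's SB (store-buffering) pattern can also be exercised in user space with C11 atomics. With seq_cst accesses on both sides, the C11 analogue of placing smp_mb() between the write and the read, the outcome r0 == 0 && r1 == 0 is forbidden by the C11 model; weakening P0 to a release store followed by an acquire load of a different location would permit it, which is exactly the point of the table above. The names below are mine.]

```c
#include <stdatomic.h>
#include <pthread.h>

/* User-space sketch (illustrative only) of the SB pattern.  Both
 * threads use seq_cst accesses, so the forbidden outcome
 * r0 == 0 && r1 == 0 can never be observed. */

static atomic_int x, y;
static int r0, r1;

static void *sb_p0(void *unused)
{
	atomic_store_explicit(&x, 1, memory_order_seq_cst);
	r0 = atomic_load_explicit(&y, memory_order_seq_cst);
	return NULL;
}

static void *sb_p1(void *unused)
{
	atomic_store_explicit(&y, 1, memory_order_seq_cst);
	r1 = atomic_load_explicit(&x, memory_order_seq_cst);
	return NULL;
}

/* Runs one iteration; returns 1 iff the forbidden outcome showed up. */
static int sb_forbidden_outcome(void)
{
	pthread_t t0, t1;

	atomic_store(&x, 0);
	atomic_store(&y, 0);
	pthread_create(&t0, NULL, sb_p0, NULL);
	pthread_create(&t1, NULL, sb_p1, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return r0 == 0 && r1 == 0;
}
```

Each run may legitimately observe (1,0), (0,1), or (1,1); only (0,0) is excluded by the seq_cst ordering.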