Date: Thu, 2 Nov 2017 11:40:35 -0400 (EDT)
From: Alan Stern
To: Peter Zijlstra
Cc: "Reshetova, Elena", linux-kernel@vger.kernel.org,
	gregkh@linuxfoundation.org, keescook@chromium.org,
	tglx@linutronix.de, mingo@redhat.com, ishkamiel@gmail.com,
	Will Deacon, Paul McKenney
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
In-Reply-To: <20171102135742.7o4urtltgvhr6dku@hirez.programming.kicks-ass.net>

On Thu, 2 Nov 2017, Peter Zijlstra wrote:

> > Lock functions such as refcount_dec_and_lock() &
> > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > their atomic counterparts.
>
> Nope.  The atomic_dec_and_lock() provides smp_mb() while
> refcount_dec_and_lock() merely orders all prior load/store's against
> all later load/store's.

In fact there is no guaranteed ordering when refcount_dec_and_lock()
returns false; it provides ordering only if the return value is true,
in which case it provides acquire ordering (thanks to the spin_lock)
plus release ordering and a control dependency (thanks to the
refcount_dec_and_test).

> The difference is subtle and involves at least 3 CPUs.  I can't seem
> to write up anything simple, keeps turning into monsters :/  Will,
> Paul, have you got anything simple around?

The combination of acquire + release is not the same as smp_mb, because
acquire and release each allow memory accesses to pass by them in one
direction.  Example:

C C-refcount-vs-atomic-dec-and-lock

{
}

P0(int *x, int *y, refcount_t *r)
{
	refcount_set(r, 1);
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, refcount_t *r, spinlock_t *s)
{
	int rx, ry;
	bool r1;

	ry = READ_ONCE(*y);
	r1 = refcount_dec_and_lock(r, s);
	if (r1)
		rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:r1=1 /\ 1:rx=0)

This outcome is allowed.  The idea is that the CPU takes:

	Read y
	Acquire
	Release
	Read x

and executes the first read after the Acquire and the second read
before the Release:

	Acquire
	Read y
	Read x
	Release

and then reorders the two reads:

	Acquire
	Read x
	Read y
	Release

If the program had used atomic_dec_and_lock() instead, which provides a
full smp_mb barrier, this outcome would not be possible.

Alan Stern
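
P.S.: To make the true/false distinction above concrete, here is a
minimal sketch of the sort of call site these functions are meant for.
The names (struct foo, foo_put, foo_list_lock) are invented for
illustration and don't refer to any real kernel code:

	#include <linux/list.h>
	#include <linux/refcount.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(foo_list_lock);	/* protects foo_list */
	static LIST_HEAD(foo_list);

	struct foo {
		refcount_t		ref;
		struct list_head	node;	/* linked on foo_list */
	};

	void foo_put(struct foo *f)
	{
		/*
		 * Returns true only when the count hits zero, and then
		 * the lock is held.  The acquire + release + control
		 * dependency described above applies only on this branch.
		 */
		if (refcount_dec_and_lock(&f->ref, &foo_list_lock)) {
			list_del(&f->node);
			spin_unlock(&foo_list_lock);
			kfree(f);
		}
		/* On a false return there is no ordering guarantee at all. */
	}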
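
For comparison, here is the same test with atomic_dec_and_lock()
substituted in.  It is written in the same informal litmus style as the
test above (neither refcount_dec_and_lock() nor atomic_dec_and_lock()
is a primitive the herd tool actually knows about, so treat both as
sketches):

C C-atomic-dec-and-lock

{
}

P0(int *x, int *y, atomic_t *r)
{
	atomic_set(r, 1);
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, atomic_t *r, spinlock_t *s)
{
	int rx, ry;
	int r1;

	ry = READ_ONCE(*y);
	r1 = atomic_dec_and_lock(r, s);
	if (r1)
		rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:r1=1 /\ 1:rx=0)

Here the full barrier implied by atomic_dec_and_lock() keeps the read
of y ordered before the read of x, so the exists clause can never be
satisfied.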