Date: Thu, 2 Nov 2017 11:40:35 -0400 (EDT)
From: Alan Stern
To: Peter Zijlstra
Cc: "Reshetova, Elena", linux-kernel@vger.kernel.org,
	gregkh@linuxfoundation.org, keescook@chromium.org,
	tglx@linutronix.de, mingo@redhat.com, ishkamiel@gmail.com,
	Will Deacon, Paul McKenney
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
In-Reply-To: <20171102135742.7o4urtltgvhr6dku@hirez.programming.kicks-ass.net>

On Thu, 2 Nov 2017, Peter Zijlstra wrote:

> > Lock functions such as refcount_dec_and_lock() &
> > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > their atomic counterparts.
>
> Nope.  The atomic_dec_and_lock() provides smp_mb() while
> refcount_dec_and_lock() merely orders all prior load/store's against
> all later load/store's.

In fact there is no guaranteed ordering when refcount_dec_and_lock()
returns false; it provides ordering only if the return value is true,
in which case it provides acquire ordering (thanks to the spin_lock)
plus release ordering and a control dependency (thanks to the
refcount_dec_and_test).

> The difference is subtle and involves at least 3 CPUs.  I can't seem
> to write up anything simple, keeps turning into monsters :/  Will,
> Paul, have you got anything simple around?

The combination of acquire + release is not the same as smp_mb, because
acquire and release each allow memory accesses to pass by them in one
direction.  Example:

C C-refcount-vs-atomic-dec-and-lock

{
}

P0(int *x, int *y, refcount_t *r)
{
	refcount_set(r, 1);
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, refcount_t *r, spinlock_t *s)
{
	int rx, ry;
	bool r1;

	ry = READ_ONCE(*y);
	r1 = refcount_dec_and_lock(r, s);
	if (r1)
		rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:r1=1 /\ 1:rx=0)

This outcome is allowed.  The idea is that the CPU takes:

	Read y
	Acquire
	Release
	Read x

and executes the first read after the Acquire and the second read
before the Release:

	Acquire
	Read y
	Read x
	Release

and then reorders the two reads:

	Acquire
	Read x
	Read y
	Release

If the program had used atomic_dec_and_lock() instead, which provides a
full smp_mb barrier, this outcome would not be possible.

Alan Stern
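
P.S.: To make the true/false distinction above concrete, here is a
minimal sketch of the sort of call site these functions are meant for.
The names (struct foo, foo_put, foo_list_lock) are invented for
illustration and don't refer to any real kernel code:

	#include <linux/list.h>
	#include <linux/refcount.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(foo_list_lock);	/* protects foo_list */
	static LIST_HEAD(foo_list);

	struct foo {
		refcount_t		ref;
		struct list_head	node;	/* linked on foo_list */
	};

	void foo_put(struct foo *f)
	{
		/*
		 * Returns true only when the count hits zero, and then
		 * the lock is held.  The acquire + release + control
		 * dependency described above applies only on this branch.
		 */
		if (refcount_dec_and_lock(&f->ref, &foo_list_lock)) {
			list_del(&f->node);
			spin_unlock(&foo_list_lock);
			kfree(f);
		}
		/* On a false return there is no ordering guarantee at all. */
	}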
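
For comparison, here is the same test with atomic_dec_and_lock()
substituted in.  It is written in the same informal litmus style as the
test above (neither refcount_dec_and_lock() nor atomic_dec_and_lock()
is a primitive the herd tool actually knows about, so treat both as
sketches):

C C-atomic-dec-and-lock

{
}

P0(int *x, int *y, atomic_t *r)
{
	atomic_set(r, 1);
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, atomic_t *r, spinlock_t *s)
{
	int rx, ry;
	int r1;

	ry = READ_ONCE(*y);
	r1 = atomic_dec_and_lock(r, s);
	if (r1)
		rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:r1=1 /\ 1:rx=0)

Here the full barrier implied by atomic_dec_and_lock() keeps the read
of y ordered before the read of x, so the exists clause can never be
satisfied.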