From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751661AbdF1XyY (ORCPT <rfc822;w@1wt.eu>);
        Wed, 28 Jun 2017 19:54:24 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:33183 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751552AbdF1XyR (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 28 Jun 2017 19:54:17 -0400
Date: Wed, 28 Jun 2017 16:54:12 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Andrea Parri <parri.andrea@gmail.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        priyalee.kushwaha@intel.com, drozdziak1@gmail.com,
        Arnd Bergmann <arnd@arndb.de>, ldr709@gmail.com,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <peterz@infradead.org>,
        Josh Triplett <josh@joshtriplett.org>, Nicolas Pitre <nico@linaro.org>,
        Krister Johansen <kjlx@templeofstupid.com>,
        Vegard Nossum <vegard.nossum@oracle.com>, dcb314@hotmail.com,
        Wu Fengguang <fengguang.wu@intel.com>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        Rik van Riel <riel@redhat.com>, Steven Rostedt <rostedt@goodmis.org>,
        Ingo Molnar <mingo@kernel.org>, Luc Maranget <luc.maranget@inria.fr>,
        Jade Alglave <j.alglave@ucl.ac.uk>
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170628170321.GQ3721@linux.vnet.ibm.com>
 <Pine.LNX.4.44L0.1706281547270.27696-100000@netrider.rowland.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.44L0.1706281547270.27696-100000@netrider.rowland.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
x-cbid: 17062823-0044-0000-0000-0000035F1A6D
X-IBM-SpamModules-Scores: 
X-IBM-SpamModules-Versions: BY=3.00007291; HX=3.00000241; KW=3.00000007;
 PH=3.00000004; SC=3.00000214; SDB=6.00880126; UDB=6.00438724; IPR=6.00660280;
 BA=6.00005445; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000;
 ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00015995; XFM=3.00000015;
 UTC=2017-06-28 23:54:15
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17062823-0045-0000-0000-0000078D271C
Message-Id: <20170628235412.GB3721@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-06-28_15:,,
 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0
 malwarescore=0 phishscore=0 adultscore=98 bulkscore=0 classifier=spam
 adjust=0 reason=mlx scancount=1 engine=8.0.1-1703280000
 definitions=main-1706280382
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 28, 2017 at 04:16:03PM -0400, Alan Stern wrote:
> On Wed, 28 Jun 2017, Paul E. McKenney wrote:

[ . . . ]

> Yes.  Bear in mind that the PowerPC version of arch_spin_unlock_wait
> ends with smp_mb.  That already makes it a lot stronger than the
> smp_cond_load_acquire implementation on other architectures.

Fair enough!

> > So what am I missing here?
> 
> Our memory model is (deliberately) weaker than the guarantees provided
> by any of the hardware implementations.

Agreed -- the model cannot say that something is guaranteed unless -all-
the architectures provide that guarantee.  If I remember correctly, each
architecture currently running Linux is stronger in some way than at least
one other architecture.  So yes, the model has to be strictly weaker than
each of the architectures taken individually, because otherwise the model
would be making a guarantee that at least one architecture does not meet,
which would disqualify it from being a valid Linux-kernel memory model.

Plus, yes, there are a few places where the model is a bit weaker than
it needs to be in order to keep the model's complexity from exploding
beyond all reason.

>                                          So while adding smp_mb in
> front of smp_cond_load_acquire may suffice to give the desired
> semantics in many cases, it might not suffice for all architectures
> (because it doesn't suffice in the model).  In fact, we would need to 
> change the model to make it accept this idiom.

Understood.  And if Murphy is being his usual self, the change wouldn't
make the model any simpler.

> I admit not being able to point to any architectures where the
> difference matters.  But if you take this approach, you do need to
> document that for any future architectures, putting smp_mb in front of
> spin_unlock_wait must be guaranteed by the arch-specific code to have
> the desired effect.
> 
> Also, it would not be obvious to a reader _why_ putting an explicit
> smp_mb before spin_unlock_wait would make prior writes visible to later
> critical sections, since it's not obvious that spin_unlock_wait needs
> to do any writes.

Agreed, my current comments need help.  I will fix them as you suggest.

>                    Replacing the whole thing with spin_lock +
> spin_unlock would be easier to understand and wouldn't need any special 
> guarantees or explanations.

That was my initial choice, so I would look pretty silly arguing all
that much.  And I have had much better success getting the lock+unlock
definition across to people, so my experience agrees with yours on the
easy-to-understand part.

> > > If there are any places where this would add unacceptable overhead,
> > > maybe those places need some rethinking.  For instance, perhaps we
> > > could add a separate primitive that provides only release semantics.  
> > > (But would using the new primitive together with spin_unlock_wait
> > > really be significantly better than lock-unlock?)
> > 
> > At the moment, I have no idea on the relative overheads.  ;-)
> 
> I don't either.  But for architectures where spin_unlock_wait already 
> does the equivalent of load-locked + store-conditional, it's hard to 
> see why replacing it with spin_lock + spin_unlock would make it any 
> slower.

Well, the current implementations don't actually take the lock, which
means that they don't delay other lock holders quite so much.  Which might
not be a huge difference even at high levels of lock contention, and
would definitely be down in the noise for low levels of lock contention.

Digital packrat that I am, I am still carrying both sets of patches,
so can still go either way.

Linus, are you dead-set against defining spin_unlock_wait() to be
spin_lock + spin_unlock?  For example, is the current x86 implementation
of spin_unlock_wait() really a non-negotiable hard requirement?  Or
would you be willing to live with the spin_lock + spin_unlock semantics?

							Thanx, Paul