From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751878AbdJ0N4r (ORCPT <rfc822;w@1wt.eu>);
        Fri, 27 Oct 2017 09:56:47 -0400
Received: from merlin.infradead.org ([205.233.59.134]:56186 "EHLO
        merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751298AbdJ0N4q (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 27 Oct 2017 09:56:46 -0400
Date: Fri, 27 Oct 2017 15:56:24 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
        "keescook@chromium.org" <keescook@chromium.org>,
        "tglx@linutronix.de" <tglx@linutronix.de>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "ishkamiel@gmail.com" <ishkamiel@gmail.com>,
        Will Deacon <will.deacon@arm.com>
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in
 atomic_t
Message-ID: <20171027135624.GY3165@worktop.lehotels.local>
References: <1508756984-18980-1-git-send-email-elena.reshetova@intel.com>
 <20171023131224.GC3165@worktop.lehotels.local>
 <2236FBA76BA1254E88B949DDB74E612B802B6A2E@IRSMSX102.ger.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2236FBA76BA1254E88B949DDB74E612B802B6A2E@IRSMSX102.ger.corp.intel.com>
User-Agent: Mutt/1.5.22.1 (2013-10-16)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 27, 2017 at 06:49:55AM +0000, Reshetova, Elena wrote:
> Could we possibly have a bit more elaborate discussion on this? 
> 
> Or alternatively, what then should be the correct way for a certain
> variable (that behaves like a standard refcounter) to check if the
> relaxed memory ordering is ok? 

The refcount_t includes sufficient ordering for normal reference
counting. It provides a happens-before relation wrt. dropping the
reference, such that all object accesses happen before we drop the
reference; because at that point the object can go away and any further
access would be UAF.

So if the thing being replaced is purely a reference count, all should
be well.

> This is what Thomas was asking me to do for core kernel conversions
> and this is what I don't know how to do correctly.

Thomas asked you to verify that nothing relies on the stronger ordering
provided by atomic_dec_and_test(). This means you have to audit the code
with an eye towards memory ordering.

Now the futex one is actually OK. Thomas just wants you to mention that
refcount_dec_and_test is weaker and explain that in this case that is
fine since pi_state::refcount is in fact a pure refcount and
put_pi_state() does not in fact rely on further ordering.

Just mentioning the above gives the maintainer:

  1) the information he needs to perform review, mentioning memory
  ordering changes means he needs to look at it.

  2) you identify the place where it changes (put_pi_state) which again
  aids him in review by limiting the amount of code he needs to double
  check.

  3) that you actually looked at the problem; giving more trust you
  actually know wth you're doing.

> Also, I got exactly the same question from xfs maintainer,
> so if we provide an API that we hope will be used correctly in the
> future, we should have a way for people to verify it. 

I actually replied to Dave (but have not checked if there's further
email on the thread). I object to the claim that the code is correct if
he cannot explain the memory ordering.

Aside from that; if/when he explain the ordering requirements of those
sites and those do indeed require more than refcount_dec_and_test()
provides we should add a comment exactly explaining the ordering along
with additional barriers, such as smp_mb__after_atomic().

> Maybe it is just me, but I would think that having a way to verify
> that your code is ok with this refcounter-specific relaxed memory
> ordering applies to any memory ordering requirements, refcounters are
> just a subset with certain properties, so is then the full answer is
> to figure out how to verify any memory ordering requirement of your
> code? 

Yes; and this can be quite tricky, but undocumented ordering
requirements are a bug that need to be fixed. Mostly the code is
'simple' and refcount really does the right and sufficient thing.

If the maintainer knows or suspects more ordering is required he can
help out; but for that to happen you have to, at the very least, do 1&2
above.

> We can also document this logic in more docs or even maybe try to
> create a better documentation for current memory ordering bits since
> it is not the most easiest read at the moment. Otherwise this might be
> just bad enough reason for many people to avoid refcount_t type if it
> is just easier to tell "let's take just old atomic, we knew it somehow
> worked before"....

That's just voodoo programming; and while that is rife I have absolutely
no sympathy for it. Memory ordering is hard, but that doesn't mean we
should just blindly sprinkle memory barriers around until it 'works'.

So while memory-barriers.txt is a longish document, it is readable with
a bit of time and effort. There are also tools being developed that can
help people validate their code.

There are enough people around that actually get this stuff (it really
is not _that_ hard, but it does require a functional brain) so if you
can describe the problem with sufficient detail we can help out getting
the ordering right (or state its broken :-).