All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Jann Horn <jannh@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>,
	Kees Cook <keescook@chromium.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Android Kernel Team <kernel-team@android.com>,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Oleg Nesterov <oleg@redhat.com>,
	"Reshetova, Elena" <elena.reshetova@intel.com>
Subject: Re: [PATCH] Convert struct pid count to refcount_t
Date: Thu, 28 Mar 2019 16:00:52 -0400	[thread overview]
Message-ID: <20190328200052.GA105221@google.com> (raw)
In-Reply-To: <CAG48ez1-3dLvzQoxxoUzTUq8dc1wwvwPBBxX_X9FWMscWTS-FQ@mail.gmail.com>

On Thu, Mar 28, 2019 at 04:17:50PM +0100, Jann Horn wrote:
> Since we're just talking about RCU stuff now, adding Paul McKenney to
> the thread.
> 
> On Thu, Mar 28, 2019 at 3:37 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Thu, Mar 28, 2019 at 03:57:44AM +0100, Jann Horn wrote:
> > > On Thu, Mar 28, 2019 at 3:34 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > On Thu, Mar 28, 2019 at 01:59:45AM +0100, Jann Horn wrote:
> > > > > On Thu, Mar 28, 2019 at 1:06 AM Kees Cook <keescook@chromium.org> wrote:
> > > > > > On Wed, Mar 27, 2019 at 7:53 AM Joel Fernandes (Google)
> > > > > > <joel@joelfernandes.org> wrote:
> > > > > > >
> > > > > > > struct pid's count is an atomic_t field used as a refcount. Use
> > > > > > > refcount_t for it which is basically atomic_t but does additional
> > > > > > > checking to prevent use-after-free bugs. No change in behavior if
> > > > > > > CONFIG_REFCOUNT_FULL=n.
> > > > > > >
> > > > > > > Cc: keescook@chromium.org
> > > > > > > Cc: kernel-team@android.com
> > > > > > > Cc: kernel-hardening@lists.openwall.com
> > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > [...]
> > > > > > > diff --git a/kernel/pid.c b/kernel/pid.c
> > > > > > > index 20881598bdfa..2095c7da644d 100644
> > > > > > > --- a/kernel/pid.c
> > > > > > > +++ b/kernel/pid.c
> > > > > > > @@ -37,7 +37,7 @@
> > > > > > >  #include <linux/init_task.h>
> > > > > > >  #include <linux/syscalls.h>
> > > > > > >  #include <linux/proc_ns.h>
> > > > > > > -#include <linux/proc_fs.h>
> > > > > > > +#include <linux/refcount.h>
> > > > > > >  #include <linux/sched/task.h>
> > > > > > >  #include <linux/idr.h>
> > > > > > >
> > > > > > > @@ -106,8 +106,8 @@ void put_pid(struct pid *pid)
> > > > > > >                 return;
> > > > > > >
> > > > > > >         ns = pid->numbers[pid->level].ns;
> > > > > > > -       if ((atomic_read(&pid->count) == 1) ||
> > > > > > > -            atomic_dec_and_test(&pid->count)) {
> > > > > > > +       if ((refcount_read(&pid->count) == 1) ||
> > > > > > > +            refcount_dec_and_test(&pid->count)) {
> > > > > >
> > > > > > Why is this (and the original code) safe in the face of a race against
> > > > > > get_pid()? i.e. shouldn't this only use refcount_dec_and_test()? I
> > > > > > don't see this code pattern anywhere else in the kernel.
> > > > >
> > > > > Semantically, it doesn't make a difference whether you do this or
> > > > > leave out the "refcount_read(&pid->count) == 1". If you read a 1 from
> > > > > refcount_read(), then you have the only reference to "struct pid", and
> > > > > therefore you want to free it. If you don't get a 1, you have to
> > > > > atomically drop a reference, which, if someone else is concurrently
> > > > > also dropping a reference, may leave you with the last reference (in
> > > > > the case where refcount_dec_and_test() returns true), in which case
> > > > > you still have to take care of freeing it.
> > > >
> > > > Also, based on Kees comment, I think it appears to me that get_pid and
> > > > put_pid can race in this way in the original code right?
> > > >
> > > > get_pid                 put_pid
> > > >
> > > >                         atomic_dec_and_test returns 1
> > >
> > > This can't happen. get_pid() can only be called on an existing
> > > reference. If you are calling get_pid() on an existing reference, and
> > > someone else is dropping another reference with put_pid(), then when
> > > both functions start running, the refcount must be at least 2.
> >
> > Sigh, you are right. Ok. I was quite tired last night when I wrote this.
> > Obviously, I should have waited a bit and thought it through.
> >
> > Kees can you describe more the race you had in mind?
> >
> > > > atomic_inc
> > > >                         kfree
> > > >
> > > > deref pid /* boom */
> > > > -------------------------------------------------
> > > >
> > > > I think get_pid needs to call atomic_inc_not_zero() and put_pid should
> > > > not test for pid->count == 1 as condition for freeing, but rather just do
> > > > atomic_dec_and_test. So something like the following diff. (And I see a
> > > > similar pattern used in drivers/net/mac.c)
> > >
> > > get_pid() can only be called when you already have a refcounted
> > > reference; in other words, when the reference count is at least one.
> > > The lifetime management of struct pid differs from the lifetime
> > > management of most other objects in the kernel; the usual patterns
> > > don't quite apply here.
> > >
> > > Look at put_pid(): When the refcount has reached zero, there is no RCU
> > > grace period (unlike most other objects with RCU-managed lifetimes).
> > > Instead, free_pid() has an RCU grace period *before* it invokes
> > > delayed_put_pid() to drop a reference; and free_pid() is also the
> > > function that removes a PID from the namespace's IDR, and it is used
> > > by __change_pid() when a task loses its reference on a PID.
> > >
> > > In other words: Most refcounted objects with RCU guarantee that the
> > > object waits for a grace period after its refcount has reached zero;
> > > and during the grace period, the refcount is zero and you're not
> > > allowed to increment it again.
> >
> > Can you give an example of this "most refcounted objects with RCU" usecase?
> > I could not find any good examples of such. I want to document this pattern
> > and possibly submit to Documentation/RCU.
> 
> E.g. struct posix_acl is a relatively straightforward example:
> posix_acl_release() is a wrapper around refcount_dec_and_test(); if
> the refcount has dropped to zero, the object is released after an RCU
> grace period using kfree_rcu().
> get_cached_acl() takes an RCU read lock, does rcu_dereference() [with
> a missing __rcu annotation, grmbl], and attempts to take a reference
> with refcount_inc_not_zero().

Ok I get it now. It is quite a subtle difference in usage, I have noted both
these usecases in my private notes for my own sanity ;-). I wonder if Paul
thinks this is too silly to document into Documentation/RCU/, or if I should
write-up something.

One thing I wonder is if one usage pattern is faster than the other.
Certainly in the {get,put}_pid case, it seems nice to be able to do a
get_pid even though free_pid's grace period has still not completed. Where as
in the posix_acl case, once the grace period starts then it is no longer
possible to get a reference as you pointed and its basically game-over for
that object.

thank you!

 - Joel


  parent reply	other threads:[~2019-03-28 20:00 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-27 14:53 [PATCH] Convert struct pid count to refcount_t Joel Fernandes (Google)
2019-03-28  0:06 ` Kees Cook
2019-03-28  0:59   ` Jann Horn
2019-03-28  2:34     ` Joel Fernandes
2019-03-28  2:57       ` Jann Horn
2019-03-28 14:37         ` Joel Fernandes
2019-03-28 15:17           ` Jann Horn
2019-03-28 16:26             ` Oleg Nesterov
2019-03-28 17:37               ` Paul E. McKenney
2019-03-29 17:32                 ` Oleg Nesterov
2019-03-29 19:45                   ` Alan Stern
2019-04-01 15:28                     ` David Laight
2019-03-30  2:36                 ` Joel Fernandes
2019-03-30 15:16                   ` Alan Stern
2019-03-31 21:57                     ` Paul E. McKenney
2019-03-31 21:55                   ` Paul E. McKenney
2019-04-01 21:11                     ` Joel Fernandes
2019-04-04 15:23                       ` Paul E. McKenney
2019-04-04 16:01                         ` Alan Stern
2019-04-04 18:08                           ` Joel Fernandes
2019-04-04 18:19                             ` Paul E. McKenney
2019-04-04 20:31                               ` Joel Fernandes
2019-04-04 19:09                             ` Alan Stern
2019-03-28 20:00             ` Joel Fernandes [this message]
2019-03-29  2:24               ` Joel Fernandes
2019-03-28 16:52           ` Kees Cook
2019-03-28 14:26       ` Oleg Nesterov
2019-03-28 14:39         ` Joel Fernandes
2019-03-29  2:34           ` Joel Fernandes
2019-03-29 17:37             ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190328200052.GA105221@google.com \
    --to=joel@joelfernandes.org \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=elena.reshetova@intel.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.