All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Android Kernel Team <kernel-team@android.com>,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Oleg Nesterov <oleg@redhat.com>,
	"Reshetova, Elena" <elena.reshetova@intel.com>
Subject: Re: [PATCH] Convert struct pid count to refcount_t
Date: Thu, 28 Mar 2019 10:37:38 -0400	[thread overview]
Message-ID: <20190328143738.GA261521@google.com> (raw)
In-Reply-To: <CAG48ez2bbSYpNHcC4oucx_AG=5y4K5pZfSOfN3mxGadJGpHuQQ@mail.gmail.com>

On Thu, Mar 28, 2019 at 03:57:44AM +0100, Jann Horn wrote:
> On Thu, Mar 28, 2019 at 3:34 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Thu, Mar 28, 2019 at 01:59:45AM +0100, Jann Horn wrote:
> > > On Thu, Mar 28, 2019 at 1:06 AM Kees Cook <keescook@chromium.org> wrote:
> > > > On Wed, Mar 27, 2019 at 7:53 AM Joel Fernandes (Google)
> > > > <joel@joelfernandes.org> wrote:
> > > > >
> > > > > struct pid's count is an atomic_t field used as a refcount. Use
> > > > > refcount_t for it which is basically atomic_t but does additional
> > > > > checking to prevent use-after-free bugs. No change in behavior if
> > > > > CONFIG_REFCOUNT_FULL=n.
> > > > >
> > > > > Cc: keescook@chromium.org
> > > > > Cc: kernel-team@android.com
> > > > > Cc: kernel-hardening@lists.openwall.com
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > [...]
> > > > > diff --git a/kernel/pid.c b/kernel/pid.c
> > > > > index 20881598bdfa..2095c7da644d 100644
> > > > > --- a/kernel/pid.c
> > > > > +++ b/kernel/pid.c
> > > > > @@ -37,7 +37,7 @@
> > > > >  #include <linux/init_task.h>
> > > > >  #include <linux/syscalls.h>
> > > > >  #include <linux/proc_ns.h>
> > > > > -#include <linux/proc_fs.h>
> > > > > +#include <linux/refcount.h>
> > > > >  #include <linux/sched/task.h>
> > > > >  #include <linux/idr.h>
> > > > >
> > > > > @@ -106,8 +106,8 @@ void put_pid(struct pid *pid)
> > > > >                 return;
> > > > >
> > > > >         ns = pid->numbers[pid->level].ns;
> > > > > -       if ((atomic_read(&pid->count) == 1) ||
> > > > > -            atomic_dec_and_test(&pid->count)) {
> > > > > +       if ((refcount_read(&pid->count) == 1) ||
> > > > > +            refcount_dec_and_test(&pid->count)) {
> > > >
> > > > Why is this (and the original code) safe in the face of a race against
> > > > get_pid()? i.e. shouldn't this only use refcount_dec_and_test()? I
> > > > don't see this code pattern anywhere else in the kernel.
> > >
> > > Semantically, it doesn't make a difference whether you do this or
> > > leave out the "refcount_read(&pid->count) == 1". If you read a 1 from
> > > refcount_read(), then you have the only reference to "struct pid", and
> > > therefore you want to free it. If you don't get a 1, you have to
> > > atomically drop a reference, which, if someone else is concurrently
> > > also dropping a reference, may leave you with the last reference (in
> > > the case where refcount_dec_and_test() returns true), in which case
> > > you still have to take care of freeing it.
> >
> > Also, based on Kees comment, I think it appears to me that get_pid and
> > put_pid can race in this way in the original code right?
> >
> > get_pid                 put_pid
> >
> >                         atomic_dec_and_test returns 1
> 
> This can't happen. get_pid() can only be called on an existing
> reference. If you are calling get_pid() on an existing reference, and
> someone else is dropping another reference with put_pid(), then when
> both functions start running, the refcount must be at least 2.

Sigh, you are right. Ok. I was quite tired last night when I wrote this.
Obviously, I should have waited a bit and thought it through.

Kees can you describe more the race you had in mind?

> > atomic_inc
> >                         kfree
> >
> > deref pid /* boom */
> > -------------------------------------------------
> >
> > I think get_pid needs to call atomic_inc_not_zero() and put_pid should
> > not test for pid->count == 1 as condition for freeing, but rather just do
> > atomic_dec_and_test. So something like the following diff. (And I see a
> > similar pattern used in drivers/net/mac.c)
> 
> get_pid() can only be called when you already have a refcounted
> reference; in other words, when the reference count is at least one.
> The lifetime management of struct pid differs from the lifetime
> management of most other objects in the kernel; the usual patterns
> don't quite apply here.
> 
> Look at put_pid(): When the refcount has reached zero, there is no RCU
> grace period (unlike most other objects with RCU-managed lifetimes).
> Instead, free_pid() has an RCU grace period *before* it invokes
> delayed_put_pid() to drop a reference; and free_pid() is also the
> function that removes a PID from the namespace's IDR, and it is used
> by __change_pid() when a task loses its reference on a PID.
> 
> In other words: Most refcounted objects with RCU guarantee that the
> object waits for a grace period after its refcount has reached zero;
> and during the grace period, the refcount is zero and you're not
> allowed to increment it again.

Can you give an example of this "most refcounted objects with RCU" usecase?
I could not find any good examples of such. I want to document this pattern
and possibly submit to Documentation/RCU.

> But for struct pid, the guarantee is
> instead that there is an RCU grace period after it has been removed
> from the IDRs and the task, and during the grace period, refcounting
> is guaranteed to still work normally.

Ok, thanks. Here I think in scrappy but simple pseudo code form, the struct
pid flow is something like (replaced "pid" with data");

get_data:
	atomic_inc(data->refcount);

some_user_of_data:
	rcu_read_lock();
	From X, obtain a ptr to data using rcu_dereference.
	get_data(data);
	rcu_read_unlock();

free_data:
	remove all references to data in all places in X
	call_rcu(put_data)

put_data:
	if (atomic_dec_and_test(data->refcount)) {
		free(data);
	}

create_data:
	data = alloc(..)
	atomic_set(data->refcount, 1);
	set pointers to data in X.

> > pud_pid to avoid such a race.
> >
> > ---8<-----------------------
> >
> > diff --git a/include/linux/pid.h b/include/linux/pid.h
> > index 8cb86d377ff5..3d79834e3180 100644
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -69,8 +69,8 @@ extern struct pid init_struct_pid;
> >
> >  static inline struct pid *get_pid(struct pid *pid)
> >  {
> > -       if (pid)
> > -               refcount_inc(&pid->count);
> > +       if (!pid || !refcount_inc_not_zero(&pid->count))
> > +               return NULL;
> >         return pid;
> >  }
> 
> Nope, this is wrong. Once the refcount is zero, the object goes away,
> refcount_inc_not_zero() makes no sense here.

Yeah ok, I think what you meant here is that references to the object from
all places go away before the grace period starts, so a get_pid on an object
with refcount of zero is impossible since there's no way to *get* to that
object after the grace-period ends.

So, yes you are right that refcount_inc is all that's needed.

Also note to the on looker, the original patch I sent is not wrong, that
still applies and is correct. We are just discussing here any possible issues
with the *existing* code.

thanks!

 - Joel


  reply	other threads:[~2019-03-28 14:37 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-27 14:53 [PATCH] Convert struct pid count to refcount_t Joel Fernandes (Google)
2019-03-28  0:06 ` Kees Cook
2019-03-28  0:59   ` Jann Horn
2019-03-28  2:34     ` Joel Fernandes
2019-03-28  2:57       ` Jann Horn
2019-03-28 14:37         ` Joel Fernandes [this message]
2019-03-28 15:17           ` Jann Horn
2019-03-28 16:26             ` Oleg Nesterov
2019-03-28 17:37               ` Paul E. McKenney
2019-03-29 17:32                 ` Oleg Nesterov
2019-03-29 19:45                   ` Alan Stern
2019-04-01 15:28                     ` David Laight
2019-03-30  2:36                 ` Joel Fernandes
2019-03-30 15:16                   ` Alan Stern
2019-03-31 21:57                     ` Paul E. McKenney
2019-03-31 21:55                   ` Paul E. McKenney
2019-04-01 21:11                     ` Joel Fernandes
2019-04-04 15:23                       ` Paul E. McKenney
2019-04-04 16:01                         ` Alan Stern
2019-04-04 18:08                           ` Joel Fernandes
2019-04-04 18:19                             ` Paul E. McKenney
2019-04-04 20:31                               ` Joel Fernandes
2019-04-04 19:09                             ` Alan Stern
2019-03-28 20:00             ` Joel Fernandes
2019-03-29  2:24               ` Joel Fernandes
2019-03-28 16:52           ` Kees Cook
2019-03-28 14:26       ` Oleg Nesterov
2019-03-28 14:39         ` Joel Fernandes
2019-03-29  2:34           ` Joel Fernandes
2019-03-29 17:37             ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190328143738.GA261521@google.com \
    --to=joel@joelfernandes.org \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=elena.reshetova@intel.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.