From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-yw0-f182.google.com ([209.85.161.182]:32831 "EHLO
        mail-yw0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1756174AbcIRT6H (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Sun, 18 Sep 2016 15:58:07 -0400
Received: by mail-yw0-f182.google.com with SMTP id i129so122449115ywb.0
        for <linux-fsdevel@vger.kernel.org>; Sun, 18 Sep 2016 12:58:07 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20160918184507.GT10601@decadent.org.uk>
References: <1474211117-16674-1-git-send-email-jann@thejh.net>
 <1474211117-16674-3-git-send-email-jann@thejh.net> <1474222407.2428.2.camel@decadent.org.uk>
 <20160918183137.GA17170@pc.thejh.net> <20160918184507.GT10601@decadent.org.uk>
From: Andy Lutomirski <luto@amacapital.net>
Date: Sun, 18 Sep 2016 12:57:46 -0700
Message-ID: <CALCETrUzgWzjeAoearJCermuoLbrrMEA15C=xgVXVkLsUgWLYA@mail.gmail.com>
Subject: Re: [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
To: Ben Hutchings <ben@decadent.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>,
        Stephen Smalley <sds@tycho.nsa.gov>,
        Andrew Morton <akpm@linux-foundation.org>,
        "security@kernel.org" <security@kernel.org>,
        James Morris <james.l.morris@oracle.com>,
        Janis Danisevskis <jdanis@google.com>,
        Casey Schaufler <casey@schaufler-ca.com>,
        Roland McGrath <roland@hack.frob.com>,
        Kees Cook <keescook@chromium.org>,
        Alexander Viro <viro@zeniv.linux.org.uk>,
        LSM List <linux-security-module@vger.kernel.org>,
        "Serge E. Hallyn" <serge@hallyn.com>, Jann Horn <jann@thejh.net>,
        "Eric . Biederman" <ebiederm@xmission.com>,
        Paul Moore <aul@paul-moore.com>,
        Linux FS Devel <linux-fsdevel@vger.kernel.org>,
        Oleg Nesterov <oleg@redhat.com>,
        Benjamin LaHaise <bcrl@kvack.org>,
        Eric Paris <eparis@parisplace.org>,
        Seth Forshee <seth.forshee@canonical.com>,
        John Johansen <john.johansen@canonical.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sep 18, 2016 8:45 AM, "Ben Hutchings" <ben@decadent.org.uk> wrote:
>
> On Sun, Sep 18, 2016 at 08:31:37PM +0200, Jann Horn wrote:
> > On Sun, Sep 18, 2016 at 07:13:27PM +0100, Ben Hutchings wrote:
> > > On Sun, 2016-09-18 at 17:05 +0200, Jann Horn wrote:
> > > > This ensures that self_privunit_id ("privilege unit ID") is only shared by
> > > > processes that share the mm_struct and the signal_struct; not just
> > > > spatially, but also temporally. In other words, if you do execve() or
> > > > clone() without CLONE_THREAD, you get a new privunit_id that has never been
> > > > used before.
> > > [...]
> > > > +void increment_privunit_counter(void)
> > > > +{
> > > > + BUILD_BUG_ON(NR_CPUS > (1 << 16));
> > > > + current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
> > > > +}
> > > [...]
> > >
> > > This will wrap incorrectly if NR_CPUS is not a power of 2 (which is
> > > unusual but allowed).
> >
> > If this wraps, hell breaks loose permission-wise - processes that have
> > no relationship whatsoever with each other will suddenly be able to ptrace
> > each other.
> >
> > The idea is that it never wraps.
>
> That's what I suspected, but wasn't sure.  In that case you can
> initialise each counter to U64_MAX/NR_CPUS*cpu and increment by
> 1 each time, which might be more efficient on some architectures.
>
> > It wraps after (2^64)/NR_CPUS execs or
> > forks on one CPU core. NR_CPUS is bounded to <=2^16, so in the worst case,
> > it wraps after 2^48 execs or forks.
> >
> > On my system with 3.7GHz per core, 2^16 minimal sequential non-thread clone()
> > calls need 1 second system time (and 2 seconds wall clock time, but let's
> > disregard that), so 2^48 non-thread clone() calls should need over 100 years.
> >
> > But I guess both the kernel and machines get faster - if you think the margin
> > might not be future-proof enough (or if you think I measured wrong and it's
> > actually much faster), I guess I could bump this to a 128bit number.
>
> Sequential execution speed isn't likely to get significantly faster so
> with those current numbers this seems to be quite safe.
>

But how big can NR_CPUS get before this gets uncomfortable?
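
For a rough feel, here is a back-of-the-envelope sketch of the wrap
horizon as a function of NR_CPUS.  It assumes Jann's measured rate of
about 2^16 non-thread clone() calls per second on one core; the helper
names are made up for illustration and nothing here is kernel code.

```c
#include <stdint.h>

/* Assumed length of a year in seconds (365.25 days). */
#define SECONDS_PER_YEAR 31557600.0

/*
 * Seconds until one CPU's counter wraps a u64 when it advances by
 * nr_cpus per fork/exec, i.e. when each CPU owns 2^64 / nr_cpus IDs.
 * ops_per_sec is the assumed fork/exec rate on that single CPU.
 */
static double seconds_to_wrap(uint64_t nr_cpus, double ops_per_sec)
{
	double ids_per_cpu = 18446744073709551616.0 / (double)nr_cpus;

	return ids_per_cpu / ops_per_sec;
}

/* Same figure expressed in years. */
static double years_to_wrap(uint64_t nr_cpus, double ops_per_sec)
{
	return seconds_to_wrap(nr_cpus, ops_per_sec) / SECONDS_PER_YEAR;
}
```

With nr_cpus = 2^16 and 2^16 operations per second this gives 2^32
seconds, which is where the "over 100 years" figure comes from.  A
hypothetical nr_cpus = 2^20 at the same rate would cut that to under
a decade.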

We could do:

struct luid {
  u64 count;
  unsigned cpu;
};

(LUID = locally unique ID).
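
A minimal userspace sketch of how that could work, purely as an
illustration: the array stands in for a real per-CPU variable, and
SKETCH_NR_CPUS, luid_alloc() and luid_equal() are hypothetical names,
not anything from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* hypothetical stand-in for NR_CPUS */

struct luid {
	uint64_t count;		/* per-CPU sequence number */
	unsigned int cpu;	/* CPU that handed out this ID */
};

/* Stand-in for a per-CPU counter; one slot per CPU, all starting at 0. */
static uint64_t luid_counter[SKETCH_NR_CPUS];

/*
 * Hand out the next ID on the given CPU.  Each counter steps by 1 and
 * has the full 64-bit range to itself, so the wrap horizon no longer
 * depends on NR_CPUS.
 */
static struct luid luid_alloc(unsigned int cpu)
{
	struct luid id = { .count = ++luid_counter[cpu], .cpu = cpu };

	return id;
}

/* Two IDs match only if both the count and the CPU match. */
static bool luid_equal(struct luid a, struct luid b)
{
	return a.count == b.count && a.cpu == b.cpu;
}
```

The cost is a wider ID and a two-field comparison instead of a single
u64 compare.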

IIRC my draft PCID code does something similar to uniquely identify
mms.  If I accidentally reused a PCID without a flush, everything
would explode.

--Andy