From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-lf0-f47.google.com ([209.85.215.47]:37840 "EHLO
        mail-lf0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1754539AbeDZWrs (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Thu, 26 Apr 2018 18:47:48 -0400
Received: by mail-lf0-f47.google.com with SMTP id b23-v6so33091275lfg.4
        for <linux-fsdevel@vger.kernel.org>; Thu, 26 Apr 2018 15:47:47 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20180425004031.zutsno6hvmpq3crd@madcap2.tricolour.ca>
References: <cover.1521179281.git.rgb@redhat.com> <e284617ad667ad8f17958dd8babb87fe1b4d7205.1521179281.git.rgb@redhat.com>
 <CAHC9VhTyvxxj2e2Gn+iyW6iLLeYB7hp8a+JvfeMmJ2nUPqtEaw@mail.gmail.com>
 <20180421143443.faaput5g2rn6ul7p@madcap2.tricolour.ca> <CAHC9VhQkJBU-f-AuEnGF1BA2QW6nCJ_yr_EqBR02-1y9+XQZ5A@mail.gmail.com>
 <20180424020200.imonhbkwtb73luxl@madcap2.tricolour.ca> <CAHC9VhSZd7V9avx6K5g6CQ7mkj1T8ti7Nqq=OoWVwPznkesD1w@mail.gmail.com>
 <20180425004031.zutsno6hvmpq3crd@madcap2.tricolour.ca>
From: Paul Moore <paul@paul-moore.com>
Date: Thu, 26 Apr 2018 18:47:45 -0400
Message-ID: <CAHC9VhRXTvYvWNgOuxnnVHOJo0nUswoL1L0TdOzzCdcB2Mt+Jw@mail.gmail.com>
Subject: Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
To: Richard Guy Briggs <rgb@redhat.com>
Cc: cgroups@vger.kernel.org, containers@lists.linux-foundation.org,
        linux-api@vger.kernel.org,
        Linux-Audit Mailing List <linux-audit@redhat.com>,
        linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
        netdev@vger.kernel.org, ebiederm@xmission.com, luto@kernel.org,
        jlayton@redhat.com, carlos@redhat.com, dhowells@redhat.com,
        viro@zeniv.linux.org.uk, simo@redhat.com,
        Eric Paris <eparis@parisplace.org>, serge@hallyn.com
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Tue, Apr 24, 2018 at 8:40 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-24 15:01, Paul Moore wrote:
>> On Mon, Apr 23, 2018 at 10:02 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > On 2018-04-23 19:15, Paul Moore wrote:
>> >> On Sat, Apr 21, 2018 at 10:34 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> >> > On 2018-04-18 19:47, Paul Moore wrote:
>> >> >> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> >> >> > Implement the proc fs write to set the audit container ID of a process,
>> >> >> > emitting an AUDIT_CONTAINER record to document the event.
>> >> >> >
>> >> >> > This is a write from the container orchestrator task to a proc entry of
>> >> >> > the form /proc/PID/containerid where PID is the process ID of the newly
>> >> >> > created task that is to become the first task in a container, or an
>> >> >> > additional task added to a container.
>> >> >> >
>> >> >> > The write expects up to a u64 value (unset: 18446744073709551615).
>> >> >> >
>> >> >> > This will produce a record such as this:
>> >> >> > type=CONTAINER msg=audit(1519903238.968:261): op=set pid=596 uid=0 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0 ses=1 opid=596 old-contid=18446744073709551615 contid=123455 res=0
>> >> >> >
>> >> >> > The "op" field indicates an initial set.  The "pid" to "ses" fields are
>> >> >> > the orchestrator while the "opid" field is the object's PID, the process
>> >> >> > being "contained".  Old and new container ID values are given in the
>> >> >> > "contid" fields, while res indicates its success.
>> >> >> >
>> >> >> > It is not permitted to self-set, unset or re-set the container ID.  A
>> >> >> > child inherits its parent's container ID, but then can be set only once
>> >> >> > after.
>> >> >> >
>> >> >> > See: https://github.com/linux-audit/audit-kernel/issues/32
>> >> >> >
>> >> >> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
>> >> >> > ---
>> >> >> >  fs/proc/base.c             | 37 ++++++++++++++++++++
>> >> >> >  include/linux/audit.h      | 16 +++++++++
>> >> >> >  include/linux/init_task.h  |  4 ++-
>> >> >> >  include/linux/sched.h      |  1 +
>> >> >> >  include/uapi/linux/audit.h |  2 ++
>> >> >> >  kernel/auditsc.c           | 84 ++++++++++++++++++++++++++++++++++++++++++++++
>> >> >> >  6 files changed, 143 insertions(+), 1 deletion(-)
>>
>> ...
>>
>> >> >> >  /* audit_rule_data supports filter rules with both integer and string
>> >> >> >   * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
>> >> >> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> >> >> > index 4e0a4ac..29c8482 100644
>> >> >> > --- a/kernel/auditsc.c
>> >> >> > +++ b/kernel/auditsc.c
>> >> >> > @@ -2073,6 +2073,90 @@ int audit_set_loginuid(kuid_t loginuid)
>> >> >> >         return rc;
>> >> >> >  }
>> >> >> >
>> >> >> > +static int audit_set_containerid_perm(struct task_struct *task, u64 containerid)
>> >> >> > +{
>> >> >> > +       struct task_struct *parent;
>> >> >> > +       u64 pcontainerid, ccontainerid;
>> >> >> > +
>> >> >> > +       /* Don't allow to set our own containerid */
>> >> >> > +       if (current == task)
>> >> >> > +               return -EPERM;
>> >> >>
>> >> >> Why not?  Is there some obvious security concern that I missing?
>> >> >
>> >> > We then lose the distinction in the AUDIT_CONTAINER record between the
>> >> > initiating PID and the target PID.  This was outlined in the proposal.
>> >>
>> >> I just went back and reread the v3 proposal and I still don't see a
>> >> good explanation of this.  Why is this bad?  What's the security
>> >> concern?
>> >
>> > I don't remember, specifically.  Maybe this has been addressed by the
>> > check for children/threads or identical parent container ID.  So, I'm
>> > reluctantly willing to remove that check for now.
>>
>> Okay.  For the record, if someone can explain to me why this
>> restriction saves us from some terrible situation I'm all for leaving
>> it.  I'm just opposed to restrictions without solid reasoning behind
>> them.
>>
>> >> > Having said that, I'm still not sure we have protected sufficiently from
>> >> > a child turning around and setting it's parent's as yet unset or
>> >> > inherited audit container ID.
>> >>
>> >> Yes, I believe we only want to let a task set the audit container for
>> >> it's children (or itself/threads if we decide to allow that, see
>> >> above).  There *has* to be a function to check to see if a task if a
>> >> child of a given task ... right? ... although this is likely to be a
>> >> pointer traversal and locking nightmare ... hmmm.
>> >
>> > Isn't that just (struct task_struct)parent == (struct
>> > task_struct)child->parent (or ->real_parent)?
>> >
>> > And now that I say that, it is covered by the following patch's child
>> > check, so as long as we keep that, we should be fine.
>>
>> I was thinking of checking not just current's immediate children, but
>> any of it's descendants as I believe that is what we want to limit,
>> yes?  I just worry that it isn't really practical to perform that
>> check.
>
> The child check I'm talking about prevents setting a task's audit
> container ID if it *has* any children or threads, so if it has children
> it is automatically disqualified and its grandchildren are irrelevant.
>
>> >> >> I ask because I suppose it might be possible for some container
>> >> >> runtime to do a fork, setup some of the environment and them exec the
>> >> >> container (before you answer the obvious "namespaces!" please remember
>> >> >> we're not trying to define containers).
>> >> >
>> >> > I don't think namespaces have any bearing on this concern since none are
>> >> > required.
>> >> >
>> >> >> > +       /* Don't allow the containerid to be unset */
>> >> >> > +       if (!cid_valid(containerid))
>> >> >> > +               return -EINVAL;
>> >> >> > +       /* if we don't have caps, reject */
>> >> >> > +       if (!capable(CAP_AUDIT_CONTROL))
>> >> >> > +               return -EPERM;
>> >> >> > +       /* if containerid is unset, allow */
>> >> >> > +       if (!audit_containerid_set(task))
>> >> >> > +               return 0;
>> >> >> > +       /* it is already set, and not inherited from the parent, reject */
>> >> >> > +       ccontainerid = audit_get_containerid(task);
>> >> >> > +       rcu_read_lock();
>> >> >> > +       parent = rcu_dereference(task->real_parent);
>> >> >> > +       rcu_read_unlock();
>> >> >> > +       task_lock(parent);
>> >> >> > +       pcontainerid = audit_get_containerid(parent);
>> >> >> > +       task_unlock(parent);
>> >> >> > +       if (ccontainerid != pcontainerid)
>> >> >> > +               return -EPERM;
>> >> >> > +       return 0;
>>
>> I'm looking at the parent checks again and I wonder if the logic above
>> is what we really want.  Maybe it is, but I'm not sure.
>>
>> Things I'm wondering about:
>>
>> * "ccontainerid" and "containerid" are too close in name, I kept
>> confusing myself when looking at this code.  Please change one.  Bonus
>> points if it is shorter.
>
> Would c_containerid and p_containerid be ok?  child_cid and parent_cid?

Either would be an improvement over ccontainerid/containerid.  I would
give a slight node to child_cid/parent_cid just for length reasons.

> I'd really like it to have the same root as the parameter handed in so
> teh code is easier to follow.  It would be nice to have that across
> caller to local, but that's challenging.

That's fine, but you have to admit that ccontainerid/containerid is
awkward and not easy to quickly differentiate :)

> I've been tempted to use contid or even cid everywhere instead of
> containerid.  Perhaps the longer name doesn't bother me because I
> like its uniqueness and I learned touch-typing in grade 9 and I like
> 100+ character wide terminals?  ;-)

I would definitely appreciate contid/cid or similar, but I don't care
too much either way.  As far as terminal width is concerned, please
make sure your code fits in 80 char terminals.

>> * What if the orchestrator wants to move the task to a new container?
>> Right now it looks like you can only do that once, then then the
>> task's audit container ID will no longer be the same as real_parent
>
> A task's audit container ID can be unset or inherited, and then set
> only once.  After that, if you want it moved to a new container you
> can't and your only option is to spawn another peer to that task or a
> child of it and set that new task's audit container ID.

Okay.  We've had some many discussions about this both on and off list
that I lose track on where we stand for certain things.  I think
preventing task movement is fine for the initial effort so long as we
don't prevent adding it in the future; I don't see anything (other
than the permission checks under discussion, which is fine) preventing
this.

> Currently, the method of detecting if its audit container ID has been
> set (rather than inherited) was to check its parent's audit container
> ID.

Yeah ... those are two different things.  I've been wondering if we
should introduce a set/inherited flag as simply checking the parent
task's audit container ID isn't quite the same; although it may be
"close enough" that it doesn't matter in practice.  However, I'm
beginning to think this parent/child relationship isn't really
important beyond the inheritance issue ... more on this below.

> The only reason to change this might be if the audit container ID
> were not inheritable, but then we lose the accountability of a task
> spawning another process and being able to leave its child's audit
> container ID unset and unaccountable to any existing container.  I think
> the relationship to the parent is crucial, and if something wants to
> change audit container ID it can, by spawning childrent and leaving a
> trail of container IDs in its parent processes.  (So what if a parent
> dies?)

The audit container ID *must* be inherited, I don't really think
anyone is questioning that.  What I'm wondering about is what we
accomplish by comparing the child's and parent's audit container ID?

I've thought about this a bit more and I think we are making this way
too complicated right now.  We basically have three rules for the
audit container ID which we need to follow:

1. Children inherit their parent's audit container ID; this includes
the magic "unset" audit container ID.
2. You can't change the audit container ID once set.
3. In order to set the audit container ID of a process you must have
CAP_AUDIT_CONTROL.

With that in mind, I think the permission checks would be something like this:

[SIDE NOTE: Audit Container ID in acronym form works out to "acid" ;) ]

  int perm(task, acid)
  {
      if (!task || !valid(acid))
         return -EINVAL;
      if (!capable(CAP_AUDIT_CONTROL))
         return -EPERM;
      if (task->acid != UNSET)
         return -EPERM;
      return 0;
  }

>> ... or does the orchestrator change that?  *Can* the orchestrator
>> change real_parent (I suspect the answer is "no")?
>
> I don't think the orchestrator is able to change real_parent.

I didn't think so either, but I didn't do an exhaustive check.

> I've forgotten why there is a ->parent and ->real_parent and how they can
> change.  One is for the wait signal.  I don't remember the purpose of
> the other.

I know ptrace makes use of real_parent when re-parenting the process
being ptrace'd.

> If the parent dies before the child, the child will be re-parented on
> its grandparent if the parent doesn't hang around zombified, if I
> understand correctly.  If anything, a parent dying would likely further
> restrict the ability to set a task's audit container ID because a parent
> with an identical ID could vanish.

All the more reason to go with the simplified approach above.  I think
the parent/child relationship is a bit of a distraction and a
complexity that isn't important (except for the inheritance of
course).

>> * I think the key is the relationship between current and task, not
>> between task and task->real_parent.  I believe what we really care
>> about is that task is a descendant of current.  We might also want to
>> allow current to change the audit container ID if it holds
>> CAP_AUDIT_CONTROL, regardless of it's relationship with task.
>
> Currently, a process with CAP_AUDIT_CONTROL can set the audit container
> ID of any task that hasn't got children or threads, isn't itself, and
> its audit container ID is inherited or unset.  This was to try to
> prevent games with parents and children scratching each other's backs.
>
> I would feel more comfortable if only descendants were settable, so
> adding that restriction sounds like a good idea to me other than the
> tree-climbing excercise and overhead involved.
>
>> >> >> > +static void audit_log_set_containerid(struct task_struct *task, u64 oldcontainerid,
>> >> >> > +                                     u64 containerid, int rc)
>> >> >> > +{
>> >> >> > +       struct audit_buffer *ab;
>> >> >> > +       uid_t uid;
>> >> >> > +       struct tty_struct *tty;
>> >> >> > +
>> >> >> > +       if (!audit_enabled)
>> >> >> > +               return;
>> >> >> > +
>> >> >> > +       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONTAINER);
>> >> >> > +       if (!ab)
>> >> >> > +               return;
>> >> >> > +
>> >> >> > +       uid = from_kuid(&init_user_ns, task_uid(current));
>> >> >> > +       tty = audit_get_tty(current);
>> >> >> > +
>> >> >> > +       audit_log_format(ab, "op=set pid=%d uid=%u", task_tgid_nr(current), uid);
>> >> >> > +       audit_log_task_context(ab);
>> >> >> > +       audit_log_format(ab, " auid=%u tty=%s ses=%u opid=%d old-contid=%llu contid=%llu res=%d",
>> >> >> > +                        from_kuid(&init_user_ns, audit_get_loginuid(current)),
>> >> >> > +                        tty ? tty_name(tty) : "(none)", audit_get_sessionid(current),
>> >> >> > +                        task_tgid_nr(task), oldcontainerid, containerid, !rc);
>> >> >> > +
>> >> >> > +       audit_put_tty(tty);
>> >> >> > +       audit_log_end(ab);
>> >> >> > +}
>> >> >> > +
>> >> >> > +/**
>> >> >> > + * audit_set_containerid - set current task's audit_context containerid
>> >> >> > + * @containerid: containerid value
>> >> >> > + *
>> >> >> > + * Returns 0 on success, -EPERM on permission failure.
>> >> >> > + *
>> >> >> > + * Called (set) from fs/proc/base.c::proc_containerid_write().
>> >> >> > + */
>> >> >> > +int audit_set_containerid(struct task_struct *task, u64 containerid)
>> >> >> > +{
>> >> >> > +       u64 oldcontainerid;
>> >> >> > +       int rc;
>> >> >> > +
>> >> >> > +       oldcontainerid = audit_get_containerid(task);
>> >> >> > +
>> >> >> > +       rc = audit_set_containerid_perm(task, containerid);
>> >> >> > +       if (!rc) {
>> >> >> > +               task_lock(task);
>> >> >> > +               task->containerid = containerid;
>> >> >> > +               task_unlock(task);
>> >> >> > +       }
>> >> >> > +
>> >> >> > +       audit_log_set_containerid(task, oldcontainerid, containerid, rc);
>> >> >> > +       return rc;
>> >> >>
>> >> >> Why are audit_set_containerid_perm() and audit_log_containerid()
>> >> >> separate functions?
>> >> >
>> >> > (I assume you mean audit_log_set_containerid()?)
>> >>
>> >> Yep.  My fingers got tired typing in that function name and decided a
>> >> shortcut was necessary.
>> >>
>> >> > It seemed clearer that all the permission checking was in one function
>> >> > and its return code could be used to report the outcome when logging the
>> >> > (attempted) action.  This is the same structure as audit_set_loginuid()
>> >> > and it made sense.
>> >>
>> >> When possible I really like it when the permission checks are in the
>> >> same function as the code which does the work; it's less likely to get
>> >> abused that way (you have to willfully bypass the access checks).  The
>> >> exceptions might be if you wanted to reuse the access control code, or
>> >> insert a modular access mechanism (e.g. LSMs).
>> >
>> > I don't follow how it could be abused.  The return code from the perm
>> > check gates setting the value and is used in the success field in the
>> > log.
>>
>> If the permission checks are in the same function body as the code
>> which does the work you have to either split the function, or rewrite
>> it, if you want to bypass the permission checks.  It may be more of a
>> style issue than an actual safety issue, but the comments about
>> single-use functions in the same scope is the tie breaker.
>
> Perhaps I'm just being quite dense, but I just don't follow what the
> problem is and how you suggest fixing it.  A bunch of gotos to a label
> such as "out:" to log the refused action?  That seems messy and
> unstructured.

Fold audit_set_containerid_perm() and audit_log_set_containerid() into
their only caller, audit_set_containerid().

-- 
paul moore
www.paul-moore.com