linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paul Moore <paul@paul-moore.com>
To: Richard Guy Briggs <rgb@redhat.com>
Cc: cgroups@vger.kernel.org, containers@lists.linux-foundation.org,
	linux-api@vger.kernel.org,
	Linux-Audit Mailing List <linux-audit@redhat.com>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, ebiederm@xmission.com, luto@kernel.org,
	jlayton@redhat.com, carlos@redhat.com, dhowells@redhat.com,
	viro@zeniv.linux.org.uk, simo@redhat.com,
	Eric Paris <eparis@parisplace.org>,
	serge@hallyn.com
Subject: Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
Date: Mon, 23 Apr 2018 19:15:08 -0400	[thread overview]
Message-ID: <CAHC9VhQkJBU-f-AuEnGF1BA2QW6nCJ_yr_EqBR02-1y9+XQZ5A@mail.gmail.com> (raw)
In-Reply-To: <20180421143443.faaput5g2rn6ul7p@madcap2.tricolour.ca>

On Sat, Apr 21, 2018 at 10:34 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 19:47, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > Implement the proc fs write to set the audit container ID of a process,
>> > emitting an AUDIT_CONTAINER record to document the event.
>> >
>> > This is a write from the container orchestrator task to a proc entry of
>> > the form /proc/PID/containerid where PID is the process ID of the newly
>> > created task that is to become the first task in a container, or an
>> > additional task added to a container.
>> >
>> > The write expects up to a u64 value (unset: 18446744073709551615).
>> >
>> > This will produce a record such as this:
>> > type=CONTAINER msg=audit(1519903238.968:261): op=set pid=596 uid=0 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0 ses=1 opid=596 old-contid=18446744073709551615 contid=123455 res=0
>> >
>> > The "op" field indicates an initial set.  The "pid" to "ses" fields are
>> > the orchestrator while the "opid" field is the object's PID, the process
>> > being "contained".  Old and new container ID values are given in the
>> > "contid" fields, while res indicates its success.
>> >
>> > It is not permitted to self-set, unset or re-set the container ID.  A
>> > child inherits its parent's container ID, but then can be set only once
>> > after.
>> >
>> > See: https://github.com/linux-audit/audit-kernel/issues/32
>> >
>> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
>> > ---
>> >  fs/proc/base.c             | 37 ++++++++++++++++++++
>> >  include/linux/audit.h      | 16 +++++++++
>> >  include/linux/init_task.h  |  4 ++-
>> >  include/linux/sched.h      |  1 +
>> >  include/uapi/linux/audit.h |  2 ++
>> >  kernel/auditsc.c           | 84 ++++++++++++++++++++++++++++++++++++++++++++++
>> >  6 files changed, 143 insertions(+), 1 deletion(-)

...

>> > diff --git a/include/linux/sched.h b/include/linux/sched.h
>> > index d258826..1b82191 100644
>> > --- a/include/linux/sched.h
>> > +++ b/include/linux/sched.h
>> > @@ -796,6 +796,7 @@ struct task_struct {
>> >  #ifdef CONFIG_AUDITSYSCALL
>> >         kuid_t                          loginuid;
>> >         unsigned int                    sessionid;
>> > +       u64                             containerid;
>>
>> This one line addition to the task_struct scares me the most of
>> anything in this patchset.  Why?  It's a field named "containerid" in
>> a perhaps one of the most widely used core kernel structures; the
>> possibilities for abuse are endless, and it's foolish to think we
>> would ever be able to adequately police this.
>
> Fair enough.
>
>> Unfortunately, we can't add the field to audit_context as things
>> currently stand because we don't always allocate an audit_context,
>> it's dependent on the system's configuration, and we need to track the
>> audit container ID for a given process, regardless of the audit
>> configuration.  Pretty much the same reason why loginuid and sessionid
>> are located directly in task_struct now.  As I stressed during the
>> design phase, I really want to keep this as an *audit* container ID
>> and not a general purpose kernel wide container ID.  If the kernel
>> ever grows a general purpose container ID token, I'll be the first in
>> line to convert the audit code, but I don't want audit to be that
>> general purpose mechanism ... audit is hated enough as-is ;)
>
> When would we need an audit container ID when audit is not enabled
> enough to have an audit_context?

I'm thinking of the audit_alloc() case where audit_filter_task()
returns AUDIT_DISABLED.

I believe this is the same reason why loginuid and sessionid live
directly in the task_struct and not in the audit_context; they need to
persist for the lifetime of the task.

> If it is only used for audit, and audit is the only consumer, and audit
> can only use it when it is enabled, then we can just return success to
> any write to the proc filehandle, or not even present it.  Nothing will
> be able to know that value wasn't used.
>
> When are loginuid and sessionid used now when audit is not enabled (or
> should I say, explicitly disabled)?

See above.  I think that should answer these questions.

>> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
>> > index 4e61a9e..921a71f 100644
>> > --- a/include/uapi/linux/audit.h
>> > +++ b/include/uapi/linux/audit.h
>> > @@ -71,6 +71,7 @@
>> >  #define AUDIT_TTY_SET          1017    /* Set TTY auditing status */
>> >  #define AUDIT_SET_FEATURE      1018    /* Turn an audit feature on or off */
>> >  #define AUDIT_GET_FEATURE      1019    /* Get which features are enabled */
>> > +#define AUDIT_CONTAINER                1020    /* Define the container id and information */
>> >
>> >  #define AUDIT_FIRST_USER_MSG   1100    /* Userspace messages mostly uninteresting to kernel */
>> >  #define AUDIT_USER_AVC         1107    /* We filter this differently */
>> > @@ -465,6 +466,7 @@ struct audit_tty_status {
>> >  };
>> >
>> >  #define AUDIT_UID_UNSET (unsigned int)-1
>> > +#define AUDIT_CID_UNSET ((u64)-1)
>>
>> I think we need to decide if we want to distinguish between the "host"
>> (e.g. init ns) and "unset".  Looking at this patch (I've only quickly
>> skimmed the others so far) it would appear that you don't think we
>> need to worry about this distinction; that's fine, but let's make it
>> explicit with a comment in the code that AUDIT_CID_UNSET means "unset"
>> as well as "host".
>
> I don't see any reason to distinguish between "host" and "unset".  Since
> a container doesn't have a concrete definition based in namespaces, the
> initial namespace set is meaningless here.

Okay, that sounds reasonable.

> Is there value in having a container orchestrator process have a
> reserved container ID that has a policy distinct from any other
> container?

I'm open to arguments for this idea, but I don't see a point to it right now.

> If so, then I could see the value in making the distinction.
> For example, I've heard of interest in systemd acting as a container
> orchestrator, so if it took on that role as PID 1, then every process in
> the system would inherit that ID and none would be unset.
>
> I can't picture how having seperate "host" and "unset" values helps us.

I don't have a strong feeling either way, I just wanted to ask the question.

>> >  /* audit_rule_data supports filter rules with both integer and string
>> >   * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
>> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> > index 4e0a4ac..29c8482 100644
>> > --- a/kernel/auditsc.c
>> > +++ b/kernel/auditsc.c
>> > @@ -2073,6 +2073,90 @@ int audit_set_loginuid(kuid_t loginuid)
>> >         return rc;
>> >  }
>> >
>> > +static int audit_set_containerid_perm(struct task_struct *task, u64 containerid)
>> > +{
>> > +       struct task_struct *parent;
>> > +       u64 pcontainerid, ccontainerid;
>> > +
>> > +       /* Don't allow to set our own containerid */
>> > +       if (current == task)
>> > +               return -EPERM;
>>
>> Why not?  Is there some obvious security concern that I missing?
>
> We then lose the distinction in the AUDIT_CONTAINER record between the
> initiating PID and the target PID.  This was outlined in the proposal.

I just went back and reread the v3 proposal and I still don't see a
good explanation of this.  Why is this bad?  What's the security
concern?

> Having said that, I'm still not sure we have protected sufficiently from
> a child turning around and setting it's parent's as yet unset or
> inherited audit container ID.

Yes, I believe we only want to let a task set the audit container for
it's children (or itself/threads if we decide to allow that, see
above).  There *has* to be a function to check to see if a task if a
child of a given task ... right? ... although this is likely to be a
pointer traversal and locking nightmare ... hmmm.

>> I ask because I suppose it might be possible for some container
>> runtime to do a fork, setup some of the environment and them exec the
>> container (before you answer the obvious "namespaces!" please remember
>> we're not trying to define containers).
>
> I don't think namespaces have any bearing on this concern since none are
> required.
>
>> > +       /* Don't allow the containerid to be unset */
>> > +       if (!cid_valid(containerid))
>> > +               return -EINVAL;
>> > +       /* if we don't have caps, reject */
>> > +       if (!capable(CAP_AUDIT_CONTROL))
>> > +               return -EPERM;
>> > +       /* if containerid is unset, allow */
>> > +       if (!audit_containerid_set(task))
>> > +               return 0;
>> > +       /* it is already set, and not inherited from the parent, reject */
>> > +       ccontainerid = audit_get_containerid(task);
>> > +       rcu_read_lock();
>> > +       parent = rcu_dereference(task->real_parent);
>> > +       rcu_read_unlock();
>> > +       task_lock(parent);
>> > +       pcontainerid = audit_get_containerid(parent);
>> > +       task_unlock(parent);
>> > +       if (ccontainerid != pcontainerid)
>> > +               return -EPERM;
>> > +       return 0;
>> > +}
>> > +
>> > +static void audit_log_set_containerid(struct task_struct *task, u64 oldcontainerid,
>> > +                                     u64 containerid, int rc)
>> > +{
>> > +       struct audit_buffer *ab;
>> > +       uid_t uid;
>> > +       struct tty_struct *tty;
>> > +
>> > +       if (!audit_enabled)
>> > +               return;
>> > +
>> > +       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONTAINER);
>> > +       if (!ab)
>> > +               return;
>> > +
>> > +       uid = from_kuid(&init_user_ns, task_uid(current));
>> > +       tty = audit_get_tty(current);
>> > +
>> > +       audit_log_format(ab, "op=set pid=%d uid=%u", task_tgid_nr(current), uid);
>> > +       audit_log_task_context(ab);
>> > +       audit_log_format(ab, " auid=%u tty=%s ses=%u opid=%d old-contid=%llu contid=%llu res=%d",
>> > +                        from_kuid(&init_user_ns, audit_get_loginuid(current)),
>> > +                        tty ? tty_name(tty) : "(none)", audit_get_sessionid(current),
>> > +                        task_tgid_nr(task), oldcontainerid, containerid, !rc);
>> > +
>> > +       audit_put_tty(tty);
>> > +       audit_log_end(ab);
>> > +}
>> > +
>> > +/**
>> > + * audit_set_containerid - set current task's audit_context containerid
>> > + * @containerid: containerid value
>> > + *
>> > + * Returns 0 on success, -EPERM on permission failure.
>> > + *
>> > + * Called (set) from fs/proc/base.c::proc_containerid_write().
>> > + */
>> > +int audit_set_containerid(struct task_struct *task, u64 containerid)
>> > +{
>> > +       u64 oldcontainerid;
>> > +       int rc;
>> > +
>> > +       oldcontainerid = audit_get_containerid(task);
>> > +
>> > +       rc = audit_set_containerid_perm(task, containerid);
>> > +       if (!rc) {
>> > +               task_lock(task);
>> > +               task->containerid = containerid;
>> > +               task_unlock(task);
>> > +       }
>> > +
>> > +       audit_log_set_containerid(task, oldcontainerid, containerid, rc);
>> > +       return rc;
>>
>> Why are audit_set_containerid_perm() and audit_log_containerid()
>> separate functions?
>
> (I assume you mean audit_log_set_containerid()?)

Yep.  My fingers got tired typing in that function name and decided a
shortcut was necessary.

> It seemed clearer that all the permission checking was in one function
> and its return code could be used to report the outcome when logging the
> (attempted) action.  This is the same structure as audit_set_loginuid()
> and it made sense.

When possible I really like it when the permission checks are in the
same function as the code which does the work; it's less likely to get
abused that way (you have to willfully bypass the access checks).  The
exceptions might be if you wanted to reuse the access control code, or
insert a modular access mechanism (e.g. LSMs).

I'm less concerned about audit_log_set_containerid(), but the usual
idea of avoiding single-use function within the same scope applies
here.

> This would be the time to connect it to a syscall if that seems like a
> good idea and remove pid, uid, auid, tty, ses fields.

Ah yes, I missed that.  You know my stance on connecting records by
now (hint: yes, connect them) so I think that would be a good thing to
do for the next round.

-- 
paul moore
www.paul-moore.com

  reply	other threads:[~2018-04-23 23:15 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-16  9:00 [RFC PATCH ghak32 V2 00/13] audit: implement container id Richard Guy Briggs
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 01/13] audit: add " Richard Guy Briggs
2018-03-28 18:39   ` Jonathan Corbet
2018-03-29  9:01     ` Richard Guy Briggs
2018-03-29 13:03       ` Jonathan Corbet
2018-03-30  5:06         ` Richard Guy Briggs
2018-04-18 23:47   ` Paul Moore
2018-04-19  0:41     ` Casey Schaufler
2018-04-19  0:46       ` Paul Moore
2018-04-19  1:15         ` Casey Schaufler
2018-04-21 14:34     ` Richard Guy Briggs
2018-04-23 23:15       ` Paul Moore [this message]
2018-04-24  2:02         ` Richard Guy Briggs
2018-04-24 19:01           ` Paul Moore
2018-04-25  0:40             ` Richard Guy Briggs
2018-04-26 22:47               ` Paul Moore
2018-05-06 16:51     ` Richard Guy Briggs
2018-05-17 21:00   ` Steve Grubb
2018-05-17 21:56     ` Richard Guy Briggs
2018-05-18 13:56       ` Steve Grubb
2018-05-18 15:21         ` Richard Guy Briggs
2018-05-18 15:38           ` Steve Grubb
2018-06-01 21:04     ` Richard Guy Briggs
2018-06-04 16:09       ` Steve Grubb
2018-06-04 20:23         ` Richard Guy Briggs
2018-06-04 20:30           ` Richard Guy Briggs
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 02/13] audit: check children and threading before allowing containerid Richard Guy Briggs
2018-04-19  0:11   ` Paul Moore
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 03/13] audit: log container info of syscalls Richard Guy Briggs
2018-05-17 21:09   ` Steve Grubb
2018-05-17 21:41     ` Richard Guy Briggs
2018-05-21 19:19       ` Steve Grubb
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 04/13] audit: add containerid filtering Richard Guy Briggs
2018-04-19  0:24   ` Paul Moore
2018-04-19 12:17     ` Richard Guy Briggs
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 05/13] audit: add containerid support for ptrace and signals Richard Guy Briggs
2018-04-19  0:32   ` Paul Moore
2018-04-20  1:03     ` Richard Guy Briggs
2018-04-20 16:13       ` Paul Moore
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 06/13] audit: add support for non-syscall auxiliary records Richard Guy Briggs
2018-04-19  0:39   ` Paul Moore
2018-04-20  1:23     ` Richard Guy Briggs
2018-04-20 16:21       ` Paul Moore
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 07/13] audit: add container aux record to watch/tree/mark Richard Guy Briggs
2018-04-19  0:42   ` Paul Moore
2018-04-19 12:24     ` Richard Guy Briggs
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 08/13] audit: add containerid support for tty_audit Richard Guy Briggs
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 09/13] audit: add containerid support for config/feature/user records Richard Guy Briggs
2018-04-19  1:27   ` Paul Moore
2018-04-19 12:31     ` Richard Guy Briggs
2018-04-19 12:59       ` Paul Moore
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 10/13] audit: add containerid support for seccomp and anom_abend records Richard Guy Briggs
2018-04-19  1:31   ` Paul Moore
2018-04-20  0:42     ` Richard Guy Briggs
2018-04-20 16:11       ` Paul Moore
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 11/13] audit: add support for containerid to network namespaces Richard Guy Briggs
2018-04-19  1:46   ` Paul Moore
2018-04-20 20:02     ` Richard Guy Briggs
2018-04-20 20:22       ` Paul Moore
2018-04-20 20:42         ` Richard Guy Briggs
2018-04-21 12:10           ` Paul Moore
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 12/13] audit: NETFILTER_PKT: record each container ID associated with a netNS Richard Guy Briggs
2018-04-19  2:10   ` Paul Moore
2018-04-19 12:45     ` Richard Guy Briggs
2018-04-19 13:13       ` Paul Moore
2018-03-16  9:00 ` [RFC PATCH ghak32 V2 13/13] debug audit: read container ID of a process Richard Guy Briggs
2018-05-21 19:16   ` Steve Grubb
2018-05-21 19:19     ` Eric W. Biederman
2018-05-21 20:06       ` Paul Moore
2018-05-22 17:35         ` Richard Guy Briggs
2018-05-22 18:59           ` Paul Moore
2018-05-30 13:20 ` [RFC PATCH ghak32 V2 00/13] audit: implement container id Steve Grubb
2018-05-30 17:33   ` Richard Guy Briggs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHC9VhQkJBU-f-AuEnGF1BA2QW6nCJ_yr_EqBR02-1y9+XQZ5A@mail.gmail.com \
    --to=paul@paul-moore.com \
    --cc=carlos@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=eparis@parisplace.org \
    --cc=jlayton@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-audit@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rgb@redhat.com \
    --cc=serge@hallyn.com \
    --cc=simo@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).