linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Casey Schaufler <casey@schaufler-ca.com>
To: Richard Guy Briggs <rgb@redhat.com>
Cc: cgroups@vger.kernel.org,
	Linux Containers <containers@lists.linux-foundation.org>,
	Linux API <linux-api@vger.kernel.org>,
	Linux Audit <linux-audit@redhat.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux Network Development <netdev@vger.kernel.org>,
	mszeredi@redhat.com, Andy Lutomirski <luto@kernel.org>,
	jlayton@redhat.com, Carlos O'Donell <carlos@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	David Howells <dhowells@redhat.com>, Simo Sorce <simo@redhat.com>,
	trondmy@primarydata.com, Eric Paris <eparis@parisplace.org>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: RFC(v2): Audit Kernel Container IDs
Date: Thu, 19 Oct 2017 06:32:30 -0700	[thread overview]
Message-ID: <18cb69a5-f998-0e6e-85df-7f4b9b768a6f@schaufler-ca.com> (raw)
In-Reply-To: <20171019000527.eio6dfsmujmtioyt@madcap2.tricolour.ca>

On 10/18/2017 5:05 PM, Richard Guy Briggs wrote:
> On 2017-10-17 01:10, Casey Schaufler wrote:
>> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
>>> On 2017-10-12 16:33, Casey Schaufler wrote:
>>>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>>>
>>>>> The Linux audit system needs a way to be able to track the container
>>>>> provenance of events and actions.  Audit needs the kernel's help to do
>>>>> this.
>>>>>
>>>>> Since the concept of a container is entirely a userspace concept, a
>>>>> registration from the userspace container orchestration system initiates
>>>>> this.  This will define a point in time and a set of resources
>>>>> associated with a particular container with an audit container ID.
>>>>>
>>>>> The registration is a pseudo filesystem (proc, since PID tree already
>>>>> exists) write of a u8[16] UUID representing the container ID to a file
>>>>> representing a process that will become the first process in a new
>>>>> container.  This write might place restrictions on mount namespaces
>>>>> required to define a container, or at least careful checking of
>>>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>>>> can't change its own container ID.  A bind mount of nsfs may be
>>>>> necessary in the container orchestrator's mntNS.
>>>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>>>> and simpler.
>>>>>
>>>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>>>> registration.
>>>> Hang on. If containers are a user space concept, how can
>>>> you want CAP_CONTAINER_ANYTHING? If there's not such thing as
>>>> a container, how can you be asking for a capability to manage
>>>> them?
>>> There is such a thing, but the kernel doesn't know about it yet.
>> Then how can it be the kernel's place to control access to a
>> container resource, that is, the containerID.
> Ok, let me try to address your objections.
>
> The kernel can know enough that if it is already set to not allow it to
> be set again.  Or if the user doesn't have permission to set it that the
> user be denied this action.  How is this different from loginuid and
> sessionid?
>>>   This
>>> same situation exists for loginuid and sessionid which are userspace
>>> concepts that the kernel tracks for the convenience of userspace.
>> Ah, no. Loginuid identifies a user, which is a kernel concept in
>> that a user is defined by the uid.
> This simple explanation doesn't help me.  What makes that a kernel
> concept?  The fact that it is stored and compared in more than one
> place?
>
>> The session ID has well defined kernel semantics. You're trying to say
>> that the containerID is an opaque value that is meaningless to the
>> kernel, but you still want the kernel to protect it. How can the
>> kernel know if it is protecting it correctly?
> How so?  A userspace process triggers this.  Does the kernel know what
> these values mean?  Does it do anything with them other than report
> them or allow audit to filter them?  It is given some instructions on
> how to treat it.
>
> This is what we're trying to do with the containerID.
>
>>>   As
>>> for its name, I'm not particularly picky, so if you don't like
>>> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
>>> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
>>> don't want to give the ability to set a containerID to any process that
>>> is able to do audit logging (such as vsftpd) and similarly we don't want
>>> to give the orchestrator the ability to control the setup of the audit
>>> daemon.
>> Sorry, but what aspect of the kernel security policy is this
>> capability supposed to protect? That's what capabilities are
>> for, not the undefined support of undefined user-space behavior.
> Similarly, loginuids and sessionIDs are only used for audit tracking and
> filtering.

Tell me again why you're not reusing either of these?

>
>> If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
>> more than audit behavior you have to define what system security
>> policy you're dealing with in order to pick the right capability.
> It isn't audit behaviour (yet), it is audit reporting information, a
> level above simply writing logs and a level below controlling daemon
> behaviour.

You are changing audit information. That's CAP_AUDIT_CONTROL.

>
>> We get this request pretty regularly. "I need my own capability
>> because I have a niche thing that isn't part of the system security
>> policy but that is important!" Fit the containerID into the
>> system security policy, and if that results in using CAP_SYS_ADMIN,
>> oh well.
> There's far too much piled in to CAP_SYS_ADMIN already, which is making
> capabilites less and less useful.  

No. The value of capabilities is in separating privilege from DAC.
Granularity is a bonus. The current granularity is too fine in some
cases and too coarse in others.

> I realize that capabilities are
> limited compared with netlink message types, but this falls in between
> the abilities needed by CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE.

There is *nothing* about your use that makes a compelling
argument for a new capability. If you can't decide between
CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE require both.

>
> I'll continue on Steve Grubb's comment...
>
>>>>>   At that time, record the target container's user-supplied
>>>>> container identifier along with the target container's first process
>>>>> (which may become the target container's "init" process) process ID
>>>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>>>> form of a nsfs device number and inode number tuple) in a new auxilliary
>>>>> record AUDIT_CONTAINER with a qualifying op=$action field.
>>>>>
>>>>> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
>>>>> container ID present on an auditable action or event.
>>>>>
>>>>> Forked and cloned processes inherit their parent's container ID,
>>>>> referenced in the process' task_struct.
>>>>>
>>>>> Mimic setns(2) and return an error if the process has already initiated
>>>>> threading or forked since this registration should happen before the
>>>>> process execution is started by the orchestrator and hence should not
>>>>> yet have any threads or children.  If this is deemed overly restrictive,
>>>>> switch all threads and children to the new containerID.
>>>>>
>>>>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>>>>>
>>>>> Log the creation of every namespace, inheriting/adding its spawning
>>>>> process' containerID(s), if applicable.  Include the spawning and
>>>>> spawned namespace IDs (device and inode number tuples).
>>>>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
>>>>> Note: At this point it appears only network namespaces may need to track
>>>>> container IDs apart from processes since incoming packets may cause an
>>>>> auditable event before being associated with a process.
>>>>>
>>>>> Log the destruction of every namespace when it is no longer used by any
>>>>> process, include the namespace IDs (device and inode number tuples).
>>>>> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>>>>>
>>>>> Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
>>>>> the parent and child namespace IDs for any changes to a process'
>>>>> namespaces. [setns(2)]
>>>>> Note: It may be possible to combine AUDIT_NS_* record formats and
>>>>> distinguish them with an op=$action field depending on the fields
>>>>> required for each message type.
>>>>>
>>>>> When a container ceases to exist because the last process in that
>>>>> container has exited and hence the last namespace has been destroyed and
>>>>> its refcount dropping to zero, log the fact.
>>>>> (This latter is likely needed for certification accountability.)  A
>>>>> container object may need a list of processes and/or namespaces.
>>>>>
>>>>> A namespace cannot directly migrate from one container to another but
>>>>> could be assigned to a newly spawned container.  A namespace can be
>>>>> moved from one container to another indirectly by having that namespace
>>>>> used in a second process in another container and then ending all the
>>>>> processes in the first container.
>>>>>
>>>>> (v2)
>>>>> - switch from u64 to u128 UUID
>>>>> - switch from "signal" and "trigger" to "register"
>>>>> - restrict registration to single process or force all threads and children into same container
>>>>>
>>>>> - RGB
>>> - RGB
>>>
>>> --
>>> Richard Guy Briggs <rgb@redhat.com>
>>> Sr. S/W Engineer, Kernel Security, Base Operating Systems
>>> Remote, Ottawa, Red Hat Canada
>>> IRC: rgb, SunRaycer
>>> Voice: +1.647.777.2635, Internal: (81) 32635
>>>
> - RGB
>
> --
> Richard Guy Briggs <rgb@redhat.com>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>

  reply	other threads:[~2017-10-19 13:32 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-12 14:14 RFC(v2): Audit Kernel Container IDs Richard Guy Briggs
2017-10-12 15:45 ` Steve Grubb
2017-10-19 19:57   ` Richard Guy Briggs
2017-10-19 23:11     ` Aleksa Sarai
2017-10-19 23:15       ` Aleksa Sarai
2017-10-20  2:25       ` Steve Grubb
2017-10-12 16:33 ` Casey Schaufler
2017-10-17  0:33   ` Richard Guy Briggs
2017-10-17  1:10     ` Casey Schaufler
2017-10-19  0:05       ` Richard Guy Briggs
2017-10-19 13:32         ` Casey Schaufler [this message]
2017-10-19 15:51           ` Paul Moore
2017-10-17  1:42     ` Steve Grubb
2017-10-17 12:31       ` Simo Sorce
2017-10-17 14:59         ` Casey Schaufler
2017-10-17 15:28           ` Simo Sorce
2017-10-17 15:44             ` James Bottomley
2017-10-17 16:43               ` Casey Schaufler
2017-10-17 17:15                 ` Steve Grubb
2017-10-17 17:57                   ` James Bottomley
2017-10-18  0:23                     ` Steve Grubb
2017-10-18 20:56               ` Paul Moore
2017-10-18 23:46                 ` Aleksa Sarai
2017-10-19  0:43                   ` Eric W. Biederman
2017-10-19 15:36                     ` Paul Moore
2017-10-19 16:25                       ` Eric W. Biederman
2017-10-19 17:47                         ` Paul Moore
2017-10-17 16:10             ` Casey Schaufler
2017-10-18 19:58         ` Paul Moore
2017-12-09 10:20   ` Mickaël Salaün
2017-12-09 18:28     ` Casey Schaufler
2017-12-11 16:30       ` Eric Paris
2017-12-11 16:52         ` Casey Schaufler
2017-12-11 19:37         ` Steve Grubb
2017-12-11 15:10     ` Richard Guy Briggs
2017-10-12 17:59 ` Eric W. Biederman
2017-10-13 13:43 ` Alan Cox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18cb69a5-f998-0e6e-85df-7f4b9b768a6f@schaufler-ca.com \
    --to=casey@schaufler-ca.com \
    --cc=carlos@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=eparis@parisplace.org \
    --cc=jlayton@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-audit@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mszeredi@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=rgb@redhat.com \
    --cc=serge@hallyn.com \
    --cc=simo@redhat.com \
    --cc=trondmy@primarydata.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).