* RFC(v2): Audit Kernel Container IDs
@ 2017-10-12 14:14 ` Richard Guy Briggs
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Guy Briggs @ 2017-10-12 14:14 UTC (permalink / raw)
  To: cgroups, Linux Containers, Linux API, Linux Audit,
	Linux FS Devel, Linux Kernel, Linux Network Development
  Cc: Simo Sorce, Carlos O'Donell, Aristeu Rozanski, David Howells,
	Eric W. Biederman, Eric Paris, jlayton, Andy Lutomirski,
	mszeredi, Paul Moore, Serge E. Hallyn, Steve Grubb, trondmy,
	Al Viro

Containers are a userspace concept.  The kernel knows nothing of them.

The Linux audit system needs a way to be able to track the container
provenance of events and actions.  Audit needs the kernel's help to do
this.

Since the concept of a container is entirely a userspace concept, a
registration from the userspace container orchestration system initiates
this.  This will define a point in time and a set of resources
associated with a particular container with an audit container ID.

The registration is a pseudo filesystem (proc, since the PID tree already
exists) write of a u8[16] UUID representing the container ID to a file
representing a process that will become the first process in a new
container.  This write might place restrictions on the mount namespaces
required to define a container, or at least require careful checking of
namespaces in the kernel to verify the permissions of the orchestrator so
that it can't change its own container ID.  A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.
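
As a rough illustration of the registration write from the orchestrator's
side, here is a minimal userspace sketch in Python.  It rests on
assumptions: the file name "containerid" and the helper names are
hypothetical (this proposal does not name the file), and the demo writes
into a scratch directory standing in for /proc.

```python
import os
import tempfile
import uuid


def register_container(pid, container_id, proc_root="/proc", filename="containerid"):
    """Write the 16 raw bytes of container_id to the target process's file.

    The file name "containerid" is hypothetical; the RFC does not name it.
    """
    payload = container_id.bytes  # u8[16], most significant byte first
    path = os.path.join(proc_root, str(pid), filename)
    with open(path, "wb") as f:
        f.write(payload)
    return payload


def same_container(a, b):
    # Compare the IDs as 128-bit integers rather than as strings.
    return a.int == b.int


if __name__ == "__main__":
    # A scratch directory stands in for /proc in this demo.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "1234"))
        cid = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
        register_container(1234, cid, proc_root=root)
        with open(os.path.join(root, "1234", "containerid"), "rb") as f:
            print(len(f.read()), "bytes written")
```

The integer comparison in same_container() mirrors the note above: a
fixed-width scalar compare avoids string parsing and length handling.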

Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration.  At that time, record the following in a new auxiliary
record AUDIT_CONTAINER with a qualifying op=$action field: the target
container's user-supplied container identifier, the process ID of the
target container's first process (which may become the target
container's "init" process), referenced from the initial PID namespace,
and all namespace IDs (each in the form of an nsfs device number and
inode number tuple).
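
To show what a (device, inode) namespace ID looks like from userspace and
how the fields listed above might sit together in one record, here is a
small Python sketch.  The record layout and the field names (contid, ns)
are assumptions made for illustration only; this proposal does not define
the record format.

```python
import os


def namespace_id(path):
    """Return the (device, inode) tuple identifying the object at path.

    For a namespace, path would be something like /proc/<pid>/ns/net.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def format_container_record(op, container_id_hex, pid, ns_ids):
    """Render a hypothetical AUDIT_CONTAINER record as key=value text.

    ns_ids maps a namespace name to its (device, inode) tuple.
    """
    ns = ",".join(
        f"{name}:{dev:x}:{ino}" for name, (dev, ino) in sorted(ns_ids.items())
    )
    return (
        f"type=AUDIT_CONTAINER op={op} contid={container_id_hex} "
        f"pid={pid} ns={ns}"
    )
```

On a Linux system, namespace_id("/proc/self/ns/net") would give the tuple
for the caller's network namespace.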

Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
container ID present on an auditable action or event.

Forked and cloned processes inherit their parent's container ID,
referenced in the process' task_struct.

Mimic setns(2) and return an error if the process has already initiated
threading or forked since this registration should happen before the
process execution is started by the orchestrator and hence should not
yet have any threads or children.  If this is deemed overly restrictive,
switch all threads and children to the new containerID.
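
The inheritance and registration rules above can be modelled in a few
lines of Python.  This is a toy model, not kernel code: the Task class,
the RegistrationError exception and the field names are all hypothetical,
and it implements only the strict variant (refuse registration once the
task has threads or children).

```python
class RegistrationError(Exception):
    """Raised when a task is no longer eligible for registration."""


class Task:
    def __init__(self, pid, container_id=None):
        self.pid = pid
        self.container_id = container_id
        self.nr_threads = 1  # counts the task itself
        self.children = []

    def fork(self, child_pid):
        # A forked or cloned process inherits its parent's container ID.
        child = Task(child_pid, self.container_id)
        self.children.append(child)
        return child

    def spawn_thread(self):
        self.nr_threads += 1


def register(task, container_id):
    # Strict variant: refuse once the task has threads or children.
    if task.nr_threads > 1 or task.children:
        raise RegistrationError(
            f"pid {task.pid} already has threads or children"
        )
    task.container_id = container_id
```

The less restrictive alternative would instead walk the task's threads
and children and set the new container ID on each of them.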

Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.

Log the creation of every namespace, inheriting/adding its spawning
process' containerID(s), if applicable.  Include the spawning and
spawned namespace IDs (device and inode number tuples).
[AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
Note: At this point it appears only network namespaces may need to track
container IDs apart from processes since incoming packets may cause an
auditable event before being associated with a process.

Log the destruction of every namespace when it is no longer used by any
process.  Include the namespace IDs (device and inode number tuples).
[AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
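
A toy Python model of the create/destroy logging described above, assuming
a namespace is considered destroyed when its last user process leaves it.
The Namespace class and the tuple-based log are hypothetical and only
illustrate the bookkeeping; the record names are the ones proposed here.

```python
class Namespace:
    def __init__(self, ns_id, log):
        self.ns_id = ns_id   # (device, inode) tuple
        self.users = set()   # PIDs currently using this namespace
        self.log = log
        log.append(("AUDIT_NS_CREATE", ns_id))

    def attach(self, pid):
        self.users.add(pid)

    def detach(self, pid):
        # Called on process exit, unshare(2) or setns(2) away from here.
        if pid not in self.users:
            return
        self.users.remove(pid)
        if not self.users:
            # No process uses the namespace any more: log its destruction.
            self.log.append(("AUDIT_NS_DESTROY", self.ns_id))
```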

Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
the parent and child namespace IDs for any changes to a process'
namespaces. [setns(2)]
Note: It may be possible to combine AUDIT_NS_* record formats and
distinguish them with an op=$action field depending on the fields
required for each message type.

When a container ceases to exist because the last process in that
container has exited, and hence the last namespace has been destroyed and
its refcount has dropped to zero, log the fact.
(This latter is likely needed for certification accountability.)  A
container object may need a list of processes and/or namespaces.

A namespace cannot directly migrate from one container to another but
could be assigned to a newly spawned container.  A namespace can be
moved from one container to another indirectly by having that namespace
used in a second process in another container and then ending all the
processes in the first container.
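
The indirect move can be illustrated with a tiny Python sketch, assuming
a namespace's containers are derived from the processes that currently
use it.  The function and variable names are hypothetical.

```python
def containers_using(ns_users, container_of):
    """Return the set of container IDs whose processes use a namespace."""
    return {container_of[pid] for pid in ns_users}


if __name__ == "__main__":
    container_of = {1: "A", 2: "B"}  # pid -> container ID
    ns_users = {1}                   # PIDs using the namespace
    print(containers_using(ns_users, container_of))
    ns_users.add(2)      # a process in container B starts using it
    ns_users.discard(1)  # all processes in container A end
    print(containers_using(ns_users, container_of))
```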

(v2)
- switch from u64 to u128 UUID
- switch from "signal" and "trigger" to "register"
- restrict registration to single process or force all threads and children into same container

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

* Re: RFC(v2): Audit Kernel Container IDs
       [not found] ` <20171012141359.saqdtnodwmbz33b2-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
@ 2017-10-12 15:45   ` Steve Grubb
  2017-10-12 16:33   ` Casey Schaufler
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 94+ messages in thread
From: Steve Grubb @ 2017-10-12 15:45 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: mszeredi, trondmy, Andy Lutomirski, jlayton, Carlos O'Donell,
	Linux API, Linux Containers, Paul Moore, Linux Kernel, Eric Paris,
	Al Viro, David Howells, Linux Audit, Simo Sorce,
	Linux Network Development, Linux FS Devel, cgroups,
	Eric W. Biederman

On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote:
> Containers are a userspace concept.  The kernel knows nothing of them.
> 
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
> 
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.

The requirements for Common Criteria around containers should be very closely
modeled on the requirements for virtualization. It would be the container
manager that is responsible for logging the resource assignment events.


> The registration is a pseudo filesystem (proc, since PID tree already
> exists) write of a u8[16] UUID representing the container ID to a file
> representing a process that will become the first process in a new
> container.  This write might place restrictions on mount namespaces
> required to define a container, or at least careful checking of
> namespaces in the kernel to verify permissions of the orchestrator so it
> can't change its own container ID.  A bind mount of nsfs may be
> necessary in the container orchestrator's mntNS.
> Note: Use a 128-bit scalar rather than a string to make compares faster
> and simpler.
> 
> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> registration.

Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.


> At that time, record the target container's user-supplied
> container identifier along with the target container's first process
> (which may become the target container's "init" process) process ID
> (referenced from the initial PID namespace), all namespace IDs (in the
> form of a nsfs device number and inode number tuple) in a new auxiliary
> record AUDIT_CONTAINER with a qualifying op=$action field.

This would be in addition to the normal audit fields.

> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
> 
> Forked and cloned processes inherit their parent's container ID,
> referenced in the process' task_struct.
> 
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children.  If this is deemed overly restrictive,
> switch all threads and children to the new containerID.
> 
> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> 
> Log the creation of every namespace, inheriting/adding its spawning
> process' containerID(s), if applicable.  Include the spawning and
> spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process.
> 
> Log the destruction of every namespace when it is no longer used by any
> process, include the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]

In the virtualization requirements, we only log removal of resources when
something is removed by intention. If the VM shuts down, the manager issues a
VIRT_CONTROL stop event and the user space utilities know this means all
resources have been unassigned.

> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
> 
> When a container ceases to exist because the last process in that
> container has exited and hence the last namespace has been destroyed and
> its refcount dropping to zero, log the fact.
> (This latter is likely needed for certification accountability.)  A
> container object may need a list of processes and/or namespaces.
> 
> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.

I'm thinking that there needs to be a clear delineation between what the 
container manager is responsible for and what the kernel needs to do. The 
kernel needs the registration system and to associate an identifier with 
events inside the container.

But would the container manager be mostly responsible for auditing the events 
described here:

https://github.com/linux-audit/audit-documentation/wiki/SPEC-Virtualization-Manager-Guest-Lifecycle-Events

Also, we can already audit exit, unshare, setns, and clone. If the kernel just 
sticks the identifier on them, isn't that sufficient?

-Steve

> (v2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and children
> into same container
> 
> - RGB

* Re: RFC(v2): Audit Kernel Container IDs
       [not found] ` <20171012141359.saqdtnodwmbz33b2-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
  2017-10-12 15:45   ` Steve Grubb
@ 2017-10-12 16:33   ` Casey Schaufler
  2017-10-12 17:59     ` Eric W. Biederman
  2017-10-13 13:43   ` Alan Cox
  3 siblings, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-12 16:33 UTC (permalink / raw)
  To: Richard Guy Briggs, cgroups, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development
  Cc: mszeredi, Eric W. Biederman, Simo Sorce, jlayton,
	Carlos O'Donell, David Howells, Al Viro, Andy Lutomirski,
	Eric Paris, trondmy

On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> Containers are a userspace concept.  The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
>
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.
>
> The registration is a pseudo filesystem (proc, since PID tree already
> exists) write of a u8[16] UUID representing the container ID to a file
> representing a process that will become the first process in a new
> container.  This write might place restrictions on mount namespaces
> required to define a container, or at least careful checking of
> namespaces in the kernel to verify permissions of the orchestrator so it
> can't change its own container ID.  A bind mount of nsfs may be
> necessary in the container orchestrator's mntNS.
> Note: Use a 128-bit scalar rather than a string to make compares faster
> and simpler.
>
> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> registration.

Hang on. If containers are a user space concept, how can
you want CAP_CONTAINER_ANYTHING? If there's no such thing as
a container, how can you be asking for a capability to manage
them?

>   At that time, record the target container's user-supplied
> container identifier along with the target container's first process
> (which may become the target container's "init" process) process ID
> (referenced from the initial PID namespace), all namespace IDs (in the
> form of a nsfs device number and inode number tuple) in a new auxiliary
> record AUDIT_CONTAINER with a qualifying op=$action field.
>
> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's container ID,
> referenced in the process' task_struct.
>
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children.  If this is deemed overly restrictive,
> switch all threads and children to the new containerID.
>
> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>
> Log the creation of every namespace, inheriting/adding its spawning
> process' containerID(s), if applicable.  Include the spawning and
> spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process.
>
> Log the destruction of every namespace when it is no longer used by any
> process, include the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> When a container ceases to exist because the last process in that
> container has exited and hence the last namespace has been destroyed and
> its refcount dropping to zero, log the fact.
> (This latter is likely needed for certification accountability.)  A
> container object may need a list of processes and/or namespaces.
>
> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.
>
> (v2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and children into same container
>
> - RGB

* Re: RFC(v2): Audit Kernel Container IDs
       [not found] ` <20171012141359.saqdtnodwmbz33b2-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
@ 2017-10-12 16:33   ` Casey Schaufler
  2017-10-12 16:33   ` Casey Schaufler
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-12 16:33 UTC (permalink / raw)
  To: Richard Guy Briggs, cgroups, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development
  Cc: mszeredi, Andy Lutomirski, jlayton, Carlos O'Donell, Al Viro,
	David Howells, Simo Sorce, trondmy, Eric Paris, Serge E. Hallyn,
	Eric W. Biederman

On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> Containers are a userspace concept.  The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
>
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.
>
> The registration is a pseudo filesystem (proc, since PID tree already
> exists) write of a u8[16] UUID representing the container ID to a file
> representing a process that will become the first process in a new
> container.  This write might place restrictions on mount namespaces
> required to define a container, or at least careful checking of
> namespaces in the kernel to verify permissions of the orchestrator so it
> can't change its own container ID.  A bind mount of nsfs may be
> necessary in the container orchestrator's mntNS.
> Note: Use a 128-bit scalar rather than a string to make compares faster
> and simpler.
>
> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> registration.

Hang on. If containers are a user space concept, how can
you want CAP_CONTAINER_ANYTHING? If there's not such thing as
a container, how can you be asking for a capability to manage
them?

>   At that time, record the target container's user-supplied
> container identifier along with the target container's first process
> (which may become the target container's "init" process) process ID
> (referenced from the initial PID namespace), all namespace IDs (in the
> form of a nsfs device number and inode number tuple) in a new auxilliary
> record AUDIT_CONTAINER with a qualifying op=$action field.
>
> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's container ID,
> referenced in the process' task_struct.
>
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children.  If this is deemed overly restrictive,
> switch all threads and children to the new containerID.
>
> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>
> Log the creation of every namespace, inheriting/adding its spawning
> process' containerID(s), if applicable.  Include the spawning and
> spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process.
>
> Log the destruction of every namespace when it is no longer used by any
> process, include the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> When a container ceases to exist because the last process in that
> container has exited and hence the last namespace has been destroyed and
> its refcount has dropped to zero, log the fact.
> (This latter is likely needed for certification accountability.)  A
> container object may need a list of processes and/or namespaces.
>
> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.
>
> (v2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and children into same container
>
> - RGB

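For reference, the namespace ID tuple mentioned in the quoted RFC (nsfs
device number and inode number) can already be read from userspace by
calling stat() on the files under /proc/self/ns/.  The following is a
minimal sketch of that lookup, not part of the proposal; the struct
ns_id type and the read_ns_id() and ns_id_equal() helpers are
hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper type: a namespace identified by (device, inode). */
struct ns_id {
	dev_t dev;
	ino_t ino;
};

/*
 * Fill *out with the ID of the calling process's namespace of the given
 * kind ("net", "mnt", "pid", ...).  Returns 0 on success, -1 on failure.
 */
static int read_ns_id(const char *kind, struct ns_id *out)
{
	char path[64];
	struct stat st;
	int n;

	n = snprintf(path, sizeof(path), "/proc/self/ns/%s", kind);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* stat() follows the symlink, so st describes the nsfs inode. */
	if (stat(path, &st) != 0)
		return -1;

	out->dev = st.st_dev;
	out->ino = st.st_ino;
	return 0;
}

/* Two IDs name the same namespace only if both fields match. */
static int ns_id_equal(const struct ns_id *a, const struct ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}
```

Two processes share a namespace exactly when both fields of the tuple
match, which is why the RFC logs the pair rather than the inode alone.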
* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-12 14:14 ` Richard Guy Briggs
@ 2017-10-12 17:59     ` Eric W. Biederman
  -1 siblings, 0 replies; 94+ messages in thread
From: Eric W. Biederman @ 2017-10-12 17:59 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	Andy Lutomirski, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Carlos O'Donell, Linux API, Linux Containers, Paul Moore,
	Linux Kernel, Eric Paris, Al Viro, David Howells, Linux Audit,
	Simo Sorce, Linux Network Development, Linux FS Devel,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Steve Grubb

Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.

Ugh no.  The semantics here are way too mushy.  We need a clean, crisp,
unambiguous definition or it will be impossible to get this correct and
impossible to use for any security purpose.

I understand the challenge.  Some of the container managers share
namespaces between containers, leading to things that are not really
contained.

Please make this concept like an indelible dye.  Once you are stained
with it you cannot escape.  If you don't meet all of the criteria you
aren't stained.
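A minimal userspace sketch of the "indelible dye" semantics, under the
assumption that the stain is a 128-bit ID that can be set exactly once
and is copied to children on fork.  The struct and function names
(sketch_task, sketch_stain, sketch_fork) are hypothetical and do not
correspond to existing kernel code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-task state that would carry the ID. */
struct sketch_task {
	bool    stained;     /* true once an ID has been assigned */
	uint8_t contid[16];  /* 128-bit ID, compared as raw bytes */
};

/* Assign the ID once; any later attempt is refused with -EBUSY. */
static int sketch_stain(struct sketch_task *t, const uint8_t id[16])
{
	if (t->stained)
		return -EBUSY;
	memcpy(t->contid, id, 16);
	t->stained = true;
	return 0;
}

/* A child starts with an exact copy of the parent's stain, if any. */
static void sketch_fork(const struct sketch_task *parent,
			struct sketch_task *child)
{
	*child = *parent;
}
```

With these rules a stained task and all of its descendants keep the
same ID for life, and no extra capability is needed to stop a task from
changing its own ID after the fact.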

The justification that I heard, and that seems legitimate, is that it is
not timely and it is hard to make the connection between the distinct
unshare, setns, and clone events and what is happening in the kernel.

With that justification, the network namespace definitely needs to be
stained where appropriate.

I also don't see why this can't be a special dedicated audit message.
I just looked at the code in the kernel and nlmsg_type is a u16.  There
are only a handful of audit message types defined.  There is absolutely
no reason to bring proc into this.
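To illustrate what a dedicated message could look like on the wire,
here is a sketch that builds a netlink message carrying a 16-byte ID.
The struct nlmsghdr layout and the NLMSG_* macros come from
<linux/netlink.h>; the message type SKETCH_AUDIT_SET_CONTID and its
value are purely hypothetical and are not an allocated audit message
type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical type value for illustration only; not a real audit type. */
#define SKETCH_AUDIT_SET_CONTID 1099

/*
 * Build a request carrying a 16-byte ID into buf, which must be suitably
 * aligned for struct nlmsghdr.  Returns the message length, or 0 if the
 * buffer is too small.
 */
static size_t sketch_build_contid_msg(void *buf, size_t buflen,
				      const uint8_t id[16], uint32_t seq)
{
	struct nlmsghdr *nlh = buf;

	if (buflen < (size_t)NLMSG_SPACE(16))
		return 0;

	memset(buf, 0, NLMSG_SPACE(16));
	nlh->nlmsg_len   = NLMSG_LENGTH(16);        /* header + payload */
	nlh->nlmsg_type  = SKETCH_AUDIT_SET_CONTID; /* fits in the u16 field */
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nlh->nlmsg_seq   = seq;
	memcpy(NLMSG_DATA(nlh), id, 16);
	return nlh->nlmsg_len;
}
```

Sending such a message would still require an AF_NETLINK socket bound
to NETLINK_AUDIT and kernel support for the new type; neither is shown
here.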

I have the same reservation as the others about defining a new cap for
this.  It should be enough to make setting the container id a one-time
thing for a set of processes and namespaces.

If this is going to be used for security, it needs to be very simple and
very well defined.

Eric

* Re: RFC(v2): Audit Kernel Container IDs
       [not found] ` <20171012141359.saqdtnodwmbz33b2-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-10-12 17:59     ` Eric W. Biederman
@ 2017-10-13 13:43   ` Alan Cox
  3 siblings, 0 replies; 94+ messages in thread
From: Alan Cox @ 2017-10-13 13:43 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Steve Grubb, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Linux API,
	Linux Containers, Paul Moore, Linux Kernel, Eric Paris, Al Viro,
	David Howells, Linux Audit, Simo Sorce,
	Linux Network Development, Linux FS Devel,
	cgroups-u79uwXL29TY76Z2rM5mHXA, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	Eric W. Biederman

On Thu, 12 Oct 2017 10:14:00 -0400
Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Containers are a userspace concept.  The kernel knows nothing of them.
> 
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
> 
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.

I don't think this has anything to do with containers directly. If I
read it right you need a subtree of stuff to be assigned a (possibly
irrevocable) magic identifier that you can use for other purposes.

Traditional Unix in the more 'secure' space had that decades ago in the
form of luid. At login time you did a setluid() and that set an
irrevocable tag on the session which was (traditionally) the uid of the
login process so that audit and other related tools always knew how to
tie the process back to the login session.

That doesn't quite work by itself (if you login you'd get luid set and
not be able to change it for the container), but it seems something
similarly trivial like a "setauditid(void)" would do the trick, provided
the kernel picked the UUID randomly [otherwise I can copy another known
UUID to confuse or hide].

As you say, a container is a userspace concept. So IMHO any audit
interface should be about auditing and what needs tracking, not about
containers. If the container management tool wants to set a suitable tag
then let it. If not then it doesn't.

Then it's as simple as checking CAP_AUDIT_WRITE to see if you are allowed
to setauditid(), generating a random UUID, and providing a matching
getauditid() to copy it back.
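A rough userspace model of that pair of calls might look like the
following.  It is only a sketch: setauditid() and getauditid() do not
exist, the boolean field stands in for a real CAP_AUDIT_WRITE check,
and making the ID settable only once is an assumption borrowed from the
luid model described above.  The random bytes come from getrandom(2).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/random.h>  /* getrandom(), available in glibc 2.25 and later */
#include <sys/types.h>

/* Hypothetical per-session state. */
struct sketch_session {
	bool    may_audit_write;  /* stands in for a CAP_AUDIT_WRITE check */
	bool    id_set;
	uint8_t id[16];
};

/* The caller supplies no ID; the "kernel" side picks a random one. */
static int sketch_setauditid(struct sketch_session *s)
{
	if (!s->may_audit_write)
		return -EPERM;
	if (s->id_set)
		return -EBUSY;  /* assumption: irrevocable, like luid */
	if (getrandom(s->id, sizeof(s->id), 0) != (ssize_t)sizeof(s->id))
		return -EIO;
	s->id_set = true;
	return 0;
}

/* Copy the previously generated ID back to the caller. */
static int sketch_getauditid(const struct sketch_session *s, uint8_t out[16])
{
	if (!s->id_set)
		return -ENOENT;
	memcpy(out, s->id, 16);
	return 0;
}
```

Because the caller never chooses the value, it cannot reuse another
session's known ID to confuse or hide its own activity.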

Alan

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-12 16:33   ` Casey Schaufler
  (?)
@ 2017-10-17  0:33   ` Richard Guy Briggs
  2017-10-17  1:10     ` Casey Schaufler
       [not found]     ` <20171017003340.whjdkqmkw4lydwy7-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
  -1 siblings, 2 replies; 94+ messages in thread
From: Richard Guy Briggs @ 2017-10-17  0:33 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: cgroups, Linux Containers, Linux API, Linux Audit,
	Linux FS Devel, Linux Kernel, Linux Network Development,
	mszeredi, Andy Lutomirski, jlayton, Carlos O'Donell, Al Viro,
	David Howells, Simo Sorce, trondmy, Eric Paris, Serge E. Hallyn,
	Eric W. Biederman

On 2017-10-12 16:33, Casey Schaufler wrote:
> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> > Containers are a userspace concept.  The kernel knows nothing of them.
> >
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions.  Audit needs the kernel's help to do
> > this.
> >
> > Since the concept of a container is entirely a userspace concept, a
> > registration from the userspace container orchestration system initiates
> > this.  This will define a point in time and a set of resources
> > associated with a particular container with an audit container ID.
> >
> > The registration is a pseudo filesystem (proc, since PID tree already
> > exists) write of a u8[16] UUID representing the container ID to a file
> > representing a process that will become the first process in a new
> > container.  This write might place restrictions on mount namespaces
> > required to define a container, or at least careful checking of
> > namespaces in the kernel to verify permissions of the orchestrator so it
> > can't change its own container ID.  A bind mount of nsfs may be
> > necessary in the container orchestrator's mntNS.
> > Note: Use a 128-bit scalar rather than a string to make compares faster
> > and simpler.
> >
> > Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> > registration.
> 
> Hang on. If containers are a user space concept, how can
> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> a container, how can you be asking for a capability to manage
> them?

There is such a thing, but the kernel doesn't know about it yet.  This
same situation exists for loginuid and sessionid, which are userspace
concepts that the kernel tracks for the convenience of userspace.  As
for its name, I'm not particularly picky, so if you don't like
CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
don't want to give the ability to set a containerID to any process that
is able to do audit logging (such as vsftpd) and similarly we don't want
to give the orchestrator the ability to control the setup of the audit
daemon.
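The separation argued for here can be sketched as three independent
capability bits.  The enum values below are local to this example and
are not the real capability numbers from <linux/capability.h>;
SKETCH_CAP_AUDIT_CONTAINERID models the proposed capability, which does
not exist.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local, hypothetical numbering; not the kernel's capability values. */
enum sketch_cap {
	SKETCH_CAP_AUDIT_WRITE = 0,    /* may emit audit records */
	SKETCH_CAP_AUDIT_CONTROL,      /* may configure the audit subsystem */
	SKETCH_CAP_AUDIT_CONTAINERID,  /* proposed: may set a container ID */
};

typedef uint32_t sketch_capset;

static sketch_capset sketch_cap_bit(enum sketch_cap c)
{
	return (sketch_capset)1u << c;
}

static bool sketch_capable(sketch_capset held, enum sketch_cap c)
{
	return (held & sketch_cap_bit(c)) != 0;
}

/* Setting the container ID needs the dedicated bit and nothing else. */
static bool sketch_may_set_contid(sketch_capset held)
{
	return sketch_capable(held, SKETCH_CAP_AUDIT_CONTAINERID);
}

/* Configuring the audit daemon stays behind the control bit. */
static bool sketch_may_configure_audit(sketch_capset held)
{
	return sketch_capable(held, SKETCH_CAP_AUDIT_CONTROL);
}
```

A logging-only service such as vsftpd would hold just the write bit and
an orchestrator just the container ID bit, so neither gains the other's
privilege.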
> 
> >   At that time, record the target container's user-supplied
> > container identifier along with the target container's first process
> > (which may become the target container's "init" process) process ID
> > (referenced from the initial PID namespace), all namespace IDs (in the
> > form of an nsfs device number and inode number tuple) in a new auxiliary
> > record AUDIT_CONTAINER with a qualifying op=$action field.
> >
> > Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> > container ID present on an auditable action or event.
> >
> > Forked and cloned processes inherit their parent's container ID,
> > referenced in the process' task_struct.
> >
> > Mimic setns(2) and return an error if the process has already initiated
> > threading or forked since this registration should happen before the
> > process execution is started by the orchestrator and hence should not
> > yet have any threads or children.  If this is deemed overly restrictive,
> > switch all threads and children to the new containerID.
> >
> > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> >
> > Log the creation of every namespace, inheriting/adding its spawning
> > process' containerID(s), if applicable.  Include the spawning and
> > spawned namespace IDs (device and inode number tuples).
> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> > Note: At this point it appears only network namespaces may need to track
> > container IDs apart from processes since incoming packets may cause an
> > auditable event before being associated with a process.
> >
> > Log the destruction of every namespace when it is no longer used by any
> > process, include the namespace IDs (device and inode number tuples).
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> >
> > Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> > the parent and child namespace IDs for any changes to a process'
> > namespaces. [setns(2)]
> > Note: It may be possible to combine AUDIT_NS_* record formats and
> > distinguish them with an op=$action field depending on the fields
> > required for each message type.
> >
> > When a container ceases to exist because the last process in that
> > container has exited, and hence the last namespace has been destroyed
> > and its refcount has dropped to zero, log the fact.
> > (This latter is likely needed for certification accountability.)  A
> > container object may need a list of processes and/or namespaces.
> >
> > A namespace cannot directly migrate from one container to another but
> > could be assigned to a newly spawned container.  A namespace can be
> > moved from one container to another indirectly by having that namespace
> > used in a second process in another container and then ending all the
> > processes in the first container.
> >
> > (v2)
> > - switch from u64 to u128 UUID
> > - switch from "signal" and "trigger" to "register"
> > - restrict registration to single process or force all threads and children into same container
> >
> > - RGB
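
For reference, the (device, inode) tuples mentioned in the quoted proposal
are already visible from userspace: stat(2) on a /proc/<pid>/ns/<type> link
returns the nsfs device number and the inode number of that namespace.  A
small Python sketch follows; task_ns_path, namespace_id and same_namespace
are illustrative helper names, not an existing API.

```python
import os


def task_ns_path(pid="self", ns="net"):
    """Build the procfs path of one namespace link of a process."""
    return f"/proc/{pid}/ns/{ns}"


def namespace_id(path):
    """Return the (device, inode) tuple of the object that path refers to.

    os.stat() follows the symlink, so for a /proc/<pid>/ns/<type> path the
    result identifies the namespace itself rather than the link.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_namespace(a, b):
    """Two tasks share a namespace when both halves of the tuple match."""
    return a == b
```

For example, same_namespace(namespace_id(task_ns_path(1, "net")),
namespace_id(task_ns_path("self", "net"))) tells whether the caller shares
a network namespace with PID 1 (reading another task's ns links may need
privilege).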

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-17  0:33   ` Richard Guy Briggs
@ 2017-10-17  1:10     ` Casey Schaufler
       [not found]       ` <81c15928-c445-fb8e-251c-bee566fbbf58-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
  2017-10-19  0:05         ` Richard Guy Briggs
       [not found]     ` <20171017003340.whjdkqmkw4lydwy7-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
  1 sibling, 2 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-17  1:10 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, Linux Containers, Linux API, Linux Audit,
	Linux FS Devel, Linux Kernel, Linux Network Development,
	mszeredi, Andy Lutomirski, jlayton, Carlos O'Donell, Al Viro,
	David Howells, Simo Sorce, trondmy, Eric Paris, Serge E. Hallyn,
	Eric W. Biederman

On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
> On 2017-10-12 16:33, Casey Schaufler wrote:
>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>
>>> The Linux audit system needs a way to be able to track the container
>>> provenance of events and actions.  Audit needs the kernel's help to do
>>> this.
>>>
>>> Since the concept of a container is entirely a userspace concept, a
>>> registration from the userspace container orchestration system initiates
>>> this.  This will define a point in time and a set of resources
>>> associated with a particular container with an audit container ID.
>>>
>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container.  This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID.  A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>> Hang on. If containers are a user space concept, how can
>> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
>> a container, how can you be asking for a capability to manage
>> them?
> There is such a thing, but the kernel doesn't know about it yet.

Then how can it be the kernel's place to control access to a
container resource, that is, the containerID?

>   This
> same situation exists for loginuid and sessionid which are userspace
> concepts that the kernel tracks for the convenience of userspace.

Ah, no. Loginuid identifies a user, which is a kernel concept in
that a user is defined by the uid. The session ID has well defined
kernel semantics. You're trying to say that the containerID is an
opaque value that is meaningless to the kernel, but you still want
the kernel to protect it. How can the kernel know if it is protecting
it correctly?

>   As
> for its name, I'm not particularly picky, so if you don't like
> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
> don't want to give the ability to set a containerID to any process that
> is able to do audit logging (such as vsftpd) and similarly we don't want
> to give the orchestrator the ability to control the setup of the audit
> daemon.

Sorry, but what aspect of the kernel security policy is this
capability supposed to protect? That's what capabilities are
for, not the undefined support of undefined user-space behavior.

If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
more than audit behavior you have to define what system security
policy you're dealing with in order to pick the right capability.

We get this request pretty regularly. "I need my own capability
because I have a niche thing that isn't part of the system security
policy but that is important!" Fit the containerID into the
system security policy, and if that results in using CAP_SYS_ADMIN,
oh well.

>>>   At that time, record the target container's user-supplied
>>> container identifier along with the target container's first process
>>> (which may become the target container's "init" process) process ID
>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>> form of an nsfs device number and inode number tuple) in a new auxiliary
>>> record AUDIT_CONTAINER with a qualifying op=$action field.
>>>
>>> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
>>> container ID present on an auditable action or event.
>>>
>>> Forked and cloned processes inherit their parent's container ID,
>>> referenced in the process' task_struct.
>>>
>>> Mimic setns(2) and return an error if the process has already initiated
>>> threading or forked since this registration should happen before the
>>> process execution is started by the orchestrator and hence should not
>>> yet have any threads or children.  If this is deemed overly restrictive,
>>> switch all threads and children to the new containerID.
>>>
>>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>>>
>>> Log the creation of every namespace, inheriting/adding its spawning
>>> process' containerID(s), if applicable.  Include the spawning and
>>> spawned namespace IDs (device and inode number tuples).
>>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
>>> Note: At this point it appears only network namespaces may need to track
>>> container IDs apart from processes since incoming packets may cause an
>>> auditable event before being associated with a process.
>>>
>>> Log the destruction of every namespace when it is no longer used by any
>>> process, include the namespace IDs (device and inode number tuples).
>>> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>>>
>>> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
>>> the parent and child namespace IDs for any changes to a process'
>>> namespaces. [setns(2)]
>>> Note: It may be possible to combine AUDIT_NS_* record formats and
>>> distinguish them with an op=$action field depending on the fields
>>> required for each message type.
>>>
>>> When a container ceases to exist because the last process in that
>>> container has exited, and hence the last namespace has been destroyed
>>> and its refcount has dropped to zero, log the fact.
>>> (This latter is likely needed for certification accountability.)  A
>>> container object may need a list of processes and/or namespaces.
>>>
>>> A namespace cannot directly migrate from one container to another but
>>> could be assigned to a newly spawned container.  A namespace can be
>>> moved from one container to another indirectly by having that namespace
>>> used in a second process in another container and then ending all the
>>> processes in the first container.
>>>
>>> (v2)
>>> - switch from u64 to u128 UUID
>>> - switch from "signal" and "trigger" to "register"
>>> - restrict registration to single process or force all threads and children into same container
>>>
>>> - RGB
> - RGB
>
> --
> Richard Guy Briggs <rgb@redhat.com>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
@ 2017-10-17  1:42         ` Steve Grubb
  0 siblings, 0 replies; 94+ messages in thread
From: Steve Grubb @ 2017-10-17  1:42 UTC (permalink / raw)
  To: linux-audit
  Cc: Richard Guy Briggs, Casey Schaufler, mszeredi, Eric W. Biederman,
	Simo Sorce, jlayton, Carlos O'Donell, Linux API,
	Linux Containers, Linux Kernel, Eric Paris, David Howells,
	Al Viro, Andy Lutomirski, Linux Network Development,
	Linux FS Devel, cgroups, Serge E. Hallyn, trondmy

On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote:
> On 2017-10-12 16:33, Casey Schaufler wrote:
> > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> > > Containers are a userspace concept.  The kernel knows nothing of them.
> > > 
> > > The Linux audit system needs a way to be able to track the container
> > > provenance of events and actions.  Audit needs the kernel's help to do
> > > this.
> > > 
> > > Since the concept of a container is entirely a userspace concept, a
> > > registration from the userspace container orchestration system initiates
> > > this.  This will define a point in time and a set of resources
> > > associated with a particular container with an audit container ID.
> > > 
> > > The registration is a pseudo filesystem (proc, since PID tree already
> > > exists) write of a u8[16] UUID representing the container ID to a file
> > > representing a process that will become the first process in a new
> > > container.  This write might place restrictions on mount namespaces
> > > required to define a container, or at least careful checking of
> > > namespaces in the kernel to verify permissions of the orchestrator so it
> > > can't change its own container ID.  A bind mount of nsfs may be
> > > necessary in the container orchestrator's mntNS.
> > > Note: Use a 128-bit scalar rather than a string to make compares faster
> > > and simpler.
> > > 
> > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> > > registration.
> > 
> > Hang on. If containers are a user space concept, how can
> > you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> > a container, how can you be asking for a capability to manage
> > them?
> 
> There is such a thing, but the kernel doesn't know about it yet.  This
> same situation exists for loginuid and sessionid which are userspace
> concepts that the kernel tracks for the convenience of userspace.  As
> for its name, I'm not particularly picky, so if you don't like
> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
> don't want to give the ability to set a containerID to any process that
> is able to do audit logging (such as vsftpd) and similarly we don't want
> to give the orchestrator the ability to control the setup of the audit
> daemon.

A long time ago, we were debating what should guard against rogue processes
setting the loginuid. Casey argued that the ability to set the loginuid
means they have the ability to control the audit trail. That means that it
should be guarded by CAP_AUDIT_CONTROL. I think the same logic applies today.

The ability to arbitrarily set a container ID means the process has the 
ability to indirectly control the audit trail.
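
For readers who have not looked at the existing interface: on kernels built
with audit support, a task's loginuid is exposed as /proc/<pid>/loginuid,
and an unset loginuid reads back as 4294967295, i.e. (uid_t)-1.  A minimal
Python sketch of reading it; parse_loginuid and read_loginuid are
illustrative names, and the sketch deliberately leaves out the write side
and its permission checks.

```python
UNSET_LOGINUID = 2**32 - 1  # (uid_t)-1, reported when no loginuid is set


def parse_loginuid(text):
    """Convert the contents of a loginuid file to an int, or None if unset."""
    value = int(text.strip())
    return None if value == UNSET_LOGINUID else value


def read_loginuid(pid="self"):
    """Read the loginuid of a task; requires a kernel with audit enabled."""
    with open(f"/proc/{pid}/loginuid") as f:
        return parse_loginuid(f.read())
```

A container ID exposed the same way would raise the same question about who
may write it, which is the point being argued here.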

-Steve

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-17  1:42         ` Steve Grubb
  (?)
@ 2017-10-17 12:31         ` Simo Sorce
  2017-10-17 14:59           ` Casey Schaufler
                             ` (2 more replies)
  -1 siblings, 3 replies; 94+ messages in thread
From: Simo Sorce @ 2017-10-17 12:31 UTC (permalink / raw)
  To: Steve Grubb, linux-audit
  Cc: Richard Guy Briggs, Casey Schaufler, mszeredi, Eric W. Biederman,
	jlayton, Carlos O'Donell, Linux API, Linux Containers,
	Linux Kernel, Eric Paris, David Howells, Al Viro,
	Andy Lutomirski, Linux Network Development, Linux FS Devel,
	cgroups, Serge E. Hallyn, trondmy

On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
> On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote:

> > There is such a thing, but the kernel doesn't know about it
> > yet.  This same situation exists for loginuid and sessionid which
> > are userspace concepts that the kernel tracks for the convenience
> > of userspace.  As for its name, I'm not particularly picky, so if
> > you don't like CAP_CONTAINER_* then I'm fine with
> > CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
> > CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give
> > the ability to set a containerID to any process that is able to do
> > audit logging (such as vsftpd) and similarly we don't want to give
> > the orchestrator the ability to control the setup of the audit
> > daemon.
> 
> A long time ago, we were debating what should guard against rogue
> processes setting the loginuid. Casey argued that the ability to
> set the loginuid means they have the ability to control the audit
> trail. That means that it should be guarded by CAP_AUDIT_CONTROL. I
> think the same logic applies today. 

The difference is that with loginuid you needed to give processes able
to audit also the ability to change it. You do not want to tie the
ability to change container ids to the ability to audit. You want to be
able to do audit stuff (within the container) without allowing it to
change the container id.
Of course if we made container id a write-once property maybe there is
no need for controls at all, but I'm pretty sure there will be
situations where write-once may not be usable in practice.
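
As a rough illustration of what write-once would mean, here is a small
Python model.  It is a sketch under stated assumptions, not the proposed
kernel code: the Task class and its method names are hypothetical, the ID is
held as a 128-bit integer as the RFC's note suggests, and big-endian byte
order is my assumption since the RFC does not specify one.

```python
import uuid


class Task:
    """Toy stand-in for a task carrying an optional 128-bit container ID."""

    def __init__(self, container_id=None):
        self.container_id = container_id  # int, or None while unset

    def fork(self):
        # Children inherit the parent's container ID, as the RFC describes.
        return Task(self.container_id)

    def register(self, raw):
        """Set the container ID from 16 raw bytes; allowed only once."""
        if len(raw) != 16:
            raise ValueError("container ID must be exactly 16 bytes")
        if self.container_id is not None:
            raise PermissionError("container ID is write-once")
        # Assumed byte order: big-endian, matching uuid.UUID.int.
        self.container_id = int.from_bytes(raw, "big")


cid = uuid.UUID("12345678-1234-5678-1234-567812345678")
init = Task()
init.register(cid.bytes)   # first write succeeds
child = init.fork()        # child.container_id == init.container_id
```

A second init.register(...) raises PermissionError in this model.  That is
exactly the behaviour that may be too strict when a process legitimately has
to be moved or relabelled.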

> The ability to arbitrarily set a container ID means the process has
> the ability to indirectly control the audit trail.

The container ID can also be used for authorization purposes (by other
processes on the host), not just audit.  I think this is why a separate
control has been proposed.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]           ` <1508243469.6230.24.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-10-17 14:59             ` Casey Schaufler
  2017-10-18 19:58             ` Paul Moore
  1 sibling, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-17 14:59 UTC (permalink / raw)
  To: Simo Sorce, Steve Grubb, linux-audit-H+wXaHxf7aLQT0dZR+AlfA
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Linux API, Linux Containers,
	Linux Kernel, David Howells, Carlos O'Donell,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman,
	Andy Lutomirski, Linux Network Development, Linux FS Devel,
	Eric Paris, Al Viro

On 10/17/2017 5:31 AM, Simo Sorce wrote:
> On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
>> On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs wrote:
>>> There is such a thing, but the kernel doesn't know about it
>>> yet.  This same situation exists for loginuid and sessionid which
>>> are userspace concepts that the kernel tracks for the convenience
>>> of userspace.  As for its name, I'm not particularly picky, so if
>>> you don't like CAP_CONTAINER_* then I'm fine with
>>> CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
>>> CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to give
>>> the ability to set a containerID to any process that is able to do
>>> audit logging (such as vsftpd) and similarly we don't want to give
>>> the orchestrator the ability to control the setup of the audit
>>> daemon.
>> A long time ago, we were debating what should guard against rogue
>> processes setting the loginuid. Casey argued that the ability to
>> set the loginuid means they have the ability to control the audit
>> trail. That means that it should be guarded by CAP_AUDIT_CONTROL. I
>> think the same logic applies today. 
> The difference is that with loginuid you needed to give processes able
> to audit also the ability to change it. You do not want to tie the
> ability to change container ids to the ability to audit. You want to be
> able to do audit stuff (within the container) without allowing it to
> change the container id.

Without a *kernel* policy on containerIDs you can't say what
security policy is being exempted. Without that you can't say what
capability is (or isn't) appropriate. You need a reason to have
a capability check that makes sense in the context of the kernel
security policy. Since we don't know what a container is in the
kernel, that's pretty hard. We don't create "fuzzy" capabilities
based on the trendy application behavior of the moment. If the
behavior is not related to audit, there's no reason for it, and
if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work
in your application security model I suggest that is where you
need to make changes.


> Of course if we made container id a write-once property maybe there is
> no need for controls at all, but I'm pretty sure there will be
> situations where write-once may not be usable in practice.
>
>> The ability to arbitrarily set a container ID means the process has
>> the ability to indirectly control the audit trail.
> The container Id can be used also for authorization purposes (by other
> processes on the host), not just audit, I think this is why a separate
> control has been proposed.
>
> Simo.
>

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-17 14:59           ` Casey Schaufler
       [not found]             ` <a07968f6-fef1-f49d-01f1-6c660c0ada20-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
@ 2017-10-17 15:28                 ` Simo Sorce
  0 siblings, 0 replies; 94+ messages in thread
From: Simo Sorce @ 2017-10-17 15:28 UTC (permalink / raw)
  To: Casey Schaufler, Steve Grubb, linux-audit
  Cc: mszeredi, trondmy, jlayton, Linux API, Linux Containers,
	Linux Kernel, David Howells, Carlos O'Donell, cgroups,
	Eric W. Biederman, Andy Lutomirski, Linux Network Development,
	Linux FS Devel, Eric Paris, Al Viro

On Tue, 2017-10-17 at 07:59 -0700, Casey Schaufler wrote:
> On 10/17/2017 5:31 AM, Simo Sorce wrote:
> > On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
> > > On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs
> > > wrote:
> > > > There is such a thing, but the kernel doesn't know about it
> > > > yet.  This same situation exists for loginuid and sessionid
> > > > which
> > > > are userspace concepts that the kernel tracks for the
> > > > convenience
> > > > of userspace.  As for its name, I'm not particularly picky, so
> > > > if
> > > > you don't like CAP_CONTAINER_* then I'm fine with
> > > > CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
> > > > CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to
> > > > give
> > > > the ability to set a containerID to any process that is able to
> > > > do
> > > > audit logging (such as vsftpd) and similarly we don't want to
> > > > give
> > > > the orchestrator the ability to control the setup of the audit
> > > > daemon.
> > > 
> > > A long time ago, we were debating what should guard against rogue
> > > processes from setting the loginuid. Casey argued that the
> > > ability to
> > > set the loginuid means they have the ability to control the audit
> > > trail. That means that it should be guarded by CAP_AUDIT_CONTROL.
> > > I
> > > think the same logic applies today. 
> > 
> > The difference is that with loginuid you needed to give processes
> > able
> > to audit also the ability to change it. You do not want to tie the
> > ability to change container ids to the ability to audit. You want
> > to be
> > able to do audit stuff (within the container) without allowing it
> > to
> > change the container id.
> 
> Without a *kernel* policy on containerIDs you can't say what
> security policy is being exempted.

The policy has been basically stated earlier.

A way to track a set of processes from a specific point in time
forward. The name used is "container id", but it could be anything.
This marker is mostly used by user space to track process hierarchies
without races. These processes can be very privileged, and must not be
allowed to change the marker themselves when granted the current common
capabilities.

Is this a good enough description? If not, can you clarify your
expectations?

>  Without that you can't say what capability is (or isn't)
> appropriate.

See if the above is sufficient please.

> You need a reason to have a capability check that makes sense in the
> context of the kernel security policy.

I think the proposal had a reason; we may debate whether that reason
is good enough.

> Since we don't know what a container is in the kernel,

Please do not fixate on the word container.

>  that's pretty hard. We don't create "fuzzy" capabilities
> based on the trendy application behavior of the moment. If the
> behavior is not related to audit, there's no reason for it, and
> if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work
> in your application security model I suggest that is where you
> need to make changes.

The authors of the proposal came to the conclusion that kernel
assistance is needed. It would be nice to discuss the merits of it.
If you do not understand why the request has been made, it would be more
useful to ask specific questions to understand what is being asked and why.

Pushing back is fine if you have understood the problem and have valid
arguments against a kernel-level solution (and possibly suggestions for
a working user space solution); otherwise you are not adding value to
the discussion.

Simo.

-- 
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]                 ` <1508254120.6230.34.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-10-17 15:44                   ` James Bottomley
  2017-10-17 16:10                   ` Casey Schaufler
  1 sibling, 0 replies; 94+ messages in thread
From: James Bottomley @ 2017-10-17 15:44 UTC (permalink / raw)
  To: Simo Sorce, Casey Schaufler, Steve Grubb, linux-audit
  Cc: mszeredi, jlayton, Carlos O'Donell, Linux API, Linux Containers,
	Linux Kernel, Eric Paris, David Howells, Linux Network Development,
	Eric W. Biederman, Andy Lutomirski, cgroups, Linux FS Devel,
	trondmy, Al Viro

On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
> > Without a *kernel* policy on containerIDs you can't say what
> > security policy is being exempted.
> 
> The policy has been basically stated earlier.
> 
> A way to track a set of processes from a specific point in time
> forward. The name used is "container id", but it could be anything.
> This marker is mostly used by user space to track process hierarchies
> without races, these processes can be very privileged, and must not
> be allowed to change the marker themselves when granted the current
> common capabilities.
> 
> Is this a good enough description ? If not can you clarify your
> expectations ?

I think you mean you want to be able to apply a label to a process
which is inherited across forks.  The label should only be susceptible
to modification by something possessing a capability (which one TBD).
The idea is that processes spawned into a container would be labelled
by the container orchestration system.  It's unclear what should happen
to processes using nsenter after the fact, but policy for that should
be up to the orchestration system.

The label will be used as a tag for audit information.

I think you were missing label inheritance above.

The security implications are that anything that can change the label
could also hide itself and its doings from the audit system and thus
could be used as a means to evade detection.  I actually think this
means the label should be write once (once you've set it, you can't
change it) and orchestration systems should begin as unlabelled
processes allowing them to do arbitrary forks.

For nested containers, I actually think the label should be
hierarchical, so you can add a label for the new nested container but
it still contains its parent's label as well.
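
As an illustration only (this sketch is not from the thread), the label
semantics described above could be modelled in userspace Python roughly
as follows. The names Task, set_label and LabelAlreadySet are
hypothetical, and the sketch assumes one particular way of reconciling
"write once" with nesting: each task can be labelled a single time, and
labelling appends to the inherited path instead of replacing it. The
capability check is omitted because the thread leaves it undecided.

```python
from dataclasses import dataclass
from typing import Tuple


class LabelAlreadySet(Exception):
    """Raised when a task that has already been labelled is labelled again."""


@dataclass
class Task:
    # Full label path, outermost container first; () means unlabelled.
    label: Tuple[str, ...] = ()
    # True once set_label() has been called on this particular task.
    written: bool = False

    def fork(self) -> "Task":
        # The child inherits the parent's whole label path. It has not been
        # labelled itself yet, so a nested container can still extend it.
        return Task(label=self.label, written=False)

    def set_label(self, component: str) -> None:
        # Write once: a task can be labelled a single time. The capability
        # check discussed in the thread (which one is TBD) is left out here.
        if self.written:
            raise LabelAlreadySet(f"label is already {self.label!r}")
        # Hierarchical: append to the inherited path, never replace it.
        self.label = self.label + (component,)
        self.written = True


orchestrator = Task()      # starts unlabelled, free to fork
init_a = orchestrator.fork()
init_a.set_label("A")      # first process of container A
worker = init_a.fork()     # inherits ("A",)
nested = init_a.fork()
nested.set_label("B")      # nested container: ("A", "B")
```

Because a label can only grow, a task in this model can never drop an
ancestor's label, which is the property that keeps it from hiding from
the audit trail.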

James

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]                 ` <1508254120.6230.34.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-10-17 15:44                   ` James Bottomley
@ 2017-10-17 16:10                   ` Casey Schaufler
  1 sibling, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-17 16:10 UTC (permalink / raw)
  To: Simo Sorce, Steve Grubb, linux-audit
  Cc: mszeredi, trondmy, jlayton, Linux API, Linux Containers,
	Linux Kernel, David Howells, Carlos O'Donell, cgroups,
	Eric W. Biederman, Andy Lutomirski, Linux Network Development,
	Linux FS Devel, Eric Paris, Al Viro

On 10/17/2017 8:28 AM, Simo Sorce wrote:
> On Tue, 2017-10-17 at 07:59 -0700, Casey Schaufler wrote:
>> On 10/17/2017 5:31 AM, Simo Sorce wrote:
>>> On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
>>>> On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs
>>>> wrote:
>>>>> There is such a thing, but the kernel doesn't know about it
>>>>> yet.  This same situation exists for loginuid and sessionid
>>>>> which
>>>>> are userspace concepts that the kernel tracks for the
>>>>> convenience
>>>>> of userspace.  As for its name, I'm not particularly picky, so
>>>>> if
>>>>> you don't like CAP_CONTAINER_* then I'm fine with
>>>>> CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
>>>>> CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to
>>>>> give
>>>>> the ability to set a containerID to any process that is able to
>>>>> do
>>>>> audit logging (such as vsftpd) and similarly we don't want to
>>>>> give
>>>>> the orchestrator the ability to control the setup of the audit
>>>>> daemon.
>>>> A long time ago, we were debating what should guard against rogue
>>>> processes from setting the loginuid. Casey argued that the
>>>> ability to
>>>> set the loginuid means they have the ability to control the audit
>>>> trail. That means that it should be guarded by CAP_AUDIT_CONTROL.
>>>> I
>>>> think the same logic applies today. 
>>> The difference is that with loginuid you needed to give processes
>>> able
>>> to audit also the ability to change it. You do not want to tie the
>>> ability to change container ids to the ability to audit. You want
>>> to be
>>> able to do audit stuff (within the container) without allowing it
>>> to
>>> change the container id.
>> Without a *kernel* policy on containerIDs you can't say what
>> security policy is being exempted.
> The policy has been basically stated earlier.

No. The expected user space behavior has been stated.

> A way to track a set of processes from a specific point in time
> forward. The name used is "container id", but it could be anything.

Then you want Jose Bollo's PTAGS. It's insane to add yet another
arbitrary ID to the task for a special purpose. Add a general tagging
mechanism instead. We could add a gazillion new IDs, each with its
own capability, if we head down this road.

> This marker is mostly used by user space to track process hierarchies
> without races, these processes can be very privileged, and must not be
> allowed to change the marker themselves when granted the current common
> capabilities.

Let's be clear. What happens in user space stays in user space.
The kernel does not give a fig about user space policy. There has
to be a kernel policy involved that a capability can exempt.

> Is this a good enough description ? If not can you clarify your
> expectations ?

The kernel enforces kernel policy. Capabilities provide a mechanism
to mark a process as exempt from some aspect of kernel policy. If
you don't have a kernel policy, you don't get a capability. Clear?
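
As an illustration only (this sketch is not from the thread), the
"policy first, capability as exemption" argument can be written down as
a toy Python model. CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL are the
capabilities named in the discussion; the Cap type, the function
may_set_audit_id and the policy wording in the comments are assumptions
made for the example, not kernel code.

```python
from enum import Flag


class Cap(Flag):
    NONE = 0
    AUDIT_WRITE = 1    # may emit audit records
    AUDIT_CONTROL = 2  # may configure the audit subsystem


def may_set_audit_id(caller: Cap) -> bool:
    # Assumed kernel policy: an ordinary task may not alter an
    # identifier that ends up in the audit trail.
    # Exemption: a task holding CAP_AUDIT_CONTROL is exempt from it.
    return bool(caller & Cap.AUDIT_CONTROL)


logger = Cap.AUDIT_WRITE                      # a daemon that only logs
auditd = Cap.AUDIT_WRITE | Cap.AUDIT_CONTROL  # may also change the ID
```

In this model a process that can only write audit records cannot set
the identifier, while anything trusted with CAP_AUDIT_CONTROL can. The
open question in the thread is whether an orchestrator should have to
hold that much privilege.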

>
>>  Without that you can't say what capability is (or isn't)
>> appropriate.
> See if the above is sufficient please.
>
>> You need a reason to have a capability check that makes sense in the
>> context of the kernel security policy.
> I think the proposal had a reason, we may debate on whether that reason
> is good enough.
>
>> Since we don't know what a container is in the kernel,
> Please do not fixate on the word container.
>
>>  that's pretty hard. We don't create "fuzzy" capabilities
>> based on the trendy application behavior of the moment. If the
>> behavior is not related to audit, there's no reason for it, and
>> if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work
>> in your application security model I suggest that is where you
>> need to make changes.
> The authors of the proposal came to the conclusion that kernel
> assistance is needed. It would be nice to discuss the merits of it.
> If you do not understand why the request has been made it would be more
> useful to ask specific questions to understand what and why is the ask.

I understand pretty darn well.

> Pushing back is fine, if you have understood the problem and have valid
> arguments against a kernel level solution (and possibly suggestions for
> a working user space solution), otherwise you are not adding value to
> the discussion.

The presumption is that the request is reasonable. Adding a capability
in support of an undefined behavior is unreasonable. Based on the discussion,
CAP_AUDIT_CONTROL is completely rational. I understand that it would be
difficult to support your application privilege model. I would like to look
into helping out with that, but have too many burning knives in the air
just now.

>
> Simo.
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-17 15:28                 ` Simo Sorce
                                   ` (3 preceding siblings ...)
  (?)
@ 2017-10-17 16:10                 ` Casey Schaufler
  -1 siblings, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-17 16:10 UTC (permalink / raw)
  To: Simo Sorce, Steve Grubb, linux-audit
  Cc: Richard Guy Briggs, mszeredi, Eric W. Biederman, jlayton,
	Carlos O'Donell, Linux API, Linux Containers, Linux Kernel,
	Eric Paris, David Howells, Al Viro, Andy Lutomirski,
	Linux Network Development, Linux FS Devel, cgroups,
	Serge E. Hallyn, trondmy

On 10/17/2017 8:28 AM, Simo Sorce wrote:
> On Tue, 2017-10-17 at 07:59 -0700, Casey Schaufler wrote:
>> On 10/17/2017 5:31 AM, Simo Sorce wrote:
>>> On Mon, 2017-10-16 at 21:42 -0400, Steve Grubb wrote:
>>>> On Monday, October 16, 2017 8:33:40 PM EDT Richard Guy Briggs
>>>> wrote:
>>>>> There is such a thing, but the kernel doesn't know about it
>>>>> yet.  This same situation exists for loginuid and sessionid
>>>>> which
>>>>> are userspace concepts that the kernel tracks for the
>>>>> convenience
>>>>> of userspace.  As for its name, I'm not particularly picky, so
>>>>> if
>>>>> you don't like CAP_CONTAINER_* then I'm fine with
>>>>> CAP_AUDIT_CONTAINERID.  It really needs to be distinct from
>>>>> CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we don't want to
>>>>> give
>>>>> the ability to set a containerID to any process that is able to
>>>>> do
>>>>> audit logging (such as vsftpd) and similarly we don't want to
>>>>> give
>>>>> the orchestrator the ability to control the setup of the audit
>>>>> daemon.
>>>> A long time ago, we were debating what should guard against rogue
>>>> processes from setting the loginuid. Casey argued that the
>>>> ability to
>>>> set the loginuid means they have the ability to control the audit
>>>> trail. That means that it should be guarded by CAP_AUDIT_CONTROL.
>>>> I
>>>> think the same logic applies today. 
>>> The difference is that with loginuid you needed to give processes
>>> able
>>> to audit also the ability to change it. You do not want to tie the
>>> ability to change container ids to the ability to audit. You want
>>> to be
>>> able to do audit stuff (within the container) without allowing it
>>> to
>>> change the container id.
>> Without a *kernel* policy on containerIDs you can't say what
>> security policy is being exempted.
> The policy has been basically stated earlier.

No. The expected user space behavior has been stated.

> A way to track a set of processes from a specific point in time
> forward. The name used is "container id", but it could be anything.

Then you want Jose Bollo's PTAGS. It's insane to add yet another
arbitrary ID to the task for a special purpose. Add a general tagging
mechanism instead. We could add a gazillion new IDs, each with its
own capability if we head down this road.

> This marker is mostly used by user space to track process hierarchies
> without races, these processes can be very privileged, and must not be
> allowed to change the marker themselves when granted the current common
> capabilities.

Let's be clear. What happens in user space stays in user space.
The kernel does not give a fig about user space policy. There has
to be a kernel policy involved that a capability can exempt.

> Is this a good enough description ? If not can you clarify your
> expectations ?

The kernel enforces kernel policy. Capabilities provide a mechanism
to mark a process as exempt from some aspect of kernel policy. If
you don't have a kernel policy, you don't get a capability. Clear?
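
To illustrate the model described above, here is a minimal sketch (not
code from this thread or from the kernel; every name in it is
hypothetical). The "policy" is that nobody may alter audit state, and a
capability bit marks a task as exempt from that policy:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define EX_CAP_AUDIT_WRITE   (UINT32_C(1) << 0)
#define EX_CAP_AUDIT_CONTROL (UINT32_C(1) << 1)

struct ex_cap_task {
	uint32_t caps;	/* capabilities held by the task */
};

/* True if the task holds the given (hypothetical) capability bit. */
static bool ex_capable(const struct ex_cap_task *t, uint32_t cap)
{
	return (t->caps & cap) != 0;
}

/*
 * Policy in this sketch: altering audit state is denied, unless the
 * task holds the capability that exempts it from the policy.
 * Returns 0 when allowed, -1 when denied.
 */
static int ex_change_audit_state(const struct ex_cap_task *t)
{
	if (!ex_capable(t, EX_CAP_AUDIT_CONTROL))
		return -1;	/* policy applies: denied */
	return 0;		/* exempt from the policy: allowed */
}
```

In this sketch a task that can only write audit records is still
subject to the policy; only the control capability exempts it.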

>
>>  Without that you can't say what capability is (or isn't)
>> appropriate.
> See if the above is sufficient please.
>
>> You need a reason to have a capability check that makes sense in the
>> context of the kernel security policy.
> I think the proposal had a reason, we may debate on whether that reason
> is good enough.
>
>> Since we don't know what a container is in the kernel,
> Please do not fixate on the word container.
>
>>  that's pretty hard. We don't create "fuzzy" capabilities
>> based on the trendy application behavior of the moment. If the
>> behavior is not related to audit, there's no reason for it, and
>> if it is, CAP_AUDIT_CONTROL works just fine. If this doesn't work
>> in your application security model I suggest that is where you
>> need to make changes.
> The authors of the proposal came to the conclusion that kernel
> assistance is needed. It would be nice to discuss the merits of it.
> If you do not understand why the request has been made it would be more
> useful to ask specific questions to understand what and why is the ask.

I understand pretty darn well.

> Pushing back is fine, if you have understood the problem and have valid
> arguments against a kernel level solution (and possibly suggestions for
> a working user space solution), otherwise you are not adding value to
> the discussion.

The presumption is that the request is reasonable. Adding a capability
in support of an undefined behavior is unreasonable. Based on the discussion,
CAP_AUDIT_CONTROL is completely rational. I understand that it would be
difficult to support your application privilege model. I would like to look
into helping out with that, but have too many burning knives in the air
just now.

>
> Simo.
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]                   ` <1508255091.3129.27.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>
@ 2017-10-17 16:43                     ` Casey Schaufler
  2017-10-18 20:56                       ` Paul Moore
  1 sibling, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-17 16:43 UTC (permalink / raw)
  To: James Bottomley, Simo Sorce, Steve Grubb,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Carlos O'Donell, Linux API, Linux Containers, Linux Kernel,
	Eric Paris, David Howells, Linux Network Development,
	Eric W. Biederman, Andy Lutomirski,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Linux FS Devel,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Al Viro

On 10/17/2017 8:44 AM, James Bottomley wrote:
> On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
>>> Without a *kernel* policy on containerIDs you can't say what
>>> security policy is being exempted.
>> The policy has been basically stated earlier.
>>
>> A way to track a set of processes from a specific point in time
>> forward. The name used is "container id", but it could be anything.
>> This marker is mostly used by user space to track process hierarchies
>> without races, these processes can be very privileged, and must not
>> be allowed to change the marker themselves when granted the current
>> common capabilities.
>>
>> Is this a good enough description ? If not can you clarify your
>> expectations ?
> I think you mean you want to be able to apply a label to a process
> which is inherited across forks.

That would be PTAGS. I agree that such a general mechanism
could be very useful for a variety of purposes, not just
containers. I do not agree that a single integer (e.g. a
containerID) warrants more than a trivial mechanism.

> The label should only be susceptible
> to modification by something possessing a capability (which one TBD).

I think that we're going to have crying and gnashing
of teeth whatever capability is used. There will always be
an issue of the capability granted being less specific than the
application security model would like.

And no, we're not going down the 330 capabilities road. It's been
done in the UNIX world. Application security models hate that
just as much as they hate the coarser granularity.

> The idea is that processes spawned into a container would be labelled
> by the container orchestration system.  It's unclear what should happen
> to processes using nsenter after the fact, but policy for that should
> be up to the orchestration system.

I'm fine with that. The user space policy can be anything y'all like.

> The label will be used as a tag for audit information.

Deep breath ...

Which *is* a kernel security policy mechanism. Since the "label"
is part of the audit information that the kernel is guaranteeing,
changing it would be covered by CAP_AUDIT_CONTROL. If the kernel
does not use the "label" for any other purpose this is the only
capability that makes sense for it.

> I think you were missing label inheritance above.
>
> The security implications are that anything that can change the label
> could also hide itself and its doings from the audit system and thus
> would be used as a means to evade detection.  

Yes. This is a consequence of the capability granularity. There is
no way we can make the capability granularity sufficiently fine to
prevent this. No one wants the 330 capabilities that Data General
had in their secure UNIX system. 

> I actually think this
> means the label should be write once (once you've set it, you can't
> change it) and orchestration systems should begin as unlabelled
> processes allowing them to do arbitrary forks.
>
> For nested containers, I actually think the label should be
> hierarchical, so you can add a label for the new nested container but
> it still also contains its parent's label as well.

You can't support this reasonably with a single containerID.
You want PTAGS for this. I know that there is resistance to
requiring anything beyond what's in the base kernel (and for
good reasons) for containers. Especially something that is
pending future work. But let's not jam something into the base
kernel that isn't really going to address the issue.

> James

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]                     ` <eb96144d-4ab5-7f9f-de18-b296db35a00a-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
@ 2017-10-17 17:15                       ` Steve Grubb
  0 siblings, 0 replies; 94+ messages in thread
From: Steve Grubb @ 2017-10-17 17:15 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, David Howells, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Linux API,
	Linux Containers, Linux Kernel, Eric Paris, James Bottomley,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman,
	Simo Sorce, cgroups-u79uwXL29TY76Z2rM5mHXA, Linux FS Devel,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Linux Network Development,
	Al Viro

On Tuesday, October 17, 2017 12:43:18 PM EDT Casey Schaufler wrote:
> > The idea is that processes spawned into a container would be labelled
> > by the container orchestration system.  It's unclear what should happen
> > to processes using nsenter after the fact, but policy for that should
> > be up to the orchestration system.
> 
> I'm fine with that. The user space policy can be anything y'all like.

I think there should be a login event.


> > The label will be used as a tag for audit information.
> 
> Deep breath ...
> 
> Which *is* a kernel security policy mechanism. Since the "label"
> is part of the audit information that the kernel is guaranteeing
> changing it would be covered by CAP_AUDIT_CONTROL. If the kernel
> does not use the "label" for any other purpose this is the only
> capability that makes sense for it.

I agree. The ability to set the container label grants the ability to evade 
rules or modify audit rules. CAP_AUDIT_CONTROL makes sense to me.


> > I think you were missing label inheritance above.
> > 
> > The security implications are that anything that can change the label
> > could also hide itself and its doings from the audit system and thus
> > would be used as a means to evade detection.

Yes. We have the same problem with loginuid. There are restrictions on who can 
change it once set. And then we made an immutable flag so that people that 
want a hard guarantee can get that.
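
As an illustration of the rule sketched above, here is a small model (an
assumption-laden sketch, not the kernel's actual code; all names are
hypothetical). The first set is allowed, a later change needs the
stand-in for CAP_AUDIT_CONTROL, and the immutable flag refuses any
change once the value is set:

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_LOGINUID_UNSET UINT32_MAX	/* hypothetical "not set" sentinel */

struct ex_login_task {
	uint32_t loginuid;
	bool has_audit_control;	/* stands in for CAP_AUDIT_CONTROL */
};

/*
 * Try to set the loginuid. Returns 0 on success, -1 if refused.
 * "immutable" models the system-wide flag mentioned above.
 */
static int ex_set_loginuid(struct ex_login_task *t, uint32_t uid,
			   bool immutable)
{
	if (t->loginuid != EX_LOGINUID_UNSET) {
		if (immutable)
			return -1;	/* hard guarantee: never changes */
		if (!t->has_audit_control)
			return -1;	/* changing needs the capability */
	}
	t->loginuid = uid;
	return 0;
}
```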

-Steve

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-17 17:15                     ` Steve Grubb
@ 2017-10-17 17:57                         ` James Bottomley
  0 siblings, 0 replies; 94+ messages in thread
From: James Bottomley @ 2017-10-17 17:57 UTC (permalink / raw)
  To: Steve Grubb, Casey Schaufler
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Simo Sorce,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Linux API,
	Linux Containers, Linux Kernel, Al Viro, David Howells,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman,
	Andy Lutomirski, cgroups-u79uwXL29TY76Z2rM5mHXA, Linux FS Devel,
	Eric Paris, Linux Network Development,
	trondmy-7I+n7zu2hftEKMMhf/gKZA

On Tue, 2017-10-17 at 13:15 -0400, Steve Grubb wrote:
> On Tuesday, October 17, 2017 12:43:18 PM EDT Casey Schaufler wrote:
> > 
> > > 
> > > The idea is that processes spawned into a container would be
> > > labelled by the container orchestration system.  It's unclear
> > > what should happen to processes using nsenter after the fact, but
> > > policy for that should be up to the orchestration system.
> > 
> > I'm fine with that. The user space policy can be anything y'all
> > like.
> 
> I think there should be a login event.

I thought you wanted this for containers?  Container creation doesn't
have login events.  In an unprivileged orchestration system it may be
hard to synthetically manufacture them.

James

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-17 17:57                         ` James Bottomley
@ 2017-10-18  0:23                             ` Steve Grubb
  -1 siblings, 0 replies; 94+ messages in thread
From: Steve Grubb @ 2017-10-18  0:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Simo Sorce,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Linux API,
	Linux Containers, Linux Kernel, Al Viro, David Howells,
	Linux FS Devel, linux-audit-H+wXaHxf7aLQT0dZR+AlfA,
	Eric W. Biederman, Andy Lutomirski,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Casey Schaufler, Eric Paris,
	Linux Network Development, trondmy-7I+n7zu2hftEKMMhf/gKZA

On Tuesday, October 17, 2017 1:57:43 PM EDT James Bottomley wrote:
> > > > The idea is that processes spawned into a container would be
> > > > labelled by the container orchestration system.  It's unclear
> > > > what should happen to processes using nsenter after the fact, but
> > > > policy for that should be up to the orchestration system.
> > > 
> > > I'm fine with that. The user space policy can be anything y'all
> > > like.
> > 
> > I think there should be a login event.
> 
> I thought you wanted this for containers?  Container creation doesn't
> have login events.  In an unprivileged orchestration system it may be
> hard to synthetically manufacture them.

I realize this. This work is very similar to problems we solved 12 years 
ago. We'll figure out what the right name is for it down the road. But the 
concept is the same. If something enters a container, we need to know about 
it. It needs to get tagged and be associated with the container. The way this 
was solved for the loginuid problem was to add a session identifier so that 
new logins of the same loginuid can coexist and we can trace actions back to a 
specific login. I'd think we can apply lessons learned from a while back to 
make container identification act similarly.
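
A minimal sketch of that idea (hypothetical names throughout, and the
64-bit ID width is an assumption, not something settled in this
thread): every entry into a container gets a fresh session number, so
two entries carrying the same container ID can still be told apart.

```c
#include <stdint.h>

/* Hypothetical global counter; a real kernel would need an atomic. */
static uint32_t ex_next_session = 1;

struct ex_entry {
	uint64_t container_id;	/* which container was entered */
	uint32_t session;	/* unique per entry event */
};

/* Record one entry into a container and tag it with a new session. */
static struct ex_entry ex_enter(uint64_t container_id)
{
	struct ex_entry e = { container_id, ex_next_session++ };
	return e;
}
```

Audit records would then carry both values, in the same way that
loginuid and sessionid are paired today.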

-Steve

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]           ` <1508243469.6230.24.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-10-17 14:59             ` Casey Schaufler
@ 2017-10-18 19:58             ` Paul Moore
  1 sibling, 0 replies; 94+ messages in thread
From: Paul Moore @ 2017-10-18 19:58 UTC (permalink / raw)
  To: Simo Sorce
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Linux API, Containers,
	Linux Kernel, Eric Paris, Al Viro, Howells, Carlos O'Donell,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman,
	Lutomirski, Linux Network Development, Linux FS Devel,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Steve Grubb

On Tue, Oct 17, 2017 at 8:31 AM, Simo Sorce <simo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> The container Id can be used also for authorization purposes (by other
> processes on the host), not just audit, I think this is why a separate
> control has been proposed.

Apologies, but I'm just now getting a chance to work my way through
this thread, and I wanted to make a quick comment on this point ...

The audit container ID (note I said "audit container ID" not
"container ID") is intended strictly for use by the audit subsystem at
this point.  Allowing other uses opens the door to a larger set of
problems we are trying to avoid (e.g. handling migration across
hosts).  We would love to have a generic kernel facility that the
audit subsystem could use to identify containers, but we don't, and
previous attempts have failed, so we have to create our own.  We are
intentionally trying to limit its scope in an attempt to limit
problems.  If a more general solution appears in the future I think we
would make every effort to migrate to that; keeping this initial
effort small should make that easier.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-17 15:44                   ` James Bottomley
@ 2017-10-18 20:56                       ` Paul Moore
  -1 siblings, 0 replies; 94+ messages in thread
From: Paul Moore @ 2017-10-18 20:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mszeredi-H+wXaHxf7aLQT0dZR+AlfA,
	Andy Lutomirski, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Carlos O'Donell, API, Linux Containers, Linux Kernel, Viro,
	David Howells, Linux FS Devel,
	linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman,
	Simo Sorce, Development, Casey Schaufler, Eric Paris,
	Steve Grubb, trondmy-7I+n7zu2hftEKMMhf/gKZA

On Tue, Oct 17, 2017 at 11:44 AM, James Bottomley
<James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> wrote:
> On Tue, 2017-10-17 at 11:28 -0400, Simo Sorce wrote:
>> > Without a *kernel* policy on containerIDs you can't say what
>> > security policy is being exempted.
>>
>> The policy has been basically stated earlier.
>>
>> A way to track a set of processes from a specific point in time
>> forward. The name used is "container id", but it could be anything.
>> This marker is mostly used by user space to track process hierarchies
>> without races, these processes can be very privileged, and must not
>> be allowed to change the marker themselves when granted the current
>> common capabilities.
>>
>> Is this a good enough description ? If not can you clarify your
>> expectations ?
>
> I think you mean you want to be able to apply a label to a process
> which is inherited across forks.  The label should only be susceptible
> to modification by something possessing a capability (which one TBD).
> The idea is that processes spawned into a container would be labelled
> by the container orchestration system.  It's unclear what should happen
> to processes using nsenter after the fact, but policy for that should
> be up to the orchestration system.
>
> The label will be used as a tag for audit information.
>
> I think you were missing label inheritance above.

That is a pretty good summary of what we want to do, and what Richard
and I have discussed while brainstorming this offline.  The details
may not have translated well into those initial emails from Richard,
but I think you've got the idea, even if some of the smaller details
are still TBD.  FWIW, right now I'm not as worried about the exact
capability or the size of the audit container ID, I think those things
will sort themselves out as we progress through the implementation,
especially once we get to the next stage when we start to allow copies
of the audit records to be routed to audit daemons running inside
containers (note well that I said "copies", the host system still sees
all).

> The security implications are that anything that can change the label
> could also hide itself and its doings from the audit system and thus
> would be used as a means to evade detection.  I actually think this
> means the label should be write once (once you've set it, you can't
> change it) ...

Richard and I have talked about a write-once approach, but the
thinking was that you may want to allow a nested container
orchestrator (Why? I don't know, but people always want to do the
craziest things.) and a write-once policy makes that impossible.  If
we punt on the nested orchestrator, I believe we can seriously think
about a write-once policy to simplify things.
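
To make the write-once idea concrete, here is a minimal sketch, assuming
a per-task field that starts unset and is copied on fork.  The names
Task, set_containerid and LabelAlreadySet are made up for illustration
and do not come from any patch; Python is used only to keep it short.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


class LabelAlreadySet(Exception):
    """Raised when a second write to the label is attempted."""


@dataclass
class Task:
    pid: int
    # None models the "unset" state; the field name is hypothetical.
    audit_containerid: Optional[uuid.UUID] = None

    def set_containerid(self, cid: uuid.UUID) -> None:
        # Write-once: the first write succeeds, later writes are refused.
        if self.audit_containerid is not None:
            raise LabelAlreadySet(f"pid {self.pid} is already labelled")
        self.audit_containerid = cid

    def fork(self, child_pid: int) -> "Task":
        # Children inherit whatever label the parent has at fork time.
        return Task(child_pid, self.audit_containerid)
```

Under this model a labelled orchestrator cannot relabel its own
children, because they inherit an already-set label.  That is the
nested orchestrator problem in a nutshell.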

A bit off topic, but I've also wondered about not even implementing
read access, just to help ensure the audit container ID wouldn't be
abused, but I'm not sure how practical that will be.  Something else
to sort out during the RFC phase of the implementation with the
container orchestrators.

> ... and orchestration systems should begin as unlabelled
> processes allowing them to do arbitrary forks.

My current thinking is that the default state is to start unlabeled (I
just vomited a little into my SELinux hat); in other words
init/systemd/PID-1 in the host system starts with an "unset" audit
container ID.  This not only helps define the host system (anything
that has an unset audit container ID) but provides a blank slate for
the orchestrator(s).
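
A small sketch of the unset default, assuming the label lives in a
table keyed by PID and is copied at fork.  The table, the helper names
and the PIDs are invented for the example.

```python
import uuid
from typing import Dict, Optional

# pid -> audit container ID; None models "unset" (hypothetical representation).
Labels = Dict[int, Optional[uuid.UUID]]


def boot() -> Labels:
    # PID 1 on the host starts with an unset audit container ID.
    return {1: None}


def fork(labels: Labels, parent: int, child: int) -> None:
    # The child inherits the parent's label, whether set or unset.
    labels[child] = labels[parent]


def is_host(labels: Labels, pid: int) -> bool:
    # "Host" is simply anything whose label is still unset.
    return labels[pid] is None
```

An orchestrator forked from PID 1 is still "host" here; only the
processes it labels, and their descendants, stop being so.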

> For nested containers, I actually think the label should be
> hierarchical, so you can add a label for the new nested container but
> it still also contains its parent's label as well.

I haven't made up my mind on this completely just yet, but I'm
currently of the mindset that supporting multiple audit container IDs
on a given process is not a good idea.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-18 20:56                       ` Paul Moore
@ 2017-10-18 23:46                           ` Aleksa Sarai
  -1 siblings, 0 replies; 94+ messages in thread
From: Aleksa Sarai @ 2017-10-18 23:46 UTC (permalink / raw)
  To: Paul Moore, James Bottomley
  Cc: mszeredi, trondmy, Simo Sorce, jlayton, Carlos O'Donell,
	API, Linux Containers, Linux Kernel, Eric Paris, David Howells,
	Casey Schaufler, linux-audit, Viro, Andy Lutomirski, Development,
	Linux FS Devel, cgroups, Steve Grubb, Eric W. Biederman

>> The security implications are that anything that can change the label
>> could also hide itself and its doings from the audit system and thus
>> would be used as a means to evade detection.  I actually think this
>> means the label should be write once (once you've set it, you can't
>> change it) ...
> 
> Richard and I have talked about a write once approach, but the
> thinking was that you may want to allow a nested container
> orchestrator (Why? I don't know, but people always want to do the
> craziest things.) and a write-once policy makes that impossible.  If
> we punt on the nested orchestrator, I believe we can seriously think
> about a write-once policy to simplify things.

Nested containers are a very widely used use-case (see LXC system 
containers, inside of which people run other container runtimes). So I 
would definitely consider it something that "needs to be supported in 
some way". While the LXC guys might be a *tad* crazy, the use-case isn't. :P

>> ... and orchestration systems should begin as unlabelled
>> processes allowing them to do arbitrary forks.
> 
> My current thinking is that the default state is to start unlabeled (I
> just vomited a little into my SELinux hat); in other words
> init/systemd/PID-1 in the host system starts with an "unset" audit
> container ID.  This not only helps define the host system (anything
> that has an unset audit container ID) but provides a blank slate for
> the orchestrator(s).
> 
>> For nested containers, I actually think the label should be
>> hierarchical, so you can add a label for the new nested container but
>> it still also contains its parent's label as well.
> 
> I haven't made up my mind on this completely just yet, but I'm
> currently of the mindset that supporting multiple audit container IDs
> on a given process is not a good idea.

As long as creating a new "container" (that is, changing a process's 
"audit container ID") is an audit event then I think that having a 
hierarchy be explicit is not necessary (userspace audit can figure out 
the hierarchy quite easily -- but also there are cases where thinking of 
it as being hierarchical isn't necessarily correct).
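
As a rough illustration of how userspace could recover the nesting,
here is a sketch that assumes each "container ID set" audit record
carries both the new ID and the ID of the process doing the setting
(unset on the host).  That record layout and the names below are
assumptions, not something the RFC defines.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (setter's container ID, or None for the host; newly assigned container ID)
Event = Tuple[Optional[str], str]


def build_parent_map(events: Iterable[Event]) -> Dict[str, Optional[str]]:
    # Each registration event yields one edge: new container -> setter's container.
    return {new_id: setter_id for setter_id, new_id in events}


def ancestry(parents: Dict[str, Optional[str]], cid: str) -> List[str]:
    # Walk from a container up towards the host, guarding against loops
    # in case an ID is ever reused.
    chain: List[str] = []
    seen = set()
    current: Optional[str] = cid
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return chain
```

The kernel only has to emit flat events; whether to read them as a tree
is left to the audit tooling.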

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]       ` <81c15928-c445-fb8e-251c-bee566fbbf58-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
@ 2017-10-19  0:05         ` Richard Guy Briggs
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Guy Briggs @ 2017-10-19  0:05 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: mszeredi, Eric W. Biederman, Simo Sorce, jlayton,
	Carlos O'Donell, Linux API, Linux Containers, Linux Kernel,
	Eric Paris, David Howells, Linux Audit, Al Viro, Andy Lutomirski,
	Linux Network Development, Linux FS Devel, cgroups, trondmy

On 2017-10-17 01:10, Casey Schaufler wrote:
> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
> > On 2017-10-12 16:33, Casey Schaufler wrote:
> >> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> >>> Containers are a userspace concept.  The kernel knows nothing of them.
> >>>
> >>> The Linux audit system needs a way to be able to track the container
> >>> provenance of events and actions.  Audit needs the kernel's help to do
> >>> this.
> >>>
> >>> Since the concept of a container is entirely a userspace concept, a
> >>> registration from the userspace container orchestration system initiates
> >>> this.  This will define a point in time and a set of resources
> >>> associated with a particular container with an audit container ID.
> >>>
> >>> The registration is a pseudo filesystem (proc, since PID tree already
> >>> exists) write of a u8[16] UUID representing the container ID to a file
> >>> representing a process that will become the first process in a new
> >>> container.  This write might place restrictions on mount namespaces
> >>> required to define a container, or at least careful checking of
> >>> namespaces in the kernel to verify permissions of the orchestrator so it
> >>> can't change its own container ID.  A bind mount of nsfs may be
> >>> necessary in the container orchestrator's mntNS.
> >>> Note: Use a 128-bit scalar rather than a string to make compares faster
> >>> and simpler.
> >>>
> >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >>> registration.
> >> Hang on. If containers are a user space concept, how can
> >> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> >> a container, how can you be asking for a capability to manage
> >> them?
> > There is such a thing, but the kernel doesn't know about it yet.
> 
> Then how can it be the kernel's place to control access to a
> container resource, that is, the containerID?

Ok, let me try to address your objections.

The kernel can know enough to refuse to set it again once it is already
set, or to deny the action if the user doesn't have permission to set
it.  How is this different from loginuid and sessionid?
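
Roughly, the check being described could look like the sketch below.
It assumes a dedicated capability (called CAP_AUDIT_CONTAINERID here,
one of the names floated in this thread), a set-once rule, and a rule
that a task may not label itself.  The Task type and the choice of
errno values are invented for illustration.

```python
import errno
from dataclasses import dataclass, field
from typing import Optional, Set

# Proposed name only; this is not an existing capability.
CAP_AUDIT_CONTAINERID = "CAP_AUDIT_CONTAINERID"


@dataclass
class Task:
    pid: int
    caps: Set[str] = field(default_factory=set)
    containerid: Optional[bytes] = None  # 16 raw bytes once set


def set_containerid(setter: Task, target: Task, cid: bytes) -> int:
    """Return 0 on success or a negative errno, kernel style."""
    if len(cid) != 16:
        return -errno.EINVAL
    if CAP_AUDIT_CONTAINERID not in setter.caps:
        return -errno.EPERM   # caller lacks permission
    if setter is target:
        return -errno.EPERM   # an orchestrator may not label itself
    if target.containerid is not None:
        return -errno.EEXIST  # already set: refuse to set it again
    target.containerid = cid
    return 0
```

The kernel never interprets the 16 bytes; it only enforces who may
write them and how often.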
> 
> >   This
> > same situation exists for loginuid and sessionid which are userspace
> > concepts that the kernel tracks for the convenience of userspace.
> 
> Ah, no. Loginuid identifies a user, which is a kernel concept in
> that a user is defined by the uid.

This simple explanation doesn't help me.  What makes that a kernel
concept?  The fact that it is stored and compared in more than one
place?

> The session ID has well defined kernel semantics. You're trying to say
> that the containerID is an opaque value that is meaningless to the
> kernel, but you still want the kernel to protect it. How can the
> kernel know if it is protecting it correctly?

How so?  A userspace process triggers this.  Does the kernel know what
these values mean?  Does it do anything with them other than report
them or allow audit to filter on them?  It is given some instructions on
how to treat them.

This is what we're trying to do with the containerID.

> >   As
> > for its name, I'm not particularly picky, so if you don't like
> > CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
> > needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
> > don't want to give the ability to set a containerID to any process that
> > is able to do audit logging (such as vsftpd) and similarly we don't want
> > to give the orchestrator the ability to control the setup of the audit
> > daemon.
> 
> Sorry, but what aspect of the kernel security policy is this
> capability supposed to protect? That's what capabilities are
> for, not the undefined support of undefined user-space behavior.

Similarly, loginuids and sessionIDs are only used for audit tracking and
filtering.

> If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
> more than audit behavior you have to define what system security
> policy you're dealing with in order to pick the right capability.

It isn't audit behaviour (yet), it is audit reporting information, a
level above simply writing logs and a level below controlling daemon
behaviour.

> We get this request pretty regularly. "I need my own capability
> because I have a niche thing that isn't part of the system security
> policy but that is important!" Fit the containerID into the
> system security policy, and if that results in using CAP_SYS_ADMIN,
> oh well.

There's far too much piled into CAP_SYS_ADMIN already, which is making
capabilities less and less useful.  I realize that capabilities are
limited compared with netlink message types, but this falls in between
the abilities needed by CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE.

I'll continue on Steve Grubb's comment...

> >>>   At that time, record the target container's user-supplied
> >>> container identifier along with the target container's first process
> >>> (which may become the target container's "init" process) process ID
> >>> (referenced from the initial PID namespace), all namespace IDs (in the
> >>> form of an nsfs device number and inode number tuple) in a new auxiliary
> >>> record AUDIT_CONTAINER with a qualifying op=$action field.
> >>>
> >>> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> >>> container ID present on an auditable action or event.
> >>>
> >>> Forked and cloned processes inherit their parent's container ID,
> >>> referenced in the process' task_struct.
> >>>
> >>> Mimic setns(2) and return an error if the process has already initiated
> >>> threading or forked since this registration should happen before the
> >>> process execution is started by the orchestrator and hence should not
> >>> yet have any threads or children.  If this is deemed overly restrictive,
> >>> switch all threads and children to the new containerID.
> >>>
> >>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> >>>
> >>> Log the creation of every namespace, inheriting/adding its spawning
> >>> process' containerID(s), if applicable.  Include the spawning and
> >>> spawned namespace IDs (device and inode number tuples).
> >>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> >>> Note: At this point it appears only network namespaces may need to track
> >>> container IDs apart from processes since incoming packets may cause an
> >>> auditable event before being associated with a process.
> >>>
> >>> Log the destruction of every namespace when it is no longer used by any
> >>> process, include the namespace IDs (device and inode number tuples).
> >>> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> >>>
> >>> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> >>> the parent and child namespace IDs for any changes to a process'
> >>> namespaces. [setns(2)]
> >>> Note: It may be possible to combine AUDIT_NS_* record formats and
> >>> distinguish them with an op=$action field depending on the fields
> >>> required for each message type.
> >>>
> >>> When a container ceases to exist because the last process in that
> >>> container has exited and hence the last namespace has been destroyed and
> >>> its refcount has dropped to zero, log the fact.
> >>> (This latter is likely needed for certification accountability.)  A
> >>> container object may need a list of processes and/or namespaces.
> >>>
> >>> A namespace cannot directly migrate from one container to another but
> >>> could be assigned to a newly spawned container.  A namespace can be
> >>> moved from one container to another indirectly by having that namespace
> >>> used in a second process in another container and then ending all the
> >>> processes in the first container.
> >>>
> >>> (v2)
> >>> - switch from u64 to u128 UUID
> >>> - switch from "signal" and "trigger" to "register"
> >>> - restrict registration to single process or force all threads and children into same container
> >>>
> >>> - RGB
> > - RGB
> >
> > --
> > Richard Guy Briggs <rgb@redhat.com>
> > Sr. S/W Engineer, Kernel Security, Base Operating Systems
> > Remote, Ottawa, Red Hat Canada
> > IRC: rgb, SunRaycer
> > Voice: +1.647.777.2635, Internal: (81) 32635
> >
> 

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
@ 2017-10-19  0:05         ` Richard Guy Briggs
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Guy Briggs @ 2017-10-19  0:05 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: cgroups, Linux Containers, Linux API, Linux Audit,
	Linux FS Devel, Linux Kernel, Linux Network Development,
	mszeredi, Andy Lutomirski, jlayton, Carlos O'Donell, Al Viro,
	David Howells, Simo Sorce, trondmy, Eric Paris, Serge E. Hallyn,
	Eric W. Biederman

On 2017-10-17 01:10, Casey Schaufler wrote:
> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
> > On 2017-10-12 16:33, Casey Schaufler wrote:
> >> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> >>> Containers are a userspace concept.  The kernel knows nothing of them.
> >>>
> >>> The Linux audit system needs a way to be able to track the container
> >>> provenance of events and actions.  Audit needs the kernel's help to do
> >>> this.
> >>>
> >>> Since the concept of a container is entirely a userspace concept, a
> >>> registration from the userspace container orchestration system initiates
> >>> this.  This will define a point in time and a set of resources
> >>> associated with a particular container with an audit container ID.
> >>>
> >>> The registration is a pseudo filesystem (proc, since PID tree already
> >>> exists) write of a u8[16] UUID representing the container ID to a file
> >>> representing a process that will become the first process in a new
> >>> container.  This write might place restrictions on mount namespaces
> >>> required to define a container, or at least careful checking of
> >>> namespaces in the kernel to verify permissions of the orchestrator so it
> >>> can't change its own container ID.  A bind mount of nsfs may be
> >>> necessary in the container orchestrator's mntNS.
> >>> Note: Use a 128-bit scalar rather than a string to make compares faster
> >>> and simpler.
> >>>
> >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >>> registration.
> >> Hang on. If containers are a user space concept, how can
> >> you want CAP_CONTAINER_ANYTHING? If there's not such thing as
> >> a container, how can you be asking for a capability to manage
> >> them?
> > There is such a thing, but the kernel doesn't know about it yet.
> 
> Then how can it be the kernel's place to control access to a
> container resource, that is, the containerID.

Ok, let me try to address your objections.

The kernel can know enough to refuse to set it again if it is already
set, or to deny the action if the user doesn't have permission to set
it.  How is this different from loginuid and sessionid?
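
A minimal sketch of the two checks described above, written as a
userspace Python model rather than kernel code.  The Task class, the
set_container_id() helper and the chosen error codes are hypothetical;
CAP_AUDIT_CONTAINERID is only the name floated in this thread, not an
existing capability.

```python
import errno
import uuid

# Hypothetical capability name taken from this thread; it does not exist
# in any released kernel.
CAP_AUDIT_CONTAINERID = "CAP_AUDIT_CONTAINERID"


class Task:
    """Toy stand-in for a process; not the kernel's task_struct."""

    def __init__(self, pid, caps=()):
        self.pid = pid
        self.caps = frozenset(caps)
        self.container_id = None  # unset until an orchestrator registers one


def set_container_id(setter, target, cid):
    """Register a 16-byte container ID on target, at most once."""
    # Check 1: the caller must hold the (hypothetical) capability.
    if CAP_AUDIT_CONTAINERID not in setter.caps:
        raise PermissionError(errno.EPERM, "missing capability")
    # Check 2: write-once, an ID that is already set cannot be replaced.
    if target.container_id is not None:
        raise OSError(errno.EBUSY, "container ID already set")
    if len(cid) != 16:
        raise ValueError("container ID must be exactly 16 bytes")
    target.container_id = bytes(cid)


if __name__ == "__main__":
    orchestrator = Task(100, caps=[CAP_AUDIT_CONTAINERID])
    first_process = Task(101)
    set_container_id(orchestrator, first_process, uuid.uuid4().bytes)
    print(first_process.container_id.hex())
```

The model only stores and guards the value; it never interprets it,
which is the same way the kernel treats loginuid and sessionid for audit
purposes.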
> 
> >   This
> > same situation exists for loginuid and sessionid which are userspace
> > concepts that the kernel tracks for the convenience of userspace.
> 
> Ah, no. Loginuid identifies a user, which is a kernel concept in
> that a user is defined by the uid.

This simple explanation doesn't help me.  What makes that a kernel
concept?  The fact that it is stored and compared in more than one
place?

> The session ID has well defined kernel semantics. You're trying to say
> that the containerID is an opaque value that is meaningless to the
> kernel, but you still want the kernel to protect it. How can the
> kernel know if it is protecting it correctly?

How so?  A userspace process triggers this.  Does the kernel know what
these values mean?  Does it do anything with them other than report
them or allow audit to filter them?  It is given some instructions on
how to treat it.

This is what we're trying to do with the containerID.

> >   As
> > for its name, I'm not particularly picky, so if you don't like
> > CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
> > needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
> > don't want to give the ability to set a containerID to any process that
> > is able to do audit logging (such as vsftpd) and similarly we don't want
> > to give the orchestrator the ability to control the setup of the audit
> > daemon.
> 
> Sorry, but what aspect of the kernel security policy is this
> capability supposed to protect? That's what capabilities are
> for, not the undefined support of undefined user-space behavior.

Similarly, loginuids and sessionIDs are only used for audit tracking and
filtering.

> If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
> more than audit behavior you have to define what system security
> policy you're dealing with in order to pick the right capability.

It isn't audit behaviour (yet), it is audit reporting information, a
level above simply writing logs and a level below controlling daemon
behaviour.

> We get this request pretty regularly. "I need my own capability
> because I have a niche thing that isn't part of the system security
> policy but that is important!" Fit the containerID into the
> system security policy, and if that results in using CAP_SYS_ADMIN,
> oh well.

There's far too much piled into CAP_SYS_ADMIN already, which is making
capabilities less and less useful.  I realize that capabilities are
limited compared with netlink message types, but this falls in between
the abilities needed by CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE.

I'll continue on Steve Grubb's comment...

> >>>   At that time, record the target container's user-supplied
> >>> container identifier along with the target container's first process
> >>> (which may become the target container's "init" process) process ID
> >>> (referenced from the initial PID namespace), all namespace IDs (in the
> >>> form of a nsfs device number and inode number tuple) in a new auxilliary
> >>> record AUDIT_CONTAINER with a qualifying op=$action field.
> >>>
> >>> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
> >>> container ID present on an auditable action or event.
> >>>
> >>> Forked and cloned processes inherit their parent's container ID,
> >>> referenced in the process' task_struct.
> >>>
> >>> Mimic setns(2) and return an error if the process has already initiated
> >>> threading or forked since this registration should happen before the
> >>> process execution is started by the orchestrator and hence should not
> >>> yet have any threads or children.  If this is deemed overly restrictive,
> >>> switch all threads and children to the new containerID.
> >>>
> >>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> >>>
> >>> Log the creation of every namespace, inheriting/adding its spawning
> >>> process' containerID(s), if applicable.  Include the spawning and
> >>> spawned namespace IDs (device and inode number tuples).
> >>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> >>> Note: At this point it appears only network namespaces may need to track
> >>> container IDs apart from processes since incoming packets may cause an
> >>> auditable event before being associated with a process.
> >>>
> >>> Log the destruction of every namespace when it is no longer used by any
> >>> process, include the namespace IDs (device and inode number tuples).
> >>> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> >>>
> >>> Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> >>> the parent and child namespace IDs for any changes to a process'
> >>> namespaces. [setns(2)]
> >>> Note: It may be possible to combine AUDIT_NS_* record formats and
> >>> distinguish them with an op=$action field depending on the fields
> >>> required for each message type.
> >>>
> >>> When a container ceases to exist because the last process in that
> >>> container has exited and hence the last namespace has been destroyed and
> >>> its refcount dropping to zero, log the fact.
> >>> (This latter is likely needed for certification accountability.)  A
> >>> container object may need a list of processes and/or namespaces.
> >>>
> >>> A namespace cannot directly migrate from one container to another but
> >>> could be assigned to a newly spawned container.  A namespace can be
> >>> moved from one container to another indirectly by having that namespace
> >>> used in a second process in another container and then ending all the
> >>> processes in the first container.
> >>>
> >>> (v2)
> >>> - switch from u64 to u128 UUID
> >>> - switch from "signal" and "trigger" to "register"
> >>> - restrict registration to single process or force all threads and children into same container
> >>>
> >>> - RGB
> > - RGB
> >
> > --
> > Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Sr. S/W Engineer, Kernel Security, Base Operating Systems
> > Remote, Ottawa, Red Hat Canada
> > IRC: rgb, SunRaycer
> > Voice: +1.647.777.2635, Internal: (81) 32635
> >
> 

- RGB

--
Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]                           ` <49752b6f-8a77-d1e5-8acb-5a1eed0a992c-l3A5Bk7waGM@public.gmane.org>
@ 2017-10-19  0:43                             ` Eric W. Biederman
  0 siblings, 0 replies; 94+ messages in thread
From: Eric W. Biederman @ 2017-10-19  0:43 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Simo Sorce, mszeredi-H+wXaHxf7aLQT0dZR+AlfA, David Howells,
	Paul Moore, jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell,
	API, Linux Containers, Linux Kernel, Eric Paris, James Bottomley,
	Casey Schaufler, linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Viro,
	Andy Lutomirski, Development, Linux FS Devel,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Steve Grubb,
	trondmy-7I+n7zu2hftEKMMhf/gKZA

Aleksa Sarai <asarai-l3A5Bk7waGM@public.gmane.org> writes:

>>> The security implications are that anything that can change the label
>>> could also hide itself and its doings from the audit system and thus
>>> would be used as a means to evade detection.  I actually think this
>>> means the label should be write once (once you've set it, you can't
>>> change it) ...
>>
>> Richard and I have talked about a write once approach, but the
>> thinking was that you may want to allow a nested container
>> orchestrator (Why? I don't know, but people always want to do the
>> craziest things.) and a write-once policy makes that impossible.  If
>> we punt on the nested orchestrator, I believe we can seriously think
>> about a write-once policy to simplify things.
>
> Nested containers are a very widely used use-case (see LXC system containers,
> inside of which people run other container runtimes). So I would definitely
> consider it something that "needs to be supported in some way". While the LXC
> guys might be a *tad* crazy, the use-case isn't. :P

Of course some of that gets to running auditd inside a container which
we don't have yet either.

So I think to start it is perfectly fine to figure out the non-nested
case first and what makes sense there.  Then to sort out the nested
container case.

The solution might be that a process gets at most one id per ``audit
namespace''.
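
One way to picture that idea, as a rough Python model only.  Audit
namespaces do not exist at the time of this thread, so the namespace
names, the NestedTask class and register_in_namespace() are all
hypothetical illustrations rather than a proposed interface.

```python
class NestedTask:
    """Toy process that carries one container ID per audit namespace."""

    def __init__(self, pid):
        self.pid = pid
        # Maps a (hypothetical) audit namespace name to the 16-byte ID
        # that was registered from within that namespace.
        self.container_ids = {}


def register_in_namespace(task, audit_ns, cid):
    """Set the ID for audit_ns; write-once within that namespace."""
    if audit_ns in task.container_ids:
        raise ValueError("ID already set in audit namespace " + audit_ns)
    if len(cid) != 16:
        raise ValueError("container ID must be exactly 16 bytes")
    task.container_ids[audit_ns] = bytes(cid)


if __name__ == "__main__":
    task = NestedTask(200)
    # The outer orchestrator labels the task in the initial namespace ...
    register_in_namespace(task, "init_audit_ns", bytes(range(16)))
    # ... and a nested orchestrator labels it again in its own namespace.
    register_in_namespace(task, "nested_audit_ns", bytes(range(16, 32)))
    print(sorted(task.container_ids))
```

Under this model each orchestrator keeps write-once semantics in its own
namespace, while a nested orchestrator can still add its own label.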

Eric

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]         ` <20171019000527.eio6dfsmujmtioyt-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
@ 2017-10-19 13:32           ` Casey Schaufler
  0 siblings, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-19 13:32 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman, Simo Sorce,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Linux API,
	Linux Containers, Linux Kernel, Eric Paris, David Howells,
	Linux Audit, Al Viro, Andy Lutomirski, Linux Network Development,
	Linux FS Devel, cgroups-u79uwXL29TY76Z2rM5mHXA,
	trondmy-7I+n7zu2hftEKMMhf/gKZA

On 10/18/2017 5:05 PM, Richard Guy Briggs wrote:
> On 2017-10-17 01:10, Casey Schaufler wrote:
>> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
>>> On 2017-10-12 16:33, Casey Schaufler wrote:
>>>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>>>
>>>>> The Linux audit system needs a way to be able to track the container
>>>>> provenance of events and actions.  Audit needs the kernel's help to do
>>>>> this.
>>>>>
>>>>> Since the concept of a container is entirely a userspace concept, a
>>>>> registration from the userspace container orchestration system initiates
>>>>> this.  This will define a point in time and a set of resources
>>>>> associated with a particular container with an audit container ID.
>>>>>
>>>>> The registration is a pseudo filesystem (proc, since PID tree already
>>>>> exists) write of a u8[16] UUID representing the container ID to a file
>>>>> representing a process that will become the first process in a new
>>>>> container.  This write might place restrictions on mount namespaces
>>>>> required to define a container, or at least careful checking of
>>>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>>>> can't change its own container ID.  A bind mount of nsfs may be
>>>>> necessary in the container orchestrator's mntNS.
>>>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>>>> and simpler.
>>>>>
>>>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>>>> registration.
>>>> Hang on. If containers are a user space concept, how can
>>>> you want CAP_CONTAINER_ANYTHING? If there's not such thing as
>>>> a container, how can you be asking for a capability to manage
>>>> them?
>>> There is such a thing, but the kernel doesn't know about it yet.
>> Then how can it be the kernel's place to control access to a
>> container resource, that is, the containerID.
> Ok, let me try to address your objections.
>
> The kernel can know enough that if it is already set to not allow it to
> be set again.  Or if the user doesn't have permission to set it that the
> user be denied this action.  How is this different from loginuid and
> sessionid?
>>>   This
>>> same situation exists for loginuid and sessionid which are userspace
>>> concepts that the kernel tracks for the convenience of userspace.
>> Ah, no. Loginuid identifies a user, which is a kernel concept in
>> that a user is defined by the uid.
> This simple explanation doesn't help me.  What makes that a kernel
> concept?  The fact that it is stored and compared in more than one
> place?
>
>> The session ID has well defined kernel semantics. You're trying to say
>> that the containerID is an opaque value that is meaningless to the
>> kernel, but you still want the kernel to protect it. How can the
>> kernel know if it is protecting it correctly?
> How so?  A userspace process triggers this.  Does the kernel know what
> these values mean?  Does it do anything with them other than report
> them or allow audit to filter them?  It is given some instructions on
> how to treat it.
>
> This is what we're trying to do with the containerID.
>
>>>   As
>>> for its name, I'm not particularly picky, so if you don't like
>>> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
>>> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
>>> don't want to give the ability to set a containerID to any process that
>>> is able to do audit logging (such as vsftpd) and similarly we don't want
>>> to give the orchestrator the ability to control the setup of the audit
>>> daemon.
>> Sorry, but what aspect of the kernel security policy is this
>> capability supposed to protect? That's what capabilities are
>> for, not the undefined support of undefined user-space behavior.
> Similarly, loginuids and sessionIDs are only used for audit tracking and
> filtering.

Tell me again why you're not reusing either of these?

>
>> If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
>> more than audit behavior you have to define what system security
>> policy you're dealing with in order to pick the right capability.
> It isn't audit behaviour (yet), it is audit reporting information, a
> level above simply writing logs and a level below controlling daemon
> behaviour.

You are changing audit information. That's CAP_AUDIT_CONTROL.

>
>> We get this request pretty regularly. "I need my own capability
>> because I have a niche thing that isn't part of the system security
>> policy but that is important!" Fit the containerID into the
>> system security policy, and if that results in using CAP_SYS_ADMIN,
>> oh well.
> There's far too much piled into CAP_SYS_ADMIN already, which is making
> capabilities less and less useful.

No. The value of capabilities is in separating privilege from DAC.
Granularity is a bonus. The current granularity is too fine in some
cases and too coarse in others.

> I realize that capabilities are
> limited compared with netlink message types, but this falls in between
> the abilities needed by CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE.

There is *nothing* about your use that makes a compelling
argument for a new capability. If you can't decide between
CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE require both.
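
As a rough illustration of "require both", here is a small Python model
of the check against an effective-capability bitmask.  The bit numbers
are the ones defined in linux/capability.h; the helper names and the
idea of passing the mask in as a plain integer are assumptions made for
the sketch.

```python
# Bit positions as defined in linux/capability.h.
CAP_AUDIT_WRITE = 29
CAP_AUDIT_CONTROL = 30


def has_cap(effective, cap):
    """Return True if bit 'cap' is set in the effective capability mask."""
    return bool((effective >> cap) & 1)


def may_register_container(effective):
    """Hypothetical policy: registration needs both audit capabilities."""
    return has_cap(effective, CAP_AUDIT_WRITE) and has_cap(
        effective, CAP_AUDIT_CONTROL
    )


if __name__ == "__main__":
    logger_only = 1 << CAP_AUDIT_WRITE
    both = (1 << CAP_AUDIT_WRITE) | (1 << CAP_AUDIT_CONTROL)
    print(may_register_container(logger_only))
    print(may_register_container(both))
```

A process that can only write audit records would be refused under this
policy, but any process allowed to register would also be able to
control the audit daemon.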

>
> I'll continue on Steve Grubb's comment...
>
>>>>>   At that time, record the target container's user-supplied
>>>>> container identifier along with the target container's first process
>>>>> (which may become the target container's "init" process) process ID
>>>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>>>> form of a nsfs device number and inode number tuple) in a new auxilliary
>>>>> record AUDIT_CONTAINER with a qualifying op=$action field.
>>>>>
>>>>> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
>>>>> container ID present on an auditable action or event.
>>>>>
>>>>> Forked and cloned processes inherit their parent's container ID,
>>>>> referenced in the process' task_struct.
>>>>>
>>>>> Mimic setns(2) and return an error if the process has already initiated
>>>>> threading or forked since this registration should happen before the
>>>>> process execution is started by the orchestrator and hence should not
>>>>> yet have any threads or children.  If this is deemed overly restrictive,
>>>>> switch all threads and children to the new containerID.
>>>>>
>>>>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>>>>>
>>>>> Log the creation of every namespace, inheriting/adding its spawning
>>>>> process' containerID(s), if applicable.  Include the spawning and
>>>>> spawned namespace IDs (device and inode number tuples).
>>>>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
>>>>> Note: At this point it appears only network namespaces may need to track
>>>>> container IDs apart from processes since incoming packets may cause an
>>>>> auditable event before being associated with a process.
>>>>>
>>>>> Log the destruction of every namespace when it is no longer used by any
>>>>> process, include the namespace IDs (device and inode number tuples).
>>>>> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>>>>>
>>>>> Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
>>>>> the parent and child namespace IDs for any changes to a process'
>>>>> namespaces. [setns(2)]
>>>>> Note: It may be possible to combine AUDIT_NS_* record formats and
>>>>> distinguish them with an op=$action field depending on the fields
>>>>> required for each message type.
>>>>>
>>>>> When a container ceases to exist because the last process in that
>>>>> container has exited and hence the last namespace has been destroyed and
>>>>> its refcount dropping to zero, log the fact.
>>>>> (This latter is likely needed for certification accountability.)  A
>>>>> container object may need a list of processes and/or namespaces.
>>>>>
>>>>> A namespace cannot directly migrate from one container to another but
>>>>> could be assigned to a newly spawned container.  A namespace can be
>>>>> moved from one container to another indirectly by having that namespace
>>>>> used in a second process in another container and then ending all the
>>>>> processes in the first container.
>>>>>
>>>>> (v2)
>>>>> - switch from u64 to u128 UUID
>>>>> - switch from "signal" and "trigger" to "register"
>>>>> - restrict registration to single process or force all threads and children into same container
>>>>>
>>>>> - RGB
>>> - RGB
>>>
>>> --
>>> Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> Sr. S/W Engineer, Kernel Security, Base Operating Systems
>>> Remote, Ottawa, Red Hat Canada
>>> IRC: rgb, SunRaycer
>>> Voice: +1.647.777.2635, Internal: (81) 32635
>>>
> - RGB
>
> --
> Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
@ 2017-10-19 13:32           ` Casey Schaufler
  0 siblings, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-10-19 13:32 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development, mszeredi-H+wXaHxf7aLQT0dZR+AlfA,
	Andy Lutomirski, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Carlos O'Donell, Al Viro, David Howells, Simo Sorce,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Serge E. Hallyn,
	Eric W. Biederman

On 10/18/2017 5:05 PM, Richard Guy Briggs wrote:
> On 2017-10-17 01:10, Casey Schaufler wrote:
>> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
>>> On 2017-10-12 16:33, Casey Schaufler wrote:
>>>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>>>
>>>>> The Linux audit system needs a way to be able to track the container
>>>>> provenance of events and actions.  Audit needs the kernel's help to do
>>>>> this.
>>>>>
>>>>> Since the concept of a container is entirely a userspace concept, a
>>>>> registration from the userspace container orchestration system initiates
>>>>> this.  This will define a point in time and a set of resources
>>>>> associated with a particular container with an audit container ID.
>>>>>
>>>>> The registration is a pseudo filesystem (proc, since PID tree already
>>>>> exists) write of a u8[16] UUID representing the container ID to a file
>>>>> representing a process that will become the first process in a new
>>>>> container.  This write might place restrictions on mount namespaces
>>>>> required to define a container, or at least careful checking of
>>>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>>>> can't change its own container ID.  A bind mount of nsfs may be
>>>>> necessary in the container orchestrator's mntNS.
>>>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>>>> and simpler.
>>>>>
>>>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>>>> registration.
>>>> Hang on. If containers are a user space concept, how can
>>>> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
>>>> a container, how can you be asking for a capability to manage
>>>> them?
>>> There is such a thing, but the kernel doesn't know about it yet.
>> Then how can it be the kernel's place to control access to a
>> container resource, that is, the containerID.
> Ok, let me try to address your objections.
>
> The kernel can know enough to refuse to set it again if it is already
> set, and to deny the action if the user doesn't have permission to set
> it.  How is this different from loginuid and sessionid?
>>>   This
>>> same situation exists for loginuid and sessionid which are userspace
>>> concepts that the kernel tracks for the convenience of userspace.
>> Ah, no. Loginuid identifies a user, which is a kernel concept in
>> that a user is defined by the uid.
> This simple explanation doesn't help me.  What makes that a kernel
> concept?  The fact that it is stored and compared in more than one
> place?
>
>> The session ID has well defined kernel semantics. You're trying to say
>> that the containerID is an opaque value that is meaningless to the
>> kernel, but you still want the kernel to protect it. How can the
>> kernel know if it is protecting it correctly?
> How so?  A userspace process triggers this.  Does the kernel know what
> these values mean?  Does it do anything with them other than report
> them or allow audit to filter them?  It is given some instructions on
> how to treat it.
>
> This is what we're trying to do with the containerID.
>
>>>   As
>>> for its name, I'm not particularly picky, so if you don't like
>>> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
>>> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
>>> don't want to give the ability to set a containerID to any process that
>>> is able to do audit logging (such as vsftpd) and similarly we don't want
>>> to give the orchestrator the ability to control the setup of the audit
>>> daemon.
>> Sorry, but what aspect of the kernel security policy is this
>> capability supposed to protect? That's what capabilities are
>> for, not the undefined support of undefined user-space behavior.
> Similarly, loginuids and sessionIDs are only used for audit tracking and
> filtering.

Tell me again why you're not reusing either of these?

>
>> If it's audit behavior, you want CAP_AUDIT_CONTROL. If it's
>> more than audit behavior you have to define what system security
>> policy you're dealing with in order to pick the right capability.
> It isn't audit behaviour (yet), it is audit reporting information, a
> level above simply writing logs and a level below controlling daemon
> behaviour.

You are changing audit information. That's CAP_AUDIT_CONTROL.

>
>> We get this request pretty regularly. "I need my own capability
>> because I have a niche thing that isn't part of the system security
>> policy but that is important!" Fit the containerID into the
>> system security policy, and if that results in using CAP_SYS_ADMIN,
>> oh well.
> There's far too much piled in to CAP_SYS_ADMIN already, which is making
> capabilities less and less useful.

No. The value of capabilities is in separating privilege from DAC.
Granularity is a bonus. The current granularity is too fine in some
cases and too coarse in others.

> I realize that capabilities are
> limited compared with netlink message types, but this falls in between
> the abilities needed by CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE.

There is *nothing* about your use that makes a compelling
argument for a new capability. If you can't decide between
CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE require both.
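
The "require both" suggestion above can be sketched as a bitmask test.
This is a userspace model only, not kernel code: the bit positions mirror
CAP_AUDIT_WRITE (29) and CAP_AUDIT_CONTROL (30) from <linux/capability.h>,
while every MODEL_/model_ name is hypothetical and exists only for this
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirror CAP_AUDIT_WRITE (29) and CAP_AUDIT_CONTROL (30);
 * the MODEL_ names are local to this sketch, not kernel identifiers. */
#define MODEL_CAP_AUDIT_WRITE   29
#define MODEL_CAP_AUDIT_CONTROL 30

#define MODEL_CAP_BIT(cap) (UINT64_C(1) << (cap))

/* Hypothetical helper: true only if every bit in 'required' is present
 * in the task's effective capability set. */
bool model_has_all_caps(uint64_t effective, uint64_t required)
{
	return (effective & required) == required;
}

/* Hypothetical policy: setting the container ID needs both audit caps. */
bool model_may_set_contid_both(uint64_t effective)
{
	const uint64_t required = MODEL_CAP_BIT(MODEL_CAP_AUDIT_WRITE) |
				  MODEL_CAP_BIT(MODEL_CAP_AUDIT_CONTROL);

	return model_has_all_caps(effective, required);
}
```

Under this policy a logging-only daemon holding just CAP_AUDIT_WRITE
would be refused, but an orchestrator would have to be given
CAP_AUDIT_CONTROL as well, which is the trade-off being argued here.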

>
> I'll continue on Steve Grubb's comment...
>
>>>>>   At that time, record the target container's user-supplied
>>>>> container identifier along with the target container's first process
>>>>> (which may become the target container's "init" process) process ID
>>>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>>>> form of a nsfs device number and inode number tuple) in a new auxiliary
>>>>> record AUDIT_CONTAINER with a qualifying op=$action field.
>>>>>
>>>>> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
>>>>> container ID present on an auditable action or event.
>>>>>
>>>>> Forked and cloned processes inherit their parent's container ID,
>>>>> referenced in the process' task_struct.
>>>>>
>>>>> Mimic setns(2) and return an error if the process has already initiated
>>>>> threading or forked since this registration should happen before the
>>>>> process execution is started by the orchestrator and hence should not
>>>>> yet have any threads or children.  If this is deemed overly restrictive,
>>>>> switch all threads and children to the new containerID.
>>>>>
>>>>> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>>>>>
>>>>> Log the creation of every namespace, inheriting/adding its spawning
>>>>> process' containerID(s), if applicable.  Include the spawning and
>>>>> spawned namespace IDs (device and inode number tuples).
>>>>> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
>>>>> Note: At this point it appears only network namespaces may need to track
>>>>> container IDs apart from processes since incoming packets may cause an
>>>>> auditable event before being associated with a process.
>>>>>
>>>>> Log the destruction of every namespace when it is no longer used by any
>>>>> process, include the namespace IDs (device and inode number tuples).
>>>>> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>>>>>
>>>>> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
>>>>> the parent and child namespace IDs for any changes to a process'
>>>>> namespaces. [setns(2)]
>>>>> Note: It may be possible to combine AUDIT_NS_* record formats and
>>>>> distinguish them with an op=$action field depending on the fields
>>>>> required for each message type.
>>>>>
>>>>> When a container ceases to exist because the last process in that
>>>>> container has exited and hence the last namespace has been destroyed and
>>>>> its refcount dropping to zero, log the fact.
>>>>> (This latter is likely needed for certification accountability.)  A
>>>>> container object may need a list of processes and/or namespaces.
>>>>>
>>>>> A namespace cannot directly migrate from one container to another but
>>>>> could be assigned to a newly spawned container.  A namespace can be
>>>>> moved from one container to another indirectly by having that namespace
>>>>> used in a second process in another container and then ending all the
>>>>> processes in the first container.
>>>>>
>>>>> (v2)
>>>>> - switch from u64 to u128 UUID
>>>>> - switch from "signal" and "trigger" to "register"
>>>>> - restrict registration to single process or force all threads and children into same container
>>>>>
>>>>> - RGB
>>> - RGB
>>>
>>> --
>>> Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> Sr. S/W Engineer, Kernel Security, Base Operating Systems
>>> Remote, Ottawa, Red Hat Canada
>>> IRC: rgb, SunRaycer
>>> Voice: +1.647.777.2635, Internal: (81) 32635
>>>
> - RGB
>
> --
> Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]                             ` <871sm0j7bm.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2017-10-19 15:36                               ` Paul Moore
  0 siblings, 0 replies; 94+ messages in thread
From: Paul Moore @ 2017-10-19 15:36 UTC (permalink / raw)
  To: Aleksa Sarai, Eric W. Biederman
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, David Howells, Simo Sorce,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, API,
	Linux Containers, Linux Kernel, Eric Paris, James Bottomley,
	Casey Schaufler, linux-audit-H+wXaHxf7aLQT0dZR+AlfA, Viro,
	Andy Lutomirski, Development, Linux FS Devel,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Steve Grubb,
	trondmy-7I+n7zu2hftEKMMhf/gKZA

On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Aleksa Sarai <asarai-l3A5Bk7waGM@public.gmane.org> writes:
>>>> The security implications are that anything that can change the label
>>>> could also hide itself and its doings from the audit system and thus
>>>> would be used as a means to evade detection.  I actually think this
>>>> means the label should be write once (once you've set it, you can't
>>>> change it) ...
>>>
>>> Richard and I have talked about a write once approach, but the
>>> thinking was that you may want to allow a nested container
>>> orchestrator (Why? I don't know, but people always want to do the
>>> craziest things.) and a write-once policy makes that impossible.  If
>>> we punt on the nested orchestrator, I believe we can seriously think
>>> about a write-once policy to simplify things.
>>
>> Nested containers are a very widely used use-case (see LXC system containers,
>> inside of which people run other container runtimes). So I would definitely
>> consider it something that "needs to be supported in some way". While the LXC
>> guys might be a *tad* crazy, the use-case isn't. :P

No worries, we're all a little crazy in our own special ways ;)

Kidding aside, thanks for explaining the use case.

> Of course some of that gets to running auditd inside a container which
> we don't have yet either.
>
> So I think to start it is perfectly fine to figure out the non-nested
> case first and what makes sense there.  Then to sort out the nested
> container case.
>
> The solution might be that a process gets at most one id per ``audit
> namespace''.

In an attempt to stay on-topic, let's try to stick with "audit
container ID" or "container ID" if you must.  I really want to avoid
the term "audit namespace" simply because the term "namespace" implies
some things which we aren't planning on doing.
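
The write-once policy discussed in the quoted text can be modelled in a
few lines.  This is a minimal userspace sketch under stated assumptions:
struct model_task, model_set_contid_once() and the choice of -EEXIST as
the error code are all hypothetical, and the 16-byte ID simply follows
the u8[16] UUID from the RFC.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CONTID_LEN 16	/* u8[16] UUID, as proposed in the RFC */

/* Hypothetical stand-in for the per-task audit state. */
struct model_task {
	bool contid_set;
	uint8_t contid[MODEL_CONTID_LEN];
};

/* Write-once: the first call stores the ID and returns 0; any later
 * call leaves the stored ID untouched and returns -EEXIST (an
 * arbitrary error code picked for this sketch). */
int model_set_contid_once(struct model_task *t,
			  const uint8_t id[MODEL_CONTID_LEN])
{
	if (t->contid_set)
		return -EEXIST;
	memcpy(t->contid, id, MODEL_CONTID_LEN);
	t->contid_set = true;
	return 0;
}
```

As the thread points out, a single write-once slot like this leaves no
room for a nested orchestrator to label its own children, which is why
the nested case is being deferred.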

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]           ` <18cb69a5-f998-0e6e-85df-7f4b9b768a6f-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
@ 2017-10-19 15:51             ` Paul Moore
  0 siblings, 0 replies; 94+ messages in thread
From: Paul Moore @ 2017-10-19 15:51 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mszeredi-H+wXaHxf7aLQT0dZR+AlfA,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Linux API, Linux Containers,
	Linux Kernel, David Howells, Carlos O'Donell, Linux Audit,
	Eric W. Biederman, Simo Sorce, Linux Network Development,
	Linux FS Devel, Eric Paris, Al Viro

On Thu, Oct 19, 2017 at 9:32 AM, Casey Schaufler <casey-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org> wrote:
> On 10/18/2017 5:05 PM, Richard Guy Briggs wrote:
>> On 2017-10-17 01:10, Casey Schaufler wrote:
>>> On 10/16/2017 5:33 PM, Richard Guy Briggs wrote:
>>>> On 2017-10-12 16:33, Casey Schaufler wrote:
>>>>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>>>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>>>>
>>>>>> The Linux audit system needs a way to be able to track the container
>>>>>> provenance of events and actions.  Audit needs the kernel's help to do
>>>>>> this.
>>>>>>
>>>>>> Since the concept of a container is entirely a userspace concept, a
>>>>>> registration from the userspace container orchestration system initiates
>>>>>> this.  This will define a point in time and a set of resources
>>>>>> associated with a particular container with an audit container ID.
>>>>>>
>>>>>> The registration is a pseudo filesystem (proc, since PID tree already
>>>>>> exists) write of a u8[16] UUID representing the container ID to a file
>>>>>> representing a process that will become the first process in a new
>>>>>> container.  This write might place restrictions on mount namespaces
>>>>>> required to define a container, or at least careful checking of
>>>>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>>>>> can't change its own container ID.  A bind mount of nsfs may be
>>>>>> necessary in the container orchestrator's mntNS.
>>>>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>>>>> and simpler.
>>>>>>
>>>>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>>>>> registration.
>>>>> Hang on. If containers are a user space concept, how can
>>>>> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
>>>>> a container, how can you be asking for a capability to manage
>>>>> them?
>>>> There is such a thing, but the kernel doesn't know about it yet.
>>> Then how can it be the kernel's place to control access to a
>>> container resource, that is, the containerID.
>> Ok, let me try to address your objections.
>>
>> The kernel can know enough to refuse to set it again if it is already
>> set, and to deny the action if the user doesn't have permission to set
>> it.  How is this different from loginuid and sessionid?
>>>>   This
>>>> same situation exists for loginuid and sessionid which are userspace
>>>> concepts that the kernel tracks for the convenience of userspace.
>>> Ah, no. Loginuid identifies a user, which is a kernel concept in
>>> that a user is defined by the uid.
>> This simple explanation doesn't help me.  What makes that a kernel
>> concept?  The fact that it is stored and compared in more than one
>> place?
>>
>>> The session ID has well defined kernel semantics. You're trying to say
>>> that the containerID is an opaque value that is meaningless to the
>>> kernel, but you still want the kernel to protect it. How can the
>>> kernel know if it is protecting it correctly?
>> How so?  A userspace process triggers this.  Does the kernel know what
>> these values mean?  Does it do anything with them other than report
>> them or allow audit to filter them?  It is given some instructions on
>> how to treat it.
>>
>> This is what we're trying to do with the containerID.
>>
>>>>   As
>>>> for its name, I'm not particularly picky, so if you don't like
>>>> CAP_CONTAINER_* then I'm fine with CAP_AUDIT_CONTAINERID.  It really
>>>> needs to be distinct from CAP_AUDIT_WRITE and CAP_AUDIT_CONTROL since we
>>>> don't want to give the ability to set a containerID to any process that
>>>> is able to do audit logging (such as vsftpd) and similarly we don't want
>>>> to give the orchestrator the ability to control the setup of the audit
>>>> daemon.
>>> Sorry, but what aspect of the kernel security policy is this
>>> capability supposed to protect? That's what capabilities are
>>> for, not the undefined support of undefined user-space behavior.
>> Similarly, loginuids and sessionIDs are only used for audit tracking and
>> filtering.
>
> Tell me again why you're not reusing either of these?

Ah, granularity arguments, welcome back old friend :)

Once again, we're still trying to sort all this out so I reserve the
right to change my mind, but my current thinking is as follows ...
CAP_AUDIT_WRITE exists to control which applications can submit
userspace generated audit records to the kernel, CAP_AUDIT_CONTROL
exists to control which applications can manage the in-kernel audit
configuration (e.g. filter rules) and the current task's loginuid
value.  Reusing CAP_AUDIT_WRITE here would allow any application that
can submit userspace audit records the ability to change the audit
container ID; this would be bad, we don't allow CAP_AUDIT_WRITE to
change the loginuid, it would be even worse to allow it to change the
audit container ID.  Reusing CAP_AUDIT_CONTROL is less bad than
CAP_AUDIT_WRITE, but it gets sticky once we get to the part where we
want to run auditd instances in containers, complete with their own
queues, filtering rules, etc.  Perhaps we could use CAP_AUDIT_CONTROL
to guard the audit container ID value, but we would always want to do
that check in the init userns in order to prevent container bound
processes from manipulating their own audit container ID.
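
The "check in the init userns" idea above can be illustrated with a
small userspace model.  Nothing here is kernel code: struct
model_userns, struct model_ns_task and model_may_set_contid() are
hypothetical names, and only the bit position mirrors CAP_AUDIT_CONTROL
(30).  In the kernel, the distinction would roughly correspond to using
capable() rather than ns_capable(), but that mapping is an assumption
of this sketch, not something the thread has settled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit position mirrors CAP_AUDIT_CONTROL (30); names are sketch-local. */
#define MODEL_CAP_AUDIT_CONTROL 30
#define MODEL_CAP_BIT(cap) (UINT64_C(1) << (cap))

/* Hypothetical user namespace: the initial one has no parent. */
struct model_userns {
	const struct model_userns *parent;
};

/* Hypothetical task: its user namespace plus effective capabilities. */
struct model_ns_task {
	const struct model_userns *userns;
	uint64_t effective;
};

/* Allow setting the audit container ID only for a task that lives in
 * the initial user namespace AND holds CAP_AUDIT_CONTROL there.  A task
 * holding the capability inside a child namespace is refused, so a
 * container-bound process cannot relabel itself. */
bool model_may_set_contid(const struct model_ns_task *t)
{
	if (t->userns == NULL || t->userns->parent != NULL)
		return false;
	return (t->effective & MODEL_CAP_BIT(MODEL_CAP_AUDIT_CONTROL)) != 0;
}
```

The point of the model is the second case: a process that is "root" with
a full capability set inside its own user namespace still fails the
check, because the capability is evaluated against the initial
namespace.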

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]                               ` <CAHC9VhTYF-MJm3ejWXE1H-eeXKaNBkeWKwdiKdj093xATYn7nQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-10-19 16:25                                 ` Eric W. Biederman
  0 siblings, 0 replies; 94+ messages in thread
From: Eric W. Biederman @ 2017-10-19 16:25 UTC (permalink / raw)
  To: Paul Moore
  Cc: Aleksa Sarai, James Bottomley, cgroups, mszeredi,
	Andy Lutomirski, jlayton, Carlos O'Donell, API,
	Linux Containers, Linux Kernel, Viro, David Howells,
	Linux FS Devel, linux-audit, Simo Sorce, Development,
	Casey Schaufler, Eric Paris, Steve Grubb, trondmy

Paul Moore <paul@paul-moore.com> writes:

> On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Aleksa Sarai <asarai@suse.de> writes:
>>>>> The security implications are that anything that can change the label
>>>>> could also hide itself and its doings from the audit system and thus
>>>>> would be used as a means to evade detection.  I actually think this
>>>>> means the label should be write once (once you've set it, you can't
>>>>> change it) ...
>>>>
>>>> Richard and I have talked about a write once approach, but the
>>>> thinking was that you may want to allow a nested container
>>>> orchestrator (Why? I don't know, but people always want to do the
>>>> craziest things.) and a write-once policy makes that impossible.  If
>>>> we punt on the nested orchestrator, I believe we can seriously think
>>>> about a write-once policy to simplify things.
>>>
>>> Nested containers are a very widely used use-case (see LXC system containers,
>>> inside of which people run other container runtimes). So I would definitely
>>> consider it something that "needs to be supported in some way". While the LXC
>>> guys might be a *tad* crazy, the use-case isn't. :P
>
> No worries, we're all a little crazy in our own special ways ;)
>
> Kidding aside, thanks for explaining the use case.
>
>> Of course some of that gets to running auditd inside a container which
>> we don't have yet either.
>>
>> So I think to start it is perfectly fine to figure out the non-nested
>> case first and what makes sense there.  Then to sort out the nested
>> container case.
>>
>> The solution might be that a process gets at most one id per ``audit
>> namespace''.
>
> In an attempt to stay on-topic, let's try to stick with "audit
> container ID" or "container ID" if you must.  I really want to avoid
> the term "audit namespace" simply because the term "namespace" implies
> some things which we aren't planning on doing.

This is 100% on topic.  I am saying that unless we are planning to have
auditd running in a container with its own set of rules you probably
don't care about nested containers.  Last time I heard a discussion
about that the term in use was audit namespace.   So I was referring to
that support when I said audit namespace, even if the end result only
loosely fits the term namespace.

I could be wrong of course.  I don't fully understand what is driving
the desire to connect audit and containers.  But my naive guess is that
from an audit perspective you don't care about nested containers
unless there is also a nested auditd that is looking at it from a nested
perspective.

So far we have established with the term container that we are talking
about a running instance of processes, not a filesystem instance that
Docker and friends ship around.   Beyond that I am not certain what you
care about.

Eric

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-19 16:25                                 ` Eric W. Biederman
  (?)
@ 2017-10-19 17:47                                 ` Paul Moore
  -1 siblings, 0 replies; 94+ messages in thread
From: Paul Moore @ 2017-10-19 17:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Aleksa Sarai, James Bottomley, cgroups, mszeredi,
	Andy Lutomirski, jlayton, Carlos O'Donell, API,
	Linux Containers, Linux Kernel, Viro, David Howells,
	Linux FS Devel, linux-audit, Simo Sorce, Development,
	Casey Schaufler, Eric Paris, Steve Grubb, trondmy

On Thu, Oct 19, 2017 at 12:25 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
>
>> On Wed, Oct 18, 2017 at 8:43 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Aleksa Sarai <asarai@suse.de> writes:
>>>>>> The security implications are that anything that can change the label
>>>>>> could also hide itself and its doings from the audit system and thus
>>>>>> would be used as a means to evade detection.  I actually think this
>>>>>> means the label should be write once (once you've set it, you can't
>>>>>> change it) ...
>>>>>
>>>>> Richard and I have talked about a write once approach, but the
>>>>> thinking was that you may want to allow a nested container
>>>>> orchestrator (Why? I don't know, but people always want to do the
>>>>> craziest things.) and a write-once policy makes that impossible.  If
>>>>> we punt on the nested orchestrator, I believe we can seriously think
>>>>> about a write-once policy to simplify things.
>>>>
>>>> Nested containers are a very widely used use-case (see LXC system containers,
>>>> inside of which people run other container runtimes). So I would definitely
>>>> consider it something that "needs to be supported in some way". While the LXC
>>>> guys might be a *tad* crazy, the use-case isn't. :P
>>
>> No worries, we're all a little crazy in our own special ways ;)
>>
>> Kidding aside, thanks for explaining the use case.
>>
>>> Of course some of that gets to running auditd inside a container which
>>> we don't have yet either.
>>>
>>> So I think to start it is perfectly fine to figure out the non-nested
>>> case first and what makes sense there.  Then to sort out the nested
>>> container case.
>>>
>>> The solution might be that a process gets at most one id per ``audit
>>> namespace''.
>>
>> In an attempt to stay on-topic, let's try to stick with "audit
>> container ID" or "container ID" if you must.  I really want to avoid
>> the term "audit namespace" simply because the term "namespace" implies
>> some things which we aren't planning on doing.
>
> This is 100% on topic.  I am saying that unless we are planning to have
> auditd running in a container with its own set of rules you probably
> don't care about nested containers.  Last time I heard a discussion
> about that the term in use was audit namespace.   So I was referring to
> that support when I said audit namespace, even if the end result only
> loosely fits the term namespace.

My "stay on-topic" comment is directed at, and limited to, your choice
of terminology, not the discussion about container nesting.  I'm
purposefully not using the term "audit namespace" to refer to anything
that Richard has presented, and I'm kindly asking you to do the same;
it simply doesn't fit.

> I could be wrong of course.  I don't fully understand what is driving
> the desire to connect audit and containers.  But my naive guess is that
> from an audit perspective you don't care about nested containers
> unless there is also a nested auditd that is looking at it from a nested
> perspective.

Two motivations that are clear to me: the first is the desire to be
able to associate events in the audit log with a container (much like
how the session ID helped us associate events with a login session),
the second is the desire for users to run an audit daemon instance in
their containers to capture audit events generated by their container.
There is also a security certification motivation; see some of Steve's
comments for more on that.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
  2017-10-12 15:45 ` Steve Grubb
@ 2017-10-19 19:57     ` Richard Guy Briggs
  2017-10-19 19:57   ` Richard Guy Briggs
  1 sibling, 0 replies; 94+ messages in thread
From: Richard Guy Briggs @ 2017-10-19 19:57 UTC (permalink / raw)
  To: Steve Grubb
  Cc: cgroups, Linux Containers, Linux API, Linux Audit,
	Linux FS Devel, Linux Kernel, Linux Network Development,
	Simo Sorce, Carlos O'Donell, Aristeu Rozanski, David Howells,
	Eric W. Biederman, Eric Paris, jlayton, Andy Lutomirski,
	mszeredi, Paul Moore, Serge E. Hallyn, trondmy, Al Viro

On 2017-10-12 15:45, Steve Grubb wrote:
> On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote:
> > Containers are a userspace concept.  The kernel knows nothing of them.
> > 
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions.  Audit needs the kernel's help to do
> > this.
> > 
> > Since the concept of a container is entirely a userspace concept, a
> > registration from the userspace container orchestration system initiates
> > this.  This will define a point in time and a set of resources
> > associated with a particular container with an audit container ID.
> 
> The requirements for common criteria around containers should be very closely 
> modeled on the requirements for virtualization. It would be the container 
> manager that is responsible for logging the resource assignment events.

I suspect we are in violent agreement here.

> > The registration is a pseudo filesystem (proc, since PID tree already
> > exists) write of a u8[16] UUID representing the container ID to a file
> > representing a process that will become the first process in a new
> > container.  This write might place restrictions on mount namespaces
> > required to define a container, or at least careful checking of
> > namespaces in the kernel to verify permissions of the orchestrator so it
> > can't change its own container ID.  A bind mount of nsfs may be
> > necessary in the container orchestrator's mntNS.
> > Note: Use a 128-bit scalar rather than a string to make compares faster
> > and simpler.
> > 
> > Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> > registration.
> 
> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.

No, because then any process with that capability (vsftpd) could change
its own container ID.  This is discussed more in other parts of the
thread...
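
As an aside on the registration format quoted above, here is a minimal
sketch of how a u8[16] container ID could be held and compared as a
fixed-size value rather than a string.  The names struct contid,
contid_equal() and contid_is_unset() are hypothetical, and treating the
all-zero value as "no container ID set" is an assumption made for this
sketch, not something the RFC specifies.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: 16 raw bytes, matching the "u8[16] UUID" in the RFC. */
struct contid {
	uint8_t b[16];
};

/* Fixed-length compare; no string parsing or terminator handling needed. */
static bool contid_equal(const struct contid *x, const struct contid *y)
{
	return memcmp(x->b, y->b, sizeof(x->b)) == 0;
}

/* Assumption for this sketch only: all-zero means "not yet registered". */
static bool contid_is_unset(const struct contid *id)
{
	static const struct contid zero = { { 0 } };

	return contid_equal(id, &zero);
}
```

A write-once or "may not change your own ID" policy would sit on top of
helpers like these; the sketch deliberately leaves that policy out.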

> > At that time, record the target container's user-supplied
> > container identifier along with the target container's first process
> > (which may become the target container's "init" process) process ID
> > (referenced from the initial PID namespace), all namespace IDs (in the
> > form of a nsfs device number and inode number tuple) in a new auxiliary
> > record AUDIT_CONTAINER with a qualifying op=$action field.
> 
> This would be in addition to the normal audit fields.

It was intended that this be an auxiliary record, but this issue is
currently being debated in threads about other upstream issues, so I
won't cover that here.

> > Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> > container ID present on an auditable action or event.
> > 
> > Forked and cloned processes inherit their parent's container ID,
> > referenced in the process' task_struct.
> > 
> > Mimic setns(2) and return an error if the process has already initiated
> > threading or forked since this registration should happen before the
> > process execution is started by the orchestrator and hence should not
> > yet have any threads or children.  If this is deemed overly restrictive,
> > switch all threads and children to the new containerID.
> > 
> > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> > 
> > Log the creation of every namespace, inheriting/adding its spawning
> > process' containerID(s), if applicable.  Include the spawning and
> > spawned namespace IDs (device and inode number tuples).
> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> > Note: At this point it appears only network namespaces may need to track
> > container IDs apart from processes since incoming packets may cause an
> > auditable event before being associated with a process.
> > 
> > Log the destruction of every namespace when it is no longer used by any
> > process, include the namespace IDs (device and inode number tuples).
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> 
> In the virtualization requirements, we only log removal of resources when 
> something is removed by intention. If the VM shuts down, the manager issues a 
> VIRT_CONTROL stop event and the user space utilities know this means all 
> resources have been unassigned.

Ok, this assumes the orchestrator is waiting on that child process (and
that it is in turn waiting on all its children) so it knows when that
job has exited naturally or errored out.  I don't know if there is any
consensus or best practice with orchestrators out there now.  The kernel
should know, so it seemed reasonable to report what was known.  Besides,
in this case, I was talking specifically about namespace creation and
destruction rather than containers.

> > Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> > the parent and child namespace IDs for any changes to a process'
> > namespaces. [setns(2)]
> > Note: It may be possible to combine AUDIT_NS_* record formats and
> > distinguish them with an op=$action field depending on the fields
> > required for each message type.
> > 
> > When a container ceases to exist because the last process in that
> > container has exited, and hence the last namespace has been destroyed and
> > its refcount has dropped to zero, log the fact.
> > (This latter is likely needed for certification accountability.)  A
> > container object may need a list of processes and/or namespaces.
> > 
> > A namespace cannot directly migrate from one container to another but
> > could be assigned to a newly spawned container.  A namespace can be
> > moved from one container to another indirectly by having that namespace
> > used in a second process in another container and then ending all the
> > processes in the first container.
> 
> I'm thinking that there needs to be a clear delineation between what the 
> container manager is responsible for and what the kernel needs to do. The 
> kernel needs the registration system and to associate an identifier with 
> events inside the container.

Agreed this needs to be defined much better than it is.

> But would the container manager be mostly responsible for auditing the events 
> described here:
> 
> https://github.com/linux-audit/audit-documentation/wiki/SPEC-Virtualization-Manager-Guest-Lifecycle-Events

I'm having trouble fitting all these events into the container model,
but I recognize the importance of continuing to try, or of being able
to justify deviations from this SPEC.

> Also, we can already audit exit, unshare, setns, and clone. If the kernel just 
> sticks the identifier on them, isn't that sufficient?

I think this last one is incomplete without a way to identify the
namespaces involved.
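
For illustration only: the "device and inode number tuple" that names a
namespace can already be read from userspace with stat(2) on a
/proc/<pid>/ns/* link.  The sketch below is a userspace approximation of
that identification step, not the proposed kernel code; struct ns_id and
both helper names are hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical holder for the (device, inode) pair naming one namespace. */
struct ns_id {
	dev_t dev;
	ino_t ino;
};

/*
 * Fill *id from a path such as "/proc/self/ns/net".
 * Returns 0 on success, -1 if stat(2) fails (errno is left set).
 */
static int ns_id_from_path(const char *path, struct ns_id *id)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* Two handles refer to the same namespace iff both fields match. */
static bool ns_id_equal(const struct ns_id *a, const struct ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}
```

Calling ns_id_from_path("/proc/self/ns/net", &id) before and after a
setns(2) would yield the old and new tuples that a namespace-change
record needs to carry.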

> -Steve
> 
> > (v2)
> > - switch from u64 to u128 UUID
> > - switch from "signal" and "trigger" to "register"
> > - restrict registration to single process or force all threads and children
> > into same container
> > 
> > - RGB

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]     ` <20171019195747.4ssujtaj3f5ipsoh-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
@ 2017-10-19 23:11       ` Aleksa Sarai
  0 siblings, 0 replies; 94+ messages in thread
From: Aleksa Sarai @ 2017-10-19 23:11 UTC (permalink / raw)
  To: Richard Guy Briggs, Steve Grubb
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mszeredi-H+wXaHxf7aLQT0dZR+AlfA,
	David Howells, Simo Sorce, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Carlos O'Donell, Linux API, Linux Containers, Linux Kernel,
	Paul Moore, Linux Audit, Al Viro, Andy Lutomirski, Eric Paris,
	Linux FS Devel, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	Linux Network Development, Eric W. Biederman

>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container.  This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID.  A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>>
>> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
> 
> No, because then any process with that capability (vsftpd) could change
> its own container ID.  This is discussed more in other parts of the
> thread...

Not if we make the container ID append-only (to support nesting), or 
write-once (the other idea thrown around). In that case, you can't move 
"out" from a particular container ID, you can only go "deeper". These 
semantics don't make sense for generic containers, but since the point 
of this facility is *specifically* for audit I imagine that not being 
able to move a process from a sub-container's ID is a benefit.
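
As a rough sketch of the two rules mentioned above (nothing here is from
a posted patch; struct contid_stack, the helper names and the nesting
limit are all assumptions made for illustration), write-once and
append-only could look like this in plain C:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CONTID_LEN   16	/* u8[16] UUID, as in the RFC */
#define CONTID_DEPTH 4	/* arbitrary nesting limit for this sketch */

/* Hypothetical per-task state: a stack of nested container IDs. */
struct contid_stack {
	uint8_t ids[CONTID_DEPTH][CONTID_LEN];
	unsigned int depth;	/* 0 means "not in any container" */
};

/* Write-once: succeeds only while no ID has been set yet. */
static int contid_set_once(struct contid_stack *s, const uint8_t id[CONTID_LEN])
{
	if (s->depth != 0)
		return -EPERM;
	memcpy(s->ids[0], id, CONTID_LEN);
	s->depth = 1;
	return 0;
}

/* Append-only: may go "deeper", never replaces or removes an entry. */
static int contid_append(struct contid_stack *s, const uint8_t id[CONTID_LEN])
{
	if (s->depth >= CONTID_DEPTH)
		return -ENOSPC;
	memcpy(s->ids[s->depth], id, CONTID_LEN);
	s->depth++;
	return 0;
}

/* Fixed-width compare, the reason the RFC prefers a scalar to a string. */
static bool contid_equal(const uint8_t a[CONTID_LEN], const uint8_t b[CONTID_LEN])
{
	return memcmp(a, b, CONTID_LEN) == 0;
}
```

Under either rule a task cannot replace an ID that has already been
recorded, which is the property being argued for here.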

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]       ` <8f495870-dd6c-23b9-b82b-4228a441c729-l3A5Bk7waGM@public.gmane.org>
@ 2017-10-19 23:15         ` Aleksa Sarai
  2017-10-20  2:25         ` Steve Grubb
  1 sibling, 0 replies; 94+ messages in thread
From: Aleksa Sarai @ 2017-10-19 23:15 UTC (permalink / raw)
  To: Richard Guy Briggs, Steve Grubb
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman,
	Andy Lutomirski, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Carlos O'Donell, Linux API, Linux Containers, Paul Moore,
	Linux Kernel, David Howells, Linux Audit, Al Viro, Simo Sorce,
	Eric Paris, Linux FS Devel, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Linux Network Development, trondmy-7I+n7zu2hftEKMMhf/gKZA

>>>> The registration is a pseudo filesystem (proc, since PID tree already
>>>> exists) write of a u8[16] UUID representing the container ID to a file
>>>> representing a process that will become the first process in a new
>>>> container.  This write might place restrictions on mount namespaces
>>>> required to define a container, or at least careful checking of
>>>> namespaces in the kernel to verify permissions of the orchestrator 
>>>> so it
>>>> can't change its own container ID.  A bind mount of nsfs may be
>>>> necessary in the container orchestrator's mntNS.
>>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>>> and simpler.
>>>>
>>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>>> registration.
>>>
>>> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
>>
>> No, because then any process with that capability (vsftpd) could change
>> its own container ID.  This is discussed more in other parts of the
>> thread...
> 
> Not if we make the container ID append-only (to support nesting), or 
> write-once (the other idea thrown around). In that case, you can't move 
> "out" from a particular container ID, you can only go "deeper". These 
> semantics don't make sense for generic containers, but since the point 
> of this facility is *specifically* for audit I imagine that not being 
> able to move a process from a sub-container's ID is a benefit.

[This assumes it's CAP_AUDIT_CONTROL which is what we are discussing in 
a sister thread.]

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]       ` <8f495870-dd6c-23b9-b82b-4228a441c729-l3A5Bk7waGM@public.gmane.org>
  2017-10-19 23:15         ` Aleksa Sarai
@ 2017-10-20  2:25         ` Steve Grubb
  1 sibling, 0 replies; 94+ messages in thread
From: Steve Grubb @ 2017-10-20  2:25 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, mszeredi-H+wXaHxf7aLQT0dZR+AlfA,
	David Howells, Simo Sorce, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Linux API, Linux Containers, Linux Kernel, Paul Moore,
	Carlos O'Donell, Linux Audit, Al Viro, Andy Lutomirski,
	Eric Paris, Linux FS Devel, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	Linux Network Development, Eric W. Biederman

On Thursday, October 19, 2017 7:11:33 PM EDT Aleksa Sarai wrote:
> >>> The registration is a pseudo filesystem (proc, since PID tree already
> >>> exists) write of a u8[16] UUID representing the container ID to a file
> >>> representing a process that will become the first process in a new
> >>> container.  This write might place restrictions on mount namespaces
> >>> required to define a container, or at least careful checking of
> >>> namespaces in the kernel to verify permissions of the orchestrator so it
> >>> can't change its own container ID.  A bind mount of nsfs may be
> >>> necessary in the container orchestrator's mntNS.
> >>> Note: Use a 128-bit scalar rather than a string to make compares faster
> >>> and simpler.
> >>> 
> >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >>> registration.
> >> 
> >> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
> > 
> > No, because then any process with that capability (vsftpd) could change
> > its own container ID.  This is discussed more in other parts of the
> > thread...

For the record, I changed my mind. CAP_AUDIT_CONTROL is the correct 
capability. 

> Not if we make the container ID append-only (to support nesting), or
> write-once (the other idea thrown around). 

Well...I like to use lessons learned if they can be applied. In the normal 
world without containers we have uid, auid, and session_id. uid is who you are 
now, auid is how you got into the system, session_id distinguishes individual 
auids. We have a default auid of -1 for system objects and a real number for 
people.

I think there should be the equivalent of auid and session_id but tailored for 
containers. Loginuid == container id. It can be set, overridden, or appended 
to (we'll figure this out later) in very limited circumstances. 
Container_session == session which is tamper-proof. This way things can enter 
a container with the same ID but under a different session. And everything 
else gets to inherit the original ID. This way we can trace actions to 
something that entered the container rather than normal system activity in the 
container.
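
One possible reading of this auid/session_id analogy, as a sketch only
(struct task_contaudit, the helper names and the global counter are all
hypothetical, and locking is ignored): children inherit both values,
while something entering an existing container keeps its ID but gets a
fresh session number.

```c
#include <stdint.h>
#include <string.h>

#define CONTID_LEN 16

/* Hypothetical per-task audit state, by analogy with auid + session_id. */
struct task_contaudit {
	uint8_t  contid[CONTID_LEN];	/* which container (like loginuid) */
	uint32_t session;		/* which entry into it (like session_id) */
};

static uint32_t next_session = 1;	/* not thread-safe; illustration only */

/* fork/clone: the child inherits both fields unchanged. */
static void contaudit_inherit(struct task_contaudit *child,
			      const struct task_contaudit *parent)
{
	*child = *parent;
}

/* Entering an existing container: same ID, fresh session number. */
static void contaudit_enter(struct task_contaudit *task,
			    const struct task_contaudit *target)
{
	memcpy(task->contid, target->contid, CONTID_LEN);
	task->session = next_session++;
}
```

Records carrying both values would then let a reader separate activity
that entered the container from activity that started inside it.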

What a security officer wants to know is what people did inside the 
system / container. The system objects we typically don't care about. Sure 
they might get hacked and then work on behalf of someone, but they would 
almost always pop a shell so that they can have freedom. That should set off 
an AVC or create other activity that gets picked up.

-Steve

> In that case, you can't move "out" from a particular container ID, you can
> only go "deeper". These semantics don't make sense for generic containers,
> but since the point of this facility is *specifically* for audit I imagine
> that not being able to move a process from a sub-container's ID is a
> benefit.

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]   ` <75b7d6a6-42ba-2dff-1836-1091c7c024e7-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
  2017-10-17  0:33     ` Richard Guy Briggs
@ 2017-12-09 10:20     ` Mickaël Salaün
  1 sibling, 0 replies; 94+ messages in thread
From: Mickaël Salaün @ 2017-12-09 10:20 UTC (permalink / raw)
  To: Casey Schaufler, Richard Guy Briggs,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell,
	Michael Kerrisk, David Howells, Eric W. Biederman, Simo Sorce,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Al Viro


On 12/10/2017 18:33, Casey Schaufler wrote:
> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>> Containers are a userspace concept.  The kernel knows nothing of them.
>>
>> The Linux audit system needs a way to be able to track the container
>> provenance of events and actions.  Audit needs the kernel's help to do
>> this.
>>
>> Since the concept of a container is entirely a userspace concept, a
>> registration from the userspace container orchestration system initiates
>> this.  This will define a point in time and a set of resources
>> associated with a particular container with an audit container ID.
>>
>> The registration is a pseudo filesystem (proc, since PID tree already
>> exists) write of a u8[16] UUID representing the container ID to a file
>> representing a process that will become the first process in a new
>> container.  This write might place restrictions on mount namespaces
>> required to define a container, or at least careful checking of
>> namespaces in the kernel to verify permissions of the orchestrator so it
>> can't change its own container ID.  A bind mount of nsfs may be
>> necessary in the container orchestrator's mntNS.
>> Note: Use a 128-bit scalar rather than a string to make compares faster
>> and simpler.
>>
>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>> registration.
> 
> Hang on. If containers are a user space concept, how can
> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> a container, how can you be asking for a capability to manage
> them?
> 
>>   At that time, record the target container's user-supplied
>> container identifier along with the target container's first process
>> (which may become the target container's "init" process) process ID
>> (referenced from the initial PID namespace), all namespace IDs (in the
>> form of a nsfs device number and inode number tuple) in a new auxiliary
>> record AUDIT_CONTAINER with a qualifying op=$action field.

Here is an idea to avoid privilege problems or the need for a new
capability: make it automatic. What makes a container a container seems
to be the use of at least one namespace. What about automatically creating
and assigning an ID to a process when it enters a namespace different from
that of its parent process? This delegates the (permission)
responsibility to the use of namespaces (e.g. the /proc/sys/user/max_* limits).

One interesting side effect of this approach would be the ability to
identify which processes are in the same set of namespaces, even if they
were not spawned from the container but entered it after its creation
(i.e. using setns), by creating container IDs as a (deterministic)
checksum of the /proc/self/ns/* IDs.
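
A minimal userspace sketch of that idea, under stated assumptions: the choice
of SHA-256, the truncation to 16 bytes (to match the u8[16] size in the RFC)
and both function names are illustrative only, and an in-kernel version would
look quite different. It relies only on the fact that each /proc/<pid>/ns/*
entry is a symlink whose target names the namespace type and inode, e.g.
"mnt:[4026531840]".

```python
import hashlib
import os


def namespace_ids(pid="self", proc_root="/proc"):
    """Return {namespace name: symlink target} for one process."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry vanished or is not readable; skip it in this sketch.
            continue
    return ids


def container_id_from_namespaces(ids):
    """Derive a deterministic 16-byte ID from a set of namespace IDs."""
    digest = hashlib.sha256()
    # Sort by name so the result does not depend on iteration order.
    for name in sorted(ids):
        digest.update(f"{name}={ids[name]}\n".encode())
    return digest.digest()[:16]
```

Two processes that share every namespace yield the same ID, and a process
that differs in even one namespace yields a different one. On a Linux system,
`container_id_from_namespaces(namespace_ids())` computes the value for the
calling process.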

Since the concern is to identify a container, I think the ability to
audit the switch from one container ID to another is enough. I don't
think we need nested IDs.

As a side note, you may want to take a look at the Linux-VServer's XID.

Regards,
 Mickaël


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]     ` <7ebca85a-425c-2b95-9a5f-59d81707339e-WFhQfpSGs3bR7s880joybQ@public.gmane.org>
@ 2017-12-09 18:28       ` Casey Schaufler
  2017-12-11 15:10       ` Richard Guy Briggs
  1 sibling, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-12-09 18:28 UTC (permalink / raw)
  To: Mickaël Salaün, Richard Guy Briggs,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell,
	Michael Kerrisk, David Howells, Eric W. Biederman, Simo Sorce,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Al Viro

On 12/9/2017 2:20 AM, Mickaël Salaün wrote:
> On 12/10/2017 18:33, Casey Schaufler wrote:
>> On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
>>> Containers are a userspace concept.  The kernel knows nothing of them.
>>>
>>> The Linux audit system needs a way to be able to track the container
>>> provenance of events and actions.  Audit needs the kernel's help to do
>>> this.
>>>
>>> Since the concept of a container is entirely a userspace concept, a
>>> registration from the userspace container orchestration system initiates
>>> this.  This will define a point in time and a set of resources
>>> associated with a particular container with an audit container ID.
>>>
>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container.  This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID.  A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>> Hang on. If containers are a user space concept, how can
>> you want CAP_CONTAINER_ANYTHING? If there's no such thing as
>> a container, how can you be asking for a capability to manage
>> them?
>>
>>>   At that time, record the target container's user-supplied
>>> container identifier along with the target container's first process
>>> (which may become the target container's "init" process) process ID
>>> (referenced from the initial PID namespace), all namespace IDs (in the
>>> form of a nsfs device number and inode number tuple) in a new auxiliary
>>> record AUDIT_CONTAINER with a qualifying op=$action field.
> Here is an idea to avoid privilege problems or the need for a new
> capability: make it automatic. What makes a container a container seems
> to be the use of at least a namespace.

You might think so, but I am assured that you can have a container
without using namespaces. Intel's "Clear Containers", which use
virtualization technology, are one example. I have considered creating
"Smack Containers" using mandatory access control technology, more
to press the point that "containers" is a marketing concept, not
technology.

>  What about automatically create
> and assign an ID to a process when it enters a namespace different than
> one of its parent process? This delegates the (permission)
> responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

That gets ugly when you have a container that uses user, filesystem,
network, and whatever other namespaces. If all containers used the same
set of namespaces I think this would be a fine idea, but they don't.

> One interesting side effect of this approach would be to be able to
> identify which processes are in the same set of namespaces, even if not
> spawn from the container but entered after its creation (i.e. using
> setns), by creating container IDs as a (deterministic) checksum from the
> /proc/self/ns/* IDs.
>
> Since the concern is to identify a container, I think the ability to
> audit the switch from one container ID to another is enough. I don't
> think we need nested IDs.

Because a container doesn't have to use namespaces to be a container,
you still need a mechanism for a process to declare that it is in fact
in a container, and to identify the container.
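
For illustration only, the explicit registration described in the RFC (a
write of a 16-byte UUID to a per-process file) might look like this from the
orchestrator's side. The function name is hypothetical, and no actual file
name is assumed: the caller passes whatever path the kernel would eventually
expose.

```python
import uuid


def register_container(id_file, container_id):
    """Write the container ID as a raw 16-byte value to the given file.

    `id_file` stands in for the per-process registration file proposed in
    the RFC; its real name and location are not assumed here.
    """
    payload = container_id.bytes  # 128-bit value, big-endian byte order
    if len(payload) != 16:
        raise ValueError("container ID must be exactly 16 bytes")
    with open(id_file, "wb") as f:
        f.write(payload)
    return payload
```

The orchestrator chooses the ID (for example with `uuid.uuid4()`), so it
knows the value without having to read it back from the kernel.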

>
> As a side note, you may want to take a look at the Linux-VServer's XID.
>
> Regards,
>  Mickaël
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]     ` <7ebca85a-425c-2b95-9a5f-59d81707339e-WFhQfpSGs3bR7s880joybQ@public.gmane.org>
  2017-12-09 18:28       ` Casey Schaufler
@ 2017-12-11 15:10       ` Richard Guy Briggs
  1 sibling, 0 replies; 94+ messages in thread
From: Richard Guy Briggs @ 2017-12-11 15:10 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	Andy Lutomirski, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	Carlos O'Donell, Linux API, Linux Containers, Linux Kernel,
	Eric Paris, Michael Kerrisk, David Howells, Linux FS Devel,
	Linux Audit, Eric W. Biederman, Simo Sorce,
	Linux Network Development, Casey Schaufler,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Al Viro

On 2017-12-09 11:20, Mickaël Salaün wrote:
> 
> On 12/10/2017 18:33, Casey Schaufler wrote:
> > On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> >> Containers are a userspace concept.  The kernel knows nothing of them.
> >>
> >> The Linux audit system needs a way to be able to track the container
> >> provenance of events and actions.  Audit needs the kernel's help to do
> >> this.
> >>
> >> Since the concept of a container is entirely a userspace concept, a
> >> registration from the userspace container orchestration system initiates
> >> this.  This will define a point in time and a set of resources
> >> associated with a particular container with an audit container ID.
> >>
> >> The registration is a pseudo filesystem (proc, since PID tree already
> >> exists) write of a u8[16] UUID representing the container ID to a file
> >> representing a process that will become the first process in a new
> >> container.  This write might place restrictions on mount namespaces
> >> required to define a container, or at least careful checking of
> >> namespaces in the kernel to verify permissions of the orchestrator so it
> >> can't change its own container ID.  A bind mount of nsfs may be
> >> necessary in the container orchestrator's mntNS.
> >> Note: Use a 128-bit scalar rather than a string to make compares faster
> >> and simpler.
> >>
> >> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >> registration.
> > 
> > Hang on. If containers are a user space concept, how can
> > you want CAP_CONTAINER_ANYTHING? If there's no such thing as
> > a container, how can you be asking for a capability to manage
> > them?
> > 
> >>   At that time, record the target container's user-supplied
> >> container identifier along with the target container's first process
> >> (which may become the target container's "init" process) process ID
> >> (referenced from the initial PID namespace), all namespace IDs (in the
> > >> form of a nsfs device number and inode number tuple) in a new auxiliary
> >> record AUDIT_CONTAINER with a qualifying op=$action field.
> 
> Here is an idea to avoid privilege problems or the need for a new
> capability: make it automatic. What makes a container a container seems
> to be the use of at least one namespace. What about automatically creating
> and assigning an ID to a process when it enters a namespace different from
> that of its parent process? This delegates the (permission)
> responsibility to the use of namespaces (e.g. /proc/sys/user/max_* limit).

A container doesn't imply a namespace and vice versa.

> One interesting side effect of this approach would be to be able to
> identify which processes are in the same set of namespaces, even if not
> spawned from the container but entered after its creation (i.e. using
> setns), by creating container IDs as a (deterministic) checksum from the
> /proc/self/ns/* IDs.

This would be really helpful, but it isn't the case.

> Since the concern is to identify a container, I think the ability to
> audit the switch from one container ID to another is enough. I don't
> think we need nested IDs.

Since container namespace membership is arbitrary between container
orchestrators, this needs a registration process and a way for the
container orchestrator to know the ID.


I completely agree with Casey here.

> As a side note, you may want to take a look at the Linux-VServer's XID.
> 
> Regards,
>  Mickaël

- RGB

--
Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]       ` <f8ea78be-9bbf-2967-7b12-ac93bb85b0bc-iSGtlc1asvQWG2LlvL+J4A@public.gmane.org>
@ 2017-12-11 16:30         ` Eric Paris
  0 siblings, 0 replies; 94+ messages in thread
From: Eric Paris @ 2017-12-11 16:30 UTC (permalink / raw)
  To: Casey Schaufler, Mickaël Salaün, Richard Guy Briggs,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell,
	Michael Kerrisk, David Howells, Eric W. Biederman, Simo Sorce,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Al Viro

On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
> On 12/9/2017 2:20 AM, Mickaël Salaün wrote:

> >  What about automatically creating
> > and assigning an ID to a process when it enters a namespace different
> > from that of its parent process? This delegates the (permission)
> > responsibility to the use of namespaces (e.g. /proc/sys/user/max_*
> > limit).
> 
> That gets ugly when you have a container that uses user, filesystem,
> network and whatever else namespaces. If all containers used the same
> set of namespaces I think this would be a fine idea, but they don't.
> 
> > One interesting side effect of this approach would be to be able to
> > identify which processes are in the same set of namespaces, even if
> > not spawned from the container but entered after its creation (i.e.
> > using setns), by creating container IDs as a (deterministic) checksum
> > from the /proc/self/ns/* IDs.
> > 
> > Since the concern is to identify a container, I think the ability to
> > audit the switch from one container ID to another is enough. I don't
> > think we need nested IDs.
> 
> Because a container doesn't have to use namespaces to be a container
> you still need a mechanism for a process to declare that it is in
> fact in a container, and to identify the container.

I like the idea but I'm still tossing it around in my head (and
thinking about Casey's statement too). Let's say we have a 'docker-like'
container with pid=100  netns=X,userns=Y,mountns=Z. If I'm on the host
in all init namespaces and I run
  nsenter -t 100 -n ip link set eth0 promisc on
How should this be logged? Did this command run in its own 'container'
unrelated to the 'docker-like' container?

-Eric

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]         ` <1513009857.6310.337.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-12-11 16:52           ` Casey Schaufler
  2017-12-11 19:37           ` Steve Grubb
  1 sibling, 0 replies; 94+ messages in thread
From: Casey Schaufler @ 2017-12-11 16:52 UTC (permalink / raw)
  To: Eric Paris, Mickaël Salaün, Richard Guy Briggs,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell,
	Michael Kerrisk, David Howells, Eric W. Biederman, Simo Sorce,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Al Viro

On 12/11/2017 8:30 AM, Eric Paris wrote:
> On Sat, 2017-12-09 at 10:28 -0800, Casey Schaufler wrote:
>> Because a container doesn't have to use namespaces to be a container
>> you still need a mechanism for a process to declare that it is in
>> fact in a container, and to identify the container.
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100  netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
>   nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged? Did this command run in its own 'container'
> unrelated to the 'docker-like' container?

Jose Bollo's PTAGS ( https://gitlab.com/jobol/ptags ) would be
perfect. Any time you declare something to be a container or
enter a namespace you slap a tag on it. Identifying nested
containers would be easy: you'd have multiple tags.

PTAGS unfortunately needs module stacking, but how hard could that be?


> -Eric

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: RFC(v2): Audit Kernel Container IDs
       [not found]         ` <1513009857.6310.337.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-12-11 16:52           ` Casey Schaufler
@ 2017-12-11 19:37           ` Steve Grubb
  1 sibling, 0 replies; 94+ messages in thread
From: Steve Grubb @ 2017-12-11 19:37 UTC (permalink / raw)
  To: linux-audit-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Linux API, Linux Containers, Linux FS Devel, Linux Kernel,
	Eric Paris, Mickaël Salaün, Linux Network Development,
	Casey Schaufler, cgroups-u79uwXL29TY76Z2rM5mHXA

On Monday, December 11, 2017 11:30:57 AM EST Eric Paris wrote:
> > Because a container doesn't have to use namespaces to be a container
> > you still need a mechanism for a process to declare that it is in
> > fact in a container, and to identify the container.
> 
> I like the idea but I'm still tossing it around in my head (and
> thinking about Casey's statement too). Let's say we have a 'docker-like'
> container with pid=100  netns=X,userns=Y,mountns=Z. If I'm on the host
> in all init namespaces and I run
>   nsenter -t 100 -n ip link set eth0 promisc on
> How should this be logged?

If it is a normal process, then everything would match the init namespace and
you wouldn't have entered a container. If it were a container, any generated
event should have the container ID from registration attached to it.

> Did this command run in its own 'container' unrelated to the 'docker-like'
> container?

That should be determined by what's in the task struct.

-Steve

^ permalink raw reply	[flat|nested] 94+ messages in thread

* RFC(v2): Audit Kernel Container IDs
@ 2017-10-12 14:14 Richard Guy Briggs
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Guy Briggs @ 2017-10-12 14:14 UTC (permalink / raw)
  To: cgroups-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linux API,
	Linux Audit, Linux FS Devel, Linux Kernel,
	Linux Network Development
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Steve Grubb, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Paul Moore,
	Al Viro, David Howells, Simo Sorce,
	trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Eric W. Biederman

Containers are a userspace concept.  The kernel knows nothing of them.

The Linux audit system needs a way to be able to track the container
provenance of events and actions.  Audit needs the kernel's help to do
this.

Since the concept of a container is entirely a userspace concept, a
registration from the userspace container orchestration system initiates
this.  This will define a point in time and a set of resources
associated with a particular container with an audit container ID.

The registration is a pseudo filesystem (proc, since PID tree already
exists) write of a u8[16] UUID representing the container ID to a file
representing a process that will become the first process in a new
container.  This write might place restrictions on mount namespaces
required to define a container, or at least careful checking of
namespaces in the kernel to verify permissions of the orchestrator so it
can't change its own container ID.  A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.
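
As a rough userspace illustration of the note above (not the proposed
kernel interface), the sketch below shows a UUID carried as 16 raw bytes
and compared as a single 128-bit scalar.  The helper names are
hypothetical, and no proc file is written because the RFC does not name
one.

```python
import uuid


def encode_container_id(container_id: uuid.UUID) -> bytes:
    """Return the 16 raw bytes (u8[16]) an orchestrator would write."""
    return container_id.bytes


def decode_container_id(raw: bytes) -> int:
    """Interpret 16 raw bytes as one 128-bit scalar (big-endian)."""
    if len(raw) != 16:
        raise ValueError("container ID must be exactly 16 bytes")
    return int.from_bytes(raw, "big")


# Example value chosen for illustration only.
cid = uuid.UUID("12345678-1234-5678-1234-567812345678")
raw = encode_container_id(cid)

# Comparing two IDs is one integer comparison rather than a string compare.
same = decode_container_id(raw) == cid.int
```

The fixed 16-byte width means there is no length or terminator to
validate beyond the size check, which is part of what makes the scalar
form simpler than a string.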

Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration.  At that time, record the target container's user-supplied
container identifier along with the target container's first process
(which may become the target container's "init" process) process ID
(referenced from the initial PID namespace), all namespace IDs (in the
form of a nsfs device number and inode number tuple) in a new auxiliary
record AUDIT_CONTAINER with a qualifying op=$action field.
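
For reference, the (device number, inode number) tuples mentioned above
can already be observed from userspace by stat()ing the entries under
/proc/<pid>/ns.  The following is a minimal sketch of that lookup; the
function name is hypothetical and it only reads existing information, it
does not emit any audit record.

```python
import os


def namespace_ids(ns_dir):
    """Map each entry in ns_dir to its (device, inode) tuple.

    os.stat() follows the symlink, so for /proc/<pid>/ns/<type> the
    result describes the nsfs object the link points at.
    """
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        st = os.stat(os.path.join(ns_dir, name))
        ids[name] = (st.st_dev, st.st_ino)
    return ids
```

On a Linux host, namespace_ids("/proc/self/ns") would return one tuple
per namespace type of the calling process; two processes share a
namespace when the tuples for that type are equal.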

Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
container ID present on an auditable action or event.

Forked and cloned processes inherit their parent's container ID,
referenced in the process' task_struct.

Mimic setns(2) and return an error if the process has already initiated
threading or forked since this registration should happen before the
process execution is started by the orchestrator and hence should not
yet have any threads or children.  If this is deemed overly restrictive,
switch all threads and children to the new containerID.
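
The two rules above (inheritance on fork/clone, and refusing
registration once the target has threads or children) can be modelled in
a few lines.  This is a toy userspace model for discussion only: the
Task class, register() and RegistrationError are hypothetical names, and
the real state would live in the kernel's task_struct.

```python
class RegistrationError(Exception):
    """Raised when the target task can no longer be registered."""


class Task:
    def __init__(self, parent=None):
        # A forked or cloned task inherits its parent's container ID.
        self.container_id = parent.container_id if parent else None
        self.children = []
        self.threads = 1  # the task's own initial thread
        if parent is not None:
            parent.children.append(self)

    def fork(self):
        return Task(parent=self)


def register(task, container_id):
    """Set the container ID, refusing tasks that have threaded or forked."""
    if task.children or task.threads > 1:
        raise RegistrationError("task already has threads or children")
    task.container_id = container_id


first = Task()
register(first, 0x1234)      # allowed: no threads or children yet
child = first.fork()         # child.container_id is now 0x1234
```

The less restrictive alternative mentioned above would replace the
error with a walk over the task's threads and children, setting the new
ID on each.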

Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.

Log the creation of every namespace, inheriting/adding its spawning
process' containerID(s), if applicable.  Include the spawning and
spawned namespace IDs (device and inode number tuples).
[AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
Note: At this point it appears only network namespaces may need to track
container IDs apart from processes since incoming packets may cause an
auditable event before being associated with a process.

Log the destruction of every namespace when it is no longer used by any
process, include the namespace IDs (device and inode number tuples).
[AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]

Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
the parent and child namespace IDs for any changes to a process'
namespaces. [setns(2)]
Note: It may be possible to combine AUDIT_NS_* record formats and
distinguish them with an op=$action field depending on the fields
required for each message type.

When a container ceases to exist because the last process in that
container has exited, and hence the last namespace has been destroyed and
its refcount has dropped to zero, log the fact.
(This latter is likely needed for certification accountability.)  A
container object may need a list of processes and/or namespaces.
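
One way to picture the bookkeeping this implies is a table mapping each
container ID to its member processes, with a log entry emitted when the
last member leaves.  The sketch below is an assumption-laden userspace
model: the class name, method names and log text are all hypothetical,
and namespaces are left out for brevity.

```python
class ContainerTable:
    def __init__(self):
        self.members = {}  # container ID -> set of member PIDs
        self.log = []      # stand-in for audit records

    def add(self, container_id, pid):
        self.members.setdefault(container_id, set()).add(pid)

    def exit(self, container_id, pid):
        pids = self.members[container_id]
        pids.discard(pid)
        if not pids:
            # Last process gone: the container ceases to exist.
            del self.members[container_id]
            self.log.append(f"container {container_id} gone")
```

With two member processes, the first exit() leaves the log empty and the
second one appends the single "gone" entry.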

A namespace cannot directly migrate from one container to another but
could be assigned to a newly spawned container.  A namespace can be
moved from one container to another indirectly by having that namespace
used in a second process in another container and then ending all the
processes in the first container.

(v2)
- switch from u64 to u128 UUID
- switch from "signal" and "trigger" to "register"
- restrict registration to single process or force all threads and children into same container

- RGB

--
Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

