From: Stephen Smalley <sds@tycho.nsa.gov>
To: dwalsh@redhat.com, Miloslav Trmac <mitr@redhat.com>,
	selinux@vger.kernel.org
Subject: Re: New Container vulnerability could potentially use an SELinux fix.
Date: Mon, 10 Jun 2019 11:00:53 -0400
Message-ID: <df0e048f-ef5f-8a43-81cb-3d3f6cf10230@tycho.nsa.gov>
In-Reply-To: <e8b8b026-0409-098b-bd2a-20ed43c4d10b@redhat.com>

On 6/10/19 10:37 AM, Daniel Walsh wrote:
> On 6/10/19 10:08 AM, Stephen Smalley wrote:
>> On 6/8/19 10:08 AM, Daniel Walsh wrote:
>>> On 6/7/19 5:26 PM, Stephen Smalley wrote:
>>>> On 6/7/19 5:06 PM, Daniel Walsh wrote:
>>>>> On 6/7/19 12:44 PM, Stephen Smalley wrote:
>>>>>> On 6/7/19 11:42 AM, Daniel Walsh wrote:
>>>>>>> We have periodic vulnerabilities around bad container images having
>>>>>>> symbolic link attacks against the host.
>>>>>>>
>>>>>>> One came out last week about doing a `podman cp`, which would copy
>>>>>>> content from the host into the container.  The issue was that if the
>>>>>>> container was running, it could trick the process copying content
>>>>>>> into it into following a symbolic link to a location outside of the
>>>>>>> container image.
>>>>>>>
>>>>>>> The question came up: is there a way to use SELinux to prevent this?
>>>>>>> Sadly, the answer right now is no, because we have no way to know
>>>>>>> what label the process attempting to update the container file
>>>>>>> system is running as.  Usually it will be running as unconfined_t.
>>>>>>>
>>>>>>> One idea would be to add a rule to policy that restricts following
>>>>>>> symbolic links to only those specified in policy.
>>>>>>>
>>>>>>>
>>>>>>> Something like
>>>>>>>
>>>>>>> SPECIALRESTRICTED TYPE container_file_t
>>>>>>>
>>>>>>> allow container_file_t container_file_t:symlink follow;
>>>>>>>
>>>>>>> Then if a process attempted to copy content onto a symbolic link from
>>>>>>> container_file_t to a non-container_file_t type, the kernel would
>>>>>>> deny access.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> SELinux would prevent it if you didn't allow unconfined_t (or other
>>>>>> privileged domains) to follow untrustworthy symlinks (e.g. don't allow
>>>>>> unconfined_t container_file_t:lnk_file read; in the first place).
>>>>>> That's the right way to prevent it.
>>>>>>
>>>>>> Trying to apply a check between symlink and its target as you suggest
>>>>>> is problematic; we don't generally have them both at the same point.
>>>>>> If we are allowed to follow the symlink, we read its contents and
>>>>>> perform a path walk on that, and that could be a multi-component
>>>>>> pathname lookup that itself spans further symlinks, mount points,
>>>>>> etc.  I think that would be challenging to support in the kernel,
>>>>>> subject to races, and certainly would require changes outside of just
>>>>>> SELinux.
>>>>>>
>>>>>> If you truly cannot impose such restrictions on unconfined_t, then
>>>>>> maybe podman should run in its own domain.
>>>>>>
>>>>> This is not an issue with just podman.  Podman can mount the image and
>>>>> the tools can just read/write content into the mountpoint.
>>>>>
>>>>> I thought I recalled an LSM that prevented symlink attacks where a user
>>>>> would link a file in his homedir to /etc/shadow and then try to get
>>>>> the admin to modify the file in the homedir?
>>>>>
>>>>> I was thinking that if that existed we could build more controls on it
>>>>> based on labels rather than just matching UIDs.
>>>>
>>>> Not sure if you are thinking of symlink attacks or hard link attacks.
>>>> SELinux supports preventing the former by restricting the ability to
>>>> follow symlinks based on lnk_file read permission, so you can prevent
>>>> trusted processes from following untrustworthy symlinks.  SELinux
>>>> supports preventing the latter by restricting the ability to create
>>>> hard links to unauthorized files.  But you need to write your policies
>>>> in a manner that leverages that support, and a fully unconfined domain
>>>> isn't going to be protected via SELinux by definition; ideally you'd
>>>> be phasing out unconfined altogether like Android did.  Modern kernels
>>>> also have the /proc/sys/fs/protected_hardlinks and
>>>> /proc/sys/fs/protected_symlinks settings, which restrict based on UID,
>>>> but the symlink checks aren't based on the target of the symlink
>>>> either.
>>>
>>> Android does not have an admin, so it is a lot easier for them.  But I'm
>>> not going to get into that now.  I obviously understand how SELinux
>>> works.  But perhaps I am looking for something different.
>>>
>>> This link describes something pretty close to what I want, but extended
>>> to labels rather than just UIDs.
>>>
>>> https://sysctl-explorer.net/fs/protected_symlinks/
>>>
>>>
>>>> A long-standing class of security issues is the symlink-based
>>>> time-of-check-time-of-use race, most commonly seen in world-writable
>>>> directories like /tmp. The common method of exploitation of this flaw
>>>> is to cross privilege boundaries when following a given symlink (i.e.
>>>> a **PRIVILEGED** process follows a symlink belonging **PROVIDED BY
>>>> OTHERS**). For a likely incomplete list of hundreds of examples across
>>>> the years, please see:
>>>> http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
>>>>
>>>> When set to “0”, symlink following behavior is unrestricted.
>>>>
>>>> When set to “1” symlinks are permitted to be followed only when
>>>> outside a sticky world-writable directory **WE COULD POTENTIALLY SET
>>>> THIS OR SOME OTHER FLAG**, or when the **LABEL** of the symlink and
>>>> follower match, or when the directory **LABEL** matches the symlink’s
>>>> **LABEL**.
>>>>
>>>> This protection is based on the restrictions in Openwall and
>>>> grsecurity.
>>>>
>>
>> That's the /proc/sys/fs/protected_symlinks feature I mentioned in my
>> email above.  It isn't based on the target of the symlink; it is only
>> based on the attributes of the follower process (e.g. root), the
>> attributes of the parent directory containing the symlink (e.g. /tmp),
>> and the attributes of the symlink file (e.g. /tmp/foo -> /etc/shadow).
>> At no point is it checking anything about the target of the symlink,
>> e.g. /etc/shadow.  If dwalsh creates a symlink under /tmp (ln -s
>> /etc/shadow /tmp/foo) and root tries to follow /tmp/foo, then that
>> will fail because 1) the process fsuid (root) != the /tmp/foo symlink
>> owner (dwalsh), and 2) /tmp is a sticky and world-writable directory,
>> and 3) the /tmp directory owner (root) != the /tmp/foo symlink owner
>> (dwalsh). Note that conditions (2) and (3) render the check useless
>> for your use case, since you want to prevent following any symlinks
>> writable by container processes in any directory within the container
>> filesystem, so the directory need not be world-writable/sticky and the
>> parent directory UID/label might be identical to the symlink UID/label.
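A rough Python model of the protected_symlinks=1 decision may make the point concrete. It is a simplified sketch of the may_follow_link() logic in the kernel's fs/namei.c, ignoring user namespaces, RCU-walk, and auditing; the Dir/Symlink types and the example UIDs are illustrative assumptions. Note that the symlink's target never appears as an input:

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Dir:
    uid: int   # owner of the directory containing the symlink
    mode: int  # mode bits, e.g. 0o1777 for /tmp


@dataclass(frozen=True)
class Symlink:
    uid: int   # owner of the symlink itself


def may_follow_link(protected_symlinks: bool, follower_fsuid: int,
                    parent: Dir, link: Symlink) -> bool:
    """Simplified model of the fs.protected_symlinks=1 decision.

    The symlink's *target* (e.g. /etc/shadow) is not an input at all.
    """
    if not protected_symlinks:
        return True
    # (1) Allowed if the follower's fsuid owns the symlink.
    if follower_fsuid == link.uid:
        return True
    # (2) Allowed unless the parent is both sticky and world-writable.
    sticky_ww = stat.S_ISVTX | stat.S_IWOTH
    if (parent.mode & sticky_ww) != sticky_ww:
        return True
    # (3) Allowed if the parent directory and the symlink share an owner.
    if parent.uid == link.uid:
        return True
    return False
```

With root (fsuid 0) following dwalsh's /tmp/foo (say uid 1000) in a root-owned 0o1777 /tmp, this returns False. For a symlink in an ordinary 0o755 directory inside a container filesystem, or in a container-owned sticky /tmp where the directory and link owners match, it returns True regardless of where the link points.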
> We are mounting the file system (most of the time), so we could add a
> flag to indicate that this is a protected file system.

You are effectively already doing that by mounting with a context mount 
that assigns container_file_t or whatever type to the filesystem.  You 
don't need something new there.

>>
>>
>> The existing SELinux lnk_file read permission check enables you to
>> apply stronger label-based controls to all symlinks within the
>> container filesystem, not just ones in /tmp-like directories.  Don't
>> allow unconfined_t or any other privileged domain read permission to
>> container_file_t:lnk_file (or preferably to any file type for which
>> :lnk_file create is allowed to container process domains), and you'll
>> never have to worry about them following a symlink writable by a
>> container process.  This of course assumes that the container
>> filesystem is always labeled with a type that is untrusted, whether
>> via mount contexts or actual labels.
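To make the "preferably" variant of that rule concrete, here is a small sketch that, given a list of allow rules, reports privileged domains granted lnk_file read on any type where a container domain may create symlinks. The rule tuples and domain names are hypothetical, and the sketch ignores attributes, booleans, conditional rules, and type transitions; against a real policy one would use the setools query tools (e.g. sesearch) instead:

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class Allow(NamedTuple):
    source: str            # process domain
    target: str            # object type
    tclass: str            # object class, e.g. "lnk_file"
    perms: FrozenSet[str]  # permissions granted


def container_link_types(rules: Iterable[Allow],
                         container_domains: Set[str]) -> Set[str]:
    """Types on which some container domain may create symlinks."""
    return {r.target for r in rules
            if r.source in container_domains
            and r.tclass == "lnk_file" and "create" in r.perms}


def risky_follows(rules: List[Allow], container_domains: Set[str],
                  privileged_domains: Set[str]) -> List[Tuple[str, str]]:
    """(domain, type) pairs where a privileged domain may read, i.e.
    follow, a symlink that a container domain could have created."""
    untrusted = container_link_types(rules, container_domains)
    return sorted({(r.source, r.target) for r in rules
                   if r.source in privileged_domains
                   and r.tclass == "lnk_file" and "read" in r.perms
                   and r.target in untrusted})
```

With a container_t rule allowing lnk_file create on container_file_t and an unconfined_t rule allowing lnk_file read on the same type, this reports ("unconfined_t", "container_file_t"); removing the latter rule leaves nothing to report.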
> 
> But we want to allow domains to follow container_file_t links that point
> to container_file_t objects, just not follow them if they point to
> other types.  This means there is no protection I could write for a
> domain like unconfined_t that says to follow links only when the types
> match, or when the types have allow rules.

You really don't want programs on the host OS that are acting on a 
container filesystem to ever follow any symlinks within it.  It just 
isn't a good idea; even if you limit it to intra-container symlinks, 
an attacker could use the host process to overwrite some file within 
the container that wasn't directly writable by him.

In any event, I don't know how one would implement a check between the 
symlink and its target; you'd have to save the symlink information until 
you reach the final target and then call a hook with both of them.  And 
what if there are multiple symlinks in that path?  Symlinks to symlinks?
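A toy path walk illustrates the difficulty. The per-symlink check (modeled here by a hypothetical may_follow callback standing in for the lnk_file read check) is invoked once per symlink and sees only that symlink; the final target's label is known only after the whole chain has been resolved. This is an illustrative model, not kernel code: it has no "." or ".." handling and no mount points, and it always follows a trailing symlink:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_SYMLINKS = 40  # Linux caps symlinks followed per lookup (MAXSYMLINKS)


@dataclass
class Node:
    label: str                         # SELinux type of this inode
    link_target: Optional[str] = None  # set only for symlinks


def resolve(fs: Dict[str, Node], path: str,
            may_follow: Callable[[str, Node], bool]) -> Tuple[str, List[str]]:
    """Walk `path` component by component, following symlinks.

    Returns the resolved path and the chain of symlinks followed.
    `may_follow` is consulted once per symlink and sees only that link.
    """
    pending = [c for c in path.split("/") if c]
    current = ""                       # resolved prefix; "" is the root
    chain: List[str] = []
    while pending:
        candidate = f"{current}/{pending.pop(0)}"
        node = fs.get(candidate)
        if node is None:
            raise FileNotFoundError(candidate)
        if node.link_target is None:   # directory or regular file
            current = candidate
            continue
        if len(chain) >= MAX_SYMLINKS:
            raise OSError("too many levels of symbolic links")
        # The only information available here is this one symlink.
        if not may_follow(candidate, node):
            raise PermissionError(candidate)
        chain.append(candidate)
        if node.link_target.startswith("/"):
            current = ""               # absolute target: restart at root
        # Relative targets resolve from the link's directory (`current`).
        pending = [c for c in node.link_target.split("/") if c] + pending
    return current or "/", chain
```

For example, with /c/a -> /c/b and /c/b -> /etc/shadow, resolving /c/a calls may_follow twice, each time with a container_file_t symlink; shadow_t is only discovered after the last call, so a symlink-versus-target check would have to carry the whole chain to the end of the walk. A may_follow that simply refuses container_file_t symlinks stops the walk at /c/a.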