linux-fsdevel.vger.kernel.org archive mirror
From: "Stéphane Graber" <stgraber@ubuntu.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-security-module@vger.kernel.org,
	Kees Cook <keescook@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-api@vger.kernel.org,
	Linux Containers <containers@lists.linux-foundation.org>,
	Jann Horn <jannh@google.com>,
	linux-kernel@vger.kernel.org, smbarber@chromium.org,
	Seth Forshee <seth.forshee@canonical.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH v2 00/28] user_namespace: introduce fsid mappings
Date: Mon, 17 Feb 2020 18:11:41 -0500
Message-ID: <CA+enf=v6WpYO9uEmWZ=m2bkMEVLcRGiG4WiJeaMaH_uSfnkz8g@mail.gmail.com>
In-Reply-To: <1581980625.24289.30.camel@HansenPartnership.com>

On Mon, Feb 17, 2020 at 6:03 PM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Mon, 2020-02-17 at 16:57 -0500, Stéphane Graber wrote:
> > On Mon, Feb 17, 2020 at 4:12 PM James Bottomley <
> > James.Bottomley@hansenpartnership.com> wrote:
> >
> > > On Fri, 2020-02-14 at 19:35 +0100, Christian Brauner wrote:
> > > [...]
> > > > With this patch series we simply introduce the ability to create
> > > > fsid mappings that are different from the id mappings of a user
> > > > namespace. The whole feature set is placed under a config option
> > > > that defaults to false.
> > > >
> > > > In the usual case of running an unprivileged container we will
> > > > > have set up an id mapping, e.g. 0 100000 100000. The on-disk
> > > > mapping will correspond to this id mapping, i.e. all files which
> > > > we want to appear as 0:0 inside the user namespace will be
> > > > chowned to 100000:100000 on the host. This works, because
> > > > whenever the kernel needs to do a filesystem access it will
> > > > > look up the corresponding uid and gid in the idmapping tables of
> > > > the container.
> > > > Now think about the case where we want to have an id mapping of 0
> > > > 100000 100000 but an on-disk mapping of 0 300000 100000 which is
> > > > needed to e.g. share a single on-disk mapping with multiple
> > > > containers that all have different id mappings.
> > > > This will be problematic. Whenever a filesystem access is
> > > > > requested, the kernel will now try to look up a mapping for 300000
> > > > in the id mapping tables of the user namespace but since there is
> > > > none the files will appear to be owned by the overflow id, i.e.
> > > > usually 65534:65534 or nobody:nogroup.
> > > >
> > > > With fsid mappings we can solve this by writing an id mapping of
> > > > 0 100000 100000 and an fsid mapping of 0 300000 100000. On
> > > > > filesystem access the kernel will now look up the mapping for
> > > > 300000 in the fsid mapping tables of the user namespace. And
> > > > since such a mapping exists, the corresponding files will have
> > > > correct ownership.
> > >
> > > How do we parametrise this new fsid shift for the unprivileged use
> > > case?  For newuidmap/newgidmap, it's easy because each user gets a
> > > dedicated range and everything "just works (tm)".  However, for the
> > > fsid mapping, assuming some newfsuid/newfsgid tool to help, that
> > > tool has to know not only your allocated uid/gid chunk, but also
> > > the offset map of the image.  The former is easy, but the latter is
> > > going to vary by the actual image ... well unless we standardise
> > > some accepted shift for images and it simply becomes a known static
> > > offset.
> > >
> >
> > For unprivileged runtimes, I would expect images to be unshifted and
> > be unpacked from within a userns.
>
> For images whose resting format is an archive like tar, I concur.
>
> >  So your unprivileged user would be allowed a uid/gid range through
> > /etc/subuid and /etc/subgid and allowed to use them through
> > newuidmap/newgidmap. In that namespace, you can then pull
> > and unpack any images/layers you may want and the resulting fs tree
> > will look correct from within that namespace.
> >
> > All that is possible today and is how for example unprivileged LXC
> > works right now.
>
> I do have a counter example, but it might be more esoteric: I do use
> unprivileged architecture emulation containers to maintain actual
> physical system boot environments.  These are stored as mountable disk
> images, not as archives, so I do need a simple remapping ... however, I
> think this use case is simple: it's a back shift along my owned uid/gid
> range, so tools for allowing unprivileged use can easily cope with this
> use case, so the use is either fsid identity or fsid back along
> existing user_ns mapping.
>
> > What this patchset then allows is for containers to have differing
> > uid/gid maps while still being based off the same image or layers.
> > In this scenario, you would carve a subset of your main uid/gid map
> > for each container you run and run them in a child user namespace
> > while setting up a fsuid/fsgid map such that their filesystem accesses
> > do not follow their uid/gid map. This then results in proper
> > isolation for processes, networks, ... as everything runs as
> > different kuid/kgid but the VFS view will be the same in all
> > containers.
>
> Who owns the shifted range of the image ... all tenants or none?

I would expect the most common case to be none of them.
So you'd have a uid/gid range carved out of your own allocation which is
used to unpack images; let's call that the image map.

Your containers would then each use a uid/gid map which is distinct from
the image map and from every other container's map, while all of them use
the image map as their fsuid/fsgid map.

This will make the VFS behave in a normal way and would also allow for
shared paths between those containers by bind-mounting a shared directory
which is also owned by a uid/gid in that image range.
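
A minimal sketch of the lookups this relies on, assuming a map is a list of
(inside first, host first, count) extents in the same field order as
/proc/<pid>/uid_map. The helper names, the two container maps and the
65536-id range size are illustrative assumptions; this only models the idea
in Python and is not the kernel code from the patchset:

```python
OVERFLOW_ID = 65534  # usual overflow uid/gid (nobody/nogroup)

def to_host(extents, inside_id):
    """Map an id used inside the namespace to a host id (None if unmapped)."""
    for inside_first, host_first, count in extents:
        if inside_first <= inside_id < inside_first + count:
            return host_first + (inside_id - inside_first)
    return None

def from_host(extents, host_id):
    """Map a host id to what the namespace sees (overflow id if unmapped)."""
    for inside_first, host_first, count in extents:
        if host_first <= host_id < host_first + count:
            return inside_first + (host_id - host_first)
    return OVERFLOW_ID

# Hypothetical layout: one image map shared as the fsuid/fsgid map,
# plus a distinct uid/gid map per container.
image_map = [(0, 300000, 65536)]
uid_map_a = [(0, 100000, 65536)]
uid_map_b = [(0, 200000, 65536)]

file_owner_on_disk = 300000  # root-owned file in the unpacked image

# Looked up through container A's uid map alone, the owner is unmapped.
print(from_host(uid_map_a, file_owner_on_disk))
# Looked up through the shared fsid map, the file belongs to uid 0.
print(from_host(image_map, file_owner_on_disk))
# Root in each container is still a different id on the host.
print(to_host(uid_map_a, 0), to_host(uid_map_b, 0))
```

In this model both containers see the image files as owned by uid 0 through
the shared map, while their processes keep separate host ids.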

> > Shared storage between those otherwise isolated containers would also
> > work just fine by simply bind-mounting the same path into two or more
> > containers.
> >
> >
> > Now one additional thing that would be safe for a setuid wrapper to
> > allow would be for arbitrary mapping of any of the uid/gid that the
> > user owns to be used within the fsuid/fsgid map. One potential use
> > for this would be to create any number of user namespaces, each with
> > their own mapping for uid 0 while still having all VFS access be
> > mapped to the user that spawned them (say uid=1000, gid=1000).
> >
> >
> > Note that in our case, the intended use for this is from a privileged
> > runtime where our images would be unshifted as would be the container
> > storage and any shared storage for containers. The security model
> > effectively relies on properly configured filesystem permissions and
> > mount namespaces such that the content of those paths can never be
> > seen by anyone but root outside of those containers (and therefore
> > avoids all the issues around setuid/setgid/fscaps).
>
> Yes, I understand ... all orchestration systems are currently hugely
> privileged.  However, there is interest in getting them down to only
> "slightly privileged".
>
> James
>
>
> > We will then be able to allocate distinct, random, ranges of 65536
> > uids/gids (or more) for each container without ever having to do any
> > uid/gid shifting at the filesystem layer or run into issues when
> > having to set up shared storage between containers or attaching
> > external storage volumes to those containers.
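
As a rough illustration of the per-container allocation mentioned in the
quoted paragraph above, the sketch below picks distinct, randomly placed
ranges of 65536 ids out of a larger allocation. The function name, the
allocation start (1000000) and its size are hypothetical; a real runtime
would take them from its own configuration:

```python
import random

RANGE_SIZE = 65536  # ids per container, as in the quoted paragraph

def allocate_ranges(first, count, n, rng):
    """Pick n non-overlapping RANGE_SIZE-aligned ranges inside [first, first + count)."""
    slots = count // RANGE_SIZE
    if n > slots:
        raise ValueError("allocation too small for the requested containers")
    # Sampling slot indexes without replacement guarantees distinct ranges.
    chosen = rng.sample(range(slots), n)
    return [first + slot * RANGE_SIZE for slot in chosen]

rng = random.Random()
starts = allocate_ranges(1000000, 10 * RANGE_SIZE, 3, rng)

# One uid_map-style line ("inside host count") per container.
for start in starts:
    print(f"0 {start} {RANGE_SIZE}")
```

Because each container only differs in its uid/gid map, the on-disk
ownership does not have to change when a new range is picked.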

Thread overview: 43+ messages
2020-02-14 18:35 [PATCH v2 00/28] user_namespace: introduce fsid mappings Christian Brauner
2020-02-14 18:35 ` [PATCH v2 01/28] user_namespace: introduce fsid mappings infrastructure Christian Brauner
2020-02-14 18:35 ` [PATCH v2 02/28] proc: add /proc/<pid>/fsuid_map Christian Brauner
2020-02-14 18:35 ` [PATCH v2 03/28] proc: add /proc/<pid>/fsgid_map Christian Brauner
2020-02-14 18:35 ` [PATCH v2 04/28] fsuidgid: add fsid mapping helpers Christian Brauner
2020-02-14 19:11   ` Jann Horn
2020-02-16 16:55     ` Christian Brauner
2020-02-14 18:35 ` [PATCH v2 05/28] proc: task_state(): use from_kfs{g,u}id_munged Christian Brauner
2020-02-14 18:35 ` [PATCH v2 06/28] cred: add kfs{g,u}id Christian Brauner
2020-02-14 18:35 ` [PATCH v2 07/28] sys: __sys_setfsuid(): handle fsid mappings Christian Brauner
2020-02-14 18:35 ` [PATCH v2 08/28] sys: __sys_setfsgid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 09/28] sys:__sys_setuid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 10/28] sys:__sys_setgid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 11/28] sys:__sys_setreuid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 12/28] sys:__sys_setregid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 13/28] sys:__sys_setresuid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 14/28] sys:__sys_setresgid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 15/28] fs: add is_userns_visible() helper Christian Brauner
2020-02-14 18:35 ` [PATCH v2 16/28] namei: may_{o_}create(): handle fsid mappings Christian Brauner
2020-02-14 18:35 ` [PATCH v2 17/28] inode: inode_owner_or_capable(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 18/28] capability: privileged_wrt_inode_uidgid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 19/28] stat: " Christian Brauner
2020-02-14 19:03   ` Tycho Andersen
2020-02-16 14:12     ` Christian Brauner
2020-02-14 18:35 ` [PATCH v2 20/28] open: " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 21/28] posix_acl: " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 22/28] attr: notify_change(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 23/28] commoncap: cap_bprm_set_creds(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 24/28] commoncap: cap_task_fix_setuid(): " Christian Brauner
2020-02-14 18:35 ` [PATCH v2 25/28] commoncap: handle fsid mappings with vfs caps Christian Brauner
2020-02-14 18:35 ` [PATCH v2 26/28] exec: bprm_fill_uid(): handle fsid mappings Christian Brauner
2020-02-14 18:35 ` [PATCH v2 27/28] ptrace: adapt ptrace_may_access() to always uses unmapped fsids Christian Brauner
2020-02-14 18:35 ` [PATCH v2 28/28] devpts: handle fsid mappings Christian Brauner
2020-02-16 15:55 ` [PATCH v2 00/28] user_namespace: introduce " Florian Weimer
2020-02-16 16:40   ` Christian Brauner
2020-02-17 21:06 ` James Bottomley
2020-02-17 21:20   ` Christian Brauner
2020-02-17 22:35     ` James Bottomley
2020-02-17 23:05       ` Christian Brauner
2020-02-17 21:11 ` James Bottomley
     [not found]   ` <CA+enf=vwd-dxzve87t7Mw1Z35RZqdLzVaKq=fZ4EGOpnES0f5w@mail.gmail.com>
2020-02-17 22:02     ` Stéphane Graber
2020-02-17 23:03     ` James Bottomley
2020-02-17 23:11       ` Stéphane Graber [this message]
