containers.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Alban Crequy <alban@kinvolk.io>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Lennart Poettering <lennart@poettering.net>,
	Mimi Zohar <zohar@linux.ibm.com>,
	David Howells <dhowells@redhat.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Tycho Andersen <tycho@tycho.ws>,
	Miklos Szeredi <miklos@szeredi.hu>,
	smbarber@chromium.org, Christoph Hellwig <hch@infradead.org>,
	linux-ext4@vger.kernel.org, Mrunal Patel <mpatel@redhat.com>,
	Kees Cook <keescook@chromium.org>, Arnd Bergmann <arnd@arndb.de>,
	Jann Horn <jannh@google.com>,
	selinux@vger.kernel.org, Josh Triplett <josh@joshtriplett.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	Geoffrey Thomas <geofft@ldpreload.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	John Johansen <john.johansen@canonical.com>,
	Theodore Tso <tytso@mit.edu>,
	Seth Forshee <seth.forshee@canonical.com>,
	Dmitry Kasatkin <dmitry.kasatkin@gmail.com>,
	Stephen Smalley <stephen.smalley.work@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-unionfs@vger.kernel.org,
	LSM <linux-security-module@vger.kernel.org>,
	linux-audit@redhat.com, linux-api@vger.kernel.org,
	Casey Schaufler <casey@schaufler-ca.com>,
	linux-integrity <linux-integrity@vger.kernel.org>,
	Todd Kjos <tkjos@google.com>
Subject: Re: [PATCH 00/34] fs: idmapped mounts
Date: Tue, 3 Nov 2020 15:10:35 +0100	[thread overview]
Message-ID: <CADZs7q4abuN6n8HMrpe2R2kRBUDVPoYRNpezKk4cvXRk7CVHng@mail.gmail.com> (raw)
In-Reply-To: <87361xdm4c.fsf@x220.int.ebiederm.org>

On Thu, Oct 29, 2020 at 5:37 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Aleksa Sarai <cyphar@cyphar.com> writes:
>
> > On 2020-10-29, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Christian Brauner <christian.brauner@ubuntu.com> writes:
> >>
> >> > Hey everyone,
> >> >
> >> > I vanished for a little while to focus on this work here so sorry for
> >> > not being available by mail for a while.
> >> >
> >> > Since quite a long time we have issues with sharing mounts between
> >> > multiple unprivileged containers with different id mappings, sharing a
> >> > rootfs between multiple containers with different id mappings, and also
> >> > sharing regular directories and filesystems between users with different
> >> > uids and gids. The latter use-cases have become even more important with
> >> > the availability and adoption of systemd-homed (cf. [1]) to implement
> >> > portable home directories.
> >>
> >> Can you walk us through the motivating use case?
> >>
> >> As of this year's LPC I had the distinct impression that the primary use
> >> case for such a feature was due to the RLIMIT_NPROC problem where two
> >> containers with the same users still wanted different uid mappings to
> >> the disk because the users were conflicting with each other because of
> >> the per user rlimits.
> >>
> >> Fixing rlimits is straight forward to implement, and easier to manage
> >> for implementations and administrators.
> >
> > This is separate to the question of "isolated user namespaces" and
> > managing different mappings between containers. This patchset is solving
> > the same problem that shiftfs solved -- sharing a single directory tree
> > between containers that have different ID mappings. rlimits (nor any of
> > the other proposals we discussed at LPC) will help with this problem.
>
> First and foremost: A uid shift on write to a filesystem is a security
> bug waiting to happen.  This is especially in the context of facilities
> like iouring, that play very agressive games with how process context
> makes it to  system calls.
>
> The only reason containers were not immediately exploitable when iouring
> was introduced is because the mechanisms are built so that even if
> something escapes containment the security properties still apply.
> Changes to the uid when writing to the filesystem does not have that
> property.  The tiniest slip in containment will be a security issue.
>
> This is not even the least bit theoretical.  I have seem reports of how
> shitfs+overlayfs created a situation where anyone could read
> /etc/shadow.
>
> If you are going to write using the same uid to disk from different
> containers the question becomes why can't those containers configure
> those users to use the same kuid?
>
> What fixing rlimits does is it fixes one of the reasons that different
> containers could not share the same kuid for users that want to write to
> disk with the same uid.
>
>
> I humbly suggest that it will be more secure, and easier to maintain for
> both developers and users if we fix the reasons people want different
> containers to have the same user running with different kuids.
>
> If not what are the reasons we fundamentally need the same on-disk user
> using multiple kuids in the kernel?

I would like to use this patch set in the context of Kubernetes. I
described my two possible setups in
https://www.spinics.net/lists/linux-containers/msg36537.html:

1. Each Kubernetes pod has its own userns but with the same user id mapping
2. Each Kubernetes pod has its own userns with non-overlapping user id
mapping (providing additional isolation between pods)

But even in the setup where all pods run with the same id mappings,
this patch set is still useful to me for 2 reasons:

1. To avoid the expensive recursive chown of the rootfs. We cannot
necessarily extract the tarball directly with the right uids because
we might use the same container image for privileged containers (with
the host userns) and unprivileged containers (with a new userns), so
we have at least 2 “mappings” (taking more time and resulting in more
storage space). Although the “metacopy” mount option in overlayfs
helps to make the recursive chown faster, it can still take time with
large container images with lots of files. I’d like to use this patch
set to set up the root fs in constant time.

2. To manage large external volumes (NFS or other filesystems). Even
if admins can decide to use the same kuid on all the nodes of the
Kubernetes cluster, this is impractical for migration. People can have
existing Kubernetes clusters (currently without using user namespaces)
and large NFS volumes. If they want to switch to a new version of
Kubernetes with the user namespace feature enabled, they would need to
recursively chown all the files on the NFS shares. This could take
time on large filesystems and realistically, we want to support
rolling updates where some nodes use the previous version without user
namespaces and new nodes are progressively migrated to the new userns
with the new id mapping. If both sets of nodes use the same NFS share,
that can’t work.

Alban
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

  parent reply	other threads:[~2020-11-03 14:32 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-29  0:32 [PATCH 00/34] fs: idmapped mounts Christian Brauner
2020-10-29  0:32 ` [PATCH 01/34] namespace: take lock_mount_hash() directly when changing flags Christian Brauner
2020-11-01 14:41   ` Christoph Hellwig
2020-11-02 13:33     ` Christian Brauner
2020-10-29  0:32 ` [PATCH 02/34] namespace: only take read lock in do_reconfigure_mnt() Christian Brauner
2020-10-29  0:32 ` [PATCH 03/34] fs: add mount_setattr() Christian Brauner
2020-11-01 14:42   ` Christoph Hellwig
2020-11-02 13:34     ` Christian Brauner
2020-10-29  0:32 ` [PATCH 04/34] tests: add mount_setattr() selftests Christian Brauner
2020-10-29  0:32 ` [PATCH 05/34] fs: introduce MOUNT_ATTR_IDMAP Christian Brauner
2020-11-01 14:45   ` Christoph Hellwig
2020-11-02 13:29     ` Christian Brauner
2020-10-29  0:32 ` [PATCH 06/34] fs: add id translation helpers Christian Brauner
2020-11-01 14:46   ` Christoph Hellwig
2020-11-02 13:25     ` Christian Brauner
2020-10-29  0:32 ` [PATCH 07/34] capability: handle idmapped mounts Christian Brauner
2020-11-01 14:48   ` Christoph Hellwig
2020-11-02 13:23     ` Christian Brauner
2020-10-29  0:32 ` [PATCH 08/34] namei: add idmapped mount aware permission helpers Christian Brauner
2020-10-29  0:32 ` [PATCH 09/34] inode: add idmapped mount aware init and " Christian Brauner
2020-10-29  0:32 ` [PATCH 10/34] attr: handle idmapped mounts Christian Brauner
2020-10-29  0:32 ` [PATCH 11/34] acl: " Christian Brauner
2020-10-29  0:32 ` [PATCH 12/34] xattr: " Christian Brauner
2020-10-29  0:32 ` [PATCH 13/34] selftests: add idmapped mounts xattr selftest Christian Brauner
2020-10-29  0:32 ` [PATCH 14/34] commoncap: handle idmapped mounts Christian Brauner
2020-10-29  0:32 ` [PATCH 15/34] stat: add mapped_generic_fillattr() Christian Brauner
2020-10-29  0:32 ` [PATCH 16/34] namei: handle idmapped mounts in may_*() helpers Christian Brauner
2020-10-29  0:32 ` [PATCH 17/34] namei: introduce struct renamedata Christian Brauner
2020-10-29  0:32 ` [PATCH 18/34] namei: prepare for idmapped mounts Christian Brauner
2020-10-29  0:32 ` [PATCH 19/34] namei: add lookup helpers with idmapped mounts aware permission checking Christian Brauner
2020-10-29  0:32 ` [PATCH 20/34] open: handle idmapped mounts in do_truncate() Christian Brauner
2020-10-29  0:32 ` [PATCH 21/34] open: handle idmapped mounts Christian Brauner
2020-10-29  0:32 ` [PATCH 22/34] af_unix: " Christian Brauner
2020-10-29  0:32 ` [PATCH 23/34] utimes: " Christian Brauner
2020-10-29  0:32 ` [PATCH 24/34] would_dump: " Christian Brauner
2020-10-29  0:32 ` [PATCH 25/34] exec: " Christian Brauner
2020-10-29  0:32 ` [PATCH 26/34] fs: add helpers for idmap mounts Christian Brauner
2020-10-29  0:32 ` [PATCH 27/34] apparmor: handle idmapped mounts Christian Brauner
2020-10-29  0:32 ` [PATCH 28/34] audit: " Christian Brauner
2020-10-29  0:32 ` [PATCH 29/34] ima: " Christian Brauner
2020-10-29  0:32 ` [PATCH 30/34] ext4: support " Christian Brauner
2020-10-29  0:32 ` [PATCH 31/34] expfs: handle " Christian Brauner
2020-10-29  0:32 ` [PATCH 32/34] overlayfs: handle idmapped lower directories Christian Brauner
2020-10-30 11:10   ` Amir Goldstein
2020-10-30 11:52     ` Christian Brauner
2020-10-29  0:32 ` [PATCH 33/34] overlayfs: handle idmapped merged mounts Christian Brauner
2020-10-30  9:57   ` Amir Goldstein
2020-10-29  0:32 ` [PATCH 34/34] fat: handle idmapped mounts Christian Brauner
2020-10-29  2:27 ` [PATCH 00/34] fs: " Dave Chinner
2020-10-29 16:19   ` Christian Brauner
2020-10-29 21:06     ` Tycho Andersen
2020-10-29  7:20 ` Sargun Dhillon
2020-10-29 15:47 ` Eric W. Biederman
2020-10-29 15:51   ` Aleksa Sarai
2020-10-29 16:37     ` Eric W. Biederman
2020-10-30  2:18       ` Serge E. Hallyn
2020-10-30 15:07       ` Seth Forshee
2020-10-30 16:03         ` Serge E. Hallyn
2020-11-03 14:10       ` Alban Crequy [this message]
2020-10-29 16:05   ` Lennart Poettering
2020-10-29 16:36     ` Sargun Dhillon
2020-10-29 16:54     ` Eric W. Biederman
2020-10-29 16:12   ` Tycho Andersen
2020-10-29 16:23     ` Serge E. Hallyn
2020-10-29 16:44     ` Eric W. Biederman
2020-10-29 18:04       ` Stéphane Graber
2020-10-29 21:03       ` Tycho Andersen
2020-10-29 21:58 ` Andy Lutomirski
2020-10-30 12:01   ` Christian Brauner
2020-10-30 16:17     ` Serge E. Hallyn
2020-10-31 17:43     ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADZs7q4abuN6n8HMrpe2R2kRBUDVPoYRNpezKk4cvXRk7CVHng@mail.gmail.com \
    --to=alban@kinvolk.io \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=arnd@arndb.de \
    --cc=casey@schaufler-ca.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=dhowells@redhat.com \
    --cc=dmitry.kasatkin@gmail.com \
    --cc=ebiederm@xmission.com \
    --cc=geofft@ldpreload.com \
    --cc=hch@infradead.org \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=jannh@google.com \
    --cc=john.johansen@canonical.com \
    --cc=josh@joshtriplett.org \
    --cc=keescook@chromium.org \
    --cc=lennart@poettering.net \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-audit@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=mpatel@redhat.com \
    --cc=selinux@vger.kernel.org \
    --cc=seth.forshee@canonical.com \
    --cc=smbarber@chromium.org \
    --cc=stephen.smalley.work@gmail.com \
    --cc=tkjos@google.com \
    --cc=tycho@tycho.ws \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zohar@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).