From: ebiederm@xmission.com (Eric W. Biederman)
To: Miklos Szeredi <mszeredi@redhat.com>
Cc: Dongsu Park <dongsu@kinvolk.io>,
	lkml <linux-kernel@vger.kernel.org>,
	containers@lists.linux-foundation.org,
	Alban Crequy <alban@kinvolk.io>,
	Seth Forshee <seth.forshee@canonical.com>,
	Sargun Dhillon <sargun@sargun.me>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 08/11] fuse: Support fuse filesystems outside of init_user_ns
Date: Fri, 16 Feb 2018 15:52:32 -0600
Message-ID: <87606wtxen.fsf@xmission.com>
In-Reply-To: <CAOssrKcKz8p9YQJLf2W_NCBo+12auxir5jFwXGbANdWdgavpsw@mail.gmail.com> (Miklos Szeredi's message of "Tue, 13 Feb 2018 11:20:07 +0100")

Miklos Szeredi <mszeredi@redhat.com> writes:

> On Mon, Feb 12, 2018 at 5:35 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Miklos Szeredi <mszeredi@redhat.com> writes:
>>
>>> On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park <dongsu@kinvolk.io> wrote:
>>>> From: Seth Forshee <seth.forshee@canonical.com>
>>>>
>>>> In order to support mounts from namespaces other than
>>>> init_user_ns, fuse must translate uids and gids to/from the
>>>> userns of the process servicing requests on /dev/fuse. This
>>>> patch does that, with a couple of restrictions on the namespace:
>>>>
>>>>  - The userns for the fuse connection is fixed to the namespace
>>>>    from which /dev/fuse is opened.
>>>>
>>>>  - The namespace must be the same as s_user_ns.
>>>>
>>>> These restrictions simplify the implementation by avoiding the
>>>> need to pass around userns references and by allowing fuse to
>>>> rely on the checks in inode_change_ok for ownership changes.
>>>> Either restriction could be relaxed in the future if needed.
>>>
>>> Can we not introduce potential userspace interface regressions?
>>>
>>> The issue with pid namespaces fixed in commit 5d6d3a301c4e ("fuse:
>>> allow server to run in different pid_ns") will probably bite us here
>>> as well.
>>
>> Maybe, but unlike the pid namespace no one has been able to mount
>> fuse outside of init_user_ns so we are much less exposed.  I agree we
>> should be careful.
>
> Have to wrap my head around all the rules here.
>
> There's the may_mount() one:
>
>     ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN)
>
> Um, first of all, why isn't it checking current->cred->user_ns?
>
> Ah, there it is in sget():
>
>     ns_capable(user_ns, CAP_SYS_ADMIN)
>
> I get the plain capable(CAP_SYS_ADMIN) check in sget_userns() if fs
> doesn't have FS_USERNS_MOUNT.  This is the one that prevents fuse
> mounts from being created when (current->cred->user_ns !=
> &init_user_ns).
>
> Maybe there's a logic to this web of namespaces, but I don't yet see
> it.  Is it documented somewhere?

I think this is a bit simpler than the fiddly details in the
implementation might make it look.

The fundamental idea is that permission to have full control over
a mount namespace is different from permission to have full control
over an instance of a filesystem.

Implementing that separation of permission checks gets a little bit
fiddly.  The first challenge is that there are several filesystems, like
sysfs and proc, whose internal mount is created outside of a process.
Then there are the filesystems like nfs and afs that have ``referral
points'' that move you to other instances of those filesystems
when you cross them.  That is the reason why there are
exceptions for SB_KERNMOUNT and SB_SUBMOUNT.

may_mount is just the permission check for the mount namespace.  It
checks that the current process has CAP_SYS_ADMIN in the user namespace
that owns the current mount namespace.  In other words: is the process
allowed to change the mount namespace?

sget is just the permission check for mounting a filesystem.  It checks
that the mounter has CAP_SYS_ADMIN over the user namespace that will own
the newly mounted filesystem.

By the time execution gets to sget_userns, in general all of the
permission checks have been made.  But if the filesystem is not one
that supports mounting within a user namespace, the code checks
capable(CAP_SYS_ADMIN).

That is more convoluted than I would like, but the checks derive from the
definition of what we are doing.
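
As a rough illustration of the three checks described above, here is a
userspace toy model in C.  It is a sketch, not kernel code: the toy_*
names, the struct layouts, and the simplified capability walk are all
assumptions made for the example, and the SB_KERNMOUNT and SB_SUBMOUNT
exceptions are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: it only records its parent (NULL for the initial one). */
struct user_ns {
	const struct user_ns *parent;
};

/* Toy task: where its credentials live, whether it holds CAP_SYS_ADMIN
 * there, and which user namespace owns its mount namespace. */
struct task {
	const struct user_ns *cred_ns;
	bool has_sys_admin;
	const struct user_ns *mnt_ns_owner;
};

/* Stand-in for init_user_ns. */
static const struct user_ns init_ns = { NULL };

/* Simplified stand-in for ns_capable(): a capability held in cred_ns
 * counts for cred_ns itself and for every namespace descended from it.
 * (The real check has more cases, e.g. the owner of a child namespace.) */
static bool toy_ns_capable(const struct task *t, const struct user_ns *target)
{
	if (!t->has_sys_admin)
		return false;
	for (const struct user_ns *ns = target; ns; ns = ns->parent)
		if (ns == t->cred_ns)
			return true;
	return false;
}

/* Check 1: may the task change its current mount namespace? */
static bool toy_may_mount(const struct task *t)
{
	return toy_ns_capable(t, t->mnt_ns_owner);
}

/* Checks 2 and 3: may the task create a filesystem instance owned by sb_ns?
 * Filesystems without the userns-mount flag additionally require the
 * capability in the initial namespace, like plain capable(). */
static bool toy_sget_allowed(const struct task *t, const struct user_ns *sb_ns,
			     bool fs_userns_mount)
{
	if (!toy_ns_capable(t, sb_ns))
		return false;
	if (!fs_userns_mount && !toy_ns_capable(t, &init_ns))
		return false;
	return true;
}
```

In this model a task that is root only inside a child user namespace
passes toy_may_mount() for a mount namespace owned by that child, and
passes toy_sget_allowed() only when the filesystem opts in via the
userns-mount flag, which mirrors why fuse mounts could not previously be
created outside of init_user_ns.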

>
>>> We basically need two modes of operation:
>>>
>>> a) old, backward compatible (not introducing any new failure modes),
>>> created with privileged mount
>>> b) new, non-backward compatible, created with unprivileged mount
>>>
>>> Technically there would still be a risk from breaking userspace, since
>>> we are using the same entry point for both, but let's hope that no
>>> practical problems come from that.
>>
>> Answering from a 10,000 foot perspective:
>>
>> There are two cases.  Requests to read/write the filesystem from outside
>> of s_user_ns.  These run no risk of breaking userspace as this mode has
>> not been implemented before.
>
> This comes from the fact that (s_user_ns == &init_user_ns) and all
> user namespaces are "inside" init_user_ns, right?

Yes.

> One question: why does current code use the from_[ug]id_munged()
> variant, when the conversion can never fail.  Or can it?

There is always at least one value, (uid_t)-1, whose conversion fails if
it shows up on a filesystem.  As far as I can tell no one was using it as
a real uid, and it was already treated as a special case in places, so I
just grabbed it to become INVALID_UID.

In practice the mapping can't fail unless someone malicious starts using
that id.

I believe I picked the _munged variant so that, if that case is ever hit,
we are guaranteed to return the 16-bit nobody user.
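
To make the difference between the plain and _munged conversions
concrete, here is a small userspace sketch in C.  It is a toy model under
stated assumptions, not the kernel implementation: the toy_* names and
the extent struct are invented for the example, and 65534 is assumed as
the overflow ("nobody") id, which is the usual default.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants for the sketch. */
#define TOY_INVALID_UID  ((uint32_t)-1)  /* the reserved "no such uid" value */
#define TOY_OVERFLOW_UID 65534u          /* assumed default 16-bit nobody id */

/* One extent of a toy uid map: 'count' ids starting at kernel_first
 * appear inside the namespace starting at ns_first. */
struct toy_uid_extent {
	uint32_t ns_first;
	uint32_t kernel_first;
	uint32_t count;
};

/* Plain conversion: returns TOY_INVALID_UID when kuid has no mapping. */
static uint32_t toy_from_kuid(const struct toy_uid_extent *map, size_t n,
			      uint32_t kuid)
{
	for (size_t i = 0; i < n; i++) {
		if (kuid >= map[i].kernel_first &&
		    kuid - map[i].kernel_first < map[i].count)
			return map[i].ns_first + (kuid - map[i].kernel_first);
	}
	return TOY_INVALID_UID;
}

/* Munged conversion: never reports failure, substitutes the overflow id. */
static uint32_t toy_from_kuid_munged(const struct toy_uid_extent *map,
				     size_t n, uint32_t kuid)
{
	uint32_t uid = toy_from_kuid(map, n, kuid);

	return uid == TOY_INVALID_UID ? TOY_OVERFLOW_UID : uid;
}
```

With an identity map covering every id except (uid_t)-1, which is how
the sketch models the initial namespace, every ordinary id converts to
itself, and only the reserved value falls through to the overflow id in
the munged variant.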

Eric

