All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/11] FUSE mounts from non-init user namespaces
@ 2017-12-22 14:32 Dongsu Park
  0 siblings, 0 replies; 14+ messages in thread
From: Dongsu Park @ 2017-12-22 14:32 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Miklos Szeredi,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Seth Forshee, Alban Crequy, Eric W . Biederman, Sargun Dhillon

This patchset v5 is based on work by Seth Forshee and Eric Biederman.
The latest patchset was v4:
https://www.mail-archive.com/linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg1132206.html

At the moment, filesystems backed by physical medium can only be mounted
by real root in the initial user namespace. This restriction exists
because if it's allowed for root user in non-init user namespaces to
mount the filesystem, then it effectively allows the user to control the
underlying source of the filesystem. In case of FUSE, the source would
mean any underlying device.

However, in many use cases such as containers, it's necessary to allow
filesystems to be mounted from non-init user namespaces. Goal of this
patchset is to allow FUSE filesystems to be mounted from non-init user
namespaces. Support for other filesystems like ext4 are not in the
scope of this patchset.

Let me describe how to test mounting from non-init user namespaces. It's
assumed that tests are done via sshfs, a userspace filesystem based on
FUSE with ssh as backend. Testing system is Fedora 27.

====
$ sudo dnf install -y sshfs
$ sudo mkdir -p /mnt/userns

### workaround to get the sshfs permission checks
$ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies

$ unshare -U -r -m
# sshfs root@localhost: /mnt/userns

### You can see sshfs being mounted from a non-init user namespace
# mount | grep sshfs
root@localhost: on /mnt/userns type fuse.sshfs
(rw,nosuid,nodev,relatime,user_id=0,group_id=0)

# touch /mnt/userns/test
# ls -l /mnt/userns/test
-rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
====

Open another terminal, check the mountpoint from outside the namespace.

====
$ grep userns /proc/$(pidof sshfs)/mountinfo
131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
root@localhost: rw,user_id=0,group_id=0
====

After all tests are done, you can unmount the filesystem
inside the namespace.

====
# fusermount -u /mnt/userns
====

Changes since v4:
 * Remove other parts like ext4 to keep the patchset minimal for FUSE
 * Add and change commit messages
 * Describe how to test non-init user namespaces

TODO:
 * Think through potential security implications. There are 2 patches
   being prepared for security issues. One is "ima: define a new policy
   option named force" by Mimi Zohar, which adds an option to specify
   that the results should not be cached:
   https://marc.info/?l=linux-integrity&m=151275680115856&w=2
   The other one is to basically prevent FUSE results from being cached,
   which is still in progress.

 * Test IMA/LSMs. Details are written in
   https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md

Patches 1-2 deal with an additional flag of lookup_bdev() to check for
additional inode permission.

Patches 3-7 allow the superblock owner to change ownership of inodes, and
deal with additional capability checks w.r.t user namespaces.

Patches 8-10 allow FUSE filesystems to be mounted outside of the init
user namespace.

Patch 11 handles a corner case of non-root users in EVM.

The patchset is also available in our github repo:
  https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1


Eric W. Biederman (1):
  fs: Allow superblock owner to change ownership of inodes

Seth Forshee (10):
  block_dev: Support checking inode permissions in lookup_bdev()
  mtd: Check permissions towards mtd block device inode when mounting
  fs: Don't remove suid for CAP_FSETID for userns root
  fs: Allow superblock owner to access do_remount_sb()
  capabilities: Allow privileged user in s_user_ns to set security.*
    xattrs
  fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
  fuse: Support fuse filesystems outside of init_user_ns
  fuse: Restrict allow_other to the superblock's namespace or a
    descendant
  fuse: Allow user namespace mounts
  evm: Don't update hmacs in user ns mounts

 drivers/md/bcache/super.c           |  2 +-
 drivers/md/dm-table.c               |  2 +-
 drivers/mtd/mtdsuper.c              |  6 +++++-
 fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
 fs/block_dev.c                      | 13 ++++++++++---
 fs/fuse/cuse.c                      |  3 ++-
 fs/fuse/dev.c                       | 11 ++++++++---
 fs/fuse/dir.c                       | 16 ++++++++--------
 fs/fuse/fuse_i.h                    |  6 +++++-
 fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
 fs/inode.c                          |  6 ++++--
 fs/ioctl.c                          |  4 ++--
 fs/namespace.c                      |  4 ++--
 fs/proc/base.c                      |  7 +++++++
 fs/proc/generic.c                   |  7 +++++++
 fs/proc/proc_sysctl.c               |  7 +++++++
 fs/quota/quota.c                    |  2 +-
 include/linux/fs.h                  |  2 +-
 kernel/user_namespace.c             |  1 +
 security/commoncap.c                |  8 ++++++--
 security/integrity/evm/evm_crypto.c |  3 ++-
 21 files changed, 127 insertions(+), 52 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
  2018-01-18 14:58     ` Alban Crequy
@ 2018-02-19 23:09           ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2018-02-19 23:09 UTC (permalink / raw)
  To: Alban Crequy
  Cc: Miklos Szeredi, Linux Containers, LKML, Seth Forshee, Sargun Dhillon

Alban Crequy <alban-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org> writes:

> Hi Eric,
>
> Do you have some cycles for this now that it is the new year?
>
> A review on the associated ima issue would also be appreciated:
> https://www.mail-archive.com/linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg1587678.html

It has taken me longer than I expected but I do have time now.  I am
moving through these patches and issues a little slowly I do intend to
get us through the fuse issues this development cycle if at all
possible.

I think for starters we should restrict ourselves to the last 4 patches
aka (8, 9, 10, 11).

In particular we should concentrate on
[8/11] fuse: Support fuse filesystems outside of init_user_ns
[9/11] fuse: Restrict allow_other to the superblock's namespace or a descendant

The tricky issues are handled in the vfs, and I think the remaining
tricky issues are evm and ima.  Which are close enough to be resolved
that we can count them as resolved.

Once we have 8 & 9 reviewed and merged we can double check there isn't
some silly reason not to set FS_USERNS_MOUNT on fuse and then enable it.

I would like to double check and ensure there are not silly issues with
posix acls or anything else in the vfs.  But I think except for a silly
oversight we are good.

I should probably also add a patch that adds to
Documentation/filesystems that explains what the vfs does for
unprivileged mounts.  So that I can point people working on filesystems
and are thinking about enabling user namespace mounts at the
documentation for what the vfs does.  That would also provide a good
checklist to ensure the way the vfs handles things is sufficient for
fuse.

As for the earlier patches that enable things.  Overall they are
good.  They are slightly dangerous as they enable more code paths
to unprivileged users.  But mostly I think they are not immediately
necessary and as such a distraction to getting this code in.

That said once we get the fuse bits reviewed merged I will be more than
happy to merge the relaxation of permission checks that we can perform
now that s_user_ns exists.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
@ 2018-02-19 23:09           ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2018-02-19 23:09 UTC (permalink / raw)
  To: Alban Crequy
  Cc: Dongsu Park, LKML, Linux Containers, Miklos Szeredi,
	Seth Forshee, Sargun Dhillon

Alban Crequy <alban@kinvolk.io> writes:

> Hi Eric,
>
> Do you have some cycles for this now that it is the new year?
>
> A review on the associated ima issue would also be appreciated:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1587678.html

It has taken me longer than I expected but I do have time now.  I am
moving through these patches and issues a little slowly I do intend to
get us through the fuse issues this development cycle if at all
possible.

I think for starters we should restrict ourselves to the last 4 patches
aka (8, 9, 10, 11).

In particular we should concentrate on
[8/11] fuse: Support fuse filesystems outside of init_user_ns
[9/11] fuse: Restrict allow_other to the superblock's namespace or a descendant

The tricky issues are handled in the vfs, and I think the remaining
tricky issues are evm and ima.  Which are close enough to be resolved
that we can count them as resolved.

Once we have 8 & 9 reviewed and merged we can double check there isn't
some silly reason not to set FS_USERNS_MOUNT on fuse and then enable it.

I would like to double check and ensure there are not silly issues with
posix acls or anything else in the vfs.  But I think except for a silly
oversight we are good.

I should probably also add a patch that adds to
Documentation/filesystems that explains what the vfs does for
unprivileged mounts.  So that I can point people working on filesystems
and are thinking about enabling user namespace mounts at the
documentation for what the vfs does.  That would also provide a good
checklist to ensure the way the vfs handles things is sufficient for
fuse.

As for the earlier patches that enable things.  Overall they are
good.  They are slightly dangerous as they enable more code paths
to unprivileged users.  But mostly I think they are not immediately
necessary and as such a distraction to getting this code in.

That said once we get the fuse bits reviewed merged I will be more than
happy to merge the relaxation of permission checks that we can perform
now that s_user_ns exists.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
       [not found]     ` <CAOssrKey+oxahrXHO5d6Lu1ZD=r1t-b0i4iZM_Ke9ToqTckjkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-16 21:53       ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2018-02-16 21:53 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, lkml,
	Seth Forshee, Alban Crequy, Sargun Dhillon

Miklos Szeredi <mszeredi-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park <dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org> wrote:
>
>> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
>> additional inode permission.
>
> fuse_blk is less suitable for unprivileged mounting than plain fuse.
> fusermount doesn't allow mounting fuse_blk unprivileged, so there's
> little data about that usecase (IIRC ntfs3g guys did that, or at least
> tried to do it, but I don't remember the details).
>
> As such, I think we should leave it out of the initial version.  Which
> means you can drop patches 1-2 from this series.  Unless there's a
> strong use case for this.  In which case we should look hard at the
> differences between fuse_blk and fuse and how that affects
> unprivileged operation.   There are a few assumptions about fuse_blk
> filesystem being more "well behaved", I think.

Especially to start with I am fine with that.

It makes a lot of sense to get the obvious cases first.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
  2018-02-13 11:32     ` Miklos Szeredi
  (?)
@ 2018-02-16 21:53     ` Eric W. Biederman
  -1 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2018-02-16 21:53 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Dongsu Park, lkml, containers, Alban Crequy, Seth Forshee,
	Sargun Dhillon

Miklos Szeredi <mszeredi@redhat.com> writes:

> On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park <dongsu@kinvolk.io> wrote:
>
>> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
>> additional inode permission.
>
> fuse_blk is less suitable for unprivileged mounting than plain fuse.
> fusermount doesn't allow mounting fuse_blk unprivileged, so there's
> little data about that usecase (IIRC ntfs3g guys did that, or at least
> tried to do it, but I don't remember the details).
>
> As such, I think we should leave it out of the initial version.  Which
> means you can drop patches 1-2 from this series.  Unless there's a
> strong use case for this.  In which case we should look hard at the
> differences between fuse_blk and fuse and how that affects
> unprivileged operation.   There are a few assumptions about fuse_blk
> filesystem being more "well behaved", I think.

Especially to start with I am fine with that.

It makes a lot of sense to get the obvious cases first.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
  2017-12-22 14:32 Dongsu Park
@ 2018-02-13 11:32     ` Miklos Szeredi
       [not found] ` <cover.1512741134.git.dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
  1 sibling, 0 replies; 14+ messages in thread
From: Miklos Szeredi @ 2018-02-13 11:32 UTC (permalink / raw)
  To: Dongsu Park
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, lkml,
	Seth Forshee, Alban Crequy, Eric W . Biederman, Sargun Dhillon

On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park <dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org> wrote:

> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
> additional inode permission.

fuse_blk is less suitable for unprivileged mounting than plain fuse.
fusermount doesn't allow mounting fuse_blk unprivileged, so there's
little data about that usecase (IIRC ntfs3g guys did that, or at least
tried to do it, but I don't remember the details).

As such, I think we should leave it out of the initial version.  Which
means you can drop patches 1-2 from this series.  Unless there's a
strong use case for this.  In which case we should look hard at the
differences between fuse_blk and fuse and how that affects
unprivileged operation.   There are a few assumptions about fuse_blk
filesystem being more "well behaved", I think.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
@ 2018-02-13 11:32     ` Miklos Szeredi
  0 siblings, 0 replies; 14+ messages in thread
From: Miklos Szeredi @ 2018-02-13 11:32 UTC (permalink / raw)
  To: Dongsu Park
  Cc: lkml, containers, Alban Crequy, Eric W . Biederman, Seth Forshee,
	Sargun Dhillon

On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park <dongsu@kinvolk.io> wrote:

> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
> additional inode permission.

fuse_blk is less suitable for unprivileged mounting than plain fuse.
fusermount doesn't allow mounting fuse_blk unprivileged, so there's
little data about that usecase (IIRC ntfs3g guys did that, or at least
tried to do it, but I don't remember the details).

As such, I think we should leave it out of the initial version.  Which
means you can drop patches 1-2 from this series.  Unless there's a
strong use case for this.  In which case we should look hard at the
differences between fuse_blk and fuse and how that affects
unprivileged operation.   There are a few assumptions about fuse_blk
filesystem being more "well behaved", I think.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
       [not found]     ` <CANxcAMvwwiPXBTKmTM9sEo8Y1T--V7fNaFqzHfyEvwvaYQV60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-18 14:58       ` Alban Crequy
  0 siblings, 0 replies; 14+ messages in thread
From: Alban Crequy @ 2018-01-18 14:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, LKML, Seth Forshee, Sargun Dhillon

On Tue, Jan 9, 2018 at 4:05 PM, Dongsu Park <dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org> wrote:
> Hi,
>
> On Mon, Dec 25, 2017 at 8:05 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Dongsu Park <dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org> writes:
>>
>>> This patchset v5 is based on work by Seth Forshee and Eric Biederman.
>>> The latest patchset was v4:
>>> https://www.mail-archive.com/linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg1132206.html
>>>
>>> At the moment, filesystems backed by physical medium can only be mounted
>>> by real root in the initial user namespace. This restriction exists
>>> because if it's allowed for root user in non-init user namespaces to
>>> mount the filesystem, then it effectively allows the user to control the
>>> underlying source of the filesystem. In case of FUSE, the source would
>>> mean any underlying device.
>>>
>>> However, in many use cases such as containers, it's necessary to allow
>>> filesystems to be mounted from non-init user namespaces. Goal of this
>>> patchset is to allow FUSE filesystems to be mounted from non-init user
>>> namespaces. Support for other filesystems like ext4 are not in the
>>> scope of this patchset.
>>>
>>> Let me describe how to test mounting from non-init user namespaces. It's
>>> assumed that tests are done via sshfs, a userspace filesystem based on
>>> FUSE with ssh as backend. Testing system is Fedora 27.
>>
>> In general I am for this work, and more bodies and more eyes on it is
>> generally better.
>>
>> I will review this after the New Year, I am out for the holidays right
>> now.
>
> Thanks. I'll wait for your review.

Hi Eric,

Do you have some cycles for this now that it is the new year?

A review on the associated ima issue would also be appreciated:
https://www.mail-archive.com/linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg1587678.html

Cheers,
Alban

>>> ====
>>> $ sudo dnf install -y sshfs
>>> $ sudo mkdir -p /mnt/userns
>>>
>>> ### workaround to get the sshfs permission checks
>>> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies
>>>
>>> $ unshare -U -r -m
>>> # sshfs root@localhost: /mnt/userns
>>>
>>> ### You can see sshfs being mounted from a non-init user namespace
>>> # mount | grep sshfs
>>> root@localhost: on /mnt/userns type fuse.sshfs
>>> (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
>>>
>>> # touch /mnt/userns/test
>>> # ls -l /mnt/userns/test
>>> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
>>> ====
>>>
>>> Open another terminal, check the mountpoint from outside the namespace.
>>>
>>> ====
>>> $ grep userns /proc/$(pidof sshfs)/mountinfo
>>> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
>>> root@localhost: rw,user_id=0,group_id=0
>>> ====
>>>
>>> After all tests are done, you can unmount the filesystem
>>> inside the namespace.
>>>
>>> ====
>>> # fusermount -u /mnt/userns
>>> ====
>>>
>>> Changes since v4:
>>>  * Remove other parts like ext4 to keep the patchset minimal for FUSE
>>>  * Add and change commit messages
>>>  * Describe how to test non-init user namespaces
>>>
>>> TODO:
>>>  * Think through potential security implications. There are 2 patches
>>>    being prepared for security issues. One is "ima: define a new policy
>>>    option named force" by Mimi Zohar, which adds an option to specify
>>>    that the results should not be cached:
>>>    https://marc.info/?l=linux-integrity&m=151275680115856&w=2
>>>    The other one is to basically prevent FUSE results from being cached,
>>>    which is still in progress.
>>>
>>>  * Test IMA/LSMs. Details are written in
>>>    https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md
>>>
>>> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
>>> additional inode permission.
>>>
>>> Patches 3-7 allow the superblock owner to change ownership of inodes, and
>>> deal with additional capability checks w.r.t user namespaces.
>>>
>>> Patches 8-10 allow FUSE filesystems to be mounted outside of the init
>>> user namespace.
>>>
>>> Patch 11 handles a corner case of non-root users in EVM.
>>>
>>> The patchset is also available in our github repo:
>>>   https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1
>>>
>>>
>>> Eric W. Biederman (1):
>>>   fs: Allow superblock owner to change ownership of inodes
>>>
>>> Seth Forshee (10):
>>>   block_dev: Support checking inode permissions in lookup_bdev()
>>>   mtd: Check permissions towards mtd block device inode when mounting
>>>   fs: Don't remove suid for CAP_FSETID for userns root
>>>   fs: Allow superblock owner to access do_remount_sb()
>>>   capabilities: Allow privileged user in s_user_ns to set security.*
>>>     xattrs
>>>   fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>>>   fuse: Support fuse filesystems outside of init_user_ns
>>>   fuse: Restrict allow_other to the superblock's namespace or a
>>>     descendant
>>>   fuse: Allow user namespace mounts
>>>   evm: Don't update hmacs in user ns mounts
>>>
>>>  drivers/md/bcache/super.c           |  2 +-
>>>  drivers/md/dm-table.c               |  2 +-
>>>  drivers/mtd/mtdsuper.c              |  6 +++++-
>>>  fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
>>>  fs/block_dev.c                      | 13 ++++++++++---
>>>  fs/fuse/cuse.c                      |  3 ++-
>>>  fs/fuse/dev.c                       | 11 ++++++++---
>>>  fs/fuse/dir.c                       | 16 ++++++++--------
>>>  fs/fuse/fuse_i.h                    |  6 +++++-
>>>  fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
>>>  fs/inode.c                          |  6 ++++--
>>>  fs/ioctl.c                          |  4 ++--
>>>  fs/namespace.c                      |  4 ++--
>>>  fs/proc/base.c                      |  7 +++++++
>>>  fs/proc/generic.c                   |  7 +++++++
>>>  fs/proc/proc_sysctl.c               |  7 +++++++
>>>  fs/quota/quota.c                    |  2 +-
>>>  include/linux/fs.h                  |  2 +-
>>>  kernel/user_namespace.c             |  1 +
>>>  security/commoncap.c                |  8 ++++++--
>>>  security/integrity/evm/evm_crypto.c |  3 ++-
>>>  21 files changed, 127 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
  2018-01-09 15:05   ` Dongsu Park
@ 2018-01-18 14:58     ` Alban Crequy
       [not found]       ` <CADZs7q438szfwd-kaaRDnpDFrmno3zy7Zq+6EsnotW8bS0vrTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found]     ` <CANxcAMvwwiPXBTKmTM9sEo8Y1T--V7fNaFqzHfyEvwvaYQV60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Alban Crequy @ 2018-01-18 14:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dongsu Park, LKML, Linux Containers, Miklos Szeredi,
	Seth Forshee, Sargun Dhillon

On Tue, Jan 9, 2018 at 4:05 PM, Dongsu Park <dongsu@kinvolk.io> wrote:
> Hi,
>
> On Mon, Dec 25, 2017 at 8:05 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Dongsu Park <dongsu@kinvolk.io> writes:
>>
>>> This patchset v5 is based on work by Seth Forshee and Eric Biederman.
>>> The latest patchset was v4:
>>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1132206.html
>>>
>>> At the moment, filesystems backed by physical medium can only be mounted
>>> by real root in the initial user namespace. This restriction exists
>>> because if it's allowed for root user in non-init user namespaces to
>>> mount the filesystem, then it effectively allows the user to control the
>>> underlying source of the filesystem. In case of FUSE, the source would
>>> mean any underlying device.
>>>
>>> However, in many use cases such as containers, it's necessary to allow
>>> filesystems to be mounted from non-init user namespaces. Goal of this
>>> patchset is to allow FUSE filesystems to be mounted from non-init user
>>> namespaces. Support for other filesystems like ext4 are not in the
>>> scope of this patchset.
>>>
>>> Let me describe how to test mounting from non-init user namespaces. It's
>>> assumed that tests are done via sshfs, a userspace filesystem based on
>>> FUSE with ssh as backend. Testing system is Fedora 27.
>>
>> In general I am for this work, and more bodies and more eyes on it is
>> generally better.
>>
>> I will review this after the New Year, I am out for the holidays right
>> now.
>
> Thanks. I'll wait for your review.

Hi Eric,

Do you have some cycles for this now that it is the new year?

A review on the associated ima issue would also be appreciated:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1587678.html

Cheers,
Alban

>>> ====
>>> $ sudo dnf install -y sshfs
>>> $ sudo mkdir -p /mnt/userns
>>>
>>> ### workaround to get the sshfs permission checks
>>> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies
>>>
>>> $ unshare -U -r -m
>>> # sshfs root@localhost: /mnt/userns
>>>
>>> ### You can see sshfs being mounted from a non-init user namespace
>>> # mount | grep sshfs
>>> root@localhost: on /mnt/userns type fuse.sshfs
>>> (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
>>>
>>> # touch /mnt/userns/test
>>> # ls -l /mnt/userns/test
>>> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
>>> ====
>>>
>>> Open another terminal, check the mountpoint from outside the namespace.
>>>
>>> ====
>>> $ grep userns /proc/$(pidof sshfs)/mountinfo
>>> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
>>> root@localhost: rw,user_id=0,group_id=0
>>> ====
>>>
>>> After all tests are done, you can unmount the filesystem
>>> inside the namespace.
>>>
>>> ====
>>> # fusermount -u /mnt/userns
>>> ====
>>>
>>> Changes since v4:
>>>  * Remove other parts like ext4 to keep the patchset minimal for FUSE
>>>  * Add and change commit messages
>>>  * Describe how to test non-init user namespaces
>>>
>>> TODO:
>>>  * Think through potential security implications. There are 2 patches
>>>    being prepared for security issues. One is "ima: define a new policy
>>>    option named force" by Mimi Zohar, which adds an option to specify
>>>    that the results should not be cached:
>>>    https://marc.info/?l=linux-integrity&m=151275680115856&w=2
>>>    The other one is to basically prevent FUSE results from being cached,
>>>    which is still in progress.
>>>
>>>  * Test IMA/LSMs. Details are written in
>>>    https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md
>>>
>>> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
>>> additional inode permission.
>>>
>>> Patches 3-7 allow the superblock owner to change ownership of inodes, and
>>> deal with additional capability checks w.r.t user namespaces.
>>>
>>> Patches 8-10 allow FUSE filesystems to be mounted outside of the init
>>> user namespace.
>>>
>>> Patch 11 handles a corner case of non-root users in EVM.
>>>
>>> The patchset is also available in our github repo:
>>>   https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1
>>>
>>>
>>> Eric W. Biederman (1):
>>>   fs: Allow superblock owner to change ownership of inodes
>>>
>>> Seth Forshee (10):
>>>   block_dev: Support checking inode permissions in lookup_bdev()
>>>   mtd: Check permissions towards mtd block device inode when mounting
>>>   fs: Don't remove suid for CAP_FSETID for userns root
>>>   fs: Allow superblock owner to access do_remount_sb()
>>>   capabilities: Allow privileged user in s_user_ns to set security.*
>>>     xattrs
>>>   fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>>>   fuse: Support fuse filesystems outside of init_user_ns
>>>   fuse: Restrict allow_other to the superblock's namespace or a
>>>     descendant
>>>   fuse: Allow user namespace mounts
>>>   evm: Don't update hmacs in user ns mounts
>>>
>>>  drivers/md/bcache/super.c           |  2 +-
>>>  drivers/md/dm-table.c               |  2 +-
>>>  drivers/mtd/mtdsuper.c              |  6 +++++-
>>>  fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
>>>  fs/block_dev.c                      | 13 ++++++++++---
>>>  fs/fuse/cuse.c                      |  3 ++-
>>>  fs/fuse/dev.c                       | 11 ++++++++---
>>>  fs/fuse/dir.c                       | 16 ++++++++--------
>>>  fs/fuse/fuse_i.h                    |  6 +++++-
>>>  fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
>>>  fs/inode.c                          |  6 ++++--
>>>  fs/ioctl.c                          |  4 ++--
>>>  fs/namespace.c                      |  4 ++--
>>>  fs/proc/base.c                      |  7 +++++++
>>>  fs/proc/generic.c                   |  7 +++++++
>>>  fs/proc/proc_sysctl.c               |  7 +++++++
>>>  fs/quota/quota.c                    |  2 +-
>>>  include/linux/fs.h                  |  2 +-
>>>  kernel/user_namespace.c             |  1 +
>>>  security/commoncap.c                |  8 ++++++--
>>>  security/integrity/evm/evm_crypto.c |  3 ++-
>>>  21 files changed, 127 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
       [not found]   ` <877etbcmnd.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2018-01-09 15:05     ` Dongsu Park
  0 siblings, 0 replies; 14+ messages in thread
From: Dongsu Park @ 2018-01-09 15:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linux Containers, LKML, Seth Forshee,
	Alban Crequy, Sargun Dhillon

Hi,

On Mon, Dec 25, 2017 at 8:05 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Dongsu Park <dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org> writes:
>
>> This patchset v5 is based on work by Seth Forshee and Eric Biederman.
>> The latest patchset was v4:
>> https://www.mail-archive.com/linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg1132206.html
>>
>> At the moment, filesystems backed by physical medium can only be mounted
>> by real root in the initial user namespace. This restriction exists
>> because if it's allowed for root user in non-init user namespaces to
>> mount the filesystem, then it effectively allows the user to control the
>> underlying source of the filesystem. In case of FUSE, the source would
>> mean any underlying device.
>>
>> However, in many use cases such as containers, it's necessary to allow
>> filesystems to be mounted from non-init user namespaces. Goal of this
>> patchset is to allow FUSE filesystems to be mounted from non-init user
>> namespaces. Support for other filesystems like ext4 are not in the
>> scope of this patchset.
>>
>> Let me describe how to test mounting from non-init user namespaces. It's
>> assumed that tests are done via sshfs, a userspace filesystem based on
>> FUSE with ssh as backend. Testing system is Fedora 27.
>
> In general I am for this work, and more bodies and more eyes on it is
> generally better.
>
> I will review this after the New Year, I am out for the holidays right
> now.

Thanks. I'll wait for your review.

Dongsu

> Eric
>
>
>>
>> ====
>> $ sudo dnf install -y sshfs
>> $ sudo mkdir -p /mnt/userns
>>
>> ### workaround to get the sshfs permission checks
>> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies
>>
>> $ unshare -U -r -m
>> # sshfs root@localhost: /mnt/userns
>>
>> ### You can see sshfs being mounted from a non-init user namespace
>> # mount | grep sshfs
>> root@localhost: on /mnt/userns type fuse.sshfs
>> (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
>>
>> # touch /mnt/userns/test
>> # ls -l /mnt/userns/test
>> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
>> ====
>>
>> Open another terminal, check the mountpoint from outside the namespace.
>>
>> ====
>> $ grep userns /proc/$(pidof sshfs)/mountinfo
>> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
>> root@localhost: rw,user_id=0,group_id=0
>> ====
>>
>> After all tests are done, you can unmount the filesystem
>> inside the namespace.
>>
>> ====
>> # fusermount -u /mnt/userns
>> ====
>>
>> Changes since v4:
>>  * Remove other parts like ext4 to keep the patchset minimal for FUSE
>>  * Add and change commit messages
>>  * Describe how to test non-init user namespaces
>>
>> TODO:
>>  * Think through potential security implications. There are 2 patches
>>    being prepared for security issues. One is "ima: define a new policy
>>    option named force" by Mimi Zohar, which adds an option to specify
>>    that the results should not be cached:
>>    https://marc.info/?l=linux-integrity&m=151275680115856&w=2
>>    The other one is to basically prevent FUSE results from being cached,
>>    which is still in progress.
>>
>>  * Test IMA/LSMs. Details are written in
>>    https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md
>>
>> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
>> additional inode permission.
>>
>> Patches 3-7 allow the superblock owner to change ownership of inodes, and
>> deal with additional capability checks w.r.t user namespaces.
>>
>> Patches 8-10 allow FUSE filesystems to be mounted outside of the init
>> user namespace.
>>
>> Patch 11 handles a corner case of non-root users in EVM.
>>
>> The patchset is also available in our github repo:
>>   https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1
>>
>>
>> Eric W. Biederman (1):
>>   fs: Allow superblock owner to change ownership of inodes
>>
>> Seth Forshee (10):
>>   block_dev: Support checking inode permissions in lookup_bdev()
>>   mtd: Check permissions towards mtd block device inode when mounting
>>   fs: Don't remove suid for CAP_FSETID for userns root
>>   fs: Allow superblock owner to access do_remount_sb()
>>   capabilities: Allow privileged user in s_user_ns to set security.*
>>     xattrs
>>   fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>>   fuse: Support fuse filesystems outside of init_user_ns
>>   fuse: Restrict allow_other to the superblock's namespace or a
>>     descendant
>>   fuse: Allow user namespace mounts
>>   evm: Don't update hmacs in user ns mounts
>>
>>  drivers/md/bcache/super.c           |  2 +-
>>  drivers/md/dm-table.c               |  2 +-
>>  drivers/mtd/mtdsuper.c              |  6 +++++-
>>  fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
>>  fs/block_dev.c                      | 13 ++++++++++---
>>  fs/fuse/cuse.c                      |  3 ++-
>>  fs/fuse/dev.c                       | 11 ++++++++---
>>  fs/fuse/dir.c                       | 16 ++++++++--------
>>  fs/fuse/fuse_i.h                    |  6 +++++-
>>  fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
>>  fs/inode.c                          |  6 ++++--
>>  fs/ioctl.c                          |  4 ++--
>>  fs/namespace.c                      |  4 ++--
>>  fs/proc/base.c                      |  7 +++++++
>>  fs/proc/generic.c                   |  7 +++++++
>>  fs/proc/proc_sysctl.c               |  7 +++++++
>>  fs/quota/quota.c                    |  2 +-
>>  include/linux/fs.h                  |  2 +-
>>  kernel/user_namespace.c             |  1 +
>>  security/commoncap.c                |  8 ++++++--
>>  security/integrity/evm/evm_crypto.c |  3 ++-
>>  21 files changed, 127 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
  2017-12-25  7:05 ` Eric W. Biederman
       [not found]   ` <877etbcmnd.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2018-01-09 15:05   ` Dongsu Park
  2018-01-18 14:58     ` Alban Crequy
       [not found]     ` <CANxcAMvwwiPXBTKmTM9sEo8Y1T--V7fNaFqzHfyEvwvaYQV60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 2 replies; 14+ messages in thread
From: Dongsu Park @ 2018-01-09 15:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: LKML, Linux Containers, Alban Crequy, Miklos Szeredi,
	Seth Forshee, Sargun Dhillon

Hi,

On Mon, Dec 25, 2017 at 8:05 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Dongsu Park <dongsu@kinvolk.io> writes:
>
>> This patchset v5 is based on work by Seth Forshee and Eric Biederman.
>> The latest patchset was v4:
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1132206.html
>>
>> At the moment, filesystems backed by physical medium can only be mounted
>> by real root in the initial user namespace. This restriction exists
>> because if it's allowed for root user in non-init user namespaces to
>> mount the filesystem, then it effectively allows the user to control the
>> underlying source of the filesystem. In case of FUSE, the source would
>> mean any underlying device.
>>
>> However, in many use cases such as containers, it's necessary to allow
>> filesystems to be mounted from non-init user namespaces. Goal of this
>> patchset is to allow FUSE filesystems to be mounted from non-init user
>> namespaces. Support for other filesystems like ext4 are not in the
>> scope of this patchset.
>>
>> Let me describe how to test mounting from non-init user namespaces. It's
>> assumed that tests are done via sshfs, a userspace filesystem based on
>> FUSE with ssh as backend. Testing system is Fedora 27.
>
> In general I am for this work, and more bodies and more eyes on it is
> generally better.
>
> I will review this after the New Year, I am out for the holidays right
> now.

Thanks. I'll wait for your review.

Dongsu

> Eric
>
>
>>
>> ====
>> $ sudo dnf install -y sshfs
>> $ sudo mkdir -p /mnt/userns
>>
>> ### workaround to get the sshfs permission checks
>> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies
>>
>> $ unshare -U -r -m
>> # sshfs root@localhost: /mnt/userns
>>
>> ### You can see sshfs being mounted from a non-init user namespace
>> # mount | grep sshfs
>> root@localhost: on /mnt/userns type fuse.sshfs
>> (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
>>
>> # touch /mnt/userns/test
>> # ls -l /mnt/userns/test
>> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
>> ====
>>
>> Open another terminal, check the mountpoint from outside the namespace.
>>
>> ====
>> $ grep userns /proc/$(pidof sshfs)/mountinfo
>> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
>> root@localhost: rw,user_id=0,group_id=0
>> ====
>>
>> After all tests are done, you can unmount the filesystem
>> inside the namespace.
>>
>> ====
>> # fusermount -u /mnt/userns
>> ====
>>
>> Changes since v4:
>>  * Remove other parts like ext4 to keep the patchset minimal for FUSE
>>  * Add and change commit messages
>>  * Describe how to test non-init user namespaces
>>
>> TODO:
>>  * Think through potential security implications. There are 2 patches
>>    being prepared for security issues. One is "ima: define a new policy
>>    option named force" by Mimi Zohar, which adds an option to specify
>>    that the results should not be cached:
>>    https://marc.info/?l=linux-integrity&m=151275680115856&w=2
>>    The other one is to basically prevent FUSE results from being cached,
>>    which is still in progress.
>>
>>  * Test IMA/LSMs. Details are written in
>>    https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md
>>
>> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
>> additional inode permission.
>>
>> Patches 3-7 allow the superblock owner to change ownership of inodes, and
>> deal with additional capability checks w.r.t user namespaces.
>>
>> Patches 8-10 allow FUSE filesystems to be mounted outside of the init
>> user namespace.
>>
>> Patch 11 handles a corner case of non-root users in EVM.
>>
>> The patchset is also available in our github repo:
>>   https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1
>>
>>
>> Eric W. Biederman (1):
>>   fs: Allow superblock owner to change ownership of inodes
>>
>> Seth Forshee (10):
>>   block_dev: Support checking inode permissions in lookup_bdev()
>>   mtd: Check permissions towards mtd block device inode when mounting
>>   fs: Don't remove suid for CAP_FSETID for userns root
>>   fs: Allow superblock owner to access do_remount_sb()
>>   capabilities: Allow privileged user in s_user_ns to set security.*
>>     xattrs
>>   fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>>   fuse: Support fuse filesystems outside of init_user_ns
>>   fuse: Restrict allow_other to the superblock's namespace or a
>>     descendant
>>   fuse: Allow user namespace mounts
>>   evm: Don't update hmacs in user ns mounts
>>
>>  drivers/md/bcache/super.c           |  2 +-
>>  drivers/md/dm-table.c               |  2 +-
>>  drivers/mtd/mtdsuper.c              |  6 +++++-
>>  fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
>>  fs/block_dev.c                      | 13 ++++++++++---
>>  fs/fuse/cuse.c                      |  3 ++-
>>  fs/fuse/dev.c                       | 11 ++++++++---
>>  fs/fuse/dir.c                       | 16 ++++++++--------
>>  fs/fuse/fuse_i.h                    |  6 +++++-
>>  fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
>>  fs/inode.c                          |  6 ++++--
>>  fs/ioctl.c                          |  4 ++--
>>  fs/namespace.c                      |  4 ++--
>>  fs/proc/base.c                      |  7 +++++++
>>  fs/proc/generic.c                   |  7 +++++++
>>  fs/proc/proc_sysctl.c               |  7 +++++++
>>  fs/quota/quota.c                    |  2 +-
>>  include/linux/fs.h                  |  2 +-
>>  kernel/user_namespace.c             |  1 +
>>  security/commoncap.c                |  8 ++++++--
>>  security/integrity/evm/evm_crypto.c |  3 ++-
>>  21 files changed, 127 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
       [not found] ` <cover.1512741134.git.dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
@ 2017-12-25  7:05   ` Eric W. Biederman
  2018-02-13 11:32     ` Miklos Szeredi
  1 sibling, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2017-12-25  7:05 UTC (permalink / raw)
  To: Dongsu Park
  Cc: Miklos Szeredi,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Seth Forshee, Alban Crequy,
	Sargun Dhillon

Dongsu Park <dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org> writes:

> This patchset v5 is based on work by Seth Forshee and Eric Biederman.
> The latest patchset was v4:
> https://www.mail-archive.com/linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg1132206.html
>
> At the moment, filesystems backed by physical medium can only be mounted
> by real root in the initial user namespace. This restriction exists
> because if it's allowed for root user in non-init user namespaces to
> mount the filesystem, then it effectively allows the user to control the
> underlying source of the filesystem. In case of FUSE, the source would
> mean any underlying device.
>
> However, in many use cases such as containers, it's necessary to allow
> filesystems to be mounted from non-init user namespaces. Goal of this
> patchset is to allow FUSE filesystems to be mounted from non-init user
> namespaces. Support for other filesystems like ext4 are not in the
> scope of this patchset.
>
> Let me describe how to test mounting from non-init user namespaces. It's
> assumed that tests are done via sshfs, a userspace filesystem based on
> FUSE with ssh as backend. Testing system is Fedora 27.

In general I am for this work, and more bodies and more eyes on it is
generally better.

I will review this after the New Year, I am out for the holidays right
now.

Eric


>
> ====
> $ sudo dnf install -y sshfs
> $ sudo mkdir -p /mnt/userns
>
> ### workaround to get the sshfs permission checks
> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies
>
> $ unshare -U -r -m
> # sshfs root@localhost: /mnt/userns
>
> ### You can see sshfs being mounted from a non-init user namespace
> # mount | grep sshfs
> root@localhost: on /mnt/userns type fuse.sshfs
> (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
>
> # touch /mnt/userns/test
> # ls -l /mnt/userns/test
> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
> ====
>
> Open another terminal, check the mountpoint from outside the namespace.
>
> ====
> $ grep userns /proc/$(pidof sshfs)/mountinfo
> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
> root@localhost: rw,user_id=0,group_id=0
> ====
>
> After all tests are done, you can unmount the filesystem
> inside the namespace.
>
> ====
> # fusermount -u /mnt/userns
> ====
>
> Changes since v4:
>  * Remove other parts like ext4 to keep the patchset minimal for FUSE
>  * Add and change commit messages
>  * Describe how to test non-init user namespaces
>
> TODO:
>  * Think through potential security implications. There are 2 patches
>    being prepared for security issues. One is "ima: define a new policy
>    option named force" by Mimi Zohar, which adds an option to specify
>    that the results should not be cached:
>    https://marc.info/?l=linux-integrity&m=151275680115856&w=2
>    The other one is to basically prevent FUSE results from being cached,
>    which is still in progress.
>
>  * Test IMA/LSMs. Details are written in
>    https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md
>
> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
> additional inode permission.
>
> Patches 3-7 allow the superblock owner to change ownership of inodes, and
> deal with additional capability checks w.r.t user namespaces.
>
> Patches 8-10 allow FUSE filesystems to be mounted outside of the init
> user namespace.
>
> Patch 11 handles a corner case of non-root users in EVM.
>
> The patchset is also available in our github repo:
>   https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1
>
>
> Eric W. Biederman (1):
>   fs: Allow superblock owner to change ownership of inodes
>
> Seth Forshee (10):
>   block_dev: Support checking inode permissions in lookup_bdev()
>   mtd: Check permissions towards mtd block device inode when mounting
>   fs: Don't remove suid for CAP_FSETID for userns root
>   fs: Allow superblock owner to access do_remount_sb()
>   capabilities: Allow privileged user in s_user_ns to set security.*
>     xattrs
>   fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>   fuse: Support fuse filesystems outside of init_user_ns
>   fuse: Restrict allow_other to the superblock's namespace or a
>     descendant
>   fuse: Allow user namespace mounts
>   evm: Don't update hmacs in user ns mounts
>
>  drivers/md/bcache/super.c           |  2 +-
>  drivers/md/dm-table.c               |  2 +-
>  drivers/mtd/mtdsuper.c              |  6 +++++-
>  fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
>  fs/block_dev.c                      | 13 ++++++++++---
>  fs/fuse/cuse.c                      |  3 ++-
>  fs/fuse/dev.c                       | 11 ++++++++---
>  fs/fuse/dir.c                       | 16 ++++++++--------
>  fs/fuse/fuse_i.h                    |  6 +++++-
>  fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
>  fs/inode.c                          |  6 ++++--
>  fs/ioctl.c                          |  4 ++--
>  fs/namespace.c                      |  4 ++--
>  fs/proc/base.c                      |  7 +++++++
>  fs/proc/generic.c                   |  7 +++++++
>  fs/proc/proc_sysctl.c               |  7 +++++++
>  fs/quota/quota.c                    |  2 +-
>  include/linux/fs.h                  |  2 +-
>  kernel/user_namespace.c             |  1 +
>  security/commoncap.c                |  8 ++++++--
>  security/integrity/evm/evm_crypto.c |  3 ++-
>  21 files changed, 127 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 00/11] FUSE mounts from non-init user namespaces
  2017-12-22 14:32 Dongsu Park
@ 2017-12-25  7:05 ` Eric W. Biederman
       [not found]   ` <877etbcmnd.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2018-01-09 15:05   ` Dongsu Park
       [not found] ` <cover.1512741134.git.dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
  1 sibling, 2 replies; 14+ messages in thread
From: Eric W. Biederman @ 2017-12-25  7:05 UTC (permalink / raw)
  To: Dongsu Park
  Cc: linux-kernel, containers, Alban Crequy, Miklos Szeredi,
	Seth Forshee, Sargun Dhillon

Dongsu Park <dongsu@kinvolk.io> writes:

> This patchset v5 is based on work by Seth Forshee and Eric Biederman.
> The latest patchset was v4:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1132206.html
>
> At the moment, filesystems backed by physical medium can only be mounted
> by real root in the initial user namespace. This restriction exists
> because if it's allowed for root user in non-init user namespaces to
> mount the filesystem, then it effectively allows the user to control the
> underlying source of the filesystem. In case of FUSE, the source would
> mean any underlying device.
>
> However, in many use cases such as containers, it's necessary to allow
> filesystems to be mounted from non-init user namespaces. Goal of this
> patchset is to allow FUSE filesystems to be mounted from non-init user
> namespaces. Support for other filesystems like ext4 are not in the
> scope of this patchset.
>
> Let me describe how to test mounting from non-init user namespaces. It's
> assumed that tests are done via sshfs, a userspace filesystem based on
> FUSE with ssh as backend. Testing system is Fedora 27.

In general I am for this work, and more bodies and more eyes on it is
generally better.

I will review this after the New Year, I am out for the holidays right
now.

Eric


>
> ====
> $ sudo dnf install -y sshfs
> $ sudo mkdir -p /mnt/userns
>
> ### workaround to get the sshfs permission checks
> $ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies
>
> $ unshare -U -r -m
> # sshfs root@localhost: /mnt/userns
>
> ### You can see sshfs being mounted from a non-init user namespace
> # mount | grep sshfs
> root@localhost: on /mnt/userns type fuse.sshfs
> (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
>
> # touch /mnt/userns/test
> # ls -l /mnt/userns/test
> -rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
> ====
>
> Open another terminal, check the mountpoint from outside the namespace.
>
> ====
> $ grep userns /proc/$(pidof sshfs)/mountinfo
> 131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
> root@localhost: rw,user_id=0,group_id=0
> ====
>
> After all tests are done, you can unmount the filesystem
> inside the namespace.
>
> ====
> # fusermount -u /mnt/userns
> ====
>
> Changes since v4:
>  * Remove other parts like ext4 to keep the patchset minimal for FUSE
>  * Add and change commit messages
>  * Describe how to test non-init user namespaces
>
> TODO:
>  * Think through potential security implications. There are 2 patches
>    being prepared for security issues. One is "ima: define a new policy
>    option named force" by Mimi Zohar, which adds an option to specify
>    that the results should not be cached:
>    https://marc.info/?l=linux-integrity&m=151275680115856&w=2
>    The other one is to basically prevent FUSE results from being cached,
>    which is still in progress.
>
>  * Test IMA/LSMs. Details are written in
>    https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md
>
> Patches 1-2 deal with an additional flag of lookup_bdev() to check for
> additional inode permission.
>
> Patches 3-7 allow the superblock owner to change ownership of inodes, and
> deal with additional capability checks w.r.t user namespaces.
>
> Patches 8-10 allow FUSE filesystems to be mounted outside of the init
> user namespace.
>
> Patch 11 handles a corner case of non-root users in EVM.
>
> The patchset is also available in our github repo:
>   https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1
>
>
> Eric W. Biederman (1):
>   fs: Allow superblock owner to change ownership of inodes
>
> Seth Forshee (10):
>   block_dev: Support checking inode permissions in lookup_bdev()
>   mtd: Check permissions towards mtd block device inode when mounting
>   fs: Don't remove suid for CAP_FSETID for userns root
>   fs: Allow superblock owner to access do_remount_sb()
>   capabilities: Allow privileged user in s_user_ns to set security.*
>     xattrs
>   fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
>   fuse: Support fuse filesystems outside of init_user_ns
>   fuse: Restrict allow_other to the superblock's namespace or a
>     descendant
>   fuse: Allow user namespace mounts
>   evm: Don't update hmacs in user ns mounts
>
>  drivers/md/bcache/super.c           |  2 +-
>  drivers/md/dm-table.c               |  2 +-
>  drivers/mtd/mtdsuper.c              |  6 +++++-
>  fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
>  fs/block_dev.c                      | 13 ++++++++++---
>  fs/fuse/cuse.c                      |  3 ++-
>  fs/fuse/dev.c                       | 11 ++++++++---
>  fs/fuse/dir.c                       | 16 ++++++++--------
>  fs/fuse/fuse_i.h                    |  6 +++++-
>  fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
>  fs/inode.c                          |  6 ++++--
>  fs/ioctl.c                          |  4 ++--
>  fs/namespace.c                      |  4 ++--
>  fs/proc/base.c                      |  7 +++++++
>  fs/proc/generic.c                   |  7 +++++++
>  fs/proc/proc_sysctl.c               |  7 +++++++
>  fs/quota/quota.c                    |  2 +-
>  include/linux/fs.h                  |  2 +-
>  kernel/user_namespace.c             |  1 +
>  security/commoncap.c                |  8 ++++++--
>  security/integrity/evm/evm_crypto.c |  3 ++-
>  21 files changed, 127 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v5 00/11] FUSE mounts from non-init user namespaces
@ 2017-12-22 14:32 Dongsu Park
  2017-12-25  7:05 ` Eric W. Biederman
       [not found] ` <cover.1512741134.git.dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
  0 siblings, 2 replies; 14+ messages in thread
From: Dongsu Park @ 2017-12-22 14:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: containers, Alban Crequy, Eric W . Biederman, Miklos Szeredi,
	Seth Forshee, Sargun Dhillon, Dongsu Park

This patchset v5 is based on work by Seth Forshee and Eric Biederman.
The latest patchset was v4:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1132206.html

At the moment, filesystems backed by physical medium can only be mounted
by real root in the initial user namespace. This restriction exists
because if it's allowed for root user in non-init user namespaces to
mount the filesystem, then it effectively allows the user to control the
underlying source of the filesystem. In case of FUSE, the source would
mean any underlying device.

However, in many use cases such as containers, it's necessary to allow
filesystems to be mounted from non-init user namespaces. Goal of this
patchset is to allow FUSE filesystems to be mounted from non-init user
namespaces. Support for other filesystems like ext4 are not in the
scope of this patchset.

Let me describe how to test mounting from non-init user namespaces. It's
assumed that tests are done via sshfs, a userspace filesystem based on
FUSE with ssh as backend. Testing system is Fedora 27.

====
$ sudo dnf install -y sshfs
$ sudo mkdir -p /mnt/userns

### workaround to get the sshfs permission checks
$ sudo chown -R $UID:$UID /etc/ssh/ssh_config.d /usr/share/crypto-policies

$ unshare -U -r -m
# sshfs root@localhost: /mnt/userns

### You can see sshfs being mounted from a non-init user namespace
# mount | grep sshfs
root@localhost: on /mnt/userns type fuse.sshfs
(rw,nosuid,nodev,relatime,user_id=0,group_id=0)

# touch /mnt/userns/test
# ls -l /mnt/userns/test
-rw-r--r-- 1 root root 0 Dec 11 19:01 /mnt/userns/test
====

Open another terminal, check the mountpoint from outside the namespace.

====
$ grep userns /proc/$(pidof sshfs)/mountinfo
131 102 0:35 / /mnt/userns rw,nosuid,nodev,relatime - fuse.sshfs
root@localhost: rw,user_id=0,group_id=0
====

After all tests are done, you can unmount the filesystem
inside the namespace.

====
# fusermount -u /mnt/userns
====

Changes since v4:
 * Remove other parts like ext4 to keep the patchset minimal for FUSE
 * Add and change commit messages
 * Describe how to test non-init user namespaces

TODO:
 * Think through potential security implications. There are 2 patches
   being prepared for security issues. One is "ima: define a new policy
   option named force" by Mimi Zohar, which adds an option to specify
   that the results should not be cached:
   https://marc.info/?l=linux-integrity&m=151275680115856&w=2
   The other one is to basically prevent FUSE results from being cached,
   which is still in progress.

 * Test IMA/LSMs. Details are written in
   https://github.com/kinvolk/fuse-userns-patches/blob/master/tests/TESTING_INTEGRITY.md

Patches 1-2 deal with an additional flag of lookup_bdev() to check for
additional inode permission.

Patches 3-7 allow the superblock owner to change ownership of inodes, and
deal with additional capability checks w.r.t user namespaces.

Patches 8-10 allow FUSE filesystems to be mounted outside of the init
user namespace.

Patch 11 handles a corner case of non-root users in EVM.

The patchset is also available in our github repo:
  https://github.com/kinvolk/linux/tree/dongsu/fuse-userns-v5-1


Eric W. Biederman (1):
  fs: Allow superblock owner to change ownership of inodes

Seth Forshee (10):
  block_dev: Support checking inode permissions in lookup_bdev()
  mtd: Check permissions towards mtd block device inode when mounting
  fs: Don't remove suid for CAP_FSETID for userns root
  fs: Allow superblock owner to access do_remount_sb()
  capabilities: Allow privileged user in s_user_ns to set security.*
    xattrs
  fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
  fuse: Support fuse filesystems outside of init_user_ns
  fuse: Restrict allow_other to the superblock's namespace or a
    descendant
  fuse: Allow user namespace mounts
  evm: Don't update hmacs in user ns mounts

 drivers/md/bcache/super.c           |  2 +-
 drivers/md/dm-table.c               |  2 +-
 drivers/mtd/mtdsuper.c              |  6 +++++-
 fs/attr.c                           | 34 ++++++++++++++++++++++++++--------
 fs/block_dev.c                      | 13 ++++++++++---
 fs/fuse/cuse.c                      |  3 ++-
 fs/fuse/dev.c                       | 11 ++++++++---
 fs/fuse/dir.c                       | 16 ++++++++--------
 fs/fuse/fuse_i.h                    |  6 +++++-
 fs/fuse/inode.c                     | 35 +++++++++++++++++++++--------------
 fs/inode.c                          |  6 ++++--
 fs/ioctl.c                          |  4 ++--
 fs/namespace.c                      |  4 ++--
 fs/proc/base.c                      |  7 +++++++
 fs/proc/generic.c                   |  7 +++++++
 fs/proc/proc_sysctl.c               |  7 +++++++
 fs/quota/quota.c                    |  2 +-
 include/linux/fs.h                  |  2 +-
 kernel/user_namespace.c             |  1 +
 security/commoncap.c                |  8 ++++++--
 security/integrity/evm/evm_crypto.c |  3 ++-
 21 files changed, 127 insertions(+), 52 deletions(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-02-19 23:10 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-22 14:32 [PATCH v5 00/11] FUSE mounts from non-init user namespaces Dongsu Park
  -- strict thread matches above, loose matches on Subject: below --
2017-12-22 14:32 Dongsu Park
2017-12-25  7:05 ` Eric W. Biederman
     [not found]   ` <877etbcmnd.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2018-01-09 15:05     ` Dongsu Park
2018-01-09 15:05   ` Dongsu Park
2018-01-18 14:58     ` Alban Crequy
     [not found]       ` <CADZs7q438szfwd-kaaRDnpDFrmno3zy7Zq+6EsnotW8bS0vrTA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-19 23:09         ` Eric W. Biederman
2018-02-19 23:09           ` Eric W. Biederman
     [not found]     ` <CANxcAMvwwiPXBTKmTM9sEo8Y1T--V7fNaFqzHfyEvwvaYQV60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-18 14:58       ` Alban Crequy
     [not found] ` <cover.1512741134.git.dongsu-lYLaGTFnO9sWenYVfaLwtA@public.gmane.org>
2017-12-25  7:05   ` Eric W. Biederman
2018-02-13 11:32   ` Miklos Szeredi
2018-02-13 11:32     ` Miklos Szeredi
2018-02-16 21:53     ` Eric W. Biederman
     [not found]     ` <CAOssrKey+oxahrXHO5d6Lu1ZD=r1t-b0i4iZM_Ke9ToqTckjkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-16 21:53       ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.