linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/39] fs: idmapped mounts
@ 2020-11-15 10:36 Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 01/39] namespace: take lock_mount_hash() directly when changing flags Christian Brauner
                   ` (41 more replies)
  0 siblings, 42 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner

Hey everyone,

This is v2. It is reworked according to the reviews coming from
Christoph and others to adapt all relevant helpers and inode_operations
methods to account for idmapped mounts instead of introducing new
helpers and methods specific to idmapped mounts like we did before.
We've also moved the overlayfs conversion to handle idmapped mounts into
a separate patchset that will be sent out separately after the core
changes. The converted filesytems in this series include fat and ext4.
The config option to disable idmapped mounts has been moved from a a vfs
config ption to a per-filesystem option. They default to off. Having a
config option allows us to gain some confidence in the patchset over
multiple kernel releases.

There are two noteable things about this version. First, that it comes
with a really large test-suite to test current vfs behavior and
idmapped mounts behavior. We intend this test-suite to grow over time
and at some point cover most basic core vfs functionality that isn't
covered in xfstests and have it be part of the selftests.
Second, while while working on adapting this patchset to the requested
changes, the runC and containerd crowd was nice enough to adapt
containerd to this patchset to make use of idmapped mounts in one of the
most widely used container runtimes:
https://github.com/containerd/containerd/pull/4734

With this patchset we make it possible to attach idmappings to bind
mounts. This handles several common use-cases. Here are just a few:
- Shifting of a container rootfs or base image without having to mangle
  every file (runc, Docker, containerd, k8s, LXD, systemd ...)
- Sharing of data between host or privileged containers with
  underprivileged containers (runc, Docker, containerd, k8s, LXD, ...)
- Shifting of subset of ownership-less filesystems (vfat) for use by
  multiple users, effectively allowing for DAC on such devices (systemd,
  Android, ...)
- Data sharing between multiple user namespaces with incompatible maps
  (LXD, k8s, ...)
Making it possible to share directories and mounts between users with
different uids and gids is itself quite an important use-case in
distributed systems environments. It's of course especially useful in
general for portable usb sticks, sharing data between multiple users in
general, and sharing home directories between multiple users. The last
example is now elegantly expressed in systemd's homed concept for
portable home directories. As mentioned above, idmapped mounts also
allow data from the host to be shared with unprivileged containers,
between privileged and unprivileged containers simultaneously and in
addition also between unprivileged containers with different idmappings
whenever they are used to isolate one container completely from another
container.
As can be seen from answers to earlier threads of this patchset and from
the list of potential users interest in this patchset is fairly
widespread.

We have implemented and proposed multiple solutions to this before. This
included the introduction of fsid mappings, a tiny filesystem that is
currently carried in Ubuntu that has shown it's limitations, and an
approach to call override creds in the vfs. None of these solutions have
covered all of the above use-cases. Some of them have been fairly hacky
too by e.g. violating how things should be passed down to the individual
filesystems.
The solution proposed here has it's origins in multiple discussions
during Linux Plumbers 2017 during and after the end of the containers
microconference.
To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
James, and myself. The original idea or a variant thereof has been
discussed, again to the best of my knowledge, after a Linux conference
in St. Petersburg in Russia in 2017 between Christoph, Tycho, and
myself.
We've taken the time to implement a working version of this solution
over the last weeks to the best of my abilities. Tycho has signed up
for this sligthly crazy endeavour as well and he has helped with the
conversion of the xattr codepaths and will be involved with others in
converting additional filesystems.

This series makes idmappings a property of struct vfsmount instead of
tying it to a process being inside of a user namespace which has been
the case for all other proposed approaches. It also allows to pass down
the user namespace into the filesystms which is a clean way instead of
violating calling conventions by strapping the user namespace
information that is a property of the mount to the caller's credentials
or similar hacks.
With this idmappings become a property of bind-mounts, i.e. each
bind-mount can have a separate idmapping. Such idmapped mounts can even
be created inside of the initial user namespace.

The vfsmount struct gains a new struct user_namespace member. The
idmapping of the user namespace becomes the idmapping of the mount. A
caller that is privileged with respect to the user namespace of
the superblock of the underlying filesystem can create an idmapped
bind-mount. In the future, we can enable unprivileged use-cases by
checking whether the caller is privileged wrt to the user namespace an
already idmapped mount has been marked with, allowing them to change the
idmapping. For now, keep things simple until we feel sure enough. Note,
that with syscall interception it is already possible to intercept
idmapped mount requests from unprivileged containers and handle them in
a sufficiently privileged container manager. Support for this is
already available in LXD and will be available in runC were syscall
interception is currently becoming part of the runtime spec (see
https://github.com/opencontainers/runtime-spec/pull/1074).

The user namespace the mount will be marked with can be specified by
passing a file descriptor refering to the user namespace as an argument
to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. By default vfsmounts are marked with the initial
user namespace and no behavioral or performance changes should be
observed. All mapping operations are nops for the initial user
namespace. When a file/inode is accessed through an idmapped mount the
i_uid and i_gid of the inode will be remapped according to the user
namespace the mount has been marked with.

In order to support idmapped mounts, filesystems need to be changed and
mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The initial
version contains fat and ext4 including a list of examples. But patches
for other filesystems are actively worked on but will be sent out
separately. We are here to see this through and there are multiple
people involved in converting filesystems. So filesystem developers are
not left alone with this.

I have written a simple tool available at
https://github.com/brauner/mount-idmapped that allows to create idmapped
mounts so people can play with this patch series. Here are a few
illustrations:

1. Create a simple idmapped mount of another user's home directory

u1001@f2-vm:/$ sudo ./mount-idmapped --map-mount b:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--

2. Create mapping of the whole ext4 rootfs without a mapping for uid and gid 0

ubuntu@f2-vm:~$ sudo /mount-idmapped --map-mount b:1:1:65536 / /mnt/
ubuntu@f2-vm:~$ findmnt | grep mnt
└─/mnt                                /dev/sda2  ext4       rw,relatime
  └─/mnt/mnt                          /dev/sda2  ext4       rw,relatime
ubuntu@f2-vm:~$ sudo mkdir /AS-ROOT-CAN-CREATE
ubuntu@f2-vm:~$ sudo mkdir /mnt/AS-ROOT-CANT-CREATE
mkdir: cannot create directory ‘/mnt/AS-ROOT-CANT-CREATE’: Value too large for defined data type
ubuntu@f2-vm:~$ mkdir /mnt/home/ubuntu/AS-USER-1000-CAN-CREATE

3. Create a vfat usb mount and expose to user 1001 and 5000

ubuntu@f2-vm:/$ sudo mount /dev/sdb /mnt
ubuntu@f2-vm:/$ findmnt  | grep mnt
└─/mnt                                /dev/sdb vfat       rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
ubuntu@f2-vm:/$ ls -al /mnt
total 12
drwxr-xr-x  2 root root 4096 Jan  1  1970 .
drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
-rwxr-xr-x  1 root root    4 Oct 28 03:44 aaa
-rwxr-xr-x  1 root root    0 Oct 28 01:09 bbb
ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:1001:1 /mnt /mnt-1001/
ubuntu@f2-vm:/$ ls -al /mnt-1001/
total 12
drwxr-xr-x  2 u1001 u1001 4096 Jan  1  1970 .
drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
-rwxr-xr-x  1 u1001 u1001    4 Oct 28 03:44 aaa
-rwxr-xr-x  1 u1001 u1001    0 Oct 28 01:09 bbb
ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:5000:1 /mnt /mnt-5000/
ubuntu@f2-vm:/$ ls -al /mnt-5000/
total 12
drwxr-xr-x  2 5000 5000 4096 Jan  1  1970 .
drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
-rwxr-xr-x  1 5000 5000    4 Oct 28 03:44 aaa
-rwxr-xr-x  1 5000 5000    0 Oct 28 01:09 bbb

4. Create an idmapped rootfs mount for a container

root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/
total 68
drwxr-xr-x 17 20000 20000 4096 Sep 24 07:48 .
drwxrwx---  3 20000 20000 4096 Oct 16 19:26 ..
lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 bin -> usr/bin
drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 boot
drwxr-xr-x  3 20000 20000 4096 Oct 16 19:26 dev
drwxr-xr-x 61 20000 20000 4096 Oct 16 19:26 etc
drwxr-xr-x  3 20000 20000 4096 Sep 24 07:45 home
lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 lib -> usr/lib
lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib32 -> usr/lib32
lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib64 -> usr/lib64
lrwxrwxrwx  1 20000 20000   10 Sep 24 07:43 libx32 -> usr/libx32
drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 media
drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 mnt
drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 opt
drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 proc
drwx------  2 20000 20000 4096 Sep 24 07:43 root
drwxr-xr-x  2 20000 20000 4096 Sep 24 07:45 run
lrwxrwxrwx  1 20000 20000    8 Sep 24 07:43 sbin -> usr/sbin
drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 srv
drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 sys
drwxrwxrwt  2 20000 20000 4096 Sep 24 07:44 tmp
drwxr-xr-x 13 20000 20000 4096 Sep 24 07:43 usr
drwxr-xr-x 12 20000 20000 4096 Sep 24 07:44 var
root@f2-vm:~# /mount-idmapped --map-mount b:20000:10000:100000 /var/lib/lxc/f2/rootfs/ /mnt
root@f2-vm:~# ls -al /mnt
total 68
drwxr-xr-x 17 10000 10000 4096 Sep 24 07:48 .
drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 bin -> usr/bin
drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 boot
drwxr-xr-x  3 10000 10000 4096 Oct 16 19:26 dev
drwxr-xr-x 61 10000 10000 4096 Oct 16 19:26 etc
drwxr-xr-x  3 10000 10000 4096 Sep 24 07:45 home
lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 lib -> usr/lib
lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib32 -> usr/lib32
lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib64 -> usr/lib64
lrwxrwxrwx  1 10000 10000   10 Sep 24 07:43 libx32 -> usr/libx32
drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 media
drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 mnt
drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 opt
drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 proc
drwx------  2 10000 10000 4096 Sep 24 07:43 root
drwxr-xr-x  2 10000 10000 4096 Sep 24 07:45 run
lrwxrwxrwx  1 10000 10000    8 Sep 24 07:43 sbin -> usr/sbin
drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 srv
drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 sys
drwxrwxrwt  2 10000 10000 4096 Sep 24 07:44 tmp
drwxr-xr-x 13 10000 10000 4096 Sep 24 07:43 usr
drwxr-xr-x 12 10000 10000 4096 Sep 24 07:44 var
root@f2-vm:~# lxc-start f2 # uses /mnt as rootfs
root@f2-vm:~# lxc-attach f2 -- cat /proc/1/uid_map
         0      10000      10000
root@f2-vm:~# lxc-attach f2 -- cat /proc/1/gid_map
         0      10000      10000
root@f2-vm:~# lxc-attach f2 -- ls -al /
total 52
drwxr-xr-x  17 root   root    4096 Sep 24 07:48 .
drwxr-xr-x  17 root   root    4096 Sep 24 07:48 ..
lrwxrwxrwx   1 root   root       7 Sep 24 07:43 bin -> usr/bin
drwxr-xr-x   2 root   root    4096 Apr 15  2020 boot
drwxr-xr-x   5 root   root     500 Oct 28 23:39 dev
drwxr-xr-x  61 root   root    4096 Oct 28 23:39 etc
drwxr-xr-x   3 root   root    4096 Sep 24 07:45 home
lrwxrwxrwx   1 root   root       7 Sep 24 07:43 lib -> usr/lib
lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib32 -> usr/lib32
lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib64 -> usr/lib64
lrwxrwxrwx   1 root   root      10 Sep 24 07:43 libx32 -> usr/libx32
drwxr-xr-x   2 root   root    4096 Sep 24 07:43 media
drwxr-xr-x   2 root   root    4096 Sep 24 07:43 mnt
drwxr-xr-x   2 root   root    4096 Sep 24 07:43 opt
dr-xr-xr-x 232 nobody nogroup    0 Oct 28 23:39 proc
drwx------   2 root   root    4096 Oct 28 23:41 root
drwxr-xr-x  12 root   root     360 Oct 28 23:39 run
lrwxrwxrwx   1 root   root       8 Sep 24 07:43 sbin -> usr/sbin
drwxr-xr-x   2 root   root    4096 Sep 24 07:43 srv
dr-xr-xr-x  13 nobody nogroup    0 Oct 28 23:39 sys
drwxrwxrwt  11 root   root    4096 Oct 28 23:40 tmp
drwxr-xr-x  13 root   root    4096 Sep 24 07:43 usr
drwxr-xr-x  12 root   root    4096 Sep 24 07:44 var
root@f2-vm:~# lxc-attach f2 -- ls -al /my-file
-rw-r--r-- 1 root root 0 Oct 28 23:43 /my-file
root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/my-file
-rw-r--r-- 1 20000 20000 0 Oct 28 23:43 /var/lib/lxc/f2/rootfs/my-file

I'd like to say thanks to:
Al for pointing me into the direction to avoid inode alias issues during
lookup. David for various discussions around this. Christoph for proving
a first proper review and for being involved in the original idea. Tycho
for helping with this series and on future patches to convert
filesystems. Alban Crequy and the Kinvolk located just a few streets
away from me in Berlin for providing use-case discussions and writing
patches for containerd! Stéphane for his invaluable input on many things
and level head and enabling me to work on this. Amir for explaining and
discussing aspects of overlayfs with me. I'd like to especially thank
Seth Forshee because he provided a lot of good analysis, suggestions,
and participated in short-notice discussions in both chat and video for
some nitty-gritty technical details.

This series can be found and pulled from the three usual locations:
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=idmapped_mounts
https://github.com/brauner/linux/tree/idmapped_mounts
https://gitlab.com/brauner/linux/-/commits/idmapped_mounts

Thanks!
Christian

Christian Brauner (37):
  namespace: take lock_mount_hash() directly when changing flags
  mount: make {lock,unlock}_mount_hash() static
  namespace: only take read lock in do_reconfigure_mnt()
  fs: add mount_setattr()
  tests: add mount_setattr() selftests
  fs: add id translation helpers
  mount: attach mappings to mounts
  capability: handle idmapped mounts
  namei: add idmapped mount aware permission helpers
  inode: add idmapped mount aware init and permission helpers
  attr: handle idmapped mounts
  acl: handle idmapped mounts
  commoncap: handle idmapped mounts
  stat: handle idmapped mounts
  namei: handle idmapped mounts in may_*() helpers
  namei: introduce struct renamedata
  namei: prepare for idmapped mounts
  open: handle idmapped mounts in do_truncate()
  open: handle idmapped mounts
  af_unix: handle idmapped mounts
  utimes: handle idmapped mounts
  fcntl: handle idmapped mounts
  notify: handle idmapped mounts
  init: handle idmapped mounts
  ioctl: handle idmapped mounts
  would_dump: handle idmapped mounts
  exec: handle idmapped mounts
  fs: add helpers for idmap mounts
  apparmor: handle idmapped mounts
  audit: handle idmapped mounts
  ima: handle idmapped mounts
  fat: handle idmapped mounts
  ext4: support idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  fs: introduce MOUNT_ATTR_IDMAP
  tests: add vfs/idmapped mounts test suite

Tycho Andersen (2):
  xattr: handle idmapped mounts
  selftests: add idmapped mounts xattr selftest

 Documentation/filesystems/locking.rst         |    6 +-
 Documentation/filesystems/porting.rst         |    2 +
 Documentation/filesystems/vfs.rst             |   17 +-
 arch/alpha/kernel/syscalls/syscall.tbl        |    1 +
 arch/arm/tools/syscall.tbl                    |    1 +
 arch/arm64/include/asm/unistd32.h             |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl         |    1 +
 arch/m68k/kernel/syscalls/syscall.tbl         |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl     |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl     |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl     |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl       |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl      |    1 +
 arch/powerpc/platforms/cell/spufs/inode.c     |    5 +-
 arch/s390/kernel/syscalls/syscall.tbl         |    1 +
 arch/sh/kernel/syscalls/syscall.tbl           |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl        |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl       |    1 +
 drivers/android/binderfs.c                    |    3 +-
 drivers/base/devtmpfs.c                       |   12 +-
 fs/9p/acl.c                                   |    7 +-
 fs/9p/v9fs.h                                  |    3 +-
 fs/9p/v9fs_vfs.h                              |    2 +-
 fs/9p/vfs_inode.c                             |   32 +-
 fs/9p/vfs_inode_dotl.c                        |   34 +-
 fs/9p/xattr.c                                 |    1 +
 fs/adfs/adfs.h                                |    3 +-
 fs/adfs/inode.c                               |    5 +-
 fs/affs/affs.h                                |   10 +-
 fs/affs/inode.c                               |    7 +-
 fs/affs/namei.c                               |   15 +-
 fs/afs/dir.c                                  |   34 +-
 fs/afs/inode.c                                |    5 +-
 fs/afs/internal.h                             |    4 +-
 fs/afs/security.c                             |    2 +-
 fs/afs/xattr.c                                |    2 +
 fs/attr.c                                     |   78 +-
 fs/autofs/root.c                              |   13 +-
 fs/bad_inode.c                                |   31 +-
 fs/bfs/dir.c                                  |   12 +-
 fs/btrfs/acl.c                                |    5 +-
 fs/btrfs/ctree.h                              |    3 +-
 fs/btrfs/inode.c                              |   41 +-
 fs/btrfs/ioctl.c                              |   25 +-
 fs/btrfs/tests/btrfs-tests.c                  |    2 +-
 fs/btrfs/xattr.c                              |    2 +
 fs/cachefiles/interface.c                     |    4 +-
 fs/cachefiles/namei.c                         |   19 +-
 fs/cachefiles/xattr.c                         |   16 +-
 fs/ceph/acl.c                                 |    5 +-
 fs/ceph/dir.c                                 |   23 +-
 fs/ceph/inode.c                               |   13 +-
 fs/ceph/super.h                               |    8 +-
 fs/ceph/xattr.c                               |    1 +
 fs/cifs/cifsfs.c                              |    4 +-
 fs/cifs/cifsfs.h                              |   18 +-
 fs/cifs/dir.c                                 |    8 +-
 fs/cifs/inode.c                               |   22 +-
 fs/cifs/link.c                                |    3 +-
 fs/cifs/xattr.c                               |    1 +
 fs/coda/coda_linux.h                          |    4 +-
 fs/coda/dir.c                                 |   18 +-
 fs/coda/inode.c                               |    5 +-
 fs/coda/pioctl.c                              |    6 +-
 fs/configfs/configfs_internal.h               |    7 +-
 fs/configfs/dir.c                             |    3 +-
 fs/configfs/inode.c                           |    5 +-
 fs/configfs/symlink.c                         |    5 +-
 fs/coredump.c                                 |   12 +-
 fs/crypto/policy.c                            |    2 +-
 fs/debugfs/inode.c                            |    9 +-
 fs/ecryptfs/crypto.c                          |    4 +-
 fs/ecryptfs/inode.c                           |   74 +-
 fs/ecryptfs/main.c                            |    6 +
 fs/ecryptfs/mmap.c                            |    4 +-
 fs/efivarfs/file.c                            |    2 +-
 fs/efivarfs/inode.c                           |    4 +-
 fs/erofs/inode.c                              |    2 +-
 fs/exec.c                                     |   12 +-
 fs/exfat/exfat_fs.h                           |    3 +-
 fs/exfat/file.c                               |    9 +-
 fs/exfat/namei.c                              |   13 +-
 fs/ext2/acl.c                                 |    5 +-
 fs/ext2/acl.h                                 |    3 +-
 fs/ext2/ext2.h                                |    2 +-
 fs/ext2/ialloc.c                              |    2 +-
 fs/ext2/inode.c                               |   11 +-
 fs/ext2/ioctl.c                               |    6 +-
 fs/ext2/namei.c                               |   22 +-
 fs/ext2/xattr_security.c                      |    1 +
 fs/ext2/xattr_trusted.c                       |    1 +
 fs/ext2/xattr_user.c                          |    1 +
 fs/ext4/Kconfig                               |    9 +
 fs/ext4/acl.c                                 |    5 +-
 fs/ext4/acl.h                                 |    3 +-
 fs/ext4/ext4.h                                |   15 +-
 fs/ext4/ialloc.c                              |    7 +-
 fs/ext4/inode.c                               |   14 +-
 fs/ext4/ioctl.c                               |   18 +-
 fs/ext4/namei.c                               |   52 +-
 fs/ext4/super.c                               |    6 +-
 fs/ext4/xattr_hurd.c                          |    1 +
 fs/ext4/xattr_security.c                      |    1 +
 fs/ext4/xattr_trusted.c                       |    1 +
 fs/ext4/xattr_user.c                          |    1 +
 fs/f2fs/acl.c                                 |    5 +-
 fs/f2fs/acl.h                                 |    3 +-
 fs/f2fs/f2fs.h                                |    3 +-
 fs/f2fs/file.c                                |   23 +-
 fs/f2fs/namei.c                               |   26 +-
 fs/f2fs/xattr.c                               |    4 +-
 fs/fat/fat.h                                  |    3 +-
 fs/fat/file.c                                 |   20 +-
 fs/fat/namei_msdos.c                          |   15 +-
 fs/fat/namei_vfat.c                           |   15 +-
 fs/fcntl.c                                    |    3 +-
 fs/fuse/acl.c                                 |    3 +-
 fs/fuse/dir.c                                 |   41 +-
 fs/fuse/fuse_i.h                              |    4 +-
 fs/fuse/xattr.c                               |    2 +
 fs/gfs2/acl.c                                 |    5 +-
 fs/gfs2/acl.h                                 |    3 +-
 fs/gfs2/file.c                                |    4 +-
 fs/gfs2/inode.c                               |   54 +-
 fs/gfs2/inode.h                               |    2 +-
 fs/gfs2/xattr.c                               |    1 +
 fs/hfs/attr.c                                 |    1 +
 fs/hfs/dir.c                                  |   13 +-
 fs/hfs/hfs_fs.h                               |    2 +-
 fs/hfs/inode.c                                |    7 +-
 fs/hfsplus/dir.c                              |   25 +-
 fs/hfsplus/inode.c                            |   11 +-
 fs/hfsplus/ioctl.c                            |    2 +-
 fs/hfsplus/xattr.c                            |    1 +
 fs/hfsplus/xattr_security.c                   |    1 +
 fs/hfsplus/xattr_trusted.c                    |    1 +
 fs/hfsplus/xattr_user.c                       |    1 +
 fs/hostfs/hostfs_kern.c                       |   31 +-
 fs/hpfs/hpfs_fn.h                             |    2 +-
 fs/hpfs/inode.c                               |    7 +-
 fs/hpfs/namei.c                               |   20 +-
 fs/hugetlbfs/inode.c                          |   31 +-
 fs/init.c                                     |   21 +-
 fs/inode.c                                    |   38 +-
 fs/internal.h                                 |    9 +
 fs/jffs2/acl.c                                |    5 +-
 fs/jffs2/acl.h                                |    3 +-
 fs/jffs2/dir.c                                |   32 +-
 fs/jffs2/fs.c                                 |    7 +-
 fs/jffs2/os-linux.h                           |    2 +-
 fs/jffs2/security.c                           |    1 +
 fs/jffs2/xattr_trusted.c                      |    1 +
 fs/jffs2/xattr_user.c                         |    1 +
 fs/jfs/acl.c                                  |    5 +-
 fs/jfs/file.c                                 |    9 +-
 fs/jfs/ioctl.c                                |    2 +-
 fs/jfs/jfs_acl.h                              |    3 +-
 fs/jfs/jfs_inode.c                            |    2 +-
 fs/jfs/jfs_inode.h                            |    2 +-
 fs/jfs/namei.c                                |   21 +-
 fs/jfs/xattr.c                                |    2 +
 fs/kernfs/dir.c                               |    7 +-
 fs/kernfs/inode.c                             |   15 +-
 fs/kernfs/kernfs-internal.h                   |    5 +-
 fs/libfs.c                                    |   20 +-
 fs/minix/bitmap.c                             |    2 +-
 fs/minix/file.c                               |    7 +-
 fs/minix/inode.c                              |    2 +-
 fs/minix/namei.c                              |   25 +-
 fs/mount.h                                    |   10 -
 fs/namei.c                                    |  317 +-
 fs/namespace.c                                |  473 ++-
 fs/nfs/dir.c                                  |   23 +-
 fs/nfs/inode.c                                |    5 +-
 fs/nfs/internal.h                             |   10 +-
 fs/nfs/namespace.c                            |    7 +-
 fs/nfs/nfs3_fs.h                              |    3 +-
 fs/nfs/nfs3acl.c                              |    3 +-
 fs/nfs/nfs4proc.c                             |    3 +
 fs/nfsd/nfs2acl.c                             |    4 +-
 fs/nfsd/nfs3acl.c                             |    4 +-
 fs/nfsd/nfs4acl.c                             |    4 +-
 fs/nfsd/nfs4recover.c                         |    6 +-
 fs/nfsd/nfsfh.c                               |    2 +-
 fs/nfsd/nfsproc.c                             |    2 +-
 fs/nfsd/vfs.c                                 |   47 +-
 fs/nilfs2/inode.c                             |   13 +-
 fs/nilfs2/ioctl.c                             |    2 +-
 fs/nilfs2/namei.c                             |   20 +-
 fs/nilfs2/nilfs.h                             |    4 +-
 fs/notify/fanotify/fanotify_user.c            |    2 +-
 fs/notify/inotify/inotify_user.c              |    3 +-
 fs/ntfs/inode.c                               |    2 +-
 fs/ocfs2/acl.c                                |    5 +-
 fs/ocfs2/acl.h                                |    3 +-
 fs/ocfs2/dlmfs/dlmfs.c                        |   17 +-
 fs/ocfs2/file.c                               |   13 +-
 fs/ocfs2/file.h                               |    5 +-
 fs/ocfs2/ioctl.c                              |    2 +-
 fs/ocfs2/namei.c                              |   21 +-
 fs/ocfs2/refcounttree.c                       |    4 +-
 fs/ocfs2/xattr.c                              |    3 +
 fs/omfs/dir.c                                 |   13 +-
 fs/omfs/file.c                                |    7 +-
 fs/omfs/inode.c                               |    2 +-
 fs/open.c                                     |   52 +-
 fs/orangefs/acl.c                             |    5 +-
 fs/orangefs/inode.c                           |   15 +-
 fs/orangefs/namei.c                           |   12 +-
 fs/orangefs/orangefs-kernel.h                 |    7 +-
 fs/orangefs/xattr.c                           |    1 +
 fs/overlayfs/copy_up.c                        |   20 +-
 fs/overlayfs/dir.c                            |   31 +-
 fs/overlayfs/file.c                           |    6 +-
 fs/overlayfs/inode.c                          |   21 +-
 fs/overlayfs/overlayfs.h                      |   40 +-
 fs/overlayfs/super.c                          |   27 +-
 fs/overlayfs/util.c                           |    4 +-
 fs/posix_acl.c                                |   78 +-
 fs/proc/base.c                                |   24 +-
 fs/proc/fd.c                                  |    4 +-
 fs/proc/fd.h                                  |    3 +-
 fs/proc/generic.c                             |    9 +-
 fs/proc/internal.h                            |    2 +-
 fs/proc/proc_net.c                            |    2 +-
 fs/proc/proc_sysctl.c                         |   12 +-
 fs/proc/root.c                                |    2 +-
 fs/proc_namespace.c                           |    1 +
 fs/ramfs/file-nommu.c                         |    9 +-
 fs/ramfs/inode.c                              |   18 +-
 fs/reiserfs/acl.h                             |    3 +-
 fs/reiserfs/inode.c                           |    7 +-
 fs/reiserfs/ioctl.c                           |    4 +-
 fs/reiserfs/namei.c                           |   21 +-
 fs/reiserfs/reiserfs.h                        |    3 +-
 fs/reiserfs/xattr.c                           |   12 +-
 fs/reiserfs/xattr.h                           |    2 +-
 fs/reiserfs/xattr_acl.c                       |    7 +-
 fs/reiserfs/xattr_security.c                  |    3 +-
 fs/reiserfs/xattr_trusted.c                   |    3 +-
 fs/reiserfs/xattr_user.c                      |    3 +-
 fs/remap_range.c                              |    7 +-
 fs/stat.c                                     |   10 +-
 fs/sysv/file.c                                |    7 +-
 fs/sysv/ialloc.c                              |    2 +-
 fs/sysv/itree.c                               |    2 +-
 fs/sysv/namei.c                               |   21 +-
 fs/tracefs/inode.c                            |    4 +-
 fs/ubifs/dir.c                                |   29 +-
 fs/ubifs/file.c                               |    5 +-
 fs/ubifs/ioctl.c                              |    2 +-
 fs/ubifs/ubifs.h                              |    3 +-
 fs/ubifs/xattr.c                              |    1 +
 fs/udf/file.c                                 |    9 +-
 fs/udf/ialloc.c                               |    2 +-
 fs/udf/namei.c                                |   24 +-
 fs/udf/symlink.c                              |    2 +-
 fs/ufs/ialloc.c                               |    2 +-
 fs/ufs/inode.c                                |    7 +-
 fs/ufs/namei.c                                |   19 +-
 fs/ufs/ufs.h                                  |    3 +-
 fs/utimes.c                                   |    4 +-
 fs/vboxsf/dir.c                               |   12 +-
 fs/vboxsf/utils.c                             |    5 +-
 fs/vboxsf/vfsmod.h                            |    3 +-
 fs/verity/enable.c                            |    2 +-
 fs/xattr.c                                    |  136 +-
 fs/xfs/xfs_acl.c                              |    5 +-
 fs/xfs/xfs_acl.h                              |    3 +-
 fs/xfs/xfs_ioctl.c                            |    4 +-
 fs/xfs/xfs_iops.c                             |   55 +-
 fs/xfs/xfs_xattr.c                            |    3 +-
 fs/zonefs/super.c                             |    9 +-
 include/linux/audit.h                         |   10 +-
 include/linux/capability.h                    |   14 +-
 include/linux/fs.h                            |  145 +-
 include/linux/ima.h                           |   15 +-
 include/linux/lsm_hook_defs.h                 |   15 +-
 include/linux/lsm_hooks.h                     |    1 +
 include/linux/mount.h                         |   14 +-
 include/linux/nfs_fs.h                        |    4 +-
 include/linux/posix_acl.h                     |   15 +-
 include/linux/posix_acl_xattr.h               |   12 +-
 include/linux/security.h                      |   44 +-
 include/linux/syscalls.h                      |    3 +
 include/linux/xattr.h                         |   30 +-
 include/uapi/asm-generic/unistd.h             |    4 +-
 include/uapi/linux/mount.h                    |   25 +
 ipc/mqueue.c                                  |   16 +-
 kernel/auditsc.c                              |   29 +-
 kernel/bpf/inode.c                            |   13 +-
 kernel/capability.c                           |   14 +-
 kernel/cgroup/cgroup.c                        |    2 +-
 kernel/sys.c                                  |    2 +-
 mm/madvise.c                                  |    4 +-
 mm/memcontrol.c                               |    2 +-
 mm/mincore.c                                  |    4 +-
 mm/shmem.c                                    |   45 +-
 net/socket.c                                  |    6 +-
 net/unix/af_unix.c                            |    4 +-
 security/apparmor/apparmorfs.c                |    3 +-
 security/apparmor/domain.c                    |   13 +-
 security/apparmor/file.c                      |    5 +-
 security/apparmor/lsm.c                       |   12 +-
 security/commoncap.c                          |   46 +-
 security/integrity/evm/evm_crypto.c           |   11 +-
 security/integrity/evm/evm_main.c             |    4 +-
 security/integrity/evm/evm_secfs.c            |    2 +-
 security/integrity/ima/ima.h                  |   19 +-
 security/integrity/ima/ima_api.c              |   10 +-
 security/integrity/ima/ima_appraise.c         |   22 +-
 security/integrity/ima/ima_asymmetric_keys.c  |    2 +-
 security/integrity/ima/ima_main.c             |   28 +-
 security/integrity/ima/ima_policy.c           |   17 +-
 security/integrity/ima/ima_queue_keys.c       |    2 +-
 security/security.c                           |   25 +-
 security/selinux/hooks.c                      |   22 +-
 security/smack/smack_lsm.c                    |   18 +-
 tools/include/uapi/asm-generic/unistd.h       |    4 +-
 tools/testing/selftests/Makefile              |    1 +
 .../testing/selftests/idmap_mounts/.gitignore |    2 +
 tools/testing/selftests/idmap_mounts/Makefile |   12 +
 tools/testing/selftests/idmap_mounts/config   |    1 +
 tools/testing/selftests/idmap_mounts/core.c   | 3476 +++++++++++++++++
 .../testing/selftests/idmap_mounts/internal.h |  127 +
 tools/testing/selftests/idmap_mounts/utils.c  |  136 +
 tools/testing/selftests/idmap_mounts/utils.h  |   17 +
 tools/testing/selftests/idmap_mounts/xattr.c  |  172 +
 .../selftests/mount_setattr/.gitignore        |    1 +
 .../testing/selftests/mount_setattr/Makefile  |    7 +
 tools/testing/selftests/mount_setattr/config  |    1 +
 .../mount_setattr/mount_setattr_test.c        |  889 +++++
 335 files changed, 7594 insertions(+), 1570 deletions(-)
 create mode 100644 tools/testing/selftests/idmap_mounts/.gitignore
 create mode 100644 tools/testing/selftests/idmap_mounts/Makefile
 create mode 100644 tools/testing/selftests/idmap_mounts/config
 create mode 100644 tools/testing/selftests/idmap_mounts/core.c
 create mode 100644 tools/testing/selftests/idmap_mounts/internal.h
 create mode 100644 tools/testing/selftests/idmap_mounts/utils.c
 create mode 100644 tools/testing/selftests/idmap_mounts/utils.h
 create mode 100644 tools/testing/selftests/idmap_mounts/xattr.c
 create mode 100644 tools/testing/selftests/mount_setattr/.gitignore
 create mode 100644 tools/testing/selftests/mount_setattr/Makefile
 create mode 100644 tools/testing/selftests/mount_setattr/config
 create mode 100644 tools/testing/selftests/mount_setattr/mount_setattr_test.c


base-commit: 3cea11cd5e3b00d91caf0b4730194039b45c5891
-- 
2.29.2


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v2 01/39] namespace: take lock_mount_hash() directly when changing flags
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 02/39] mount: make {lock,unlock}_mount_hash() static Christian Brauner
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Changing mount options always ends up taking lock_mount_hash() but when
MNT_READONLY is requested and neither the mount nor the superblock are
not already MNT_READONLY we end up taking the lock, dropping it, and
retaking it to change the other mount attributes. Instead of this,
acquire the lock once when changing mount properties. This simplifies
the locking in these codepath, makes them easier to reason about and
avoids having to reacquire the lock right after dropping it.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Remove pointless __mnt_unmake_readonly() helper.
  - Even though Christoph suggested to lockdep_assert_held() into places that
    require {lock,unlock}_mount_hash() it seems that seqlock's don't support
    it.
---
 fs/namespace.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index cebaa3e81794..f183161833ad 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -463,7 +463,6 @@ static int mnt_make_readonly(struct mount *mnt)
 {
 	int ret = 0;
 
-	lock_mount_hash();
 	mnt->mnt.mnt_flags |= MNT_WRITE_HOLD;
 	/*
 	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
@@ -497,18 +496,9 @@ static int mnt_make_readonly(struct mount *mnt)
 	 */
 	smp_wmb();
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
-	unlock_mount_hash();
 	return ret;
 }
 
-static int __mnt_unmake_readonly(struct mount *mnt)
-{
-	lock_mount_hash();
-	mnt->mnt.mnt_flags &= ~MNT_READONLY;
-	unlock_mount_hash();
-	return 0;
-}
-
 int sb_prepare_remount_readonly(struct super_block *sb)
 {
 	struct mount *mnt;
@@ -2508,7 +2498,8 @@ static int change_mount_ro_state(struct mount *mnt, unsigned int mnt_flags)
 	if (readonly_request)
 		return mnt_make_readonly(mnt);
 
-	return __mnt_unmake_readonly(mnt);
+	mnt->mnt.mnt_flags &= ~MNT_READONLY;
+	return 0;
 }
 
 /*
@@ -2517,11 +2508,9 @@ static int change_mount_ro_state(struct mount *mnt, unsigned int mnt_flags)
  */
 static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
 {
-	lock_mount_hash();
 	mnt_flags |= mnt->mnt.mnt_flags & ~MNT_USER_SETTABLE_MASK;
 	mnt->mnt.mnt_flags = mnt_flags;
 	touch_mnt_namespace(mnt->mnt_ns);
-	unlock_mount_hash();
 }
 
 static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt)
@@ -2567,9 +2556,11 @@ static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
 		return -EPERM;
 
 	down_write(&sb->s_umount);
+	lock_mount_hash();
 	ret = change_mount_ro_state(mnt, mnt_flags);
 	if (ret == 0)
 		set_mount_attributes(mnt, mnt_flags);
+	unlock_mount_hash();
 	up_write(&sb->s_umount);
 
 	mnt_warn_timestamp_expiry(path, &mnt->mnt);
@@ -2610,8 +2601,11 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
 		err = -EPERM;
 		if (ns_capable(sb->s_user_ns, CAP_SYS_ADMIN)) {
 			err = reconfigure_super(fc);
-			if (!err)
+			if (!err) {
+				lock_mount_hash();
 				set_mount_attributes(mnt, mnt_flags);
+				unlock_mount_hash();
+			}
 		}
 		up_write(&sb->s_umount);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 02/39] mount: make {lock,unlock}_mount_hash() static
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 01/39] namespace: take lock_mount_hash() directly when changing flags Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 03/39] namespace: only take read lock in do_reconfigure_mnt() Christian Brauner
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

The helpers are only called in fs/namespace.c functions so there's no need to
have them exposed in a header as Christoph pointed out.

Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Add a patch to make {lock,unlock)_mount_hash() static.
---
 fs/mount.h     | 10 ----------
 fs/namespace.c | 10 ++++++++++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index c7abb7b394d8..562d96d57bad 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -125,16 +125,6 @@ static inline void get_mnt_ns(struct mnt_namespace *ns)
 
 extern seqlock_t mount_lock;
 
-static inline void lock_mount_hash(void)
-{
-	write_seqlock(&mount_lock);
-}
-
-static inline void unlock_mount_hash(void)
-{
-	write_sequnlock(&mount_lock);
-}
-
 struct proc_mounts {
 	struct mnt_namespace *ns;
 	struct path root;
diff --git a/fs/namespace.c b/fs/namespace.c
index f183161833ad..798bbf4f48ad 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -87,6 +87,16 @@ EXPORT_SYMBOL_GPL(fs_kobj);
  */
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
 
+static inline void lock_mount_hash(void)
+{
+	write_seqlock(&mount_lock);
+}
+
+static inline void unlock_mount_hash(void)
+{
+	write_sequnlock(&mount_lock);
+}
+
 static inline struct hlist_head *m_hash(struct vfsmount *mnt, struct dentry *dentry)
 {
 	unsigned long tmp = ((unsigned long)mnt / L1_CACHE_BYTES);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 03/39] namespace: only take read lock in do_reconfigure_mnt()
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 01/39] namespace: take lock_mount_hash() directly when changing flags Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 02/39] mount: make {lock,unlock}_mount_hash() static Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 04/39] fs: add mount_setattr() Christian Brauner
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

do_reconfigure_mnt() used to take the down_write(&sb->s_umount) lock
which seems unnecessary since we're not changing the superblock. We're
only checking whether it is already read-only. Setting other mount
attributes is protected by lock_mount_hash() afaict and not by s_umount.

So I think the history of down_write(&sb->s_umount) lock being taken
when setting mount attributes dates back to the introduction of
MNT_READONLY in [2]. Afaict, this introduced the concept of having
read-only mounts in contrast to just having a read-only superblock. When
it got introduced it was simply plumbed into do_remount() which already
took down_write(&sb->s_umount) because it was only used to actually
change the superblock before [2]. Afaict, it would've already been
possible back then to only use down_read(&sb->s_umount) for
MS_BIND | MS_REMOUNT since actual mount options were protected by
the vfsmount lock already. But that would've meant special casing the
locking for MS_BIND | MS_REMOUNT in do_remount() which people might not
have considered worth it.
Then in [1] MS_BIND | MS_REMOUNT mount option changes were split out of
do_remount() into do_reconfigure_mnt() but the down_write(&sb->s_umount)
lock was simply copied over.
Now that we have this be a separate helper only take
the down_read(&sb->s_umount) lock since we're only interested in
checking whether the super block is currently read-only and blocking any
writers from changing it. Essentially, checking that the super block is
read-only has the advantage that we can avoid having to go into the
slowpath and through MNT_WRITE_HOLD and can simply set the read-only
flag on the mount in set_mount_attributes().

[1]: commit 43f5e655eff7 ("vfs: Separate changing mount flags full remount")
[2]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount")
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/namespace.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 798bbf4f48ad..8497d149ecaa 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2512,10 +2512,6 @@ static int change_mount_ro_state(struct mount *mnt, unsigned int mnt_flags)
 	return 0;
 }
 
-/*
- * Update the user-settable attributes on a mount.  The caller must hold
- * sb->s_umount for writing.
- */
 static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
 {
 	mnt_flags |= mnt->mnt.mnt_flags & ~MNT_USER_SETTABLE_MASK;
@@ -2565,13 +2561,17 @@ static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
 	if (!can_change_locked_flags(mnt, mnt_flags))
 		return -EPERM;
 
-	down_write(&sb->s_umount);
+	/*
+	 * We're only checking whether the superblock is read-only not changing
+	 * it, so only take down_read(&sb->s_umount).
+	 */
+	down_read(&sb->s_umount);
 	lock_mount_hash();
 	ret = change_mount_ro_state(mnt, mnt_flags);
 	if (ret == 0)
 		set_mount_attributes(mnt, mnt_flags);
 	unlock_mount_hash();
-	up_write(&sb->s_umount);
+	up_read(&sb->s_umount);
 
 	mnt_warn_timestamp_expiry(path, &mnt->mnt);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 04/39] fs: add mount_setattr()
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (2 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 03/39] namespace: only take read lock in do_reconfigure_mnt() Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 05/39] tests: add mount_setattr() selftests Christian Brauner
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

This implements the mount_setattr() syscall. While the new mount api
allows to change the properties of a superblock there is currently no
way to change the mount properties of a mount or mount tree using file
descriptors which the new mount api is based on. In addition the old mount api
has the restriction that mount options cannot be applied recursively. This
hasn't changed since changing mount options on a per-mount basis was implemented
in [1] and has been a frequent request.
The legacy mount is currently unable to accommodate this behavior without
introducing a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND |
MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost mount.
Changing MS_REC to apply to the whole mount tree would mean introducing a
significant uapi change and would likely cause significant regressions.

The new mount_setattr() syscall allows to recursively clear and set mount
options in one shot. Multiple calls to change mount options requesting the same
changes are idempotent:

int mount_setattr(int dfd, const char *path, unsigned flags,
                  struct mount_attr *uattr, size_t usize);

Flags to modify path resolution behavior are specified in the @flags argument.
Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW, and AT_NO_AUTOMOUNT
are supported. If useful, additional lookup flags to restrict path resolution as
introduced with openat2() might be supported in the future.

mount_setattr() can be expected to grow over time and is designed with
extensibility in mind. It follows the extensible syscall pattern we have used
with other syscalls such as openat2(), clone3(), sched_{set,get}attr(), and
others.
The set of mount options is passed in the uapi struct mount_attr which currently
has the following layout:

struct mount_attr {
	__u64 attr_set;
	__u64 attr_clr;
	__u32 propagation;
};

The @attr_set and @attr_clr members are used to clear and set mount options.
This way a user can e.g. request that a set of flags is to be raised such as
turning mounts readonly by raising MOUNT_ATTR_RDONLY in @attr_set while at the
same time requesting that another set of flags is to be lowered such as removing
noexec from a mount tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr.

Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0, not a
bitmap, users wanting to transition to a different atime setting cannot simply
specify the atime setting in @attr_set, but must also specify MOUNT_ATTR__ATIME
in the @attr_clr field. So we ensure that MOUNT_ATTR__ATIME can't be partially
set in @attr_clr and that @attr_set can't have any atime bits set if
MOUNT_ATTR__ATIME isn't set in @attr_clr.

The @propagation field lets callers specify the propagation type of a mount
tree. Propagation is a single property that has four different settings and as
such is not really a flag argument but an enum.  Specifically, it would be
unclear what setting and clearing propagation settings in combination would
amount to. The legacy mount() syscall thus forbids the combination of multiple
propagation settings too. The goal is to keep the semantics of mount propagation
somewhat simple as they are overly complex as it is.

[1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount")
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Split into multiple helpers.
---
 arch/alpha/kernel/syscalls/syscall.tbl      |   1 +
 arch/arm/tools/syscall.tbl                  |   1 +
 arch/arm64/include/asm/unistd32.h           |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1 +
 arch/s390/kernel/syscalls/syscall.tbl       |   1 +
 arch/sh/kernel/syscalls/syscall.tbl         |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1 +
 fs/internal.h                               |   8 +
 fs/namespace.c                              | 325 ++++++++++++++++++--
 include/linux/syscalls.h                    |   3 +
 include/uapi/asm-generic/unistd.h           |   4 +-
 include/uapi/linux/mount.h                  |  22 ++
 tools/include/uapi/asm-generic/unistd.h     |   4 +-
 23 files changed, 358 insertions(+), 26 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index ee7b01bb7346..24d8709624b8 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -480,3 +480,4 @@
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	faccessat2			sys_faccessat2
 550	common	process_madvise			sys_process_madvise
+551	common	mount_setattr			sys_mount_setattr
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index d056a548358e..e3785513d445 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -454,3 +454,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 107f08e03b9f..78af754e070a 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -889,6 +889,8 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
 #define __NR_process_madvise 440
 __SYSCALL(__NR_process_madvise, sys_process_madvise)
+#define __NR_mount_setattr 441
+__SYSCALL(__NR_mount_setattr, sys_mount_setattr)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index b96ed8b8a508..f7d4b1f55be0 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -361,3 +361,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 625fb6d32842..e96e9c6a6ffa 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -440,3 +440,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index aae729c95cf9..6538f075a18e 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -446,3 +446,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 32817c954435..64d129db1aa7 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -379,3 +379,4 @@
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	faccessat2			sys_faccessat2
 440	n32	process_madvise			sys_process_madvise
+441	n32	mount_setattr			sys_mount_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 9e4ea3c31b1c..94b24e6b2608 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -355,3 +355,4 @@
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	faccessat2			sys_faccessat2
 440	n64	process_madvise			sys_process_madvise
+441	n64	mount_setattr			sys_mount_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 29f5f28cf5ce..eae522306767 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -428,3 +428,4 @@
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	faccessat2			sys_faccessat2
 440	o32	process_madvise			sys_process_madvise
+441	o32	mount_setattr			sys_mount_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index f375ea528e59..c7e25f1d219f 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -438,3 +438,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 1275daec7fec..0b309ef64e91 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -530,3 +530,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 28c168000483..0b30398fee42 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -443,3 +443,4 @@
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
 440  common	process_madvise		sys_process_madvise		sys_process_madvise
+441  common	mount_setattr		sys_mount_setattr		sys_mount_setattr
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 783738448ff5..8e4949c5b740 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -443,3 +443,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 78160260991b..409f21a650b8 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -486,3 +486,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 0d0667a9fbd7..2a694420f6cd 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -445,3 +445,4 @@
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
 440	i386	process_madvise		sys_process_madvise
+441	i386	mount_setattr		sys_mount_setattr
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 379819244b91..4d594d0246c1 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -362,6 +362,7 @@
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
 440	common	process_madvise		sys_process_madvise
+441	common	mount_setattr		sys_mount_setattr
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index b070f272995d..a650dc05593d 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -411,3 +411,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	process_madvise			sys_process_madvise
+441	common	mount_setattr			sys_mount_setattr
diff --git a/fs/internal.h b/fs/internal.h
index a7cd0f64faa4..a5a6c470dc07 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -82,6 +82,14 @@ int may_linkat(struct path *link);
 /*
  * namespace.c
  */
+struct mount_kattr {
+	unsigned int attr_set;
+	unsigned int attr_clr;
+	unsigned int propagation;
+	unsigned int lookup_flags;
+	bool recurse;
+};
+
 extern struct vfsmount *lookup_mnt(const struct path *);
 extern int finish_automount(struct vfsmount *, struct path *);
 
diff --git a/fs/namespace.c b/fs/namespace.c
index 8497d149ecaa..9fc8b22dba26 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -469,10 +469,8 @@ void mnt_drop_write_file(struct file *file)
 }
 EXPORT_SYMBOL(mnt_drop_write_file);
 
-static int mnt_make_readonly(struct mount *mnt)
+static inline int mnt_hold_writers(struct mount *mnt)
 {
-	int ret = 0;
-
 	mnt->mnt.mnt_flags |= MNT_WRITE_HOLD;
 	/*
 	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
@@ -497,15 +495,29 @@ static int mnt_make_readonly(struct mount *mnt)
 	 * we're counting up here.
 	 */
 	if (mnt_get_writers(mnt) > 0)
-		ret = -EBUSY;
-	else
-		mnt->mnt.mnt_flags |= MNT_READONLY;
+		return -EBUSY;
+
+	return 0;
+}
+
+static inline void mnt_unhold_writers(struct mount *mnt)
+{
 	/*
 	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
 	 */
 	smp_wmb();
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+}
+
+static int mnt_make_readonly(struct mount *mnt)
+{
+	int ret;
+
+	ret = mnt_hold_writers(mnt);
+	if (!ret)
+		mnt->mnt.mnt_flags |= MNT_READONLY;
+	mnt_unhold_writers(mnt);
 	return ret;
 }
 
@@ -3438,6 +3450,33 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
 	return ret;
 }
 
+static int build_attr_flags(unsigned int attr_flags, unsigned int *flags)
+{
+	unsigned int aflags = 0;
+
+	if (attr_flags & ~(MOUNT_ATTR_RDONLY |
+			   MOUNT_ATTR_NOSUID |
+			   MOUNT_ATTR_NODEV |
+			   MOUNT_ATTR_NOEXEC |
+			   MOUNT_ATTR__ATIME |
+			   MOUNT_ATTR_NODIRATIME))
+		return -EINVAL;
+
+	if (attr_flags & MOUNT_ATTR_RDONLY)
+		aflags |= MNT_READONLY;
+	if (attr_flags & MOUNT_ATTR_NOSUID)
+		aflags |= MNT_NOSUID;
+	if (attr_flags & MOUNT_ATTR_NODEV)
+		aflags |= MNT_NODEV;
+	if (attr_flags & MOUNT_ATTR_NOEXEC)
+		aflags |= MNT_NOEXEC;
+	if (attr_flags & MOUNT_ATTR_NODIRATIME)
+		aflags |= MNT_NODIRATIME;
+
+	*flags = aflags;
+	return 0;
+}
+
 /*
  * Create a kernel mount representation for a new, prepared superblock
  * (specified by fs_fd) and attach to an open_tree-like file descriptor.
@@ -3460,24 +3499,9 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	if ((flags & ~(FSMOUNT_CLOEXEC)) != 0)
 		return -EINVAL;
 
-	if (attr_flags & ~(MOUNT_ATTR_RDONLY |
-			   MOUNT_ATTR_NOSUID |
-			   MOUNT_ATTR_NODEV |
-			   MOUNT_ATTR_NOEXEC |
-			   MOUNT_ATTR__ATIME |
-			   MOUNT_ATTR_NODIRATIME))
-		return -EINVAL;
-
-	if (attr_flags & MOUNT_ATTR_RDONLY)
-		mnt_flags |= MNT_READONLY;
-	if (attr_flags & MOUNT_ATTR_NOSUID)
-		mnt_flags |= MNT_NOSUID;
-	if (attr_flags & MOUNT_ATTR_NODEV)
-		mnt_flags |= MNT_NODEV;
-	if (attr_flags & MOUNT_ATTR_NOEXEC)
-		mnt_flags |= MNT_NOEXEC;
-	if (attr_flags & MOUNT_ATTR_NODIRATIME)
-		mnt_flags |= MNT_NODIRATIME;
+	ret = build_attr_flags(attr_flags, &mnt_flags);
+	if (ret)
+		return ret;
 
 	switch (attr_flags & MOUNT_ATTR__ATIME) {
 	case MOUNT_ATTR_STRICTATIME:
@@ -3785,6 +3809,259 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	return error;
 }
 
+static unsigned int recalc_flags(struct mount_kattr *kattr, struct mount *mnt)
+{
+	unsigned int flags = mnt->mnt.mnt_flags;
+
+	/*  flags to clear */
+	flags &= ~kattr->attr_clr;
+	/* flags to raise */
+	flags |= kattr->attr_set;
+
+	return flags;
+}
+
+static struct mount *mount_setattr_prepare(struct mount_kattr *kattr,
+					   struct mount *mnt, int *err)
+{
+	struct mount *m = mnt, *last = NULL;
+
+	if (!is_mounted(&m->mnt)) {
+		*err = -EINVAL;
+		goto out;
+	}
+
+	if (!(mnt_has_parent(m) ? check_mnt(m) : is_anon_ns(m->mnt_ns))) {
+		*err = -EINVAL;
+		goto out;
+	}
+
+	do {
+		unsigned int flags;
+
+		flags = recalc_flags(kattr, m);
+		if (!can_change_locked_flags(m, flags)) {
+			*err = -EPERM;
+			goto out;
+		}
+
+		last = m;
+
+		if ((kattr->attr_set & MNT_READONLY) &&
+		    !(m->mnt.mnt_flags & MNT_READONLY)) {
+			*err = mnt_hold_writers(m);
+			if (*err)
+				goto out;
+		}
+	} while (kattr->recurse && (m = next_mnt(m, mnt)));
+
+out:
+	return last;
+}
+
+static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt,
+				 struct mount *last, int err)
+{
+	struct mount *m = mnt;
+
+	do {
+		if (!err) {
+			unsigned int flags;
+
+			flags = recalc_flags(kattr, m);
+			WRITE_ONCE(m->mnt.mnt_flags, flags);
+		}
+
+		/*
+		 * We either set MNT_READONLY above so make it visible
+		 * before ~MNT_WRITE_HOLD or we failed to recursively
+		 * apply mount options.
+		 */
+		if ((kattr->attr_set & MNT_READONLY) &&
+		    (m->mnt.mnt_flags & MNT_WRITE_HOLD))
+			mnt_unhold_writers(m);
+
+		if (!err && kattr->propagation)
+			change_mnt_propagation(m, kattr->propagation);
+
+		/*
+		 * On failure, only cleanup until we found the first mount we
+		 * failed to handle.
+		 */
+		if (err && m == last)
+			break;
+	} while (kattr->recurse && (m = next_mnt(m, mnt)));
+
+	if (!err)
+		touch_mnt_namespace(mnt->mnt_ns);
+}
+
+static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
+{
+	struct mount *mnt = real_mount(path->mnt), *last = NULL;
+	int err = 0;
+
+	if (path->dentry != mnt->mnt.mnt_root)
+		return -EINVAL;
+
+	if (kattr->propagation) {
+		/*
+		 * Only take namespace_lock() if we're actually changing
+		 * propagation.
+		 */
+		namespace_lock();
+		if (kattr->propagation == MS_SHARED) {
+			err = invent_group_ids(mnt, kattr->recurse);
+			if (err) {
+				namespace_unlock();
+				return err;
+			}
+		}
+	}
+
+	lock_mount_hash();
+
+	/*
+	 * Get the mount tree in a shape where we can change mount properties
+	 * without failure.
+	 */
+	last = mount_setattr_prepare(kattr, mnt, &err);
+	if (last) /* Commit all changes or revert to the old state. */
+		mount_setattr_commit(kattr, mnt, last, err);
+
+	unlock_mount_hash();
+
+	if (kattr->propagation) {
+		namespace_unlock();
+		if (err)
+			cleanup_group_ids(mnt, NULL);
+	}
+
+	return err;
+}
+
+static int build_mount_kattr(const struct mount_attr *attr,
+			     struct mount_kattr *kattr, unsigned int flags)
+{
+	unsigned int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
+
+	if (flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+	*kattr = (struct mount_kattr){
+		.lookup_flags	= lookup_flags,
+		.recurse	= !!(flags & AT_RECURSIVE),
+	};
+
+	switch (attr->propagation) {
+	case MAKE_PROPAGATION_UNCHANGED:
+		kattr->propagation = 0;
+		break;
+	case MAKE_PROPAGATION_UNBINDABLE:
+		kattr->propagation = MS_UNBINDABLE;
+		break;
+	case MAKE_PROPAGATION_PRIVATE:
+		kattr->propagation = MS_PRIVATE;
+		break;
+	case MAKE_PROPAGATION_DEPENDENT:
+		kattr->propagation = MS_SLAVE;
+		break;
+	case MAKE_PROPAGATION_SHARED:
+		kattr->propagation = MS_SHARED;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (upper_32_bits(attr->attr_set))
+		return -EINVAL;
+	if (build_attr_flags(lower_32_bits(attr->attr_set), &kattr->attr_set))
+		return -EINVAL;
+
+	if (upper_32_bits(attr->attr_clr))
+		return -EINVAL;
+	if (build_attr_flags(lower_32_bits(attr->attr_clr), &kattr->attr_clr))
+		return -EINVAL;
+
+	/*
+	 * Since the MOUNT_ATTR_<atime> values are an enum, not a bitmap, users
+	 * wanting to transition to a different atime setting cannot simply
+	 * specify the atime setting in @attr_set, but must also specify
+	 * MOUNT_ATTR__ATIME in the @attr_clr field.
+	 * So ensure that MOUNT_ATTR__ATIME can't be partially set in
+	 * @attr_clr and that @attr_set can't have any atime bits set if
+	 * MOUNT_ATTR__ATIME isn't set in @attr_clr.
+	 */
+	if (!(attr->attr_clr & MOUNT_ATTR__ATIME) && (attr->attr_set & MOUNT_ATTR__ATIME))
+		return -EINVAL;
+	else if ((attr->attr_clr & MOUNT_ATTR__ATIME) &&
+		 ((attr->attr_clr & MOUNT_ATTR__ATIME) != MOUNT_ATTR__ATIME))
+		return -EINVAL;
+
+	if (attr->attr_clr & MOUNT_ATTR__ATIME) {
+		/* Clear all previous time settings as they are mutually exclusive. */
+		kattr->attr_clr |= MNT_RELATIME | MNT_NOATIME;
+		switch (attr->attr_set & MOUNT_ATTR__ATIME) {
+		case MOUNT_ATTR_RELATIME:
+			kattr->attr_set |= MNT_RELATIME;
+			break;
+		case MOUNT_ATTR_NOATIME:
+			kattr->attr_set |= MNT_NOATIME;
+			break;
+		case MOUNT_ATTR_STRICTATIME:
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path, unsigned int, flags,
+		struct mount_attr __user *, uattr, size_t, usize)
+{
+	int err;
+	struct path target;
+	struct mount_attr attr;
+	struct mount_kattr kattr;
+
+	BUILD_BUG_ON(sizeof(struct mount_attr) < MOUNT_ATTR_SIZE_VER0);
+	BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_LATEST);
+
+	if (flags & ~(AT_EMPTY_PATH | AT_RECURSIVE | AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT))
+		return -EINVAL;
+
+	if (unlikely(usize < MOUNT_ATTR_SIZE_VER0))
+		return -EINVAL;
+
+	if (!may_mount())
+		return -EPERM;
+
+	err = copy_struct_from_user(&attr, sizeof(attr), uattr, usize);
+	if (err)
+		return err;
+
+	if (attr.attr_set == 0 && attr.attr_clr == 0 && attr.propagation == 0)
+		return 0;
+
+	err = build_mount_kattr(&attr, &kattr, flags);
+	if (err)
+		return err;
+
+	err = user_path_at(dfd, path, kattr.lookup_flags, &target);
+	if (err)
+		return err;
+
+	err = do_mount_setattr(&target, &kattr);
+	path_put(&target);
+	return err;
+}
+
 static void __init init_mount_tree(void)
 {
 	struct vfsmount *mnt;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 37bea07c12f2..a62d5904fb6a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -68,6 +68,7 @@ union bpf_attr;
 struct io_uring_params;
 struct clone_args;
 struct open_how;
+struct mount_attr;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -999,6 +1000,8 @@ asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
 asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
 			       int to_dfd, const char __user *to_path,
 			       unsigned int ms_flags);
+asmlinkage long sys_mount_setattr(int dfd, const char __user *path, unsigned int flags,
+				  struct mount_attr __user *uattr, size_t usize);
 asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags);
 asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key,
 			     const void __user *value, int aux);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 2056318988f7..0517f36fe783 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
 #define __NR_process_madvise 440
 __SYSCALL(__NR_process_madvise, sys_process_madvise)
+#define __NR_mount_setattr 441
+__SYSCALL(__NR_mount_setattr, sys_mount_setattr)
 
 #undef __NR_syscalls
-#define __NR_syscalls 441
+#define __NR_syscalls 442
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index dd8306ea336c..fb3ad26fdebf 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -118,4 +118,26 @@ enum fsconfig_command {
 #define MOUNT_ATTR_STRICTATIME	0x00000020 /* - Always perform atime updates */
 #define MOUNT_ATTR_NODIRATIME	0x00000080 /* Do not update directory access times */
 
+/*
+ * mount_setattr()
+ */
+struct mount_attr {
+	__u64 attr_set;
+	__u64 attr_clr;
+	__u64 propagation;
+};
+
+/* Change propagation through mount_setattr(). */
+enum propagation_type {
+	MAKE_PROPAGATION_UNCHANGED	= 0, /* Don't change mount propagation (default). */
+	MAKE_PROPAGATION_UNBINDABLE	= 1, /* Make unbindable. */
+	MAKE_PROPAGATION_PRIVATE	= 2, /* Do not receive or send mount events. */
+	MAKE_PROPAGATION_DEPENDENT	= 3, /* Only receive mount events. */
+	MAKE_PROPAGATION_SHARED		= 4, /* Send and receive mount events. */
+};
+
+/* List of all mount_attr versions. */
+#define MOUNT_ATTR_SIZE_VER0	24 /* sizeof first published struct */
+#define MOUNT_ATTR_SIZE_LATEST	MOUNT_ATTR_SIZE_VER0
+
 #endif /* _UAPI_LINUX_MOUNT_H */
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index f2b5d72a46c2..1cbe8bbda5fa 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_mount_setattr 441
+__SYSCALL(__NR_mount_setattr, sys_mount_setattr)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 442
 
 /*
  * 32 bit systems traditionally used different
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 05/39] tests: add mount_setattr() selftests
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (3 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 04/39] fs: add mount_setattr() Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 06/39] fs: add id translation helpers Christian Brauner
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Add a range of selftests for the new mount_setattr() syscall to verify
that it works as expected. This tests that:
- no invalid flags can be specified
- changing properties of a single mount works and leaves other mounts in
  the mount tree unchanged
- changing a mount tre to read-only when one of the mounts has writers
  fails and leaves the whole mount tree unchanged
- changing mount properties from multiple threads works
- changing atime settings works
- changing mount propagation works
- changing the mount options of a mount tree where the individual mounts
  in the tree have different mount options only changes the flags that
  were requested to change
- changing mount options from another mount namespace fails
- changing mount options from another user namespace fails

[==========] Running 9 tests from 2 test cases.
[ RUN      ] mount_setattr.invalid_attributes
[       OK ] mount_setattr.invalid_attributes
[ RUN      ] mount_setattr.basic
[       OK ] mount_setattr.basic
[ RUN      ] mount_setattr.basic_recursive
[       OK ] mount_setattr.basic_recursive
[ RUN      ] mount_setattr.mount_has_writers
[       OK ] mount_setattr.mount_has_writers
[ RUN      ] mount_setattr.mixed_mount_options
[       OK ] mount_setattr.mixed_mount_options
[ RUN      ] mount_setattr.time_changes
[       OK ] mount_setattr.time_changes
[ RUN      ] mount_setattr.multi_threaded
[       OK ] mount_setattr.multi_threaded
[ RUN      ] mount_setattr.wrong_user_namespace
[       OK ] mount_setattr.wrong_user_namespace
[ RUN      ] mount_setattr.wrong_mount_namespace
[       OK ] mount_setattr.wrong_mount_namespace
[==========] 9 / 9 tests passed.
[  PASSED  ]

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
/* v2 */
unchanged
---
 tools/testing/selftests/Makefile              |   1 +
 .../selftests/mount_setattr/.gitignore        |   1 +
 .../testing/selftests/mount_setattr/Makefile  |   7 +
 tools/testing/selftests/mount_setattr/config  |   1 +
 .../mount_setattr/mount_setattr_test.c        | 889 ++++++++++++++++++
 5 files changed, 899 insertions(+)
 create mode 100644 tools/testing/selftests/mount_setattr/.gitignore
 create mode 100644 tools/testing/selftests/mount_setattr/Makefile
 create mode 100644 tools/testing/selftests/mount_setattr/config
 create mode 100644 tools/testing/selftests/mount_setattr/mount_setattr_test.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index d9c283503159..87b7107dd9a6 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -34,6 +34,7 @@ TARGETS += memfd
 TARGETS += memory-hotplug
 TARGETS += mincore
 TARGETS += mount
+TARGETS += mount_setattr
 TARGETS += mqueue
 TARGETS += net
 TARGETS += net/forwarding
diff --git a/tools/testing/selftests/mount_setattr/.gitignore b/tools/testing/selftests/mount_setattr/.gitignore
new file mode 100644
index 000000000000..5f74d8488472
--- /dev/null
+++ b/tools/testing/selftests/mount_setattr/.gitignore
@@ -0,0 +1 @@
+mount_setattr_test
diff --git a/tools/testing/selftests/mount_setattr/Makefile b/tools/testing/selftests/mount_setattr/Makefile
new file mode 100644
index 000000000000..2250f7dcb81e
--- /dev/null
+++ b/tools/testing/selftests/mount_setattr/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for mount selftests.
+CFLAGS = -g -I../../../../usr/include/ -Wall -O2 -pthread
+
+TEST_GEN_FILES += mount_setattr_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/mount_setattr/config b/tools/testing/selftests/mount_setattr/config
new file mode 100644
index 000000000000..416bd53ce982
--- /dev/null
+++ b/tools/testing/selftests/mount_setattr/config
@@ -0,0 +1 @@
+CONFIG_USER_NS=y
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
new file mode 100644
index 000000000000..d6e2555c8cac
--- /dev/null
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -0,0 +1,889 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdio.h>
+#include <errno.h>
+#include <pthread.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/mount.h>
+#include <sys/wait.h>
+#include <sys/vfs.h>
+#include <sys/statvfs.h>
+#include <sys/sysinfo.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <grp.h>
+#include <stdbool.h>
+#include <stdarg.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef CLONE_NEWNS
+#define CLONE_NEWNS 0x00020000
+#endif
+
+#ifndef CLONE_NEWUSER
+#define CLONE_NEWUSER 0x10000000
+#endif
+
+#ifndef MS_REC
+#define MS_REC 16384
+#endif
+
+#ifndef MS_RELATIME
+#define MS_RELATIME (1 << 21)
+#endif
+
+#ifndef MS_STRICTATIME
+#define MS_STRICTATIME (1 << 24)
+#endif
+
+#ifndef MOUNT_ATTR_RDONLY
+#define MOUNT_ATTR_RDONLY 0x00000001
+#endif
+
+#ifndef MOUNT_ATTR_NOSUID
+#define MOUNT_ATTR_NOSUID 0x00000002
+#endif
+
+#ifndef MOUNT_ATTR_NOEXEC
+#define MOUNT_ATTR_NOEXEC 0x00000008
+#endif
+
+#ifndef MOUNT_ATTR_NODIRATIME
+#define MOUNT_ATTR_NODIRATIME 0x00000080
+#endif
+
+#ifndef MOUNT_ATTR__ATIME
+#define MOUNT_ATTR__ATIME 0x00000070
+#endif
+
+#ifndef MOUNT_ATTR_RELATIME
+#define MOUNT_ATTR_RELATIME 0x00000000
+#endif
+
+#ifndef MOUNT_ATTR_NOATIME
+#define MOUNT_ATTR_NOATIME 0x00000010
+#endif
+
+#ifndef MOUNT_ATTR_STRICTATIME
+#define MOUNT_ATTR_STRICTATIME 0x00000020
+#endif
+
+#ifndef AT_RECURSIVE
+#define AT_RECURSIVE 0x8000
+#endif
+
+#ifndef MAKE_PROPAGATION_SHARED
+#define MAKE_PROPAGATION_SHARED 4
+#endif
+
+#define DEFAULT_THREADS 4
+#define ptr_to_int(p) ((int)((intptr_t)(p)))
+#define int_to_ptr(u) ((void *)((intptr_t)(u)))
+
+#ifndef __NR_mount_setattr
+	#if defined __alpha__
+		#define __NR_mount_setattr 551
+	#elif defined _MIPS_SIM
+		#if _MIPS_SIM == _MIPS_SIM_ABI32	/* o32 */
+			#define __NR_mount_setattr 4441
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_NABI32	/* n32 */
+			#define __NR_mount_setattr 6441
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_ABI64	/* n64 */
+			#define __NR_mount_setattr 5441
+		#endif
+	#elif defined __ia64__
+		#define __NR_mount_setattr (441 + 1024)
+	#else
+		#define __NR_mount_setattr 441
+	#endif
+
+struct mount_attr {
+	__u64 attr_set;
+	__u64 attr_clr;
+	__u64 propagation;
+};
+#endif
+
+static inline int sys_mount_setattr(int dfd, const char *path, unsigned int flags,
+				    struct mount_attr *attr, size_t size)
+{
+	return syscall(__NR_mount_setattr, dfd, path, flags, attr, size);
+}
+
+static ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = write(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+static int write_file(const char *path, const void *buf, size_t count)
+{
+	int fd;
+	ssize_t ret;
+
+	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
+	if (fd < 0)
+		return -1;
+
+	ret = write_nointr(fd, buf, count);
+	close(fd);
+	if (ret < 0 || (size_t)ret != count)
+		return -1;
+
+	return 0;
+}
+
+static int create_and_enter_userns(void)
+{
+	uid_t uid;
+	gid_t gid;
+	char map[100];
+
+	uid = getuid();
+	gid = getgid();
+
+	if (unshare(CLONE_NEWUSER))
+		return -1;
+
+	if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) &&
+	    errno != ENOENT)
+		return -1;
+
+	snprintf(map, sizeof(map), "0 %d 1", uid);
+	if (write_file("/proc/self/uid_map", map, strlen(map)))
+		return -1;
+
+
+	snprintf(map, sizeof(map), "0 %d 1", gid);
+	if (write_file("/proc/self/gid_map", map, strlen(map)))
+		return -1;
+
+	if (setgid(0))
+		return -1;
+
+	if (setuid(0))
+		return -1;
+
+	return 0;
+}
+
+static int prepare_unpriv_mountns(void)
+{
+	if (create_and_enter_userns())
+		return -1;
+
+	if (unshare(CLONE_NEWNS))
+		return -1;
+
+	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0))
+		return -1;
+
+	return 0;
+}
+
+static int read_mnt_flags(const char *path)
+{
+	int ret;
+	struct statvfs stat;
+	unsigned int mnt_flags;
+
+	ret = statvfs(path, &stat);
+	if (ret != 0)
+		return -EINVAL;
+
+	if (stat.f_flag &
+	    ~(ST_RDONLY | ST_NOSUID | ST_NODEV | ST_NOEXEC | ST_NOATIME |
+	      ST_NODIRATIME | ST_RELATIME | ST_SYNCHRONOUS | ST_MANDLOCK))
+		return -EINVAL;
+
+	mnt_flags = 0;
+	if (stat.f_flag & ST_RDONLY)
+		mnt_flags |= MS_RDONLY;
+	if (stat.f_flag & ST_NOSUID)
+		mnt_flags |= MS_NOSUID;
+	if (stat.f_flag & ST_NODEV)
+		mnt_flags |= MS_NODEV;
+	if (stat.f_flag & ST_NOEXEC)
+		mnt_flags |= MS_NOEXEC;
+	if (stat.f_flag & ST_NOATIME)
+		mnt_flags |= MS_NOATIME;
+	if (stat.f_flag & ST_NODIRATIME)
+		mnt_flags |= MS_NODIRATIME;
+	if (stat.f_flag & ST_RELATIME)
+		mnt_flags |= MS_RELATIME;
+	if (stat.f_flag & ST_SYNCHRONOUS)
+		mnt_flags |= MS_SYNCHRONOUS;
+	if (stat.f_flag & ST_MANDLOCK)
+		mnt_flags |= ST_MANDLOCK;
+
+	return mnt_flags;
+}
+
+static char *get_field(char *src, int nfields)
+{
+	int i;
+	char *p = src;
+
+	for (i = 0; i < nfields; i++) {
+		while (*p && *p != ' ' && *p != '\t')
+			p++;
+
+		if (!*p)
+			break;
+
+		p++;
+	}
+
+	return p;
+}
+
+static void null_endofword(char *word)
+{
+	while (*word && *word != ' ' && *word != '\t')
+		word++;
+	*word = '\0';
+}
+
+static bool is_shared_mount(const char *path)
+{
+	size_t len = 0;
+	char *line = NULL;
+	FILE *f = NULL;
+
+	f = fopen("/proc/self/mountinfo", "re");
+	if (!f)
+		return false;
+
+	while (getline(&line, &len, f) != -1) {
+		char *opts, *target;
+
+		target = get_field(line, 4);
+		if (!target)
+			continue;
+
+		opts = get_field(target, 2);
+		if (!opts)
+			continue;
+
+		null_endofword(target);
+
+		if (strcmp(target, path) != 0)
+			continue;
+
+		null_endofword(opts);
+		if (strstr(opts, "shared:"))
+			return true;
+	}
+
+	free(line);
+	fclose(f);
+
+	return false;
+}
+
+static void *mount_setattr_thread(void *data)
+{
+	struct mount_attr attr = {
+		.attr_set	= MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID,
+		.attr_clr	= 0,
+		.propagation	= MAKE_PROPAGATION_SHARED,
+	};
+
+	if (sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)))
+		pthread_exit(int_to_ptr(-1));
+
+	pthread_exit(int_to_ptr(0));
+}
+
+FIXTURE(mount_setattr) {
+};
+
+FIXTURE_SETUP(mount_setattr)
+{
+	ASSERT_EQ(prepare_unpriv_mountns(), 0);
+
+	(void)umount2("/mnt", MNT_DETACH);
+	(void)umount2("/tmp", MNT_DETACH);
+
+	ASSERT_EQ(mount("testing", "/tmp", "tmpfs", MS_NOATIME | MS_NODEV,
+			"size=100000,mode=700"), 0);
+
+	ASSERT_EQ(mkdir("/tmp/B", 0777), 0);
+
+	ASSERT_EQ(mount("testing", "/tmp/B", "tmpfs", MS_NOATIME | MS_NODEV,
+			"size=100000,mode=700"), 0);
+
+	ASSERT_EQ(mkdir("/tmp/B/BB", 0777), 0);
+
+	ASSERT_EQ(mount("testing", "/tmp/B/BB", "tmpfs", MS_NOATIME | MS_NODEV,
+			"size=100000,mode=700"), 0);
+
+	ASSERT_EQ(mount("testing", "/mnt", "tmpfs", MS_NOATIME | MS_NODEV,
+			"size=100000,mode=700"), 0);
+
+	ASSERT_EQ(mkdir("/mnt/A", 0777), 0);
+
+	ASSERT_EQ(mount("testing", "/mnt/A", "tmpfs", MS_NOATIME | MS_NODEV,
+			"size=100000,mode=700"), 0);
+
+	ASSERT_EQ(mkdir("/mnt/A/AA", 0777), 0);
+
+	ASSERT_EQ(mount("/tmp", "/mnt/A/AA", NULL, MS_BIND | MS_REC, NULL), 0);
+
+	ASSERT_EQ(mkdir("/mnt/B", 0777), 0);
+
+	ASSERT_EQ(mount("testing", "/mnt/B", "ramfs",
+			MS_NOATIME | MS_NODEV | MS_NOSUID, 0), 0);
+
+	ASSERT_EQ(mkdir("/mnt/B/BB", 0777), 0);
+
+	ASSERT_EQ(mount("testing", "/tmp/B/BB", "devpts",
+			MS_RELATIME | MS_NOEXEC | MS_RDONLY, 0), 0);
+}
+
+FIXTURE_TEARDOWN(mount_setattr)
+{
+	(void)umount2("/mnt/A", MNT_DETACH);
+	(void)umount2("/tmp", MNT_DETACH);
+}
+
+TEST_F(mount_setattr, invalid_attributes)
+{
+	struct mount_attr invalid_attr = {
+		.attr_set = (1U << 31),
+	};
+
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &invalid_attr,
+				    sizeof(invalid_attr)), 0);
+
+	invalid_attr.attr_set	= 0;
+	invalid_attr.attr_clr	= (1U << 31);
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &invalid_attr,
+				    sizeof(invalid_attr)), 0);
+
+	invalid_attr.attr_clr		= 0;
+	invalid_attr.propagation	= (1U << 31);
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &invalid_attr,
+				    sizeof(invalid_attr)), 0);
+
+	invalid_attr.attr_set		= (1U << 31);
+	invalid_attr.attr_clr		= (1U << 31);
+	invalid_attr.propagation	= (1U << 31);
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &invalid_attr,
+				    sizeof(invalid_attr)), 0);
+
+	ASSERT_NE(sys_mount_setattr(-1, "mnt/A", AT_RECURSIVE, &invalid_attr,
+				    sizeof(invalid_attr)), 0);
+}
+
+TEST_F(mount_setattr, extensibility)
+{
+	unsigned int old_flags = 0, new_flags = 0, expected_flags = 0;
+	char *s = "dummy";
+	struct mount_attr invalid_attr = {};
+	struct mount_attr_large {
+		struct mount_attr attr1;
+		struct mount_attr attr2;
+		struct mount_attr attr3;
+	} large_attr = {};
+
+	old_flags = read_mnt_flags("/mnt/A");
+	ASSERT_GT(old_flags, 0);
+
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, NULL,
+				    sizeof(invalid_attr)), 0);
+	ASSERT_EQ(errno, EFAULT);
+
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, (void *)s,
+				    sizeof(invalid_attr)), 0);
+	ASSERT_EQ(errno, EINVAL);
+
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &invalid_attr, 0), 0);
+	ASSERT_EQ(errno, EINVAL);
+
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &invalid_attr,
+				    sizeof(invalid_attr) / 2), 0);
+	ASSERT_EQ(errno, EINVAL);
+
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &invalid_attr,
+				    sizeof(invalid_attr) / 2), 0);
+	ASSERT_EQ(errno, EINVAL);
+
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE,
+				    (void *)&large_attr, sizeof(large_attr)), 0);
+
+	large_attr.attr3.attr_set = MOUNT_ATTR_RDONLY;
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE,
+				    (void *)&large_attr, sizeof(large_attr)), 0);
+
+	large_attr.attr3.attr_set = 0;
+	large_attr.attr1.attr_set = MOUNT_ATTR_RDONLY;
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE,
+				    (void *)&large_attr, sizeof(large_attr)), 0);
+
+	expected_flags = old_flags;
+	expected_flags |= MS_RDONLY;
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+}
+
+TEST_F(mount_setattr, basic)
+{
+	unsigned int old_flags = 0, new_flags = 0, expected_flags = 0;
+	struct mount_attr attr = {
+		.attr_set	= MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOEXEC | MOUNT_ATTR_RELATIME,
+		.attr_clr	= MOUNT_ATTR__ATIME,
+	};
+
+	old_flags = read_mnt_flags("/mnt/A");
+	ASSERT_GT(old_flags, 0);
+
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", 0, &attr, sizeof(attr)), 0);
+
+	expected_flags = old_flags;
+	expected_flags |= MS_RDONLY;
+	expected_flags |= MS_NOEXEC;
+	expected_flags &= ~MS_NOATIME;
+	expected_flags |= MS_RELATIME;
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, old_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, old_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, old_flags);
+}
+
+TEST_F(mount_setattr, basic_recursive)
+{
+	int fd;
+	unsigned int old_flags = 0, new_flags = 0, expected_flags = 0;
+	struct mount_attr attr = {
+		.attr_set	= MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOEXEC | MOUNT_ATTR_RELATIME,
+		.attr_clr	= MOUNT_ATTR__ATIME,
+	};
+
+	old_flags = read_mnt_flags("/mnt/A");
+	ASSERT_GT(old_flags, 0);
+
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags = old_flags;
+	expected_flags |= MS_RDONLY;
+	expected_flags |= MS_NOEXEC;
+	expected_flags &= ~MS_NOATIME;
+	expected_flags |= MS_RELATIME;
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	memset(&attr, 0, sizeof(attr));
+	attr.attr_clr = MOUNT_ATTR_RDONLY;
+	attr.propagation = MAKE_PROPAGATION_SHARED;
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags &= ~MS_RDONLY;
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B/BB"), true);
+
+	fd = open("/mnt/A/AA/B/b", O_RDWR | O_CLOEXEC | O_CREAT | O_EXCL, 0777);
+	ASSERT_GE(fd, 0);
+
+	/*
+	 * We're holding a fd open for writing so this needs to fail somewhere
+	 * in the middle and the mount options need to be unchanged.
+	 */
+	attr.attr_set = MOUNT_ATTR_RDONLY;
+	ASSERT_LT(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B/BB"), true);
+
+	EXPECT_EQ(close(fd), 0);
+}
+
+TEST_F(mount_setattr, mount_has_writers)
+{
+	int fd, dfd;
+	unsigned int old_flags = 0, new_flags = 0;
+	struct mount_attr attr = {
+		.attr_set	= MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOEXEC | MOUNT_ATTR_RELATIME,
+		.attr_clr	= MOUNT_ATTR__ATIME,
+		.propagation	= MAKE_PROPAGATION_SHARED,
+	};
+
+	old_flags = read_mnt_flags("/mnt/A");
+	ASSERT_GT(old_flags, 0);
+
+	fd = open("/mnt/A/AA/B/b", O_RDWR | O_CLOEXEC | O_CREAT | O_EXCL, 0777);
+	ASSERT_GE(fd, 0);
+
+	/*
+	 * We're holding a fd open to a mount somwhere in the middle so this
+	 * needs to fail somewhere in the middle. After this the mount options
+	 * need to be unchanged.
+	 */
+	ASSERT_LT(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, old_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A"), false);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, old_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA"), false);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, old_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B"), false);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, old_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B/BB"), false);
+
+	dfd = open("/mnt/A/AA/B", O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dfd, 0);
+	EXPECT_EQ(fsync(dfd), 0);
+	EXPECT_EQ(close(dfd), 0);
+
+	EXPECT_EQ(fsync(fd), 0);
+	EXPECT_EQ(close(fd), 0);
+
+	/* All writers are gone so this should succeed. */
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+}
+
+TEST_F(mount_setattr, mixed_mount_options)
+{
+	unsigned int old_flags1 = 0, old_flags2 = 0, new_flags = 0, expected_flags = 0;
+	struct mount_attr attr = {
+		.attr_clr = MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID | MOUNT_ATTR_NOEXEC | MOUNT_ATTR__ATIME,
+		.attr_set = MOUNT_ATTR_RELATIME,
+	};
+
+	old_flags1 = read_mnt_flags("/mnt/B");
+	ASSERT_GT(old_flags1, 0);
+
+	old_flags2 = read_mnt_flags("/mnt/B/BB");
+	ASSERT_GT(old_flags2, 0);
+
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/B", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags = old_flags2;
+	expected_flags &= ~(MS_RDONLY | MS_NOEXEC | MS_NOATIME | MS_NOSUID);
+	expected_flags |= MS_RELATIME;
+
+	new_flags = read_mnt_flags("/mnt/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	expected_flags = old_flags2;
+	expected_flags &= ~(MS_RDONLY | MS_NOEXEC | MS_NOATIME | MS_NOSUID);
+	expected_flags |= MS_RELATIME;
+
+	new_flags = read_mnt_flags("/mnt/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+}
+
+TEST_F(mount_setattr, time_changes)
+{
+	unsigned int old_flags = 0, new_flags = 0, expected_flags = 0;
+	struct mount_attr attr = {
+		.attr_set	= MOUNT_ATTR_NODIRATIME | MOUNT_ATTR_NOATIME,
+	};
+
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	attr.attr_set = MOUNT_ATTR_STRICTATIME;
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	attr.attr_set = MOUNT_ATTR_STRICTATIME | MOUNT_ATTR_NOATIME;
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	attr.attr_set = MOUNT_ATTR_STRICTATIME | MOUNT_ATTR_NOATIME;
+	attr.attr_clr = MOUNT_ATTR__ATIME;
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	attr.attr_set = 0;
+	attr.attr_clr = MOUNT_ATTR_STRICTATIME;
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	attr.attr_clr = MOUNT_ATTR_NOATIME;
+	ASSERT_NE(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	old_flags = read_mnt_flags("/mnt/A");
+	ASSERT_GT(old_flags, 0);
+
+	attr.attr_set = MOUNT_ATTR_NODIRATIME | MOUNT_ATTR_NOATIME;
+	attr.attr_clr = MOUNT_ATTR__ATIME;
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags = old_flags;
+	expected_flags |= MS_NOATIME;
+	expected_flags |= MS_NODIRATIME;
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	memset(&attr, 0, sizeof(attr));
+	attr.attr_set &= ~MOUNT_ATTR_NOATIME;
+	attr.attr_set |= MOUNT_ATTR_RELATIME;
+	attr.attr_clr |= MOUNT_ATTR__ATIME;
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags &= ~MS_NOATIME;
+	expected_flags |= MS_RELATIME;
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	memset(&attr, 0, sizeof(attr));
+	attr.attr_set &= ~MOUNT_ATTR_RELATIME;
+	attr.attr_set |= MOUNT_ATTR_STRICTATIME;
+	attr.attr_clr |= MOUNT_ATTR__ATIME;
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags &= ~MS_RELATIME;
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	memset(&attr, 0, sizeof(attr));
+	attr.attr_set &= ~MOUNT_ATTR_STRICTATIME;
+	attr.attr_set |= MOUNT_ATTR_NOATIME;
+	attr.attr_clr |= MOUNT_ATTR__ATIME;
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags |= MS_NOATIME;
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	memset(&attr, 0, sizeof(attr));
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	memset(&attr, 0, sizeof(attr));
+	attr.attr_clr = MOUNT_ATTR_NODIRATIME;
+	ASSERT_EQ(sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	expected_flags &= ~MS_NODIRATIME;
+
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+}
+
+TEST_F(mount_setattr, multi_threaded)
+{
+	int i, j, nthreads, ret = 0;
+	unsigned int old_flags = 0, new_flags = 0, expected_flags = 0;
+	pthread_attr_t pattr;
+	pthread_t threads[DEFAULT_THREADS];
+
+	old_flags = read_mnt_flags("/mnt/A");
+	ASSERT_GT(old_flags, 0);
+
+	/* Try to change mount options from multiple threads. */
+	nthreads = get_nprocs_conf();
+	if (nthreads > DEFAULT_THREADS)
+		nthreads = DEFAULT_THREADS;
+
+	pthread_attr_init(&pattr);
+	for (i = 0; i < nthreads; i++)
+		ASSERT_EQ(pthread_create(&threads[i], &pattr, mount_setattr_thread, NULL), 0);
+
+	for (j = 0; j < i; j++) {
+		void *retptr = NULL;
+
+		EXPECT_EQ(pthread_join(threads[j], &retptr), 0);
+
+		ret += ptr_to_int(retptr);
+		EXPECT_EQ(ret, 0);
+	}
+	pthread_attr_destroy(&pattr);
+
+	ASSERT_EQ(ret, 0);
+
+	expected_flags = old_flags;
+	expected_flags |= MS_RDONLY;
+	expected_flags |= MS_NOSUID;
+	new_flags = read_mnt_flags("/mnt/A");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B"), true);
+
+	new_flags = read_mnt_flags("/mnt/A/AA/B/BB");
+	ASSERT_EQ(new_flags, expected_flags);
+
+	ASSERT_EQ(is_shared_mount("/mnt/A/AA/B/BB"), true);
+}
+
+TEST_F(mount_setattr, wrong_user_namespace)
+{
+	int ret;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_RDONLY,
+	};
+
+	EXPECT_EQ(create_and_enter_userns(), 0);
+	ret = sys_mount_setattr(-1, "/mnt/A", AT_RECURSIVE, &attr, sizeof(attr));
+	ASSERT_LT(ret, 0);
+	ASSERT_EQ(errno, EPERM);
+}
+
+TEST_F(mount_setattr, wrong_mount_namespace)
+{
+	int fd, ret;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_RDONLY,
+	};
+
+	fd = open("/mnt/A", O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(fd, 0);
+
+	ASSERT_EQ(unshare(CLONE_NEWNS), 0);
+
+	ret = sys_mount_setattr(fd, "", AT_EMPTY_PATH | AT_RECURSIVE, &attr, sizeof(attr));
+	ASSERT_LT(ret, 0);
+	ASSERT_EQ(errno, EINVAL);
+}
+
+TEST_HARNESS_MAIN
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 06/39] fs: add id translation helpers
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (4 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 05/39] tests: add mount_setattr() selftests Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 07/39] mount: attach mappings to mounts Christian Brauner
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Add simple helpers to make it easy to map kuids into and from idmapped
mounts. We provide simple wrappers that filesystems can use to
e.g. initialize inodes similar to i_{uid,gid}_read() and
i_{uid,gid}_write(). Accessing an inode through an idmapped mount will
require the inode to be mapped according to the mount's user namespace.
If the fsids are used to compare against inodes or to initialize inodes
they are required to be shifted from the mount's user namespace. Passing
the initial user namespace to these helpers makes them a nop and so any
non-idmapped paths will not be impacted.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig <hch@lst.de>:
  - Get rid of the ifdefs and the config option that hid idmapped mounts.
---
 include/linux/fs.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21cc971fd960..9e487cbf0f5c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -39,6 +39,7 @@
 #include <linux/fs_types.h>
 #include <linux/build_bug.h>
 #include <linux/stddef.h>
+#include <linux/cred.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1574,6 +1575,48 @@ static inline void i_gid_write(struct inode *inode, gid_t gid)
 	inode->i_gid = make_kgid(inode->i_sb->s_user_ns, gid);
 }
 
+static inline kuid_t kuid_into_mnt(struct user_namespace *to, kuid_t kuid)
+{
+	return make_kuid(to, __kuid_val(kuid));
+}
+
+static inline kgid_t kgid_into_mnt(struct user_namespace *to, kgid_t kgid)
+{
+	return make_kgid(to, __kgid_val(kgid));
+}
+
+static inline kuid_t i_uid_into_mnt(struct user_namespace *to,
+				    const struct inode *inode)
+{
+	return kuid_into_mnt(to, inode->i_uid);
+}
+
+static inline kgid_t i_gid_into_mnt(struct user_namespace *to,
+				    const struct inode *inode)
+{
+	return kgid_into_mnt(to, inode->i_gid);
+}
+
+static inline kuid_t kuid_from_mnt(struct user_namespace *to, kuid_t kuid)
+{
+	return KUIDT_INIT(from_kuid(to, kuid));
+}
+
+static inline kgid_t kgid_from_mnt(struct user_namespace *to, kgid_t kgid)
+{
+	return KGIDT_INIT(from_kgid(to, kgid));
+}
+
+static inline kuid_t fsuid_into_mnt(struct user_namespace *to)
+{
+	return kuid_from_mnt(to, current_fsuid());
+}
+
+static inline kgid_t fsgid_into_mnt(struct user_namespace *to)
+{
+	return kgid_from_mnt(to, current_fsgid());
+}
+
 extern struct timespec64 current_time(struct inode *inode);
 
 /*
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (5 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 06/39] fs: add id translation helpers Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-23 15:47   ` Tycho Andersen
  2020-11-15 10:36 ` [PATCH v2 08/39] capability: handle idmapped mounts Christian Brauner
                   ` (34 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

In order to support per-mount idmappings vfsmounts will be marked with user
namespaces. The idmapping associated with that user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts will be marked with the initial user namespace. The
initial user namespace is used to indicate that a mount is not idmapped. All
operations behave as before and no performance impact is seen.

Based on prior discussions we want to attach the whole user namespace and not
just a dedicated idmapping struct. This allows us to reuse all the helpers that
already exist for dealing with idmappings instead of introducing a whole new
range of helpers. In addition, if we decide in the future that we are confident
enough to enable unprivileged user to setup idmapped mounts we can allow
already idmapped mounts to be marked with another user namespace. For now, we
will enforce in later patches that once a mount has been idmapped it can't be
remapped. This keeps permission checking and life-cycle management simple
especially since users can always create a new mount with a different idmapping
anyway.

The idea to attach user namespaces to vfsmounts has been floated around in
various forms at Linux Plumbers in ~2018 with the original idea being tracing
back to a discussion during a conference in St. Petersburg between Christoph,
Tycho, and myself.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
- Christoph Hellwig:
  - Split internal implementation into separate patch and move syscall
    implementation later.
---
 fs/namespace.c        |  6 ++++++
 include/linux/fs.h    |  1 +
 include/linux/mount.h | 12 ++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9fc8b22dba26..15fb0ae3f01f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -220,6 +220,7 @@ static struct mount *alloc_vfsmnt(const char *name)
 		INIT_HLIST_NODE(&mnt->mnt_mp_list);
 		INIT_LIST_HEAD(&mnt->mnt_umounting);
 		INIT_HLIST_HEAD(&mnt->mnt_stuck_children);
+		mnt->mnt.mnt_user_ns = &init_user_ns;
 	}
 	return mnt;
 
@@ -559,6 +560,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 
 static void free_vfsmnt(struct mount *mnt)
 {
+	if (mnt_idmapped(&mnt->mnt) && mnt_user_ns(&mnt->mnt) != &init_user_ns)
+		put_user_ns(mnt_user_ns(&mnt->mnt));
 	kfree_const(mnt->mnt_devname);
 #ifdef CONFIG_SMP
 	free_percpu(mnt->mnt_pcp);
@@ -1067,6 +1070,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
 
 	atomic_inc(&sb->s_active);
+	mnt->mnt.mnt_user_ns = old->mnt.mnt_user_ns;
+	if (mnt_user_ns(&old->mnt) != &init_user_ns)
+		mnt->mnt.mnt_user_ns = get_user_ns(mnt->mnt.mnt_user_ns);
 	mnt->mnt.mnt_sb = sb;
 	mnt->mnt.mnt_root = dget(root);
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9e487cbf0f5c..9e05fb4f997c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2260,6 +2260,7 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
+#define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
 #define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index aaf343b38671..3c7ba1bd4a21 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -31,6 +31,7 @@ struct fs_context;
 #define MNT_RELATIME	0x20
 #define MNT_READONLY	0x40	/* does the user want this to be r/o? */
 #define MNT_NOSYMFOLLOW	0x80
+#define MNT_IDMAPPED	0x400
 
 #define MNT_SHRINKABLE	0x100
 #define MNT_WRITE_HOLD	0x200
@@ -72,8 +73,19 @@ struct vfsmount {
 	struct dentry *mnt_root;	/* root of the mounted tree */
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	int mnt_flags;
+	struct user_namespace *mnt_user_ns;
 } __randomize_layout;
 
+static inline bool mnt_idmapped(const struct vfsmount *mnt)
+{
+	return READ_ONCE(mnt->mnt_flags) & MNT_IDMAPPED;
+}
+
+static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
+{
+	return mnt->mnt_user_ns;
+}
+
 struct file; /* forward dec */
 struct path;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 08/39] capability: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (6 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 07/39] mount: attach mappings to mounts Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 09/39] namei: add idmapped mount aware permission helpers Christian Brauner
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

In order to determine whether a caller holds privilege over a given
inode the capability framework exposes the two helpers
privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
verifies that the inode has a mapping in the caller's user namespace and
the latter additionally verifies that the caller has the requested
capability in their current user namespace. If the inode is accessed
through an idmapped mount we first need to map it according to the
mount's user namespace. Afterwards the checks are identical to
non-idmapped inodes. If the initial user namespace is passed all
operations are a nop so non-idmapped mounts will not see a change in
behavior and will also not see any performance impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 fs/attr.c                  |  8 ++++----
 fs/exec.c                  |  2 +-
 fs/inode.c                 |  2 +-
 fs/namei.c                 | 13 ++++++++-----
 fs/overlayfs/super.c       |  2 +-
 fs/posix_acl.c             |  2 +-
 fs/xfs/xfs_ioctl.c         |  2 +-
 include/linux/capability.h |  7 +++++--
 kernel/capability.c        | 14 +++++++++-----
 security/commoncap.c       |  5 +++--
 10 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index b4bbdbd4c8ca..d270f640a192 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -23,7 +23,7 @@ static bool chown_ok(const struct inode *inode, kuid_t uid)
 	if (uid_eq(current_fsuid(), inode->i_uid) &&
 	    uid_eq(uid, inode->i_uid))
 		return true;
-	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if (capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_CHOWN))
 		return true;
 	if (uid_eq(inode->i_uid, INVALID_UID) &&
 	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
@@ -36,7 +36,7 @@ static bool chgrp_ok(const struct inode *inode, kgid_t gid)
 	if (uid_eq(current_fsuid(), inode->i_uid) &&
 	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
 		return true;
-	if (capable_wrt_inode_uidgid(inode, CAP_CHOWN))
+	if (capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_CHOWN))
 		return true;
 	if (gid_eq(inode->i_gid, INVALID_GID) &&
 	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
@@ -92,7 +92,7 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 		/* Also check the setgid bit! */
 		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
 				inode->i_gid) &&
-		    !capable_wrt_inode_uidgid(inode, CAP_FSETID))
+		    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
 			attr->ia_mode &= ~S_ISGID;
 	}
 
@@ -193,7 +193,7 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
 		umode_t mode = attr->ia_mode;
 
 		if (!in_group_p(inode->i_gid) &&
-		    !capable_wrt_inode_uidgid(inode, CAP_FSETID))
+		    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
 			mode &= ~S_ISGID;
 		inode->i_mode = mode;
 	}
diff --git a/fs/exec.c b/fs/exec.c
index 547a2390baf5..8e75d7a33514 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1398,7 +1398,7 @@ void would_dump(struct linux_binprm *bprm, struct file *file)
 		/* Ensure mm->user_ns contains the executable */
 		user_ns = old = bprm->mm->user_ns;
 		while ((user_ns != &init_user_ns) &&
-		       !privileged_wrt_inode_uidgid(user_ns, inode))
+		       !privileged_wrt_inode_uidgid(user_ns, &init_user_ns, inode))
 			user_ns = user_ns->parent;
 
 		if (old != user_ns) {
diff --git a/fs/inode.c b/fs/inode.c
index 9d78c37b00b8..7a15372d9c2d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2147,7 +2147,7 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
 			mode |= S_ISGID;
 		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
 			 !in_group_p(inode->i_gid) &&
-			 !capable_wrt_inode_uidgid(dir, CAP_FSETID))
+			 !capable_wrt_inode_uidgid(&init_user_ns, dir, CAP_FSETID))
 			mode &= ~S_ISGID;
 	} else
 		inode->i_gid = current_fsgid();
diff --git a/fs/namei.c b/fs/namei.c
index d4a6dd772303..3f52730af6c5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -357,10 +357,11 @@ int generic_permission(struct inode *inode, int mask)
 	if (S_ISDIR(inode->i_mode)) {
 		/* DACs are overridable for directories */
 		if (!(mask & MAY_WRITE))
-			if (capable_wrt_inode_uidgid(inode,
+			if (capable_wrt_inode_uidgid(&init_user_ns, inode,
 						     CAP_DAC_READ_SEARCH))
 				return 0;
-		if (capable_wrt_inode_uidgid(inode, CAP_DAC_OVERRIDE))
+		if (capable_wrt_inode_uidgid(&init_user_ns, inode,
+					     CAP_DAC_OVERRIDE))
 			return 0;
 		return -EACCES;
 	}
@@ -370,7 +371,8 @@ int generic_permission(struct inode *inode, int mask)
 	 */
 	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
 	if (mask == MAY_READ)
-		if (capable_wrt_inode_uidgid(inode, CAP_DAC_READ_SEARCH))
+		if (capable_wrt_inode_uidgid(&init_user_ns, inode,
+					     CAP_DAC_READ_SEARCH))
 			return 0;
 	/*
 	 * Read/write DACs are always overridable.
@@ -378,7 +380,8 @@ int generic_permission(struct inode *inode, int mask)
 	 * at least one exec bit set.
 	 */
 	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
-		if (capable_wrt_inode_uidgid(inode, CAP_DAC_OVERRIDE))
+		if (capable_wrt_inode_uidgid(&init_user_ns, inode,
+					     CAP_DAC_OVERRIDE))
 			return 0;
 
 	return -EACCES;
@@ -2657,7 +2660,7 @@ int __check_sticky(struct inode *dir, struct inode *inode)
 		return 0;
 	if (uid_eq(dir->i_uid, fsuid))
 		return 0;
-	return !capable_wrt_inode_uidgid(inode, CAP_FOWNER);
+	return !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FOWNER);
 }
 EXPORT_SYMBOL(__check_sticky);
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983bcfbb3..196fe3e3f02b 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -972,7 +972,7 @@ ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
 	if (unlikely(inode->i_mode & S_ISGID) &&
 	    handler->flags == ACL_TYPE_ACCESS &&
 	    !in_group_p(inode->i_gid) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_FSETID)) {
+	    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID)) {
 		struct iattr iattr = { .ia_valid = ATTR_KILL_SGID };
 
 		err = ovl_setattr(dentry, &iattr);
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 95882b3f5f62..4ca6d53c6f0a 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -656,7 +656,7 @@ int posix_acl_update_mode(struct inode *inode, umode_t *mode_p,
 	if (error == 0)
 		*acl = NULL;
 	if (!in_group_p(inode->i_gid) &&
-	    !capable_wrt_inode_uidgid(inode, CAP_FSETID))
+	    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
 		mode &= ~S_ISGID;
 	*mode_p = mode;
 	return 0;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3fbd98f61ea5..97bd29fc8c43 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1502,7 +1502,7 @@ xfs_ioctl_setattr(
 	 */
 
 	if ((VFS_I(ip)->i_mode & (S_ISUID|S_ISGID)) &&
-	    !capable_wrt_inode_uidgid(VFS_I(ip), CAP_FSETID))
+	    !capable_wrt_inode_uidgid(&init_user_ns, VFS_I(ip), CAP_FSETID))
 		VFS_I(ip)->i_mode &= ~(S_ISUID|S_ISGID);
 
 	/* Change the ownerships and register project quota modifications */
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 1e7fe311cabe..041e336f3369 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -247,8 +247,11 @@ static inline bool ns_capable_setid(struct user_namespace *ns, int cap)
 	return true;
 }
 #endif /* CONFIG_MULTIUSER */
-extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *inode);
-extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
+extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
+					struct user_namespace *mnt_user_ns,
+					const struct inode *inode);
+extern bool capable_wrt_inode_uidgid(struct user_namespace *mnt_user_ns,
+				     const struct inode *inode, int cap);
 extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
 extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 static inline bool perfmon_capable(void)
diff --git a/kernel/capability.c b/kernel/capability.c
index de7eac903a2a..28e3a599ff7a 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -484,10 +484,12 @@ EXPORT_SYMBOL(file_ns_capable);
  *
  * Return true if the inode uid and gid are within the namespace.
  */
-bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *inode)
+bool privileged_wrt_inode_uidgid(struct user_namespace *ns,
+				 struct user_namespace *mnt_user_ns,
+				 const struct inode *inode)
 {
-	return kuid_has_mapping(ns, inode->i_uid) &&
-		kgid_has_mapping(ns, inode->i_gid);
+	return kuid_has_mapping(ns, i_uid_into_mnt(mnt_user_ns, inode)) &&
+	       kgid_has_mapping(ns, i_gid_into_mnt(mnt_user_ns, inode));
 }
 
 /**
@@ -499,11 +501,13 @@ bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct inode *
  * its own user namespace and that the given inode's uid and gid are
  * mapped into the current user namespace.
  */
-bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
+bool capable_wrt_inode_uidgid(struct user_namespace *mnt_user_ns,
+			      const struct inode *inode, int cap)
 {
 	struct user_namespace *ns = current_user_ns();
 
-	return ns_capable(ns, cap) && privileged_wrt_inode_uidgid(ns, inode);
+	return ns_capable(ns, cap) &&
+	       privileged_wrt_inode_uidgid(ns, mnt_user_ns, inode);
 }
 EXPORT_SYMBOL(capable_wrt_inode_uidgid);
 
diff --git a/security/commoncap.c b/security/commoncap.c
index 59bf3c1674c8..4cd2bdfd0a8b 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -489,7 +489,7 @@ int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size)
 		return -EINVAL;
 	if (!validheader(size, cap))
 		return -EINVAL;
-	if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
+	if (!capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_SETFCAP))
 		return -EPERM;
 	if (size == XATTR_CAPS_SZ_2)
 		if (ns_capable(inode->i_sb->s_user_ns, CAP_SETFCAP))
@@ -957,7 +957,8 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
 		struct inode *inode = d_backing_inode(dentry);
 		if (!inode)
 			return -EINVAL;
-		if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
+		if (!capable_wrt_inode_uidgid(&init_user_ns, inode,
+					      CAP_SETFCAP))
 			return -EPERM;
 		return 0;
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 09/39] namei: add idmapped mount aware permission helpers
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (7 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 08/39] capability: handle idmapped mounts Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 10/39] inode: add idmapped mount aware init and " Christian Brauner
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

The two helpers inode_permission() and generic_permission() are used by the
vfs to perform basic permission checking by verifying that the caller is
privileged over an inode. In order to handle idmapped mounts we extend the
two helpers with an additional user namespace argument. On idmapped mounts
the two helpers will make sure to map the inode according to the mount's
user namespace and then peform identical permission checks to
inode_permission() and generic_permission(). If the initial user namespace
is passed nothing changes so there will be no performance impact on
non-idmapped mounts.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 fs/attr.c                          |  3 +-
 fs/btrfs/inode.c                   |  2 +-
 fs/btrfs/ioctl.c                   | 10 ++---
 fs/ceph/inode.c                    |  2 +-
 fs/cifs/cifsfs.c                   |  2 +-
 fs/configfs/symlink.c              |  2 +-
 fs/ecryptfs/inode.c                |  2 +-
 fs/exec.c                          |  2 +-
 fs/fuse/dir.c                      |  4 +-
 fs/gfs2/inode.c                    |  2 +-
 fs/hostfs/hostfs_kern.c            |  2 +-
 fs/init.c                          |  9 ++--
 fs/kernfs/inode.c                  |  2 +-
 fs/libfs.c                         |  7 ++-
 fs/namei.c                         | 68 +++++++++++++++++-------------
 fs/nfs/dir.c                       |  2 +-
 fs/nfsd/nfsfh.c                    |  2 +-
 fs/nfsd/vfs.c                      |  4 +-
 fs/nilfs2/inode.c                  |  2 +-
 fs/notify/fanotify/fanotify_user.c |  2 +-
 fs/notify/inotify/inotify_user.c   |  2 +-
 fs/ocfs2/file.c                    |  2 +-
 fs/ocfs2/refcounttree.c            |  4 +-
 fs/open.c                          | 10 ++---
 fs/orangefs/inode.c                |  2 +-
 fs/overlayfs/file.c                |  2 +-
 fs/overlayfs/inode.c               |  4 +-
 fs/overlayfs/util.c                |  2 +-
 fs/posix_acl.c                     | 17 +++++---
 fs/proc/base.c                     |  4 +-
 fs/proc/fd.c                       |  2 +-
 fs/reiserfs/xattr.c                |  2 +-
 fs/remap_range.c                   |  2 +-
 fs/udf/file.c                      |  2 +-
 fs/verity/enable.c                 |  2 +-
 fs/xattr.c                         |  2 +-
 include/linux/fs.h                 |  4 +-
 include/linux/posix_acl.h          |  4 +-
 ipc/mqueue.c                       |  2 +-
 kernel/bpf/inode.c                 |  4 +-
 kernel/cgroup/cgroup.c             |  2 +-
 kernel/sys.c                       |  2 +-
 mm/madvise.c                       |  2 +-
 mm/memcontrol.c                    |  2 +-
 mm/mincore.c                       |  2 +-
 net/unix/af_unix.c                 |  2 +-
 46 files changed, 122 insertions(+), 96 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index d270f640a192..c9e29e589cec 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -244,7 +244,8 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de
 			return -EPERM;
 
 		if (!inode_owner_or_capable(inode)) {
-			error = inode_permission(inode, MAY_WRITE);
+			error = inode_permission(&init_user_ns, inode,
+						 MAY_WRITE);
 			if (error)
 				return error;
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da58c58ef9aa..32e3bf88d4f7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9794,7 +9794,7 @@ static int btrfs_permission(struct inode *inode, int mask)
 		if (BTRFS_I(inode)->flags & BTRFS_INODE_READONLY)
 			return -EACCES;
 	}
-	return generic_permission(inode, mask);
+	return generic_permission(&init_user_ns, inode, mask);
 }
 
 static int btrfs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ab408a23ba32..771ee08920ed 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -910,7 +910,7 @@ static int btrfs_may_delete(struct inode *dir, struct dentry *victim, int isdir)
 	BUG_ON(d_inode(victim->d_parent) != dir);
 	audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);
 
-	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	error = inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 	if (IS_APPEND(dir))
@@ -939,7 +939,7 @@ static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	return inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 }
 
 /*
@@ -2489,7 +2489,7 @@ static int btrfs_search_path_in_tree_user(struct inode *inode,
 				ret = PTR_ERR(temp_inode);
 				goto out_put;
 			}
-			ret = inode_permission(temp_inode, MAY_READ | MAY_EXEC);
+			ret = inode_permission(&init_user_ns, temp_inode, MAY_READ | MAY_EXEC);
 			iput(temp_inode);
 			if (ret) {
 				ret = -EACCES;
@@ -3019,7 +3019,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
 		if (root == dest)
 			goto out_dput;
 
-		err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
+		err = inode_permission(&init_user_ns, inode, MAY_WRITE | MAY_EXEC);
 		if (err)
 			goto out_dput;
 	}
@@ -3090,7 +3090,7 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
 		 * running and allows defrag on files open in read-only mode.
 		 */
 		if (!capable(CAP_SYS_ADMIN) &&
-		    inode_permission(inode, MAY_WRITE)) {
+		    inode_permission(&init_user_ns, inode, MAY_WRITE)) {
 			ret = -EPERM;
 			goto out;
 		}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 526faf4778ce..abfe42df8a1a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2335,7 +2335,7 @@ int ceph_permission(struct inode *inode, int mask)
 	err = ceph_do_getattr(inode, CEPH_CAP_AUTH_SHARED, false);
 
 	if (!err)
-		err = generic_permission(inode, mask);
+		err = generic_permission(&init_user_ns, inode, mask);
 	return err;
 }
 
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 472cb7777e3e..f79dbc581eee 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -316,7 +316,7 @@ static int cifs_permission(struct inode *inode, int mask)
 		on the client (above and beyond ACL on servers) for
 		servers which do not support setting and viewing mode bits,
 		so allowing client to check permissions is useful */
-		return generic_permission(inode, mask);
+		return generic_permission(&init_user_ns, inode, mask);
 }
 
 static struct kmem_cache *cifs_inode_cachep;
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index cb61467478ca..0b592c55f38e 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -197,7 +197,7 @@ int configfs_symlink(struct inode *dir, struct dentry *dentry, const char *symna
 	if (dentry->d_inode || d_unhashed(dentry))
 		ret = -EEXIST;
 	else
-		ret = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+		ret = inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 	if (!ret)
 		ret = type->ct_item_ops->allow_link(parent_item, target_item);
 	if (!ret) {
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index e23752d9a79f..9b1ae410983c 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -864,7 +864,7 @@ int ecryptfs_truncate(struct dentry *dentry, loff_t new_length)
 static int
 ecryptfs_permission(struct inode *inode, int mask)
 {
-	return inode_permission(ecryptfs_inode_to_lower(inode), mask);
+	return inode_permission(&init_user_ns, ecryptfs_inode_to_lower(inode), mask);
 }
 
 /**
diff --git a/fs/exec.c b/fs/exec.c
index 8e75d7a33514..b499a1a03934 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1391,7 +1391,7 @@ EXPORT_SYMBOL(begin_new_exec);
 void would_dump(struct linux_binprm *bprm, struct file *file)
 {
 	struct inode *inode = file_inode(file);
-	if (inode_permission(inode, MAY_READ) < 0) {
+	if (inode_permission(&init_user_ns, inode, MAY_READ) < 0) {
 		struct user_namespace *old, *user_ns;
 		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
 
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index ff7dbeb16f88..7ce7baed3f4f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1254,7 +1254,7 @@ static int fuse_permission(struct inode *inode, int mask)
 	}
 
 	if (fc->default_permissions) {
-		err = generic_permission(inode, mask);
+		err = generic_permission(&init_user_ns, inode, mask);
 
 		/* If permission is denied, try to refresh file
 		   attributes.  This is also needed, because the root
@@ -1262,7 +1262,7 @@ static int fuse_permission(struct inode *inode, int mask)
 		if (err == -EACCES && !refreshed) {
 			err = fuse_perm_getattr(inode, mask);
 			if (!err)
-				err = generic_permission(inode, mask);
+				err = generic_permission(&init_user_ns, inode, mask);
 		}
 
 		/* Note: the opposite of the above test does not
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 6774865f5b5b..5d6ffa898030 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1850,7 +1850,7 @@ int gfs2_permission(struct inode *inode, int mask)
 	if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
 		error = -EPERM;
 	else
-		error = generic_permission(inode, mask);
+		error = generic_permission(&init_user_ns, inode, mask);
 	if (gfs2_holder_initialized(&i_gh))
 		gfs2_glock_dq_uninit(&i_gh);
 
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index c070c0d8e3e9..36da0c31d053 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -779,7 +779,7 @@ static int hostfs_permission(struct inode *ino, int desired)
 		err = access_file(name, r, w, x);
 	__putname(name);
 	if (!err)
-		err = generic_permission(ino, desired);
+		err = generic_permission(&init_user_ns, ino, desired);
 	return err;
 }
 
diff --git a/fs/init.c b/fs/init.c
index e9c320a48cf1..2b4842f4802b 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -49,7 +49,8 @@ int __init init_chdir(const char *filename)
 	error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path);
 	if (error)
 		return error;
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = inode_permission(&init_user_ns, path.dentry->d_inode,
+				 MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &path);
 	path_put(&path);
@@ -64,7 +65,8 @@ int __init init_chroot(const char *filename)
 	error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path);
 	if (error)
 		return error;
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = inode_permission(&init_user_ns, path.dentry->d_inode,
+				 MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 	error = -EPERM;
@@ -118,7 +120,8 @@ int __init init_eaccess(const char *filename)
 	error = kern_path(filename, LOOKUP_FOLLOW, &path);
 	if (error)
 		return error;
-	error = inode_permission(d_inode(path.dentry), MAY_ACCESS);
+	error = inode_permission(&init_user_ns, d_inode(path.dentry),
+				 MAY_ACCESS);
 	path_put(&path);
 	return error;
 }
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index fc2469a20fed..ff5598cc1de0 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -285,7 +285,7 @@ int kernfs_iop_permission(struct inode *inode, int mask)
 	kernfs_refresh_inode(kn, inode);
 	mutex_unlock(&kernfs_mutex);
 
-	return generic_permission(inode, mask);
+	return generic_permission(&init_user_ns, inode, mask);
 }
 
 int kernfs_xattr_get(struct kernfs_node *kn, const char *name,
diff --git a/fs/libfs.c b/fs/libfs.c
index fc34361c1489..23d0a00668fd 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1316,9 +1316,14 @@ static ssize_t empty_dir_listxattr(struct dentry *dentry, char *list, size_t siz
 	return -EOPNOTSUPP;
 }
 
+static int empty_dir_permission(struct inode *inode, int mask)
+{
+	return generic_permission(&init_user_ns, inode, mask);
+}
+
 static const struct inode_operations empty_dir_inode_operations = {
 	.lookup		= empty_dir_lookup,
-	.permission	= generic_permission,
+	.permission	= empty_dir_permission,
 	.setattr	= empty_dir_setattr,
 	.getattr	= empty_dir_getattr,
 	.listxattr	= empty_dir_listxattr,
diff --git a/fs/namei.c b/fs/namei.c
index 3f52730af6c5..38222f92efb6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -259,7 +259,7 @@ void putname(struct filename *name)
 		__putname(name);
 }
 
-static int check_acl(struct inode *inode, int mask)
+static int check_acl(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 #ifdef CONFIG_FS_POSIX_ACL
 	struct posix_acl *acl;
@@ -271,14 +271,14 @@ static int check_acl(struct inode *inode, int mask)
 		/* no ->get_acl() calls in RCU mode... */
 		if (is_uncached_acl(acl))
 			return -ECHILD;
-	        return posix_acl_permission(inode, acl, mask);
+	        return posix_acl_permission(user_ns, inode, acl, mask);
 	}
 
 	acl = get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
 	if (acl) {
-	        int error = posix_acl_permission(inode, acl, mask);
+	        int error = posix_acl_permission(user_ns, inode, acl, mask);
 	        posix_acl_release(acl);
 	        return error;
 	}
@@ -293,12 +293,15 @@ static int check_acl(struct inode *inode, int mask)
  * Note that the POSIX ACL check cares about the MAY_NOT_BLOCK bit,
  * for RCU walking.
  */
-static int acl_permission_check(struct inode *inode, int mask)
+static int acl_permission_check(struct user_namespace *user_ns,
+				struct inode *inode, int mask)
 {
 	unsigned int mode = inode->i_mode;
+	kuid_t i_uid;
 
 	/* Are we the owner? If so, ACL's don't matter */
-	if (likely(uid_eq(current_fsuid(), inode->i_uid))) {
+	i_uid = i_uid_into_mnt(user_ns, inode);
+	if (likely(uid_eq(current_fsuid(), i_uid))) {
 		mask &= 7;
 		mode >>= 6;
 		return (mask & ~mode) ? -EACCES : 0;
@@ -306,7 +309,7 @@ static int acl_permission_check(struct inode *inode, int mask)
 
 	/* Do we have ACL's? */
 	if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
-		int error = check_acl(inode, mask);
+		int error = check_acl(user_ns, inode, mask);
 		if (error != -EAGAIN)
 			return error;
 	}
@@ -320,7 +323,8 @@ static int acl_permission_check(struct inode *inode, int mask)
 	 * about? Need to check group ownership if so.
 	 */
 	if (mask & (mode ^ (mode >> 3))) {
-		if (in_group_p(inode->i_gid))
+		kgid_t kgid = i_gid_into_mnt(user_ns, inode);
+		if (in_group_p(kgid))
 			mode >>= 3;
 	}
 
@@ -343,25 +347,25 @@ static int acl_permission_check(struct inode *inode, int mask)
  * request cannot be satisfied (eg. requires blocking or too much complexity).
  * It would then be called again in ref-walk mode.
  */
-int generic_permission(struct inode *inode, int mask)
+int generic_permission(struct user_namespace *user_ns, struct inode *inode,
+		       int mask)
 {
 	int ret;
 
 	/*
 	 * Do the basic permission checks.
 	 */
-	ret = acl_permission_check(inode, mask);
+	ret = acl_permission_check(user_ns, inode, mask);
 	if (ret != -EACCES)
 		return ret;
 
 	if (S_ISDIR(inode->i_mode)) {
 		/* DACs are overridable for directories */
 		if (!(mask & MAY_WRITE))
-			if (capable_wrt_inode_uidgid(&init_user_ns, inode,
+			if (capable_wrt_inode_uidgid(user_ns, inode,
 						     CAP_DAC_READ_SEARCH))
 				return 0;
-		if (capable_wrt_inode_uidgid(&init_user_ns, inode,
-					     CAP_DAC_OVERRIDE))
+		if (capable_wrt_inode_uidgid(user_ns, inode, CAP_DAC_OVERRIDE))
 			return 0;
 		return -EACCES;
 	}
@@ -371,7 +375,7 @@ int generic_permission(struct inode *inode, int mask)
 	 */
 	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
 	if (mask == MAY_READ)
-		if (capable_wrt_inode_uidgid(&init_user_ns, inode,
+		if (capable_wrt_inode_uidgid(user_ns, inode,
 					     CAP_DAC_READ_SEARCH))
 			return 0;
 	/*
@@ -380,8 +384,7 @@ int generic_permission(struct inode *inode, int mask)
 	 * at least one exec bit set.
 	 */
 	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
-		if (capable_wrt_inode_uidgid(&init_user_ns, inode,
-					     CAP_DAC_OVERRIDE))
+		if (capable_wrt_inode_uidgid(user_ns, inode, CAP_DAC_OVERRIDE))
 			return 0;
 
 	return -EACCES;
@@ -394,7 +397,8 @@ EXPORT_SYMBOL(generic_permission);
  * flag in inode->i_opflags, that says "this has not special
  * permission function, use the fast case".
  */
-static inline int do_inode_permission(struct inode *inode, int mask)
+static inline int do_inode_permission(struct user_namespace *user_ns,
+				      struct inode *inode, int mask)
 {
 	if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
 		if (likely(inode->i_op->permission))
@@ -405,7 +409,7 @@ static inline int do_inode_permission(struct inode *inode, int mask)
 		inode->i_opflags |= IOP_FASTPERM;
 		spin_unlock(&inode->i_lock);
 	}
-	return generic_permission(inode, mask);
+	return generic_permission(user_ns, inode, mask);
 }
 
 /**
@@ -430,6 +434,7 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
 
 /**
  * inode_permission - Check for access rights to a given inode
+ * @userns: The user namespace the inode is seen from
  * @inode: Inode to check permission on
  * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
  *
@@ -439,7 +444,8 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
  *
  * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
  */
-int inode_permission(struct inode *inode, int mask)
+int inode_permission(struct user_namespace *user_ns,
+		     struct inode *inode, int mask)
 {
 	int retval;
 
@@ -463,7 +469,7 @@ int inode_permission(struct inode *inode, int mask)
 			return -EACCES;
 	}
 
-	retval = do_inode_permission(inode, mask);
+	retval = do_inode_permission(user_ns, inode, mask);
 	if (retval)
 		return retval;
 
@@ -1009,7 +1015,7 @@ static bool safe_hardlink_source(struct inode *inode)
 		return false;
 
 	/* Hardlinking to unreadable or unwritable sources is dangerous. */
-	if (inode_permission(inode, MAY_READ | MAY_WRITE))
+	if (inode_permission(&init_user_ns, inode, MAY_READ | MAY_WRITE))
 		return false;
 
 	return true;
@@ -1569,13 +1575,14 @@ static struct dentry *lookup_slow(const struct qstr *name,
 static inline int may_lookup(struct nameidata *nd)
 {
 	if (nd->flags & LOOKUP_RCU) {
-		int err = inode_permission(nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
+		int err = inode_permission(&init_user_ns, nd->inode,
+					   MAY_EXEC | MAY_NOT_BLOCK);
 		if (err != -ECHILD)
 			return err;
 		if (unlazy_walk(nd))
 			return -ECHILD;
 	}
-	return inode_permission(nd->inode, MAY_EXEC);
+	return inode_permission(&init_user_ns, nd->inode, MAY_EXEC);
 }
 
 static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
@@ -2507,7 +2514,7 @@ static int lookup_one_len_common(const char *name, struct dentry *base,
 			return err;
 	}
 
-	return inode_permission(base->d_inode, MAY_EXEC);
+	return inode_permission(&init_user_ns, base->d_inode, MAY_EXEC);
 }
 
 /**
@@ -2701,7 +2708,7 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
 
 	audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);
 
-	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	error = inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 	if (IS_APPEND(dir))
@@ -2745,7 +2752,7 @@ static inline int may_create(struct inode *dir, struct dentry *child)
 	if (!kuid_has_mapping(s_user_ns, current_fsuid()) ||
 	    !kgid_has_mapping(s_user_ns, current_fsgid()))
 		return -EOVERFLOW;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	return inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 }
 
 /*
@@ -2875,7 +2882,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 		break;
 	}
 
-	error = inode_permission(inode, MAY_OPEN | acc_mode);
+	error = inode_permission(&init_user_ns, inode, MAY_OPEN | acc_mode);
 	if (error)
 		return error;
 
@@ -2937,7 +2944,8 @@ static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t m
 	    !kgid_has_mapping(s_user_ns, current_fsgid()))
 		return -EOVERFLOW;
 
-	error = inode_permission(dir->dentry->d_inode, MAY_WRITE | MAY_EXEC);
+	error = inode_permission(&init_user_ns, dir->dentry->d_inode,
+				 MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 
@@ -3274,7 +3282,7 @@ struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag)
 	int error;
 
 	/* we want directory to be writable */
-	error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	error = inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 	if (error)
 		goto out_err;
 	error = -EOPNOTSUPP;
@@ -4265,12 +4273,12 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	 */
 	if (new_dir != old_dir) {
 		if (is_dir) {
-			error = inode_permission(source, MAY_WRITE);
+			error = inode_permission(&init_user_ns, source, MAY_WRITE);
 			if (error)
 				return error;
 		}
 		if ((flags & RENAME_EXCHANGE) && new_is_dir) {
-			error = inode_permission(target, MAY_WRITE);
+			error = inode_permission(&init_user_ns, target, MAY_WRITE);
 			if (error)
 				return error;
 		}
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index cb52db9a0cfb..c890144c496a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2793,7 +2793,7 @@ int nfs_permission(struct inode *inode, int mask)
 
 	res = nfs_revalidate_inode(NFS_SERVER(inode), inode);
 	if (res == 0)
-		res = generic_permission(inode, mask);
+		res = generic_permission(&init_user_ns, inode, mask);
 	goto out;
 }
 EXPORT_SYMBOL_GPL(nfs_permission);
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c81dbbad8792..380f0f74740c 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -40,7 +40,7 @@ static int nfsd_acceptable(void *expv, struct dentry *dentry)
 		/* make sure parents give x permission to user */
 		int err;
 		parent = dget_parent(tdentry);
-		err = inode_permission(d_inode(parent), MAY_EXEC);
+		err = inode_permission(&init_user_ns, d_inode(parent), MAY_EXEC);
 		if (err < 0) {
 			dput(parent);
 			break;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 1ecaceebee13..3cad79c3b441 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -2380,13 +2380,13 @@ nfsd_permission(struct svc_rqst *rqstp, struct svc_export *exp,
 		return 0;
 
 	/* This assumes  NFSD_MAY_{READ,WRITE,EXEC} == MAY_{READ,WRITE,EXEC} */
-	err = inode_permission(inode, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
+	err = inode_permission(&init_user_ns, inode, acc & (MAY_READ|MAY_WRITE|MAY_EXEC));
 
 	/* Allow read access to binaries even when mode 111 */
 	if (err == -EACCES && S_ISREG(inode->i_mode) &&
 	     (acc == (NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE) ||
 	      acc == (NFSD_MAY_READ | NFSD_MAY_READ_IF_EXEC)))
-		err = inode_permission(inode, MAY_EXEC);
+		err = inode_permission(&init_user_ns, inode, MAY_EXEC);
 
 	return err? nfserrno(err) : 0;
 }
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 745d371d6fea..b6517220cad5 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -851,7 +851,7 @@ int nilfs_permission(struct inode *inode, int mask)
 	    root->cno != NILFS_CPTREE_CURRENT_CNO)
 		return -EROFS; /* snapshot is not writable */
 
-	return generic_permission(inode, mask);
+	return generic_permission(&init_user_ns, inode, mask);
 }
 
 int nilfs_load_inode_block(struct inode *inode, struct buffer_head **pbh)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 3e01d8f2ab90..de4d01bb1d8d 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -702,7 +702,7 @@ static int fanotify_find_path(int dfd, const char __user *filename,
 	}
 
 	/* you can only watch an inode if you have read permissions on it */
-	ret = inode_permission(path->dentry->d_inode, MAY_READ);
+	ret = inode_permission(&init_user_ns, path->dentry->d_inode, MAY_READ);
 	if (ret) {
 		path_put(path);
 		goto out;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 186722ba3894..e995fd4e4e53 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -343,7 +343,7 @@ static int inotify_find_inode(const char __user *dirname, struct path *path,
 	if (error)
 		return error;
 	/* you can only watch an inode if you have read permissions on it */
-	error = inode_permission(path->dentry->d_inode, MAY_READ);
+	error = inode_permission(&init_user_ns, path->dentry->d_inode, MAY_READ);
 	if (error) {
 		path_put(path);
 		return error;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 85979e2214b3..0c75619adf54 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1355,7 +1355,7 @@ int ocfs2_permission(struct inode *inode, int mask)
 		dump_stack();
 	}
 
-	ret = generic_permission(inode, mask);
+	ret = generic_permission(&init_user_ns, inode, mask);
 
 	ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
 out:
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 3b397fa9c9e8..c26937824be1 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -4346,7 +4346,7 @@ static inline int ocfs2_may_create(struct inode *dir, struct dentry *child)
 		return -EEXIST;
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
-	return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+	return inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 }
 
 /**
@@ -4400,7 +4400,7 @@ static int ocfs2_vfs_reflink(struct dentry *old_dentry, struct inode *dir,
 	 * file.
 	 */
 	if (!preserve) {
-		error = inode_permission(inode, MAY_READ);
+		error = inode_permission(&init_user_ns, inode, MAY_READ);
 		if (error)
 			return error;
 	}
diff --git a/fs/open.c b/fs/open.c
index 9af548fb841b..b0e8430e29f3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -83,7 +83,7 @@ long vfs_truncate(const struct path *path, loff_t length)
 	if (error)
 		goto out;
 
-	error = inode_permission(inode, MAY_WRITE);
+	error = inode_permission(&init_user_ns, inode, MAY_WRITE);
 	if (error)
 		goto mnt_drop_write_and_out;
 
@@ -436,7 +436,7 @@ static long do_faccessat(int dfd, const char __user *filename, int mode, int fla
 			goto out_path_release;
 	}
 
-	res = inode_permission(inode, mode | MAY_ACCESS);
+	res = inode_permission(&init_user_ns, inode, mode | MAY_ACCESS);
 	/* SuS v2 requires we report a read only fs too */
 	if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
 		goto out_path_release;
@@ -492,7 +492,7 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 	if (error)
 		goto out;
 
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = inode_permission(&init_user_ns, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
@@ -521,7 +521,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 	if (!d_can_lookup(f.file->f_path.dentry))
 		goto out_putf;
 
-	error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR);
+	error = inode_permission(&init_user_ns, file_inode(f.file), MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &f.file->f_path);
 out_putf:
@@ -540,7 +540,7 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
 	if (error)
 		goto out;
 
-	error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	error = inode_permission(&init_user_ns, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 48f0547d4850..4c790cc8042d 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -933,7 +933,7 @@ int orangefs_permission(struct inode *inode, int mask)
 	if (ret < 0)
 		return ret;
 
-	return generic_permission(inode, mask);
+	return generic_permission(&init_user_ns, inode, mask);
 }
 
 int orangefs_update_time(struct inode *inode, struct timespec64 *time, int flags)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index efccb7c1f9bc..f966b5108358 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -50,7 +50,7 @@ static struct file *ovl_open_realfile(const struct file *file,
 		acc_mode |= MAY_APPEND;
 
 	old_cred = ovl_override_creds(inode->i_sb);
-	err = inode_permission(realinode, MAY_OPEN | acc_mode);
+	err = inode_permission(&init_user_ns, realinode, MAY_OPEN | acc_mode);
 	if (err) {
 		realfile = ERR_PTR(err);
 	} else if (!inode_owner_or_capable(realinode)) {
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index b584dca845ba..c8e3c17ca131 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -294,7 +294,7 @@ int ovl_permission(struct inode *inode, int mask)
 	 * Check overlay inode with the creds of task and underlying inode
 	 * with creds of mounter
 	 */
-	err = generic_permission(inode, mask);
+	err = generic_permission(&init_user_ns, inode, mask);
 	if (err)
 		return err;
 
@@ -305,7 +305,7 @@ int ovl_permission(struct inode *inode, int mask)
 		/* Make sure mounter can read file for copy up later */
 		mask |= MAY_READ;
 	}
-	err = inode_permission(realinode, mask);
+	err = inode_permission(&init_user_ns, realinode, mask);
 	revert_creds(old_cred);
 
 	return err;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 23f475627d07..ff67da201303 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -476,7 +476,7 @@ struct file *ovl_path_open(struct path *path, int flags)
 		BUG();
 	}
 
-	err = inode_permission(inode, acc_mode | MAY_OPEN);
+	err = inode_permission(&init_user_ns, inode, acc_mode | MAY_OPEN);
 	if (err)
 		return ERR_PTR(err);
 
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 4ca6d53c6f0a..5b6296cc89c4 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -345,10 +345,13 @@ EXPORT_SYMBOL(posix_acl_from_mode);
  * by the acl. Returns -E... otherwise.
  */
 int
-posix_acl_permission(struct inode *inode, const struct posix_acl *acl, int want)
+posix_acl_permission(struct user_namespace *user_ns, struct inode *inode,
+		     const struct posix_acl *acl, int want)
 {
 	const struct posix_acl_entry *pa, *pe, *mask_obj;
 	int found = 0;
+	kuid_t uid;
+	kgid_t gid;
 
 	want &= MAY_READ | MAY_WRITE | MAY_EXEC;
 
@@ -356,22 +359,26 @@ posix_acl_permission(struct inode *inode, const struct posix_acl *acl, int want)
                 switch(pa->e_tag) {
                         case ACL_USER_OBJ:
 				/* (May have been checked already) */
-				if (uid_eq(inode->i_uid, current_fsuid()))
+				uid = i_uid_into_mnt(user_ns, inode);
+				if (uid_eq(uid, current_fsuid()))
                                         goto check_perm;
                                 break;
                         case ACL_USER:
-				if (uid_eq(pa->e_uid, current_fsuid()))
+				uid = kuid_into_mnt(user_ns, pa->e_uid);
+				if (uid_eq(uid, current_fsuid()))
                                         goto mask;
 				break;
                         case ACL_GROUP_OBJ:
-                                if (in_group_p(inode->i_gid)) {
+				gid = i_gid_into_mnt(user_ns, inode);
+				if (in_group_p(gid)) {
 					found = 1;
 					if ((pa->e_perm & want) == want)
 						goto mask;
                                 }
 				break;
                         case ACL_GROUP:
-				if (in_group_p(pa->e_gid)) {
+				gid = kgid_into_mnt(user_ns, pa->e_gid);
+				if (in_group_p(gid)) {
 					found = 1;
 					if ((pa->e_perm & want) == want)
 						goto mask;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0f707003dda5..3d21c0150e05 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -751,7 +751,7 @@ static int proc_pid_permission(struct inode *inode, int mask)
 
 		return -EPERM;
 	}
-	return generic_permission(inode, mask);
+	return generic_permission(&init_user_ns, inode, mask);
 }
 
 
@@ -3487,7 +3487,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
 		return 0;
 	}
 
-	return generic_permission(inode, mask);
+	return generic_permission(&init_user_ns, inode, mask);
 }
 
 static const struct inode_operations proc_tid_comm_inode_operations = {
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 81882a13212d..ad692a39381d 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -299,7 +299,7 @@ int proc_fd_permission(struct inode *inode, int mask)
 	struct task_struct *p;
 	int rv;
 
-	rv = generic_permission(inode, mask);
+	rv = generic_permission(&init_user_ns, inode, mask);
 	if (rv == 0)
 		return rv;
 
diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c
index fe63a7c3e0da..ec440d1957a1 100644
--- a/fs/reiserfs/xattr.c
+++ b/fs/reiserfs/xattr.c
@@ -957,7 +957,7 @@ int reiserfs_permission(struct inode *inode, int mask)
 	if (IS_PRIVATE(inode))
 		return 0;
 
-	return generic_permission(inode, mask);
+	return generic_permission(&init_user_ns, inode, mask);
 }
 
 static int xattr_hide_revalidate(struct dentry *dentry, unsigned int flags)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index e6099beefa97..9e5b27641756 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -438,7 +438,7 @@ static bool allow_file_dedupe(struct file *file)
 		return true;
 	if (uid_eq(current_fsuid(), file_inode(file)->i_uid))
 		return true;
-	if (!inode_permission(file_inode(file), MAY_WRITE))
+	if (!inode_permission(&init_user_ns, file_inode(file), MAY_WRITE))
 		return true;
 	return false;
 }
diff --git a/fs/udf/file.c b/fs/udf/file.c
index ad8eefad27d7..928283925d68 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -183,7 +183,7 @@ long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	long old_block, new_block;
 	int result;
 
-	if (inode_permission(inode, MAY_READ) != 0) {
+	if (inode_permission(&init_user_ns, inode, MAY_READ) != 0) {
 		udf_debug("no permission to access inode %lu\n", inode->i_ino);
 		return -EPERM;
 	}
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 5ab3bbec8108..7449ef0050f4 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -369,7 +369,7 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
 	 * has verity enabled, and to stabilize the data being hashed.
 	 */
 
-	err = inode_permission(inode, MAY_WRITE);
+	err = inode_permission(&init_user_ns, inode, MAY_WRITE);
 	if (err)
 		return err;
 
diff --git a/fs/xattr.c b/fs/xattr.c
index cd7a563e8bcd..61a9947f62f4 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -131,7 +131,7 @@ xattr_permission(struct inode *inode, const char *name, int mask)
 			return -EPERM;
 	}
 
-	return inode_permission(inode, mask);
+	return inode_permission(&init_user_ns, inode, mask);
 }
 
 /*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9e05fb4f997c..8f6fb065450b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2787,8 +2787,8 @@ static inline int bmap(struct inode *inode,  sector_t *block)
 #endif
 
 extern int notify_change(struct dentry *, struct iattr *, struct inode **);
-extern int inode_permission(struct inode *, int);
-extern int generic_permission(struct inode *, int);
+extern int inode_permission(struct user_namespace *, struct inode *, int);
+extern int generic_permission(struct user_namespace *, struct inode *, int);
 extern int __check_sticky(struct inode *dir, struct inode *inode);
 
 static inline bool execute_ok(struct inode *inode)
diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 90797f1b421d..8276baefed13 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -15,6 +15,8 @@
 #include <linux/refcount.h>
 #include <uapi/linux/posix_acl.h>
 
+struct user_namespace;
+
 struct posix_acl_entry {
 	short			e_tag;
 	unsigned short		e_perm;
@@ -62,7 +64,7 @@ posix_acl_release(struct posix_acl *acl)
 extern void posix_acl_init(struct posix_acl *, int);
 extern struct posix_acl *posix_acl_alloc(int, gfp_t);
 extern int posix_acl_valid(struct user_namespace *, const struct posix_acl *);
-extern int posix_acl_permission(struct inode *, const struct posix_acl *, int);
+extern int posix_acl_permission(struct user_namespace *, struct inode *, const struct posix_acl *, int);
 extern struct posix_acl *posix_acl_from_mode(umode_t, gfp_t);
 extern int posix_acl_equiv_mode(const struct posix_acl *, umode_t *);
 extern int __posix_acl_create(struct posix_acl **, gfp_t, umode_t *);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index beff0cfcd1e8..693f01fe1216 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -873,7 +873,7 @@ static int prepare_open(struct dentry *dentry, int oflag, int ro,
 	if ((oflag & O_ACCMODE) == (O_RDWR | O_WRONLY))
 		return -EINVAL;
 	acc = oflag2acc[oflag & O_ACCMODE];
-	return inode_permission(d_inode(dentry), acc);
+	return inode_permission(&init_user_ns, d_inode(dentry), acc);
 }
 
 static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index dd4b7fd60ee7..f1c393e5d47d 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -507,7 +507,7 @@ static void *bpf_obj_do_get(const char __user *pathname,
 		return ERR_PTR(ret);
 
 	inode = d_backing_inode(path.dentry);
-	ret = inode_permission(inode, ACC_MODE(flags));
+	ret = inode_permission(&init_user_ns, inode, ACC_MODE(flags));
 	if (ret)
 		goto out;
 
@@ -558,7 +558,7 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
 static struct bpf_prog *__get_prog_inode(struct inode *inode, enum bpf_prog_type type)
 {
 	struct bpf_prog *prog;
-	int ret = inode_permission(inode, MAY_READ);
+	int ret = inode_permission(&init_user_ns, inode, MAY_READ);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e41c21819ba0..b13aab5a9715 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4673,7 +4673,7 @@ static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)
 	if (!inode)
 		return -ENOMEM;
 
-	ret = inode_permission(inode, MAY_WRITE);
+	ret = inode_permission(&init_user_ns, inode, MAY_WRITE);
 	iput(inode);
 	return ret;
 }
diff --git a/kernel/sys.c b/kernel/sys.c
index a730c03ee607..469a659c8e84 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1847,7 +1847,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	if (!S_ISREG(inode->i_mode) || path_noexec(&exe.file->f_path))
 		goto exit;
 
-	err = inode_permission(inode, MAY_EXEC);
+	err = inode_permission(&init_user_ns, inode, MAY_EXEC);
 	if (err)
 		goto exit;
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 416a56b8e757..8afabc363b6b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -540,7 +540,7 @@ static inline bool can_do_pageout(struct vm_area_struct *vma)
 	 * opens a side channel.
 	 */
 	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
-		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
+		inode_permission(&init_user_ns, file_inode(vma->vm_file), MAY_WRITE) == 0;
 }
 
 static long madvise_pageout(struct vm_area_struct *vma,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3a24e3b619f5..d876e82055d8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4900,7 +4900,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 
 	/* the process need read permission on control file */
 	/* AV: shouldn't we check that it's been opened for read instead? */
-	ret = inode_permission(file_inode(cfile.file), MAY_READ);
+	ret = inode_permission(&init_user_ns, file_inode(cfile.file), MAY_READ);
 	if (ret < 0)
 		goto out_put_cfile;
 
diff --git a/mm/mincore.c b/mm/mincore.c
index 02db1a834021..d5a58e61eac6 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -167,7 +167,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma)
 	 * mappings, which opens a side channel.
 	 */
 	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
-		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
+		inode_permission(&init_user_ns, file_inode(vma->vm_file), MAY_WRITE) == 0;
 }
 
 static const struct mm_walk_ops mincore_walk_ops = {
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 41c3303c3357..f568526d4a02 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -936,7 +936,7 @@ static struct sock *unix_find_other(struct net *net,
 		if (err)
 			goto fail;
 		inode = d_backing_inode(path.dentry);
-		err = inode_permission(inode, MAY_WRITE);
+		err = inode_permission(&init_user_ns, inode, MAY_WRITE);
 		if (err)
 			goto put_fail;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 10/39] inode: add idmapped mount aware init and permission helpers
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (8 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 09/39] namei: add idmapped mount aware permission helpers Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-28 18:12   ` Serge E. Hallyn
  2020-11-15 10:36 ` [PATCH v2 11/39] attr: handle idmapped mounts Christian Brauner
                   ` (31 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped mount
we first need to map it according to the mount's user namespace.
Afterwards the checks are identical to non-idmapped mounts. If the initial
user namespace is passed all operations are a nop so non-idmapped mounts
will not see a change in behavior and will not see any performance impact.

Similarly, we allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the fsuid
and fsgid of the caller from the mount's user namespace. If the initial
user namespace is passed all operations are a nop so non-idmapped mounts
will not see a change in behavior and will also not see any performance
impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 fs/9p/acl.c                  |  2 +-
 fs/9p/vfs_inode.c            |  2 +-
 fs/attr.c                    |  6 +++---
 fs/bfs/dir.c                 |  2 +-
 fs/btrfs/inode.c             |  2 +-
 fs/btrfs/ioctl.c             | 10 +++++-----
 fs/btrfs/tests/btrfs-tests.c |  2 +-
 fs/crypto/policy.c           |  2 +-
 fs/efivarfs/file.c           |  2 +-
 fs/ext2/ialloc.c             |  2 +-
 fs/ext2/ioctl.c              |  6 +++---
 fs/ext4/ialloc.c             |  2 +-
 fs/ext4/ioctl.c              | 14 +++++++-------
 fs/f2fs/file.c               | 14 +++++++-------
 fs/f2fs/namei.c              |  2 +-
 fs/f2fs/xattr.c              |  2 +-
 fs/fcntl.c                   |  2 +-
 fs/gfs2/file.c               |  2 +-
 fs/hfsplus/inode.c           |  2 +-
 fs/hfsplus/ioctl.c           |  2 +-
 fs/hugetlbfs/inode.c         |  2 +-
 fs/inode.c                   | 23 ++++++++++++++---------
 fs/jfs/ioctl.c               |  2 +-
 fs/jfs/jfs_inode.c           |  2 +-
 fs/minix/bitmap.c            |  2 +-
 fs/namei.c                   |  4 ++--
 fs/nilfs2/inode.c            |  2 +-
 fs/nilfs2/ioctl.c            |  2 +-
 fs/ocfs2/dlmfs/dlmfs.c       |  4 ++--
 fs/ocfs2/ioctl.c             |  2 +-
 fs/ocfs2/namei.c             |  2 +-
 fs/omfs/inode.c              |  2 +-
 fs/overlayfs/dir.c           |  2 +-
 fs/overlayfs/file.c          |  4 ++--
 fs/overlayfs/super.c         |  2 +-
 fs/overlayfs/util.c          |  2 +-
 fs/posix_acl.c               |  2 +-
 fs/ramfs/inode.c             |  2 +-
 fs/reiserfs/ioctl.c          |  4 ++--
 fs/reiserfs/namei.c          |  2 +-
 fs/sysv/ialloc.c             |  2 +-
 fs/ubifs/dir.c               |  2 +-
 fs/ubifs/ioctl.c             |  2 +-
 fs/udf/ialloc.c              |  2 +-
 fs/ufs/ialloc.c              |  2 +-
 fs/xattr.c                   |  2 +-
 fs/xfs/xfs_ioctl.c           |  2 +-
 fs/zonefs/super.c            |  2 +-
 include/linux/fs.h           |  7 ++++---
 kernel/bpf/inode.c           |  2 +-
 mm/madvise.c                 |  2 +-
 mm/mincore.c                 |  2 +-
 mm/shmem.c                   |  2 +-
 security/selinux/hooks.c     |  4 ++--
 54 files changed, 95 insertions(+), 89 deletions(-)

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index 6261719f6f2a..d77b28e8d57a 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -258,7 +258,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler,
 
 	if (S_ISLNK(inode->i_mode))
 		return -EOPNOTSUPP;
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 	if (value) {
 		/* update the cached acl value */
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index ae0c38ad1fcb..f058e89df30f 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -251,7 +251,7 @@ int v9fs_init_inode(struct v9fs_session_info *v9ses,
 {
 	int err = 0;
 
-	inode_init_owner(inode, NULL, mode);
+	inode_init_owner(inode, &init_user_ns, NULL, mode);
 	inode->i_blocks = 0;
 	inode->i_rdev = rdev;
 	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
diff --git a/fs/attr.c b/fs/attr.c
index c9e29e589cec..00ae0b000146 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -87,7 +87,7 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 
 	/* Make sure a caller can chmod. */
 	if (ia_valid & ATTR_MODE) {
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EPERM;
 		/* Also check the setgid bit! */
 		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
@@ -98,7 +98,7 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 
 	/* Check for setting the inode time. */
 	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EPERM;
 	}
 
@@ -243,7 +243,7 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de
 		if (IS_IMMUTABLE(inode))
 			return -EPERM;
 
-		if (!inode_owner_or_capable(inode)) {
+		if (!inode_owner_or_capable(&init_user_ns, inode)) {
 			error = inode_permission(&init_user_ns, inode,
 						 MAY_WRITE);
 			if (error)
diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index d8dfe3a0cb39..c5ae76a87be5 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -96,7 +96,7 @@ static int bfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 	}
 	set_bit(ino, info->si_imap);
 	info->si_freei--;
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
 	inode->i_blocks = 0;
 	inode->i_op = &bfs_file_inops;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 32e3bf88d4f7..ed1a5bf5f068 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6015,7 +6015,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	if (ret != 0)
 		goto fail_unlock;
 
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	inode_set_bytes(inode, 0);
 
 	inode->i_mtime = current_time(inode);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 771ee08920ed..39f25b5d06ed 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -205,7 +205,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	const char *comp = NULL;
 	u32 binode_flags;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 
 	if (btrfs_root_readonly(root))
@@ -417,7 +417,7 @@ static int btrfs_ioctl_fssetxattr(struct file *file, void __user *arg)
 	unsigned old_i_flags;
 	int ret = 0;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 
 	if (btrfs_root_readonly(root))
@@ -1813,7 +1813,7 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
 			btrfs_info(BTRFS_I(file_inode(file))->root->fs_info,
 				   "Snapshot src from another FS");
 			ret = -EXDEV;
-		} else if (!inode_owner_or_capable(src_inode)) {
+		} else if (!inode_owner_or_capable(&init_user_ns, src_inode)) {
 			/*
 			 * Subvolume creation is not restricted, but snapshots
 			 * are limited to own subvolumes only
@@ -1933,7 +1933,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 	u64 flags;
 	int ret = 0;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 
 	ret = mnt_want_write_file(file);
@@ -4403,7 +4403,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file,
 	int ret = 0;
 	int received_uuid_changed;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 
 	ret = mnt_want_write_file(file);
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index 999c14e5d0bd..ac8e604d44e3 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -56,7 +56,7 @@ struct inode *btrfs_new_test_inode(void)
 
 	inode = new_inode(test_mnt->mnt_sb);
 	if (inode)
-		inode_init_owner(inode, NULL, S_IFREG);
+		inode_init_owner(inode, &init_user_ns, NULL, S_IFREG);
 
 	return inode;
 }
diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index 4441d9944b9e..6ddd9f0d8b36 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -462,7 +462,7 @@ int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg)
 		return -EFAULT;
 	policy.version = version;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	ret = mnt_want_write_file(filp);
diff --git a/fs/efivarfs/file.c b/fs/efivarfs/file.c
index feaa5e182b7b..e6bc0302643b 100644
--- a/fs/efivarfs/file.c
+++ b/fs/efivarfs/file.c
@@ -137,7 +137,7 @@ efivarfs_ioc_setxflags(struct file *file, void __user *arg)
 	unsigned int oldflags = efivarfs_getflags(inode);
 	int error;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
index 432c3febea6d..5081f2dd8a20 100644
--- a/fs/ext2/ialloc.c
+++ b/fs/ext2/ialloc.c
@@ -551,7 +551,7 @@ struct inode *ext2_new_inode(struct inode *dir, umode_t mode,
 		inode->i_uid = current_fsuid();
 		inode->i_gid = dir->i_gid;
 	} else
-		inode_init_owner(inode, dir, mode);
+		inode_init_owner(inode, &init_user_ns, dir, mode);
 
 	inode->i_ino = ino;
 	inode->i_blocks = 0;
diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
index 32a8d10b579d..b399cbb7022d 100644
--- a/fs/ext2/ioctl.c
+++ b/fs/ext2/ioctl.c
@@ -39,7 +39,7 @@ long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (ret)
 			return ret;
 
-		if (!inode_owner_or_capable(inode)) {
+		if (!inode_owner_or_capable(&init_user_ns, inode)) {
 			ret = -EACCES;
 			goto setflags_out;
 		}
@@ -84,7 +84,7 @@ long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case EXT2_IOC_SETVERSION: {
 		__u32 generation;
 
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EPERM;
 		ret = mnt_want_write_file(filp);
 		if (ret)
@@ -117,7 +117,7 @@ long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
 			return -ENOTTY;
 
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EACCES;
 
 		if (get_user(rsv_window_size, (int __user *)arg))
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b215c564bc31..d91f69282311 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -972,7 +972,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 		inode->i_uid = current_fsuid();
 		inode->i_gid = dir->i_gid;
 	} else
-		inode_init_owner(inode, dir, mode);
+		inode_init_owner(inode, &init_user_ns, dir, mode);
 
 	if (ext4_has_feature_project(sb) &&
 	    ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT))
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f0381876a7e5..e35aba820254 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -139,7 +139,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	}
 
 	if (IS_RDONLY(inode) || IS_APPEND(inode) || IS_IMMUTABLE(inode) ||
-	    !inode_owner_or_capable(inode) || !capable(CAP_SYS_ADMIN)) {
+	    !inode_owner_or_capable(&init_user_ns, inode) || !capable(CAP_SYS_ADMIN)) {
 		err = -EPERM;
 		goto journal_err_out;
 	}
@@ -829,7 +829,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case FS_IOC_SETFLAGS: {
 		int err;
 
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EACCES;
 
 		if (get_user(flags, (int __user *) arg))
@@ -871,7 +871,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		__u32 generation;
 		int err;
 
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EPERM;
 
 		if (ext4_has_metadata_csum(inode->i_sb)) {
@@ -1010,7 +1010,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case EXT4_IOC_MIGRATE:
 	{
 		int err;
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EACCES;
 
 		err = mnt_want_write_file(filp);
@@ -1032,7 +1032,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case EXT4_IOC_ALLOC_DA_BLKS:
 	{
 		int err;
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EACCES;
 
 		err = mnt_want_write_file(filp);
@@ -1214,7 +1214,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 	case EXT4_IOC_CLEAR_ES_CACHE:
 	{
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EACCES;
 		ext4_clear_inode_es(inode);
 		return 0;
@@ -1260,7 +1260,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 			return -EFAULT;
 
 		/* Make sure caller has proper permission */
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EACCES;
 
 		if (fa.fsx_xflags & ~EXT4_SUPPORTED_FS_XFLAGS)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ee861c6d9ff0..333442e96cc4 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1955,7 +1955,7 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
 	u32 iflags;
 	int ret;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	if (get_user(fsflags, (int __user *)arg))
@@ -2002,7 +2002,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	int ret;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	if (!S_ISREG(inode->i_mode))
@@ -2069,7 +2069,7 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp)
 	struct inode *inode = file_inode(filp);
 	int ret;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	ret = mnt_want_write_file(filp);
@@ -2111,7 +2111,7 @@ static int f2fs_ioc_start_volatile_write(struct file *filp)
 	struct inode *inode = file_inode(filp);
 	int ret;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	if (!S_ISREG(inode->i_mode))
@@ -2146,7 +2146,7 @@ static int f2fs_ioc_release_volatile_write(struct file *filp)
 	struct inode *inode = file_inode(filp);
 	int ret;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	ret = mnt_want_write_file(filp);
@@ -2175,7 +2175,7 @@ static int f2fs_ioc_abort_volatile_write(struct file *filp)
 	struct inode *inode = file_inode(filp);
 	int ret;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	ret = mnt_want_write_file(filp);
@@ -3153,7 +3153,7 @@ static int f2fs_ioc_fssetxattr(struct file *filp, unsigned long arg)
 		return -EFAULT;
 
 	/* Make sure caller has proper permission */
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	if (fa.fsx_xflags & ~F2FS_SUPPORTED_XFLAGS)
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 8fa37d1434de..66b522e61e50 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -46,7 +46,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
 
 	nid_free = true;
 
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 
 	inode->i_ino = ino;
 	inode->i_blocks = 0;
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index 65afcc3cc68a..d772bf13a814 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -114,7 +114,7 @@ static int f2fs_xattr_advise_set(const struct xattr_handler *handler,
 	unsigned char old_advise = F2FS_I(inode)->i_advise;
 	unsigned char new_advise;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 	if (value == NULL)
 		return -EINVAL;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 19ac5baad50f..df091d435603 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -46,7 +46,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 
 	/* O_NOATIME can only be set by the owner or superuser */
 	if ((arg & O_NOATIME) && !(filp->f_flags & O_NOATIME))
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EPERM;
 
 	/* required for strict SunOS emulation */
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index b39b339feddc..1d994bdfffaa 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -238,7 +238,7 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask,
 		goto out;
 
 	error = -EACCES;
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		goto out;
 
 	error = 0;
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index e3da9e96b835..02d51cbcff04 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -376,7 +376,7 @@ struct inode *hfsplus_new_inode(struct super_block *sb, struct inode *dir,
 		return NULL;
 
 	inode->i_ino = sbi->next_cnid++;
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	set_nlink(inode, 1);
 	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
 
diff --git a/fs/hfsplus/ioctl.c b/fs/hfsplus/ioctl.c
index ce15b9496b77..3edb1926d127 100644
--- a/fs/hfsplus/ioctl.c
+++ b/fs/hfsplus/ioctl.c
@@ -91,7 +91,7 @@ static int hfsplus_ioctl_setflags(struct file *file, int __user *user_flags)
 	if (err)
 		goto out;
 
-	if (!inode_owner_or_capable(inode)) {
+	if (!inode_owner_or_capable(&init_user_ns, inode)) {
 		err = -EACCES;
 		goto out_drop_write;
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b5c109703daa..fed6ddfc3f3a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -836,7 +836,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 		struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 
 		inode->i_ino = get_next_ino();
-		inode_init_owner(inode, dir, mode);
+		inode_init_owner(inode, &init_user_ns, dir, mode);
 		lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
 				&hugetlbfs_i_mmap_rwsem_key);
 		inode->i_mapping->a_ops = &hugetlbfs_aops;
diff --git a/fs/inode.c b/fs/inode.c
index 7a15372d9c2d..d6dfa876c58d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2132,13 +2132,14 @@ EXPORT_SYMBOL(init_special_inode);
 /**
  * inode_init_owner - Init uid,gid,mode for new inode according to posix standards
  * @inode: New inode
+ * @user_ns: User namespace the inode is accessed from
  * @dir: Directory inode
  * @mode: mode of the new inode
  */
-void inode_init_owner(struct inode *inode, const struct inode *dir,
-			umode_t mode)
+void inode_init_owner(struct inode *inode, struct user_namespace *user_ns,
+		      const struct inode *dir, umode_t mode)
 {
-	inode->i_uid = current_fsuid();
+	inode->i_uid = fsuid_into_mnt(user_ns);
 	if (dir && dir->i_mode & S_ISGID) {
 		inode->i_gid = dir->i_gid;
 
@@ -2146,31 +2147,35 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
 		if (S_ISDIR(mode))
 			mode |= S_ISGID;
 		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
-			 !in_group_p(inode->i_gid) &&
-			 !capable_wrt_inode_uidgid(&init_user_ns, dir, CAP_FSETID))
+			 !in_group_p(i_gid_into_mnt(user_ns, inode)) &&
+			 !capable_wrt_inode_uidgid(user_ns, dir, CAP_FSETID))
 			mode &= ~S_ISGID;
 	} else
-		inode->i_gid = current_fsgid();
+		inode->i_gid = fsgid_into_mnt(user_ns);
 	inode->i_mode = mode;
 }
 EXPORT_SYMBOL(inode_init_owner);
 
 /**
  * inode_owner_or_capable - check current task permissions to inode
+ * @user_ns: User namespace the inode is accessed from
  * @inode: inode being checked
  *
  * Return true if current either has CAP_FOWNER in a namespace with the
  * inode owner uid mapped, or owns the file.
  */
-bool inode_owner_or_capable(const struct inode *inode)
+bool inode_owner_or_capable(struct user_namespace *user_ns,
+			    const struct inode *inode)
 {
+	kuid_t i_uid;
 	struct user_namespace *ns;
 
-	if (uid_eq(current_fsuid(), inode->i_uid))
+	i_uid = i_uid_into_mnt(user_ns, inode);
+	if (uid_eq(current_fsuid(), i_uid))
 		return true;
 
 	ns = current_user_ns();
-	if (kuid_has_mapping(ns, inode->i_uid) && ns_capable(ns, CAP_FOWNER))
+	if (kuid_has_mapping(ns, i_uid) && ns_capable(ns, CAP_FOWNER))
 		return true;
 	return false;
 }
diff --git a/fs/jfs/ioctl.c b/fs/jfs/ioctl.c
index 10ee0ecca1a8..2581d4db58ff 100644
--- a/fs/jfs/ioctl.c
+++ b/fs/jfs/ioctl.c
@@ -76,7 +76,7 @@ long jfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (err)
 			return err;
 
-		if (!inode_owner_or_capable(inode)) {
+		if (!inode_owner_or_capable(&init_user_ns, inode)) {
 			err = -EACCES;
 			goto setflags_out;
 		}
diff --git a/fs/jfs/jfs_inode.c b/fs/jfs/jfs_inode.c
index 4cef170630db..282a785bbf29 100644
--- a/fs/jfs/jfs_inode.c
+++ b/fs/jfs/jfs_inode.c
@@ -64,7 +64,7 @@ struct inode *ialloc(struct inode *parent, umode_t mode)
 		goto fail_put;
 	}
 
-	inode_init_owner(inode, parent, mode);
+	inode_init_owner(inode, &init_user_ns, parent, mode);
 	/*
 	 * New inodes need to save sane values on disk when
 	 * uid & gid mount options are used
diff --git a/fs/minix/bitmap.c b/fs/minix/bitmap.c
index f4e5e5181a14..d99a78c83fbc 100644
--- a/fs/minix/bitmap.c
+++ b/fs/minix/bitmap.c
@@ -252,7 +252,7 @@ struct inode *minix_new_inode(const struct inode *dir, umode_t mode, int *error)
 		iput(inode);
 		return NULL;
 	}
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	inode->i_ino = j;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
 	inode->i_blocks = 0;
diff --git a/fs/namei.c b/fs/namei.c
index 38222f92efb6..35952c28ee29 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1047,7 +1047,7 @@ int may_linkat(struct path *link)
 	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
 	 * otherwise, it must be a safe source.
 	 */
-	if (safe_hardlink_source(inode) || inode_owner_or_capable(inode))
+	if (safe_hardlink_source(inode) || inode_owner_or_capable(&init_user_ns, inode))
 		return 0;
 
 	audit_log_path_denied(AUDIT_ANOM_LINK, "linkat");
@@ -2897,7 +2897,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 	}
 
 	/* O_NOATIME can only be set by the owner or superuser */
-	if (flag & O_NOATIME && !inode_owner_or_capable(inode))
+	if (flag & O_NOATIME && !inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 
 	return 0;
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index b6517220cad5..d286c3bf7d43 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -348,7 +348,7 @@ struct inode *nilfs_new_inode(struct inode *dir, umode_t mode)
 	/* reference count of i_bh inherits from nilfs_mdt_read_block() */
 
 	atomic64_inc(&root->inodes_count);
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	inode->i_ino = ino;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
 
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 07d26f61f22a..b053b40315bf 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -132,7 +132,7 @@ static int nilfs_ioctl_setflags(struct inode *inode, struct file *filp,
 	unsigned int flags, oldflags;
 	int ret;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	if (get_user(flags, (int __user *)argp))
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index 583820ec63e2..64491af88239 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -329,7 +329,7 @@ static struct inode *dlmfs_get_root_inode(struct super_block *sb)
 
 	if (inode) {
 		inode->i_ino = get_next_ino();
-		inode_init_owner(inode, NULL, mode);
+		inode_init_owner(inode, &init_user_ns, NULL, mode);
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 		inc_nlink(inode);
 
@@ -352,7 +352,7 @@ static struct inode *dlmfs_get_inode(struct inode *parent,
 		return NULL;
 
 	inode->i_ino = get_next_ino();
-	inode_init_owner(inode, parent, mode);
+	inode_init_owner(inode, &init_user_ns, parent, mode);
 	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 
 	ip = DLMFS_I(inode);
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index 89984172fc4a..50c9b30ee9f6 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -96,7 +96,7 @@ static int ocfs2_set_inode_attr(struct inode *inode, unsigned flags,
 	}
 
 	status = -EACCES;
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		goto bail_unlock;
 
 	if (!S_ISDIR(inode->i_mode))
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index c46bf7f581a1..51a80acbb97e 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -198,7 +198,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
 	 * callers. */
 	if (S_ISDIR(mode))
 		set_nlink(inode, 2);
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	status = dquot_initialize(inode);
 	if (status)
 		return ERR_PTR(status);
diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c
index ce93ccca8639..eed9e1273104 100644
--- a/fs/omfs/inode.c
+++ b/fs/omfs/inode.c
@@ -48,7 +48,7 @@ struct inode *omfs_new_inode(struct inode *dir, umode_t mode)
 		goto fail;
 
 	inode->i_ino = new_block;
-	inode_init_owner(inode, NULL, mode);
+	inode_init_owner(inode, &init_user_ns, NULL, mode);
 	inode->i_mapping->a_ops = &omfs_aops;
 
 	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 28a075b5f5b2..80b2fab73df7 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -636,7 +636,7 @@ static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
 	inode->i_state |= I_CREATING;
 	spin_unlock(&inode->i_lock);
 
-	inode_init_owner(inode, dentry->d_parent->d_inode, mode);
+	inode_init_owner(inode, &init_user_ns, dentry->d_parent->d_inode, mode);
 	attr.mode = inode->i_mode;
 
 	err = ovl_create_or_link(dentry, inode, &attr, false);
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index f966b5108358..d58b49a1ea3b 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -53,7 +53,7 @@ static struct file *ovl_open_realfile(const struct file *file,
 	err = inode_permission(&init_user_ns, realinode, MAY_OPEN | acc_mode);
 	if (err) {
 		realfile = ERR_PTR(err);
-	} else if (!inode_owner_or_capable(realinode)) {
+	} else if (!inode_owner_or_capable(&init_user_ns, realinode)) {
 		realfile = ERR_PTR(-EPERM);
 	} else {
 		realfile = open_with_fake_path(&file->f_path, flags, realinode,
@@ -582,7 +582,7 @@ static long ovl_ioctl_set_flags(struct file *file, unsigned int cmd,
 	struct inode *inode = file_inode(file);
 	unsigned int oldflags;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EACCES;
 
 	ret = mnt_want_write_file(file);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 196fe3e3f02b..82f2c35894e4 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -960,7 +960,7 @@ ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
 		goto out_acl_release;
 	}
 	err = -EPERM;
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		goto out_acl_release;
 
 	posix_acl_release(acl);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index ff67da201303..0a1f4bccb5da 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -481,7 +481,7 @@ struct file *ovl_path_open(struct path *path, int flags)
 		return ERR_PTR(err);
 
 	/* O_NOATIME is an optimization, don't fail if not permitted */
-	if (inode_owner_or_capable(inode))
+	if (inode_owner_or_capable(&init_user_ns, inode))
 		flags |= O_NOATIME;
 
 	return dentry_open(path, flags, current_cred());
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 5b6296cc89c4..87b5ec67000b 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -874,7 +874,7 @@ set_posix_acl(struct inode *inode, int type, struct posix_acl *acl)
 
 	if (type == ACL_TYPE_DEFAULT && !S_ISDIR(inode->i_mode))
 		return acl ? -EACCES : 0;
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 
 	if (acl) {
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index ee179a81b3da..83641b9614bd 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -67,7 +67,7 @@ struct inode *ramfs_get_inode(struct super_block *sb,
 
 	if (inode) {
 		inode->i_ino = get_next_ino();
-		inode_init_owner(inode, dir, mode);
+		inode_init_owner(inode, &init_user_ns, dir, mode);
 		inode->i_mapping->a_ops = &ramfs_aops;
 		mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
 		mapping_set_unevictable(inode->i_mapping);
diff --git a/fs/reiserfs/ioctl.c b/fs/reiserfs/ioctl.c
index adb21bea3d60..4f1cbd930179 100644
--- a/fs/reiserfs/ioctl.c
+++ b/fs/reiserfs/ioctl.c
@@ -59,7 +59,7 @@ long reiserfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 			if (err)
 				break;
 
-			if (!inode_owner_or_capable(inode)) {
+			if (!inode_owner_or_capable(&init_user_ns, inode)) {
 				err = -EPERM;
 				goto setflags_out;
 			}
@@ -101,7 +101,7 @@ long reiserfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		err = put_user(inode->i_generation, (int __user *)arg);
 		break;
 	case REISERFS_IOC_SETVERSION:
-		if (!inode_owner_or_capable(inode)) {
+		if (!inode_owner_or_capable(&init_user_ns, inode)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
index 1594687582f0..6e43aec49b43 100644
--- a/fs/reiserfs/namei.c
+++ b/fs/reiserfs/namei.c
@@ -615,7 +615,7 @@ static int new_inode_init(struct inode *inode, struct inode *dir, umode_t mode)
 	 * the quota init calls have to know who to charge the quota to, so
 	 * we have to set uid and gid here
 	 */
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	return dquot_initialize(inode);
 }
 
diff --git a/fs/sysv/ialloc.c b/fs/sysv/ialloc.c
index 6c9801986af6..96288d35dcb9 100644
--- a/fs/sysv/ialloc.c
+++ b/fs/sysv/ialloc.c
@@ -163,7 +163,7 @@ struct inode * sysv_new_inode(const struct inode * dir, umode_t mode)
 	*sbi->s_sb_fic_count = cpu_to_fs16(sbi, count);
 	fs16_add(sbi, sbi->s_sb_total_free_inodes, -1);
 	dirty_sb(sb);
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	inode->i_ino = fs16_to_cpu(sbi, ino);
 	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
 	inode->i_blocks = 0;
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 155521e51ac5..1639331f9543 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -94,7 +94,7 @@ struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir,
 	 */
 	inode->i_flags |= S_NOCMTIME;
 
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	inode->i_mtime = inode->i_atime = inode->i_ctime =
 			 current_time(inode);
 	inode->i_mapping->nrpages = 0;
diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index 4363d85a3fd4..2326d5122beb 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -155,7 +155,7 @@ long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		if (IS_RDONLY(inode))
 			return -EROFS;
 
-		if (!inode_owner_or_capable(inode))
+		if (!inode_owner_or_capable(&init_user_ns, inode))
 			return -EACCES;
 
 		if (get_user(flags, (int __user *) arg))
diff --git a/fs/udf/ialloc.c b/fs/udf/ialloc.c
index 84ed23edebfd..e2d07cc1d3c3 100644
--- a/fs/udf/ialloc.c
+++ b/fs/udf/ialloc.c
@@ -103,7 +103,7 @@ struct inode *udf_new_inode(struct inode *dir, umode_t mode)
 		mutex_unlock(&sbi->s_alloc_mutex);
 	}
 
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UID_SET))
 		inode->i_uid = sbi->s_uid;
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_GID_SET))
diff --git a/fs/ufs/ialloc.c b/fs/ufs/ialloc.c
index 969fd60436d3..a04c6ea490a0 100644
--- a/fs/ufs/ialloc.c
+++ b/fs/ufs/ialloc.c
@@ -289,7 +289,7 @@ struct inode *ufs_new_inode(struct inode *dir, umode_t mode)
 	ufs_mark_sb_dirty(sb);
 
 	inode->i_ino = cg * uspi->s_ipg + bit;
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 	inode->i_blocks = 0;
 	inode->i_generation = 0;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
diff --git a/fs/xattr.c b/fs/xattr.c
index 61a9947f62f4..fcc79c2a1ea1 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -127,7 +127,7 @@ xattr_permission(struct inode *inode, const char *name, int mask)
 		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
 			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
 		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
-		    (mask & MAY_WRITE) && !inode_owner_or_capable(inode))
+		    (mask & MAY_WRITE) && !inode_owner_or_capable(&init_user_ns, inode))
 			return -EPERM;
 	}
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 97bd29fc8c43..218e80afc859 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1300,7 +1300,7 @@ xfs_ioctl_setattr_get_trans(
 	 * The user ID of the calling process must be equal to the file owner
 	 * ID, except in cases where the CAP_FSETID capability is applicable.
 	 */
-	if (!inode_owner_or_capable(VFS_I(ip))) {
+	if (!inode_owner_or_capable(&init_user_ns, VFS_I(ip))) {
 		error = -EPERM;
 		goto out_cancel;
 	}
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index ff5930be096c..5021a41e880c 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1221,7 +1221,7 @@ static void zonefs_init_dir_inode(struct inode *parent, struct inode *inode,
 	struct super_block *sb = parent->i_sb;
 
 	inode->i_ino = blkdev_nr_zones(sb->s_bdev->bd_disk) + type + 1;
-	inode_init_owner(inode, parent, S_IFDIR | 0555);
+	inode_init_owner(inode, &init_user_ns, parent, S_IFDIR | 0555);
 	inode->i_op = &zonefs_dir_inode_operations;
 	inode->i_fop = &simple_dir_operations;
 	set_nlink(inode, 2);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8f6fb065450b..a5845a67a34b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1744,7 +1744,7 @@ static inline int sb_start_intwrite_trylock(struct super_block *sb)
 }
 
 
-extern bool inode_owner_or_capable(const struct inode *inode);
+extern bool inode_owner_or_capable(struct user_namespace *user_ns, const struct inode *inode);
 
 /*
  * VFS helper functions..
@@ -1786,8 +1786,9 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
 /*
  * VFS file helper functions.
  */
-extern void inode_init_owner(struct inode *inode, const struct inode *dir,
-			umode_t mode);
+extern void inode_init_owner(struct inode *inode,
+			     struct user_namespace *user_ns,
+			     const struct inode *dir, umode_t mode);
 extern bool may_open_dev(const struct path *path);
 
 /*
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index f1c393e5d47d..cfd2e0868f2d 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -122,7 +122,7 @@ static struct inode *bpf_get_inode(struct super_block *sb,
 	inode->i_mtime = inode->i_atime;
 	inode->i_ctime = inode->i_atime;
 
-	inode_init_owner(inode, dir, mode);
+	inode_init_owner(inode, &init_user_ns, dir, mode);
 
 	return inode;
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 8afabc363b6b..a3ab05c08c28 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -539,7 +539,7 @@ static inline bool can_do_pageout(struct vm_area_struct *vma)
 	 * otherwise we'd be including shared non-exclusive mappings, which
 	 * opens a side channel.
 	 */
-	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
+	return inode_owner_or_capable(&init_user_ns, file_inode(vma->vm_file)) ||
 		inode_permission(&init_user_ns, file_inode(vma->vm_file), MAY_WRITE) == 0;
 }
 
diff --git a/mm/mincore.c b/mm/mincore.c
index d5a58e61eac6..ad2dfb7a4500 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -166,7 +166,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma)
 	 * for writing; otherwise we'd be including shared non-exclusive
 	 * mappings, which opens a side channel.
 	 */
-	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
+	return inode_owner_or_capable(&init_user_ns, file_inode(vma->vm_file)) ||
 		inode_permission(&init_user_ns, file_inode(vma->vm_file), MAY_WRITE) == 0;
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 537c137698f8..1bd6a9487222 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2303,7 +2303,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	inode = new_inode(sb);
 	if (inode) {
 		inode->i_ino = ino;
-		inode_init_owner(inode, dir, mode);
+		inode_init_owner(inode, &init_user_ns, dir, mode);
 		inode->i_blocks = 0;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 		inode->i_generation = prandom_u32();
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6b1826fc3658..14a195fa55eb 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3133,13 +3133,13 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 	}
 
 	if (!selinux_initialized(&selinux_state))
-		return (inode_owner_or_capable(inode) ? 0 : -EPERM);
+		return (inode_owner_or_capable(&init_user_ns, inode) ? 0 : -EPERM);
 
 	sbsec = inode->i_sb->s_security;
 	if (!(sbsec->flags & SBLABEL_MNT))
 		return -EOPNOTSUPP;
 
-	if (!inode_owner_or_capable(inode))
+	if (!inode_owner_or_capable(&init_user_ns, inode))
 		return -EPERM;
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 11/39] attr: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (9 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 10/39] inode: add idmapped mount aware init and " Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 12/39] acl: " Christian Brauner
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

When file attributes are changed filesystems mostly rely on the
setattr_prepare(), setattr_copy(), and notify_change() helpers for
initialization and permission checking. Let them handle idmapped mounts. If
the inode is accessed through an idmapped mount we need to map it according
to the mount's user namespace. Afterwards the checks are identical to
non-idmapped mounts. If the initial user namespace is passed all operations
are a nop so non-idmapped mounts will not see a change in behavior and will
also not see any performance impact. Helpers that perform checks on the
ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are
intended values and so they won't be mapped according to the mount's user
namespace. This is more transparent to the caller.
If the initial user namespace is passed all operations are a nop so
non-idmapped mounts will not see a change in behavior and will not see any
performance impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 arch/powerpc/platforms/cell/spufs/inode.c |  2 +-
 drivers/base/devtmpfs.c                   |  4 +-
 fs/9p/vfs_inode.c                         |  4 +-
 fs/9p/vfs_inode_dotl.c                    |  4 +-
 fs/adfs/inode.c                           |  2 +-
 fs/affs/inode.c                           |  4 +-
 fs/attr.c                                 | 71 ++++++++++++++---------
 fs/btrfs/inode.c                          |  4 +-
 fs/cachefiles/interface.c                 |  4 +-
 fs/ceph/inode.c                           |  2 +-
 fs/cifs/inode.c                           |  8 +--
 fs/ecryptfs/inode.c                       |  6 +-
 fs/exfat/file.c                           |  4 +-
 fs/ext2/inode.c                           |  4 +-
 fs/ext4/inode.c                           |  4 +-
 fs/f2fs/file.c                            |  2 +-
 fs/fat/file.c                             |  4 +-
 fs/fuse/dir.c                             |  2 +-
 fs/gfs2/inode.c                           |  4 +-
 fs/hfs/inode.c                            |  4 +-
 fs/hfsplus/inode.c                        |  4 +-
 fs/hostfs/hostfs_kern.c                   |  4 +-
 fs/hpfs/inode.c                           |  4 +-
 fs/hugetlbfs/inode.c                      |  4 +-
 fs/inode.c                                |  2 +-
 fs/jffs2/fs.c                             |  2 +-
 fs/jfs/file.c                             |  4 +-
 fs/kernfs/inode.c                         |  4 +-
 fs/libfs.c                                |  4 +-
 fs/minix/file.c                           |  4 +-
 fs/nfsd/nfsproc.c                         |  2 +-
 fs/nfsd/vfs.c                             |  4 +-
 fs/nilfs2/inode.c                         |  4 +-
 fs/ntfs/inode.c                           |  2 +-
 fs/ocfs2/dlmfs/dlmfs.c                    |  4 +-
 fs/ocfs2/file.c                           |  4 +-
 fs/omfs/file.c                            |  4 +-
 fs/open.c                                 |  6 +-
 fs/orangefs/inode.c                       |  4 +-
 fs/overlayfs/copy_up.c                    |  8 +--
 fs/overlayfs/dir.c                        |  2 +-
 fs/overlayfs/inode.c                      |  4 +-
 fs/overlayfs/super.c                      |  2 +-
 fs/proc/base.c                            |  4 +-
 fs/proc/generic.c                         |  4 +-
 fs/proc/proc_sysctl.c                     |  4 +-
 fs/ramfs/file-nommu.c                     |  4 +-
 fs/reiserfs/inode.c                       |  4 +-
 fs/sysv/file.c                            |  4 +-
 fs/ubifs/file.c                           |  2 +-
 fs/udf/file.c                             |  4 +-
 fs/ufs/inode.c                            |  4 +-
 fs/utimes.c                               |  2 +-
 fs/xfs/xfs_iops.c                         |  2 +-
 fs/zonefs/super.c                         |  4 +-
 include/linux/fs.h                        |  8 ++-
 mm/shmem.c                                |  4 +-
 57 files changed, 151 insertions(+), 132 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 25390569e24c..3de526eb2275 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -98,7 +98,7 @@ spufs_setattr(struct dentry *dentry, struct iattr *attr)
 	if ((attr->ia_valid & ATTR_SIZE) &&
 	    (attr->ia_size != inode->i_size))
 		return -EINVAL;
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index eac184e6d657..2e0c3cdb4184 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -221,7 +221,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
 		newattrs.ia_gid = gid;
 		newattrs.ia_valid = ATTR_MODE|ATTR_UID|ATTR_GID;
 		inode_lock(d_inode(dentry));
-		notify_change(dentry, &newattrs, NULL);
+		notify_change(&init_user_ns, dentry, &newattrs, NULL);
 		inode_unlock(d_inode(dentry));
 
 		/* mark as kernel-created inode */
@@ -328,7 +328,7 @@ static int handle_remove(const char *nodename, struct device *dev)
 			newattrs.ia_valid =
 				ATTR_UID|ATTR_GID|ATTR_MODE;
 			inode_lock(d_inode(dentry));
-			notify_change(dentry, &newattrs, NULL);
+			notify_change(&init_user_ns, dentry, &newattrs, NULL);
 			inode_unlock(d_inode(dentry));
 			err = vfs_unlink(d_inode(parent.dentry), dentry, NULL);
 			if (!err || err == -ENOENT)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index f058e89df30f..404526499c94 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1040,7 +1040,7 @@ static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	struct p9_wstat wstat;
 
 	p9_debug(P9_DEBUG_VFS, "\n");
-	retval = setattr_prepare(dentry, iattr);
+	retval = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (retval)
 		return retval;
 
@@ -1090,7 +1090,7 @@ static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
 
 	v9fs_invalidate_inode_attr(d_inode(dentry));
 
-	setattr_copy(d_inode(dentry), iattr);
+	setattr_copy(&init_user_ns, d_inode(dentry), iattr);
 	mark_inode_dirty(d_inode(dentry));
 	return 0;
 }
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 0028eccb665a..282ec5cb45dc 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -546,7 +546,7 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
 
 	p9_debug(P9_DEBUG_VFS, "\n");
 
-	retval = setattr_prepare(dentry, iattr);
+	retval = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (retval)
 		return retval;
 
@@ -582,7 +582,7 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
 		truncate_setsize(inode, iattr->ia_size);
 
 	v9fs_invalidate_inode_attr(inode);
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 	mark_inode_dirty(inode);
 	if (iattr->ia_valid & ATTR_MODE) {
 		/* We also want to update ACL when we update mode bits */
diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c
index 32620f4a7623..278dcee6ae22 100644
--- a/fs/adfs/inode.c
+++ b/fs/adfs/inode.c
@@ -299,7 +299,7 @@ adfs_notify_change(struct dentry *dentry, struct iattr *attr)
 	unsigned int ia_valid = attr->ia_valid;
 	int error;
 	
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 
 	/*
 	 * we can't change the UID or GID of any file -
diff --git a/fs/affs/inode.c b/fs/affs/inode.c
index 044412110b52..767e5bdfb703 100644
--- a/fs/affs/inode.c
+++ b/fs/affs/inode.c
@@ -223,7 +223,7 @@ affs_notify_change(struct dentry *dentry, struct iattr *attr)
 
 	pr_debug("notify_change(%lu,0x%x)\n", inode->i_ino, attr->ia_valid);
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		goto out;
 
@@ -249,7 +249,7 @@ affs_notify_change(struct dentry *dentry, struct iattr *attr)
 		affs_truncate(inode);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 
 	if (attr->ia_valid & ATTR_MODE)
diff --git a/fs/attr.c b/fs/attr.c
index 00ae0b000146..4b36440236d4 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -18,27 +18,31 @@
 #include <linux/evm.h>
 #include <linux/ima.h>
 
-static bool chown_ok(const struct inode *inode, kuid_t uid)
+static bool chown_ok(struct user_namespace *user_ns,
+		     const struct inode *inode,
+		     kuid_t uid)
 {
-	if (uid_eq(current_fsuid(), inode->i_uid) &&
-	    uid_eq(uid, inode->i_uid))
+	kuid_t kuid = i_uid_into_mnt(user_ns, inode);
+	if (uid_eq(current_fsuid(), kuid) && uid_eq(uid, kuid))
 		return true;
-	if (capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_CHOWN))
+	if (capable_wrt_inode_uidgid(user_ns, inode, CAP_CHOWN))
 		return true;
-	if (uid_eq(inode->i_uid, INVALID_UID) &&
+	if (uid_eq(kuid, INVALID_UID) &&
 	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
 		return true;
 	return false;
 }
 
-static bool chgrp_ok(const struct inode *inode, kgid_t gid)
+static bool chgrp_ok(struct user_namespace *user_ns,
+		     const struct inode *inode, kgid_t gid)
 {
-	if (uid_eq(current_fsuid(), inode->i_uid) &&
-	    (in_group_p(gid) || gid_eq(gid, inode->i_gid)))
+	kgid_t kgid = i_gid_into_mnt(user_ns, inode);
+	if (uid_eq(current_fsuid(), i_uid_into_mnt(user_ns, inode)) &&
+	    (in_group_p(gid) || gid_eq(gid, kgid)))
 		return true;
-	if (capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_CHOWN))
+	if (capable_wrt_inode_uidgid(user_ns, inode, CAP_CHOWN))
 		return true;
-	if (gid_eq(inode->i_gid, INVALID_GID) &&
+	if (gid_eq(kgid, INVALID_GID) &&
 	    ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN))
 		return true;
 	return false;
@@ -46,6 +50,7 @@ static bool chgrp_ok(const struct inode *inode, kgid_t gid)
 
 /**
  * setattr_prepare - check if attribute changes to a dentry are allowed
+ * @user_ns:	user namespace of the mount
  * @dentry:	dentry to check
  * @attr:	attributes to change
  *
@@ -58,7 +63,8 @@ static bool chgrp_ok(const struct inode *inode, kgid_t gid)
  * Should be called as the first thing in ->setattr implementations,
  * possibly after taking additional locks.
  */
-int setattr_prepare(struct dentry *dentry, struct iattr *attr)
+int setattr_prepare(struct user_namespace *user_ns, struct dentry *dentry,
+		    struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	unsigned int ia_valid = attr->ia_valid;
@@ -78,27 +84,27 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
 		goto kill_priv;
 
 	/* Make sure a caller can chown. */
-	if ((ia_valid & ATTR_UID) && !chown_ok(inode, attr->ia_uid))
+	if ((ia_valid & ATTR_UID) && !chown_ok(user_ns, inode, attr->ia_uid))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
-	if ((ia_valid & ATTR_GID) && !chgrp_ok(inode, attr->ia_gid))
+	if ((ia_valid & ATTR_GID) && !chgrp_ok(user_ns, inode, attr->ia_gid))
 		return -EPERM;
 
 	/* Make sure a caller can chmod. */
 	if (ia_valid & ATTR_MODE) {
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EPERM;
 		/* Also check the setgid bit! */
-		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
-				inode->i_gid) &&
-		    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
+               if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
+                                i_gid_into_mnt(user_ns, inode)) &&
+                    !capable_wrt_inode_uidgid(user_ns, inode, CAP_FSETID))
 			attr->ia_mode &= ~S_ISGID;
 	}
 
 	/* Check for setting the inode time. */
 	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EPERM;
 	}
 
@@ -162,20 +168,27 @@ EXPORT_SYMBOL(inode_newsize_ok);
 
 /**
  * setattr_copy - copy simple metadata updates into the generic inode
+ * @user_ns:	the user namespace the inode is accessed from
  * @inode:	the inode to be updated
  * @attr:	the new attributes
  *
  * setattr_copy must be called with i_mutex held.
  *
  * setattr_copy updates the inode's metadata with that specified
- * in attr. Noticeably missing is inode size update, which is more complex
+ * in attr on idmapped mounts. If file ownership is changed setattr_copy
+ * doesn't map ia_uid and ia_gid. It will asssume the caller has already
+ * provided the intended values. Necessary permission checks to determine
+ * whether or not the S_ISGID property needs to be removed are performed with
+ * the correct idmapped mount permission helpers.
+ * Noticeably missing is inode size update, which is more complex
  * as it requires pagecache updates.
  *
  * The inode is not marked as dirty after this operation. The rationale is
  * that for "simple" filesystems, the struct inode is the inode storage.
  * The caller is free to mark the inode dirty afterwards if needed.
  */
-void setattr_copy(struct inode *inode, const struct iattr *attr)
+void setattr_copy(struct user_namespace *user_ns, struct inode *inode,
+		  const struct iattr *attr)
 {
 	unsigned int ia_valid = attr->ia_valid;
 
@@ -191,9 +204,9 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
 		inode->i_ctime = attr->ia_ctime;
 	if (ia_valid & ATTR_MODE) {
 		umode_t mode = attr->ia_mode;
-
-		if (!in_group_p(inode->i_gid) &&
-		    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
+		kgid_t kgid = i_gid_into_mnt(user_ns, inode);
+		if (!in_group_p(kgid) &&
+		    !capable_wrt_inode_uidgid(user_ns, inode, CAP_FSETID))
 			mode &= ~S_ISGID;
 		inode->i_mode = mode;
 	}
@@ -202,6 +215,7 @@ EXPORT_SYMBOL(setattr_copy);
 
 /**
  * notify_change - modify attributes of a filesytem object
+ * @user_ns:	the user namespace of the mount
  * @dentry:	object affected
  * @attr:	new attributes
  * @delegated_inode: returns inode, if the inode is delegated
@@ -214,13 +228,17 @@ EXPORT_SYMBOL(setattr_copy);
  * retry.  Because breaking a delegation may take a long time, the
  * caller should drop the i_mutex before doing so.
  *
+ * If file ownership is changed notify_change() doesn't map ia_uid and
+ * ia_gid. It will asssume the caller has already provided the intended values.
+ *
  * Alternatively, a caller may pass NULL for delegated_inode.  This may
  * be appropriate for callers that expect the underlying filesystem not
  * to be NFS exported.  Also, passing NULL is fine for callers holding
  * the file open for write, as there can be no conflicting delegation in
  * that case.
  */
-int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
+int notify_change(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *attr, struct inode **delegated_inode)
 {
 	struct inode *inode = dentry->d_inode;
 	umode_t mode = inode->i_mode;
@@ -243,9 +261,8 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de
 		if (IS_IMMUTABLE(inode))
 			return -EPERM;
 
-		if (!inode_owner_or_capable(&init_user_ns, inode)) {
-			error = inode_permission(&init_user_ns, inode,
-						 MAY_WRITE);
+		if (!inode_owner_or_capable(user_ns, inode)) {
+			error = inode_permission(user_ns, inode, MAY_WRITE);
 			if (error)
 				return error;
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ed1a5bf5f068..e65c54b7e828 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4875,7 +4875,7 @@ static int btrfs_setattr(struct dentry *dentry, struct iattr *attr)
 	if (btrfs_root_readonly(root))
 		return -EROFS;
 
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err)
 		return err;
 
@@ -4886,7 +4886,7 @@ static int btrfs_setattr(struct dentry *dentry, struct iattr *attr)
 	}
 
 	if (attr->ia_valid) {
-		setattr_copy(inode, attr);
+		setattr_copy(&init_user_ns, inode, attr);
 		inode_inc_iversion(inode);
 		err = btrfs_dirty_inode(inode);
 
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 4cea5fbf695e..5efa6a3702c0 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -470,14 +470,14 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
 		_debug("discard tail %llx", oi_size);
 		newattrs.ia_valid = ATTR_SIZE;
 		newattrs.ia_size = oi_size & PAGE_MASK;
-		ret = notify_change(object->backer, &newattrs, NULL);
+		ret = notify_change(&init_user_ns, object->backer, &newattrs, NULL);
 		if (ret < 0)
 			goto truncate_failed;
 	}
 
 	newattrs.ia_valid = ATTR_SIZE;
 	newattrs.ia_size = ni_size;
-	ret = notify_change(object->backer, &newattrs, NULL);
+	ret = notify_change(&init_user_ns, object->backer, &newattrs, NULL);
 
 truncate_failed:
 	inode_unlock(d_inode(object->backer));
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index abfe42df8a1a..06645c2efa6f 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2251,7 +2251,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr)
 	if (ceph_snap(inode) != CEPH_NOSNAP)
 		return -EROFS;
 
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err != 0)
 		return err;
 
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 9ee5f304592f..4d69b786e403 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -2602,7 +2602,7 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs)
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM)
 		attrs->ia_valid |= ATTR_FORCE;
 
-	rc = setattr_prepare(direntry, attrs);
+	rc = setattr_prepare(&init_user_ns, direntry, attrs);
 	if (rc < 0)
 		goto out;
 
@@ -2707,7 +2707,7 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs)
 	    attrs->ia_size != i_size_read(inode))
 		truncate_setsize(inode, attrs->ia_size);
 
-	setattr_copy(inode, attrs);
+	setattr_copy(&init_user_ns, inode, attrs);
 	mark_inode_dirty(inode);
 
 	/* force revalidate when any of these times are set since some
@@ -2749,7 +2749,7 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM)
 		attrs->ia_valid |= ATTR_FORCE;
 
-	rc = setattr_prepare(direntry, attrs);
+	rc = setattr_prepare(&init_user_ns, direntry, attrs);
 	if (rc < 0) {
 		free_xid(xid);
 		return rc;
@@ -2897,7 +2897,7 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
 	    attrs->ia_size != i_size_read(inode))
 		truncate_setsize(inode, attrs->ia_size);
 
-	setattr_copy(inode, attrs);
+	setattr_copy(&init_user_ns, inode, attrs);
 	mark_inode_dirty(inode);
 
 cifs_setattr_exit:
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 9b1ae410983c..2454aef02a68 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -855,7 +855,7 @@ int ecryptfs_truncate(struct dentry *dentry, loff_t new_length)
 		struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
 
 		inode_lock(d_inode(lower_dentry));
-		rc = notify_change(lower_dentry, &lower_ia, NULL);
+		rc = notify_change(&init_user_ns, lower_dentry, &lower_ia, NULL);
 		inode_unlock(d_inode(lower_dentry));
 	}
 	return rc;
@@ -933,7 +933,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
 	}
 	mutex_unlock(&crypt_stat->cs_mutex);
 
-	rc = setattr_prepare(dentry, ia);
+	rc = setattr_prepare(&init_user_ns, dentry, ia);
 	if (rc)
 		goto out;
 	if (ia->ia_valid & ATTR_SIZE) {
@@ -959,7 +959,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
 		lower_ia.ia_valid &= ~ATTR_MODE;
 
 	inode_lock(d_inode(lower_dentry));
-	rc = notify_change(lower_dentry, &lower_ia, NULL);
+	rc = notify_change(&init_user_ns, lower_dentry, &lower_ia, NULL);
 	inode_unlock(d_inode(lower_dentry));
 out:
 	fsstack_copy_attr_all(inode, lower_inode);
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index a92478eabfa4..ace35aa8e64b 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -305,7 +305,7 @@ int exfat_setattr(struct dentry *dentry, struct iattr *attr)
 				ATTR_TIMES_SET);
 	}
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	attr->ia_valid = ia_valid;
 	if (error)
 		goto out;
@@ -340,7 +340,7 @@ int exfat_setattr(struct dentry *dentry, struct iattr *attr)
 		up_write(&EXFAT_I(inode)->truncate_lock);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	exfat_truncate_atime(&inode->i_atime);
 	mark_inode_dirty(inode);
 
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 11c5c6fe75bb..8eccb67d04b2 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1668,7 +1668,7 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 	struct inode *inode = d_inode(dentry);
 	int error;
 
-	error = setattr_prepare(dentry, iattr);
+	error = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (error)
 		return error;
 
@@ -1688,7 +1688,7 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 		if (error)
 			return error;
 	}
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 	if (iattr->ia_valid & ATTR_MODE)
 		error = posix_acl_chmod(inode, inode->i_mode);
 	mark_inode_dirty(inode);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b96a18679a27..554ffd367b44 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5323,7 +5323,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 				  ATTR_GID | ATTR_TIMES_SET))))
 		return -EPERM;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -5498,7 +5498,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 	}
 
 	if (!error) {
-		setattr_copy(inode, attr);
+		setattr_copy(&init_user_ns, inode, attr);
 		mark_inode_dirty(inode);
 	}
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 333442e96cc4..39bc0c77a895 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -863,7 +863,7 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
 		!f2fs_is_compress_backend_ready(inode))
 		return -EOPNOTSUPP;
 
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err)
 		return err;
 
diff --git a/fs/fat/file.c b/fs/fat/file.c
index f9ee27cf4d7c..805b501467e9 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -480,7 +480,7 @@ int fat_setattr(struct dentry *dentry, struct iattr *attr)
 			attr->ia_valid &= ~TIMES_SET_FLAGS;
 	}
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	attr->ia_valid = ia_valid;
 	if (error) {
 		if (sbi->options.quiet)
@@ -550,7 +550,7 @@ int fat_setattr(struct dentry *dentry, struct iattr *attr)
 		fat_truncate_time(inode, &attr->ia_mtime, S_MTIME);
 	attr->ia_valid &= ~(ATTR_ATIME|ATTR_CTIME|ATTR_MTIME);
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 out:
 	return error;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 7ce7baed3f4f..dc6e8cc565d2 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1584,7 +1584,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	if (!fc->default_permissions)
 		attr->ia_valid |= ATTR_FORCE;
 
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err)
 		return err;
 
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 5d6ffa898030..e6f481ff6f8e 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1859,7 +1859,7 @@ int gfs2_permission(struct inode *inode, int mask)
 
 static int __gfs2_setattr_simple(struct inode *inode, struct iattr *attr)
 {
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
@@ -1980,7 +1980,7 @@ static int gfs2_setattr(struct dentry *dentry, struct iattr *attr)
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
 		goto error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		goto error;
 
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index f35a37c65e5f..c646218b72bf 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -608,7 +608,7 @@ int hfs_inode_setattr(struct dentry *dentry, struct iattr * attr)
 	struct hfs_sb_info *hsb = HFS_SB(inode->i_sb);
 	int error;
 
-	error = setattr_prepare(dentry, attr); /* basic permission checks */
+	error = setattr_prepare(&init_user_ns, dentry, attr); /* basic permission checks */
 	if (error)
 		return error;
 
@@ -647,7 +647,7 @@ int hfs_inode_setattr(struct dentry *dentry, struct iattr * attr)
 						  current_time(inode);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 02d51cbcff04..a2789730f451 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -246,7 +246,7 @@ static int hfsplus_setattr(struct dentry *dentry, struct iattr *attr)
 	struct inode *inode = d_inode(dentry);
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -264,7 +264,7 @@ static int hfsplus_setattr(struct dentry *dentry, struct iattr *attr)
 		inode->i_mtime = inode->i_ctime = current_time(inode);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 
 	return 0;
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 36da0c31d053..a00899c8fd52 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -792,7 +792,7 @@ static int hostfs_setattr(struct dentry *dentry, struct iattr *attr)
 
 	int fd = HOSTFS_I(inode)->fd;
 
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err)
 		return err;
 
@@ -849,7 +849,7 @@ static int hostfs_setattr(struct dentry *dentry, struct iattr *attr)
 	    attr->ia_size != i_size_read(inode))
 		truncate_setsize(inode, attr->ia_size);
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/hpfs/inode.c b/fs/hpfs/inode.c
index eb8b4baf0f2e..8ba2152a78ba 100644
--- a/fs/hpfs/inode.c
+++ b/fs/hpfs/inode.c
@@ -274,7 +274,7 @@ int hpfs_setattr(struct dentry *dentry, struct iattr *attr)
 	if ((attr->ia_valid & ATTR_SIZE) && attr->ia_size > inode->i_size)
 		goto out_unlock;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		goto out_unlock;
 
@@ -288,7 +288,7 @@ int hpfs_setattr(struct dentry *dentry, struct iattr *attr)
 		hpfs_truncate(inode);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 
 	hpfs_write_inode(inode);
 
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index fed6ddfc3f3a..53d57d852073 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -761,7 +761,7 @@ static int hugetlbfs_setattr(struct dentry *dentry, struct iattr *attr)
 
 	BUG_ON(!inode);
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -780,7 +780,7 @@ static int hugetlbfs_setattr(struct dentry *dentry, struct iattr *attr)
 			return error;
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/inode.c b/fs/inode.c
index d6dfa876c58d..66d3f7397d86 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1913,7 +1913,7 @@ static int __remove_privs(struct dentry *dentry, int kill)
 	 * Note we call this on write, so notify_change will not
 	 * encounter any conflicting delegations:
 	 */
-	return notify_change(dentry, &newattrs, NULL);
+	return notify_change(&init_user_ns, dentry, &newattrs, NULL);
 }
 
 /*
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index 78858f6e9583..67993808f4da 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -195,7 +195,7 @@ int jffs2_setattr(struct dentry *dentry, struct iattr *iattr)
 	struct inode *inode = d_inode(dentry);
 	int rc;
 
-	rc = setattr_prepare(dentry, iattr);
+	rc = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (rc)
 		return rc;
 
diff --git a/fs/jfs/file.c b/fs/jfs/file.c
index 930d2701f206..ff49876e9c9b 100644
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@@ -90,7 +90,7 @@ int jfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	struct inode *inode = d_inode(dentry);
 	int rc;
 
-	rc = setattr_prepare(dentry, iattr);
+	rc = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (rc)
 		return rc;
 
@@ -118,7 +118,7 @@ int jfs_setattr(struct dentry *dentry, struct iattr *iattr)
 		jfs_truncate(inode);
 	}
 
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 	mark_inode_dirty(inode);
 
 	if (iattr->ia_valid & ATTR_MODE)
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index ff5598cc1de0..86bd4c593b78 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -122,7 +122,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
 		return -EINVAL;
 
 	mutex_lock(&kernfs_mutex);
-	error = setattr_prepare(dentry, iattr);
+	error = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (error)
 		goto out;
 
@@ -131,7 +131,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
 		goto out;
 
 	/* this ignores size changes */
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 
 out:
 	mutex_unlock(&kernfs_mutex);
diff --git a/fs/libfs.c b/fs/libfs.c
index 23d0a00668fd..f50446576b10 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -497,13 +497,13 @@ int simple_setattr(struct dentry *dentry, struct iattr *iattr)
 	struct inode *inode = d_inode(dentry);
 	int error;
 
-	error = setattr_prepare(dentry, iattr);
+	error = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (error)
 		return error;
 
 	if (iattr->ia_valid & ATTR_SIZE)
 		truncate_setsize(inode, iattr->ia_size);
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/minix/file.c b/fs/minix/file.c
index c50b0a20fcd9..f07acd268577 100644
--- a/fs/minix/file.c
+++ b/fs/minix/file.c
@@ -27,7 +27,7 @@ static int minix_setattr(struct dentry *dentry, struct iattr *attr)
 	struct inode *inode = d_inode(dentry);
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -41,7 +41,7 @@ static int minix_setattr(struct dentry *dentry, struct iattr *attr)
 		minix_truncate(inode);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 0d71549f9d42..aaa1dd09e243 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -90,7 +90,7 @@ nfsd_proc_setattr(struct svc_rqst *rqstp)
 		if (delta < 0)
 			delta = -delta;
 		if (delta < MAX_TOUCH_TIME_ERROR &&
-		    setattr_prepare(fhp->fh_dentry, iap) != 0) {
+		    setattr_prepare(&init_user_ns, fhp->fh_dentry, iap) != 0) {
 			/*
 			 * Turn off ATTR_[AM]TIME_SET but leave ATTR_[AM]TIME.
 			 * This will cause notify_change to set these times
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 3cad79c3b441..e9b1452b505e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -448,7 +448,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 			.ia_size	= iap->ia_size,
 		};
 
-		host_err = notify_change(dentry, &size_attr, NULL);
+		host_err = notify_change(&init_user_ns, dentry, &size_attr, NULL);
 		if (host_err)
 			goto out_unlock;
 		iap->ia_valid &= ~ATTR_SIZE;
@@ -463,7 +463,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 	}
 
 	iap->ia_valid |= ATTR_CTIME;
-	host_err = notify_change(dentry, iap, NULL);
+	host_err = notify_change(&init_user_ns, dentry, iap, NULL);
 
 out_unlock:
 	fh_unlock(fhp);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index d286c3bf7d43..9ac47c8d27a8 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -812,7 +812,7 @@ int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	struct super_block *sb = inode->i_sb;
 	int err;
 
-	err = setattr_prepare(dentry, iattr);
+	err = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (err)
 		return err;
 
@@ -827,7 +827,7 @@ int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
 		nilfs_truncate(inode);
 	}
 
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 	mark_inode_dirty(inode);
 
 	if (iattr->ia_valid & ATTR_MODE) {
diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
index caf563981532..1f2ab03f4903 100644
--- a/fs/ntfs/inode.c
+++ b/fs/ntfs/inode.c
@@ -2868,7 +2868,7 @@ int ntfs_setattr(struct dentry *dentry, struct iattr *attr)
 	int err;
 	unsigned int ia_valid = attr->ia_valid;
 
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err)
 		goto out;
 	/* We do not support NTFS ACLs yet. */
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index 64491af88239..d5e63ca81afe 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -196,11 +196,11 @@ static int dlmfs_file_setattr(struct dentry *dentry, struct iattr *attr)
 	struct inode *inode = d_inode(dentry);
 
 	attr->ia_valid &= ~ATTR_SIZE;
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 0c75619adf54..cabf355b148f 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1142,7 +1142,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 	if (!(attr->ia_valid & OCFS2_VALID_ATTRS))
 		return 0;
 
-	status = setattr_prepare(dentry, attr);
+	status = setattr_prepare(&init_user_ns, dentry, attr);
 	if (status)
 		return status;
 
@@ -1263,7 +1263,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 		}
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 
 	status = ocfs2_mark_inode_dirty(handle, inode, bh);
diff --git a/fs/omfs/file.c b/fs/omfs/file.c
index 2c7b70ee1388..729339cd7902 100644
--- a/fs/omfs/file.c
+++ b/fs/omfs/file.c
@@ -348,7 +348,7 @@ static int omfs_setattr(struct dentry *dentry, struct iattr *attr)
 	struct inode *inode = d_inode(dentry);
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -361,7 +361,7 @@ static int omfs_setattr(struct dentry *dentry, struct iattr *attr)
 		omfs_truncate(inode);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/open.c b/fs/open.c
index b0e8430e29f3..2dc94689a7dc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -61,7 +61,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 
 	inode_lock(dentry->d_inode);
 	/* Note any delegations or leases have already been broken: */
-	ret = notify_change(dentry, &newattrs, NULL);
+	ret = notify_change(&init_user_ns, dentry, &newattrs, NULL);
 	inode_unlock(dentry->d_inode);
 	return ret;
 }
@@ -580,7 +580,7 @@ int chmod_common(const struct path *path, umode_t mode)
 		goto out_unlock;
 	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
-	error = notify_change(path->dentry, &newattrs, &delegated_inode);
+	error = notify_change(&init_user_ns, path->dentry, &newattrs, &delegated_inode);
 out_unlock:
 	inode_unlock(inode);
 	if (delegated_inode) {
@@ -671,7 +671,7 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
 	inode_lock(inode);
 	error = security_path_chown(path, uid, gid);
 	if (!error)
-		error = notify_change(path->dentry, &newattrs, &delegated_inode);
+		error = notify_change(&init_user_ns, path->dentry, &newattrs, &delegated_inode);
 	inode_unlock(inode);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 4c790cc8042d..8ac9491ceb9a 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -855,7 +855,7 @@ int __orangefs_setattr(struct inode *inode, struct iattr *iattr)
 		ORANGEFS_I(inode)->attr_uid = current_fsuid();
 		ORANGEFS_I(inode)->attr_gid = current_fsgid();
 	}
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 	spin_unlock(&inode->i_lock);
 	mark_inode_dirty(inode);
 
@@ -876,7 +876,7 @@ int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
 	int ret;
 	gossip_debug(GOSSIP_INODE_DEBUG, "__orangefs_setattr: called on %pd\n",
 	    dentry);
-	ret = setattr_prepare(dentry, iattr);
+	ret = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (ret)
 	        goto out;
 	ret = __orangefs_setattr(d_inode(dentry), iattr);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 955ecd4030f0..b972a58f1836 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -235,7 +235,7 @@ static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat)
 		.ia_size = stat->size,
 	};
 
-	return notify_change(upperdentry, &attr, NULL);
+	return notify_change(&init_user_ns, upperdentry, &attr, NULL);
 }
 
 static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
@@ -247,7 +247,7 @@ static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat)
 		.ia_mtime = stat->mtime,
 	};
 
-	return notify_change(upperdentry, &attr, NULL);
+	return notify_change(&init_user_ns, upperdentry, &attr, NULL);
 }
 
 int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
@@ -259,7 +259,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 			.ia_valid = ATTR_MODE,
 			.ia_mode = stat->mode,
 		};
-		err = notify_change(upperdentry, &attr, NULL);
+		err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
 	}
 	if (!err) {
 		struct iattr attr = {
@@ -267,7 +267,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
 			.ia_uid = stat->uid,
 			.ia_gid = stat->gid,
 		};
-		err = notify_change(upperdentry, &attr, NULL);
+		err = notify_change(&init_user_ns, upperdentry, &attr, NULL);
 	}
 	if (!err)
 		ovl_set_timestamps(upperdentry, stat);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 80b2fab73df7..28e4d3e4d54d 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -508,7 +508,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
 			.ia_mode = cattr->mode,
 		};
 		inode_lock(newdentry->d_inode);
-		err = notify_change(newdentry, &attr, NULL);
+		err = notify_change(&init_user_ns, newdentry, &attr, NULL);
 		inode_unlock(newdentry->d_inode);
 		if (err)
 			goto out_cleanup;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index c8e3c17ca131..7d1a455e4177 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -21,7 +21,7 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 	struct dentry *upperdentry;
 	const struct cred *old_cred;
 
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err)
 		return err;
 
@@ -79,7 +79,7 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 
 		inode_lock(upperdentry->d_inode);
 		old_cred = ovl_override_creds(dentry->d_sb);
-		err = notify_change(upperdentry, attr, NULL);
+		err = notify_change(&init_user_ns, upperdentry, attr, NULL);
 		revert_creds(old_cred);
 		if (!err)
 			ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 82f2c35894e4..64f5f8f6f84e 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -759,7 +759,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 
 		/* Clear any inherited mode bits */
 		inode_lock(work->d_inode);
-		err = notify_change(work, &attr, NULL);
+		err = notify_change(&init_user_ns, work, &attr, NULL);
 		inode_unlock(work->d_inode);
 		if (err)
 			goto out_dput;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3d21c0150e05..e60f4c60f94c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -693,11 +693,11 @@ int proc_setattr(struct dentry *dentry, struct iattr *attr)
 	if (attr->ia_valid & ATTR_MODE)
 		return -EPERM;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 2f9fa179194d..fee877db93f2 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -121,11 +121,11 @@ static int proc_notify_change(struct dentry *dentry, struct iattr *iattr)
 	struct proc_dir_entry *de = PDE(inode);
 	int error;
 
-	error = setattr_prepare(dentry, iattr);
+	error = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (error)
 		return error;
 
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 	mark_inode_dirty(inode);
 
 	proc_set_user(de, inode->i_uid, inode->i_gid);
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 317899222d7f..ec67dbc1f705 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -821,11 +821,11 @@ static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
 	if (attr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID))
 		return -EPERM;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index 355523f4a4bf..f0358fe410d3 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -165,7 +165,7 @@ static int ramfs_nommu_setattr(struct dentry *dentry, struct iattr *ia)
 	int ret = 0;
 
 	/* POSIX UID/GID verification for setting inode attributes */
-	ret = setattr_prepare(dentry, ia);
+	ret = setattr_prepare(&init_user_ns, dentry, ia);
 	if (ret)
 		return ret;
 
@@ -185,7 +185,7 @@ static int ramfs_nommu_setattr(struct dentry *dentry, struct iattr *ia)
 		}
 	}
 
-	setattr_copy(inode, ia);
+	setattr_copy(&init_user_ns, inode, ia);
  out:
 	ia->ia_valid = old_ia_valid;
 	return ret;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index c76d563dec0e..944f2b487cf8 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3288,7 +3288,7 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr)
 	unsigned int ia_valid;
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -3413,7 +3413,7 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr)
 	}
 
 	if (!error) {
-		setattr_copy(inode, attr);
+		setattr_copy(&init_user_ns, inode, attr);
 		mark_inode_dirty(inode);
 	}
 
diff --git a/fs/sysv/file.c b/fs/sysv/file.c
index 45fc79a18594..ca7e216b7b9e 100644
--- a/fs/sysv/file.c
+++ b/fs/sysv/file.c
@@ -34,7 +34,7 @@ static int sysv_setattr(struct dentry *dentry, struct iattr *attr)
 	struct inode *inode = d_inode(dentry);
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -47,7 +47,7 @@ static int sysv_setattr(struct dentry *dentry, struct iattr *attr)
 		sysv_truncate(inode);
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index b77d1637bbbc..25041c6d08e3 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1265,7 +1265,7 @@ int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
 
 	dbg_gen("ino %lu, mode %#x, ia_valid %#x",
 		inode->i_ino, inode->i_mode, attr->ia_valid);
-	err = setattr_prepare(dentry, attr);
+	err = setattr_prepare(&init_user_ns, dentry, attr);
 	if (err)
 		return err;
 
diff --git a/fs/udf/file.c b/fs/udf/file.c
index 928283925d68..0d5ade491b1a 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -259,7 +259,7 @@ static int udf_setattr(struct dentry *dentry, struct iattr *attr)
 	struct super_block *sb = inode->i_sb;
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -282,7 +282,7 @@ static int udf_setattr(struct dentry *dentry, struct iattr *attr)
 	if (attr->ia_valid & ATTR_MODE)
 		udf_update_extra_perms(inode, attr->ia_mode);
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index c843ec858cf7..6b51f3b20143 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -1217,7 +1217,7 @@ int ufs_setattr(struct dentry *dentry, struct iattr *attr)
 	unsigned int ia_valid = attr->ia_valid;
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -1227,7 +1227,7 @@ int ufs_setattr(struct dentry *dentry, struct iattr *attr)
 			return error;
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	mark_inode_dirty(inode);
 	return 0;
 }
diff --git a/fs/utimes.c b/fs/utimes.c
index fd3cc4226224..1a4130bee157 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -62,7 +62,7 @@ int vfs_utimes(const struct path *path, struct timespec64 *times)
 	}
 retry_deleg:
 	inode_lock(inode);
-	error = notify_change(path->dentry, &newattrs, &delegated_inode);
+	error = notify_change(&init_user_ns, path->dentry, &newattrs, &delegated_inode);
 	inode_unlock(inode);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 5e165456da68..33c6c8c14275 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -639,7 +639,7 @@ xfs_vn_change_ok(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	return setattr_prepare(dentry, iattr);
+	return setattr_prepare(&init_user_ns, dentry, iattr);
 }
 
 /*
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 5021a41e880c..f266e28ce00d 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -488,7 +488,7 @@ static int zonefs_inode_setattr(struct dentry *dentry, struct iattr *iattr)
 	if (unlikely(IS_IMMUTABLE(inode)))
 		return -EPERM;
 
-	ret = setattr_prepare(dentry, iattr);
+	ret = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (ret)
 		return ret;
 
@@ -516,7 +516,7 @@ static int zonefs_inode_setattr(struct dentry *dentry, struct iattr *iattr)
 			return ret;
 	}
 
-	setattr_copy(inode, iattr);
+	setattr_copy(&init_user_ns, inode, iattr);
 
 	return 0;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a5845a67a34b..95200607ac9c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2787,7 +2787,8 @@ static inline int bmap(struct inode *inode,  sector_t *block)
 }
 #endif
 
-extern int notify_change(struct dentry *, struct iattr *, struct inode **);
+extern int notify_change(struct user_namespace *, struct dentry *,
+			 struct iattr *, struct inode **);
 extern int inode_permission(struct user_namespace *, struct inode *, int);
 extern int generic_permission(struct user_namespace *, struct inode *, int);
 extern int __check_sticky(struct inode *dir, struct inode *inode);
@@ -3244,9 +3245,10 @@ extern int buffer_migrate_page_norefs(struct address_space *,
 #define buffer_migrate_page_norefs NULL
 #endif
 
-extern int setattr_prepare(struct dentry *, struct iattr *);
+extern int setattr_prepare(struct user_namespace *, struct dentry *, struct iattr *);
 extern int inode_newsize_ok(const struct inode *, loff_t offset);
-extern void setattr_copy(struct inode *inode, const struct iattr *attr);
+extern void setattr_copy(struct user_namespace *, struct inode *inode,
+			 const struct iattr *attr);
 
 extern int file_update_time(struct file *file);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 1bd6a9487222..368555ddc411 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1087,7 +1087,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	int error;
 
-	error = setattr_prepare(dentry, attr);
+	error = setattr_prepare(&init_user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -1141,7 +1141,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 		}
 	}
 
-	setattr_copy(inode, attr);
+	setattr_copy(&init_user_ns, inode, attr);
 	if (attr->ia_valid & ATTR_MODE)
 		error = posix_acl_chmod(inode, inode->i_mode);
 	return error;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 12/39] acl: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (10 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 11/39] attr: handle idmapped mounts Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 13/39] xattr: " Christian Brauner
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the inode.
Add helpers that make it possible to handle acls on idampped mounts.

The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the ACL_USER
and ACL_GROUP type according to the initial user namespace (or the
superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either shift from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.

In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix acls
are set by the respective filesystems to handle this case we extend the
->set() method to take an additional user namespace argument to pass the
mount's user namespace down.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 Documentation/filesystems/locking.rst |  6 +--
 Documentation/filesystems/porting.rst |  2 +
 fs/9p/acl.c                           |  3 +-
 fs/9p/xattr.c                         |  1 +
 fs/afs/xattr.c                        |  2 +
 fs/btrfs/acl.c                        |  2 +-
 fs/btrfs/inode.c                      |  2 +-
 fs/btrfs/xattr.c                      |  2 +
 fs/ceph/acl.c                         |  2 +-
 fs/ceph/inode.c                       |  2 +-
 fs/ceph/xattr.c                       |  1 +
 fs/cifs/xattr.c                       |  1 +
 fs/ecryptfs/inode.c                   |  1 +
 fs/ext2/acl.c                         |  2 +-
 fs/ext2/inode.c                       |  2 +-
 fs/ext2/xattr_security.c              |  1 +
 fs/ext2/xattr_trusted.c               |  1 +
 fs/ext2/xattr_user.c                  |  1 +
 fs/ext4/acl.c                         |  2 +-
 fs/ext4/inode.c                       |  2 +-
 fs/ext4/xattr_hurd.c                  |  1 +
 fs/ext4/xattr_security.c              |  1 +
 fs/ext4/xattr_trusted.c               |  1 +
 fs/ext4/xattr_user.c                  |  1 +
 fs/f2fs/acl.c                         |  2 +-
 fs/f2fs/file.c                        |  2 +-
 fs/f2fs/xattr.c                       |  2 +
 fs/fuse/xattr.c                       |  2 +
 fs/gfs2/acl.c                         |  2 +-
 fs/gfs2/inode.c                       |  2 +-
 fs/gfs2/xattr.c                       |  1 +
 fs/hfs/attr.c                         |  1 +
 fs/hfsplus/xattr.c                    |  1 +
 fs/hfsplus/xattr_security.c           |  1 +
 fs/hfsplus/xattr_trusted.c            |  1 +
 fs/hfsplus/xattr_user.c               |  1 +
 fs/jffs2/acl.c                        |  2 +-
 fs/jffs2/fs.c                         |  2 +-
 fs/jffs2/security.c                   |  1 +
 fs/jffs2/xattr_trusted.c              |  1 +
 fs/jffs2/xattr_user.c                 |  1 +
 fs/jfs/acl.c                          |  2 +-
 fs/jfs/file.c                         |  2 +-
 fs/jfs/xattr.c                        |  2 +
 fs/kernfs/inode.c                     |  2 +
 fs/nfs/nfs4proc.c                     |  3 ++
 fs/nfsd/nfs2acl.c                     |  4 +-
 fs/nfsd/nfs3acl.c                     |  4 +-
 fs/nfsd/nfs4acl.c                     |  4 +-
 fs/ocfs2/acl.c                        |  2 +-
 fs/ocfs2/xattr.c                      |  3 ++
 fs/orangefs/acl.c                     |  2 +-
 fs/orangefs/inode.c                   |  2 +-
 fs/orangefs/xattr.c                   |  1 +
 fs/overlayfs/super.c                  |  3 ++
 fs/posix_acl.c                        | 54 +++++++++++++++++----------
 fs/reiserfs/xattr_acl.c               |  4 +-
 fs/reiserfs/xattr_security.c          |  3 +-
 fs/reiserfs/xattr_trusted.c           |  3 +-
 fs/reiserfs/xattr_user.c              |  3 +-
 fs/ubifs/xattr.c                      |  1 +
 fs/xattr.c                            | 10 ++---
 fs/xfs/xfs_acl.c                      |  2 +-
 fs/xfs/xfs_iops.c                     |  2 +-
 fs/xfs/xfs_xattr.c                    |  3 +-
 include/linux/capability.h            |  3 +-
 include/linux/posix_acl.h             |  9 +++--
 include/linux/posix_acl_xattr.h       | 12 ++++--
 include/linux/xattr.h                 |  6 +--
 mm/shmem.c                            |  3 +-
 net/socket.c                          |  1 +
 security/commoncap.c                  | 17 ++++++---
 72 files changed, 159 insertions(+), 80 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index c0f2c7586531..e3b33c59b7f6 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -126,9 +126,9 @@ prototypes::
 	int (*get)(const struct xattr_handler *handler, struct dentry *dentry,
 		   struct inode *inode, const char *name, void *buffer,
 		   size_t size);
-	int (*set)(const struct xattr_handler *handler, struct dentry *dentry,
-		   struct inode *inode, const char *name, const void *buffer,
-		   size_t size, int flags);
+	int (*set)(const struct xattr_handler *handler, struct user_namespace *user_ns,
+                   struct dentry *dentry, struct inode *inode, const char *name,
+                   const void *buffer, size_t size, int flags);
 
 locking rules:
 	all may block
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 867036aa90b8..de1dcec3b5b8 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -717,6 +717,8 @@ be removed.  Switch while you still can; the old one won't stay.
 **mandatory**
 
 ->setxattr() and xattr_handler.set() get dentry and inode passed separately.
+The xattr_handler.set() gets passed the user namespace of the mount the inode
+is seen from so filesystems can idmap the i_uid and i_gid accordingly.
 dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
 in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
 called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index d77b28e8d57a..650b14dd3ccd 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -239,6 +239,7 @@ static int v9fs_xattr_get_acl(const struct xattr_handler *handler,
 }
 
 static int v9fs_xattr_set_acl(const struct xattr_handler *handler,
+			      struct user_namespace *user_ns,
 			      struct dentry *dentry, struct inode *inode,
 			      const char *name, const void *value,
 			      size_t size, int flags)
@@ -279,7 +280,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler,
 			struct iattr iattr = { 0 };
 			struct posix_acl *old_acl = acl;
 
-			retval = posix_acl_update_mode(inode, &iattr.ia_mode, &acl);
+			retval = posix_acl_update_mode(user_ns, inode, &iattr.ia_mode, &acl);
 			if (retval)
 				goto err_out;
 			if (!acl) {
diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index ac8ff8ca4c11..13d8cb1712d6 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -147,6 +147,7 @@ static int v9fs_xattr_handler_get(const struct xattr_handler *handler,
 }
 
 static int v9fs_xattr_handler_set(const struct xattr_handler *handler,
+				  struct user_namespace *user_ns,
 				  struct dentry *dentry, struct inode *inode,
 				  const char *name, const void *value,
 				  size_t size, int flags)
diff --git a/fs/afs/xattr.c b/fs/afs/xattr.c
index 38884d6c57cd..e8217f020970 100644
--- a/fs/afs/xattr.c
+++ b/fs/afs/xattr.c
@@ -120,6 +120,7 @@ static const struct afs_operation_ops afs_store_acl_operation = {
  * Set a file's AFS3 ACL.
  */
 static int afs_xattr_set_acl(const struct xattr_handler *handler,
+			     struct user_namespace *user_ns,
                              struct dentry *dentry,
                              struct inode *inode, const char *name,
                              const void *buffer, size_t size, int flags)
@@ -253,6 +254,7 @@ static const struct afs_operation_ops yfs_store_opaque_acl2_operation = {
  * Set a file's YFS ACL.
  */
 static int afs_xattr_set_yfs(const struct xattr_handler *handler,
+			     struct user_namespace *user_ns,
                              struct dentry *dentry,
                              struct inode *inode, const char *name,
                              const void *buffer, size_t size, int flags)
diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index a0af1b952c4d..b5a683e895c6 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -113,7 +113,7 @@ int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	umode_t old_mode = inode->i_mode;
 
 	if (type == ACL_TYPE_ACCESS && acl) {
-		ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+		ret = posix_acl_update_mode(&init_user_ns, inode, &inode->i_mode, &acl);
 		if (ret)
 			return ret;
 	}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e65c54b7e828..e6f4aed0d311 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4891,7 +4891,7 @@ static int btrfs_setattr(struct dentry *dentry, struct iattr *attr)
 		err = btrfs_dirty_inode(inode);
 
 		if (!err && attr->ia_valid & ATTR_MODE)
-			err = posix_acl_chmod(inode, inode->i_mode);
+			err = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 	}
 
 	return err;
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 95d9aebff2c4..a6adb9decdfc 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -360,6 +360,7 @@ static int btrfs_xattr_handler_get(const struct xattr_handler *handler,
 }
 
 static int btrfs_xattr_handler_set(const struct xattr_handler *handler,
+				   struct user_namespace *user_ns,
 				   struct dentry *unused, struct inode *inode,
 				   const char *name, const void *buffer,
 				   size_t size, int flags)
@@ -369,6 +370,7 @@ static int btrfs_xattr_handler_set(const struct xattr_handler *handler,
 }
 
 static int btrfs_xattr_handler_set_prop(const struct xattr_handler *handler,
+					struct user_namespace *user_ns,
 					struct dentry *unused, struct inode *inode,
 					const char *name, const void *value,
 					size_t size, int flags)
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index e0465741c591..bceab4c01585 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -100,7 +100,7 @@ int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	case ACL_TYPE_ACCESS:
 		name = XATTR_NAME_POSIX_ACL_ACCESS;
 		if (acl) {
-			ret = posix_acl_update_mode(inode, &new_mode, &acl);
+			ret = posix_acl_update_mode(&init_user_ns, inode, &new_mode, &acl);
 			if (ret)
 				goto out;
 		}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 06645c2efa6f..19ec845ba5ec 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2266,7 +2266,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr)
 	err = __ceph_setattr(inode, attr);
 
 	if (err >= 0 && (attr->ia_valid & ATTR_MODE))
-		err = posix_acl_chmod(inode, attr->ia_mode);
+		err = posix_acl_chmod(&init_user_ns, inode, attr->ia_mode);
 
 	return err;
 }
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 197cb1234341..a01e0563049a 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -1163,6 +1163,7 @@ static int ceph_get_xattr_handler(const struct xattr_handler *handler,
 }
 
 static int ceph_set_xattr_handler(const struct xattr_handler *handler,
+				  struct user_namespace *user_ns,
 				  struct dentry *unused, struct inode *inode,
 				  const char *name, const void *value,
 				  size_t size, int flags)
diff --git a/fs/cifs/xattr.c b/fs/cifs/xattr.c
index b8299173ea7e..33935bd038aa 100644
--- a/fs/cifs/xattr.c
+++ b/fs/cifs/xattr.c
@@ -99,6 +99,7 @@ static int cifs_creation_time_set(unsigned int xid, struct cifs_tcon *pTcon,
 }
 
 static int cifs_xattr_set(const struct xattr_handler *handler,
+			  struct user_namespace *user_ns,
 			  struct dentry *dentry, struct inode *inode,
 			  const char *name, const void *value,
 			  size_t size, int flags)
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 2454aef02a68..28cf830f0a4d 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1131,6 +1131,7 @@ static int ecryptfs_xattr_get(const struct xattr_handler *handler,
 }
 
 static int ecryptfs_xattr_set(const struct xattr_handler *handler,
+			      struct user_namespace *user_ns,
 			      struct dentry *dentry, struct inode *inode,
 			      const char *name, const void *value, size_t size,
 			      int flags)
diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c
index cf4c77f8dd08..826987b23ccb 100644
--- a/fs/ext2/acl.c
+++ b/fs/ext2/acl.c
@@ -223,7 +223,7 @@ ext2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	umode_t mode = inode->i_mode;
 
 	if (type == ACL_TYPE_ACCESS && acl) {
-		error = posix_acl_update_mode(inode, &mode, &acl);
+		error = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 		if (error)
 			return error;
 		update_mode = 1;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 8eccb67d04b2..5ff1e5e3c0fe 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1690,7 +1690,7 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 	}
 	setattr_copy(&init_user_ns, inode, iattr);
 	if (iattr->ia_valid & ATTR_MODE)
-		error = posix_acl_chmod(inode, inode->i_mode);
+		error = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 	mark_inode_dirty(inode);
 
 	return error;
diff --git a/fs/ext2/xattr_security.c b/fs/ext2/xattr_security.c
index 9a682e440acb..f536f656b1dd 100644
--- a/fs/ext2/xattr_security.c
+++ b/fs/ext2/xattr_security.c
@@ -19,6 +19,7 @@ ext2_xattr_security_get(const struct xattr_handler *handler,
 
 static int
 ext2_xattr_security_set(const struct xattr_handler *handler,
+			struct user_namespace *user_ns,
 			struct dentry *unused, struct inode *inode,
 			const char *name, const void *value,
 			size_t size, int flags)
diff --git a/fs/ext2/xattr_trusted.c b/fs/ext2/xattr_trusted.c
index 49add1107850..2a20108a8253 100644
--- a/fs/ext2/xattr_trusted.c
+++ b/fs/ext2/xattr_trusted.c
@@ -26,6 +26,7 @@ ext2_xattr_trusted_get(const struct xattr_handler *handler,
 
 static int
 ext2_xattr_trusted_set(const struct xattr_handler *handler,
+		       struct user_namespace *user_ns,
 		       struct dentry *unused, struct inode *inode,
 		       const char *name, const void *value,
 		       size_t size, int flags)
diff --git a/fs/ext2/xattr_user.c b/fs/ext2/xattr_user.c
index c243a3b4d69d..e710f491663c 100644
--- a/fs/ext2/xattr_user.c
+++ b/fs/ext2/xattr_user.c
@@ -30,6 +30,7 @@ ext2_xattr_user_get(const struct xattr_handler *handler,
 
 static int
 ext2_xattr_user_set(const struct xattr_handler *handler,
+		    struct user_namespace *user_ns,
 		    struct dentry *unused, struct inode *inode,
 		    const char *name, const void *value,
 		    size_t size, int flags)
diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 68aaed48315f..4aad060010d8 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -245,7 +245,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	ext4_fc_start_update(inode);
 
 	if ((type == ACL_TYPE_ACCESS) && acl) {
-		error = posix_acl_update_mode(inode, &mode, &acl);
+		error = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 		if (error)
 			goto out_stop;
 		if (mode != inode->i_mode)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 554ffd367b44..7bc58f25bf2e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5510,7 +5510,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 		ext4_orphan_del(NULL, inode);
 
 	if (!error && (ia_valid & ATTR_MODE))
-		rc = posix_acl_chmod(inode, inode->i_mode);
+		rc = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 
 err_out:
 	if  (error)
diff --git a/fs/ext4/xattr_hurd.c b/fs/ext4/xattr_hurd.c
index 8cfa74a56361..74925eace7dc 100644
--- a/fs/ext4/xattr_hurd.c
+++ b/fs/ext4/xattr_hurd.c
@@ -32,6 +32,7 @@ ext4_xattr_hurd_get(const struct xattr_handler *handler,
 
 static int
 ext4_xattr_hurd_set(const struct xattr_handler *handler,
+		    struct user_namespace *user_ns,
 		    struct dentry *unused, struct inode *inode,
 		    const char *name, const void *value,
 		    size_t size, int flags)
diff --git a/fs/ext4/xattr_security.c b/fs/ext4/xattr_security.c
index 197a9d8a15ef..cf6c772200b9 100644
--- a/fs/ext4/xattr_security.c
+++ b/fs/ext4/xattr_security.c
@@ -23,6 +23,7 @@ ext4_xattr_security_get(const struct xattr_handler *handler,
 
 static int
 ext4_xattr_security_set(const struct xattr_handler *handler,
+			struct user_namespace *user_ns,
 			struct dentry *unused, struct inode *inode,
 			const char *name, const void *value,
 			size_t size, int flags)
diff --git a/fs/ext4/xattr_trusted.c b/fs/ext4/xattr_trusted.c
index e9389e5d75c3..b62112b9cb3b 100644
--- a/fs/ext4/xattr_trusted.c
+++ b/fs/ext4/xattr_trusted.c
@@ -30,6 +30,7 @@ ext4_xattr_trusted_get(const struct xattr_handler *handler,
 
 static int
 ext4_xattr_trusted_set(const struct xattr_handler *handler,
+		       struct user_namespace *user_ns,
 		       struct dentry *unused, struct inode *inode,
 		       const char *name, const void *value,
 		       size_t size, int flags)
diff --git a/fs/ext4/xattr_user.c b/fs/ext4/xattr_user.c
index d4546184b34b..6cd01df6e8a5 100644
--- a/fs/ext4/xattr_user.c
+++ b/fs/ext4/xattr_user.c
@@ -31,6 +31,7 @@ ext4_xattr_user_get(const struct xattr_handler *handler,
 
 static int
 ext4_xattr_user_set(const struct xattr_handler *handler,
+		    struct user_namespace *user_ns,
 		    struct dentry *unused, struct inode *inode,
 		    const char *name, const void *value,
 		    size_t size, int flags)
diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 306413589827..50735e8a354e 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -213,7 +213,7 @@ static int __f2fs_set_acl(struct inode *inode, int type,
 	case ACL_TYPE_ACCESS:
 		name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
 		if (acl && !ipage) {
-			error = posix_acl_update_mode(inode, &mode, &acl);
+			error = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 			if (error)
 				return error;
 			set_acl_inode(inode, mode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 39bc0c77a895..5fc6c44020ed 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -942,7 +942,7 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
 	__setattr_copy(inode, attr);
 
 	if (attr->ia_valid & ATTR_MODE) {
-		err = posix_acl_chmod(inode, f2fs_get_inode_mode(inode));
+		err = posix_acl_chmod(&init_user_ns, inode, f2fs_get_inode_mode(inode));
 		if (err || is_inode_flag_set(inode, FI_ACL_MODE)) {
 			inode->i_mode = F2FS_I(inode)->i_acl_mode;
 			clear_inode_flag(inode, FI_ACL_MODE);
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index d772bf13a814..2b0b270ca80e 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -64,6 +64,7 @@ static int f2fs_xattr_generic_get(const struct xattr_handler *handler,
 }
 
 static int f2fs_xattr_generic_set(const struct xattr_handler *handler,
+		struct user_namespace *user_ns,
 		struct dentry *unused, struct inode *inode,
 		const char *name, const void *value,
 		size_t size, int flags)
@@ -107,6 +108,7 @@ static int f2fs_xattr_advise_get(const struct xattr_handler *handler,
 }
 
 static int f2fs_xattr_advise_set(const struct xattr_handler *handler,
+		struct user_namespace *user_ns,
 		struct dentry *unused, struct inode *inode,
 		const char *name, const void *value,
 		size_t size, int flags)
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 371bdcbc7233..518590494fdc 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -182,6 +182,7 @@ static int fuse_xattr_get(const struct xattr_handler *handler,
 }
 
 static int fuse_xattr_set(const struct xattr_handler *handler,
+			  struct user_namespace *user_ns,
 			  struct dentry *dentry, struct inode *inode,
 			  const char *name, const void *value, size_t size,
 			  int flags)
@@ -205,6 +206,7 @@ static int no_xattr_get(const struct xattr_handler *handler,
 }
 
 static int no_xattr_set(const struct xattr_handler *handler,
+			struct user_namespace *user_ns,
 			struct dentry *dentry, struct inode *nodee,
 			const char *name, const void *value,
 			size_t size, int flags)
diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index 2e939f5fe751..ce88ef29eef0 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -130,7 +130,7 @@ int gfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 
 	mode = inode->i_mode;
 	if (type == ACL_TYPE_ACCESS && acl) {
-		ret = posix_acl_update_mode(inode, &mode, &acl);
+		ret = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 		if (ret)
 			goto unlock;
 	}
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index e6f481ff6f8e..00a90287b919 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1991,7 +1991,7 @@ static int gfs2_setattr(struct dentry *dentry, struct iattr *attr)
 	else {
 		error = gfs2_setattr_simple(inode, attr);
 		if (!error && attr->ia_valid & ATTR_MODE)
-			error = posix_acl_chmod(inode, inode->i_mode);
+			error = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 	}
 
 error:
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 9d7667bc4292..8e0eb5b3528f 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -1214,6 +1214,7 @@ int __gfs2_xattr_set(struct inode *inode, const char *name,
 }
 
 static int gfs2_xattr_set(const struct xattr_handler *handler,
+			  struct user_namespace *user_ns,
 			  struct dentry *unused, struct inode *inode,
 			  const char *name, const void *value,
 			  size_t size, int flags)
diff --git a/fs/hfs/attr.c b/fs/hfs/attr.c
index 74fa62643136..ed2a50535bc1 100644
--- a/fs/hfs/attr.c
+++ b/fs/hfs/attr.c
@@ -121,6 +121,7 @@ static int hfs_xattr_get(const struct xattr_handler *handler,
 }
 
 static int hfs_xattr_set(const struct xattr_handler *handler,
+			 struct user_namespace *user_ns,
 			 struct dentry *unused, struct inode *inode,
 			 const char *name, const void *value, size_t size,
 			 int flags)
diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
index bb0b27d88e50..0fc6000659db 100644
--- a/fs/hfsplus/xattr.c
+++ b/fs/hfsplus/xattr.c
@@ -858,6 +858,7 @@ static int hfsplus_osx_getxattr(const struct xattr_handler *handler,
 }
 
 static int hfsplus_osx_setxattr(const struct xattr_handler *handler,
+				struct user_namespace *user_ns,
 				struct dentry *unused, struct inode *inode,
 				const char *name, const void *buffer,
 				size_t size, int flags)
diff --git a/fs/hfsplus/xattr_security.c b/fs/hfsplus/xattr_security.c
index cfbe6a3bfb1e..00c82f82a1dd 100644
--- a/fs/hfsplus/xattr_security.c
+++ b/fs/hfsplus/xattr_security.c
@@ -23,6 +23,7 @@ static int hfsplus_security_getxattr(const struct xattr_handler *handler,
 }
 
 static int hfsplus_security_setxattr(const struct xattr_handler *handler,
+				     struct user_namespace *user_ns,
 				     struct dentry *unused, struct inode *inode,
 				     const char *name, const void *buffer,
 				     size_t size, int flags)
diff --git a/fs/hfsplus/xattr_trusted.c b/fs/hfsplus/xattr_trusted.c
index fbad91e1dada..e3a2712a55d1 100644
--- a/fs/hfsplus/xattr_trusted.c
+++ b/fs/hfsplus/xattr_trusted.c
@@ -22,6 +22,7 @@ static int hfsplus_trusted_getxattr(const struct xattr_handler *handler,
 }
 
 static int hfsplus_trusted_setxattr(const struct xattr_handler *handler,
+				    struct user_namespace *user_ns,
 				    struct dentry *unused, struct inode *inode,
 				    const char *name, const void *buffer,
 				    size_t size, int flags)
diff --git a/fs/hfsplus/xattr_user.c b/fs/hfsplus/xattr_user.c
index 74d19faf255e..497f8114a132 100644
--- a/fs/hfsplus/xattr_user.c
+++ b/fs/hfsplus/xattr_user.c
@@ -22,6 +22,7 @@ static int hfsplus_user_getxattr(const struct xattr_handler *handler,
 }
 
 static int hfsplus_user_setxattr(const struct xattr_handler *handler,
+				 struct user_namespace *user_ns,
 				 struct dentry *unused, struct inode *inode,
 				 const char *name, const void *buffer,
 				 size_t size, int flags)
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index 093ffbd82395..cf07a2fdf8bf 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -236,7 +236,7 @@ int jffs2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 		if (acl) {
 			umode_t mode;
 
-			rc = posix_acl_update_mode(inode, &mode, &acl);
+			rc = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 			if (rc)
 				return rc;
 			if (inode->i_mode != mode) {
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index 67993808f4da..ee9f51bab4c6 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -201,7 +201,7 @@ int jffs2_setattr(struct dentry *dentry, struct iattr *iattr)
 
 	rc = jffs2_do_setattr(inode, iattr);
 	if (!rc && (iattr->ia_valid & ATTR_MODE))
-		rc = posix_acl_chmod(inode, inode->i_mode);
+		rc = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 
 	return rc;
 }
diff --git a/fs/jffs2/security.c b/fs/jffs2/security.c
index c2332e30f218..451e603c61c1 100644
--- a/fs/jffs2/security.c
+++ b/fs/jffs2/security.c
@@ -57,6 +57,7 @@ static int jffs2_security_getxattr(const struct xattr_handler *handler,
 }
 
 static int jffs2_security_setxattr(const struct xattr_handler *handler,
+				   struct user_namespace *user_ns,
 				   struct dentry *unused, struct inode *inode,
 				   const char *name, const void *buffer,
 				   size_t size, int flags)
diff --git a/fs/jffs2/xattr_trusted.c b/fs/jffs2/xattr_trusted.c
index 5d6030826c52..ca1afca47910 100644
--- a/fs/jffs2/xattr_trusted.c
+++ b/fs/jffs2/xattr_trusted.c
@@ -25,6 +25,7 @@ static int jffs2_trusted_getxattr(const struct xattr_handler *handler,
 }
 
 static int jffs2_trusted_setxattr(const struct xattr_handler *handler,
+				  struct user_namespace *user_ns,
 				  struct dentry *unused, struct inode *inode,
 				  const char *name, const void *buffer,
 				  size_t size, int flags)
diff --git a/fs/jffs2/xattr_user.c b/fs/jffs2/xattr_user.c
index 9d027b4abcf9..f6bc7da9c0d3 100644
--- a/fs/jffs2/xattr_user.c
+++ b/fs/jffs2/xattr_user.c
@@ -25,6 +25,7 @@ static int jffs2_user_getxattr(const struct xattr_handler *handler,
 }
 
 static int jffs2_user_setxattr(const struct xattr_handler *handler,
+			       struct user_namespace *user_ns,
 			       struct dentry *unused, struct inode *inode,
 			       const char *name, const void *buffer,
 			       size_t size, int flags)
diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
index 92cc0ac2d1fc..cf79a34bfada 100644
--- a/fs/jfs/acl.c
+++ b/fs/jfs/acl.c
@@ -101,7 +101,7 @@ int jfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	tid = txBegin(inode->i_sb, 0);
 	mutex_lock(&JFS_IP(inode)->commit_mutex);
 	if (type == ACL_TYPE_ACCESS && acl) {
-		rc = posix_acl_update_mode(inode, &mode, &acl);
+		rc = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 		if (rc)
 			goto end_tx;
 		if (mode != inode->i_mode)
diff --git a/fs/jfs/file.c b/fs/jfs/file.c
index ff49876e9c9b..61c3b0c1fbf6 100644
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@@ -122,7 +122,7 @@ int jfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	mark_inode_dirty(inode);
 
 	if (iattr->ia_valid & ATTR_MODE)
-		rc = posix_acl_chmod(inode, inode->i_mode);
+		rc = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 	return rc;
 }
 
diff --git a/fs/jfs/xattr.c b/fs/jfs/xattr.c
index db41e7803163..2c83750bdf2b 100644
--- a/fs/jfs/xattr.c
+++ b/fs/jfs/xattr.c
@@ -932,6 +932,7 @@ static int jfs_xattr_get(const struct xattr_handler *handler,
 }
 
 static int jfs_xattr_set(const struct xattr_handler *handler,
+			 struct user_namespace *user_ns,
 			 struct dentry *unused, struct inode *inode,
 			 const char *name, const void *value,
 			 size_t size, int flags)
@@ -950,6 +951,7 @@ static int jfs_xattr_get_os2(const struct xattr_handler *handler,
 }
 
 static int jfs_xattr_set_os2(const struct xattr_handler *handler,
+			     struct user_namespace *user_ns,
 			     struct dentry *unused, struct inode *inode,
 			     const char *name, const void *value,
 			     size_t size, int flags)
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 86bd4c593b78..74ee0ffcfe71 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -319,6 +319,7 @@ static int kernfs_vfs_xattr_get(const struct xattr_handler *handler,
 }
 
 static int kernfs_vfs_xattr_set(const struct xattr_handler *handler,
+				struct user_namespace *user_ns,
 				struct dentry *unused, struct inode *inode,
 				const char *suffix, const void *value,
 				size_t size, int flags)
@@ -385,6 +386,7 @@ static int kernfs_vfs_user_xattr_rm(struct kernfs_node *kn,
 }
 
 static int kernfs_vfs_user_xattr_set(const struct xattr_handler *handler,
+				     struct user_namespace *user_ns,
 				     struct dentry *unused, struct inode *inode,
 				     const char *suffix, const void *value,
 				     size_t size, int flags)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9e0ca9b2b210..77504501323c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7486,6 +7486,7 @@ nfs4_release_lockowner(struct nfs_server *server, struct nfs4_lock_state *lsp)
 #define XATTR_NAME_NFSV4_ACL "system.nfs4_acl"
 
 static int nfs4_xattr_set_nfs4_acl(const struct xattr_handler *handler,
+				   struct user_namespace *user_ns,
 				   struct dentry *unused, struct inode *inode,
 				   const char *key, const void *buf,
 				   size_t buflen, int flags)
@@ -7508,6 +7509,7 @@ static bool nfs4_xattr_list_nfs4_acl(struct dentry *dentry)
 #ifdef CONFIG_NFS_V4_SECURITY_LABEL
 
 static int nfs4_xattr_set_nfs4_label(const struct xattr_handler *handler,
+				     struct user_namespace *user_ns,
 				     struct dentry *unused, struct inode *inode,
 				     const char *key, const void *buf,
 				     size_t buflen, int flags)
@@ -7558,6 +7560,7 @@ nfs4_listxattr_nfs4_label(struct inode *inode, char *list, size_t list_len)
 
 #ifdef CONFIG_NFS_V4_2
 static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
+				    struct user_namespace *user_ns,
 				    struct dentry *unused, struct inode *inode,
 				    const char *key, const void *buf,
 				    size_t buflen, int flags)
diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
index 6a900f770dd2..e5f06f21c24a 100644
--- a/fs/nfsd/nfs2acl.c
+++ b/fs/nfsd/nfs2acl.c
@@ -113,10 +113,10 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
 
 	fh_lock(fh);
 
-	error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access);
+	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS, argp->acl_access);
 	if (error)
 		goto out_drop_lock;
-	error = set_posix_acl(inode, ACL_TYPE_DEFAULT, argp->acl_default);
+	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_DEFAULT, argp->acl_default);
 	if (error)
 		goto out_drop_lock;
 
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
index 34a394e50e1d..3618f62e118e 100644
--- a/fs/nfsd/nfs3acl.c
+++ b/fs/nfsd/nfs3acl.c
@@ -103,10 +103,10 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
 
 	fh_lock(fh);
 
-	error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access);
+	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS, argp->acl_access);
 	if (error)
 		goto out_drop_lock;
-	error = set_posix_acl(inode, ACL_TYPE_DEFAULT, argp->acl_default);
+	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_DEFAULT, argp->acl_default);
 
 out_drop_lock:
 	fh_unlock(fh);
diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
index 71292a0d6f09..0b35bc490aee 100644
--- a/fs/nfsd/nfs4acl.c
+++ b/fs/nfsd/nfs4acl.c
@@ -781,12 +781,12 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	fh_lock(fhp);
 
-	host_error = set_posix_acl(inode, ACL_TYPE_ACCESS, pacl);
+	host_error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS, pacl);
 	if (host_error < 0)
 		goto out_drop_lock;
 
 	if (S_ISDIR(inode->i_mode)) {
-		host_error = set_posix_acl(inode, ACL_TYPE_DEFAULT, dpacl);
+		host_error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_DEFAULT, dpacl);
 	}
 
 out_drop_lock:
diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index 7b07f5df3a29..7e64dbe93251 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -274,7 +274,7 @@ int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	if (type == ACL_TYPE_ACCESS && acl) {
 		umode_t mode;
 
-		status = posix_acl_update_mode(inode, &mode, &acl);
+		status = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 		if (status)
 			goto unlock;
 
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 9ccd19d8f7b1..7b034aef0b0d 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -7249,6 +7249,7 @@ static int ocfs2_xattr_security_get(const struct xattr_handler *handler,
 }
 
 static int ocfs2_xattr_security_set(const struct xattr_handler *handler,
+				    struct user_namespace *user_ns,
 				    struct dentry *unused, struct inode *inode,
 				    const char *name, const void *value,
 				    size_t size, int flags)
@@ -7321,6 +7322,7 @@ static int ocfs2_xattr_trusted_get(const struct xattr_handler *handler,
 }
 
 static int ocfs2_xattr_trusted_set(const struct xattr_handler *handler,
+				   struct user_namespace *user_ns,
 				   struct dentry *unused, struct inode *inode,
 				   const char *name, const void *value,
 				   size_t size, int flags)
@@ -7351,6 +7353,7 @@ static int ocfs2_xattr_user_get(const struct xattr_handler *handler,
 }
 
 static int ocfs2_xattr_user_set(const struct xattr_handler *handler,
+				struct user_namespace *user_ns,
 				struct dentry *unused, struct inode *inode,
 				const char *name, const void *value,
 				size_t size, int flags)
diff --git a/fs/orangefs/acl.c b/fs/orangefs/acl.c
index a25e6c890975..ba55d61906c2 100644
--- a/fs/orangefs/acl.c
+++ b/fs/orangefs/acl.c
@@ -132,7 +132,7 @@ int orangefs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 		 * and "mode" to the new desired value. It is up to
 		 * us to propagate the new mode back to the server...
 		 */
-		error = posix_acl_update_mode(inode, &iattr.ia_mode, &acl);
+		error = posix_acl_update_mode(&init_user_ns, inode, &iattr.ia_mode, &acl);
 		if (error) {
 			gossip_err("%s: posix_acl_update_mode err: %d\n",
 				   __func__,
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 8ac9491ceb9a..563fe9ab8eb2 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -861,7 +861,7 @@ int __orangefs_setattr(struct inode *inode, struct iattr *iattr)
 
 	if (iattr->ia_valid & ATTR_MODE)
 		/* change mod on a file that has ACLs */
-		ret = posix_acl_chmod(inode, inode->i_mode);
+		ret = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 
 	ret = 0;
 out:
diff --git a/fs/orangefs/xattr.c b/fs/orangefs/xattr.c
index bdc285aea360..41a4145ba697 100644
--- a/fs/orangefs/xattr.c
+++ b/fs/orangefs/xattr.c
@@ -526,6 +526,7 @@ ssize_t orangefs_listxattr(struct dentry *dentry, char *buffer, size_t size)
 }
 
 static int orangefs_xattr_set_default(const struct xattr_handler *handler,
+				      struct user_namespace *user_ns,
 				      struct dentry *unused,
 				      struct inode *inode,
 				      const char *name,
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 64f5f8f6f84e..9c12df942407 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -935,6 +935,7 @@ ovl_posix_acl_xattr_get(const struct xattr_handler *handler,
 
 static int __maybe_unused
 ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
+			struct user_namespace *user_ns,
 			struct dentry *dentry, struct inode *inode,
 			const char *name, const void *value,
 			size_t size, int flags)
@@ -999,6 +1000,7 @@ static int ovl_own_xattr_get(const struct xattr_handler *handler,
 }
 
 static int ovl_own_xattr_set(const struct xattr_handler *handler,
+			     struct user_namespace *user_ns,
 			     struct dentry *dentry, struct inode *inode,
 			     const char *name, const void *value,
 			     size_t size, int flags)
@@ -1014,6 +1016,7 @@ static int ovl_other_xattr_get(const struct xattr_handler *handler,
 }
 
 static int ovl_other_xattr_set(const struct xattr_handler *handler,
+			       struct user_namespace *user_ns,
 			       struct dentry *dentry, struct inode *inode,
 			       const char *name, const void *value,
 			       size_t size, int flags)
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 87b5ec67000b..ee6c017802c3 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -559,7 +559,7 @@ __posix_acl_chmod(struct posix_acl **acl, gfp_t gfp, umode_t mode)
 EXPORT_SYMBOL(__posix_acl_chmod);
 
 int
-posix_acl_chmod(struct inode *inode, umode_t mode)
+posix_acl_chmod(struct user_namespace *user_ns, struct inode *inode, umode_t mode)
 {
 	struct posix_acl *acl;
 	int ret = 0;
@@ -638,6 +638,7 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
 
 /**
  * posix_acl_update_mode  -  update mode in set_acl
+ * @user_ns: user namespace the inode is accessed from
  * @inode: target inode
  * @mode_p: mode (pointer) for update
  * @acl: acl pointer
@@ -651,8 +652,8 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
  *
  * Called from set_acl inode operations.
  */
-int posix_acl_update_mode(struct inode *inode, umode_t *mode_p,
-			  struct posix_acl **acl)
+int posix_acl_update_mode(struct user_namespace *user_ns, struct inode *inode,
+			  umode_t *mode_p, struct posix_acl **acl)
 {
 	umode_t mode = inode->i_mode;
 	int error;
@@ -662,8 +663,8 @@ int posix_acl_update_mode(struct inode *inode, umode_t *mode_p,
 		return error;
 	if (error == 0)
 		*acl = NULL;
-	if (!in_group_p(inode->i_gid) &&
-	    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
+	if (!in_group_p(i_gid_into_mnt(user_ns, inode)) &&
+	    !capable_wrt_inode_uidgid(user_ns, inode, CAP_FSETID))
 		mode &= ~S_ISGID;
 	*mode_p = mode;
 	return 0;
@@ -675,7 +676,8 @@ EXPORT_SYMBOL(posix_acl_update_mode);
  */
 static void posix_acl_fix_xattr_userns(
 	struct user_namespace *to, struct user_namespace *from,
-	void *value, size_t size)
+	struct user_namespace *mnt_user_ns,
+	void *value, size_t size, bool from_user)
 {
 	struct posix_acl_xattr_header *header = value;
 	struct posix_acl_xattr_entry *entry = (void *)(header + 1), *end;
@@ -700,10 +702,18 @@ static void posix_acl_fix_xattr_userns(
 		switch(le16_to_cpu(entry->e_tag)) {
 		case ACL_USER:
 			uid = make_kuid(from, le32_to_cpu(entry->e_id));
+			if (from_user)
+				uid = kuid_from_mnt(mnt_user_ns, uid);
+			else
+				uid = kuid_into_mnt(mnt_user_ns, uid);
 			entry->e_id = cpu_to_le32(from_kuid(to, uid));
 			break;
 		case ACL_GROUP:
 			gid = make_kgid(from, le32_to_cpu(entry->e_id));
+			if (from_user)
+				gid = kgid_from_mnt(mnt_user_ns, gid);
+			else
+				gid = kgid_into_mnt(mnt_user_ns, gid);
 			entry->e_id = cpu_to_le32(from_kgid(to, gid));
 			break;
 		default:
@@ -712,21 +722,25 @@ static void posix_acl_fix_xattr_userns(
 	}
 }
 
-void posix_acl_fix_xattr_from_user(void *value, size_t size)
+void posix_acl_fix_xattr_from_user(struct user_namespace *mnt_user_ns,
+				   void *value, size_t size)
 {
 	struct user_namespace *user_ns = current_user_ns();
-	if (user_ns == &init_user_ns)
+	if ((user_ns == &init_user_ns) && (mnt_user_ns == &init_user_ns))
 		return;
-	posix_acl_fix_xattr_userns(&init_user_ns, user_ns, value, size);
+	posix_acl_fix_xattr_userns(&init_user_ns, user_ns, mnt_user_ns, value, size, true);
 }
+EXPORT_SYMBOL(posix_acl_fix_xattr_from_user);
 
-void posix_acl_fix_xattr_to_user(void *value, size_t size)
+void posix_acl_fix_xattr_to_user(struct user_namespace *mnt_user_ns,
+				 void *value, size_t size)
 {
 	struct user_namespace *user_ns = current_user_ns();
-	if (user_ns == &init_user_ns)
+	if ((user_ns == &init_user_ns) && (mnt_user_ns == &init_user_ns))
 		return;
-	posix_acl_fix_xattr_userns(user_ns, &init_user_ns, value, size);
+	posix_acl_fix_xattr_userns(user_ns, &init_user_ns, mnt_user_ns, value, size, false);
 }
+EXPORT_SYMBOL(posix_acl_fix_xattr_to_user);
 
 /*
  * Convert from extended attribute to in-memory representation.
@@ -865,7 +879,8 @@ posix_acl_xattr_get(const struct xattr_handler *handler,
 }
 
 int
-set_posix_acl(struct inode *inode, int type, struct posix_acl *acl)
+set_posix_acl(struct user_namespace *user_ns, struct inode *inode,
+	      int type, struct posix_acl *acl)
 {
 	if (!IS_POSIXACL(inode))
 		return -EOPNOTSUPP;
@@ -874,7 +889,7 @@ set_posix_acl(struct inode *inode, int type, struct posix_acl *acl)
 
 	if (type == ACL_TYPE_DEFAULT && !S_ISDIR(inode->i_mode))
 		return acl ? -EACCES : 0;
-	if (!inode_owner_or_capable(&init_user_ns, inode))
+	if (!inode_owner_or_capable(user_ns, inode))
 		return -EPERM;
 
 	if (acl) {
@@ -888,9 +903,10 @@ EXPORT_SYMBOL(set_posix_acl);
 
 static int
 posix_acl_xattr_set(const struct xattr_handler *handler,
-		    struct dentry *unused, struct inode *inode,
-		    const char *name, const void *value,
-		    size_t size, int flags)
+			   struct user_namespace *user_ns,
+			   struct dentry *unused, struct inode *inode,
+			   const char *name, const void *value, size_t size,
+			   int flags)
 {
 	struct posix_acl *acl = NULL;
 	int ret;
@@ -900,7 +916,7 @@ posix_acl_xattr_set(const struct xattr_handler *handler,
 		if (IS_ERR(acl))
 			return PTR_ERR(acl);
 	}
-	ret = set_posix_acl(inode, handler->flags, acl);
+	ret = set_posix_acl(user_ns, inode, handler->flags, acl);
 	posix_acl_release(acl);
 	return ret;
 }
@@ -934,7 +950,7 @@ int simple_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	int error;
 
 	if (type == ACL_TYPE_ACCESS) {
-		error = posix_acl_update_mode(inode,
+		error = posix_acl_update_mode(&init_user_ns, inode,
 				&inode->i_mode, &acl);
 		if (error)
 			return error;
diff --git a/fs/reiserfs/xattr_acl.c b/fs/reiserfs/xattr_acl.c
index ccd40df6eb45..b8f397134c17 100644
--- a/fs/reiserfs/xattr_acl.c
+++ b/fs/reiserfs/xattr_acl.c
@@ -40,7 +40,7 @@ reiserfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	reiserfs_write_unlock(inode->i_sb);
 	if (error == 0) {
 		if (type == ACL_TYPE_ACCESS && acl) {
-			error = posix_acl_update_mode(inode, &mode, &acl);
+			error = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 			if (error)
 				goto unlock;
 			update_mode = 1;
@@ -399,5 +399,5 @@ int reiserfs_acl_chmod(struct inode *inode)
 	    !reiserfs_posixacl(inode->i_sb))
 		return 0;
 
-	return posix_acl_chmod(inode, inode->i_mode);
+	return posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 }
diff --git a/fs/reiserfs/xattr_security.c b/fs/reiserfs/xattr_security.c
index 20be9a0e5870..c98411a46c64 100644
--- a/fs/reiserfs/xattr_security.c
+++ b/fs/reiserfs/xattr_security.c
@@ -21,7 +21,8 @@ security_get(const struct xattr_handler *handler, struct dentry *unused,
 }
 
 static int
-security_set(const struct xattr_handler *handler, struct dentry *unused,
+security_set(const struct xattr_handler *handler, struct user_namespace *user_ns,
+	     struct dentry *unused,
 	     struct inode *inode, const char *name, const void *buffer,
 	     size_t size, int flags)
 {
diff --git a/fs/reiserfs/xattr_trusted.c b/fs/reiserfs/xattr_trusted.c
index 5ed48da3d02b..d197cd9da0ec 100644
--- a/fs/reiserfs/xattr_trusted.c
+++ b/fs/reiserfs/xattr_trusted.c
@@ -20,7 +20,8 @@ trusted_get(const struct xattr_handler *handler, struct dentry *unused,
 }
 
 static int
-trusted_set(const struct xattr_handler *handler, struct dentry *unused,
+trusted_set(const struct xattr_handler *handler, struct user_namespace *user_ns,
+	    struct dentry *unused,
 	    struct inode *inode, const char *name, const void *buffer,
 	    size_t size, int flags)
 {
diff --git a/fs/reiserfs/xattr_user.c b/fs/reiserfs/xattr_user.c
index a573ca45bacc..a79112aa0b4d 100644
--- a/fs/reiserfs/xattr_user.c
+++ b/fs/reiserfs/xattr_user.c
@@ -18,7 +18,8 @@ user_get(const struct xattr_handler *handler, struct dentry *unused,
 }
 
 static int
-user_set(const struct xattr_handler *handler, struct dentry *unused,
+user_set(const struct xattr_handler *handler, struct user_namespace *user_ns,
+	 struct dentry *unused,
 	 struct inode *inode, const char *name, const void *buffer,
 	 size_t size, int flags)
 {
diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
index a0b9b349efe6..87d710d0e5b1 100644
--- a/fs/ubifs/xattr.c
+++ b/fs/ubifs/xattr.c
@@ -681,6 +681,7 @@ static int xattr_get(const struct xattr_handler *handler,
 }
 
 static int xattr_set(const struct xattr_handler *handler,
+			   struct user_namespace *user_ns,
 			   struct dentry *dentry, struct inode *inode,
 			   const char *name, const void *value,
 			   size_t size, int flags)
diff --git a/fs/xattr.c b/fs/xattr.c
index fcc79c2a1ea1..ff9ffe77a4b2 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -174,7 +174,7 @@ __vfs_setxattr(struct dentry *dentry, struct inode *inode, const char *name,
 		return -EOPNOTSUPP;
 	if (size == 0)
 		value = "";  /* empty EA, do not remove */
-	return handler->set(handler, dentry, inode, name, value, size, flags);
+	return handler->set(handler, &init_user_ns, dentry, inode, name, value, size, flags);
 }
 EXPORT_SYMBOL(__vfs_setxattr);
 
@@ -438,7 +438,7 @@ __vfs_removexattr(struct dentry *dentry, const char *name)
 		return PTR_ERR(handler);
 	if (!handler->set)
 		return -EOPNOTSUPP;
-	return handler->set(handler, dentry, inode, name, NULL, 0, XATTR_REPLACE);
+	return handler->set(handler, &init_user_ns, dentry, inode, name, NULL, 0, XATTR_REPLACE);
 }
 EXPORT_SYMBOL(__vfs_removexattr);
 
@@ -536,9 +536,9 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 		}
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
 		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_from_user(kvalue, size);
+			posix_acl_fix_xattr_from_user(&init_user_ns, kvalue, size);
 		else if (strcmp(kname, XATTR_NAME_CAPS) == 0) {
-			error = cap_convert_nscap(d, &kvalue, size);
+			error = cap_convert_nscap(&init_user_ns, d, &kvalue, size);
 			if (error < 0)
 				goto out;
 			size = error;
@@ -636,7 +636,7 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 	if (error > 0) {
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
 		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_to_user(kvalue, error);
+			posix_acl_fix_xattr_to_user(&init_user_ns, kvalue, error);
 		if (size && copy_to_user(value, kvalue, error))
 			error = -EFAULT;
 	} else if (error == -ERANGE && size >= XATTR_SIZE_MAX) {
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index c544951a0c07..2e9d2d2878ce 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -244,7 +244,7 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 		return error;
 
 	if (type == ACL_TYPE_ACCESS) {
-		error = posix_acl_update_mode(inode, &mode, &acl);
+		error = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
 		if (error)
 			return error;
 		set_mode = true;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 33c6c8c14275..36d583ed8d25 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -810,7 +810,7 @@ xfs_setattr_nonsize(
 	 * 	     Posix ACL code seems to care about this issue either.
 	 */
 	if ((mask & ATTR_MODE) && !(flags & XFS_ATTR_NOACL)) {
-		error = posix_acl_chmod(inode, inode->i_mode);
+		error = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index bca48b308c02..76c9bdab2234 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -38,7 +38,8 @@ xfs_xattr_get(const struct xattr_handler *handler, struct dentry *unused,
 }
 
 static int
-xfs_xattr_set(const struct xattr_handler *handler, struct dentry *unused,
+xfs_xattr_set(const struct xattr_handler *handler, struct user_namespace *user_ns,
+		struct dentry *unused,
 		struct inode *inode, const char *name, const void *value,
 		size_t size, int flags)
 {
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 041e336f3369..0d54ca8abaed 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -273,6 +273,7 @@ static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
 
-extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
+extern int cap_convert_nscap(struct user_namespace *user_ns,
+			     struct dentry *dentry, void **ivalue, size_t size);
 
 #endif /* !_LINUX_CAPABILITY_H */
diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 8276baefed13..28a789218d83 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -71,13 +71,13 @@ extern int __posix_acl_create(struct posix_acl **, gfp_t, umode_t *);
 extern int __posix_acl_chmod(struct posix_acl **, gfp_t, umode_t);
 
 extern struct posix_acl *get_posix_acl(struct inode *, int);
-extern int set_posix_acl(struct inode *, int, struct posix_acl *);
+extern int set_posix_acl(struct user_namespace *, struct inode *, int, struct posix_acl *);
 
 #ifdef CONFIG_FS_POSIX_ACL
-extern int posix_acl_chmod(struct inode *, umode_t);
+extern int posix_acl_chmod(struct user_namespace *, struct inode *, umode_t);
 extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **,
 		struct posix_acl **);
-extern int posix_acl_update_mode(struct inode *, umode_t *, struct posix_acl **);
+extern int posix_acl_update_mode(struct user_namespace *, struct inode *, umode_t *, struct posix_acl **);
 
 extern int simple_set_acl(struct inode *, struct posix_acl *, int);
 extern int simple_acl_create(struct inode *, struct inode *);
@@ -94,7 +94,8 @@ static inline void cache_no_acl(struct inode *inode)
 	inode->i_default_acl = NULL;
 }
 #else
-static inline int posix_acl_chmod(struct inode *inode, umode_t mode)
+static inline int posix_acl_chmod(struct user_namespace *user_ns,
+				  struct inode *inode, umode_t mode)
 {
 	return 0;
 }
diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h
index 2387709991b5..9fdac573e1cb 100644
--- a/include/linux/posix_acl_xattr.h
+++ b/include/linux/posix_acl_xattr.h
@@ -33,13 +33,17 @@ posix_acl_xattr_count(size_t size)
 }
 
 #ifdef CONFIG_FS_POSIX_ACL
-void posix_acl_fix_xattr_from_user(void *value, size_t size);
-void posix_acl_fix_xattr_to_user(void *value, size_t size);
+void posix_acl_fix_xattr_from_user(struct user_namespace *mnt_user_ns,
+				   void *value, size_t size);
+void posix_acl_fix_xattr_to_user(struct user_namespace *mnt_user_ns,
+				 void *value, size_t size);
 #else
-static inline void posix_acl_fix_xattr_from_user(void *value, size_t size)
+static inline void posix_acl_fix_xattr_from_user(struct user_namespace *mnt_user_ns,
+						 void *value, size_t size)
 {
 }
-static inline void posix_acl_fix_xattr_to_user(void *value, size_t size)
+static inline void posix_acl_fix_xattr_to_user(struct user_namespace *mnt_user_ns,
+					       void *value, size_t size)
 {
 }
 #endif
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index 10b4dc2709f0..6d2ef03ba399 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -34,9 +34,9 @@ struct xattr_handler {
 	int (*get)(const struct xattr_handler *, struct dentry *dentry,
 		   struct inode *inode, const char *name, void *buffer,
 		   size_t size);
-	int (*set)(const struct xattr_handler *, struct dentry *dentry,
-		   struct inode *inode, const char *name, const void *buffer,
-		   size_t size, int flags);
+	int (*set)(const struct xattr_handler *, struct user_namespace *user_ns,
+		   struct dentry *dentry, struct inode *inode, const char *name,
+		   const void *buffer, size_t size, int flags);
 };
 
 const char *xattr_full_name(const struct xattr_handler *, const char *);
diff --git a/mm/shmem.c b/mm/shmem.c
index 368555ddc411..8fdf52576e06 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1143,7 +1143,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 
 	setattr_copy(&init_user_ns, inode, attr);
 	if (attr->ia_valid & ATTR_MODE)
-		error = posix_acl_chmod(inode, inode->i_mode);
+		error = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
 	return error;
 }
 
@@ -3278,6 +3278,7 @@ static int shmem_xattr_handler_get(const struct xattr_handler *handler,
 }
 
 static int shmem_xattr_handler_set(const struct xattr_handler *handler,
+				   struct user_namespace *user_ns,
 				   struct dentry *unused, struct inode *inode,
 				   const char *name, const void *value,
 				   size_t size, int flags)
diff --git a/net/socket.c b/net/socket.c
index 6e6cccc2104f..3bf36327f531 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -334,6 +334,7 @@ static const struct xattr_handler sockfs_xattr_handler = {
 };
 
 static int sockfs_security_xattr_set(const struct xattr_handler *handler,
+				     struct user_namespace *user_ns,
 				     struct dentry *dentry, struct inode *inode,
 				     const char *suffix, const void *value,
 				     size_t size, int flags)
diff --git a/security/commoncap.c b/security/commoncap.c
index 4cd2bdfd0a8b..abefe9223b5e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -451,15 +451,18 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
 }
 
 static kuid_t rootid_from_xattr(const void *value, size_t size,
-				struct user_namespace *task_ns)
+				struct user_namespace *task_ns,
+				struct user_namespace *user_ns)
 {
 	const struct vfs_ns_cap_data *nscap = value;
+	kuid_t rootkid;
 	uid_t rootid = 0;
 
 	if (size == XATTR_CAPS_SZ_3)
 		rootid = le32_to_cpu(nscap->rootid);
 
-	return make_kuid(task_ns, rootid);
+	rootkid = make_kuid(task_ns, rootid);
+	return kuid_from_mnt(user_ns, rootkid);
 }
 
 static bool validheader(size_t size, const struct vfs_cap_data *cap)
@@ -473,7 +476,8 @@ static bool validheader(size_t size, const struct vfs_cap_data *cap)
  *
  * If all is ok, we return the new size, on error return < 0.
  */
-int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size)
+int cap_convert_nscap(struct user_namespace *user_ns, struct dentry *dentry,
+		      void **ivalue, size_t size)
 {
 	struct vfs_ns_cap_data *nscap;
 	uid_t nsrootid;
@@ -489,14 +493,14 @@ int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size)
 		return -EINVAL;
 	if (!validheader(size, cap))
 		return -EINVAL;
-	if (!capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_SETFCAP))
+	if (!capable_wrt_inode_uidgid(user_ns, inode, CAP_SETFCAP))
 		return -EPERM;
-	if (size == XATTR_CAPS_SZ_2)
+	if (size == XATTR_CAPS_SZ_2 && (user_ns == &init_user_ns))
 		if (ns_capable(inode->i_sb->s_user_ns, CAP_SETFCAP))
 			/* user is privileged, just write the v2 */
 			return size;
 
-	rootid = rootid_from_xattr(*ivalue, size, task_ns);
+	rootid = rootid_from_xattr(*ivalue, size, task_ns, user_ns);
 	if (!uid_valid(rootid))
 		return -EINVAL;
 
@@ -520,6 +524,7 @@ int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size)
 	*ivalue = nscap;
 	return newsize;
 }
+EXPORT_SYMBOL(cap_convert_nscap);
 
 /*
  * Calculate the new process capability sets from the capability sets attached
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 13/39] xattr: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (11 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 12/39] acl: " Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 14/39] commoncap: " Christian Brauner
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Tycho Andersen, Christoph Hellwig, Christian Brauner

From: Tycho Andersen <tycho@tycho.pizza>

When interacting with extended attributes the vfs verifies that the caller
is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid or
gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an idmapped
mount we need to map it according to the mount's user namespace. Afterwards
the checks are identical to non-idmapped mounts. This has no effect for
e.g. security xattrs that are set since no filesystem performs checks based
on the uid and gid of the inode as the vfs will have already done.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 fs/cachefiles/xattr.c                 |  16 ++--
 fs/ecryptfs/crypto.c                  |   4 +-
 fs/ecryptfs/inode.c                   |   4 +-
 fs/ecryptfs/mmap.c                    |   4 +-
 fs/nfsd/vfs.c                         |  12 +--
 fs/overlayfs/copy_up.c                |  12 +--
 fs/overlayfs/dir.c                    |   2 +-
 fs/overlayfs/inode.c                  |   8 +-
 fs/overlayfs/overlayfs.h              |   6 +-
 fs/overlayfs/super.c                  |   4 +-
 fs/xattr.c                            | 120 +++++++++++++++-----------
 include/linux/xattr.h                 |  24 ++++--
 security/apparmor/domain.c            |   4 +-
 security/commoncap.c                  |   6 +-
 security/integrity/evm/evm_crypto.c   |  11 +--
 security/integrity/evm/evm_main.c     |   4 +-
 security/integrity/ima/ima_appraise.c |   8 +-
 security/selinux/hooks.c              |   2 +-
 security/smack/smack_lsm.c            |   4 +-
 19 files changed, 143 insertions(+), 112 deletions(-)

diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 72e42438f3d7..3b6a3f1610f4 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -39,7 +39,7 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
 	_enter("%p{%s}", object, type);
 
 	/* attempt to install a type label directly */
-	ret = vfs_setxattr(dentry, cachefiles_xattr_cache, type, 2,
+	ret = vfs_setxattr(&init_user_ns, dentry, cachefiles_xattr_cache, type, 2,
 			   XATTR_CREATE);
 	if (ret == 0) {
 		_debug("SET"); /* we succeeded */
@@ -54,7 +54,7 @@ int cachefiles_check_object_type(struct cachefiles_object *object)
 	}
 
 	/* read the current type label */
-	ret = vfs_getxattr(dentry, cachefiles_xattr_cache, xtype, 3);
+	ret = vfs_getxattr(&init_user_ns, dentry, cachefiles_xattr_cache, xtype, 3);
 	if (ret < 0) {
 		if (ret == -ERANGE)
 			goto bad_type_length;
@@ -110,7 +110,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object,
 	_debug("SET #%u", auxdata->len);
 
 	clear_bit(FSCACHE_COOKIE_AUX_UPDATED, &object->fscache.cookie->flags);
-	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+	ret = vfs_setxattr(&init_user_ns, dentry, cachefiles_xattr_cache,
 			   &auxdata->type, auxdata->len,
 			   XATTR_CREATE);
 	if (ret < 0 && ret != -ENOMEM)
@@ -140,7 +140,7 @@ int cachefiles_update_object_xattr(struct cachefiles_object *object,
 	_debug("SET #%u", auxdata->len);
 
 	clear_bit(FSCACHE_COOKIE_AUX_UPDATED, &object->fscache.cookie->flags);
-	ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+	ret = vfs_setxattr(&init_user_ns, dentry, cachefiles_xattr_cache,
 			   &auxdata->type, auxdata->len,
 			   XATTR_REPLACE);
 	if (ret < 0 && ret != -ENOMEM)
@@ -171,7 +171,7 @@ int cachefiles_check_auxdata(struct cachefiles_object *object)
 	if (!auxbuf)
 		return -ENOMEM;
 
-	xlen = vfs_getxattr(dentry, cachefiles_xattr_cache,
+	xlen = vfs_getxattr(&init_user_ns, dentry, cachefiles_xattr_cache,
 			    &auxbuf->type, 512 + 1);
 	ret = -ESTALE;
 	if (xlen < 1 ||
@@ -213,7 +213,7 @@ int cachefiles_check_object_xattr(struct cachefiles_object *object,
 	}
 
 	/* read the current type label */
-	ret = vfs_getxattr(dentry, cachefiles_xattr_cache,
+	ret = vfs_getxattr(&init_user_ns, dentry, cachefiles_xattr_cache,
 			   &auxbuf->type, 512 + 1);
 	if (ret < 0) {
 		if (ret == -ENODATA)
@@ -270,7 +270,7 @@ int cachefiles_check_object_xattr(struct cachefiles_object *object,
 		}
 
 		/* update the current label */
-		ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+		ret = vfs_setxattr(&init_user_ns, dentry, cachefiles_xattr_cache,
 				   &auxdata->type, auxdata->len,
 				   XATTR_REPLACE);
 		if (ret < 0) {
@@ -309,7 +309,7 @@ int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
 {
 	int ret;
 
-	ret = vfs_removexattr(dentry, cachefiles_xattr_cache);
+	ret = vfs_removexattr(&init_user_ns, dentry, cachefiles_xattr_cache);
 	if (ret < 0) {
 		if (ret == -ENOENT || ret == -ENODATA)
 			ret = 0;
diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 0681540c48d9..943e523f4c9d 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1110,8 +1110,8 @@ ecryptfs_write_metadata_to_xattr(struct dentry *ecryptfs_dentry,
 	}
 
 	inode_lock(lower_inode);
-	rc = __vfs_setxattr(lower_dentry, lower_inode, ECRYPTFS_XATTR_NAME,
-			    page_virt, size, 0);
+	rc = __vfs_setxattr(&init_user_ns, lower_dentry, lower_inode,
+			    ECRYPTFS_XATTR_NAME, page_virt, size, 0);
 	if (!rc && ecryptfs_inode)
 		fsstack_copy_attr_all(ecryptfs_inode, lower_inode);
 	inode_unlock(lower_inode);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 28cf830f0a4d..c35c2d2f3fe6 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1022,7 +1022,7 @@ ecryptfs_setxattr(struct dentry *dentry, struct inode *inode,
 		rc = -EOPNOTSUPP;
 		goto out;
 	}
-	rc = vfs_setxattr(lower_dentry, name, value, size, flags);
+	rc = vfs_setxattr(&init_user_ns, lower_dentry, name, value, size, flags);
 	if (!rc && inode)
 		fsstack_copy_attr_all(inode, d_inode(lower_dentry));
 out:
@@ -1087,7 +1087,7 @@ static int ecryptfs_removexattr(struct dentry *dentry, struct inode *inode,
 		goto out;
 	}
 	inode_lock(lower_inode);
-	rc = __vfs_removexattr(lower_dentry, name);
+	rc = __vfs_removexattr(&init_user_ns, lower_dentry, name);
 	inode_unlock(lower_inode);
 out:
 	return rc;
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 019572c6b39a..2f333a40ff4d 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -426,8 +426,8 @@ static int ecryptfs_write_inode_size_to_xattr(struct inode *ecryptfs_inode)
 	if (size < 0)
 		size = 8;
 	put_unaligned_be64(i_size_read(ecryptfs_inode), xattr_virt);
-	rc = __vfs_setxattr(lower_dentry, lower_inode, ECRYPTFS_XATTR_NAME,
-			    xattr_virt, size, 0);
+	rc = __vfs_setxattr(&init_user_ns, lower_dentry, lower_inode,
+			    ECRYPTFS_XATTR_NAME, xattr_virt, size, 0);
 	inode_unlock(lower_inode);
 	if (rc)
 		printk(KERN_ERR "Error whilst attempting to write inode size "
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index e9b1452b505e..9a0e0e5b34f5 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -499,7 +499,7 @@ int nfsd4_is_junction(struct dentry *dentry)
 		return 0;
 	if (!(inode->i_mode & S_ISVTX))
 		return 0;
-	if (vfs_getxattr(dentry, NFSD_JUNCTION_XATTR_NAME, NULL, 0) <= 0)
+	if (vfs_getxattr(&init_user_ns, dentry, NFSD_JUNCTION_XATTR_NAME, NULL, 0) <= 0)
 		return 0;
 	return 1;
 }
@@ -2138,7 +2138,7 @@ nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name,
 
 	inode_lock_shared(inode);
 
-	len = vfs_getxattr(dentry, name, NULL, 0);
+	len = vfs_getxattr(&init_user_ns, dentry, name, NULL, 0);
 
 	/*
 	 * Zero-length attribute, just return.
@@ -2165,7 +2165,7 @@ nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name,
 		goto out;
 	}
 
-	len = vfs_getxattr(dentry, name, buf, len);
+	len = vfs_getxattr(&init_user_ns, dentry, name, buf, len);
 	if (len <= 0) {
 		kvfree(buf);
 		buf = NULL;
@@ -2272,7 +2272,7 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
 
 	fh_lock(fhp);
 
-	ret = __vfs_removexattr_locked(fhp->fh_dentry, name, NULL);
+	ret = __vfs_removexattr_locked(&init_user_ns, fhp->fh_dentry, name, NULL);
 
 	fh_unlock(fhp);
 	fh_drop_write(fhp);
@@ -2296,8 +2296,8 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name,
 		return nfserrno(ret);
 	fh_lock(fhp);
 
-	ret = __vfs_setxattr_locked(fhp->fh_dentry, name, buf, len, flags,
-				    NULL);
+	ret = __vfs_setxattr_locked(&init_user_ns, fhp->fh_dentry, name, buf,
+				    len, flags, NULL);
 
 	fh_unlock(fhp);
 	fh_drop_write(fhp);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b972a58f1836..0ad1c47e2ca2 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -85,9 +85,9 @@ int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
 		if (ovl_is_private_xattr(sb, name))
 			continue;
 retry:
-		size = vfs_getxattr(old, name, value, value_size);
+		size = vfs_getxattr(&init_user_ns, old, name, value, value_size);
 		if (size == -ERANGE)
-			size = vfs_getxattr(old, name, NULL, 0);
+			size = vfs_getxattr(&init_user_ns, old, name, NULL, 0);
 
 		if (size < 0) {
 			error = size;
@@ -114,7 +114,7 @@ int ovl_copy_xattr(struct super_block *sb, struct dentry *old,
 			error = 0;
 			continue; /* Discard */
 		}
-		error = vfs_setxattr(new, name, value, size, 0);
+		error = vfs_setxattr(&init_user_ns, new, name, value, size, 0);
 		if (error) {
 			if (error != -EOPNOTSUPP || ovl_must_copy_xattr(name))
 				break;
@@ -791,7 +791,7 @@ static ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value)
 	ssize_t res;
 	char *buf;
 
-	res = vfs_getxattr(dentry, name, NULL, 0);
+	res = vfs_getxattr(&init_user_ns, dentry, name, NULL, 0);
 	if (res == -ENODATA || res == -EOPNOTSUPP)
 		res = 0;
 
@@ -800,7 +800,7 @@ static ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value)
 		if (!buf)
 			return -ENOMEM;
 
-		res = vfs_getxattr(dentry, name, buf, res);
+		res = vfs_getxattr(&init_user_ns, dentry, name, buf, res);
 		if (res < 0)
 			kfree(buf);
 		else
@@ -842,7 +842,7 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c)
 	 * don't want that to happen for normal copy-up operation.
 	 */
 	if (capability) {
-		err = vfs_setxattr(upperpath.dentry, XATTR_NAME_CAPS,
+		err = vfs_setxattr(&init_user_ns, upperpath.dentry, XATTR_NAME_CAPS,
 				   capability, cap_size, 0);
 		if (err)
 			goto out_free;
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 28e4d3e4d54d..a1794c838b78 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -449,7 +449,7 @@ static int ovl_set_upper_acl(struct dentry *upperdentry, const char *name,
 	if (err < 0)
 		goto out_free;
 
-	err = vfs_setxattr(upperdentry, name, buffer, size, XATTR_CREATE);
+	err = vfs_setxattr(&init_user_ns, upperdentry, name, buffer, size, XATTR_CREATE);
 out_free:
 	kfree(buffer);
 	return err;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 7d1a455e4177..2835f33f386a 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -346,7 +346,7 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		goto out;
 
 	if (!value && !upperdentry) {
-		err = vfs_getxattr(realdentry, name, NULL, 0);
+		err = vfs_getxattr(&init_user_ns, realdentry, name, NULL, 0);
 		if (err < 0)
 			goto out_drop_write;
 	}
@@ -361,10 +361,10 @@ int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 
 	old_cred = ovl_override_creds(dentry->d_sb);
 	if (value)
-		err = vfs_setxattr(realdentry, name, value, size, flags);
+		err = vfs_setxattr(&init_user_ns, realdentry, name, value, size, flags);
 	else {
 		WARN_ON(flags != XATTR_REPLACE);
-		err = vfs_removexattr(realdentry, name);
+		err = vfs_removexattr(&init_user_ns, realdentry, name);
 	}
 	revert_creds(old_cred);
 
@@ -386,7 +386,7 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
 		ovl_i_dentry_upper(inode) ?: ovl_dentry_lower(dentry);
 
 	old_cred = ovl_override_creds(dentry->d_sb);
-	res = vfs_getxattr(realdentry, name, value, size);
+	res = vfs_getxattr(&init_user_ns, realdentry, name, value, size);
 	revert_creds(old_cred);
 	return res;
 }
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f8880aa2ba0e..bf26fb3fa2c1 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -184,7 +184,7 @@ static inline ssize_t ovl_do_getxattr(struct ovl_fs *ofs, struct dentry *dentry,
 				      size_t size)
 {
 	const char *name = ovl_xattr(ofs, ox);
-	return vfs_getxattr(dentry, name, value, size);
+	return vfs_getxattr(&init_user_ns, dentry, name, value, size);
 }
 
 static inline int ovl_do_setxattr(struct ovl_fs *ofs, struct dentry *dentry,
@@ -192,7 +192,7 @@ static inline int ovl_do_setxattr(struct ovl_fs *ofs, struct dentry *dentry,
 				  size_t size)
 {
 	const char *name = ovl_xattr(ofs, ox);
-	int err = vfs_setxattr(dentry, name, value, size, 0);
+	int err = vfs_setxattr(&init_user_ns, dentry, name, value, size, 0);
 	pr_debug("setxattr(%pd2, \"%s\", \"%*pE\", %zu, 0) = %i\n",
 		 dentry, name, min((int)size, 48), value, size, err);
 	return err;
@@ -202,7 +202,7 @@ static inline int ovl_do_removexattr(struct ovl_fs *ofs, struct dentry *dentry,
 				     enum ovl_xattr ox)
 {
 	const char *name = ovl_xattr(ofs, ox);
-	int err = vfs_removexattr(dentry, name);
+	int err = vfs_removexattr(&init_user_ns, dentry, name);
 	pr_debug("removexattr(%pd2, \"%s\") = %i\n", dentry, name, err);
 	return err;
 }
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 9c12df942407..4dcd092ca561 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -749,11 +749,11 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
 		 * allowed as upper are limited to "normal" ones, where checking
 		 * for the above two errors is sufficient.
 		 */
-		err = vfs_removexattr(work, XATTR_NAME_POSIX_ACL_DEFAULT);
+		err = vfs_removexattr(&init_user_ns, work, XATTR_NAME_POSIX_ACL_DEFAULT);
 		if (err && err != -ENODATA && err != -EOPNOTSUPP)
 			goto out_dput;
 
-		err = vfs_removexattr(work, XATTR_NAME_POSIX_ACL_ACCESS);
+		err = vfs_removexattr(&init_user_ns, work, XATTR_NAME_POSIX_ACL_ACCESS);
 		if (err && err != -ENODATA && err != -EOPNOTSUPP)
 			goto out_dput;
 
diff --git a/fs/xattr.c b/fs/xattr.c
index ff9ffe77a4b2..438fedfcd402 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -83,7 +83,8 @@ xattr_resolve_name(struct inode *inode, const char **name)
  * because different namespaces have very different rules.
  */
 static int
-xattr_permission(struct inode *inode, const char *name, int mask)
+xattr_permission(struct user_namespace *user_ns, struct inode *inode,
+		 const char *name, int mask)
 {
 	/*
 	 * We can never set or remove an extended attribute on a read-only
@@ -127,11 +128,11 @@ xattr_permission(struct inode *inode, const char *name, int mask)
 		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
 			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
 		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
-		    (mask & MAY_WRITE) && !inode_owner_or_capable(&init_user_ns, inode))
+		    (mask & MAY_WRITE) && !inode_owner_or_capable(user_ns, inode))
 			return -EPERM;
 	}
 
-	return inode_permission(&init_user_ns, inode, mask);
+	return inode_permission(user_ns, inode, mask);
 }
 
 /*
@@ -162,8 +163,9 @@ xattr_supported_namespace(struct inode *inode, const char *prefix)
 EXPORT_SYMBOL(xattr_supported_namespace);
 
 int
-__vfs_setxattr(struct dentry *dentry, struct inode *inode, const char *name,
-	       const void *value, size_t size, int flags)
+__vfs_setxattr(struct user_namespace *user_ns, struct dentry *dentry,
+	       struct inode *inode, const char *name, const void *value,
+	       size_t size, int flags)
 {
 	const struct xattr_handler *handler;
 
@@ -174,7 +176,7 @@ __vfs_setxattr(struct dentry *dentry, struct inode *inode, const char *name,
 		return -EOPNOTSUPP;
 	if (size == 0)
 		value = "";  /* empty EA, do not remove */
-	return handler->set(handler, &init_user_ns, dentry, inode, name, value, size, flags);
+	return handler->set(handler, user_ns, dentry, inode, name, value, size, flags);
 }
 EXPORT_SYMBOL(__vfs_setxattr);
 
@@ -182,6 +184,7 @@ EXPORT_SYMBOL(__vfs_setxattr);
  *  __vfs_setxattr_noperm - perform setxattr operation without performing
  *  permission checks.
  *
+ *  @user_ns - user namespace of the mount
  *  @dentry - object to perform setxattr on
  *  @name - xattr name to set
  *  @value - value to set @name to
@@ -194,8 +197,9 @@ EXPORT_SYMBOL(__vfs_setxattr);
  *  is executed. It also assumes that the caller will make the appropriate
  *  permission checks.
  */
-int __vfs_setxattr_noperm(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags)
+int __vfs_setxattr_noperm(struct user_namespace *user_ns, struct dentry *dentry,
+			  const char *name, const void *value, size_t size,
+			  int flags)
 {
 	struct inode *inode = dentry->d_inode;
 	int error = -EAGAIN;
@@ -205,7 +209,7 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const char *name,
 	if (issec)
 		inode->i_flags &= ~S_NOSEC;
 	if (inode->i_opflags & IOP_XATTR) {
-		error = __vfs_setxattr(dentry, inode, name, value, size, flags);
+		error = __vfs_setxattr(user_ns, dentry, inode, name, value, size, flags);
 		if (!error) {
 			fsnotify_xattr(dentry);
 			security_inode_post_setxattr(dentry, name, value,
@@ -244,14 +248,14 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const char *name,
  *  a delegation was broken on, NULL if none.
  */
 int
-__vfs_setxattr_locked(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags,
-		struct inode **delegated_inode)
+__vfs_setxattr_locked(struct user_namespace *user_ns, struct dentry *dentry,
+		      const char *name, const void *value, size_t size,
+		      int flags, struct inode **delegated_inode)
 {
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(user_ns, inode, name, MAY_WRITE);
 	if (error)
 		return error;
 
@@ -263,7 +267,7 @@ __vfs_setxattr_locked(struct dentry *dentry, const char *name,
 	if (error)
 		goto out;
 
-	error = __vfs_setxattr_noperm(dentry, name, value, size, flags);
+	error = __vfs_setxattr_noperm(user_ns, dentry, name, value, size, flags);
 
 out:
 	return error;
@@ -271,8 +275,8 @@ __vfs_setxattr_locked(struct dentry *dentry, const char *name,
 EXPORT_SYMBOL_GPL(__vfs_setxattr_locked);
 
 int
-vfs_setxattr(struct dentry *dentry, const char *name, const void *value,
-		size_t size, int flags)
+vfs_setxattr(struct user_namespace *user_ns, struct dentry *dentry,
+	     const char *name, const void *value, size_t size, int flags)
 {
 	struct inode *inode = dentry->d_inode;
 	struct inode *delegated_inode = NULL;
@@ -280,8 +284,8 @@ vfs_setxattr(struct dentry *dentry, const char *name, const void *value,
 
 retry_deleg:
 	inode_lock(inode);
-	error = __vfs_setxattr_locked(dentry, name, value, size, flags,
-	    &delegated_inode);
+	error = __vfs_setxattr_locked(user_ns, dentry, name, value, size, flags,
+				      &delegated_inode);
 	inode_unlock(inode);
 
 	if (delegated_inode) {
@@ -328,15 +332,16 @@ xattr_getsecurity(struct inode *inode, const char *name, void *value,
  * Returns the result of alloc, if failed, or the getxattr operation.
  */
 ssize_t
-vfs_getxattr_alloc(struct dentry *dentry, const char *name, char **xattr_value,
-		   size_t xattr_size, gfp_t flags)
+vfs_getxattr_alloc(struct user_namespace *user_ns, struct dentry *dentry,
+		   const char *name, char **xattr_value, size_t xattr_size,
+		   gfp_t flags)
 {
 	const struct xattr_handler *handler;
 	struct inode *inode = dentry->d_inode;
 	char *value = *xattr_value;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_READ);
+	error = xattr_permission(user_ns, inode, name, MAY_READ);
 	if (error)
 		return error;
 
@@ -377,12 +382,13 @@ __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
 EXPORT_SYMBOL(__vfs_getxattr);
 
 ssize_t
-vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size)
+vfs_getxattr(struct user_namespace *user_ns, struct dentry *dentry,
+	     const char *name, void *value, size_t size)
 {
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_READ);
+	error = xattr_permission(user_ns, inode, name, MAY_READ);
 	if (error)
 		return error;
 
@@ -428,7 +434,7 @@ vfs_listxattr(struct dentry *dentry, char *list, size_t size)
 EXPORT_SYMBOL_GPL(vfs_listxattr);
 
 int
-__vfs_removexattr(struct dentry *dentry, const char *name)
+__vfs_removexattr(struct user_namespace *user_ns, struct dentry *dentry, const char *name)
 {
 	struct inode *inode = d_inode(dentry);
 	const struct xattr_handler *handler;
@@ -438,7 +444,7 @@ __vfs_removexattr(struct dentry *dentry, const char *name)
 		return PTR_ERR(handler);
 	if (!handler->set)
 		return -EOPNOTSUPP;
-	return handler->set(handler, &init_user_ns, dentry, inode, name, NULL, 0, XATTR_REPLACE);
+	return handler->set(handler, user_ns, dentry, inode, name, NULL, 0, XATTR_REPLACE);
 }
 EXPORT_SYMBOL(__vfs_removexattr);
 
@@ -452,13 +458,13 @@ EXPORT_SYMBOL(__vfs_removexattr);
  *  a delegation was broken on, NULL if none.
  */
 int
-__vfs_removexattr_locked(struct dentry *dentry, const char *name,
-		struct inode **delegated_inode)
+__vfs_removexattr_locked(struct user_namespace *user_ns, struct dentry *dentry,
+			 const char *name, struct inode **delegated_inode)
 {
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	error = xattr_permission(inode, name, MAY_WRITE);
+	error = xattr_permission(user_ns, inode, name, MAY_WRITE);
 	if (error)
 		return error;
 
@@ -470,7 +476,7 @@ __vfs_removexattr_locked(struct dentry *dentry, const char *name,
 	if (error)
 		goto out;
 
-	error = __vfs_removexattr(dentry, name);
+	error = __vfs_removexattr(user_ns, dentry, name);
 
 	if (!error) {
 		fsnotify_xattr(dentry);
@@ -483,7 +489,8 @@ __vfs_removexattr_locked(struct dentry *dentry, const char *name,
 EXPORT_SYMBOL_GPL(__vfs_removexattr_locked);
 
 int
-vfs_removexattr(struct dentry *dentry, const char *name)
+vfs_removexattr(struct user_namespace *user_ns, struct dentry *dentry,
+		const char *name)
 {
 	struct inode *inode = dentry->d_inode;
 	struct inode *delegated_inode = NULL;
@@ -491,7 +498,7 @@ vfs_removexattr(struct dentry *dentry, const char *name)
 
 retry_deleg:
 	inode_lock(inode);
-	error = __vfs_removexattr_locked(dentry, name, &delegated_inode);
+	error = __vfs_removexattr_locked(user_ns, dentry, name, &delegated_inode);
 	inode_unlock(inode);
 
 	if (delegated_inode) {
@@ -508,8 +515,9 @@ EXPORT_SYMBOL_GPL(vfs_removexattr);
  * Extended attribute SET operations
  */
 static long
-setxattr(struct dentry *d, const char __user *name, const void __user *value,
-	 size_t size, int flags)
+setxattr(struct user_namespace *user_ns, struct dentry *d,
+	 const char __user *name, const void __user *value, size_t size,
+	 int flags)
 {
 	int error;
 	void *kvalue = NULL;
@@ -536,16 +544,16 @@ setxattr(struct dentry *d, const char __user *name, const void __user *value,
 		}
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
 		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_from_user(&init_user_ns, kvalue, size);
+			posix_acl_fix_xattr_from_user(user_ns, kvalue, size);
 		else if (strcmp(kname, XATTR_NAME_CAPS) == 0) {
-			error = cap_convert_nscap(&init_user_ns, d, &kvalue, size);
+			error = cap_convert_nscap(user_ns, d, &kvalue, size);
 			if (error < 0)
 				goto out;
 			size = error;
 		}
 	}
 
-	error = vfs_setxattr(d, kname, kvalue, size, flags);
+	error = vfs_setxattr(user_ns, d, kname, kvalue, size, flags);
 out:
 	kvfree(kvalue);
 
@@ -558,13 +566,17 @@ static int path_setxattr(const char __user *pathname,
 {
 	struct path path;
 	int error;
+
 retry:
 	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
 	if (error)
 		return error;
 	error = mnt_want_write(path.mnt);
 	if (!error) {
-		error = setxattr(path.dentry, name, value, size, flags);
+		struct user_namespace *user_ns;
+
+		user_ns = mnt_user_ns(path.mnt);
+		error = setxattr(user_ns, path.dentry, name, value, size, flags);
 		mnt_drop_write(path.mnt);
 	}
 	path_put(&path);
@@ -600,7 +612,11 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 	audit_file(f.file);
 	error = mnt_want_write_file(f.file);
 	if (!error) {
-		error = setxattr(f.file->f_path.dentry, name, value, size, flags);
+		struct user_namespace *user_ns;
+
+		user_ns = mnt_user_ns(f.file->f_path.mnt);
+		error = setxattr(user_ns, f.file->f_path.dentry, name, value,
+				 size, flags);
 		mnt_drop_write_file(f.file);
 	}
 	fdput(f);
@@ -611,8 +627,8 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
  * Extended attribute GET operations
  */
 static ssize_t
-getxattr(struct dentry *d, const char __user *name, void __user *value,
-	 size_t size)
+getxattr(struct user_namespace *user_ns, struct dentry *d,
+	 const char __user *name, void __user *value, size_t size)
 {
 	ssize_t error;
 	void *kvalue = NULL;
@@ -632,11 +648,11 @@ getxattr(struct dentry *d, const char __user *name, void __user *value,
 			return -ENOMEM;
 	}
 
-	error = vfs_getxattr(d, kname, kvalue, size);
+	error = vfs_getxattr(user_ns, d, kname, kvalue, size);
 	if (error > 0) {
 		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
 		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_to_user(&init_user_ns, kvalue, error);
+			posix_acl_fix_xattr_to_user(user_ns, kvalue, error);
 		if (size && copy_to_user(value, kvalue, error))
 			error = -EFAULT;
 	} else if (error == -ERANGE && size >= XATTR_SIZE_MAX) {
@@ -654,13 +670,15 @@ static ssize_t path_getxattr(const char __user *pathname,
 			     const char __user *name, void __user *value,
 			     size_t size, unsigned int lookup_flags)
 {
+	struct user_namespace *user_ns;
 	struct path path;
 	ssize_t error;
 retry:
 	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
 	if (error)
 		return error;
-	error = getxattr(path.dentry, name, value, size);
+	user_ns = mnt_user_ns(path.mnt);
+	error = getxattr(user_ns, path.dentry, name, value, size);
 	path_put(&path);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
@@ -684,13 +702,15 @@ SYSCALL_DEFINE4(lgetxattr, const char __user *, pathname,
 SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
 		void __user *, value, size_t, size)
 {
+	struct user_namespace *user_ns;
 	struct fd f = fdget(fd);
 	ssize_t error = -EBADF;
 
 	if (!f.file)
 		return error;
 	audit_file(f.file);
-	error = getxattr(f.file->f_path.dentry, name, value, size);
+	user_ns = mnt_user_ns(f.file->f_path.mnt);
+	error = getxattr(user_ns, f.file->f_path.dentry, name, value, size);
 	fdput(f);
 	return error;
 }
@@ -774,7 +794,7 @@ SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
  * Extended attribute REMOVE operations
  */
 static long
-removexattr(struct dentry *d, const char __user *name)
+removexattr(struct user_namespace *user_ns, struct dentry *d, const char __user *name)
 {
 	int error;
 	char kname[XATTR_NAME_MAX + 1];
@@ -785,7 +805,7 @@ removexattr(struct dentry *d, const char __user *name)
 	if (error < 0)
 		return error;
 
-	return vfs_removexattr(d, kname);
+	return vfs_removexattr(user_ns, d, kname);
 }
 
 static int path_removexattr(const char __user *pathname,
@@ -799,7 +819,9 @@ static int path_removexattr(const char __user *pathname,
 		return error;
 	error = mnt_want_write(path.mnt);
 	if (!error) {
-		error = removexattr(path.dentry, name);
+		struct user_namespace *user_ns = mnt_user_ns(path.mnt);
+
+		error = removexattr(user_ns, path.dentry, name);
 		mnt_drop_write(path.mnt);
 	}
 	path_put(&path);
@@ -832,7 +854,9 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 	audit_file(f.file);
 	error = mnt_want_write_file(f.file);
 	if (!error) {
-		error = removexattr(f.file->f_path.dentry, name);
+		struct user_namespace *user_ns = mnt_user_ns(f.file->f_path.mnt);
+
+		error = removexattr(user_ns, f.file->f_path.dentry, name);
 		mnt_drop_write_file(f.file);
 	}
 	fdput(f);
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index 6d2ef03ba399..a015f5b03ef2 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -16,6 +16,7 @@
 #include <linux/types.h>
 #include <linux/spinlock.h>
 #include <linux/mm.h>
+#include <linux/user_namespace.h>
 #include <uapi/linux/xattr.h>
 
 struct inode;
@@ -48,18 +49,23 @@ struct xattr {
 };
 
 ssize_t __vfs_getxattr(struct dentry *, struct inode *, const char *, void *, size_t);
-ssize_t vfs_getxattr(struct dentry *, const char *, void *, size_t);
+ssize_t vfs_getxattr(struct user_namespace *, struct dentry *, const char *, void *, size_t);
 ssize_t vfs_listxattr(struct dentry *d, char *list, size_t size);
-int __vfs_setxattr(struct dentry *, struct inode *, const char *, const void *, size_t, int);
-int __vfs_setxattr_noperm(struct dentry *, const char *, const void *, size_t, int);
-int __vfs_setxattr_locked(struct dentry *, const char *, const void *, size_t, int, struct inode **);
-int vfs_setxattr(struct dentry *, const char *, const void *, size_t, int);
-int __vfs_removexattr(struct dentry *, const char *);
-int __vfs_removexattr_locked(struct dentry *, const char *, struct inode **);
-int vfs_removexattr(struct dentry *, const char *);
+int __vfs_setxattr(struct user_namespace *, struct dentry *, struct inode *,
+		   const char *, const void *, size_t, int);
+int __vfs_setxattr_noperm(struct user_namespace *, struct dentry *,
+			  const char *, const void *, size_t, int);
+int __vfs_setxattr_locked(struct user_namespace *, struct dentry *,
+			  const char *, const void *, size_t, int,
+			  struct inode **);
+int vfs_setxattr(struct user_namespace *, struct dentry *, const char *, const void *, size_t, int);
+int __vfs_removexattr(struct user_namespace *, struct dentry *, const char *);
+int __vfs_removexattr_locked(struct user_namespace *, struct dentry *, const char *, struct inode **);
+int vfs_removexattr(struct user_namespace *, struct dentry *, const char *);
 
 ssize_t generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size);
-ssize_t vfs_getxattr_alloc(struct dentry *dentry, const char *name,
+ssize_t vfs_getxattr_alloc(struct user_namespace *user_ns,
+			   struct dentry *dentry, const char *name,
 			   char **xattr_value, size_t size, gfp_t flags);
 
 int xattr_supported_namespace(struct inode *inode, const char *prefix);
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index f919ebd042fd..16f184bc48de 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -324,8 +324,8 @@ static int aa_xattrs_match(const struct linux_binprm *bprm,
 	d = bprm->file->f_path.dentry;
 
 	for (i = 0; i < profile->xattr_count; i++) {
-		size = vfs_getxattr_alloc(d, profile->xattrs[i], &value,
-					  value_size, GFP_KERNEL);
+		size = vfs_getxattr_alloc(&init_user_ns, d, profile->xattrs[i],
+					  &value, value_size, GFP_KERNEL);
 		if (size >= 0) {
 			u32 perm;
 
diff --git a/security/commoncap.c b/security/commoncap.c
index abefe9223b5e..873820091b07 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -313,7 +313,7 @@ int cap_inode_killpriv(struct dentry *dentry)
 {
 	int error;
 
-	error = __vfs_removexattr(dentry, XATTR_NAME_CAPS);
+	error = __vfs_removexattr(&init_user_ns, dentry, XATTR_NAME_CAPS);
 	if (error == -EOPNOTSUPP)
 		error = 0;
 	return error;
@@ -386,8 +386,8 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
 		return -EINVAL;
 
 	size = sizeof(struct vfs_ns_cap_data);
-	ret = (int) vfs_getxattr_alloc(dentry, XATTR_NAME_CAPS,
-				 &tmpbuf, size, GFP_NOFS);
+	ret = (int)vfs_getxattr_alloc(&init_user_ns, dentry, XATTR_NAME_CAPS,
+				      &tmpbuf, size, GFP_NOFS);
 	dput(dentry);
 
 	if (ret < 0)
diff --git a/security/integrity/evm/evm_crypto.c b/security/integrity/evm/evm_crypto.c
index 168c3b78ac47..f720f78cbbb1 100644
--- a/security/integrity/evm/evm_crypto.c
+++ b/security/integrity/evm/evm_crypto.c
@@ -222,7 +222,7 @@ static int evm_calc_hmac_or_hash(struct dentry *dentry,
 				ima_present = true;
 			continue;
 		}
-		size = vfs_getxattr_alloc(dentry, xattr->name,
+		size = vfs_getxattr_alloc(&init_user_ns, dentry, xattr->name,
 					  &xattr_value, xattr_size, GFP_NOFS);
 		if (size == -ENOMEM) {
 			error = -ENOMEM;
@@ -275,8 +275,8 @@ static int evm_is_immutable(struct dentry *dentry, struct inode *inode)
 		return 1;
 
 	/* Do this the hard way */
-	rc = vfs_getxattr_alloc(dentry, XATTR_NAME_EVM, (char **)&xattr_data, 0,
-				GFP_NOFS);
+	rc = vfs_getxattr_alloc(&init_user_ns, dentry, XATTR_NAME_EVM,
+				(char **)&xattr_data, 0, GFP_NOFS);
 	if (rc <= 0) {
 		if (rc == -ENODATA)
 			return 0;
@@ -319,11 +319,12 @@ int evm_update_evmxattr(struct dentry *dentry, const char *xattr_name,
 			   xattr_value_len, &data);
 	if (rc == 0) {
 		data.hdr.xattr.sha1.type = EVM_XATTR_HMAC;
-		rc = __vfs_setxattr_noperm(dentry, XATTR_NAME_EVM,
+		rc = __vfs_setxattr_noperm(&init_user_ns, dentry,
+					   XATTR_NAME_EVM,
 					   &data.hdr.xattr.data[1],
 					   SHA1_DIGEST_SIZE + 1, 0);
 	} else if (rc == -ENODATA && (inode->i_opflags & IOP_XATTR)) {
-		rc = __vfs_removexattr(dentry, XATTR_NAME_EVM);
+		rc = __vfs_removexattr(&init_user_ns, dentry, XATTR_NAME_EVM);
 	}
 	return rc;
 }
diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
index 76d19146d74b..0de367aaa2d3 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -146,8 +146,8 @@ static enum integrity_status evm_verify_hmac(struct dentry *dentry,
 	/* if status is not PASS, try to check again - against -ENOMEM */
 
 	/* first need to know the sig type */
-	rc = vfs_getxattr_alloc(dentry, XATTR_NAME_EVM, (char **)&xattr_data, 0,
-				GFP_NOFS);
+	rc = vfs_getxattr_alloc(&init_user_ns, dentry, XATTR_NAME_EVM,
+				(char **)&xattr_data, 0, GFP_NOFS);
 	if (rc <= 0) {
 		evm_status = INTEGRITY_FAIL;
 		if (rc == -ENODATA) {
diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 3dd8c2e4314e..3492f0b2da1c 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -89,7 +89,7 @@ static int ima_fix_xattr(struct dentry *dentry,
 		iint->ima_hash->xattr.ng.type = IMA_XATTR_DIGEST_NG;
 		iint->ima_hash->xattr.ng.algo = algo;
 	}
-	rc = __vfs_setxattr_noperm(dentry, XATTR_NAME_IMA,
+	rc = __vfs_setxattr_noperm(&init_user_ns, dentry, XATTR_NAME_IMA,
 				   &iint->ima_hash->xattr.data[offset],
 				   (sizeof(iint->ima_hash->xattr) - offset) +
 				   iint->ima_hash->length, 0);
@@ -210,8 +210,8 @@ int ima_read_xattr(struct dentry *dentry,
 {
 	ssize_t ret;
 
-	ret = vfs_getxattr_alloc(dentry, XATTR_NAME_IMA, (char **)xattr_value,
-				 0, GFP_NOFS);
+	ret = vfs_getxattr_alloc(&init_user_ns, dentry, XATTR_NAME_IMA,
+				 (char **)xattr_value, 0, GFP_NOFS);
 	if (ret == -EOPNOTSUPP)
 		ret = 0;
 	return ret;
@@ -515,7 +515,7 @@ void ima_inode_post_setattr(struct dentry *dentry)
 
 	action = ima_must_appraise(inode, MAY_ACCESS, POST_SETATTR);
 	if (!action)
-		__vfs_removexattr(dentry, XATTR_NAME_IMA);
+		__vfs_removexattr(&init_user_ns, dentry, XATTR_NAME_IMA);
 	iint = integrity_iint_find(inode);
 	if (iint) {
 		set_bit(IMA_CHANGE_ATTR, &iint->atomic_flags);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 14a195fa55eb..3f42950feed0 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6518,7 +6518,7 @@ static int selinux_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen
  */
 static int selinux_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
 {
-	return __vfs_setxattr_noperm(dentry, XATTR_NAME_SELINUX, ctx, ctxlen, 0);
+	return __vfs_setxattr_noperm(&init_user_ns, dentry, XATTR_NAME_SELINUX, ctx, ctxlen, 0);
 }
 
 static int selinux_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 5c90b9fa4d40..11003244b18b 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -3425,7 +3425,7 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 			 */
 			if (isp->smk_flags & SMK_INODE_CHANGED) {
 				isp->smk_flags &= ~SMK_INODE_CHANGED;
-				rc = __vfs_setxattr(dp, inode,
+				rc = __vfs_setxattr(&init_user_ns, dp, inode,
 					XATTR_NAME_SMACKTRANSMUTE,
 					TRANS_TRUE, TRANS_TRUE_SIZE,
 					0);
@@ -4603,7 +4603,7 @@ static int smack_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
 
 static int smack_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
 {
-	return __vfs_setxattr_noperm(dentry, XATTR_NAME_SMACK, ctx, ctxlen, 0);
+	return __vfs_setxattr_noperm(&init_user_ns, dentry, XATTR_NAME_SMACK, ctx, ctxlen, 0);
 }
 
 static int smack_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 14/39] commoncap: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (12 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 13/39] xattr: " Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-22 21:18   ` Paul Moore
  2020-11-15 10:36 ` [PATCH v2 15/39] stat: " Christian Brauner
                   ` (27 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller (e.g. during exec), or even whether they need to be removed. The
main infrastructure for this resides in the capability codepaths but they
are called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.
In order to actually get filesystem capabilities from disk the capability
infrastructure exposes the get_vfs_caps_from_disk() helper. For user
namespace aware filesystem capabilities a root uid is stored alongside the
capabilities.
In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according to
the superblock's user namespace. If it can be translated to uid 0 according
to that id mapping the caller can use the filesystem capabilities stored on
disk. If we are accessing the inode that holds the filesystem capabilities
through an idmapped mount we need to map the root uid according to the
mount's user namespace.
Afterwards the checks are identical to non-idmapped mounts. Reading
filesystem caps from disk enforces that the root uid associated with the
filesystem capability must have a mapping in the superblock's user
namespace and that the caller is either in the same user namespace or is a
descendant of the superblock's user namespace. For filesystems that are
mountable inside user namespace the container can just mount the filesystem
and won't usually need to idmap it. If it does create an idmapped mount it
can mark it with a user namespace it has created and which is therefore a
descendant of the s_user_ns. For filesystems that are not mountable inside
user namespaces the descendant rule is trivially true because the s_user_ns
will be the initial user namespace.

If the initial user namespace is passed all operations are a nop so
non-idmapped mounts will not see a change in behavior and will also not see
any performance impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 fs/attr.c                     |  2 +-
 fs/xattr.c                    | 14 +++++------
 include/linux/capability.h    |  4 +++-
 include/linux/lsm_hook_defs.h | 15 +++++++-----
 include/linux/lsm_hooks.h     |  1 +
 include/linux/security.h      | 44 +++++++++++++++++++++++------------
 kernel/auditsc.c              |  5 ++--
 security/commoncap.c          | 30 +++++++++++++++---------
 security/security.c           | 25 ++++++++++++--------
 security/selinux/hooks.c      | 20 +++++++++-------
 security/smack/smack_lsm.c    | 14 ++++++-----
 11 files changed, 107 insertions(+), 67 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 4b36440236d4..e990cda1ea6f 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -113,7 +113,7 @@ int setattr_prepare(struct user_namespace *user_ns, struct dentry *dentry,
 	if (ia_valid & ATTR_KILL_PRIV) {
 		int error;
 
-		error = security_inode_killpriv(dentry);
+		error = security_inode_killpriv(user_ns, dentry);
 		if (error)
 			return error;
 	}
diff --git a/fs/xattr.c b/fs/xattr.c
index 438fedfcd402..8c50b2a935e4 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -259,7 +259,7 @@ __vfs_setxattr_locked(struct user_namespace *user_ns, struct dentry *dentry,
 	if (error)
 		return error;
 
-	error = security_inode_setxattr(dentry, name, value, size, flags);
+	error = security_inode_setxattr(user_ns, dentry, name, value, size, flags);
 	if (error)
 		goto out;
 
@@ -298,18 +298,18 @@ vfs_setxattr(struct user_namespace *user_ns, struct dentry *dentry,
 EXPORT_SYMBOL_GPL(vfs_setxattr);
 
 static ssize_t
-xattr_getsecurity(struct inode *inode, const char *name, void *value,
-			size_t size)
+xattr_getsecurity(struct user_namespace *user_ns, struct inode *inode,
+		  const char *name, void *value, size_t size)
 {
 	void *buffer = NULL;
 	ssize_t len;
 
 	if (!value || !size) {
-		len = security_inode_getsecurity(inode, name, &buffer, false);
+		len = security_inode_getsecurity(user_ns, inode, name, &buffer, false);
 		goto out_noalloc;
 	}
 
-	len = security_inode_getsecurity(inode, name, &buffer, true);
+	len = security_inode_getsecurity(user_ns, inode, name, &buffer, true);
 	if (len < 0)
 		return len;
 	if (size < len) {
@@ -399,7 +399,7 @@ vfs_getxattr(struct user_namespace *user_ns, struct dentry *dentry,
 	if (!strncmp(name, XATTR_SECURITY_PREFIX,
 				XATTR_SECURITY_PREFIX_LEN)) {
 		const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
-		int ret = xattr_getsecurity(inode, suffix, value, size);
+		int ret = xattr_getsecurity(user_ns, inode, suffix, value, size);
 		/*
 		 * Only overwrite the return value if a security module
 		 * is actually active.
@@ -468,7 +468,7 @@ __vfs_removexattr_locked(struct user_namespace *user_ns, struct dentry *dentry,
 	if (error)
 		return error;
 
-	error = security_inode_removexattr(dentry, name);
+	error = security_inode_removexattr(user_ns, dentry, name);
 	if (error)
 		goto out;
 
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 0d54ca8abaed..92cd727677c4 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -271,7 +271,9 @@ static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)
 }
 
 /* audit system wants to get cap info from files as well */
-extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
+extern int get_vfs_caps_from_disk(struct user_namespace *user_ns,
+				  const struct dentry *dentry,
+				  struct cpu_vfs_cap_data *cpu_caps);
 
 extern int cap_convert_nscap(struct user_namespace *user_ns,
 			     struct dentry *dentry, void **ivalue, size_t size);
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a940117e7a..ae3dd00f6ffb 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -133,17 +133,20 @@ LSM_HOOK(int, 0, inode_follow_link, struct dentry *dentry, struct inode *inode,
 LSM_HOOK(int, 0, inode_permission, struct inode *inode, int mask)
 LSM_HOOK(int, 0, inode_setattr, struct dentry *dentry, struct iattr *attr)
 LSM_HOOK(int, 0, inode_getattr, const struct path *path)
-LSM_HOOK(int, 0, inode_setxattr, struct dentry *dentry, const char *name,
-	 const void *value, size_t size, int flags)
+LSM_HOOK(int, 0, inode_setxattr, struct user_namespace *user_ns,
+	 struct dentry *dentry, const char *name, const void *value,
+	 size_t size, int flags)
 LSM_HOOK(void, LSM_RET_VOID, inode_post_setxattr, struct dentry *dentry,
 	 const char *name, const void *value, size_t size, int flags)
 LSM_HOOK(int, 0, inode_getxattr, struct dentry *dentry, const char *name)
 LSM_HOOK(int, 0, inode_listxattr, struct dentry *dentry)
-LSM_HOOK(int, 0, inode_removexattr, struct dentry *dentry, const char *name)
+LSM_HOOK(int, 0, inode_removexattr, struct user_namespace *user_ns,
+	 struct dentry *dentry, const char *name)
 LSM_HOOK(int, 0, inode_need_killpriv, struct dentry *dentry)
-LSM_HOOK(int, 0, inode_killpriv, struct dentry *dentry)
-LSM_HOOK(int, -EOPNOTSUPP, inode_getsecurity, struct inode *inode,
-	 const char *name, void **buffer, bool alloc)
+LSM_HOOK(int, 0, inode_killpriv, struct user_namespace *user_ns,
+	 struct dentry *dentry)
+LSM_HOOK(int, -EOPNOTSUPP, inode_getsecurity, struct user_namespace *user_ns,
+	 struct inode *inode, const char *name, void **buffer, bool alloc)
 LSM_HOOK(int, -EOPNOTSUPP, inode_setsecurity, struct inode *inode,
 	 const char *name, const void *value, size_t size, int flags)
 LSM_HOOK(int, 0, inode_listsecurity, struct inode *inode, char *buffer,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c503f7ab8afb..465c9c308922 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -444,6 +444,7 @@
  * @inode_killpriv:
  *	The setuid bit is being removed.  Remove similar security labels.
  *	Called with the dentry->d_inode->i_mutex held.
+ *	@user_ns the user namespace of the mount.
  *	@dentry is the dentry being changed.
  *	Return 0 on success.  If error is returned, then the operation
  *	causing setuid bit removal is failed.
diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..b676a0816ae7 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -146,10 +146,13 @@ extern int cap_capset(struct cred *new, const struct cred *old,
 extern int cap_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file);
 extern int cap_inode_setxattr(struct dentry *dentry, const char *name,
 			      const void *value, size_t size, int flags);
-extern int cap_inode_removexattr(struct dentry *dentry, const char *name);
+extern int cap_inode_removexattr(struct user_namespace *user_ns,
+				 struct dentry *dentry, const char *name);
 extern int cap_inode_need_killpriv(struct dentry *dentry);
-extern int cap_inode_killpriv(struct dentry *dentry);
-extern int cap_inode_getsecurity(struct inode *inode, const char *name,
+extern int cap_inode_killpriv(struct user_namespace *user_ns,
+			      struct dentry *dentry);
+extern int cap_inode_getsecurity(struct user_namespace *user_ns,
+				 struct inode *inode, const char *name,
 				 void **buffer, bool alloc);
 extern int cap_mmap_addr(unsigned long addr);
 extern int cap_mmap_file(struct file *file, unsigned long reqprot,
@@ -344,16 +347,21 @@ int security_inode_follow_link(struct dentry *dentry, struct inode *inode,
 int security_inode_permission(struct inode *inode, int mask);
 int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
 int security_inode_getattr(const struct path *path);
-int security_inode_setxattr(struct dentry *dentry, const char *name,
+int security_inode_setxattr(struct user_namespace *user_ns,
+			    struct dentry *dentry, const char *name,
 			    const void *value, size_t size, int flags);
 void security_inode_post_setxattr(struct dentry *dentry, const char *name,
 				  const void *value, size_t size, int flags);
 int security_inode_getxattr(struct dentry *dentry, const char *name);
 int security_inode_listxattr(struct dentry *dentry);
-int security_inode_removexattr(struct dentry *dentry, const char *name);
+int security_inode_removexattr(struct user_namespace *user_ns,
+			       struct dentry *dentry, const char *name);
 int security_inode_need_killpriv(struct dentry *dentry);
-int security_inode_killpriv(struct dentry *dentry);
-int security_inode_getsecurity(struct inode *inode, const char *name, void **buffer, bool alloc);
+int security_inode_killpriv(struct user_namespace *user_ns,
+			    struct dentry *dentry);
+int security_inode_getsecurity(struct user_namespace *user_ns,
+			       struct inode *inode, const char *name,
+			       void **buffer, bool alloc);
 int security_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags);
 int security_inode_listsecurity(struct inode *inode, char *buffer, size_t buffer_size);
 void security_inode_getsecid(struct inode *inode, u32 *secid);
@@ -830,8 +838,9 @@ static inline int security_inode_getattr(const struct path *path)
 	return 0;
 }
 
-static inline int security_inode_setxattr(struct dentry *dentry,
-		const char *name, const void *value, size_t size, int flags)
+static inline int security_inode_setxattr(struct user_namespace *user_ns,
+		struct dentry *dentry, const char *name, const void *value,
+		size_t size, int flags)
 {
 	return cap_inode_setxattr(dentry, name, value, size, flags);
 }
@@ -851,10 +860,11 @@ static inline int security_inode_listxattr(struct dentry *dentry)
 	return 0;
 }
 
-static inline int security_inode_removexattr(struct dentry *dentry,
-			const char *name)
+static inline int security_inode_removexattr(struct user_namespace *user_ns,
+					     struct dentry *dentry,
+					     const char *name)
 {
-	return cap_inode_removexattr(dentry, name);
+	return cap_inode_removexattr(user_ns, dentry, name);
 }
 
 static inline int security_inode_need_killpriv(struct dentry *dentry)
@@ -862,12 +872,16 @@ static inline int security_inode_need_killpriv(struct dentry *dentry)
 	return cap_inode_need_killpriv(dentry);
 }
 
-static inline int security_inode_killpriv(struct dentry *dentry)
+static inline int security_inode_killpriv(struct user_namespace *user_ns,
+					  struct dentry *dentry)
 {
-	return cap_inode_killpriv(dentry);
+	return cap_inode_killpriv(user_ns, dentry);
 }
 
-static inline int security_inode_getsecurity(struct inode *inode, const char *name, void **buffer, bool alloc)
+static inline int security_inode_getsecurity(struct user_namespace *user_ns,
+					     struct inode *inode,
+					     const char *name, void **buffer,
+					     bool alloc)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8dba8f0983b5..ddb9213a3e81 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1944,7 +1944,7 @@ static inline int audit_copy_fcaps(struct audit_names *name,
 	if (!dentry)
 		return 0;
 
-	rc = get_vfs_caps_from_disk(dentry, &caps);
+	rc = get_vfs_caps_from_disk(&init_user_ns, dentry, &caps);
 	if (rc)
 		return rc;
 
@@ -2495,7 +2495,8 @@ int __audit_log_bprm_fcaps(struct linux_binprm *bprm,
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
 
-	get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
+	get_vfs_caps_from_disk(mnt_user_ns(bprm->file->f_path.mnt),
+			       bprm->file->f_path.dentry, &vcaps);
 
 	ax->fcap.permitted = vcaps.permitted;
 	ax->fcap.inheritable = vcaps.inheritable;
diff --git a/security/commoncap.c b/security/commoncap.c
index 873820091b07..5071674f67b3 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -303,17 +303,18 @@ int cap_inode_need_killpriv(struct dentry *dentry)
 
 /**
  * cap_inode_killpriv - Erase the security markings on an inode
+ * @user_ns: The user namespace of the mount
  * @dentry: The inode/dentry to alter
  *
  * Erase the privilege-enhancing security markings on an inode.
  *
  * Returns 0 if successful, -ve on error.
  */
-int cap_inode_killpriv(struct dentry *dentry)
+int cap_inode_killpriv(struct user_namespace *user_ns, struct dentry *dentry)
 {
 	int error;
 
-	error = __vfs_removexattr(&init_user_ns, dentry, XATTR_NAME_CAPS);
+	error = __vfs_removexattr(user_ns, dentry, XATTR_NAME_CAPS);
 	if (error == -EOPNOTSUPP)
 		error = 0;
 	return error;
@@ -366,8 +367,8 @@ static bool is_v3header(size_t size, const struct vfs_cap_data *cap)
  * by the integrity subsystem, which really wants the unconverted values -
  * so that's good.
  */
-int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
-			  bool alloc)
+int cap_inode_getsecurity(struct user_namespace *user_ns, struct inode *inode,
+			  const char *name, void **buffer, bool alloc)
 {
 	int size, ret;
 	kuid_t kroot;
@@ -386,8 +387,8 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
 		return -EINVAL;
 
 	size = sizeof(struct vfs_ns_cap_data);
-	ret = (int)vfs_getxattr_alloc(&init_user_ns, dentry, XATTR_NAME_CAPS,
-				      &tmpbuf, size, GFP_NOFS);
+	ret = (int)vfs_getxattr_alloc(user_ns, dentry, XATTR_NAME_CAPS, &tmpbuf,
+				      size, GFP_NOFS);
 	dput(dentry);
 
 	if (ret < 0)
@@ -412,6 +413,9 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
 	root = le32_to_cpu(nscap->rootid);
 	kroot = make_kuid(fs_ns, root);
 
+	/* If this is an idmapped mount shift the kuid. */
+	kroot = kuid_into_mnt(user_ns, kroot);
+
 	/* If the root kuid maps to a valid uid in current ns, then return
 	 * this as a nscap. */
 	mappedroot = from_kuid(current_user_ns(), kroot);
@@ -573,7 +577,9 @@ static inline int bprm_caps_from_vfs_caps(struct cpu_vfs_cap_data *caps,
 /*
  * Extract the on-exec-apply capability sets for an executable file.
  */
-int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps)
+int get_vfs_caps_from_disk(struct user_namespace *user_ns,
+			   const struct dentry *dentry,
+			   struct cpu_vfs_cap_data *cpu_caps)
 {
 	struct inode *inode = d_backing_inode(dentry);
 	__u32 magic_etc;
@@ -629,6 +635,7 @@ int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data
 	/* Limit the caps to the mounter of the filesystem
 	 * or the more limited uid specified in the xattr.
 	 */
+	rootkuid = kuid_into_mnt(user_ns, rootkuid);
 	if (!rootid_owns_currentns(rootkuid))
 		return -ENODATA;
 
@@ -674,7 +681,7 @@ static int get_file_caps(struct linux_binprm *bprm, struct file *file,
 	if (!current_in_userns(file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
 
-	rc = get_vfs_caps_from_disk(file->f_path.dentry, &vcaps);
+	rc = get_vfs_caps_from_disk(mnt_user_ns(file->f_path.mnt), file->f_path.dentry, &vcaps);
 	if (rc < 0) {
 		if (rc == -EINVAL)
 			printk(KERN_NOTICE "Invalid argument reading file caps for %s\n",
@@ -939,6 +946,7 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
 
 /**
  * cap_inode_removexattr - Determine whether an xattr may be removed
+ * @user_ns: The user namespace of the vfsmount
  * @dentry: The inode/dentry being altered
  * @name: The name of the xattr to be changed
  *
@@ -948,7 +956,8 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
  * This is used to make sure security xattrs don't get removed by those who
  * aren't privileged to remove them.
  */
-int cap_inode_removexattr(struct dentry *dentry, const char *name)
+int cap_inode_removexattr(struct user_namespace *mnt_user_ns,
+			  struct dentry *dentry, const char *name)
 {
 	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
 
@@ -962,8 +971,7 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
 		struct inode *inode = d_backing_inode(dentry);
 		if (!inode)
 			return -EINVAL;
-		if (!capable_wrt_inode_uidgid(&init_user_ns, inode,
-					      CAP_SETFCAP))
+		if (!capable_wrt_inode_uidgid(mnt_user_ns, inode, CAP_SETFCAP))
 			return -EPERM;
 		return 0;
 	}
diff --git a/security/security.c b/security/security.c
index a28045dc9e7f..942ae47b6b86 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1279,7 +1279,8 @@ int security_inode_getattr(const struct path *path)
 	return call_int_hook(inode_getattr, 0, path);
 }
 
-int security_inode_setxattr(struct dentry *dentry, const char *name,
+int security_inode_setxattr(struct user_namespace *user_ns,
+			    struct dentry *dentry, const char *name,
 			    const void *value, size_t size, int flags)
 {
 	int ret;
@@ -1290,8 +1291,8 @@ int security_inode_setxattr(struct dentry *dentry, const char *name,
 	 * SELinux and Smack integrate the cap call,
 	 * so assume that all LSMs supplying this call do so.
 	 */
-	ret = call_int_hook(inode_setxattr, 1, dentry, name, value, size,
-				flags);
+	ret = call_int_hook(inode_setxattr, 1, user_ns, dentry, name, value,
+			    size, flags);
 
 	if (ret == 1)
 		ret = cap_inode_setxattr(dentry, name, value, size, flags);
@@ -1326,7 +1327,8 @@ int security_inode_listxattr(struct dentry *dentry)
 	return call_int_hook(inode_listxattr, 0, dentry);
 }
 
-int security_inode_removexattr(struct dentry *dentry, const char *name)
+int security_inode_removexattr(struct user_namespace *user_ns,
+			       struct dentry *dentry, const char *name)
 {
 	int ret;
 
@@ -1336,9 +1338,9 @@ int security_inode_removexattr(struct dentry *dentry, const char *name)
 	 * SELinux and Smack integrate the cap call,
 	 * so assume that all LSMs supplying this call do so.
 	 */
-	ret = call_int_hook(inode_removexattr, 1, dentry, name);
+	ret = call_int_hook(inode_removexattr, 1, user_ns, dentry, name);
 	if (ret == 1)
-		ret = cap_inode_removexattr(dentry, name);
+		ret = cap_inode_removexattr(user_ns, dentry, name);
 	if (ret)
 		return ret;
 	ret = ima_inode_removexattr(dentry, name);
@@ -1352,12 +1354,15 @@ int security_inode_need_killpriv(struct dentry *dentry)
 	return call_int_hook(inode_need_killpriv, 0, dentry);
 }
 
-int security_inode_killpriv(struct dentry *dentry)
+int security_inode_killpriv(struct user_namespace *user_ns,
+			    struct dentry *dentry)
 {
-	return call_int_hook(inode_killpriv, 0, dentry);
+	return call_int_hook(inode_killpriv, 0, user_ns, dentry);
 }
 
-int security_inode_getsecurity(struct inode *inode, const char *name, void **buffer, bool alloc)
+int security_inode_getsecurity(struct user_namespace *user_ns,
+			       struct inode *inode, const char *name,
+			       void **buffer, bool alloc)
 {
 	struct security_hook_list *hp;
 	int rc;
@@ -1368,7 +1373,7 @@ int security_inode_getsecurity(struct inode *inode, const char *name, void **buf
 	 * Only one module will provide an attribute with a given name.
 	 */
 	hlist_for_each_entry(hp, &security_hook_heads.inode_getsecurity, list) {
-		rc = hp->hook.inode_getsecurity(inode, name, buffer, alloc);
+		rc = hp->hook.inode_getsecurity(user_ns, inode, name, buffer, alloc);
 		if (rc != LSM_RET_DEFAULT(inode_getsecurity))
 			return rc;
 	}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3f42950feed0..35066d41fe9f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3112,7 +3112,8 @@ static bool has_cap_mac_admin(bool audit)
 	return true;
 }
 
-static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
+static int selinux_inode_setxattr(struct user_namespace *user_ns,
+				  struct dentry *dentry, const char *name,
 				  const void *value, size_t size, int flags)
 {
 	struct inode *inode = d_backing_inode(dentry);
@@ -3133,13 +3134,13 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 	}
 
 	if (!selinux_initialized(&selinux_state))
-		return (inode_owner_or_capable(&init_user_ns, inode) ? 0 : -EPERM);
+		return (inode_owner_or_capable(user_ns, inode) ? 0 : -EPERM);
 
 	sbsec = inode->i_sb->s_security;
 	if (!(sbsec->flags & SBLABEL_MNT))
 		return -EOPNOTSUPP;
 
-	if (!inode_owner_or_capable(&init_user_ns, inode))
+	if (!inode_owner_or_capable(user_ns, inode))
 		return -EPERM;
 
 	ad.type = LSM_AUDIT_DATA_DENTRY;
@@ -3260,10 +3261,11 @@ static int selinux_inode_listxattr(struct dentry *dentry)
 	return dentry_has_perm(cred, dentry, FILE__GETATTR);
 }
 
-static int selinux_inode_removexattr(struct dentry *dentry, const char *name)
+static int selinux_inode_removexattr(struct user_namespace *user_ns,
+				     struct dentry *dentry, const char *name)
 {
 	if (strcmp(name, XATTR_NAME_SELINUX)) {
-		int rc = cap_inode_removexattr(dentry, name);
+		int rc = cap_inode_removexattr(user_ns, dentry, name);
 		if (rc)
 			return rc;
 
@@ -3329,7 +3331,9 @@ static int selinux_path_notify(const struct path *path, u64 mask,
  *
  * Permission check is handled by selinux_inode_getxattr hook.
  */
-static int selinux_inode_getsecurity(struct inode *inode, const char *name, void **buffer, bool alloc)
+static int selinux_inode_getsecurity(struct user_namespace *user_ns,
+				     struct inode *inode, const char *name,
+				     void **buffer, bool alloc)
 {
 	u32 size;
 	int error;
@@ -6524,8 +6528,8 @@ static int selinux_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
 static int selinux_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
 {
 	int len = 0;
-	len = selinux_inode_getsecurity(inode, XATTR_SELINUX_SUFFIX,
-						ctx, true);
+	len = selinux_inode_getsecurity(&init_user_ns, inode,
+					XATTR_SELINUX_SUFFIX, ctx, true);
 	if (len < 0)
 		return len;
 	*ctxlen = len;
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 11003244b18b..fde9f124d5ee 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1240,7 +1240,8 @@ static int smack_inode_getattr(const struct path *path)
  *
  * Returns 0 if access is permitted, an error code otherwise
  */
-static int smack_inode_setxattr(struct dentry *dentry, const char *name,
+static int smack_inode_setxattr(struct user_namespace *user_ns,
+				struct dentry *dentry, const char *name,
 				const void *value, size_t size, int flags)
 {
 	struct smk_audit_info ad;
@@ -1362,7 +1363,8 @@ static int smack_inode_getxattr(struct dentry *dentry, const char *name)
  *
  * Returns 0 if access is permitted, an error code otherwise
  */
-static int smack_inode_removexattr(struct dentry *dentry, const char *name)
+static int smack_inode_removexattr(struct user_namespace *user_ns,
+				   struct dentry *dentry, const char *name)
 {
 	struct inode_smack *isp;
 	struct smk_audit_info ad;
@@ -1377,7 +1379,7 @@ static int smack_inode_removexattr(struct dentry *dentry, const char *name)
 		if (!smack_privileged(CAP_MAC_ADMIN))
 			rc = -EPERM;
 	} else
-		rc = cap_inode_removexattr(dentry, name);
+		rc = cap_inode_removexattr(user_ns, dentry, name);
 
 	if (rc != 0)
 		return rc;
@@ -1420,9 +1422,9 @@ static int smack_inode_removexattr(struct dentry *dentry, const char *name)
  *
  * Returns the size of the attribute or an error code
  */
-static int smack_inode_getsecurity(struct inode *inode,
-				   const char *name, void **buffer,
-				   bool alloc)
+static int smack_inode_getsecurity(struct user_namespace *user_ns,
+				   struct inode *inode, const char *name,
+				   void **buffer, bool alloc)
 {
 	struct socket_smack *ssp;
 	struct socket *sock;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 15/39] stat: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (13 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 14/39] commoncap: " Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 16/39] namei: handle idmapped mounts in may_*() helpers Christian Brauner
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

The generic_fillattr() helper fills in the basic attributes associated with
an inode. Enable it to handle idmapped mounts. If the inode is accessed
through an idmapped mount we need to map it according to the mount's user
namespace. If the initial user namespace is passed all operations are a nop
so non-idmapped mounts will not see a change in behavior and will also not
see any performance impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 fs/9p/vfs_inode.c      |  4 ++--
 fs/9p/vfs_inode_dotl.c |  4 ++--
 fs/afs/inode.c         |  2 +-
 fs/btrfs/inode.c       |  2 +-
 fs/ceph/inode.c        |  2 +-
 fs/cifs/inode.c        |  2 +-
 fs/coda/inode.c        |  2 +-
 fs/ecryptfs/inode.c    |  4 ++--
 fs/erofs/inode.c       |  2 +-
 fs/exfat/file.c        |  2 +-
 fs/ext2/inode.c        |  2 +-
 fs/ext4/inode.c        |  2 +-
 fs/f2fs/file.c         |  2 +-
 fs/fat/file.c          |  2 +-
 fs/fuse/dir.c          |  2 +-
 fs/gfs2/inode.c        |  2 +-
 fs/hfsplus/inode.c     |  2 +-
 fs/kernfs/inode.c      |  2 +-
 fs/libfs.c             |  4 ++--
 fs/minix/inode.c       |  2 +-
 fs/nfs/inode.c         |  2 +-
 fs/nfs/namespace.c     |  2 +-
 fs/ocfs2/file.c        |  2 +-
 fs/orangefs/inode.c    |  2 +-
 fs/proc/base.c         |  8 ++++----
 fs/proc/generic.c      |  2 +-
 fs/proc/proc_net.c     |  2 +-
 fs/proc/proc_sysctl.c  |  2 +-
 fs/proc/root.c         |  2 +-
 fs/stat.c              | 10 ++++++----
 fs/sysv/itree.c        |  2 +-
 fs/ubifs/dir.c         |  2 +-
 fs/udf/symlink.c       |  2 +-
 fs/vboxsf/utils.c      |  2 +-
 include/linux/fs.h     |  2 +-
 mm/shmem.c             |  2 +-
 36 files changed, 48 insertions(+), 46 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 404526499c94..0a5c022c1c70 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1006,7 +1006,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
 	v9ses = v9fs_dentry2v9ses(dentry);
 	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
-		generic_fillattr(d_inode(dentry), stat);
+		generic_fillattr(&init_user_ns, d_inode(dentry), stat);
 		return 0;
 	}
 	fid = v9fs_fid_lookup(dentry);
@@ -1018,7 +1018,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
 		return PTR_ERR(st);
 
 	v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
-	generic_fillattr(d_inode(dentry), stat);
+	generic_fillattr(&init_user_ns, d_inode(dentry), stat);
 
 	p9stat_free(st);
 	kfree(st);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 282ec5cb45dc..8f3c1daf72ba 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -466,7 +466,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
 	v9ses = v9fs_dentry2v9ses(dentry);
 	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
-		generic_fillattr(d_inode(dentry), stat);
+		generic_fillattr(&init_user_ns, d_inode(dentry), stat);
 		return 0;
 	}
 	fid = v9fs_fid_lookup(dentry);
@@ -482,7 +482,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
 		return PTR_ERR(st);
 
 	v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
-	generic_fillattr(d_inode(dentry), stat);
+	generic_fillattr(&init_user_ns, d_inode(dentry), stat);
 	/* Change block size to what the server returned */
 	stat->blksize = st->st_blksize;
 
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 0fe8844b4bee..17ecdee404eb 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -737,7 +737,7 @@ int afs_getattr(const struct path *path, struct kstat *stat,
 
 	do {
 		read_seqbegin_or_lock(&vnode->cb_lock, &seq);
-		generic_fillattr(inode, stat);
+		generic_fillattr(&init_user_ns, inode, stat);
 		if (test_bit(AFS_VNODE_SILLY_DELETED, &vnode->flags) &&
 		    stat->nlink > 0)
 			stat->nlink -= 1;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6f4aed0d311..99b4fd66681d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8783,7 +8783,7 @@ static int btrfs_getattr(const struct path *path, struct kstat *stat,
 				  STATX_ATTR_IMMUTABLE |
 				  STATX_ATTR_NODUMP);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
 
 	spin_lock(&BTRFS_I(inode)->lock);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 19ec845ba5ec..decfa5e1ec4c 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2380,7 +2380,7 @@ int ceph_getattr(const struct path *path, struct kstat *stat,
 			return err;
 	}
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	stat->ino = ceph_present_inode(inode);
 
 	/*
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 4d69b786e403..4fb3b6961096 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -2400,7 +2400,7 @@ int cifs_getattr(const struct path *path, struct kstat *stat,
 			return rc;
 	}
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	stat->blksize = cifs_sb->bsize;
 	stat->ino = CIFS_I(inode)->uniqueid;
 
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index b1c70e2b9b1e..4d113e191cb8 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -256,7 +256,7 @@ int coda_getattr(const struct path *path, struct kstat *stat,
 {
 	int err = coda_revalidate_inode(d_inode(path->dentry));
 	if (!err)
-		generic_fillattr(d_inode(path->dentry), stat);
+		generic_fillattr(&init_user_ns, d_inode(path->dentry), stat);
 	return err;
 }
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index c35c2d2f3fe6..d98448c75051 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -975,7 +975,7 @@ static int ecryptfs_getattr_link(const struct path *path, struct kstat *stat,
 
 	mount_crypt_stat = &ecryptfs_superblock_to_private(
 						dentry->d_sb)->mount_crypt_stat;
-	generic_fillattr(d_inode(dentry), stat);
+	generic_fillattr(&init_user_ns, d_inode(dentry), stat);
 	if (mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES) {
 		char *target;
 		size_t targetsiz;
@@ -1003,7 +1003,7 @@ static int ecryptfs_getattr(const struct path *path, struct kstat *stat,
 	if (!rc) {
 		fsstack_copy_attr_all(d_inode(dentry),
 				      ecryptfs_inode_to_lower(d_inode(dentry)));
-		generic_fillattr(d_inode(dentry), stat);
+		generic_fillattr(&init_user_ns, d_inode(dentry), stat);
 		stat->blocks = lower_stat.blocks;
 	}
 	return rc;
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 139d0bed42f8..6b57529b1c1d 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -342,7 +342,7 @@ int erofs_getattr(const struct path *path, struct kstat *stat,
 	stat->attributes_mask |= (STATX_ATTR_COMPRESSED |
 				  STATX_ATTR_IMMUTABLE);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index ace35aa8e64b..e9705b3295d3 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -273,7 +273,7 @@ int exfat_getattr(const struct path *path, struct kstat *stat,
 	struct inode *inode = d_backing_inode(path->dentry);
 	struct exfat_inode_info *ei = EXFAT_I(inode);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	exfat_truncate_atime(&stat->atime);
 	stat->result_mask |= STATX_BTIME;
 	stat->btime.tv_sec = ei->i_crtime.tv_sec;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 5ff1e5e3c0fe..7fcda76094ff 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1659,7 +1659,7 @@ int ext2_getattr(const struct path *path, struct kstat *stat,
 			STATX_ATTR_IMMUTABLE |
 			STATX_ATTR_NODUMP);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7bc58f25bf2e..03be530cebbe 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5557,7 +5557,7 @@ int ext4_getattr(const struct path *path, struct kstat *stat,
 				  STATX_ATTR_NODUMP |
 				  STATX_ATTR_VERITY);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5fc6c44020ed..20f7e5bdd9ad 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -814,7 +814,7 @@ int f2fs_getattr(const struct path *path, struct kstat *stat,
 				  STATX_ATTR_NODUMP |
 				  STATX_ATTR_VERITY);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 
 	/* we need to show initial sectors used for inline_data/dentries */
 	if ((S_ISREG(inode->i_mode) && f2fs_has_inline_data(inode)) ||
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 805b501467e9..f7e04f533d31 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -398,7 +398,7 @@ int fat_getattr(const struct path *path, struct kstat *stat,
 		u32 request_mask, unsigned int flags)
 {
 	struct inode *inode = d_inode(path->dentry);
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	stat->blksize = MSDOS_SB(inode->i_sb)->cluster_size;
 
 	if (MSDOS_SB(inode->i_sb)->options.nfs == FAT_NFS_NOSTALE_RO) {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index dc6e8cc565d2..0750755685cd 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1064,7 +1064,7 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
 		forget_all_cached_acls(inode);
 		err = fuse_do_getattr(inode, stat, file);
 	} else if (stat) {
-		generic_fillattr(inode, stat);
+		generic_fillattr(&init_user_ns, inode, stat);
 		stat->mode = fi->orig_i_mode;
 		stat->ino = fi->orig_ino;
 	}
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 00a90287b919..eebc07f62c50 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2047,7 +2047,7 @@ static int gfs2_getattr(const struct path *path, struct kstat *stat,
 				  STATX_ATTR_IMMUTABLE |
 				  STATX_ATTR_NODUMP);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 
 	if (gfs2_holder_initialized(&gh))
 		gfs2_glock_dq_uninit(&gh);
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index a2789730f451..a34f37a53dcd 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -286,7 +286,7 @@ int hfsplus_getattr(const struct path *path, struct kstat *stat,
 	stat->attributes_mask |= STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE |
 				 STATX_ATTR_NODUMP;
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 74ee0ffcfe71..062593491bbe 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -193,7 +193,7 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
 	kernfs_refresh_inode(kn, inode);
 	mutex_unlock(&kernfs_mutex);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/libfs.c b/fs/libfs.c
index f50446576b10..6aa8bead838f 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -31,7 +31,7 @@ int simple_getattr(const struct path *path, struct kstat *stat,
 		   u32 request_mask, unsigned int query_flags)
 {
 	struct inode *inode = d_inode(path->dentry);
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	stat->blocks = inode->i_mapping->nrpages << (PAGE_SHIFT - 9);
 	return 0;
 }
@@ -1302,7 +1302,7 @@ static int empty_dir_getattr(const struct path *path, struct kstat *stat,
 			     u32 request_mask, unsigned int query_flags)
 {
 	struct inode *inode = d_inode(path->dentry);
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 34f546404aa1..91c81d2fc90d 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -658,7 +658,7 @@ int minix_getattr(const struct path *path, struct kstat *stat,
 	struct super_block *sb = path->dentry->d_sb;
 	struct inode *inode = d_inode(path->dentry);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	if (INODE_VERSION(inode) == MINIX_V1)
 		stat->blocks = (BLOCK_SIZE / 512) * V1_minix_blocks(stat->size, sb);
 	else
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index aa6493905bbe..8d74974d7992 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -858,7 +858,7 @@ int nfs_getattr(const struct path *path, struct kstat *stat,
 	/* Only return attributes that were revalidated. */
 	stat->result_mask &= request_mask;
 out_no_update:
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
 	if (S_ISDIR(inode->i_mode))
 		stat->blksize = NFS_SERVER(inode)->dtsize;
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 2bcbe38afe2e..55fc711e368b 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -213,7 +213,7 @@ nfs_namespace_getattr(const struct path *path, struct kstat *stat,
 {
 	if (NFS_FH(d_inode(path->dentry))->size != 0)
 		return nfs_getattr(path, stat, request_mask, query_flags);
-	generic_fillattr(d_inode(path->dentry), stat);
+	generic_fillattr(&init_user_ns, d_inode(path->dentry), stat);
 	return 0;
 }
 
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index cabf355b148f..a070d4c9b6ed 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1313,7 +1313,7 @@ int ocfs2_getattr(const struct path *path, struct kstat *stat,
 		goto bail;
 	}
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	/*
 	 * If there is inline data in the inode, the inode will normally not
 	 * have data blocks allocated (it may have an external xattr block).
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 563fe9ab8eb2..b94032f77e61 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -903,7 +903,7 @@ int orangefs_getattr(const struct path *path, struct kstat *stat,
 	ret = orangefs_inode_getattr(inode,
 	    request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
 	if (ret == 0) {
-		generic_fillattr(inode, stat);
+		generic_fillattr(&init_user_ns, inode, stat);
 
 		/* override block size reported to stat */
 		if (!(request_mask & STATX_SIZE))
diff --git a/fs/proc/base.c b/fs/proc/base.c
index e60f4c60f94c..541779a52b7d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1932,7 +1932,7 @@ int pid_getattr(const struct path *path, struct kstat *stat,
 	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
 	struct task_struct *task;
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 
 	stat->uid = GLOBAL_ROOT_UID;
 	stat->gid = GLOBAL_ROOT_GID;
@@ -2618,7 +2618,7 @@ static struct dentry *proc_pident_instantiate(struct dentry *dentry,
 	return d_splice_alias(inode, dentry);
 }
 
-static struct dentry *proc_pident_lookup(struct inode *dir, 
+static struct dentry *proc_pident_lookup(struct inode *dir,
 					 struct dentry *dentry,
 					 const struct pid_entry *p,
 					 const struct pid_entry *end)
@@ -2818,7 +2818,7 @@ static const struct pid_entry attr_dir_stuff[] = {
 
 static int proc_attr_dir_readdir(struct file *file, struct dir_context *ctx)
 {
-	return proc_pident_readdir(file, ctx, 
+	return proc_pident_readdir(file, ctx,
 				   attr_dir_stuff, ARRAY_SIZE(attr_dir_stuff));
 }
 
@@ -3795,7 +3795,7 @@ static int proc_task_getattr(const struct path *path, struct kstat *stat,
 {
 	struct inode *inode = d_inode(path->dentry);
 	struct task_struct *p = get_proc_task(inode);
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 
 	if (p) {
 		stat->nlink += get_nr_threads(p);
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index fee877db93f2..6b8094a64176 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -145,7 +145,7 @@ static int proc_getattr(const struct path *path, struct kstat *stat,
 		}
 	}
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index ed8a6306990c..bc555b7da6dc 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -313,7 +313,7 @@ static int proc_tgid_net_getattr(const struct path *path, struct kstat *stat,
 
 	net = get_proc_task_net(inode);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 
 	if (net != NULL) {
 		stat->nlink = net->proc_net->nlink;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index ec67dbc1f705..87c828348140 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -840,7 +840,7 @@ static int proc_sys_getattr(const struct path *path, struct kstat *stat,
 	if (IS_ERR(head))
 		return PTR_ERR(head);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	if (table)
 		stat->mode = (stat->mode & S_IFMT) | table->mode;
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 5e444d4f9717..244e4b6f15ef 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -311,7 +311,7 @@ void __init proc_root_init(void)
 static int proc_root_getattr(const struct path *path, struct kstat *stat,
 			     u32 request_mask, unsigned int query_flags)
 {
-	generic_fillattr(d_inode(path->dentry), stat);
+	generic_fillattr(&init_user_ns, d_inode(path->dentry), stat);
 	stat->nlink = proc_root.nlink + nr_processes();
 	return 0;
 }
diff --git a/fs/stat.c b/fs/stat.c
index dacecdda2e79..868e39e101bc 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -26,6 +26,7 @@
 
 /**
  * generic_fillattr - Fill in the basic attributes from the inode struct
+ * @user_ns: the user namespace from which we access this inode
  * @inode: Inode to use as the source
  * @stat: Where to fill in the attributes
  *
@@ -33,14 +34,15 @@
  * found on the VFS inode structure.  This is the default if no getattr inode
  * operation is supplied.
  */
-void generic_fillattr(struct inode *inode, struct kstat *stat)
+void generic_fillattr(struct user_namespace *mnt_user_ns, struct inode *inode,
+		      struct kstat *stat)
 {
 	stat->dev = inode->i_sb->s_dev;
 	stat->ino = inode->i_ino;
 	stat->mode = inode->i_mode;
 	stat->nlink = inode->i_nlink;
-	stat->uid = inode->i_uid;
-	stat->gid = inode->i_gid;
+	stat->uid = i_uid_into_mnt(mnt_user_ns, inode);
+	stat->gid = i_gid_into_mnt(mnt_user_ns, inode);
 	stat->rdev = inode->i_rdev;
 	stat->size = i_size_read(inode);
 	stat->atime = inode->i_atime;
@@ -87,7 +89,7 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 		return inode->i_op->getattr(path, stat, request_mask,
 					    query_flags);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(mnt_user_ns(path->mnt), inode, stat);
 	return 0;
 }
 EXPORT_SYMBOL(vfs_getattr_nosec);
diff --git a/fs/sysv/itree.c b/fs/sysv/itree.c
index bcb67b0cabe7..83cffab6955f 100644
--- a/fs/sysv/itree.c
+++ b/fs/sysv/itree.c
@@ -445,7 +445,7 @@ int sysv_getattr(const struct path *path, struct kstat *stat,
 		 u32 request_mask, unsigned int flags)
 {
 	struct super_block *s = path->dentry->d_sb;
-	generic_fillattr(d_inode(path->dentry), stat);
+	generic_fillattr(&init_user_ns, d_inode(path->dentry), stat);
 	stat->blocks = (s->s_blocksize / 512) * sysv_nblocks(s, stat->size);
 	stat->blksize = s->s_blocksize;
 	return 0;
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 1639331f9543..144422c39125 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1579,7 +1579,7 @@ int ubifs_getattr(const struct path *path, struct kstat *stat,
 				STATX_ATTR_ENCRYPTED |
 				STATX_ATTR_IMMUTABLE);
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	stat->blksize = UBIFS_BLOCK_SIZE;
 	stat->size = ui->ui_size;
 
diff --git a/fs/udf/symlink.c b/fs/udf/symlink.c
index c973db239604..54a44d1f023c 100644
--- a/fs/udf/symlink.c
+++ b/fs/udf/symlink.c
@@ -159,7 +159,7 @@ static int udf_symlink_getattr(const struct path *path, struct kstat *stat,
 	struct inode *inode = d_backing_inode(dentry);
 	struct page *page;
 
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 	page = read_mapping_page(inode->i_mapping, 0, NULL);
 	if (IS_ERR(page))
 		return PTR_ERR(page);
diff --git a/fs/vboxsf/utils.c b/fs/vboxsf/utils.c
index 018057546067..d2cd1c99f48e 100644
--- a/fs/vboxsf/utils.c
+++ b/fs/vboxsf/utils.c
@@ -233,7 +233,7 @@ int vboxsf_getattr(const struct path *path, struct kstat *kstat,
 	if (err)
 		return err;
 
-	generic_fillattr(d_inode(dentry), kstat);
+	generic_fillattr(&init_user_ns, d_inode(dentry), kstat);
 	return 0;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 95200607ac9c..17f23f0ffca3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3123,7 +3123,7 @@ extern int __page_symlink(struct inode *inode, const char *symname, int len,
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
 extern void kfree_link(void *);
-extern void generic_fillattr(struct inode *, struct kstat *);
+extern void generic_fillattr(struct user_namespace *, struct inode *, struct kstat *);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
 extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
 void __inode_add_bytes(struct inode *inode, loff_t bytes);
diff --git a/mm/shmem.c b/mm/shmem.c
index 8fdf52576e06..7c317b0d06f0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1072,7 +1072,7 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
 		shmem_recalc_inode(inode);
 		spin_unlock_irq(&info->lock);
 	}
-	generic_fillattr(inode, stat);
+	generic_fillattr(&init_user_ns, inode, stat);
 
 	if (is_huge_enabled(sb_info))
 		stat->blksize = HPAGE_PMD_SIZE;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 16/39] namei: handle idmapped mounts in may_*() helpers
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (14 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 15/39] stat: " Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 17/39] namei: introduce struct renamedata Christian Brauner
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

The may_follow_link(), may_linkat(), may_lookup(), may_open(), may_o_create(),
may_create_in_sticky(), may_delete(), and may_create() helpers determine
whether the caller is privileged enough to perform the associated operations.
Let them handle idmapped mounts by mappings the inode and fsids according to
the mount's user namespace. Afterwards the checks are identical to non-idmapped
inodes. If the initial user namespace is passed all operations are a nop so
non-idmapped mounts will not see a change in behavior and will also not see any
performance impact.
Since the may_*() helpers are not exposed to other parts of the kernel we can
simply extend them with an additional argument in case they don't already have
access to the mount's user namespace.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/btrfs/ioctl.c   |   5 +-
 fs/inode.c         |   2 +-
 fs/namei.c         | 121 +++++++++++++++++++++++++++------------------
 fs/xattr.c         |   2 +-
 include/linux/fs.h |  13 +++--
 5 files changed, 86 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 39f25b5d06ed..ccac53bb2a1c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -915,8 +915,9 @@ static int btrfs_may_delete(struct inode *dir, struct dentry *victim, int isdir)
 		return error;
 	if (IS_APPEND(dir))
 		return -EPERM;
-	if (check_sticky(dir, d_inode(victim)) || IS_APPEND(d_inode(victim)) ||
-	    IS_IMMUTABLE(d_inode(victim)) || IS_SWAPFILE(d_inode(victim)))
+	if (check_sticky(&init_user_ns, dir, d_inode(victim)) ||
+	    IS_APPEND(d_inode(victim)) || IS_IMMUTABLE(d_inode(victim)) ||
+	    IS_SWAPFILE(d_inode(victim)))
 		return -EPERM;
 	if (isdir) {
 		if (!d_is_dir(victim))
diff --git a/fs/inode.c b/fs/inode.c
index 66d3f7397d86..75c64f003c45 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1797,7 +1797,7 @@ bool atime_needs_update(const struct path *path, struct inode *inode)
 	/* Atime updates will likely cause i_uid and i_gid to be written
 	 * back improprely if their true value is unknown to the vfs.
 	 */
-	if (HAS_UNMAPPED_ID(inode))
+	if (HAS_UNMAPPED_ID(mnt_user_ns(mnt), inode))
 		return false;
 
 	if (IS_NOATIME(inode))
diff --git a/fs/namei.c b/fs/namei.c
index 35952c28ee29..4dc842d1cd3a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -465,7 +465,7 @@ int inode_permission(struct user_namespace *user_ns,
 		 * written back improperly if their true value is unknown
 		 * to the vfs.
 		 */
-		if (HAS_UNMAPPED_ID(inode))
+		if (HAS_UNMAPPED_ID(user_ns, inode))
 			return -EACCES;
 	}
 
@@ -963,11 +963,16 @@ int sysctl_protected_regular __read_mostly;
  */
 static inline int may_follow_link(struct nameidata *nd, const struct inode *inode)
 {
+	struct user_namespace *user_ns;
+	kuid_t i_uid;
+
 	if (!sysctl_protected_symlinks)
 		return 0;
 
+	user_ns = mnt_user_ns(nd->path.mnt);
+	i_uid = i_uid_into_mnt(user_ns, inode);
 	/* Allowed if owner and follower match. */
-	if (uid_eq(current_cred()->fsuid, inode->i_uid))
+	if (uid_eq(current_cred()->fsuid, i_uid))
 		return 0;
 
 	/* Allowed if parent directory not sticky and world-writable. */
@@ -975,7 +980,7 @@ static inline int may_follow_link(struct nameidata *nd, const struct inode *inod
 		return 0;
 
 	/* Allowed if parent directory and link owner match. */
-	if (uid_valid(nd->dir_uid) && uid_eq(nd->dir_uid, inode->i_uid))
+	if (uid_valid(nd->dir_uid) && uid_eq(nd->dir_uid, i_uid))
 		return 0;
 
 	if (nd->flags & LOOKUP_RCU)
@@ -998,7 +1003,7 @@ static inline int may_follow_link(struct nameidata *nd, const struct inode *inod
  *
  * Otherwise returns true.
  */
-static bool safe_hardlink_source(struct inode *inode)
+static bool safe_hardlink_source(struct user_namespace *user_ns, struct inode *inode)
 {
 	umode_t mode = inode->i_mode;
 
@@ -1015,7 +1020,7 @@ static bool safe_hardlink_source(struct inode *inode)
 		return false;
 
 	/* Hardlinking to unreadable or unwritable sources is dangerous. */
-	if (inode_permission(&init_user_ns, inode, MAY_READ | MAY_WRITE))
+	if (inode_permission(user_ns, inode, MAY_READ | MAY_WRITE))
 		return false;
 
 	return true;
@@ -1036,9 +1041,12 @@ static bool safe_hardlink_source(struct inode *inode)
 int may_linkat(struct path *link)
 {
 	struct inode *inode = link->dentry->d_inode;
+	struct user_namespace *user_ns;
 
 	/* Inode writeback is not safe when the uid or gid are invalid. */
-	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+	user_ns = mnt_user_ns(link->mnt);
+	if (!uid_valid(i_uid_into_mnt(user_ns, inode)) ||
+	    !gid_valid(i_gid_into_mnt(user_ns, inode)))
 		return -EOVERFLOW;
 
 	if (!sysctl_protected_hardlinks)
@@ -1047,7 +1055,8 @@ int may_linkat(struct path *link)
 	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
 	 * otherwise, it must be a safe source.
 	 */
-	if (safe_hardlink_source(inode) || inode_owner_or_capable(&init_user_ns, inode))
+	if (safe_hardlink_source(user_ns, inode) ||
+	    inode_owner_or_capable(user_ns, inode))
 		return 0;
 
 	audit_log_path_denied(AUDIT_ANOM_LINK, "linkat");
@@ -1075,14 +1084,18 @@ int may_linkat(struct path *link)
  *
  * Returns 0 if the open is allowed, -ve on error.
  */
-static int may_create_in_sticky(umode_t dir_mode, kuid_t dir_uid,
-				struct inode * const inode)
+static int may_create_in_sticky(struct nameidata *nd, struct inode *const inode)
 {
+	struct user_namespace *user_ns;
+	umode_t dir_mode = nd->dir_mode;
+	kuid_t dir_uid = nd->dir_uid;
+
+	user_ns = mnt_user_ns(nd->path.mnt);
 	if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) ||
 	    (!sysctl_protected_regular && S_ISREG(inode->i_mode)) ||
 	    likely(!(dir_mode & S_ISVTX)) ||
-	    uid_eq(inode->i_uid, dir_uid) ||
-	    uid_eq(current_fsuid(), inode->i_uid))
+	    uid_eq(i_uid_into_mnt(user_ns, inode), dir_uid) ||
+	    uid_eq(current_fsuid(), i_uid_into_mnt(user_ns, inode)))
 		return 0;
 
 	if (likely(dir_mode & 0002) ||
@@ -1574,15 +1587,16 @@ static struct dentry *lookup_slow(const struct qstr *name,
 
 static inline int may_lookup(struct nameidata *nd)
 {
+	struct user_namespace *user_ns = mnt_user_ns(nd->path.mnt);
+
 	if (nd->flags & LOOKUP_RCU) {
-		int err = inode_permission(&init_user_ns, nd->inode,
-					   MAY_EXEC | MAY_NOT_BLOCK);
+		int err = inode_permission(user_ns, nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
 		if (err != -ECHILD)
 			return err;
 		if (unlazy_walk(nd))
 			return -ECHILD;
 	}
-	return inode_permission(&init_user_ns, nd->inode, MAY_EXEC);
+	return inode_permission(user_ns, nd->inode, MAY_EXEC);
 }
 
 static int reserve_stack(struct nameidata *nd, struct path *link, unsigned seq)
@@ -2181,7 +2195,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 OK:
 			/* pathname or trailing symlink, done */
 			if (!depth) {
-				nd->dir_uid = nd->inode->i_uid;
+				struct user_namespace *user_ns;
+
+				user_ns = mnt_user_ns(nd->path.mnt);
+				nd->dir_uid = i_uid_into_mnt(user_ns, nd->inode);
 				nd->dir_mode = nd->inode->i_mode;
 				nd->flags &= ~LOOKUP_PARENT;
 				return 0;
@@ -2659,15 +2676,16 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
 }
 EXPORT_SYMBOL(user_path_at_empty);
 
-int __check_sticky(struct inode *dir, struct inode *inode)
+int __check_sticky(struct user_namespace *user_ns, struct inode *dir,
+		   struct inode *inode)
 {
 	kuid_t fsuid = current_fsuid();
 
-	if (uid_eq(inode->i_uid, fsuid))
+	if (uid_eq(i_uid_into_mnt(user_ns, inode), fsuid))
 		return 0;
-	if (uid_eq(dir->i_uid, fsuid))
+	if (uid_eq(i_uid_into_mnt(user_ns, dir), fsuid))
 		return 0;
-	return !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FOWNER);
+	return !capable_wrt_inode_uidgid(user_ns, inode, CAP_FOWNER);
 }
 EXPORT_SYMBOL(__check_sticky);
 
@@ -2691,7 +2709,7 @@ EXPORT_SYMBOL(__check_sticky);
  * 11. We don't allow removal of NFS sillyrenamed files; it's handled by
  *     nfs_async_unlink().
  */
-static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
+static int may_delete(struct user_namespace *user_ns, struct inode *dir, struct dentry *victim, bool isdir)
 {
 	struct inode *inode = d_backing_inode(victim);
 	int error;
@@ -2703,19 +2721,21 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
 	BUG_ON(victim->d_parent->d_inode != dir);
 
 	/* Inode writeback is not safe when the uid or gid are invalid. */
-	if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
+	if (!uid_valid(i_uid_into_mnt(user_ns, inode)) ||
+	    !gid_valid(i_gid_into_mnt(user_ns, inode)))
 		return -EOVERFLOW;
 
 	audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);
 
-	error = inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
+	error = inode_permission(user_ns, dir, MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 	if (IS_APPEND(dir))
 		return -EPERM;
 
-	if (check_sticky(dir, inode) || IS_APPEND(inode) ||
-	    IS_IMMUTABLE(inode) || IS_SWAPFILE(inode) || HAS_UNMAPPED_ID(inode))
+	if (check_sticky(user_ns, dir, inode) || IS_APPEND(inode) ||
+	    IS_IMMUTABLE(inode) || IS_SWAPFILE(inode) ||
+	    HAS_UNMAPPED_ID(user_ns, inode))
 		return -EPERM;
 	if (isdir) {
 		if (!d_is_dir(victim))
@@ -2740,7 +2760,8 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
  *  4. We should have write and exec permissions on dir
  *  5. We can't do it if dir is immutable (done in permission())
  */
-static inline int may_create(struct inode *dir, struct dentry *child)
+static inline int may_create(struct user_namespace *user_ns, struct inode *dir,
+			     struct dentry *child)
 {
 	struct user_namespace *s_user_ns;
 	audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
@@ -2749,10 +2770,10 @@ static inline int may_create(struct inode *dir, struct dentry *child)
 	if (IS_DEADDIR(dir))
 		return -ENOENT;
 	s_user_ns = dir->i_sb->s_user_ns;
-	if (!kuid_has_mapping(s_user_ns, current_fsuid()) ||
-	    !kgid_has_mapping(s_user_ns, current_fsgid()))
+	if (!kuid_has_mapping(s_user_ns, fsuid_into_mnt(user_ns)) ||
+	    !kgid_has_mapping(s_user_ns, fsgid_into_mnt(user_ns)))
 		return -EOVERFLOW;
-	return inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
+	return inode_permission(user_ns, dir, MAY_WRITE | MAY_EXEC);
 }
 
 /*
@@ -2802,7 +2823,7 @@ EXPORT_SYMBOL(unlock_rename);
 int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 		bool want_excl)
 {
-	int error = may_create(dir, dentry);
+	int error = may_create(&init_user_ns, dir, dentry);
 	if (error)
 		return error;
 
@@ -2825,7 +2846,7 @@ int vfs_mkobj(struct dentry *dentry, umode_t mode,
 		void *arg)
 {
 	struct inode *dir = dentry->d_parent->d_inode;
-	int error = may_create(dir, dentry);
+	int error = may_create(&init_user_ns, dir, dentry);
 	if (error)
 		return error;
 
@@ -2849,6 +2870,7 @@ bool may_open_dev(const struct path *path)
 
 static int may_open(const struct path *path, int acc_mode, int flag)
 {
+	struct user_namespace *user_ns;
 	struct dentry *dentry = path->dentry;
 	struct inode *inode = dentry->d_inode;
 	int error;
@@ -2882,7 +2904,8 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 		break;
 	}
 
-	error = inode_permission(&init_user_ns, inode, MAY_OPEN | acc_mode);
+	user_ns = mnt_user_ns(path->mnt);
+	error = inode_permission(user_ns, inode, MAY_OPEN | acc_mode);
 	if (error)
 		return error;
 
@@ -2897,7 +2920,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
 	}
 
 	/* O_NOATIME can only be set by the owner or superuser */
-	if (flag & O_NOATIME && !inode_owner_or_capable(&init_user_ns, inode))
+	if (flag & O_NOATIME && !inode_owner_or_capable(user_ns, inode))
 		return -EPERM;
 
 	return 0;
@@ -2934,17 +2957,18 @@ static inline int open_to_namei_flags(int flag)
 
 static int may_o_create(const struct path *dir, struct dentry *dentry, umode_t mode)
 {
-	struct user_namespace *s_user_ns;
+	struct user_namespace *s_user_ns, *user_ns;
 	int error = security_path_mknod(dir, dentry, mode, 0);
 	if (error)
 		return error;
 
+	user_ns = mnt_user_ns(dir->mnt);
 	s_user_ns = dir->dentry->d_sb->s_user_ns;
-	if (!kuid_has_mapping(s_user_ns, current_fsuid()) ||
-	    !kgid_has_mapping(s_user_ns, current_fsgid()))
+	if (!kuid_has_mapping(s_user_ns, fsuid_into_mnt(user_ns)) ||
+	    !kgid_has_mapping(s_user_ns, fsgid_into_mnt(user_ns)))
 		return -EOVERFLOW;
 
-	error = inode_permission(&init_user_ns, dir->dentry->d_inode,
+	error = inode_permission(user_ns, dir->dentry->d_inode,
 				 MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
@@ -3238,7 +3262,7 @@ static int do_open(struct nameidata *nd,
 			return -EEXIST;
 		if (d_is_dir(nd->path.dentry))
 			return -EISDIR;
-		error = may_create_in_sticky(nd->dir_mode, nd->dir_uid,
+		error = may_create_in_sticky(nd,
 					     d_backing_inode(nd->path.dentry));
 		if (unlikely(error))
 			return error;
@@ -3540,7 +3564,7 @@ EXPORT_SYMBOL(user_path_create);
 int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	bool is_whiteout = S_ISCHR(mode) && dev == WHITEOUT_DEV;
-	int error = may_create(dir, dentry);
+	int error = may_create(&init_user_ns, dir, dentry);
 
 	if (error)
 		return error;
@@ -3641,7 +3665,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d
 
 int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
-	int error = may_create(dir, dentry);
+	int error = may_create(&init_user_ns, dir, dentry);
 	unsigned max_links = dir->i_sb->s_max_links;
 
 	if (error)
@@ -3702,7 +3726,7 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
 
 int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 {
-	int error = may_delete(dir, dentry, 1);
+	int error = may_delete(&init_user_ns, dir, dentry, 1);
 
 	if (error)
 		return error;
@@ -3824,7 +3848,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
 int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
 {
 	struct inode *target = dentry->d_inode;
-	int error = may_delete(dir, dentry, 0);
+	int error = may_delete(&init_user_ns, dir, dentry, 0);
 
 	if (error)
 		return error;
@@ -3956,7 +3980,7 @@ SYSCALL_DEFINE1(unlink, const char __user *, pathname)
 
 int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
 {
-	int error = may_create(dir, dentry);
+	int error = may_create(&init_user_ns, dir, dentry);
 
 	if (error)
 		return error;
@@ -4045,7 +4069,7 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	if (!inode)
 		return -ENOENT;
 
-	error = may_create(dir, new_dentry);
+	error = may_create(&init_user_ns, dir, new_dentry);
 	if (error)
 		return error;
 
@@ -4062,7 +4086,7 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	 * be writen back improperly if their true value is unknown to
 	 * the vfs.
 	 */
-	if (HAS_UNMAPPED_ID(inode))
+	if (HAS_UNMAPPED_ID(&init_user_ns, inode))
 		return -EPERM;
 	if (!dir->i_op->link)
 		return -EPERM;
@@ -4237,6 +4261,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	       struct inode **delegated_inode, unsigned int flags)
 {
 	int error;
+	struct user_namespace *user_ns = &init_user_ns;
 	bool is_dir = d_is_dir(old_dentry);
 	struct inode *source = old_dentry->d_inode;
 	struct inode *target = new_dentry->d_inode;
@@ -4247,19 +4272,19 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (source == target)
 		return 0;
 
-	error = may_delete(old_dir, old_dentry, is_dir);
+	error = may_delete(user_ns, old_dir, old_dentry, is_dir);
 	if (error)
 		return error;
 
 	if (!target) {
-		error = may_create(new_dir, new_dentry);
+		error = may_create(user_ns, new_dir, new_dentry);
 	} else {
 		new_is_dir = d_is_dir(new_dentry);
 
 		if (!(flags & RENAME_EXCHANGE))
-			error = may_delete(new_dir, new_dentry, is_dir);
+			error = may_delete(user_ns, new_dir, new_dentry, is_dir);
 		else
-			error = may_delete(new_dir, new_dentry, new_is_dir);
+			error = may_delete(user_ns, new_dir, new_dentry, new_is_dir);
 	}
 	if (error)
 		return error;
diff --git a/fs/xattr.c b/fs/xattr.c
index 8c50b2a935e4..20376592dad6 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -98,7 +98,7 @@ xattr_permission(struct user_namespace *user_ns, struct inode *inode,
 		 * to be writen back improperly if their true value is
 		 * unknown to the vfs.
 		 */
-		if (HAS_UNMAPPED_ID(inode))
+		if (HAS_UNMAPPED_ID(user_ns, inode))
 			return -EPERM;
 	}
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 17f23f0ffca3..05a228ce767a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2065,9 +2065,10 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
 
-static inline bool HAS_UNMAPPED_ID(struct inode *inode)
+static inline bool HAS_UNMAPPED_ID(struct user_namespace *user_ns, struct inode *inode)
 {
-	return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
+	return !uid_valid(i_uid_into_mnt(user_ns, inode)) ||
+	       !gid_valid(i_gid_into_mnt(user_ns, inode));
 }
 
 static inline enum rw_hint file_write_hint(struct file *file)
@@ -2791,7 +2792,8 @@ extern int notify_change(struct user_namespace *, struct dentry *,
 			 struct iattr *, struct inode **);
 extern int inode_permission(struct user_namespace *, struct inode *, int);
 extern int generic_permission(struct user_namespace *, struct inode *, int);
-extern int __check_sticky(struct inode *dir, struct inode *inode);
+extern int __check_sticky(struct user_namespace *user_ns, struct inode *dir,
+			  struct inode *inode);
 
 static inline bool execute_ok(struct inode *inode)
 {
@@ -3412,12 +3414,13 @@ static inline bool is_sxid(umode_t mode)
 	return (mode & S_ISUID) || ((mode & S_ISGID) && (mode & S_IXGRP));
 }
 
-static inline int check_sticky(struct inode *dir, struct inode *inode)
+static inline int check_sticky(struct user_namespace *user_ns,
+			       struct inode *dir, struct inode *inode)
 {
 	if (!(dir->i_mode & S_ISVTX))
 		return 0;
 
-	return __check_sticky(dir, inode);
+	return __check_sticky(user_ns, dir, inode);
 }
 
 static inline void inode_has_no_xattr(struct inode *inode)
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 17/39] namei: introduce struct renamedata
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (15 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 16/39] namei: handle idmapped mounts in may_*() helpers Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 18/39] namei: prepare for idmapped mounts Christian Brauner
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

In order to handle idmapped mounts we will extend the vfs rename helper to
take two new arguments in follow up patches. Since this operations already
takes a bunch of arguments add a simple struct renamedata (based on struct
nameidata) and make the current helper to use it before we extend it.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/cachefiles/namei.c    |  9 +++++++--
 fs/ecryptfs/inode.c      | 10 +++++++---
 fs/namei.c               | 21 +++++++++++++++------
 fs/nfsd/vfs.c            |  8 +++++++-
 fs/overlayfs/overlayfs.h |  9 ++++++++-
 include/linux/fs.h       | 12 +++++++++++-
 6 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index ecc8ecbbfa5a..7b987de0babe 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -412,9 +412,14 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 	if (ret < 0) {
 		cachefiles_io_error(cache, "Rename security error %d", ret);
 	} else {
+		struct renamedata rd = {
+			.old_dir	= d_inode(dir),
+			.old_dentry	= rep,
+			.new_dir	= d_inode(cache->graveyard),
+			.new_dentry	= grave,
+		};
 		trace_cachefiles_rename(object, rep, grave, why);
-		ret = vfs_rename(d_inode(dir), rep,
-				 d_inode(cache->graveyard), grave, NULL, 0);
+		ret = vfs_rename(&rd);
 		if (ret != 0 && ret != -ENOMEM)
 			cachefiles_io_error(cache,
 					    "Rename failed with error %d", ret);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index d98448c75051..838949ede439 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -590,6 +590,7 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct dentry *lower_new_dir_dentry;
 	struct dentry *trap;
 	struct inode *target_inode;
+	struct renamedata rd = {};
 
 	if (flags)
 		return -EINVAL;
@@ -619,9 +620,12 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		rc = -ENOTEMPTY;
 		goto out_lock;
 	}
-	rc = vfs_rename(d_inode(lower_old_dir_dentry), lower_old_dentry,
-			d_inode(lower_new_dir_dentry), lower_new_dentry,
-			NULL, 0);
+
+	rd.old_dir	= d_inode(lower_old_dir_dentry);
+	rd.old_dentry	= lower_old_dentry;
+	rd.new_dir	= d_inode(lower_new_dir_dentry);
+	rd.new_dentry	= lower_new_dentry;
+	rc = vfs_rename(&rd);
 	if (rc)
 		goto out_lock;
 	if (target_inode)
diff --git a/fs/namei.c b/fs/namei.c
index 4dc842d1cd3a..0a2450de83bb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4256,12 +4256,15 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *	   ->i_mutex on parents, which works but leads to some truly excessive
  *	   locking].
  */
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-	       struct inode *new_dir, struct dentry *new_dentry,
-	       struct inode **delegated_inode, unsigned int flags)
+int vfs_rename(struct renamedata *rd)
 {
 	int error;
 	struct user_namespace *user_ns = &init_user_ns;
+	struct inode *old_dir = rd->old_dir, *new_dir = rd->new_dir;
+	struct dentry *old_dentry = rd->old_dentry,
+		      *new_dentry = rd->new_dentry;
+	struct inode **delegated_inode = rd->delegated_inode;
+	unsigned int flags = rd->flags;
 	bool is_dir = d_is_dir(old_dentry);
 	struct inode *source = old_dentry->d_inode;
 	struct inode *target = new_dentry->d_inode;
@@ -4385,6 +4388,7 @@ EXPORT_SYMBOL(vfs_rename);
 static int do_renameat2(int olddfd, const char __user *oldname, int newdfd,
 			const char __user *newname, unsigned int flags)
 {
+	struct renamedata rd;
 	struct dentry *old_dentry, *new_dentry;
 	struct dentry *trap;
 	struct path old_path, new_path;
@@ -4490,9 +4494,14 @@ static int do_renameat2(int olddfd, const char __user *oldname, int newdfd,
 				     &new_path, new_dentry, flags);
 	if (error)
 		goto exit5;
-	error = vfs_rename(old_path.dentry->d_inode, old_dentry,
-			   new_path.dentry->d_inode, new_dentry,
-			   &delegated_inode, flags);
+
+	rd.old_dir	   = old_path.dentry->d_inode;
+	rd.old_dentry	   = old_dentry;
+	rd.new_dir	   = new_path.dentry->d_inode;
+	rd.new_dentry	   = new_dentry;
+	rd.delegated_inode = &delegated_inode;
+	rd.flags	   = flags;
+	error = vfs_rename(&rd);
 exit5:
 	dput(new_dentry);
 exit4:
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9a0e0e5b34f5..3d7a8cd61098 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1787,7 +1787,13 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 		has_cached = true;
 		goto out_dput_old;
 	} else {
-		host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
+		struct renamedata rd = {
+			.old_dir	= fdir,
+			.old_dentry	= odentry,
+			.new_dir	= tdir,
+			.new_dentry	= ndentry,
+		};
+		host_err = vfs_rename(&rd);
 		if (!host_err) {
 			host_err = commit_metadata(tfhp);
 			if (!host_err)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index bf26fb3fa2c1..73da8710b0f0 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -212,9 +212,16 @@ static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry,
 				unsigned int flags)
 {
 	int err;
+	struct renamedata rd = {
+		.old_dir 	= olddir,
+		.old_dentry 	= olddentry,
+		.new_dir 	= newdir,
+		.new_dentry 	= newdentry,
+		.flags 		= flags,
+	};
 
 	pr_debug("rename(%pd2, %pd2, 0x%x)\n", olddentry, newdentry, flags);
-	err = vfs_rename(olddir, olddentry, newdir, newdentry, NULL, flags);
+	err = vfs_rename(&rd);
 	if (err) {
 		pr_debug("...rename(%pd2, %pd2, ...) = %i\n",
 			 olddentry, newdentry, err);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 05a228ce767a..22c0ab9d0fcf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1756,7 +1756,17 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *);
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
-extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
+
+struct renamedata {
+	struct inode *old_dir;
+	struct dentry *old_dentry;
+	struct inode *new_dir;
+	struct dentry *new_dentry;
+	struct inode **delegated_inode;
+	unsigned int flags;
+} __randomize_layout;
+
+extern int vfs_rename(struct renamedata *);
 
 static inline int vfs_whiteout(struct inode *dir, struct dentry *dentry)
 {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 18/39] namei: prepare for idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (16 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 17/39] namei: introduce struct renamedata Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 19/39] open: handle idmapped mounts in do_truncate() Christian Brauner
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

The various vfs_*() helpers are called by filesystems or by the vfs itself
to perform core operations create, link, mkdir, mknod, rename, rmdir,
tmpfile and unlink. Enable them to handle idmapped mounts. If the inode is
accessed through an idmapped mount it is mapped according to the mount's
user namespace.
Afterwards the checks and operations are identical to non-idmapped mounts.
If the initial user namespace is passed all mapping operations are a nop so
non-idmapped mounts will not see a change in behavior and will also not see
any performance impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    helpers with an additional argument and switch all callers.
---
 drivers/base/devtmpfs.c  |   8 +--
 fs/cachefiles/namei.c    |  10 ++--
 fs/ecryptfs/inode.c      |  22 ++++----
 fs/init.c                |  12 ++---
 fs/namei.c               | 108 +++++++++++++++++++++++++--------------
 fs/nfsd/nfs4recover.c    |   6 +--
 fs/nfsd/vfs.c            |  19 ++++---
 fs/overlayfs/dir.c       |   4 +-
 fs/overlayfs/overlayfs.h |  20 ++++----
 include/linux/fs.h       |  25 +++++----
 ipc/mqueue.c             |   2 +-
 net/unix/af_unix.c       |   2 +-
 12 files changed, 144 insertions(+), 94 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 2e0c3cdb4184..fd4e86c58111 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -162,7 +162,7 @@ static int dev_mkdir(const char *name, umode_t mode)
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
-	err = vfs_mkdir(d_inode(path.dentry), dentry, mode);
+	err = vfs_mkdir(&init_user_ns, d_inode(path.dentry), dentry, mode);
 	if (!err)
 		/* mark as kernel-created inode */
 		d_inode(dentry)->i_private = &thread;
@@ -212,7 +212,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
-	err = vfs_mknod(d_inode(path.dentry), dentry, mode, dev->devt);
+	err = vfs_mknod(&init_user_ns, d_inode(path.dentry), dentry, mode, dev->devt);
 	if (!err) {
 		struct iattr newattrs;
 
@@ -242,7 +242,7 @@ static int dev_rmdir(const char *name)
 		return PTR_ERR(dentry);
 	if (d_really_is_positive(dentry)) {
 		if (d_inode(dentry)->i_private == &thread)
-			err = vfs_rmdir(d_inode(parent.dentry), dentry);
+			err = vfs_rmdir(&init_user_ns, d_inode(parent.dentry), dentry);
 		else
 			err = -EPERM;
 	} else {
@@ -330,7 +330,7 @@ static int handle_remove(const char *nodename, struct device *dev)
 			inode_lock(d_inode(dentry));
 			notify_change(&init_user_ns, dentry, &newattrs, NULL);
 			inode_unlock(d_inode(dentry));
-			err = vfs_unlink(d_inode(parent.dentry), dentry, NULL);
+			err = vfs_unlink(&init_user_ns, d_inode(parent.dentry), dentry, NULL);
 			if (!err || err == -ENOENT)
 				deleted = 1;
 		}
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 7b987de0babe..ae01fac0a80c 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -311,7 +311,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 			cachefiles_io_error(cache, "Unlink security error");
 		} else {
 			trace_cachefiles_unlink(object, rep, why);
-			ret = vfs_unlink(d_inode(dir), rep, NULL);
+			ret = vfs_unlink(&init_user_ns, d_inode(dir), rep, NULL);
 
 			if (preemptive)
 				cachefiles_mark_object_buried(cache, rep, why);
@@ -413,8 +413,10 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		cachefiles_io_error(cache, "Rename security error %d", ret);
 	} else {
 		struct renamedata rd = {
+			.old_user_ns	= &init_user_ns,
 			.old_dir	= d_inode(dir),
 			.old_dentry	= rep,
+			.new_user_ns	= &init_user_ns,
 			.new_dir	= d_inode(cache->graveyard),
 			.new_dentry	= grave,
 		};
@@ -566,7 +568,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			if (ret < 0)
 				goto create_error;
 			start = jiffies;
-			ret = vfs_mkdir(d_inode(dir), next, 0);
+			ret = vfs_mkdir(&init_user_ns, d_inode(dir), next, 0);
 			cachefiles_hist(cachefiles_mkdir_histogram, start);
 			if (!key)
 				trace_cachefiles_mkdir(object, next, ret);
@@ -602,7 +604,7 @@ int cachefiles_walk_to_object(struct cachefiles_object *parent,
 			if (ret < 0)
 				goto create_error;
 			start = jiffies;
-			ret = vfs_create(d_inode(dir), next, S_IFREG, true);
+			ret = vfs_create(&init_user_ns, d_inode(dir), next, S_IFREG, true);
 			cachefiles_hist(cachefiles_create_histogram, start);
 			trace_cachefiles_create(object, next, ret);
 			if (ret < 0)
@@ -796,7 +798,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 		ret = security_path_mkdir(&path, subdir, 0700);
 		if (ret < 0)
 			goto mkdir_error;
-		ret = vfs_mkdir(d_inode(dir), subdir, 0700);
+		ret = vfs_mkdir(&init_user_ns, d_inode(dir), subdir, 0700);
 		if (ret < 0)
 			goto mkdir_error;
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 838949ede439..42066c5613ca 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -141,7 +141,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
 	else if (d_unhashed(lower_dentry))
 		rc = -EINVAL;
 	else
-		rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL);
+		rc = vfs_unlink(&init_user_ns, lower_dir_inode, lower_dentry, NULL);
 	if (rc) {
 		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
 		goto out_unlock;
@@ -180,7 +180,8 @@ ecryptfs_do_create(struct inode *directory_inode,
 
 	lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry);
 	lower_dir_dentry = lock_parent(lower_dentry);
-	rc = vfs_create(d_inode(lower_dir_dentry), lower_dentry, mode, true);
+	rc = vfs_create(&init_user_ns, d_inode(lower_dir_dentry), lower_dentry,
+			mode, true);
 	if (rc) {
 		printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
 		       "rc = [%d]\n", __func__, rc);
@@ -190,7 +191,7 @@ ecryptfs_do_create(struct inode *directory_inode,
 	inode = __ecryptfs_get_inode(d_inode(lower_dentry),
 				     directory_inode->i_sb);
 	if (IS_ERR(inode)) {
-		vfs_unlink(d_inode(lower_dir_dentry), lower_dentry, NULL);
+		vfs_unlink(&init_user_ns, d_inode(lower_dir_dentry), lower_dentry, NULL);
 		goto out_lock;
 	}
 	fsstack_copy_attr_times(directory_inode, d_inode(lower_dir_dentry));
@@ -436,8 +437,8 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
 	dget(lower_old_dentry);
 	dget(lower_new_dentry);
 	lower_dir_dentry = lock_parent(lower_new_dentry);
-	rc = vfs_link(lower_old_dentry, d_inode(lower_dir_dentry),
-		      lower_new_dentry, NULL);
+	rc = vfs_link(lower_old_dentry, &init_user_ns,
+		      d_inode(lower_dir_dentry), lower_new_dentry, NULL);
 	if (rc || d_really_is_negative(lower_new_dentry))
 		goto out_lock;
 	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
@@ -481,7 +482,7 @@ static int ecryptfs_symlink(struct inode *dir, struct dentry *dentry,
 						  strlen(symname));
 	if (rc)
 		goto out_lock;
-	rc = vfs_symlink(d_inode(lower_dir_dentry), lower_dentry,
+	rc = vfs_symlink(&init_user_ns, d_inode(lower_dir_dentry), lower_dentry,
 			 encoded_symname);
 	kfree(encoded_symname);
 	if (rc || d_really_is_negative(lower_dentry))
@@ -507,7 +508,7 @@ static int ecryptfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	lower_dir_dentry = lock_parent(lower_dentry);
-	rc = vfs_mkdir(d_inode(lower_dir_dentry), lower_dentry, mode);
+	rc = vfs_mkdir(&init_user_ns, d_inode(lower_dir_dentry), lower_dentry, mode);
 	if (rc || d_really_is_negative(lower_dentry))
 		goto out;
 	rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb);
@@ -541,7 +542,7 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
 	else if (d_unhashed(lower_dentry))
 		rc = -EINVAL;
 	else
-		rc = vfs_rmdir(lower_dir_inode, lower_dentry);
+		rc = vfs_rmdir(&init_user_ns, lower_dir_inode, lower_dentry);
 	if (!rc) {
 		clear_nlink(d_inode(dentry));
 		fsstack_copy_attr_times(dir, lower_dir_inode);
@@ -563,7 +564,8 @@ ecryptfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	lower_dir_dentry = lock_parent(lower_dentry);
-	rc = vfs_mknod(d_inode(lower_dir_dentry), lower_dentry, mode, dev);
+	rc = vfs_mknod(&init_user_ns, d_inode(lower_dir_dentry), lower_dentry,
+		       mode, dev);
 	if (rc || d_really_is_negative(lower_dentry))
 		goto out;
 	rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb);
@@ -621,8 +623,10 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		goto out_lock;
 	}
 
+	rd.old_user_ns	= &init_user_ns;
 	rd.old_dir	= d_inode(lower_old_dir_dentry);
 	rd.old_dentry	= lower_old_dentry;
+	rd.new_user_ns	= &init_user_ns;
 	rd.new_dir	= d_inode(lower_new_dir_dentry);
 	rd.new_dentry	= lower_new_dentry;
 	rc = vfs_rename(&rd);
diff --git a/fs/init.c b/fs/init.c
index 2b4842f4802b..76f493600030 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -160,8 +160,8 @@ int __init init_mknod(const char *filename, umode_t mode, unsigned int dev)
 		mode &= ~current_umask();
 	error = security_path_mknod(&path, dentry, mode, dev);
 	if (!error)
-		error = vfs_mknod(path.dentry->d_inode, dentry, mode,
-				  new_decode_dev(dev));
+		error = vfs_mknod(&init_user_ns, path.dentry->d_inode, dentry,
+				  mode, new_decode_dev(dev));
 	done_path_create(&path, dentry);
 	return error;
 }
@@ -190,8 +190,8 @@ int __init init_link(const char *oldname, const char *newname)
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
 		goto out_dput;
-	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry,
-			 NULL);
+	error = vfs_link(old_path.dentry, &init_user_ns, 
+			 new_path.dentry->d_inode, new_dentry, NULL);
 out_dput:
 	done_path_create(&new_path, new_dentry);
 out:
@@ -210,7 +210,7 @@ int __init init_symlink(const char *oldname, const char *newname)
 		return PTR_ERR(dentry);
 	error = security_path_symlink(&path, dentry, oldname);
 	if (!error)
-		error = vfs_symlink(path.dentry->d_inode, dentry, oldname);
+		error = vfs_symlink(&init_user_ns, path.dentry->d_inode, dentry, oldname);
 	done_path_create(&path, dentry);
 	return error;
 }
@@ -233,7 +233,7 @@ int __init init_mkdir(const char *pathname, umode_t mode)
 		mode &= ~current_umask();
 	error = security_path_mkdir(&path, dentry, mode);
 	if (!error)
-		error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
+		error = vfs_mkdir(&init_user_ns, path.dentry->d_inode, dentry, mode);
 	done_path_create(&path, dentry);
 	return error;
 }
diff --git a/fs/namei.c b/fs/namei.c
index 0a2450de83bb..b91bf923d22c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2820,10 +2820,10 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
 }
 EXPORT_SYMBOL(unlock_rename);
 
-int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		bool want_excl)
+int vfs_create(struct user_namespace *user_ns, struct inode *dir,
+	       struct dentry *dentry, umode_t mode, bool want_excl)
 {
-	int error = may_create(&init_user_ns, dir, dentry);
+	int error = may_create(user_ns, dir, dentry);
 	if (error)
 		return error;
 
@@ -3298,7 +3298,8 @@ static int do_open(struct nameidata *nd,
 	return error;
 }
 
-struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag)
+struct dentry *vfs_tmpfile(struct user_namespace *user_ns,
+			   struct dentry *dentry, umode_t mode, int open_flag)
 {
 	struct dentry *child = NULL;
 	struct inode *dir = dentry->d_inode;
@@ -3306,7 +3307,7 @@ struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode, int open_flag)
 	int error;
 
 	/* we want directory to be writable */
-	error = inode_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
+	error = inode_permission(user_ns, dir, MAY_WRITE | MAY_EXEC);
 	if (error)
 		goto out_err;
 	error = -EOPNOTSUPP;
@@ -3341,6 +3342,7 @@ static int do_tmpfile(struct nameidata *nd, unsigned flags,
 		const struct open_flags *op,
 		struct file *file)
 {
+	struct user_namespace *user_ns;
 	struct dentry *child;
 	struct path path;
 	int error = path_lookupat(nd, flags | LOOKUP_DIRECTORY, &path);
@@ -3349,7 +3351,8 @@ static int do_tmpfile(struct nameidata *nd, unsigned flags,
 	error = mnt_want_write(path.mnt);
 	if (unlikely(error))
 		goto out;
-	child = vfs_tmpfile(path.dentry, op->mode, op->open_flag);
+	user_ns = mnt_user_ns(path.mnt);
+	child = vfs_tmpfile(user_ns, path.dentry, op->mode, op->open_flag);
 	error = PTR_ERR(child);
 	if (IS_ERR(child))
 		goto out2;
@@ -3561,10 +3564,11 @@ inline struct dentry *user_path_create(int dfd, const char __user *pathname,
 }
 EXPORT_SYMBOL(user_path_create);
 
-int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
+int vfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+	      struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	bool is_whiteout = S_ISCHR(mode) && dev == WHITEOUT_DEV;
-	int error = may_create(&init_user_ns, dir, dentry);
+	int error = may_create(user_ns, dir, dentry);
 
 	if (error)
 		return error;
@@ -3611,6 +3615,7 @@ static int may_mknod(umode_t mode)
 static long do_mknodat(int dfd, const char __user *filename, umode_t mode,
 		unsigned int dev)
 {
+	struct user_namespace *user_ns;
 	struct dentry *dentry;
 	struct path path;
 	int error;
@@ -3629,18 +3634,22 @@ static long do_mknodat(int dfd, const char __user *filename, umode_t mode,
 	error = security_path_mknod(&path, dentry, mode, dev);
 	if (error)
 		goto out;
+
+	user_ns = mnt_user_ns(path.mnt);
 	switch (mode & S_IFMT) {
 		case 0: case S_IFREG:
-			error = vfs_create(path.dentry->d_inode,dentry,mode,true);
+			error = vfs_create(user_ns, path.dentry->d_inode,
+					   dentry, mode, true);
 			if (!error)
 				ima_post_path_mknod(dentry);
 			break;
 		case S_IFCHR: case S_IFBLK:
-			error = vfs_mknod(path.dentry->d_inode,dentry,mode,
-					new_decode_dev(dev));
+			error = vfs_mknod(user_ns, path.dentry->d_inode, dentry,
+					  mode, new_decode_dev(dev));
 			break;
 		case S_IFIFO: case S_IFSOCK:
-			error = vfs_mknod(path.dentry->d_inode,dentry,mode,0);
+			error = vfs_mknod(user_ns, path.dentry->d_inode, dentry,
+					  mode, 0);
 			break;
 	}
 out:
@@ -3663,9 +3672,10 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d
 	return do_mknodat(AT_FDCWD, filename, mode, dev);
 }
 
-int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+int vfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+	      struct dentry *dentry, umode_t mode)
 {
-	int error = may_create(&init_user_ns, dir, dentry);
+	int error = may_create(user_ns, dir, dentry);
 	unsigned max_links = dir->i_sb->s_max_links;
 
 	if (error)
@@ -3704,8 +3714,11 @@ static long do_mkdirat(int dfd, const char __user *pathname, umode_t mode)
 	if (!IS_POSIXACL(path.dentry->d_inode))
 		mode &= ~current_umask();
 	error = security_path_mkdir(&path, dentry, mode);
-	if (!error)
-		error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
+	if (!error) {
+		struct user_namespace *user_ns;
+		user_ns = mnt_user_ns(path.mnt);
+		error = vfs_mkdir(user_ns, path.dentry->d_inode, dentry, mode);
+	}
 	done_path_create(&path, dentry);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
@@ -3724,9 +3737,10 @@ SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
 	return do_mkdirat(AT_FDCWD, pathname, mode);
 }
 
-int vfs_rmdir(struct inode *dir, struct dentry *dentry)
+int vfs_rmdir(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry)
 {
-	int error = may_delete(&init_user_ns, dir, dentry, 1);
+	int error = may_delete(user_ns, dir, dentry, 1);
 
 	if (error)
 		return error;
@@ -3766,6 +3780,7 @@ EXPORT_SYMBOL(vfs_rmdir);
 
 long do_rmdir(int dfd, struct filename *name)
 {
+	struct user_namespace *user_ns;
 	int error = 0;
 	struct dentry *dentry;
 	struct path path;
@@ -3806,7 +3821,8 @@ long do_rmdir(int dfd, struct filename *name)
 	error = security_path_rmdir(&path, dentry);
 	if (error)
 		goto exit3;
-	error = vfs_rmdir(path.dentry->d_inode, dentry);
+	user_ns = mnt_user_ns(path.mnt);
+	error = vfs_rmdir(user_ns, path.dentry->d_inode, dentry);
 exit3:
 	dput(dentry);
 exit2:
@@ -3845,10 +3861,11 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
  * be appropriate for callers that expect the underlying filesystem not
  * to be NFS exported.
  */
-int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
+int vfs_unlink(struct user_namespace *user_ns, struct inode *dir,
+	       struct dentry *dentry, struct inode **delegated_inode)
 {
 	struct inode *target = dentry->d_inode;
-	int error = may_delete(&init_user_ns, dir, dentry, 0);
+	int error = may_delete(user_ns, dir, dentry, 0);
 
 	if (error)
 		return error;
@@ -3919,6 +3936,8 @@ long do_unlinkat(int dfd, struct filename *name)
 	dentry = __lookup_hash(&last, path.dentry, lookup_flags);
 	error = PTR_ERR(dentry);
 	if (!IS_ERR(dentry)) {
+		struct user_namespace *user_ns;
+
 		/* Why not before? Because we want correct error value */
 		if (last.name[last.len])
 			goto slashes;
@@ -3929,7 +3948,8 @@ long do_unlinkat(int dfd, struct filename *name)
 		error = security_path_unlink(&path, dentry);
 		if (error)
 			goto exit2;
-		error = vfs_unlink(path.dentry->d_inode, dentry, &delegated_inode);
+		user_ns = mnt_user_ns(path.mnt);
+		error = vfs_unlink(user_ns, path.dentry->d_inode, dentry, &delegated_inode);
 exit2:
 		dput(dentry);
 	}
@@ -3978,9 +3998,10 @@ SYSCALL_DEFINE1(unlink, const char __user *, pathname)
 	return do_unlinkat(AT_FDCWD, getname(pathname));
 }
 
-int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
+int vfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+		struct dentry *dentry, const char *oldname)
 {
-	int error = may_create(&init_user_ns, dir, dentry);
+	int error = may_create(user_ns, dir, dentry);
 
 	if (error)
 		return error;
@@ -4018,8 +4039,12 @@ static long do_symlinkat(const char __user *oldname, int newdfd,
 		goto out_putname;
 
 	error = security_path_symlink(&path, dentry, from->name);
-	if (!error)
-		error = vfs_symlink(path.dentry->d_inode, dentry, from->name);
+	if (!error) {
+		struct user_namespace *user_ns;
+		user_ns = mnt_user_ns(path.mnt);
+		error = vfs_symlink(user_ns, path.dentry->d_inode, dentry,
+				    from->name);
+	}
 	done_path_create(&path, dentry);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
@@ -4044,6 +4069,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
 /**
  * vfs_link - create a new link
  * @old_dentry:	object to be linked
+ * @user_ns:	the user namespace of the mount
  * @dir:	new parent
  * @new_dentry:	where to create the new link
  * @delegated_inode: returns inode needing a delegation break
@@ -4060,7 +4086,9 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
  * be appropriate for callers that expect the underlying filesystem not
  * to be NFS exported.
  */
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
+int vfs_link(struct dentry *old_dentry, struct user_namespace *user_ns,
+	     struct inode *dir, struct dentry *new_dentry,
+	     struct inode **delegated_inode)
 {
 	struct inode *inode = old_dentry->d_inode;
 	unsigned max_links = dir->i_sb->s_max_links;
@@ -4069,7 +4097,7 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	if (!inode)
 		return -ENOENT;
 
-	error = may_create(&init_user_ns, dir, new_dentry);
+	error = may_create(user_ns, dir, new_dentry);
 	if (error)
 		return error;
 
@@ -4086,7 +4114,7 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	 * be writen back improperly if their true value is unknown to
 	 * the vfs.
 	 */
-	if (HAS_UNMAPPED_ID(&init_user_ns, inode))
+	if (HAS_UNMAPPED_ID(user_ns, inode))
 		return -EPERM;
 	if (!dir->i_op->link)
 		return -EPERM;
@@ -4133,6 +4161,7 @@ EXPORT_SYMBOL(vfs_link);
 static int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 	      const char __user *newname, int flags)
 {
+	struct user_namespace *user_ns;
 	struct dentry *new_dentry;
 	struct path old_path, new_path;
 	struct inode *delegated_inode = NULL;
@@ -4174,7 +4203,9 @@ static int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
 		goto out_dput;
-	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
+	user_ns = mnt_user_ns(new_path.mnt);
+	error = vfs_link(old_path.dentry, user_ns, new_path.dentry->d_inode,
+			 new_dentry, &delegated_inode);
 out_dput:
 	done_path_create(&new_path, new_dentry);
 	if (delegated_inode) {
@@ -4208,8 +4239,10 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
 
 /**
  * vfs_rename - rename a filesystem object
+ * @old_user_ns: user namespace the old inode is accessed from
  * @old_dir:	parent of source
  * @old_dentry:	source
+ * @new_user_ns: user namespace the old inode is accessed from
  * @new_dir:	parent of destination
  * @new_dentry:	destination
  * @delegated_inode: returns an inode needing a delegation break
@@ -4259,7 +4292,6 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
 int vfs_rename(struct renamedata *rd)
 {
 	int error;
-	struct user_namespace *user_ns = &init_user_ns;
 	struct inode *old_dir = rd->old_dir, *new_dir = rd->new_dir;
 	struct dentry *old_dentry = rd->old_dentry,
 		      *new_dentry = rd->new_dentry;
@@ -4275,19 +4307,19 @@ int vfs_rename(struct renamedata *rd)
 	if (source == target)
 		return 0;
 
-	error = may_delete(user_ns, old_dir, old_dentry, is_dir);
+	error = may_delete(rd->old_user_ns, old_dir, old_dentry, is_dir);
 	if (error)
 		return error;
 
 	if (!target) {
-		error = may_create(user_ns, new_dir, new_dentry);
+		error = may_create(rd->new_user_ns, new_dir, new_dentry);
 	} else {
 		new_is_dir = d_is_dir(new_dentry);
 
 		if (!(flags & RENAME_EXCHANGE))
-			error = may_delete(user_ns, new_dir, new_dentry, is_dir);
+			error = may_delete(rd->new_user_ns, new_dir, new_dentry, is_dir);
 		else
-			error = may_delete(user_ns, new_dir, new_dentry, new_is_dir);
+			error = may_delete(rd->new_user_ns, new_dir, new_dentry, new_is_dir);
 	}
 	if (error)
 		return error;
@@ -4301,12 +4333,12 @@ int vfs_rename(struct renamedata *rd)
 	 */
 	if (new_dir != old_dir) {
 		if (is_dir) {
-			error = inode_permission(&init_user_ns, source, MAY_WRITE);
+			error = inode_permission(rd->old_user_ns, source, MAY_WRITE);
 			if (error)
 				return error;
 		}
 		if ((flags & RENAME_EXCHANGE) && new_is_dir) {
-			error = inode_permission(&init_user_ns, target, MAY_WRITE);
+			error = inode_permission(rd->new_user_ns, target, MAY_WRITE);
 			if (error)
 				return error;
 		}
@@ -4497,8 +4529,10 @@ static int do_renameat2(int olddfd, const char __user *oldname, int newdfd,
 
 	rd.old_dir	   = old_path.dentry->d_inode;
 	rd.old_dentry	   = old_dentry;
+	rd.old_user_ns	   = mnt_user_ns(old_path.mnt);
 	rd.new_dir	   = new_path.dentry->d_inode;
 	rd.new_dentry	   = new_dentry;
+	rd.new_user_ns	   = mnt_user_ns(new_path.mnt);
 	rd.delegated_inode = &delegated_inode;
 	rd.flags	   = flags;
 	error = vfs_rename(&rd);
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 186fa2c2c6ba..891395c6c7d3 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -233,7 +233,7 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 		 * as well be forgiving and just succeed silently.
 		 */
 		goto out_put;
-	status = vfs_mkdir(d_inode(dir), dentry, S_IRWXU);
+	status = vfs_mkdir(&init_user_ns, d_inode(dir), dentry, S_IRWXU);
 out_put:
 	dput(dentry);
 out_unlock:
@@ -353,7 +353,7 @@ nfsd4_unlink_clid_dir(char *name, int namlen, struct nfsd_net *nn)
 	status = -ENOENT;
 	if (d_really_is_negative(dentry))
 		goto out;
-	status = vfs_rmdir(d_inode(dir), dentry);
+	status = vfs_rmdir(&init_user_ns, d_inode(dir), dentry);
 out:
 	dput(dentry);
 out_unlock:
@@ -443,7 +443,7 @@ purge_old(struct dentry *parent, struct dentry *child, struct nfsd_net *nn)
 	if (nfs4_has_reclaimed_state(name, nn))
 		goto out_free;
 
-	status = vfs_rmdir(d_inode(parent), child);
+	status = vfs_rmdir(&init_user_ns, d_inode(parent), child);
 	if (status)
 		printk("failed to remove client recovery directory %pd\n",
 				child);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 3d7a8cd61098..49837c995377 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1245,12 +1245,12 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	host_err = 0;
 	switch (type) {
 	case S_IFREG:
-		host_err = vfs_create(dirp, dchild, iap->ia_mode, true);
+		host_err = vfs_create(&init_user_ns, dirp, dchild, iap->ia_mode, true);
 		if (!host_err)
 			nfsd_check_ignore_resizing(iap);
 		break;
 	case S_IFDIR:
-		host_err = vfs_mkdir(dirp, dchild, iap->ia_mode);
+		host_err = vfs_mkdir(&init_user_ns, dirp, dchild, iap->ia_mode);
 		if (!host_err && unlikely(d_unhashed(dchild))) {
 			struct dentry *d;
 			d = lookup_one_len(dchild->d_name.name,
@@ -1278,7 +1278,8 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	case S_IFBLK:
 	case S_IFIFO:
 	case S_IFSOCK:
-		host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
+		host_err = vfs_mknod(&init_user_ns, dirp, dchild,
+				     iap->ia_mode, rdev);
 		break;
 	default:
 		printk(KERN_WARNING "nfsd: bad file type %o in nfsd_create\n",
@@ -1476,7 +1477,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (!IS_POSIXACL(dirp))
 		iap->ia_mode &= ~current_umask();
 
-	host_err = vfs_create(dirp, dchild, iap->ia_mode, true);
+	host_err = vfs_create(&init_user_ns, dirp, dchild, iap->ia_mode, true);
 	if (host_err < 0) {
 		fh_drop_write(fhp);
 		goto out_nfserr;
@@ -1600,7 +1601,7 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (IS_ERR(dnew))
 		goto out_nfserr;
 
-	host_err = vfs_symlink(d_inode(dentry), dnew, path);
+	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
 	err = nfserrno(host_err);
 	if (!err)
 		err = nfserrno(commit_metadata(fhp));
@@ -1668,7 +1669,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	err = nfserr_noent;
 	if (d_really_is_negative(dold))
 		goto out_dput;
-	host_err = vfs_link(dold, dirp, dnew, NULL);
+	host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL);
 	if (!host_err) {
 		err = nfserrno(commit_metadata(ffhp));
 		if (!err)
@@ -1788,8 +1789,10 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 		goto out_dput_old;
 	} else {
 		struct renamedata rd = {
+			.old_user_ns	= &init_user_ns,
 			.old_dir	= fdir,
 			.old_dentry	= odentry,
+			.new_user_ns	= &init_user_ns,
 			.new_dir	= tdir,
 			.new_dentry	= ndentry,
 		};
@@ -1879,9 +1882,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 
 	if (type != S_IFDIR) {
 		nfsd_close_cached_files(rdentry);
-		host_err = vfs_unlink(dirp, rdentry, NULL);
+		host_err = vfs_unlink(&init_user_ns, dirp, rdentry, NULL);
 	} else {
-		host_err = vfs_rmdir(dirp, rdentry);
+		host_err = vfs_rmdir(&init_user_ns, dirp, rdentry);
 	}
 
 	if (!host_err)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a1794c838b78..fa6e3fde9a26 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -821,9 +821,9 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
 		goto out_dput_upper;
 
 	if (is_dir)
-		err = vfs_rmdir(dir, upper);
+		err = vfs_rmdir(&init_user_ns, dir, upper);
 	else
-		err = vfs_unlink(dir, upper, NULL);
+		err = vfs_unlink(&init_user_ns, dir, upper, NULL);
 	ovl_dir_modified(dentry->d_parent, ovl_type_origin(dentry));
 
 	/*
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 73da8710b0f0..0249ea876dc3 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -121,7 +121,7 @@ static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
 
 static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry)
 {
-	int err = vfs_rmdir(dir, dentry);
+	int err = vfs_rmdir(&init_user_ns, dir, dentry);
 
 	pr_debug("rmdir(%pd2) = %i\n", dentry, err);
 	return err;
@@ -129,7 +129,7 @@ static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry)
 
 static inline int ovl_do_unlink(struct inode *dir, struct dentry *dentry)
 {
-	int err = vfs_unlink(dir, dentry, NULL);
+	int err = vfs_unlink(&init_user_ns, dir, dentry, NULL);
 
 	pr_debug("unlink(%pd2) = %i\n", dentry, err);
 	return err;
@@ -138,7 +138,7 @@ static inline int ovl_do_unlink(struct inode *dir, struct dentry *dentry)
 static inline int ovl_do_link(struct dentry *old_dentry, struct inode *dir,
 			      struct dentry *new_dentry)
 {
-	int err = vfs_link(old_dentry, dir, new_dentry, NULL);
+	int err = vfs_link(old_dentry, &init_user_ns, dir, new_dentry, NULL);
 
 	pr_debug("link(%pd2, %pd2) = %i\n", old_dentry, new_dentry, err);
 	return err;
@@ -147,7 +147,7 @@ static inline int ovl_do_link(struct dentry *old_dentry, struct inode *dir,
 static inline int ovl_do_create(struct inode *dir, struct dentry *dentry,
 				umode_t mode)
 {
-	int err = vfs_create(dir, dentry, mode, true);
+	int err = vfs_create(&init_user_ns, dir, dentry, mode, true);
 
 	pr_debug("create(%pd2, 0%o) = %i\n", dentry, mode, err);
 	return err;
@@ -156,7 +156,7 @@ static inline int ovl_do_create(struct inode *dir, struct dentry *dentry,
 static inline int ovl_do_mkdir(struct inode *dir, struct dentry *dentry,
 			       umode_t mode)
 {
-	int err = vfs_mkdir(dir, dentry, mode);
+	int err = vfs_mkdir(&init_user_ns, dir, dentry, mode);
 	pr_debug("mkdir(%pd2, 0%o) = %i\n", dentry, mode, err);
 	return err;
 }
@@ -164,7 +164,7 @@ static inline int ovl_do_mkdir(struct inode *dir, struct dentry *dentry,
 static inline int ovl_do_mknod(struct inode *dir, struct dentry *dentry,
 			       umode_t mode, dev_t dev)
 {
-	int err = vfs_mknod(dir, dentry, mode, dev);
+	int err = vfs_mknod(&init_user_ns, dir, dentry, mode, dev);
 
 	pr_debug("mknod(%pd2, 0%o, 0%o) = %i\n", dentry, mode, dev, err);
 	return err;
@@ -173,7 +173,7 @@ static inline int ovl_do_mknod(struct inode *dir, struct dentry *dentry,
 static inline int ovl_do_symlink(struct inode *dir, struct dentry *dentry,
 				 const char *oldname)
 {
-	int err = vfs_symlink(dir, dentry, oldname);
+	int err = vfs_symlink(&init_user_ns, dir, dentry, oldname);
 
 	pr_debug("symlink(\"%s\", %pd2) = %i\n", oldname, dentry, err);
 	return err;
@@ -213,8 +213,10 @@ static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry,
 {
 	int err;
 	struct renamedata rd = {
+		.old_user_ns	= &init_user_ns,
 		.old_dir 	= olddir,
 		.old_dentry 	= olddentry,
+		.new_user_ns	= &init_user_ns,
 		.new_dir 	= newdir,
 		.new_dentry 	= newdentry,
 		.flags 		= flags,
@@ -231,14 +233,14 @@ static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry,
 
 static inline int ovl_do_whiteout(struct inode *dir, struct dentry *dentry)
 {
-	int err = vfs_whiteout(dir, dentry);
+	int err = vfs_whiteout(&init_user_ns, dir, dentry);
 	pr_debug("whiteout(%pd2) = %i\n", dentry, err);
 	return err;
 }
 
 static inline struct dentry *ovl_do_tmpfile(struct dentry *dentry, umode_t mode)
 {
-	struct dentry *ret = vfs_tmpfile(dentry, mode, 0);
+	struct dentry *ret = vfs_tmpfile(&init_user_ns, dentry, mode, 0);
 	int err = PTR_ERR_OR_ZERO(ret);
 
 	pr_debug("tmpfile(%pd2, 0%o) = %i\n", dentry, mode, err);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 22c0ab9d0fcf..f29909139838 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1749,17 +1749,20 @@ extern bool inode_owner_or_capable(struct user_namespace *user_ns, const struct
 /*
  * VFS helper functions..
  */
-extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
-extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
-extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
-extern int vfs_symlink(struct inode *, struct dentry *, const char *);
-extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
-extern int vfs_rmdir(struct inode *, struct dentry *);
-extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
+extern int vfs_create(struct user_namespace *, struct inode *, struct dentry *, umode_t, bool);
+extern int vfs_mkdir(struct user_namespace *, struct inode *, struct dentry *, umode_t);
+extern int vfs_mknod(struct user_namespace *, struct inode *, struct dentry *, umode_t, dev_t);
+extern int vfs_symlink(struct user_namespace *, struct inode *, struct dentry *, const char *);
+extern int vfs_link(struct dentry *, struct user_namespace *, struct inode *,
+		    struct dentry *, struct inode **);
+extern int vfs_rmdir(struct user_namespace *, struct inode *, struct dentry *);
+extern int vfs_unlink(struct user_namespace *, struct inode *, struct dentry *, struct inode **);
 
 struct renamedata {
+	struct user_namespace *old_user_ns;
 	struct inode *old_dir;
 	struct dentry *old_dentry;
+	struct user_namespace *new_user_ns;
 	struct inode *new_dir;
 	struct dentry *new_dentry;
 	struct inode **delegated_inode;
@@ -1768,12 +1771,14 @@ struct renamedata {
 
 extern int vfs_rename(struct renamedata *);
 
-static inline int vfs_whiteout(struct inode *dir, struct dentry *dentry)
+static inline int vfs_whiteout(struct user_namespace *user_ns,
+			       struct inode *dir, struct dentry *dentry)
 {
-	return vfs_mknod(dir, dentry, S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
+	return vfs_mknod(user_ns, dir, dentry, S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
 }
 
-extern struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode,
+extern struct dentry *vfs_tmpfile(struct user_namespace *user_ns,
+				  struct dentry *dentry, umode_t mode,
 				  int open_flag);
 
 int vfs_mkobj(struct dentry *, umode_t,
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 693f01fe1216..8b8c300854db 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -965,7 +965,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
 		err = -ENOENT;
 	} else {
 		ihold(inode);
-		err = vfs_unlink(d_inode(dentry->d_parent), dentry, NULL);
+		err = vfs_unlink(&init_user_ns, d_inode(dentry->d_parent), dentry, NULL);
 	}
 	dput(dentry);
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index f568526d4a02..b4987805e5e5 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -996,7 +996,7 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
 	 */
 	err = security_path_mknod(&path, dentry, mode, 0);
 	if (!err) {
-		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
+		err = vfs_mknod(&init_user_ns, d_inode(path.dentry), dentry, mode, 0);
 		if (!err) {
 			res->mnt = mntget(path.mnt);
 			res->dentry = dget(dentry);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 19/39] open: handle idmapped mounts in do_truncate()
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (17 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 18/39] namei: prepare for idmapped mounts Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:36 ` [PATCH v2 20/39] open: handle idmapped mounts Christian Brauner
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

When truncating files the vfs will verify that the caller is privileged
over the inode. Since the do_truncate() helper is only used in a few places
in the vfs code extend it to handle idmapped mounts instead of adding a new
helper.  If the inode is accessed through an idmapped mount it is mapped
according to the mount's user namespace. Afterwards the permissions checks
are identical to non-idmapped mounts. If the initial user namespace is
passed all mapping operations are a nop so non-idmapped mounts will not see
a change in behavior and will also not see any performance impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/coredump.c      | 12 +++++++++---
 fs/inode.c         | 13 +++++++++----
 fs/namei.c         |  6 +++---
 fs/open.c          | 21 +++++++++++++--------
 include/linux/fs.h |  4 ++--
 5 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 0cd9056d79cc..25beac7230ff 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -703,6 +703,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 			goto close_fail;
 		}
 	} else {
+		struct user_namespace *user_ns;
 		struct inode *inode;
 		int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW |
 				 O_LARGEFILE | O_EXCL;
@@ -786,7 +787,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 			goto close_fail;
 		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
 			goto close_fail;
-		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
+		user_ns = mnt_user_ns(cprm.file->f_path.mnt);
+		if (do_truncate(user_ns, cprm.file->f_path.dentry, 0, 0, cprm.file))
 			goto close_fail;
 	}
 
@@ -931,8 +933,12 @@ void dump_truncate(struct coredump_params *cprm)
 
 	if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
 		offset = file->f_op->llseek(file, 0, SEEK_CUR);
-		if (i_size_read(file->f_mapping->host) < offset)
-			do_truncate(file->f_path.dentry, offset, 0, file);
+		if (i_size_read(file->f_mapping->host) < offset) {
+			struct user_namespace *user_ns;
+
+			user_ns = mnt_user_ns(file->f_path.mnt);
+			do_truncate(user_ns, file->f_path.dentry, offset, 0, file);
+		}
 	}
 }
 EXPORT_SYMBOL(dump_truncate);
diff --git a/fs/inode.c b/fs/inode.c
index 75c64f003c45..0ccdd673636d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1904,7 +1904,8 @@ int dentry_needs_remove_privs(struct dentry *dentry)
 	return mask;
 }
 
-static int __remove_privs(struct dentry *dentry, int kill)
+static int __remove_privs(struct user_namespace *user_ns, struct dentry *dentry,
+			  int kill)
 {
 	struct iattr newattrs;
 
@@ -1913,7 +1914,7 @@ static int __remove_privs(struct dentry *dentry, int kill)
 	 * Note we call this on write, so notify_change will not
 	 * encounter any conflicting delegations:
 	 */
-	return notify_change(&init_user_ns, dentry, &newattrs, NULL);
+	return notify_change(user_ns, dentry, &newattrs, NULL);
 }
 
 /*
@@ -1939,8 +1940,12 @@ int file_remove_privs(struct file *file)
 	kill = dentry_needs_remove_privs(dentry);
 	if (kill < 0)
 		return kill;
-	if (kill)
-		error = __remove_privs(dentry, kill);
+	if (kill) {
+		struct user_namespace *user_ns;
+
+		user_ns = mnt_user_ns(file->f_path.mnt);
+		error = __remove_privs(user_ns, dentry, kill);
+	}
 	if (!error)
 		inode_has_no_xattr(inode);
 
diff --git a/fs/namei.c b/fs/namei.c
index b91bf923d22c..5601b6680d4c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2940,9 +2940,9 @@ static int handle_truncate(struct file *filp)
 	if (!error)
 		error = security_path_truncate(path);
 	if (!error) {
-		error = do_truncate(path->dentry, 0,
-				    ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
-				    filp);
+		error = do_truncate(mnt_user_ns(filp->f_path.mnt),
+				    path->dentry, 0,
+				    ATTR_MTIME | ATTR_CTIME | ATTR_OPEN, filp);
 	}
 	put_write_access(inode);
 	return error;
diff --git a/fs/open.c b/fs/open.c
index 2dc94689a7dc..137dcc52d2f8 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -35,8 +35,8 @@
 
 #include "internal.h"
 
-int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
-	struct file *filp)
+int do_truncate(struct user_namespace *user_ns, struct dentry *dentry,
+		loff_t length, unsigned int time_attrs, struct file *filp)
 {
 	int ret;
 	struct iattr newattrs;
@@ -61,13 +61,14 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 
 	inode_lock(dentry->d_inode);
 	/* Note any delegations or leases have already been broken: */
-	ret = notify_change(&init_user_ns, dentry, &newattrs, NULL);
+	ret = notify_change(user_ns, dentry, &newattrs, NULL);
 	inode_unlock(dentry->d_inode);
 	return ret;
 }
 
 long vfs_truncate(const struct path *path, loff_t length)
 {
+	struct user_namespace *user_ns;
 	struct inode *inode;
 	long error;
 
@@ -83,7 +84,8 @@ long vfs_truncate(const struct path *path, loff_t length)
 	if (error)
 		goto out;
 
-	error = inode_permission(&init_user_ns, inode, MAY_WRITE);
+	user_ns = mnt_user_ns(path->mnt);
+	error = inode_permission(user_ns, inode, MAY_WRITE);
 	if (error)
 		goto mnt_drop_write_and_out;
 
@@ -107,7 +109,7 @@ long vfs_truncate(const struct path *path, loff_t length)
 	if (!error)
 		error = security_path_truncate(path);
 	if (!error)
-		error = do_truncate(path->dentry, length, 0, NULL);
+		error = do_truncate(user_ns, path->dentry, length, 0, NULL);
 
 put_write_and_out:
 	put_write_access(inode);
@@ -186,13 +188,16 @@ long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 	/* Check IS_APPEND on real upper inode */
 	if (IS_APPEND(file_inode(f.file)))
 		goto out_putf;
-
 	sb_start_write(inode->i_sb);
 	error = locks_verify_truncate(inode, f.file, length);
 	if (!error)
 		error = security_path_truncate(&f.file->f_path);
-	if (!error)
-		error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, f.file);
+	if (!error) {
+		struct user_namespace *user_ns;
+
+		user_ns = mnt_user_ns(f.file->f_path.mnt);
+		error = do_truncate(user_ns, dentry, length, ATTR_MTIME | ATTR_CTIME, f.file);
+	}
 	sb_end_write(inode->i_sb);
 out_putf:
 	fdput(f);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f29909139838..1f2ec4c3c70b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2565,8 +2565,8 @@ struct filename {
 static_assert(offsetof(struct filename, iname) % sizeof(long) == 0);
 
 extern long vfs_truncate(const struct path *, loff_t);
-extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
-		       struct file *filp);
+extern int do_truncate(struct user_namespace *, struct dentry *, loff_t start,
+		       unsigned int time_attrs, struct file *filp);
 extern int vfs_fallocate(struct file *file, int mode, loff_t offset,
 			loff_t len);
 extern long do_sys_open(int dfd, const char __user *filename, int flags,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 20/39] open: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (18 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 19/39] open: handle idmapped mounts in do_truncate() Christian Brauner
@ 2020-11-15 10:36 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 21/39] af_unix: " Christian Brauner
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:36 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

For core file operations such as changing directories or chrooting,
determining file access, changing mode or ownership the vfs will verify
that the caller is privileged over the inode. Extend the various helpers to
handle idmapped mounts. If the inode is accessed through an idmapped mount
it is mapped according to the mount's user namespace.  Afterwards the
permissions checks are identical to non-idmapped mounts.  When changing
file ownership we need to map the mount from the mount's user namespace. If
the initial user namespace is passed all mapping operations are a nop so
non-idmapped mounts will not see a change in behavior and will also not see
any performance impact.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/open.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 137dcc52d2f8..2e2eb55976b1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -401,6 +401,7 @@ static const struct cred *access_override_creds(void)
 
 static long do_faccessat(int dfd, const char __user *filename, int mode, int flags)
 {
+	struct user_namespace *user_ns;
 	struct path path;
 	struct inode *inode;
 	int res;
@@ -441,7 +442,8 @@ static long do_faccessat(int dfd, const char __user *filename, int mode, int fla
 			goto out_path_release;
 	}
 
-	res = inode_permission(&init_user_ns, inode, mode | MAY_ACCESS);
+	user_ns = mnt_user_ns(path.mnt);
+	res = inode_permission(user_ns, inode, mode | MAY_ACCESS);
 	/* SuS v2 requires we report a read only fs too */
 	if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
 		goto out_path_release;
@@ -489,6 +491,7 @@ SYSCALL_DEFINE2(access, const char __user *, filename, int, mode)
 
 SYSCALL_DEFINE1(chdir, const char __user *, filename)
 {
+	struct user_namespace *user_ns;
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
@@ -497,7 +500,8 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 	if (error)
 		goto out;
 
-	error = inode_permission(&init_user_ns, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	user_ns = mnt_user_ns(path.mnt);
+	error = inode_permission(user_ns, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
@@ -515,6 +519,7 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 
 SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 {
+	struct user_namespace *user_ns;
 	struct fd f = fdget_raw(fd);
 	int error;
 
@@ -526,7 +531,8 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 	if (!d_can_lookup(f.file->f_path.dentry))
 		goto out_putf;
 
-	error = inode_permission(&init_user_ns, file_inode(f.file), MAY_EXEC | MAY_CHDIR);
+	user_ns = mnt_user_ns(f.file->f_path.mnt);
+	error = inode_permission(user_ns, file_inode(f.file), MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &f.file->f_path);
 out_putf:
@@ -537,6 +543,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 
 SYSCALL_DEFINE1(chroot, const char __user *, filename)
 {
+	struct user_namespace *user_ns;
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
@@ -545,7 +552,8 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
 	if (error)
 		goto out;
 
-	error = inode_permission(&init_user_ns, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+	user_ns = mnt_user_ns(path.mnt);
+	error = inode_permission(user_ns, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
 
@@ -570,6 +578,7 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
 
 int chmod_common(const struct path *path, umode_t mode)
 {
+	struct user_namespace *user_ns;
 	struct inode *inode = path->dentry->d_inode;
 	struct inode *delegated_inode = NULL;
 	struct iattr newattrs;
@@ -585,7 +594,8 @@ int chmod_common(const struct path *path, umode_t mode)
 		goto out_unlock;
 	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
-	error = notify_change(&init_user_ns, path->dentry, &newattrs, &delegated_inode);
+	user_ns = mnt_user_ns(path->mnt);
+	error = notify_change(user_ns, path->dentry, &newattrs, &delegated_inode);
 out_unlock:
 	inode_unlock(inode);
 	if (delegated_inode) {
@@ -646,6 +656,7 @@ SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
 
 int chown_common(const struct path *path, uid_t user, gid_t group)
 {
+	struct user_namespace *user_ns;
 	struct inode *inode = path->dentry->d_inode;
 	struct inode *delegated_inode = NULL;
 	int error;
@@ -656,6 +667,12 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
 	uid = make_kuid(current_user_ns(), user);
 	gid = make_kgid(current_user_ns(), group);
 
+	user_ns = mnt_user_ns(path->mnt);
+	if (mnt_idmapped(path->mnt)) {
+		uid = kuid_from_mnt(user_ns, uid);
+		gid = kgid_from_mnt(user_ns, gid);
+	}
+
 retry_deleg:
 	newattrs.ia_valid =  ATTR_CTIME;
 	if (user != (uid_t) -1) {
@@ -676,7 +693,7 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
 	inode_lock(inode);
 	error = security_path_chown(path, uid, gid);
 	if (!error)
-		error = notify_change(&init_user_ns, path->dentry, &newattrs, &delegated_inode);
+		error = notify_change(user_ns, path->dentry, &newattrs, &delegated_inode);
 	inode_unlock(inode);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
@@ -1133,7 +1150,7 @@ struct file *filp_open(const char *filename, int flags, umode_t mode)
 {
 	struct filename *name = getname_kernel(filename);
 	struct file *file = ERR_CAST(name);
-	
+
 	if (!IS_ERR(name)) {
 		file = file_open_name(name, flags, mode);
 		putname(name);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 21/39] af_unix: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (19 preceding siblings ...)
  2020-11-15 10:36 ` [PATCH v2 20/39] open: handle idmapped mounts Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 22/39] utimes: " Christian Brauner
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

When binding a non-abstract AF_UNIX socket it will gain a representation in the
filesystem. Enable the socket infrastructure to handle idmapped mounts by
passing down the user namespace of the mount the socket will be created
from. Non-idmapped mounts will not see any altered behavior.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 net/unix/af_unix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b4987805e5e5..4be33240e9cc 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -996,7 +996,7 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
 	 */
 	err = security_path_mknod(&path, dentry, mode, 0);
 	if (!err) {
-		err = vfs_mknod(&init_user_ns, d_inode(path.dentry), dentry, mode, 0);
+		err = vfs_mknod(mnt_user_ns(path.mnt), d_inode(path.dentry), dentry, mode, 0);
 		if (!err) {
 			res->mnt = mntget(path.mnt);
 			res->dentry = dget(dentry);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 22/39] utimes: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (20 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 21/39] af_unix: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 23/39] fcntl: " Christian Brauner
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Enable the vfs_utimes() helper to handle idmapped mounts by passing down
the mount's user namespace.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/utimes.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/utimes.c b/fs/utimes.c
index 1a4130bee157..ead17d623aaa 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -22,6 +22,7 @@ int vfs_utimes(const struct path *path, struct timespec64 *times)
 	struct iattr newattrs;
 	struct inode *inode = path->dentry->d_inode;
 	struct inode *delegated_inode = NULL;
+	struct user_namespace *user_ns;
 
 	if (times) {
 		if (!nsec_valid(times[0].tv_nsec) ||
@@ -61,8 +62,9 @@ int vfs_utimes(const struct path *path, struct timespec64 *times)
 		newattrs.ia_valid |= ATTR_TOUCH;
 	}
 retry_deleg:
+	user_ns = mnt_user_ns(path->mnt);
 	inode_lock(inode);
-	error = notify_change(&init_user_ns, path->dentry, &newattrs, &delegated_inode);
+	error = notify_change(user_ns, path->dentry, &newattrs, &delegated_inode);
 	inode_unlock(inode);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 23/39] fcntl: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (21 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 22/39] utimes: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 24/39] notify: " Christian Brauner
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Enable the setfl() helper to handle idmapped mounts by passing down the
mount's user namespace.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 fs/fcntl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index df091d435603..ed330fa91438 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -25,6 +25,7 @@
 #include <linux/user_namespace.h>
 #include <linux/memfd.h>
 #include <linux/compat.h>
+#include <linux/mount.h>
 
 #include <linux/poll.h>
 #include <asm/siginfo.h>
@@ -46,7 +47,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 
 	/* O_NOATIME can only be set by the owner or superuser */
 	if ((arg & O_NOATIME) && !(filp->f_flags & O_NOATIME))
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(mnt_user_ns(filp->f_path.mnt), inode))
 			return -EPERM;
 
 	/* required for strict SunOS emulation */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 24/39] notify: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (22 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 23/39] fcntl: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 25/39] init: " Christian Brauner
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Enable notify implementations to handle idmapped mounts by passing down the
mount's user namespace.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 fs/notify/fanotify/fanotify_user.c | 2 +-
 fs/notify/inotify/inotify_user.c   | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index de4d01bb1d8d..e3b2cb6a9d81 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -702,7 +702,7 @@ static int fanotify_find_path(int dfd, const char __user *filename,
 	}
 
 	/* you can only watch an inode if you have read permissions on it */
-	ret = inode_permission(&init_user_ns, path->dentry->d_inode, MAY_READ);
+	ret = inode_permission(mnt_user_ns(path->mnt), path->dentry->d_inode, MAY_READ);
 	if (ret) {
 		path_put(path);
 		goto out;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index e995fd4e4e53..f39f5b81f2b3 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -31,6 +31,7 @@
 #include <linux/wait.h>
 #include <linux/memcontrol.h>
 #include <linux/security.h>
+#include <linux/mount.h>
 
 #include "inotify.h"
 #include "../fdinfo.h"
@@ -343,7 +344,7 @@ static int inotify_find_inode(const char __user *dirname, struct path *path,
 	if (error)
 		return error;
 	/* you can only watch an inode if you have read permissions on it */
-	error = inode_permission(&init_user_ns, path->dentry->d_inode, MAY_READ);
+	error = inode_permission(mnt_user_ns(path->mnt), path->dentry->d_inode, MAY_READ);
 	if (error) {
 		path_put(path);
 		return error;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 25/39] init: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (23 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 24/39] notify: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 26/39] ioctl: " Christian Brauner
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Enable the init helpers to handle idmapped mounts by passing down the mount's
user namespace.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 fs/init.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/init.c b/fs/init.c
index 76f493600030..334e4c9c07eb 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -49,7 +49,7 @@ int __init init_chdir(const char *filename)
 	error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path);
 	if (error)
 		return error;
-	error = inode_permission(&init_user_ns, path.dentry->d_inode,
+	error = inode_permission(mnt_user_ns(path.mnt), path.dentry->d_inode,
 				 MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &path);
@@ -65,7 +65,7 @@ int __init init_chroot(const char *filename)
 	error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path);
 	if (error)
 		return error;
-	error = inode_permission(&init_user_ns, path.dentry->d_inode,
+	error = inode_permission(mnt_user_ns(path.mnt), path.dentry->d_inode,
 				 MAY_EXEC | MAY_CHDIR);
 	if (error)
 		goto dput_and_out;
@@ -120,7 +120,7 @@ int __init init_eaccess(const char *filename)
 	error = kern_path(filename, LOOKUP_FOLLOW, &path);
 	if (error)
 		return error;
-	error = inode_permission(&init_user_ns, d_inode(path.dentry),
+	error = inode_permission(mnt_user_ns(path.mnt), d_inode(path.dentry),
 				 MAY_ACCESS);
 	path_put(&path);
 	return error;
@@ -190,7 +190,7 @@ int __init init_link(const char *oldname, const char *newname)
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
 		goto out_dput;
-	error = vfs_link(old_path.dentry, &init_user_ns, 
+	error = vfs_link(old_path.dentry, &init_user_ns,
 			 new_path.dentry->d_inode, new_dentry, NULL);
 out_dput:
 	done_path_create(&new_path, new_dentry);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 26/39] ioctl: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (24 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 25/39] init: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 27/39] would_dump: " Christian Brauner
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Enable generic ioctls to handle idmapped mounts by passing down the mount's
user namespace.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 fs/remap_range.c   | 7 +++++--
 fs/verity/enable.c | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 9e5b27641756..fe7f07228462 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -432,13 +432,16 @@ EXPORT_SYMBOL(vfs_clone_file_range);
 /* Check whether we are allowed to dedupe the destination file */
 static bool allow_file_dedupe(struct file *file)
 {
+	struct user_namespace *user_ns = mnt_user_ns(file->f_path.mnt);
+	struct inode *inode = file_inode(file);
+
 	if (capable(CAP_SYS_ADMIN))
 		return true;
 	if (file->f_mode & FMODE_WRITE)
 		return true;
-	if (uid_eq(current_fsuid(), file_inode(file)->i_uid))
+	if (uid_eq(current_fsuid(), i_uid_into_mnt(user_ns, inode)))
 		return true;
-	if (!inode_permission(&init_user_ns, file_inode(file), MAY_WRITE))
+	if (!inode_permission(user_ns, inode, MAY_WRITE))
 		return true;
 	return false;
 }
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 7449ef0050f4..8b9ea0f0850f 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -369,7 +369,7 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
 	 * has verity enabled, and to stabilize the data being hashed.
 	 */
 
-	err = inode_permission(&init_user_ns, inode, MAY_WRITE);
+	err = inode_permission(mnt_user_ns(filp->f_path.mnt), inode, MAY_WRITE);
 	if (err)
 		return err;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 27/39] would_dump: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (25 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 26/39] ioctl: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 28/39] exec: " Christian Brauner
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

When determining whether or not to create a coredump the vfs will verify that
the caller is privileged over the inode. Make the would_dump() helper handle
idmapped mounts by passing down the mount's user namespace of the exec file.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/exec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b499a1a03934..10c06fdf78a7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1391,14 +1391,15 @@ EXPORT_SYMBOL(begin_new_exec);
 void would_dump(struct linux_binprm *bprm, struct file *file)
 {
 	struct inode *inode = file_inode(file);
-	if (inode_permission(&init_user_ns, inode, MAY_READ) < 0) {
+	struct user_namespace *ns = mnt_user_ns(file->f_path.mnt);
+	if (inode_permission(ns, inode, MAY_READ) < 0) {
 		struct user_namespace *old, *user_ns;
 		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
 
 		/* Ensure mm->user_ns contains the executable */
 		user_ns = old = bprm->mm->user_ns;
 		while ((user_ns != &init_user_ns) &&
-		       !privileged_wrt_inode_uidgid(user_ns, &init_user_ns, inode))
+		       !privileged_wrt_inode_uidgid(user_ns, ns, inode))
 			user_ns = user_ns->parent;
 
 		if (old != user_ns) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 28/39] exec: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (26 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 27/39] would_dump: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 29/39] fs: add helpers for idmap mounts Christian Brauner
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

When executing a setuid binary the kernel will verify in bprm_fill_uid() that
the inode has a mapping in the caller's user namespace before setting the
callers uid and gid. Let bprm_fill_uid() handle idmapped mounts. If the inode
is accessed through an idmapped mount it is mapped according to the mount's
user namespace. Afterwards the checks are identical to non-idmapped mounts.On
regular mounts this is a nop.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/exec.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 10c06fdf78a7..7d6d3dd17e84 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1567,6 +1567,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
 {
 	/* Handle suid and sgid on files */
+	struct user_namespace *user_ns;
 	struct inode *inode;
 	unsigned int mode;
 	kuid_t uid;
@@ -1583,13 +1584,15 @@ static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
 	if (!(mode & (S_ISUID|S_ISGID)))
 		return;
 
+	user_ns = mnt_user_ns(file->f_path.mnt);
+
 	/* Be careful if suid/sgid is set */
 	inode_lock(inode);
 
 	/* reload atomically mode/uid/gid now that lock held */
 	mode = inode->i_mode;
-	uid = inode->i_uid;
-	gid = inode->i_gid;
+	uid = i_uid_into_mnt(user_ns, inode);
+	gid = i_gid_into_mnt(user_ns, inode);
 	inode_unlock(inode);
 
 	/* We ignore suid/sgid if there are no mappings for them in the ns */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 29/39] fs: add helpers for idmap mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (27 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 28/39] exec: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 30/39] apparmor: handle idmapped mounts Christian Brauner
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user namespace
the mount has been marked with. This can be used for additional permission
checking and also to enable filesystems to translate between uids and gids
if they need to. We have implemented all relevant helpers in earlier
patches.

As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doens't leave us with additional inode methods.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig:
  - Don't pollute the vfs with additional helpers simply extend the existing
    inode methods with an additional argument and switch all callers.
---
 Documentation/filesystems/vfs.rst         | 17 ++++----
 arch/powerpc/platforms/cell/spufs/inode.c |  3 +-
 drivers/android/binderfs.c                |  3 +-
 fs/9p/acl.c                               |  2 +-
 fs/9p/v9fs.h                              |  3 +-
 fs/9p/v9fs_vfs.h                          |  2 +-
 fs/9p/vfs_inode.c                         | 22 ++++++----
 fs/9p/vfs_inode_dotl.c                    | 26 ++++++------
 fs/adfs/adfs.h                            |  3 +-
 fs/adfs/inode.c                           |  3 +-
 fs/affs/affs.h                            | 10 ++---
 fs/affs/inode.c                           |  3 +-
 fs/affs/namei.c                           | 15 ++++---
 fs/afs/dir.c                              | 34 ++++++++-------
 fs/afs/inode.c                            |  3 +-
 fs/afs/internal.h                         |  4 +-
 fs/afs/security.c                         |  2 +-
 fs/attr.c                                 |  4 +-
 fs/autofs/root.c                          | 13 +++---
 fs/bad_inode.c                            | 31 ++++++++------
 fs/bfs/dir.c                              | 10 ++---
 fs/btrfs/acl.c                            |  3 +-
 fs/btrfs/ctree.h                          |  3 +-
 fs/btrfs/inode.c                          | 29 +++++++------
 fs/ceph/acl.c                             |  3 +-
 fs/ceph/dir.c                             | 23 +++++-----
 fs/ceph/inode.c                           |  5 ++-
 fs/ceph/super.h                           |  8 ++--
 fs/cifs/cifsfs.c                          |  2 +-
 fs/cifs/cifsfs.h                          | 18 ++++----
 fs/cifs/dir.c                             |  8 ++--
 fs/cifs/inode.c                           | 12 +++---
 fs/cifs/link.c                            |  3 +-
 fs/coda/coda_linux.h                      |  4 +-
 fs/coda/dir.c                             | 18 ++++----
 fs/coda/inode.c                           |  3 +-
 fs/coda/pioctl.c                          |  6 ++-
 fs/configfs/configfs_internal.h           |  7 ++--
 fs/configfs/dir.c                         |  3 +-
 fs/configfs/inode.c                       |  5 ++-
 fs/configfs/symlink.c                     |  3 +-
 fs/debugfs/inode.c                        |  9 ++--
 fs/ecryptfs/inode.c                       | 25 ++++++-----
 fs/efivarfs/inode.c                       |  4 +-
 fs/exfat/exfat_fs.h                       |  3 +-
 fs/exfat/file.c                           |  3 +-
 fs/exfat/namei.c                          | 13 +++---
 fs/ext2/acl.c                             |  3 +-
 fs/ext2/acl.h                             |  3 +-
 fs/ext2/ext2.h                            |  2 +-
 fs/ext2/inode.c                           |  3 +-
 fs/ext2/namei.c                           | 22 ++++++----
 fs/ext4/acl.c                             |  3 +-
 fs/ext4/acl.h                             |  3 +-
 fs/ext4/ext4.h                            |  2 +-
 fs/ext4/inode.c                           |  3 +-
 fs/ext4/namei.c                           | 22 +++++-----
 fs/f2fs/acl.c                             |  3 +-
 fs/f2fs/acl.h                             |  3 +-
 fs/f2fs/f2fs.h                            |  3 +-
 fs/f2fs/file.c                            |  3 +-
 fs/f2fs/namei.c                           | 24 ++++++-----
 fs/fat/fat.h                              |  3 +-
 fs/fat/file.c                             |  5 ++-
 fs/fat/namei_msdos.c                      | 13 +++---
 fs/fat/namei_vfat.c                       | 13 +++---
 fs/fuse/acl.c                             |  3 +-
 fs/fuse/dir.c                             | 33 ++++++++-------
 fs/fuse/fuse_i.h                          |  4 +-
 fs/gfs2/acl.c                             |  3 +-
 fs/gfs2/acl.h                             |  3 +-
 fs/gfs2/file.c                            |  2 +-
 fs/gfs2/inode.c                           | 44 ++++++++++---------
 fs/gfs2/inode.h                           |  2 +-
 fs/hfs/dir.c                              | 13 +++---
 fs/hfs/hfs_fs.h                           |  2 +-
 fs/hfs/inode.c                            |  3 +-
 fs/hfsplus/dir.c                          | 25 +++++------
 fs/hfsplus/inode.c                        |  3 +-
 fs/hostfs/hostfs_kern.c                   | 25 ++++++-----
 fs/hpfs/hpfs_fn.h                         |  2 +-
 fs/hpfs/inode.c                           |  3 +-
 fs/hpfs/namei.c                           | 20 +++++----
 fs/hugetlbfs/inode.c                      | 25 ++++++-----
 fs/jffs2/acl.c                            |  3 +-
 fs/jffs2/acl.h                            |  3 +-
 fs/jffs2/dir.c                            | 32 +++++++-------
 fs/jffs2/fs.c                             |  3 +-
 fs/jffs2/os-linux.h                       |  2 +-
 fs/jfs/acl.c                              |  3 +-
 fs/jfs/file.c                             |  3 +-
 fs/jfs/jfs_acl.h                          |  3 +-
 fs/jfs/jfs_inode.h                        |  2 +-
 fs/jfs/namei.c                            | 21 +++++-----
 fs/kernfs/dir.c                           |  7 ++--
 fs/kernfs/inode.c                         |  5 ++-
 fs/kernfs/kernfs-internal.h               |  5 ++-
 fs/libfs.c                                | 23 +++++-----
 fs/minix/file.c                           |  3 +-
 fs/minix/namei.c                          | 25 ++++++-----
 fs/namei.c                                | 24 ++++++-----
 fs/nfs/dir.c                              | 21 ++++++----
 fs/nfs/inode.c                            |  3 +-
 fs/nfs/internal.h                         | 10 ++---
 fs/nfs/namespace.c                        |  5 ++-
 fs/nfs/nfs3_fs.h                          |  3 +-
 fs/nfs/nfs3acl.c                          |  3 +-
 fs/nilfs2/inode.c                         |  5 ++-
 fs/nilfs2/namei.c                         | 20 +++++----
 fs/nilfs2/nilfs.h                         |  4 +-
 fs/ocfs2/acl.c                            |  3 +-
 fs/ocfs2/acl.h                            |  3 +-
 fs/ocfs2/dlmfs/dlmfs.c                    |  9 ++--
 fs/ocfs2/file.c                           |  5 ++-
 fs/ocfs2/file.h                           |  5 ++-
 fs/ocfs2/namei.c                          | 19 +++++----
 fs/omfs/dir.c                             | 13 +++---
 fs/omfs/file.c                            |  3 +-
 fs/orangefs/acl.c                         |  3 +-
 fs/orangefs/inode.c                       |  5 ++-
 fs/orangefs/namei.c                       | 12 ++++--
 fs/orangefs/orangefs-kernel.h             |  7 ++--
 fs/overlayfs/dir.c                        | 21 +++++-----
 fs/overlayfs/inode.c                      |  5 ++-
 fs/overlayfs/overlayfs.h                  |  5 ++-
 fs/overlayfs/super.c                      |  2 +-
 fs/posix_acl.c                            |  9 ++--
 fs/proc/base.c                            |  8 ++--
 fs/proc/fd.c                              |  2 +-
 fs/proc/fd.h                              |  3 +-
 fs/proc/generic.c                         |  3 +-
 fs/proc/internal.h                        |  2 +-
 fs/proc/proc_sysctl.c                     |  6 ++-
 fs/ramfs/file-nommu.c                     |  5 ++-
 fs/ramfs/inode.c                          | 16 ++++---
 fs/reiserfs/acl.h                         |  3 +-
 fs/reiserfs/inode.c                       |  3 +-
 fs/reiserfs/namei.c                       | 19 +++++----
 fs/reiserfs/reiserfs.h                    |  3 +-
 fs/reiserfs/xattr.c                       | 10 ++---
 fs/reiserfs/xattr.h                       |  2 +-
 fs/reiserfs/xattr_acl.c                   |  3 +-
 fs/sysv/file.c                            |  3 +-
 fs/sysv/namei.c                           | 21 ++++++----
 fs/tracefs/inode.c                        |  4 +-
 fs/ubifs/dir.c                            | 25 +++++------
 fs/ubifs/file.c                           |  3 +-
 fs/ubifs/ubifs.h                          |  3 +-
 fs/udf/file.c                             |  3 +-
 fs/udf/namei.c                            | 24 ++++++-----
 fs/ufs/inode.c                            |  3 +-
 fs/ufs/namei.c                            | 19 +++++----
 fs/ufs/ufs.h                              |  3 +-
 fs/vboxsf/dir.c                           | 12 ++++--
 fs/vboxsf/utils.c                         |  3 +-
 fs/vboxsf/vfsmod.h                        |  3 +-
 fs/xfs/xfs_acl.c                          |  3 +-
 fs/xfs/xfs_acl.h                          |  3 +-
 fs/xfs/xfs_iops.c                         | 51 +++++++++++++----------
 fs/zonefs/super.c                         |  3 +-
 include/linux/fs.h                        | 26 ++++++------
 include/linux/nfs_fs.h                    |  4 +-
 include/linux/posix_acl.h                 |  2 +-
 ipc/mqueue.c                              |  4 +-
 kernel/bpf/inode.c                        |  7 ++--
 mm/shmem.c                                | 34 +++++++++------
 net/socket.c                              |  5 ++-
 security/apparmor/apparmorfs.c            |  3 +-
 security/integrity/evm/evm_secfs.c        |  2 +-
 169 files changed, 870 insertions(+), 651 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index ca52c82e5bb5..d8418a3dcbca 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -415,28 +415,29 @@ As of kernel 2.6.22, the following members are defined:
 .. code-block:: c
 
 	struct inode_operations {
-		int (*create) (struct inode *,struct dentry *, umode_t, bool);
+		int (*create) (struct user_namespace *user_ns, struct inode *,struct dentry *, umode_t, bool);
 		struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
 		int (*link) (struct dentry *,struct inode *,struct dentry *);
 		int (*unlink) (struct inode *,struct dentry *);
-		int (*symlink) (struct inode *,struct dentry *,const char *);
-		int (*mkdir) (struct inode *,struct dentry *,umode_t);
+		int (*symlink) (struct user_namespace *user_ns, struct inode *,struct dentry *,const char *);
+		int (*mkdir) (struct user_namespace *user_ns, struct inode *,struct dentry *,umode_t);
 		int (*rmdir) (struct inode *,struct dentry *);
-		int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
-		int (*rename) (struct inode *, struct dentry *,
+		int (*mknod) (struct user_namespace *user_ns, struct inode *,struct dentry *,umode_t,dev_t);
+		int (*rename) (struct user_namespace *user_ns, struct inode *, struct dentry *,
 			       struct inode *, struct dentry *, unsigned int);
 		int (*readlink) (struct dentry *, char __user *,int);
 		const char *(*get_link) (struct dentry *, struct inode *,
 					 struct delayed_call *);
-		int (*permission) (struct inode *, int);
+		int (*permission) (struct user_namespace *user_ns, struct inode *, int);
 		int (*get_acl)(struct inode *, int);
-		int (*setattr) (struct dentry *, struct iattr *);
+		int (*setattr) (struct user_namespace *user_ns, struct dentry *, struct iattr *);
 		int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
 		ssize_t (*listxattr) (struct dentry *, char *, size_t);
 		void (*update_time)(struct inode *, struct timespec *, int);
 		int (*atomic_open)(struct inode *, struct dentry *, struct file *,
 				   unsigned open_flag, umode_t create_mode);
-		int (*tmpfile) (struct inode *, struct dentry *, umode_t);
+		int (*tmpfile) (struct user_namespace *user_ns, struct inode *, struct dentry *, umode_t);
+	        int (*set_acl)(struct user_namespace *, struct inode *, struct posix_acl *, int);
 	};
 
 Again, all methods are called without any locks being held, unless
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index 3de526eb2275..e92d43fdeb33 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -91,7 +91,8 @@ spufs_new_inode(struct super_block *sb, umode_t mode)
 }
 
 static int
-spufs_setattr(struct dentry *dentry, struct iattr *attr)
+spufs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+	      struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 
diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c
index 7b4f154f07e6..f0941525c11c 100644
--- a/drivers/android/binderfs.c
+++ b/drivers/android/binderfs.c
@@ -355,7 +355,8 @@ static inline bool is_binderfs_control_device(const struct dentry *dentry)
 	return info->control_dentry == dentry;
 }
 
-static int binderfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+static int binderfs_rename(struct user_namespace *user_ns,
+			   struct inode *old_dir, struct dentry *old_dentry,
 			   struct inode *new_dir, struct dentry *new_dentry,
 			   unsigned int flags)
 {
diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index 650b14dd3ccd..cecdf8703ceb 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -298,7 +298,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler,
 			 * What is the following setxattr update the
 			 * mode ?
 			 */
-			v9fs_vfs_setattr_dotl(dentry, &iattr);
+			v9fs_vfs_setattr_dotl(user_ns, dentry, &iattr);
 		}
 		break;
 	case ACL_TYPE_DEFAULT:
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 7b763776306e..da34afbadbda 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -135,7 +135,8 @@ extern struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 			unsigned int flags);
 extern int v9fs_vfs_unlink(struct inode *i, struct dentry *d);
 extern int v9fs_vfs_rmdir(struct inode *i, struct dentry *d);
-extern int v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+extern int v9fs_vfs_rename(struct user_namespace *user_ns,
+			   struct inode *old_dir, struct dentry *old_dentry,
 			   struct inode *new_dir, struct dentry *new_dentry,
 			   unsigned int flags);
 extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
diff --git a/fs/9p/v9fs_vfs.h b/fs/9p/v9fs_vfs.h
index fd2a2b040250..f65e95f7fb20 100644
--- a/fs/9p/v9fs_vfs.h
+++ b/fs/9p/v9fs_vfs.h
@@ -59,7 +59,7 @@ void v9fs_inode2stat(struct inode *inode, struct p9_wstat *stat);
 int v9fs_uflags2omode(int uflags, int extended);
 
 void v9fs_blank_wstat(struct p9_wstat *wstat);
-int v9fs_vfs_setattr_dotl(struct dentry *, struct iattr *);
+int v9fs_vfs_setattr_dotl(struct user_namespace *, struct dentry *, struct iattr *);
 int v9fs_file_fsync_dotl(struct file *filp, loff_t start, loff_t end,
 			 int datasync);
 int v9fs_refresh_inode(struct p9_fid *fid, struct inode *inode);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 0a5c022c1c70..64a28e9dd3b8 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -667,8 +667,8 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
  */
 
 static int
-v9fs_vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		bool excl)
+v9fs_vfs_create(struct user_namespace *user_ns, struct inode *dir,
+		struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(dir);
 	u32 perm = unixmode2p9mode(v9ses, mode);
@@ -693,7 +693,8 @@ v9fs_vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
  *
  */
 
-static int v9fs_vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int v9fs_vfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			  struct dentry *dentry, umode_t mode)
 {
 	int err;
 	u32 perm;
@@ -894,9 +895,9 @@ int v9fs_vfs_rmdir(struct inode *i, struct dentry *d)
  */
 
 int
-v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		struct inode *new_dir, struct dentry *new_dentry,
-		unsigned int flags)
+v9fs_vfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		struct dentry *old_dentry, struct inode *new_dir,
+		struct dentry *new_dentry, unsigned int flags)
 {
 	int retval;
 	struct inode *old_inode;
@@ -1032,7 +1033,8 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
  *
  */
 
-static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
+static int v9fs_vfs_setattr(struct user_namespace *user_ns,
+			    struct dentry *dentry, struct iattr *iattr)
 {
 	int retval;
 	struct v9fs_session_info *v9ses;
@@ -1266,7 +1268,8 @@ static int v9fs_vfs_mkspecial(struct inode *dir, struct dentry *dentry,
  */
 
 static int
-v9fs_vfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
+v9fs_vfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+		 struct dentry *dentry, const char *symname)
 {
 	p9_debug(P9_DEBUG_VFS, " %lu,%pd,%s\n",
 		 dir->i_ino, dentry, symname);
@@ -1319,7 +1322,8 @@ v9fs_vfs_link(struct dentry *old_dentry, struct inode *dir,
  */
 
 static int
-v9fs_vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rdev)
+v9fs_vfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+	       struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(dir);
 	int retval;
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 8f3c1daf72ba..24ea7f21aa14 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -33,8 +33,8 @@
 #include "acl.h"
 
 static int
-v9fs_vfs_mknod_dotl(struct inode *dir, struct dentry *dentry, umode_t omode,
-		    dev_t rdev);
+v9fs_vfs_mknod_dotl(struct user_namespace *user_ns, struct inode *dir,
+		    struct dentry *dentry, umode_t omode, dev_t rdev);
 
 /**
  * v9fs_get_fsgid_for_create - Helper function to get the gid for creating a
@@ -218,10 +218,10 @@ int v9fs_open_to_dotl_flags(int flags)
  */
 
 static int
-v9fs_vfs_create_dotl(struct inode *dir, struct dentry *dentry, umode_t omode,
-		bool excl)
+v9fs_vfs_create_dotl(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t omode, bool excl)
 {
-	return v9fs_vfs_mknod_dotl(dir, dentry, omode, 0);
+	return v9fs_vfs_mknod_dotl(user_ns, dir, dentry, omode, 0);
 }
 
 static int
@@ -365,8 +365,9 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
  *
  */
 
-static int v9fs_vfs_mkdir_dotl(struct inode *dir,
-			       struct dentry *dentry, umode_t omode)
+static int v9fs_vfs_mkdir_dotl(struct user_namespace *user_ns,
+			       struct inode *dir, struct dentry *dentry,
+			       umode_t omode)
 {
 	int err;
 	struct v9fs_session_info *v9ses;
@@ -537,7 +538,8 @@ static int v9fs_mapped_iattr_valid(int iattr_valid)
  *
  */
 
-int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
+int v9fs_vfs_setattr_dotl(struct user_namespace *user_ns, struct dentry *dentry,
+			  struct iattr *iattr)
 {
 	int retval;
 	struct p9_fid *fid = NULL;
@@ -670,8 +672,8 @@ v9fs_stat2inode_dotl(struct p9_stat_dotl *stat, struct inode *inode,
 }
 
 static int
-v9fs_vfs_symlink_dotl(struct inode *dir, struct dentry *dentry,
-		const char *symname)
+v9fs_vfs_symlink_dotl(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, const char *symname)
 {
 	int err;
 	kgid_t gid;
@@ -804,8 +806,8 @@ v9fs_vfs_link_dotl(struct dentry *old_dentry, struct inode *dir,
  *
  */
 static int
-v9fs_vfs_mknod_dotl(struct inode *dir, struct dentry *dentry, umode_t omode,
-		dev_t rdev)
+v9fs_vfs_mknod_dotl(struct user_namespace *user_ns, struct inode *dir,
+		    struct dentry *dentry, umode_t omode, dev_t rdev)
 {
 	int err;
 	kgid_t gid;
diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h
index 699c4fa8b78b..e49adb7f4b9d 100644
--- a/fs/adfs/adfs.h
+++ b/fs/adfs/adfs.h
@@ -144,7 +144,8 @@ struct adfs_discmap {
 /* Inode stuff */
 struct inode *adfs_iget(struct super_block *sb, struct object_info *obj);
 int adfs_write_inode(struct inode *inode, struct writeback_control *wbc);
-int adfs_notify_change(struct dentry *dentry, struct iattr *attr);
+int adfs_notify_change(struct user_namespace *user_ns, struct dentry *dentry,
+		       struct iattr *attr);
 
 /* map.c */
 int adfs_map_lookup(struct super_block *sb, u32 frag_id, unsigned int offset);
diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c
index 278dcee6ae22..954985009ab8 100644
--- a/fs/adfs/inode.c
+++ b/fs/adfs/inode.c
@@ -292,7 +292,8 @@ adfs_iget(struct super_block *sb, struct object_info *obj)
  * later.
  */
 int
-adfs_notify_change(struct dentry *dentry, struct iattr *attr)
+adfs_notify_change(struct user_namespace *user_ns, struct dentry *dentry,
+		   struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct super_block *sb = inode->i_sb;
diff --git a/fs/affs/affs.h b/fs/affs/affs.h
index a755bef7c4c7..92d12e0b454e 100644
--- a/fs/affs/affs.h
+++ b/fs/affs/affs.h
@@ -167,21 +167,21 @@ extern const struct export_operations affs_export_ops;
 extern int	affs_hash_name(struct super_block *sb, const u8 *name, unsigned int len);
 extern struct dentry *affs_lookup(struct inode *dir, struct dentry *dentry, unsigned int);
 extern int	affs_unlink(struct inode *dir, struct dentry *dentry);
-extern int	affs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool);
-extern int	affs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
+extern int	affs_create(struct user_namespace *user_ns, struct inode *dir, struct dentry *dentry, umode_t mode, bool);
+extern int	affs_mkdir(struct user_namespace *user_ns, struct inode *dir, struct dentry *dentry, umode_t mode);
 extern int	affs_rmdir(struct inode *dir, struct dentry *dentry);
 extern int	affs_link(struct dentry *olddentry, struct inode *dir,
 			  struct dentry *dentry);
-extern int	affs_symlink(struct inode *dir, struct dentry *dentry,
+extern int	affs_symlink(struct user_namespace *user_ns, struct inode *dir, struct dentry *dentry,
 			     const char *symname);
-extern int	affs_rename2(struct inode *old_dir, struct dentry *old_dentry,
+extern int	affs_rename2(struct user_namespace *user_ns, struct inode *old_dir, struct dentry *old_dentry,
 			    struct inode *new_dir, struct dentry *new_dentry,
 			    unsigned int flags);
 
 /* inode.c */
 
 extern struct inode		*affs_new_inode(struct inode *dir);
-extern int			 affs_notify_change(struct dentry *dentry, struct iattr *attr);
+extern int			 affs_notify_change(struct user_namespace *user_ns, struct dentry *dentry, struct iattr *attr);
 extern void			 affs_evict_inode(struct inode *inode);
 extern struct inode		*affs_iget(struct super_block *sb,
 					unsigned long ino);
diff --git a/fs/affs/inode.c b/fs/affs/inode.c
index 767e5bdfb703..5d29136b7fb4 100644
--- a/fs/affs/inode.c
+++ b/fs/affs/inode.c
@@ -216,7 +216,8 @@ affs_write_inode(struct inode *inode, struct writeback_control *wbc)
 }
 
 int
-affs_notify_change(struct dentry *dentry, struct iattr *attr)
+affs_notify_change(struct user_namespace *user_ns, struct dentry *dentry,
+		   struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
diff --git a/fs/affs/namei.c b/fs/affs/namei.c
index 41c5749f4db7..f715c32bdc0e 100644
--- a/fs/affs/namei.c
+++ b/fs/affs/namei.c
@@ -242,7 +242,8 @@ affs_unlink(struct inode *dir, struct dentry *dentry)
 }
 
 int
-affs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl)
+affs_create(struct user_namespace *user_ns, struct inode *dir,
+	    struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct super_block *sb = dir->i_sb;
 	struct inode	*inode;
@@ -273,7 +274,8 @@ affs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl)
 }
 
 int
-affs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+affs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+	   struct dentry *dentry, umode_t mode)
 {
 	struct inode		*inode;
 	int			 error;
@@ -311,7 +313,8 @@ affs_rmdir(struct inode *dir, struct dentry *dentry)
 }
 
 int
-affs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
+affs_symlink(struct user_namespace *user_ns, struct inode *dir,
+	     struct dentry *dentry, const char *symname)
 {
 	struct super_block	*sb = dir->i_sb;
 	struct buffer_head	*bh;
@@ -498,9 +501,9 @@ affs_xrename(struct inode *old_dir, struct dentry *old_dentry,
 	return retval;
 }
 
-int affs_rename2(struct inode *old_dir, struct dentry *old_dentry,
-			struct inode *new_dir, struct dentry *new_dentry,
-			unsigned int flags)
+int affs_rename2(struct user_namespace *user_ns, struct inode *old_dir,
+		 struct dentry *old_dentry, struct inode *new_dir,
+		 struct dentry *new_dentry, unsigned int flags)
 {
 
 	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 1bb5b9d7f0a2..20da50690960 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -28,18 +28,19 @@ static int afs_lookup_one_filldir(struct dir_context *ctx, const char *name, int
 				  loff_t fpos, u64 ino, unsigned dtype);
 static int afs_lookup_filldir(struct dir_context *ctx, const char *name, int nlen,
 			      loff_t fpos, u64 ino, unsigned dtype);
-static int afs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		      bool excl);
-static int afs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
+static int afs_create(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, bool excl);
+static int afs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode);
 static int afs_rmdir(struct inode *dir, struct dentry *dentry);
 static int afs_unlink(struct inode *dir, struct dentry *dentry);
 static int afs_link(struct dentry *from, struct inode *dir,
 		    struct dentry *dentry);
-static int afs_symlink(struct inode *dir, struct dentry *dentry,
-		       const char *content);
-static int afs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		      struct inode *new_dir, struct dentry *new_dentry,
-		      unsigned int flags);
+static int afs_symlink(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, const char *content);
+static int afs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		      struct dentry *old_dentry, struct inode *new_dir,
+		      struct dentry *new_dentry, unsigned int flags);
 static int afs_dir_releasepage(struct page *page, gfp_t gfp_flags);
 static void afs_dir_invalidatepage(struct page *page, unsigned int offset,
 				   unsigned int length);
@@ -1321,7 +1322,8 @@ static const struct afs_operation_ops afs_mkdir_operation = {
 /*
  * create a directory on an AFS filesystem
  */
-static int afs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int afs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode)
 {
 	struct afs_operation *op;
 	struct afs_vnode *dvnode = AFS_FS_I(dir);
@@ -1615,8 +1617,8 @@ static const struct afs_operation_ops afs_create_operation = {
 /*
  * create a regular file on an AFS filesystem
  */
-static int afs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		      bool excl)
+static int afs_create(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct afs_operation *op;
 	struct afs_vnode *dvnode = AFS_FS_I(dir);
@@ -1737,8 +1739,8 @@ static const struct afs_operation_ops afs_symlink_operation = {
 /*
  * create a symlink in an AFS filesystem
  */
-static int afs_symlink(struct inode *dir, struct dentry *dentry,
-		       const char *content)
+static int afs_symlink(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, const char *content)
 {
 	struct afs_operation *op;
 	struct afs_vnode *dvnode = AFS_FS_I(dir);
@@ -1872,9 +1874,9 @@ static const struct afs_operation_ops afs_rename_operation = {
 /*
  * rename a file in an AFS filesystem and/or move it between directories
  */
-static int afs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		      struct inode *new_dir, struct dentry *new_dentry,
-		      unsigned int flags)
+static int afs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		      struct dentry *old_dentry, struct inode *new_dir,
+		      struct dentry *new_dentry, unsigned int flags)
 {
 	struct afs_operation *op;
 	struct afs_vnode *orig_dvnode, *new_dvnode, *vnode;
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 17ecdee404eb..13de02298309 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -849,7 +849,8 @@ static const struct afs_operation_ops afs_setattr_operation = {
 /*
  * set the attributes of an inode
  */
-int afs_setattr(struct dentry *dentry, struct iattr *attr)
+int afs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		struct iattr *attr)
 {
 	struct afs_operation *op;
 	struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry));
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 14d5d75f4b6e..3ff787d74fb5 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1149,7 +1149,7 @@ extern struct inode *afs_root_iget(struct super_block *, struct key *);
 extern bool afs_check_validity(struct afs_vnode *);
 extern int afs_validate(struct afs_vnode *, struct key *);
 extern int afs_getattr(const struct path *, struct kstat *, u32, unsigned int);
-extern int afs_setattr(struct dentry *, struct iattr *);
+extern int afs_setattr(struct user_namespace *user_ns, struct dentry *, struct iattr *);
 extern void afs_evict_inode(struct inode *);
 extern int afs_drop_inode(struct inode *);
 
@@ -1360,7 +1360,7 @@ extern void afs_zap_permits(struct rcu_head *);
 extern struct key *afs_request_key(struct afs_cell *);
 extern struct key *afs_request_key_rcu(struct afs_cell *);
 extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
-extern int afs_permission(struct inode *, int);
+extern int afs_permission(struct user_namespace *, struct inode *, int);
 extern void __exit afs_clean_up_permit_cache(void);
 
 /*
diff --git a/fs/afs/security.c b/fs/afs/security.c
index 9cf3102f370c..bd029ac7a2fd 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -396,7 +396,7 @@ int afs_check_permit(struct afs_vnode *vnode, struct key *key,
  * - AFS ACLs are attached to directories only, and a file is controlled by its
  *   parent directory's ACL
  */
-int afs_permission(struct inode *inode, int mask)
+int afs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct afs_vnode *vnode = AFS_FS_I(inode);
 	afs_access_t access;
diff --git a/fs/attr.c b/fs/attr.c
index e990cda1ea6f..36383cd3a986 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -351,9 +351,9 @@ int notify_change(struct user_namespace *user_ns, struct dentry *dentry,
 		return error;
 
 	if (inode->i_op->setattr)
-		error = inode->i_op->setattr(dentry, attr);
+		error = inode->i_op->setattr(user_ns, dentry, attr);
 	else
-		error = simple_setattr(dentry, attr);
+		error = simple_setattr(user_ns, dentry, attr);
 
 	if (!error) {
 		fsnotify_change(dentry, ia_valid);
diff --git a/fs/autofs/root.c b/fs/autofs/root.c
index 5aaa1732bf1e..73f5b54aa539 100644
--- a/fs/autofs/root.c
+++ b/fs/autofs/root.c
@@ -10,10 +10,12 @@
 
 #include "autofs_i.h"
 
-static int autofs_dir_symlink(struct inode *, struct dentry *, const char *);
+static int autofs_dir_symlink(struct user_namespace *, struct inode *,
+			      struct dentry *, const char *);
 static int autofs_dir_unlink(struct inode *, struct dentry *);
 static int autofs_dir_rmdir(struct inode *, struct dentry *);
-static int autofs_dir_mkdir(struct inode *, struct dentry *, umode_t);
+static int autofs_dir_mkdir(struct user_namespace *, struct inode *,
+			    struct dentry *, umode_t);
 static long autofs_root_ioctl(struct file *, unsigned int, unsigned long);
 #ifdef CONFIG_COMPAT
 static long autofs_root_compat_ioctl(struct file *,
@@ -524,9 +526,8 @@ static struct dentry *autofs_lookup(struct inode *dir,
 	return NULL;
 }
 
-static int autofs_dir_symlink(struct inode *dir,
-			       struct dentry *dentry,
-			       const char *symname)
+static int autofs_dir_symlink(struct user_namespace *user_ns, struct inode *dir,
+			      struct dentry *dentry, const char *symname)
 {
 	struct autofs_sb_info *sbi = autofs_sbi(dir->i_sb);
 	struct autofs_info *ino = autofs_dentry_ino(dentry);
@@ -715,7 +716,7 @@ static int autofs_dir_rmdir(struct inode *dir, struct dentry *dentry)
 	return 0;
 }
 
-static int autofs_dir_mkdir(struct inode *dir,
+static int autofs_dir_mkdir(struct user_namespace *user_ns, struct inode *dir,
 			    struct dentry *dentry, umode_t mode)
 {
 	struct autofs_sb_info *sbi = autofs_sbi(dir->i_sb);
diff --git a/fs/bad_inode.c b/fs/bad_inode.c
index 54f0ce444272..a95d67c9146b 100644
--- a/fs/bad_inode.c
+++ b/fs/bad_inode.c
@@ -27,8 +27,8 @@ static const struct file_operations bad_file_ops =
 	.open		= bad_file_open,
 };
 
-static int bad_inode_create (struct inode *dir, struct dentry *dentry,
-		umode_t mode, bool excl)
+static int bad_inode_create(struct user_namespace *user_ns, struct inode *dir,
+			    struct dentry *dentry, umode_t mode, bool excl)
 {
 	return -EIO;
 }
@@ -50,14 +50,14 @@ static int bad_inode_unlink(struct inode *dir, struct dentry *dentry)
 	return -EIO;
 }
 
-static int bad_inode_symlink (struct inode *dir, struct dentry *dentry,
-		const char *symname)
+static int bad_inode_symlink(struct user_namespace *user_ns, struct inode *dir,
+			     struct dentry *dentry, const char *symname)
 {
 	return -EIO;
 }
 
-static int bad_inode_mkdir(struct inode *dir, struct dentry *dentry,
-			umode_t mode)
+static int bad_inode_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			   struct dentry *dentry, umode_t mode)
 {
 	return -EIO;
 }
@@ -67,13 +67,14 @@ static int bad_inode_rmdir (struct inode *dir, struct dentry *dentry)
 	return -EIO;
 }
 
-static int bad_inode_mknod (struct inode *dir, struct dentry *dentry,
-			umode_t mode, dev_t rdev)
+static int bad_inode_mknod(struct user_namespace *user_ns, struct inode *dir,
+			   struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	return -EIO;
 }
 
-static int bad_inode_rename2(struct inode *old_dir, struct dentry *old_dentry,
+static int bad_inode_rename2(struct user_namespace *user_ns,
+			     struct inode *old_dir, struct dentry *old_dentry,
 			     struct inode *new_dir, struct dentry *new_dentry,
 			     unsigned int flags)
 {
@@ -86,7 +87,8 @@ static int bad_inode_readlink(struct dentry *dentry, char __user *buffer,
 	return -EIO;
 }
 
-static int bad_inode_permission(struct inode *inode, int mask)
+static int bad_inode_permission(struct user_namespace *user_ns,
+				struct inode *inode, int mask)
 {
 	return -EIO;
 }
@@ -97,7 +99,8 @@ static int bad_inode_getattr(const struct path *path, struct kstat *stat,
 	return -EIO;
 }
 
-static int bad_inode_setattr(struct dentry *direntry, struct iattr *attrs)
+static int bad_inode_setattr(struct user_namespace *user_ns,
+			     struct dentry *direntry, struct iattr *attrs)
 {
 	return -EIO;
 }
@@ -140,13 +143,15 @@ static int bad_inode_atomic_open(struct inode *inode, struct dentry *dentry,
 	return -EIO;
 }
 
-static int bad_inode_tmpfile(struct inode *inode, struct dentry *dentry,
+static int bad_inode_tmpfile(struct user_namespace *user_ns,
+			     struct inode *inode, struct dentry *dentry,
 			     umode_t mode)
 {
 	return -EIO;
 }
 
-static int bad_inode_set_acl(struct inode *inode, struct posix_acl *acl,
+static int bad_inode_set_acl(struct user_namespace *user_ns,
+			     struct inode *inode, struct posix_acl *acl,
 			     int type)
 {
 	return -EIO;
diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
index c5ae76a87be5..04c48c6bc6a7 100644
--- a/fs/bfs/dir.c
+++ b/fs/bfs/dir.c
@@ -75,8 +75,8 @@ const struct file_operations bfs_dir_operations = {
 	.llseek		= generic_file_llseek,
 };
 
-static int bfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-						bool excl)
+static int bfs_create(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, bool excl)
 {
 	int err;
 	struct inode *inode;
@@ -199,9 +199,9 @@ static int bfs_unlink(struct inode *dir, struct dentry *dentry)
 	return error;
 }
 
-static int bfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		      struct inode *new_dir, struct dentry *new_dentry,
-		      unsigned int flags)
+static int bfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		      struct dentry *old_dentry, struct inode *new_dir,
+		      struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *old_inode, *new_inode;
 	struct buffer_head *old_bh, *new_bh;
diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index b5a683e895c6..f9991ee67105 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -107,7 +107,8 @@ static int __btrfs_set_acl(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int btrfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		  struct posix_acl *acl, int type)
 {
 	int ret;
 	umode_t old_mode = inode->i_mode;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0378933d163c..f6916d3b3fea 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3490,7 +3490,8 @@ static inline int __btrfs_fs_compat_ro(struct btrfs_fs_info *fs_info, u64 flag)
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
 struct posix_acl *btrfs_get_acl(struct inode *inode, int type);
-int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+int btrfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		  struct posix_acl *acl, int type);
 int btrfs_init_acl(struct btrfs_trans_handle *trans,
 		   struct inode *inode, struct inode *dir);
 #else
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 99b4fd66681d..0ad6f4afc752 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4866,7 +4866,8 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 	return ret;
 }
 
-static int btrfs_setattr(struct dentry *dentry, struct iattr *attr)
+static int btrfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			 struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -6176,8 +6177,8 @@ static int btrfs_add_nondir(struct btrfs_trans_handle *trans,
 	return err;
 }
 
-static int btrfs_mknod(struct inode *dir, struct dentry *dentry,
-			umode_t mode, dev_t rdev)
+static int btrfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(dir->i_sb);
 	struct btrfs_trans_handle *trans;
@@ -6240,8 +6241,8 @@ static int btrfs_mknod(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int btrfs_create(struct inode *dir, struct dentry *dentry,
-			umode_t mode, bool excl)
+static int btrfs_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(dir->i_sb);
 	struct btrfs_trans_handle *trans;
@@ -6385,7 +6386,8 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	return err;
 }
 
-static int btrfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int btrfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(dir->i_sb);
 	struct inode *inode = NULL;
@@ -9273,9 +9275,9 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	return ret;
 }
 
-static int btrfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
-			 struct inode *new_dir, struct dentry *new_dentry,
-			 unsigned int flags)
+static int btrfs_rename2(struct user_namespace *user_ns, struct inode *old_dir,
+			 struct dentry *old_dentry, struct inode *new_dir,
+			 struct dentry *new_dentry, unsigned int flags)
 {
 	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
 		return -EINVAL;
@@ -9450,8 +9452,8 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr)
 	return ret;
 }
 
-static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
-			 const char *symname)
+static int btrfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, const char *symname)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(dir->i_sb);
 	struct btrfs_trans_handle *trans;
@@ -9782,7 +9784,7 @@ static int btrfs_set_page_dirty(struct page *page)
 	return __set_page_dirty_nobuffers(page);
 }
 
-static int btrfs_permission(struct inode *inode, int mask)
+static int btrfs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	umode_t mode = inode->i_mode;
@@ -9797,7 +9799,8 @@ static int btrfs_permission(struct inode *inode, int mask)
 	return generic_permission(&init_user_ns, inode, mask);
 }
 
-static int btrfs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int btrfs_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(dir->i_sb);
 	struct btrfs_trans_handle *trans;
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index bceab4c01585..64c8c421626c 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -82,7 +82,8 @@ struct posix_acl *ceph_get_acl(struct inode *inode, int type)
 	return acl;
 }
 
-int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int ceph_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type)
 {
 	int ret = 0, size = 0;
 	const char *name = NULL;
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index a4d48370b2b3..3431ab586712 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -823,8 +823,8 @@ int ceph_handle_notrace_create(struct inode *dir, struct dentry *dentry)
 	return PTR_ERR(result);
 }
 
-static int ceph_mknod(struct inode *dir, struct dentry *dentry,
-		      umode_t mode, dev_t rdev)
+static int ceph_mknod(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
 	struct ceph_mds_request *req;
@@ -878,14 +878,14 @@ static int ceph_mknod(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int ceph_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		       bool excl)
+static int ceph_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
-	return ceph_mknod(dir, dentry, mode, 0);
+	return ceph_mknod(user_ns, dir, dentry, mode, 0);
 }
 
-static int ceph_symlink(struct inode *dir, struct dentry *dentry,
-			    const char *dest)
+static int ceph_symlink(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, const char *dest)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
 	struct ceph_mds_request *req;
@@ -937,7 +937,8 @@ static int ceph_symlink(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int ceph_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ceph_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb);
 	struct ceph_mds_request *req;
@@ -1183,9 +1184,9 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int ceph_rename(struct inode *old_dir, struct dentry *old_dentry,
-		       struct inode *new_dir, struct dentry *new_dentry,
-		       unsigned int flags)
+static int ceph_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		       struct dentry *old_dentry, struct inode *new_dir,
+		       struct dentry *new_dentry, unsigned int flags)
 {
 	struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(old_dir->i_sb);
 	struct ceph_mds_request *req;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index decfa5e1ec4c..52d8651b46b4 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2242,7 +2242,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
 /*
  * setattr
  */
-int ceph_setattr(struct dentry *dentry, struct iattr *attr)
+int ceph_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		 struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
@@ -2325,7 +2326,7 @@ int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
  * Check inode permissions.  We verify we have a valid value for
  * the AUTH cap, then call the generic handler.
  */
-int ceph_permission(struct inode *inode, int mask)
+int ceph_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	int err;
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 482473e4cce1..d9275abe8711 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -973,9 +973,10 @@ static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
 {
 	return __ceph_do_getattr(inode, NULL, mask, force);
 }
-extern int ceph_permission(struct inode *inode, int mask);
+extern int ceph_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 extern int __ceph_setattr(struct inode *inode, struct iattr *attr);
-extern int ceph_setattr(struct dentry *dentry, struct iattr *attr);
+extern int ceph_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			struct iattr *attr);
 extern int ceph_getattr(const struct path *path, struct kstat *stat,
 			u32 request_mask, unsigned int flags);
 
@@ -1037,7 +1038,8 @@ void ceph_release_acl_sec_ctx(struct ceph_acl_sec_ctx *as_ctx);
 #ifdef CONFIG_CEPH_FS_POSIX_ACL
 
 struct posix_acl *ceph_get_acl(struct inode *, int);
-int ceph_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+int ceph_set_acl(struct user_namespace *user_ns,
+		 struct inode *inode, struct posix_acl *acl, int type);
 int ceph_pre_init_acls(struct inode *dir, umode_t *mode,
 		       struct ceph_acl_sec_ctx *as_ctx);
 void ceph_init_inode_acls(struct inode *inode,
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index f79dbc581eee..79ba2ef5b19a 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -301,7 +301,7 @@ static long cifs_fallocate(struct file *file, int mode, loff_t off, loff_t len)
 	return -EOPNOTSUPP;
 }
 
-static int cifs_permission(struct inode *inode, int mask)
+static int cifs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct cifs_sb_info *cifs_sb;
 
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 905d03863721..8d3ecc5ffaf8 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -62,19 +62,19 @@ extern void cifs_sb_deactive(struct super_block *sb);
 /* Functions related to inodes */
 extern const struct inode_operations cifs_dir_inode_ops;
 extern struct inode *cifs_root_iget(struct super_block *);
-extern int cifs_create(struct inode *, struct dentry *, umode_t,
-		       bool excl);
+extern int cifs_create(struct user_namespace *user_ns, struct inode *,
+		       struct dentry *, umode_t, bool excl);
 extern int cifs_atomic_open(struct inode *, struct dentry *,
 			    struct file *, unsigned, umode_t);
 extern struct dentry *cifs_lookup(struct inode *, struct dentry *,
 				  unsigned int);
 extern int cifs_unlink(struct inode *dir, struct dentry *dentry);
 extern int cifs_hardlink(struct dentry *, struct inode *, struct dentry *);
-extern int cifs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
-extern int cifs_mkdir(struct inode *, struct dentry *, umode_t);
+extern int cifs_mknod(struct user_namespace *, struct inode *, struct dentry *, umode_t, dev_t);
+extern int cifs_mkdir(struct user_namespace *, struct inode *, struct dentry *, umode_t);
 extern int cifs_rmdir(struct inode *, struct dentry *);
-extern int cifs_rename2(struct inode *, struct dentry *, struct inode *,
-			struct dentry *, unsigned int);
+extern int cifs_rename2(struct user_namespace *, struct inode *,
+			struct dentry *, struct inode *, struct dentry *, unsigned int);
 extern int cifs_revalidate_file_attr(struct file *filp);
 extern int cifs_revalidate_dentry_attr(struct dentry *);
 extern int cifs_revalidate_file(struct file *filp);
@@ -83,7 +83,7 @@ extern int cifs_invalidate_mapping(struct inode *inode);
 extern int cifs_revalidate_mapping(struct inode *inode);
 extern int cifs_zap_mapping(struct inode *inode);
 extern int cifs_getattr(const struct path *, struct kstat *, u32, unsigned int);
-extern int cifs_setattr(struct dentry *, struct iattr *);
+extern int cifs_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 extern int cifs_fiemap(struct inode *, struct fiemap_extent_info *, u64 start,
 		       u64 len);
 
@@ -132,8 +132,8 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
 /* Functions related to symlinks */
 extern const char *cifs_get_link(struct dentry *, struct inode *,
 			struct delayed_call *);
-extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
-			const char *symname);
+extern int cifs_symlink(struct user_namespace *user_ns, struct inode *inode,
+			struct dentry *direntry, const char *symname);
 
 #ifdef CONFIG_CIFS_XATTR
 extern const struct xattr_handler *cifs_xattr_handlers[];
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 398c1eef7190..2ebc53e071a1 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -566,8 +566,8 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry,
 	return rc;
 }
 
-int cifs_create(struct inode *inode, struct dentry *direntry, umode_t mode,
-		bool excl)
+int cifs_create(struct user_namespace *user_ns, struct inode *inode,
+		struct dentry *direntry, umode_t mode, bool excl)
 {
 	int rc;
 	unsigned int xid = get_xid();
@@ -610,8 +610,8 @@ int cifs_create(struct inode *inode, struct dentry *direntry, umode_t mode,
 	return rc;
 }
 
-int cifs_mknod(struct inode *inode, struct dentry *direntry, umode_t mode,
-		dev_t device_number)
+int cifs_mknod(struct user_namespace *user_ns, struct inode *inode,
+	       struct dentry *direntry, umode_t mode, dev_t device_number)
 {
 	int rc = -EPERM;
 	unsigned int xid;
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 4fb3b6961096..013a569ae7ab 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1851,7 +1851,8 @@ cifs_posix_mkdir(struct inode *inode, struct dentry *dentry, umode_t mode,
 	goto posix_mkdir_out;
 }
 
-int cifs_mkdir(struct inode *inode, struct dentry *direntry, umode_t mode)
+int cifs_mkdir(struct user_namespace *user_ns, struct inode *inode,
+	       struct dentry *direntry, umode_t mode)
 {
 	int rc = 0;
 	unsigned int xid;
@@ -2061,9 +2062,9 @@ cifs_do_rename(const unsigned int xid, struct dentry *from_dentry,
 }
 
 int
-cifs_rename2(struct inode *source_dir, struct dentry *source_dentry,
-	     struct inode *target_dir, struct dentry *target_dentry,
-	     unsigned int flags)
+cifs_rename2(struct user_namespace *user_ns, struct inode *source_dir,
+	     struct dentry *source_dentry, struct inode *target_dir,
+	     struct dentry *target_dentry, unsigned int flags)
 {
 	char *from_name = NULL;
 	char *to_name = NULL;
@@ -2907,7 +2908,8 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
 }
 
 int
-cifs_setattr(struct dentry *direntry, struct iattr *attrs)
+cifs_setattr(struct user_namespace *user_ns, struct dentry *direntry,
+	     struct iattr *attrs)
 {
 	struct cifs_sb_info *cifs_sb = CIFS_SB(direntry->d_sb);
 	struct cifs_tcon *pTcon = cifs_sb_master_tcon(cifs_sb);
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 94dab4309fbb..e0ace95c3a88 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -661,7 +661,8 @@ cifs_get_link(struct dentry *direntry, struct inode *inode,
 }
 
 int
-cifs_symlink(struct inode *inode, struct dentry *direntry, const char *symname)
+cifs_symlink(struct user_namespace *user_ns, struct inode *inode,
+	     struct dentry *direntry, const char *symname)
 {
 	int rc = -EOPNOTSUPP;
 	unsigned int xid;
diff --git a/fs/coda/coda_linux.h b/fs/coda/coda_linux.h
index d5ebd36fb2cc..9bd92e18a7a2 100644
--- a/fs/coda/coda_linux.h
+++ b/fs/coda/coda_linux.h
@@ -46,10 +46,10 @@ extern const struct file_operations coda_ioctl_operations;
 /* operations shared over more than one file */
 int coda_open(struct inode *i, struct file *f);
 int coda_release(struct inode *i, struct file *f);
-int coda_permission(struct inode *inode, int mask);
+int coda_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 int coda_revalidate_inode(struct inode *);
 int coda_getattr(const struct path *, struct kstat *, u32, unsigned int);
-int coda_setattr(struct dentry *, struct iattr *);
+int coda_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 
 /* this file:  heloers */
 char *coda_f2s(struct CodaFid *f);
diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index ca40c2556ba6..21266a5d3989 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -73,7 +73,7 @@ static struct dentry *coda_lookup(struct inode *dir, struct dentry *entry, unsig
 }
 
 
-int coda_permission(struct inode *inode, int mask)
+int coda_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	int error;
 
@@ -132,7 +132,8 @@ static inline void coda_dir_drop_nlink(struct inode *dir)
 }
 
 /* creation routines: create, mknod, mkdir, link, symlink */
-static int coda_create(struct inode *dir, struct dentry *de, umode_t mode, bool excl)
+static int coda_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *de, umode_t mode, bool excl)
 {
 	int error;
 	const char *name=de->d_name.name;
@@ -164,7 +165,8 @@ static int coda_create(struct inode *dir, struct dentry *de, umode_t mode, bool
 	return error;
 }
 
-static int coda_mkdir(struct inode *dir, struct dentry *de, umode_t mode)
+static int coda_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *de, umode_t mode)
 {
 	struct inode *inode;
 	struct coda_vattr attrs;
@@ -225,8 +227,8 @@ static int coda_link(struct dentry *source_de, struct inode *dir_inode,
 }
 
 
-static int coda_symlink(struct inode *dir_inode, struct dentry *de,
-			const char *symname)
+static int coda_symlink(struct user_namespace *user_ns, struct inode *dir_inode,
+			struct dentry *de, const char *symname)
 {
 	const char *name = de->d_name.name;
 	int len = de->d_name.len;
@@ -291,9 +293,9 @@ static int coda_rmdir(struct inode *dir, struct dentry *de)
 }
 
 /* rename */
-static int coda_rename(struct inode *old_dir, struct dentry *old_dentry,
-		       struct inode *new_dir, struct dentry *new_dentry,
-		       unsigned int flags)
+static int coda_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		       struct dentry *old_dentry, struct inode *new_dir,
+		       struct dentry *new_dentry, unsigned int flags)
 {
 	const char *old_name = old_dentry->d_name.name;
 	const char *new_name = new_dentry->d_name.name;
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 4d113e191cb8..7c3f9c1687d2 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -260,7 +260,8 @@ int coda_getattr(const struct path *path, struct kstat *stat,
 	return err;
 }
 
-int coda_setattr(struct dentry *de, struct iattr *iattr)
+int coda_setattr(struct user_namespace *user_ns, struct dentry *de,
+		 struct iattr *iattr)
 {
 	struct inode *inode = d_inode(de);
 	struct coda_vattr vattr;
diff --git a/fs/coda/pioctl.c b/fs/coda/pioctl.c
index 3aec27e5eb82..4b711cfe5739 100644
--- a/fs/coda/pioctl.c
+++ b/fs/coda/pioctl.c
@@ -24,7 +24,8 @@
 #include "coda_linux.h"
 
 /* pioctl ops */
-static int coda_ioctl_permission(struct inode *inode, int mask);
+static int coda_ioctl_permission(struct user_namespace *user_ns,
+				 struct inode *inode, int mask);
 static long coda_pioctl(struct file *filp, unsigned int cmd,
 			unsigned long user_data);
 
@@ -40,7 +41,8 @@ const struct file_operations coda_ioctl_operations = {
 };
 
 /* the coda pioctl inode ops */
-static int coda_ioctl_permission(struct inode *inode, int mask)
+static int coda_ioctl_permission(struct user_namespace *user_ns,
+				 struct inode *inode, int mask)
 {
 	return (mask & MAY_EXEC) ? -EACCES : 0;
 }
diff --git a/fs/configfs/configfs_internal.h b/fs/configfs/configfs_internal.h
index 22dce2d35a4b..3d11cf291a25 100644
--- a/fs/configfs/configfs_internal.h
+++ b/fs/configfs/configfs_internal.h
@@ -79,7 +79,8 @@ extern void configfs_hash_and_remove(struct dentry * dir, const char * name);
 
 extern const unsigned char * configfs_get_name(struct configfs_dirent *sd);
 extern void configfs_drop_dentry(struct configfs_dirent *sd, struct dentry *parent);
-extern int configfs_setattr(struct dentry *dentry, struct iattr *iattr);
+extern int configfs_setattr(struct user_namespace *user_ns,
+			    struct dentry *dentry, struct iattr *iattr);
 
 extern struct dentry *configfs_pin_fs(void);
 extern void configfs_release_fs(void);
@@ -92,8 +93,8 @@ extern const struct inode_operations configfs_root_inode_operations;
 extern const struct inode_operations configfs_symlink_inode_operations;
 extern const struct dentry_operations configfs_dentry_ops;
 
-extern int configfs_symlink(struct inode *dir, struct dentry *dentry,
-			    const char *symname);
+extern int configfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			    struct dentry *dentry, const char *symname);
 extern int configfs_unlink(struct inode *dir, struct dentry *dentry);
 
 int configfs_create_link(struct configfs_dirent *target, struct dentry *parent,
diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index b0983e2a4e2c..aedb76c32a1e 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -1267,7 +1267,8 @@ int configfs_depend_item_unlocked(struct configfs_subsystem *caller_subsys,
 }
 EXPORT_SYMBOL(configfs_depend_item_unlocked);
 
-static int configfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int configfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			  struct dentry *dentry, umode_t mode)
 {
 	int ret = 0;
 	int module_got = 0;
diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c
index 8bd6a883c94c..c8ec418dbd15 100644
--- a/fs/configfs/inode.c
+++ b/fs/configfs/inode.c
@@ -40,7 +40,8 @@ static const struct inode_operations configfs_inode_operations ={
 	.setattr	= configfs_setattr,
 };
 
-int configfs_setattr(struct dentry * dentry, struct iattr * iattr)
+int configfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		     struct iattr *iattr)
 {
 	struct inode * inode = d_inode(dentry);
 	struct configfs_dirent * sd = dentry->d_fsdata;
@@ -67,7 +68,7 @@ int configfs_setattr(struct dentry * dentry, struct iattr * iattr)
 	}
 	/* attributes were changed atleast once in past */
 
-	error = simple_setattr(dentry, iattr);
+	error = simple_setattr(user_ns, dentry, iattr);
 	if (error)
 		return error;
 
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index 0b592c55f38e..574233245c06 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -139,7 +139,8 @@ static int get_target(const char *symname, struct path *path,
 }
 
 
-int configfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
+int configfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, const char *symname)
 {
 	int ret;
 	struct path path;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 2fcf66473436..6276cfc70039 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -42,13 +42,14 @@ static unsigned int debugfs_allow = DEFAULT_DEBUGFS_ALLOW_BITS;
  * so that we can use the file mode as part of a heuristic to determine whether
  * to lock down individual files.
  */
-static int debugfs_setattr(struct dentry *dentry, struct iattr *ia)
+static int debugfs_setattr(struct user_namespace *user_ns,
+			   struct dentry *dentry, struct iattr *ia)
 {
 	int ret = security_locked_down(LOCKDOWN_DEBUGFS);
 
 	if (ret && (ia->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)))
 		return ret;
-	return simple_setattr(dentry, ia);
+	return simple_setattr(user_ns, dentry, ia);
 }
 
 static const struct inode_operations debugfs_file_inode_operations = {
@@ -775,8 +776,8 @@ struct dentry *debugfs_rename(struct dentry *old_dir, struct dentry *old_dentry,
 
 	take_dentry_name_snapshot(&old_name, old_dentry);
 
-	error = simple_rename(d_inode(old_dir), old_dentry, d_inode(new_dir),
-			      dentry, 0);
+	error = simple_rename(&init_user_ns, d_inode(old_dir), old_dentry,
+			      d_inode(new_dir), dentry, 0);
 	if (error) {
 		release_dentry_name_snapshot(&old_name);
 		goto exit;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 42066c5613ca..609594adb615 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -255,8 +255,8 @@ int ecryptfs_initialize_file(struct dentry *ecryptfs_dentry,
  * Returns zero on success; non-zero on error condition
  */
 static int
-ecryptfs_create(struct inode *directory_inode, struct dentry *ecryptfs_dentry,
-		umode_t mode, bool excl)
+ecryptfs_create(struct user_namespace *user_ns, struct inode *directory_inode,
+		struct dentry *ecryptfs_dentry, umode_t mode, bool excl)
 {
 	struct inode *ecryptfs_inode;
 	int rc;
@@ -461,8 +461,8 @@ static int ecryptfs_unlink(struct inode *dir, struct dentry *dentry)
 	return ecryptfs_do_unlink(dir, dentry, d_inode(dentry));
 }
 
-static int ecryptfs_symlink(struct inode *dir, struct dentry *dentry,
-			    const char *symname)
+static int ecryptfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			    struct dentry *dentry, const char *symname)
 {
 	int rc;
 	struct dentry *lower_dentry;
@@ -500,7 +500,8 @@ static int ecryptfs_symlink(struct inode *dir, struct dentry *dentry,
 	return rc;
 }
 
-static int ecryptfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ecryptfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			  struct dentry *dentry, umode_t mode)
 {
 	int rc;
 	struct dentry *lower_dentry;
@@ -556,7 +557,8 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
 }
 
 static int
-ecryptfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
+ecryptfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+	       struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	int rc;
 	struct dentry *lower_dentry;
@@ -581,9 +583,9 @@ ecryptfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev
 }
 
 static int
-ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		struct inode *new_dir, struct dentry *new_dentry,
-		unsigned int flags)
+ecryptfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		struct dentry *old_dentry, struct inode *new_dir,
+		struct dentry *new_dentry, unsigned int flags)
 {
 	int rc;
 	struct dentry *lower_old_dentry;
@@ -870,7 +872,7 @@ int ecryptfs_truncate(struct dentry *dentry, loff_t new_length)
 }
 
 static int
-ecryptfs_permission(struct inode *inode, int mask)
+ecryptfs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	return inode_permission(&init_user_ns, ecryptfs_inode_to_lower(inode), mask);
 }
@@ -887,7 +889,8 @@ ecryptfs_permission(struct inode *inode, int mask)
  * All other metadata changes will be passed right to the lower filesystem,
  * and we will just update our inode to look like the lower.
  */
-static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
+static int ecryptfs_setattr(struct user_namespace *user_ns,
+			    struct dentry *dentry, struct iattr *ia)
 {
 	int rc = 0;
 	struct dentry *lower_dentry;
diff --git a/fs/efivarfs/inode.c b/fs/efivarfs/inode.c
index 96c0c86f3fff..eb40fefe5518 100644
--- a/fs/efivarfs/inode.c
+++ b/fs/efivarfs/inode.c
@@ -65,8 +65,8 @@ bool efivarfs_valid_name(const char *str, int len)
 	return uuid_is_valid(s);
 }
 
-static int efivarfs_create(struct inode *dir, struct dentry *dentry,
-			  umode_t mode, bool excl)
+static int efivarfs_create(struct user_namespace *user_ns, struct inode *dir,
+			   struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct inode *inode = NULL;
 	struct efivar_entry *var;
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index b8f0e829ecbd..6f38e8b607a0 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -416,7 +416,8 @@ int exfat_count_used_clusters(struct super_block *sb, unsigned int *ret_count);
 extern const struct file_operations exfat_file_operations;
 int __exfat_truncate(struct inode *inode, loff_t new_size);
 void exfat_truncate(struct inode *inode, loff_t size);
-int exfat_setattr(struct dentry *dentry, struct iattr *attr);
+int exfat_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *attr);
 int exfat_getattr(const struct path *path, struct kstat *stat,
 		unsigned int request_mask, unsigned int query_flags);
 int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index e9705b3295d3..b5e51c5b11a7 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -282,7 +282,8 @@ int exfat_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
-int exfat_setattr(struct dentry *dentry, struct iattr *attr)
+int exfat_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *attr)
 {
 	struct exfat_sb_info *sbi = EXFAT_SB(dentry->d_sb);
 	struct inode *inode = dentry->d_inode;
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 2932b23a3b6c..de92cf697ecb 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -541,8 +541,8 @@ static int exfat_add_entry(struct inode *inode, const char *path,
 	return ret;
 }
 
-static int exfat_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		bool excl)
+static int exfat_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct super_block *sb = dir->i_sb;
 	struct inode *inode;
@@ -827,7 +827,8 @@ static int exfat_unlink(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int exfat_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int exfat_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct super_block *sb = dir->i_sb;
 	struct inode *inode;
@@ -1318,9 +1319,9 @@ static int __exfat_rename(struct inode *old_parent_inode,
 	return ret;
 }
 
-static int exfat_rename(struct inode *old_dir, struct dentry *old_dentry,
-		struct inode *new_dir, struct dentry *new_dentry,
-		unsigned int flags)
+static int exfat_rename(struct user_namespace *user_ns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *old_inode, *new_inode;
 	struct super_block *sb = old_dir->i_sb;
diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c
index 826987b23ccb..709bbf2233e1 100644
--- a/fs/ext2/acl.c
+++ b/fs/ext2/acl.c
@@ -216,7 +216,8 @@ __ext2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
  * inode->i_mutex: down
  */
 int
-ext2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+ext2_set_acl(struct user_namespace *user_ns, struct inode *inode,
+	     struct posix_acl *acl, int type)
 {
 	int error;
 	int update_mode = 0;
diff --git a/fs/ext2/acl.h b/fs/ext2/acl.h
index 0f01c759daac..8d9783ea725e 100644
--- a/fs/ext2/acl.h
+++ b/fs/ext2/acl.h
@@ -56,7 +56,8 @@ static inline int ext2_acl_count(size_t size)
 
 /* acl.c */
 extern struct posix_acl *ext2_get_acl(struct inode *inode, int type);
-extern int ext2_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+extern int ext2_set_acl(struct user_namespace *user_ns, struct inode *inode,
+			struct posix_acl *acl, int type);
 extern int ext2_init_acl (struct inode *, struct inode *);
 
 #else
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 5136b7289e8d..265af9933f0b 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -757,7 +757,7 @@ extern struct inode *ext2_iget (struct super_block *, unsigned long);
 extern int ext2_write_inode (struct inode *, struct writeback_control *);
 extern void ext2_evict_inode(struct inode *);
 extern int ext2_get_block(struct inode *, sector_t, struct buffer_head *, int);
-extern int ext2_setattr (struct dentry *, struct iattr *);
+extern int ext2_setattr (struct user_namespace *, struct dentry *, struct iattr *);
 extern int ext2_getattr (const struct path *, struct kstat *, u32, unsigned int);
 extern void ext2_set_inode_flags(struct inode *inode);
 extern int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 7fcda76094ff..094a99b1737a 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1663,7 +1663,8 @@ int ext2_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
-int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
+int ext2_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		 struct iattr *iattr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index 5bf2c145643b..b0d8768a7060 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -100,7 +100,8 @@ struct dentry *ext2_get_parent(struct dentry *child)
  * If the create succeeds, we fill in the inode information
  * with d_instantiate(). 
  */
-static int ext2_create (struct inode * dir, struct dentry * dentry, umode_t mode, bool excl)
+static int ext2_create (struct user_namespace * user_ns,
+			struct inode * dir, struct dentry * dentry, umode_t mode, bool excl)
 {
 	struct inode *inode;
 	int err;
@@ -118,7 +119,8 @@ static int ext2_create (struct inode * dir, struct dentry * dentry, umode_t mode
 	return ext2_add_nondir(dentry, inode);
 }
 
-static int ext2_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ext2_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode = ext2_new_inode(dir, mode, NULL);
 	if (IS_ERR(inode))
@@ -131,7 +133,8 @@ static int ext2_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return 0;
 }
 
-static int ext2_mknod (struct inode * dir, struct dentry *dentry, umode_t mode, dev_t rdev)
+static int ext2_mknod (struct user_namespace * user_ns, struct inode * dir,
+	struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct inode * inode;
 	int err;
@@ -151,8 +154,8 @@ static int ext2_mknod (struct inode * dir, struct dentry *dentry, umode_t mode,
 	return err;
 }
 
-static int ext2_symlink (struct inode * dir, struct dentry * dentry,
-	const char * symname)
+static int ext2_symlink (struct user_namespace * user_ns, struct inode * dir,
+	struct dentry * dentry, const char * symname)
 {
 	struct super_block * sb = dir->i_sb;
 	int err = -ENAMETOOLONG;
@@ -225,7 +228,8 @@ static int ext2_link (struct dentry * old_dentry, struct inode * dir,
 	return err;
 }
 
-static int ext2_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
+static int ext2_mkdir(struct user_namespace * user_ns,
+	struct inode * dir, struct dentry * dentry, umode_t mode)
 {
 	struct inode * inode;
 	int err;
@@ -315,9 +319,9 @@ static int ext2_rmdir (struct inode * dir, struct dentry *dentry)
 	return err;
 }
 
-static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
-			struct inode * new_dir,	struct dentry * new_dentry,
-			unsigned int flags)
+static int ext2_rename (struct user_namespace * user_ns, struct inode * old_dir,
+			struct dentry * old_dentry, struct inode * new_dir,
+			struct dentry * new_dentry, unsigned int flags)
 {
 	struct inode * old_inode = d_inode(old_dentry);
 	struct inode * new_inode = d_inode(new_dentry);
diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 4aad060010d8..3ab0a69b974b 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -222,7 +222,8 @@ __ext4_set_acl(handle_t *handle, struct inode *inode, int type,
 }
 
 int
-ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+ext4_set_acl(struct user_namespace *user_ns, struct inode *inode,
+	     struct posix_acl *acl, int type)
 {
 	handle_t *handle;
 	int error, credits, retries = 0;
diff --git a/fs/ext4/acl.h b/fs/ext4/acl.h
index 9b63f5416a2f..e1483e81c074 100644
--- a/fs/ext4/acl.h
+++ b/fs/ext4/acl.h
@@ -56,7 +56,8 @@ static inline int ext4_acl_count(size_t size)
 
 /* acl.c */
 struct posix_acl *ext4_get_acl(struct inode *inode, int type);
-int ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+int ext4_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type);
 extern int ext4_init_acl(handle_t *, struct inode *, struct inode *);
 
 #else  /* CONFIG_EXT4_FS_POSIX_ACL */
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 45fcdbf538d1..4c8bdcea0a0c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2820,7 +2820,7 @@ extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 	__ext4_iget((sb), (ino), (flags), __func__, __LINE__)
 
 extern int  ext4_write_inode(struct inode *, struct writeback_control *);
-extern int  ext4_setattr(struct dentry *, struct iattr *);
+extern int  ext4_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 extern int  ext4_getattr(const struct path *, struct kstat *, u32, unsigned int);
 extern void ext4_evict_inode(struct inode *);
 extern void ext4_clear_inode(struct inode *);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 03be530cebbe..90a8c2f29616 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5305,7 +5305,8 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
  *
  * Called with inode->i_mutex down.
  */
-int ext4_setattr(struct dentry *dentry, struct iattr *attr)
+int ext4_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		 struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error, rc = 0;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index f458d1d81d96..aa7128f16e10 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2602,8 +2602,8 @@ static int ext4_add_nondir(handle_t *handle,
  * If the create succeeds, we fill in the inode information
  * with d_instantiate().
  */
-static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		       bool excl)
+static int ext4_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
 	handle_t *handle;
 	struct inode *inode, *inode_save;
@@ -2639,8 +2639,8 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 	return err;
 }
 
-static int ext4_mknod(struct inode *dir, struct dentry *dentry,
-		      umode_t mode, dev_t rdev)
+static int ext4_mknod(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	handle_t *handle;
 	struct inode *inode, *inode_save;
@@ -2676,7 +2676,8 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ext4_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode)
 {
 	handle_t *handle;
 	struct inode *inode;
@@ -2785,7 +2786,8 @@ int ext4_init_new_dir(handle_t *handle, struct inode *dir,
 	return err;
 }
 
-static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ext4_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	handle_t *handle;
 	struct inode *inode;
@@ -3297,7 +3299,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 	return retval;
 }
 
-static int ext4_symlink(struct inode *dir,
+static int ext4_symlink(struct user_namespace *user_ns, struct inode *dir,
 			struct dentry *dentry, const char *symname)
 {
 	handle_t *handle;
@@ -4089,9 +4091,9 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	return retval;
 }
 
-static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
-			struct inode *new_dir, struct dentry *new_dentry,
-			unsigned int flags)
+static int ext4_rename2(struct user_namespace *user_ns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
 {
 	int err;
 
diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 50735e8a354e..007865672d3c 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -248,7 +248,8 @@ static int __f2fs_set_acl(struct inode *inode, int type,
 	return error;
 }
 
-int f2fs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int f2fs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type)
 {
 	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
 		return -EIO;
diff --git a/fs/f2fs/acl.h b/fs/f2fs/acl.h
index 124868c13f80..986fd1bc780b 100644
--- a/fs/f2fs/acl.h
+++ b/fs/f2fs/acl.h
@@ -34,7 +34,8 @@ struct f2fs_acl_header {
 #ifdef CONFIG_F2FS_FS_POSIX_ACL
 
 extern struct posix_acl *f2fs_get_acl(struct inode *, int);
-extern int f2fs_set_acl(struct inode *, struct posix_acl *, int);
+extern int f2fs_set_acl(struct user_namespace *, struct inode *,
+			struct posix_acl *, int);
 extern int f2fs_init_acl(struct inode *, struct inode *, struct page *,
 							struct page *);
 #else
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cb700d797296..d71697eb2db1 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3169,7 +3169,8 @@ int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock);
 int f2fs_truncate(struct inode *inode);
 int f2fs_getattr(const struct path *path, struct kstat *stat,
 			u32 request_mask, unsigned int flags);
-int f2fs_setattr(struct dentry *dentry, struct iattr *attr);
+int f2fs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		 struct iattr *attr);
 int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end);
 void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count);
 int f2fs_precache_extents(struct inode *inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 20f7e5bdd9ad..370856b3d8e1 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -851,7 +851,8 @@ static void __setattr_copy(struct inode *inode, const struct iattr *attr)
 #define __setattr_copy setattr_copy
 #endif
 
-int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
+int f2fs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		 struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int err;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 66b522e61e50..55685a1d9431 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -314,8 +314,8 @@ static void set_compress_inode(struct f2fs_sb_info *sbi, struct inode *inode,
 	}
 }
 
-static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-						bool excl)
+static int f2fs_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 	struct inode *inode;
@@ -636,8 +636,8 @@ static const char *f2fs_get_link(struct dentry *dentry,
 	return link;
 }
 
-static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
-					const char *symname)
+static int f2fs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, const char *symname)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 	struct inode *inode;
@@ -716,7 +716,8 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int f2fs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 	struct inode *inode;
@@ -769,8 +770,8 @@ static int f2fs_rmdir(struct inode *dir, struct dentry *dentry)
 	return -ENOTEMPTY;
 }
 
-static int f2fs_mknod(struct inode *dir, struct dentry *dentry,
-				umode_t mode, dev_t rdev)
+static int f2fs_mknod(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 	struct inode *inode;
@@ -873,7 +874,8 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int f2fs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int f2fs_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 
@@ -1246,9 +1248,9 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	return err;
 }
 
-static int f2fs_rename2(struct inode *old_dir, struct dentry *old_dentry,
-			struct inode *new_dir, struct dentry *new_dentry,
-			unsigned int flags)
+static int f2fs_rename2(struct user_namespace *user_ns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
 {
 	int err;
 
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 922a0c6ba46c..982c36c8971d 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -397,7 +397,8 @@ extern long fat_generic_ioctl(struct file *filp, unsigned int cmd,
 			      unsigned long arg);
 extern const struct file_operations fat_file_operations;
 extern const struct inode_operations fat_file_inode_operations;
-extern int fat_setattr(struct dentry *dentry, struct iattr *attr);
+extern int fat_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		       struct iattr *attr);
 extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
 extern int fat_getattr(const struct path *path, struct kstat *stat,
 		       u32 request_mask, unsigned int flags);
diff --git a/fs/fat/file.c b/fs/fat/file.c
index f7e04f533d31..5b12cf209801 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -95,7 +95,7 @@ static int fat_ioctl_set_attributes(struct file *file, u32 __user *user_attr)
 		goto out_unlock_inode;
 
 	/* This MUST be done before doing anything irreversible... */
-	err = fat_setattr(file->f_path.dentry, &ia);
+	err = fat_setattr(mnt_user_ns(file->f_path.mnt), file->f_path.dentry, &ia);
 	if (err)
 		goto out_unlock_inode;
 
@@ -466,7 +466,8 @@ static int fat_allow_set_time(struct msdos_sb_info *sbi, struct inode *inode)
 /* valid file mode bits */
 #define FAT_VALID_MODE	(S_IFREG | S_IFDIR | S_IRWXUGO)
 
-int fat_setattr(struct dentry *dentry, struct iattr *attr)
+int fat_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		struct iattr *attr)
 {
 	struct msdos_sb_info *sbi = MSDOS_SB(dentry->d_sb);
 	struct inode *inode = d_inode(dentry);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 9d062886fbc1..608b0606f3ca 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -261,8 +261,8 @@ static int msdos_add_entry(struct inode *dir, const unsigned char *name,
 }
 
 /***** Create a file */
-static int msdos_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-			bool excl)
+static int msdos_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct super_block *sb = dir->i_sb;
 	struct inode *inode = NULL;
@@ -339,7 +339,8 @@ static int msdos_rmdir(struct inode *dir, struct dentry *dentry)
 }
 
 /***** Make a directory */
-static int msdos_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int msdos_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct super_block *sb = dir->i_sb;
 	struct fat_slot_info sinfo;
@@ -593,9 +594,9 @@ static int do_msdos_rename(struct inode *old_dir, unsigned char *old_name,
 }
 
 /***** Rename, a wrapper for rename_same_dir & rename_diff_dir */
-static int msdos_rename(struct inode *old_dir, struct dentry *old_dentry,
-			struct inode *new_dir, struct dentry *new_dentry,
-			unsigned int flags)
+static int msdos_rename(struct user_namespace *user_ns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
 {
 	struct super_block *sb = old_dir->i_sb;
 	unsigned char old_msdos_name[MSDOS_NAME], new_msdos_name[MSDOS_NAME];
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 0cdd0fb9f742..34903d14d6a6 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -756,8 +756,8 @@ static struct dentry *vfat_lookup(struct inode *dir, struct dentry *dentry,
 	return ERR_PTR(err);
 }
 
-static int vfat_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		       bool excl)
+static int vfat_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct super_block *sb = dir->i_sb;
 	struct inode *inode;
@@ -846,7 +846,8 @@ static int vfat_unlink(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int vfat_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int vfat_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	struct super_block *sb = dir->i_sb;
 	struct inode *inode;
@@ -892,9 +893,9 @@ static int vfat_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return err;
 }
 
-static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
-		       struct inode *new_dir, struct dentry *new_dentry,
-		       unsigned int flags)
+static int vfat_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		       struct dentry *old_dentry, struct inode *new_dir,
+		       struct dentry *new_dentry, unsigned int flags)
 {
 	struct buffer_head *dotdot_bh;
 	struct msdos_dir_entry *dotdot_de;
diff --git a/fs/fuse/acl.c b/fs/fuse/acl.c
index 5a48cee6d7d3..1e3f195c37d8 100644
--- a/fs/fuse/acl.c
+++ b/fs/fuse/acl.c
@@ -47,7 +47,8 @@ struct posix_acl *fuse_get_acl(struct inode *inode, int type)
 	return acl;
 }
 
-int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int fuse_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	const char *name;
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0750755685cd..05921106f7ca 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -597,7 +597,8 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	return err;
 }
 
-static int fuse_mknod(struct inode *, struct dentry *, umode_t, dev_t);
+static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
+		      umode_t, dev_t);
 static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 			    struct file *file, unsigned flags,
 			    umode_t mode)
@@ -634,7 +635,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	return err;
 
 mknod:
-	err = fuse_mknod(dir, entry, mode, 0);
+	err = fuse_mknod(&init_user_ns, dir, entry, mode, 0);
 	if (err)
 		goto out_dput;
 no_open:
@@ -701,8 +702,8 @@ static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
 	return err;
 }
 
-static int fuse_mknod(struct inode *dir, struct dentry *entry, umode_t mode,
-		      dev_t rdev)
+static int fuse_mknod(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *entry, umode_t mode, dev_t rdev)
 {
 	struct fuse_mknod_in inarg;
 	struct fuse_mount *fm = get_fuse_mount(dir);
@@ -724,13 +725,14 @@ static int fuse_mknod(struct inode *dir, struct dentry *entry, umode_t mode,
 	return create_new_entry(fm, &args, dir, entry, mode);
 }
 
-static int fuse_create(struct inode *dir, struct dentry *entry, umode_t mode,
-		       bool excl)
+static int fuse_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *entry, umode_t mode, bool excl)
 {
-	return fuse_mknod(dir, entry, mode, 0);
+	return fuse_mknod(user_ns, dir, entry, mode, 0);
 }
 
-static int fuse_mkdir(struct inode *dir, struct dentry *entry, umode_t mode)
+static int fuse_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *entry, umode_t mode)
 {
 	struct fuse_mkdir_in inarg;
 	struct fuse_mount *fm = get_fuse_mount(dir);
@@ -751,8 +753,8 @@ static int fuse_mkdir(struct inode *dir, struct dentry *entry, umode_t mode)
 	return create_new_entry(fm, &args, dir, entry, S_IFDIR);
 }
 
-static int fuse_symlink(struct inode *dir, struct dentry *entry,
-			const char *link)
+static int fuse_symlink(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *entry, const char *link)
 {
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	unsigned len = strlen(link) + 1;
@@ -888,9 +890,9 @@ static int fuse_rename_common(struct inode *olddir, struct dentry *oldent,
 	return err;
 }
 
-static int fuse_rename2(struct inode *olddir, struct dentry *oldent,
-			struct inode *newdir, struct dentry *newent,
-			unsigned int flags)
+static int fuse_rename2(struct user_namespace *user_ns, struct inode *olddir,
+			struct dentry *oldent, struct inode *newdir,
+			struct dentry *newent, unsigned int flags)
 {
 	struct fuse_conn *fc = get_fuse_conn(olddir);
 	int err;
@@ -1226,7 +1228,7 @@ static int fuse_perm_getattr(struct inode *inode, int mask)
  * access request is sent.  Execute permission is still checked
  * locally based on file mode.
  */
-static int fuse_permission(struct inode *inode, int mask)
+static int fuse_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	bool refreshed = false;
@@ -1720,7 +1722,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	return err;
 }
 
-static int fuse_setattr(struct dentry *entry, struct iattr *attr)
+static int fuse_setattr(struct user_namespace *user_ns, struct dentry *entry,
+			struct iattr *attr)
 {
 	struct inode *inode = d_inode(entry);
 	struct fuse_conn *fc = get_fuse_conn(inode);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d51598017d13..ce979adfa137 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1177,8 +1177,8 @@ extern const struct xattr_handler *fuse_no_acl_xattr_handlers[];
 
 struct posix_acl;
 struct posix_acl *fuse_get_acl(struct inode *inode, int type);
-int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type);
-
+int fuse_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type);
 
 /* readdir.c */
 int fuse_readdir(struct file *file, struct dir_context *ctx);
diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index ce88ef29eef0..0c6e91860511 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -106,7 +106,8 @@ int __gfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	return error;
 }
 
-int gfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int gfs2_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder gh;
diff --git a/fs/gfs2/acl.h b/fs/gfs2/acl.h
index 61353a1501c5..5d5cea5927d3 100644
--- a/fs/gfs2/acl.h
+++ b/fs/gfs2/acl.h
@@ -13,6 +13,7 @@
 
 extern struct posix_acl *gfs2_get_acl(struct inode *inode, int type);
 extern int __gfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type);
-extern int gfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+extern int gfs2_set_acl(struct user_namespace *user_ns, struct inode *inode,
+			struct posix_acl *acl, int type);
 
 #endif /* __ACL_DOT_H__ */
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 1d994bdfffaa..8f5523822788 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -256,7 +256,7 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask,
 	    !capable(CAP_LINUX_IMMUTABLE))
 		goto out;
 	if (!IS_IMMUTABLE(inode)) {
-		error = gfs2_permission(inode, MAY_WRITE);
+		error = gfs2_permission(&init_user_ns, inode, MAY_WRITE);
 		if (error)
 			goto out;
 	}
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index eebc07f62c50..9ed57ac7cc9b 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -320,7 +320,7 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 	}
 
 	if (!is_root) {
-		error = gfs2_permission(dir, MAY_EXEC);
+		error = gfs2_permission(&init_user_ns, dir, MAY_EXEC);
 		if (error)
 			goto out;
 	}
@@ -350,7 +350,8 @@ static int create_ok(struct gfs2_inode *dip, const struct qstr *name,
 {
 	int error;
 
-	error = gfs2_permission(&dip->i_inode, MAY_WRITE | MAY_EXEC);
+	error = gfs2_permission(&init_user_ns, &dip->i_inode,
+				MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 
@@ -841,8 +842,8 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
  * Returns: errno
  */
 
-static int gfs2_create(struct inode *dir, struct dentry *dentry,
-		       umode_t mode, bool excl)
+static int gfs2_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
 	return gfs2_create_inode(dir, dentry, NULL, S_IFREG | mode, 0, NULL, 0, excl);
 }
@@ -949,7 +950,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
 	if (inode->i_nlink == 0)
 		goto out_gunlock;
 
-	error = gfs2_permission(dir, MAY_WRITE | MAY_EXEC);
+	error = gfs2_permission(&init_user_ns, dir, MAY_WRITE | MAY_EXEC);
 	if (error)
 		goto out_gunlock;
 
@@ -1066,7 +1067,8 @@ static int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name,
 	if (IS_APPEND(&dip->i_inode))
 		return -EPERM;
 
-	error = gfs2_permission(&dip->i_inode, MAY_WRITE | MAY_EXEC);
+	error = gfs2_permission(&init_user_ns, &dip->i_inode,
+				MAY_WRITE | MAY_EXEC);
 	if (error)
 		return error;
 
@@ -1202,8 +1204,8 @@ static int gfs2_unlink(struct inode *dir, struct dentry *dentry)
  * Returns: errno
  */
 
-static int gfs2_symlink(struct inode *dir, struct dentry *dentry,
-			const char *symname)
+static int gfs2_symlink(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, const char *symname)
 {
 	unsigned int size;
 
@@ -1223,7 +1225,8 @@ static int gfs2_symlink(struct inode *dir, struct dentry *dentry,
  * Returns: errno
  */
 
-static int gfs2_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int gfs2_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	unsigned dsize = gfs2_max_stuffed_size(GFS2_I(dir));
 	return gfs2_create_inode(dir, dentry, NULL, S_IFDIR | mode, 0, NULL, dsize, 0);
@@ -1238,8 +1241,8 @@ static int gfs2_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
  *
  */
 
-static int gfs2_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
-		      dev_t dev)
+static int gfs2_mknod(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	return gfs2_create_inode(dir, dentry, NULL, mode, dev, NULL, 0, 0);
 }
@@ -1488,7 +1491,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 			}
 		}
 	} else {
-		error = gfs2_permission(ndir, MAY_WRITE | MAY_EXEC);
+		error = gfs2_permission(&init_user_ns, ndir, MAY_WRITE | MAY_EXEC);
 		if (error)
 			goto out_gunlock;
 
@@ -1523,7 +1526,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
 	/* Check out the dir to be renamed */
 
 	if (dir_rename) {
-		error = gfs2_permission(d_inode(odentry), MAY_WRITE);
+		error = gfs2_permission(&init_user_ns, d_inode(odentry), MAY_WRITE);
 		if (error)
 			goto out_gunlock;
 	}
@@ -1686,12 +1689,12 @@ static int gfs2_exchange(struct inode *odir, struct dentry *odentry,
 		goto out_gunlock;
 
 	if (S_ISDIR(old_mode)) {
-		error = gfs2_permission(odentry->d_inode, MAY_WRITE);
+		error = gfs2_permission(&init_user_ns, odentry->d_inode, MAY_WRITE);
 		if (error)
 			goto out_gunlock;
 	}
 	if (S_ISDIR(new_mode)) {
-		error = gfs2_permission(ndentry->d_inode, MAY_WRITE);
+		error = gfs2_permission(&init_user_ns, ndentry->d_inode, MAY_WRITE);
 		if (error)
 			goto out_gunlock;
 	}
@@ -1745,9 +1748,9 @@ static int gfs2_exchange(struct inode *odir, struct dentry *odentry,
 	return error;
 }
 
-static int gfs2_rename2(struct inode *odir, struct dentry *odentry,
-			struct inode *ndir, struct dentry *ndentry,
-			unsigned int flags)
+static int gfs2_rename2(struct user_namespace *user_ns, struct inode *odir,
+			struct dentry *odentry, struct inode *ndir,
+			struct dentry *ndentry, unsigned int flags)
 {
 	flags &= ~RENAME_NOREPLACE;
 
@@ -1831,7 +1834,7 @@ static const char *gfs2_get_link(struct dentry *dentry,
  * Returns: errno
  */
 
-int gfs2_permission(struct inode *inode, int mask)
+int gfs2_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct gfs2_inode *ip;
 	struct gfs2_holder i_gh;
@@ -1961,7 +1964,8 @@ static int setattr_chown(struct inode *inode, struct iattr *attr)
  * Returns: errno
  */
 
-static int gfs2_setattr(struct dentry *dentry, struct iattr *attr)
+static int gfs2_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct gfs2_inode *ip = GFS2_I(inode);
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index b52ecf4ffe63..6c8db2361ed6 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -99,7 +99,7 @@ extern int gfs2_inode_refresh(struct gfs2_inode *ip);
 
 extern struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
 				  int is_root);
-extern int gfs2_permission(struct inode *inode, int mask);
+extern int gfs2_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 extern int gfs2_setattr_simple(struct inode *inode, struct iattr *attr);
 extern struct inode *gfs2_lookup_simple(struct inode *dip, const char *name);
 extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf);
diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c
index 3bf2ae0e467c..def978306b8a 100644
--- a/fs/hfs/dir.c
+++ b/fs/hfs/dir.c
@@ -189,8 +189,8 @@ static int hfs_dir_release(struct inode *inode, struct file *file)
  * a directory and return a corresponding inode, given the inode for
  * the directory and the name (and its length) of the new file.
  */
-static int hfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		      bool excl)
+static int hfs_create(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct inode *inode;
 	int res;
@@ -219,7 +219,8 @@ static int hfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
  * in a directory, given the inode for the parent directory and the
  * name (and its length) of the new directory.
  */
-static int hfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int hfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 	int res;
@@ -279,9 +280,9 @@ static int hfs_remove(struct inode *dir, struct dentry *dentry)
  * new file/directory.
  * XXX: how do you handle must_be dir?
  */
-static int hfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		      struct inode *new_dir, struct dentry *new_dentry,
-		      unsigned int flags)
+static int hfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		      struct dentry *old_dentry, struct inode *new_dir,
+		      struct dentry *new_dentry, unsigned int flags)
 {
 	int res;
 
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index f71c384064c8..e84985416511 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -204,7 +204,7 @@ extern const struct address_space_operations hfs_btree_aops;
 extern struct inode *hfs_new_inode(struct inode *, const struct qstr *, umode_t);
 extern void hfs_inode_write_fork(struct inode *, struct hfs_extent *, __be32 *, __be32 *);
 extern int hfs_write_inode(struct inode *, struct writeback_control *);
-extern int hfs_inode_setattr(struct dentry *, struct iattr *);
+extern int hfs_inode_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 extern void hfs_inode_read_fork(struct inode *inode, struct hfs_extent *ext,
 			__be32 log_size, __be32 phys_size, u32 clump_size);
 extern struct inode *hfs_iget(struct super_block *, struct hfs_cat_key *, hfs_cat_rec *);
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index c646218b72bf..00c52ca4d57c 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -602,7 +602,8 @@ static int hfs_file_release(struct inode *inode, struct file *file)
  *     correspond to the same HFS file.
  */
 
-int hfs_inode_setattr(struct dentry *dentry, struct iattr * attr)
+int hfs_inode_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		      struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct hfs_sb_info *hsb = HFS_SB(inode->i_sb);
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index 29a9dcfbe81f..ad3ac9dc0ed0 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -434,8 +434,8 @@ static int hfsplus_rmdir(struct inode *dir, struct dentry *dentry)
 	return res;
 }
 
-static int hfsplus_symlink(struct inode *dir, struct dentry *dentry,
-			   const char *symname)
+static int hfsplus_symlink(struct user_namespace *user_ns, struct inode *dir,
+			   struct dentry *dentry, const char *symname)
 {
 	struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
 	struct inode *inode;
@@ -476,8 +476,8 @@ static int hfsplus_symlink(struct inode *dir, struct dentry *dentry,
 	return res;
 }
 
-static int hfsplus_mknod(struct inode *dir, struct dentry *dentry,
-			 umode_t mode, dev_t rdev)
+static int hfsplus_mknod(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
 	struct inode *inode;
@@ -517,20 +517,21 @@ static int hfsplus_mknod(struct inode *dir, struct dentry *dentry,
 	return res;
 }
 
-static int hfsplus_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-			  bool excl)
+static int hfsplus_create(struct user_namespace *user_ns, struct inode *dir,
+			  struct dentry *dentry, umode_t mode, bool excl)
 {
-	return hfsplus_mknod(dir, dentry, mode, 0);
+	return hfsplus_mknod(user_ns, dir, dentry, mode, 0);
 }
 
-static int hfsplus_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int hfsplus_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode)
 {
-	return hfsplus_mknod(dir, dentry, mode | S_IFDIR, 0);
+	return hfsplus_mknod(user_ns, dir, dentry, mode | S_IFDIR, 0);
 }
 
-static int hfsplus_rename(struct inode *old_dir, struct dentry *old_dentry,
-			  struct inode *new_dir, struct dentry *new_dentry,
-			  unsigned int flags)
+static int hfsplus_rename(struct user_namespace *user_ns, struct inode *old_dir,
+			  struct dentry *old_dentry, struct inode *new_dir,
+			  struct dentry *new_dentry, unsigned int flags)
 {
 	int res;
 
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index a34f37a53dcd..85343018dbba 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -241,7 +241,8 @@ static int hfsplus_file_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static int hfsplus_setattr(struct dentry *dentry, struct iattr *attr)
+static int hfsplus_setattr(struct user_namespace *user_ns,
+			   struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index a00899c8fd52..9b93650e81f4 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -555,8 +555,8 @@ static int read_name(struct inode *ino, char *name)
 	return 0;
 }
 
-static int hostfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-			 bool excl)
+static int hostfs_create(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct inode *inode;
 	char *name;
@@ -654,8 +654,8 @@ static int hostfs_unlink(struct inode *ino, struct dentry *dentry)
 	return err;
 }
 
-static int hostfs_symlink(struct inode *ino, struct dentry *dentry,
-			  const char *to)
+static int hostfs_symlink(struct user_namespace *user_ns, struct inode *ino,
+			  struct dentry *dentry, const char *to)
 {
 	char *file;
 	int err;
@@ -667,7 +667,8 @@ static int hostfs_symlink(struct inode *ino, struct dentry *dentry,
 	return err;
 }
 
-static int hostfs_mkdir(struct inode *ino, struct dentry *dentry, umode_t mode)
+static int hostfs_mkdir(struct user_namespace *user_ns, struct inode *ino,
+			struct dentry *dentry, umode_t mode)
 {
 	char *file;
 	int err;
@@ -691,7 +692,8 @@ static int hostfs_rmdir(struct inode *ino, struct dentry *dentry)
 	return err;
 }
 
-static int hostfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
+static int hostfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	struct inode *inode;
 	char *name;
@@ -729,9 +731,9 @@ static int hostfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
 	return err;
 }
 
-static int hostfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
-			  struct inode *new_dir, struct dentry *new_dentry,
-			  unsigned int flags)
+static int hostfs_rename2(struct user_namespace *user_ns, struct inode *old_dir,
+			  struct dentry *old_dentry, struct inode *new_dir,
+			  struct dentry *new_dentry, unsigned int flags)
 {
 	char *old_name, *new_name;
 	int err;
@@ -757,7 +759,7 @@ static int hostfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
 	return err;
 }
 
-static int hostfs_permission(struct inode *ino, int desired)
+static int hostfs_permission(struct user_namespace *user_ns, struct inode *ino, int desired)
 {
 	char *name;
 	int r = 0, w = 0, x = 0, err;
@@ -783,7 +785,8 @@ static int hostfs_permission(struct inode *ino, int desired)
 	return err;
 }
 
-static int hostfs_setattr(struct dentry *dentry, struct iattr *attr)
+static int hostfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			  struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct hostfs_iattr attrs;
diff --git a/fs/hpfs/hpfs_fn.h b/fs/hpfs/hpfs_fn.h
index 1cca83218fb5..167ec6884642 100644
--- a/fs/hpfs/hpfs_fn.h
+++ b/fs/hpfs/hpfs_fn.h
@@ -280,7 +280,7 @@ void hpfs_init_inode(struct inode *);
 void hpfs_read_inode(struct inode *);
 void hpfs_write_inode(struct inode *);
 void hpfs_write_inode_nolock(struct inode *);
-int hpfs_setattr(struct dentry *, struct iattr *);
+int hpfs_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 void hpfs_write_if_changed(struct inode *);
 void hpfs_evict_inode(struct inode *);
 
diff --git a/fs/hpfs/inode.c b/fs/hpfs/inode.c
index 8ba2152a78ba..b75d23656b28 100644
--- a/fs/hpfs/inode.c
+++ b/fs/hpfs/inode.c
@@ -257,7 +257,8 @@ void hpfs_write_inode_nolock(struct inode *i)
 	brelse(bh);
 }
 
-int hpfs_setattr(struct dentry *dentry, struct iattr *attr)
+int hpfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		 struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error = -EINVAL;
diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
index 1aee39160ac5..42a03bca9557 100644
--- a/fs/hpfs/namei.c
+++ b/fs/hpfs/namei.c
@@ -20,7 +20,8 @@ static void hpfs_update_directory_times(struct inode *dir)
 	hpfs_write_inode_nolock(dir);
 }
 
-static int hpfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int hpfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	const unsigned char *name = dentry->d_name.name;
 	unsigned len = dentry->d_name.len;
@@ -128,7 +129,8 @@ static int hpfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return err;
 }
 
-static int hpfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl)
+static int hpfs_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
 	const unsigned char *name = dentry->d_name.name;
 	unsigned len = dentry->d_name.len;
@@ -215,7 +217,8 @@ static int hpfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, b
 	return err;
 }
 
-static int hpfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rdev)
+static int hpfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	const unsigned char *name = dentry->d_name.name;
 	unsigned len = dentry->d_name.len;
@@ -289,7 +292,8 @@ static int hpfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, de
 	return err;
 }
 
-static int hpfs_symlink(struct inode *dir, struct dentry *dentry, const char *symlink)
+static int hpfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, const char *symlink)
 {
 	const unsigned char *name = dentry->d_name.name;
 	unsigned len = dentry->d_name.len;
@@ -506,10 +510,10 @@ static int hpfs_symlink_readpage(struct file *file, struct page *page)
 const struct address_space_operations hpfs_symlink_aops = {
 	.readpage	= hpfs_symlink_readpage
 };
-	
-static int hpfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		       struct inode *new_dir, struct dentry *new_dentry,
-		       unsigned int flags)
+
+static int hpfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		       struct dentry *old_dentry, struct inode *new_dir,
+		       struct dentry *new_dentry, unsigned int flags)
 {
 	const unsigned char *old_name = old_dentry->d_name.name;
 	unsigned old_len = old_dentry->d_name.len;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 53d57d852073..3ef737db2915 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -751,7 +751,8 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 	return error;
 }
 
-static int hugetlbfs_setattr(struct dentry *dentry, struct iattr *attr)
+static int hugetlbfs_setattr(struct user_namespace *user_ns,
+			     struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct hstate *h = hstate_inode(inode);
@@ -898,33 +899,35 @@ static int do_hugetlbfs_mknod(struct inode *dir,
 	return error;
 }
 
-static int hugetlbfs_mknod(struct inode *dir,
-			struct dentry *dentry, umode_t mode, dev_t dev)
+static int hugetlbfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+			   struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	return do_hugetlbfs_mknod(dir, dentry, mode, dev, false);
 }
 
-static int hugetlbfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int hugetlbfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			   struct dentry *dentry, umode_t mode)
 {
-	int retval = hugetlbfs_mknod(dir, dentry, mode | S_IFDIR, 0);
+	int retval = hugetlbfs_mknod(user_ns, dir, dentry, mode | S_IFDIR, 0);
 	if (!retval)
 		inc_nlink(dir);
 	return retval;
 }
 
-static int hugetlbfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl)
+static int hugetlbfs_create(struct user_namespace *user_ns, struct inode *dir,
+			    struct dentry *dentry, umode_t mode, bool excl)
 {
-	return hugetlbfs_mknod(dir, dentry, mode | S_IFREG, 0);
+	return hugetlbfs_mknod(user_ns, dir, dentry, mode | S_IFREG, 0);
 }
 
-static int hugetlbfs_tmpfile(struct inode *dir,
-			struct dentry *dentry, umode_t mode)
+static int hugetlbfs_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+			     struct dentry *dentry, umode_t mode)
 {
 	return do_hugetlbfs_mknod(dir, dentry, mode | S_IFREG, 0, true);
 }
 
-static int hugetlbfs_symlink(struct inode *dir,
-			struct dentry *dentry, const char *symname)
+static int hugetlbfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			     struct dentry *dentry, const char *symname)
 {
 	struct inode *inode;
 	int error = -ENOSPC;
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index cf07a2fdf8bf..71325c8d05d3 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -226,7 +226,8 @@ static int __jffs2_set_acl(struct inode *inode, int xprefix, struct posix_acl *a
 	return rc;
 }
 
-int jffs2_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int jffs2_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		  struct posix_acl *acl, int type)
 {
 	int rc, xprefix;
 
diff --git a/fs/jffs2/acl.h b/fs/jffs2/acl.h
index 12d0271bdde3..57b61d93ee11 100644
--- a/fs/jffs2/acl.h
+++ b/fs/jffs2/acl.h
@@ -28,7 +28,8 @@ struct jffs2_acl_header {
 #ifdef CONFIG_JFFS2_FS_POSIX_ACL
 
 struct posix_acl *jffs2_get_acl(struct inode *inode, int type);
-int jffs2_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+int jffs2_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		  struct posix_acl *acl, int type);
 extern int jffs2_init_acl_pre(struct inode *, struct inode *, umode_t *);
 extern int jffs2_init_acl_post(struct inode *);
 
diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 776493713153..b94fdc64f4ad 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -24,17 +24,18 @@
 
 static int jffs2_readdir (struct file *, struct dir_context *);
 
-static int jffs2_create (struct inode *,struct dentry *,umode_t,
-			 bool);
+static int jffs2_create (struct user_namespace *, struct inode *,
+		         struct dentry *, umode_t, bool);
 static struct dentry *jffs2_lookup (struct inode *,struct dentry *,
 				    unsigned int);
 static int jffs2_link (struct dentry *,struct inode *,struct dentry *);
 static int jffs2_unlink (struct inode *,struct dentry *);
-static int jffs2_symlink (struct inode *,struct dentry *,const char *);
-static int jffs2_mkdir (struct inode *,struct dentry *,umode_t);
+static int jffs2_symlink (struct user_namespace *, struct inode *,
+			  struct dentry *, const char *);
+static int jffs2_mkdir (struct user_namespace *, struct inode *,struct dentry *,umode_t);
 static int jffs2_rmdir (struct inode *,struct dentry *);
-static int jffs2_mknod (struct inode *,struct dentry *,umode_t,dev_t);
-static int jffs2_rename (struct inode *, struct dentry *,
+static int jffs2_mknod (struct user_namespace *, struct inode *,struct dentry *,umode_t,dev_t);
+static int jffs2_rename (struct user_namespace *, struct inode *, struct dentry *,
 			 struct inode *, struct dentry *,
 			 unsigned int);
 
@@ -157,8 +158,8 @@ static int jffs2_readdir(struct file *file, struct dir_context *ctx)
 /***********************************************************************/
 
 
-static int jffs2_create(struct inode *dir_i, struct dentry *dentry,
-			umode_t mode, bool excl)
+static int jffs2_create(struct user_namespace *user_ns, struct inode *dir_i,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct jffs2_raw_inode *ri;
 	struct jffs2_inode_info *f, *dir_f;
@@ -276,7 +277,8 @@ static int jffs2_link (struct dentry *old_dentry, struct inode *dir_i, struct de
 
 /***********************************************************************/
 
-static int jffs2_symlink (struct inode *dir_i, struct dentry *dentry, const char *target)
+static int jffs2_symlink (struct user_namespace *user_ns, struct inode *dir_i,
+			  struct dentry *dentry, const char *target)
 {
 	struct jffs2_inode_info *f, *dir_f;
 	struct jffs2_sb_info *c;
@@ -438,7 +440,8 @@ static int jffs2_symlink (struct inode *dir_i, struct dentry *dentry, const char
 }
 
 
-static int jffs2_mkdir (struct inode *dir_i, struct dentry *dentry, umode_t mode)
+static int jffs2_mkdir (struct user_namespace *user_ns, struct inode *dir_i,
+		        struct dentry *dentry, umode_t mode)
 {
 	struct jffs2_inode_info *f, *dir_f;
 	struct jffs2_sb_info *c;
@@ -609,7 +612,8 @@ static int jffs2_rmdir (struct inode *dir_i, struct dentry *dentry)
 	return ret;
 }
 
-static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, umode_t mode, dev_t rdev)
+static int jffs2_mknod (struct user_namespace *user_ns, struct inode *dir_i,
+		        struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct jffs2_inode_info *f, *dir_f;
 	struct jffs2_sb_info *c;
@@ -756,9 +760,9 @@ static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, umode_t mode
 	return ret;
 }
 
-static int jffs2_rename (struct inode *old_dir_i, struct dentry *old_dentry,
-			 struct inode *new_dir_i, struct dentry *new_dentry,
-			 unsigned int flags)
+static int jffs2_rename (struct user_namespace *user_ns, struct inode *old_dir_i,
+		         struct dentry *old_dentry, struct inode *new_dir_i,
+		         struct dentry *new_dentry, unsigned int flags)
 {
 	int ret;
 	struct jffs2_sb_info *c = JFFS2_SB_INFO(old_dir_i->i_sb);
diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c
index ee9f51bab4c6..31e405659e98 100644
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -190,7 +190,8 @@ int jffs2_do_setattr (struct inode *inode, struct iattr *iattr)
 	return 0;
 }
 
-int jffs2_setattr(struct dentry *dentry, struct iattr *iattr)
+int jffs2_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *iattr)
 {
 	struct inode *inode = d_inode(dentry);
 	int rc;
diff --git a/fs/jffs2/os-linux.h b/fs/jffs2/os-linux.h
index ef1cfa61549e..173eccac691d 100644
--- a/fs/jffs2/os-linux.h
+++ b/fs/jffs2/os-linux.h
@@ -164,7 +164,7 @@ long jffs2_ioctl(struct file *, unsigned int, unsigned long);
 extern const struct inode_operations jffs2_symlink_inode_operations;
 
 /* fs.c */
-int jffs2_setattr (struct dentry *, struct iattr *);
+int jffs2_setattr (struct user_namespace *, struct dentry *, struct iattr *);
 int jffs2_do_setattr (struct inode *, struct iattr *);
 struct inode *jffs2_iget(struct super_block *, unsigned long);
 void jffs2_evict_inode (struct inode *);
diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
index cf79a34bfada..caa04c7f4609 100644
--- a/fs/jfs/acl.c
+++ b/fs/jfs/acl.c
@@ -91,7 +91,8 @@ static int __jfs_set_acl(tid_t tid, struct inode *inode, int type,
 	return rc;
 }
 
-int jfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int jfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		struct posix_acl *acl, int type)
 {
 	int rc;
 	tid_t tid;
diff --git a/fs/jfs/file.c b/fs/jfs/file.c
index 61c3b0c1fbf6..d427aa4812d2 100644
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@@ -85,7 +85,8 @@ static int jfs_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-int jfs_setattr(struct dentry *dentry, struct iattr *iattr)
+int jfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		struct iattr *iattr)
 {
 	struct inode *inode = d_inode(dentry);
 	int rc;
diff --git a/fs/jfs/jfs_acl.h b/fs/jfs/jfs_acl.h
index 9f8f92dd6f84..200d7b77d6f5 100644
--- a/fs/jfs/jfs_acl.h
+++ b/fs/jfs/jfs_acl.h
@@ -8,7 +8,8 @@
 #ifdef CONFIG_JFS_POSIX_ACL
 
 struct posix_acl *jfs_get_acl(struct inode *inode, int type);
-int jfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+int jfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		struct posix_acl *acl, int type);
 int jfs_init_acl(tid_t, struct inode *, struct inode *);
 
 #else
diff --git a/fs/jfs/jfs_inode.h b/fs/jfs/jfs_inode.h
index 70a0d12e427e..01daa0cb0ae5 100644
--- a/fs/jfs/jfs_inode.h
+++ b/fs/jfs/jfs_inode.h
@@ -26,7 +26,7 @@ extern struct dentry *jfs_fh_to_parent(struct super_block *sb, struct fid *fid,
 	int fh_len, int fh_type);
 extern void jfs_set_inode_flags(struct inode *);
 extern int jfs_get_block(struct inode *, sector_t, struct buffer_head *, int);
-extern int jfs_setattr(struct dentry *, struct iattr *);
+extern int jfs_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 
 extern const struct address_space_operations jfs_aops;
 extern const struct inode_operations jfs_dir_inode_operations;
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index 7a55d14cc1af..472efe501e28 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -59,8 +59,8 @@ static inline void free_ea_wmap(struct inode *inode)
  * RETURN:	Errors from subroutines
  *
  */
-static int jfs_create(struct inode *dip, struct dentry *dentry, umode_t mode,
-		bool excl)
+static int jfs_create(struct user_namespace *user_ns, struct inode *dip,
+		      struct dentry *dentry, umode_t mode, bool excl)
 {
 	int rc = 0;
 	tid_t tid;		/* transaction id */
@@ -192,7 +192,8 @@ static int jfs_create(struct inode *dip, struct dentry *dentry, umode_t mode,
  * note:
  * EACCES: user needs search+write permission on the parent directory
  */
-static int jfs_mkdir(struct inode *dip, struct dentry *dentry, umode_t mode)
+static int jfs_mkdir(struct user_namespace *user_ns, struct inode *dip,
+		     struct dentry *dentry, umode_t mode)
 {
 	int rc = 0;
 	tid_t tid;		/* transaction id */
@@ -868,8 +869,8 @@ static int jfs_link(struct dentry *old_dentry,
  * an intermediate result whose length exceeds PATH_MAX [XPG4.2]
 */
 
-static int jfs_symlink(struct inode *dip, struct dentry *dentry,
-		const char *name)
+static int jfs_symlink(struct user_namespace *user_ns, struct inode *dip,
+		       struct dentry *dentry, const char *name)
 {
 	int rc;
 	tid_t tid;
@@ -1058,9 +1059,9 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
  *
  * FUNCTION:	rename a file or directory
  */
-static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		      struct inode *new_dir, struct dentry *new_dentry,
-		      unsigned int flags)
+static int jfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		      struct dentry *old_dentry, struct inode *new_dir,
+		      struct dentry *new_dentry, unsigned int flags)
 {
 	struct btstack btstack;
 	ino_t ino;
@@ -1344,8 +1345,8 @@ static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
  *
  * FUNCTION:	Create a special file (device)
  */
-static int jfs_mknod(struct inode *dir, struct dentry *dentry,
-		umode_t mode, dev_t rdev)
+static int jfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct jfs_inode_info *jfs_ip;
 	struct btstack btstack;
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 9aec80b9d7c6..9e9a8266ca76 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1111,8 +1111,8 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
 	return ret;
 }
 
-static int kernfs_iop_mkdir(struct inode *dir, struct dentry *dentry,
-			    umode_t mode)
+static int kernfs_iop_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			    struct dentry *dentry, umode_t mode)
 {
 	struct kernfs_node *parent = dir->i_private;
 	struct kernfs_syscall_ops *scops = kernfs_root(parent)->syscall_ops;
@@ -1148,7 +1148,8 @@ static int kernfs_iop_rmdir(struct inode *dir, struct dentry *dentry)
 	return ret;
 }
 
-static int kernfs_iop_rename(struct inode *old_dir, struct dentry *old_dentry,
+static int kernfs_iop_rename(struct user_namespace *user_ns,
+			     struct inode *old_dir, struct dentry *old_dentry,
 			     struct inode *new_dir, struct dentry *new_dentry,
 			     unsigned int flags)
 {
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 062593491bbe..8a49b1c2ceed 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -112,7 +112,8 @@ int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
 	return ret;
 }
 
-int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
+int kernfs_iop_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		       struct iattr *iattr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct kernfs_node *kn = inode->i_private;
@@ -272,7 +273,7 @@ void kernfs_evict_inode(struct inode *inode)
 	kernfs_put(kn);
 }
 
-int kernfs_iop_permission(struct inode *inode, int mask)
+int kernfs_iop_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct kernfs_node *kn;
 
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 7ee97ef59184..3b54bff5fc49 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -89,8 +89,9 @@ extern struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache;
  */
 extern const struct xattr_handler *kernfs_xattr_handlers[];
 void kernfs_evict_inode(struct inode *inode);
-int kernfs_iop_permission(struct inode *inode, int mask);
-int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr);
+int kernfs_iop_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
+int kernfs_iop_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		       struct iattr *iattr);
 int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
 		       u32 request_mask, unsigned int query_flags);
 ssize_t kernfs_iop_listxattr(struct dentry *dentry, char *buf, size_t size);
diff --git a/fs/libfs.c b/fs/libfs.c
index 6aa8bead838f..e97e2c93c916 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -447,9 +447,9 @@ int simple_rmdir(struct inode *dir, struct dentry *dentry)
 }
 EXPORT_SYMBOL(simple_rmdir);
 
-int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
-		  struct inode *new_dir, struct dentry *new_dentry,
-		  unsigned int flags)
+int simple_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		  struct dentry *old_dentry, struct inode *new_dir,
+		  struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *inode = d_inode(old_dentry);
 	int they_are_dirs = d_is_dir(old_dentry);
@@ -492,18 +492,19 @@ EXPORT_SYMBOL(simple_rename);
  * on simple regular filesystems.  Anything that needs to change on-disk
  * or wire state on size changes needs its own setattr method.
  */
-int simple_setattr(struct dentry *dentry, struct iattr *iattr)
+int simple_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		   struct iattr *iattr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
 
-	error = setattr_prepare(&init_user_ns, dentry, iattr);
+	error = setattr_prepare(user_ns, dentry, iattr);
 	if (error)
 		return error;
 
 	if (iattr->ia_valid & ATTR_SIZE)
 		truncate_setsize(inode, iattr->ia_size);
-	setattr_copy(&init_user_ns, inode, iattr);
+	setattr_copy(user_ns, inode, iattr);
 	mark_inode_dirty(inode);
 	return 0;
 }
@@ -1306,7 +1307,8 @@ static int empty_dir_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
-static int empty_dir_setattr(struct dentry *dentry, struct iattr *attr)
+static int empty_dir_setattr(struct user_namespace *user_ns,
+			     struct dentry *dentry, struct iattr *attr)
 {
 	return -EPERM;
 }
@@ -1316,14 +1318,9 @@ static ssize_t empty_dir_listxattr(struct dentry *dentry, char *list, size_t siz
 	return -EOPNOTSUPP;
 }
 
-static int empty_dir_permission(struct inode *inode, int mask)
-{
-	return generic_permission(&init_user_ns, inode, mask);
-}
-
 static const struct inode_operations empty_dir_inode_operations = {
 	.lookup		= empty_dir_lookup,
-	.permission	= empty_dir_permission,
+	.permission	= generic_permission,
 	.setattr	= empty_dir_setattr,
 	.getattr	= empty_dir_getattr,
 	.listxattr	= empty_dir_listxattr,
diff --git a/fs/minix/file.c b/fs/minix/file.c
index f07acd268577..02bfc86b6867 100644
--- a/fs/minix/file.c
+++ b/fs/minix/file.c
@@ -22,7 +22,8 @@ const struct file_operations minix_file_operations = {
 	.splice_read	= generic_file_splice_read,
 };
 
-static int minix_setattr(struct dentry *dentry, struct iattr *attr)
+static int minix_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			 struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
diff --git a/fs/minix/namei.c b/fs/minix/namei.c
index 1a6084d2b02e..3ab9efe6bdcb 100644
--- a/fs/minix/namei.c
+++ b/fs/minix/namei.c
@@ -33,7 +33,8 @@ static struct dentry *minix_lookup(struct inode * dir, struct dentry *dentry, un
 	return d_splice_alias(inode, dentry);
 }
 
-static int minix_mknod(struct inode * dir, struct dentry *dentry, umode_t mode, dev_t rdev)
+static int minix_mknod(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	int error;
 	struct inode *inode;
@@ -51,7 +52,8 @@ static int minix_mknod(struct inode * dir, struct dentry *dentry, umode_t mode,
 	return error;
 }
 
-static int minix_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int minix_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode)
 {
 	int error;
 	struct inode *inode = minix_new_inode(dir, mode, &error);
@@ -63,14 +65,14 @@ static int minix_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return error;
 }
 
-static int minix_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		bool excl)
+static int minix_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
-	return minix_mknod(dir, dentry, mode, 0);
+	return minix_mknod(user_ns, dir, dentry, mode, 0);
 }
 
-static int minix_symlink(struct inode * dir, struct dentry *dentry,
-	  const char * symname)
+static int minix_symlink(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, const char *symname)
 {
 	int err = -ENAMETOOLONG;
 	int i = strlen(symname)+1;
@@ -109,7 +111,8 @@ static int minix_link(struct dentry * old_dentry, struct inode * dir,
 	return add_nondir(dentry, inode);
 }
 
-static int minix_mkdir(struct inode * dir, struct dentry *dentry, umode_t mode)
+static int minix_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct inode * inode;
 	int err;
@@ -181,9 +184,9 @@ static int minix_rmdir(struct inode * dir, struct dentry *dentry)
 	return err;
 }
 
-static int minix_rename(struct inode * old_dir, struct dentry *old_dentry,
-			struct inode * new_dir, struct dentry *new_dentry,
-			unsigned int flags)
+static int minix_rename(struct user_namespace *user_ns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode * old_inode = d_inode(old_dentry);
 	struct inode * new_inode = d_inode(new_dentry);
diff --git a/fs/namei.c b/fs/namei.c
index 5601b6680d4c..1d6a0da8bf81 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -402,7 +402,7 @@ static inline int do_inode_permission(struct user_namespace *user_ns,
 {
 	if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
 		if (likely(inode->i_op->permission))
-			return inode->i_op->permission(inode, mask);
+			return inode->i_op->permission(user_ns, inode, mask);
 
 		/* This gets set once for the inode lifetime */
 		spin_lock(&inode->i_lock);
@@ -2834,7 +2834,7 @@ int vfs_create(struct user_namespace *user_ns, struct inode *dir,
 	error = security_inode_create(dir, dentry, mode);
 	if (error)
 		return error;
-	error = dir->i_op->create(dir, dentry, mode, want_excl);
+	error = dir->i_op->create(user_ns, dir, dentry, mode, want_excl);
 	if (!error)
 		fsnotify_create(dir, dentry);
 	return error;
@@ -3130,14 +3130,18 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 
 	/* Negative dentry, just create the file */
 	if (!dentry->d_inode && (open_flag & O_CREAT)) {
+		struct user_namespace *user_ns;
+
 		file->f_mode |= FMODE_CREATED;
 		audit_inode_child(dir_inode, dentry, AUDIT_TYPE_CHILD_CREATE);
 		if (!dir_inode->i_op->create) {
 			error = -EACCES;
 			goto out_dput;
 		}
-		error = dir_inode->i_op->create(dir_inode, dentry, mode,
-						open_flag & O_EXCL);
+
+		user_ns = mnt_user_ns(nd->path.mnt);
+		error = dir_inode->i_op->create(user_ns, dir_inode, dentry,
+						mode, open_flag & O_EXCL);
 		if (error)
 			goto out_dput;
 	}
@@ -3317,7 +3321,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *user_ns,
 	child = d_alloc(dentry, &slash_name);
 	if (unlikely(!child))
 		goto out_err;
-	error = dir->i_op->tmpfile(dir, child, mode);
+	error = dir->i_op->tmpfile(user_ns, dir, child, mode);
 	if (error)
 		goto out_err;
 	error = -ENOENT;
@@ -3588,7 +3592,7 @@ int vfs_mknod(struct user_namespace *user_ns, struct inode *dir,
 	if (error)
 		return error;
 
-	error = dir->i_op->mknod(dir, dentry, mode, dev);
+	error = dir->i_op->mknod(user_ns, dir, dentry, mode, dev);
 	if (!error)
 		fsnotify_create(dir, dentry);
 	return error;
@@ -3692,7 +3696,7 @@ int vfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
 	if (max_links && dir->i_nlink >= max_links)
 		return -EMLINK;
 
-	error = dir->i_op->mkdir(dir, dentry, mode);
+	error = dir->i_op->mkdir(user_ns, dir, dentry, mode);
 	if (!error)
 		fsnotify_mkdir(dir, dentry);
 	return error;
@@ -4013,7 +4017,7 @@ int vfs_symlink(struct user_namespace *user_ns, struct inode *dir,
 	if (error)
 		return error;
 
-	error = dir->i_op->symlink(dir, dentry, oldname);
+	error = dir->i_op->symlink(user_ns, dir, dentry, oldname);
 	if (!error)
 		fsnotify_create(dir, dentry);
 	return error;
@@ -4378,8 +4382,8 @@ int vfs_rename(struct renamedata *rd)
 		if (error)
 			goto out;
 	}
-	error = old_dir->i_op->rename(old_dir, old_dentry,
-				       new_dir, new_dentry, flags);
+	error = old_dir->i_op->rename(rd->new_user_ns, old_dir, old_dentry,
+				      new_dir, new_dentry, flags);
 	if (error)
 		goto out;
 
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c890144c496a..b0404f90134d 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1901,8 +1901,8 @@ EXPORT_SYMBOL_GPL(nfs_instantiate);
  * that the operation succeeded on the server, but an error in the
  * reply path made it appear to have failed.
  */
-int nfs_create(struct inode *dir, struct dentry *dentry,
-		umode_t mode, bool excl)
+int nfs_create(struct user_namespace *user_ns, struct inode *dir,
+	       struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct iattr attr;
 	int open_flags = excl ? O_CREAT | O_EXCL : O_CREAT;
@@ -1930,7 +1930,8 @@ EXPORT_SYMBOL_GPL(nfs_create);
  * See comments for nfs_proc_create regarding failed operations.
  */
 int
-nfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rdev)
+nfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+	  struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct iattr attr;
 	int status;
@@ -1956,7 +1957,8 @@ EXPORT_SYMBOL_GPL(nfs_mknod);
 /*
  * See comments for nfs_proc_create regarding failed operations.
  */
-int nfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+int nfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+	      struct dentry *dentry, umode_t mode)
 {
 	struct iattr attr;
 	int error;
@@ -2101,7 +2103,8 @@ EXPORT_SYMBOL_GPL(nfs_unlink);
  * now have a new file handle and can instantiate an in-core NFS inode
  * and move the raw page into its mapping.
  */
-int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
+int nfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+		struct dentry *dentry, const char *symname)
 {
 	struct page *page;
 	char *kaddr;
@@ -2204,9 +2207,9 @@ EXPORT_SYMBOL_GPL(nfs_link);
  * If these conditions are met, we can drop the dentries before doing
  * the rename.
  */
-int nfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-	       struct inode *new_dir, struct dentry *new_dentry,
-	       unsigned int flags)
+int nfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+	       struct dentry *old_dentry, struct inode *new_dir,
+	       struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *old_inode = d_inode(old_dentry);
 	struct inode *new_inode = d_inode(new_dentry);
@@ -2745,7 +2748,7 @@ static int nfs_execute_ok(struct inode *inode, int mask)
 	return ret;
 }
 
-int nfs_permission(struct inode *inode, int mask)
+int nfs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	const struct cred *cred = current_cred();
 	int res = 0;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 8d74974d7992..8365131773ce 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -595,7 +595,8 @@ EXPORT_SYMBOL_GPL(nfs_fhget);
 #define NFS_VALID_ATTRS (ATTR_MODE|ATTR_UID|ATTR_GID|ATTR_SIZE|ATTR_ATIME|ATTR_ATIME_SET|ATTR_MTIME|ATTR_MTIME_SET|ATTR_FILE|ATTR_OPEN)
 
 int
-nfs_setattr(struct dentry *dentry, struct iattr *attr)
+nfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+	    struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct nfs_fattr *fattr;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 6673a77884d9..9db2fbe8c9b0 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -364,14 +364,14 @@ extern unsigned long nfs_access_cache_count(struct shrinker *shrink,
 extern unsigned long nfs_access_cache_scan(struct shrinker *shrink,
 					   struct shrink_control *sc);
 struct dentry *nfs_lookup(struct inode *, struct dentry *, unsigned int);
-int nfs_create(struct inode *, struct dentry *, umode_t, bool);
-int nfs_mkdir(struct inode *, struct dentry *, umode_t);
+int nfs_create(struct user_namespace *, struct inode *, struct dentry *, umode_t, bool);
+int nfs_mkdir(struct user_namespace *, struct inode *, struct dentry *, umode_t);
 int nfs_rmdir(struct inode *, struct dentry *);
 int nfs_unlink(struct inode *, struct dentry *);
-int nfs_symlink(struct inode *, struct dentry *, const char *);
+int nfs_symlink(struct user_namespace *, struct inode *, struct dentry *, const char *);
 int nfs_link(struct dentry *, struct inode *, struct dentry *);
-int nfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
-int nfs_rename(struct inode *, struct dentry *,
+int nfs_mknod(struct user_namespace *, struct inode *, struct dentry *, umode_t, dev_t);
+int nfs_rename(struct user_namespace *, struct inode *, struct dentry *,
 	       struct inode *, struct dentry *, unsigned int);
 
 /* file.c */
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 55fc711e368b..1e1d77b6dbbb 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -218,10 +218,11 @@ nfs_namespace_getattr(const struct path *path, struct kstat *stat,
 }
 
 static int
-nfs_namespace_setattr(struct dentry *dentry, struct iattr *attr)
+nfs_namespace_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		      struct iattr *attr)
 {
 	if (NFS_FH(d_inode(dentry))->size != 0)
-		return nfs_setattr(dentry, attr);
+		return nfs_setattr(user_ns, dentry, attr);
 	return -EACCES;
 }
 
diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
index 1b950b66b3bb..4f2665b5d494 100644
--- a/fs/nfs/nfs3_fs.h
+++ b/fs/nfs/nfs3_fs.h
@@ -12,7 +12,8 @@
  */
 #ifdef CONFIG_NFS_V3_ACL
 extern struct posix_acl *nfs3_get_acl(struct inode *inode, int type);
-extern int nfs3_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+extern int nfs3_set_acl(struct user_namespace *user_ns, struct inode *inode,
+			struct posix_acl *acl, int type);
 extern int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
 		struct posix_acl *dfacl);
 extern ssize_t nfs3_listxattr(struct dentry *, char *, size_t);
diff --git a/fs/nfs/nfs3acl.c b/fs/nfs/nfs3acl.c
index c6c863382f37..fbf799d4453d 100644
--- a/fs/nfs/nfs3acl.c
+++ b/fs/nfs/nfs3acl.c
@@ -251,7 +251,8 @@ int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
 
 }
 
-int nfs3_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int nfs3_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type)
 {
 	struct posix_acl *orig = acl, *dfacl = NULL, *alloc;
 	int status;
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 9ac47c8d27a8..abe277d865a9 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -805,7 +805,8 @@ void nilfs_evict_inode(struct inode *inode)
 	 */
 }
 
-int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
+int nilfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *iattr)
 {
 	struct nilfs_transaction_info ti;
 	struct inode *inode = d_inode(dentry);
@@ -843,7 +844,7 @@ int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	return err;
 }
 
-int nilfs_permission(struct inode *inode, int mask)
+int nilfs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct nilfs_root *root = NILFS_I(inode)->i_root;
 
diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
index a6ec7961d4f5..f83acb62f41b 100644
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -72,8 +72,8 @@ nilfs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
  * If the create succeeds, we fill in the inode information
  * with d_instantiate().
  */
-static int nilfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-			bool excl)
+static int nilfs_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct inode *inode;
 	struct nilfs_transaction_info ti;
@@ -100,7 +100,8 @@ static int nilfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 }
 
 static int
-nilfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rdev)
+nilfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+	    struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct inode *inode;
 	struct nilfs_transaction_info ti;
@@ -124,8 +125,8 @@ nilfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rdev)
 	return err;
 }
 
-static int nilfs_symlink(struct inode *dir, struct dentry *dentry,
-			 const char *symname)
+static int nilfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, const char *symname)
 {
 	struct nilfs_transaction_info ti;
 	struct super_block *sb = dir->i_sb;
@@ -201,7 +202,8 @@ static int nilfs_link(struct dentry *old_dentry, struct inode *dir,
 	return err;
 }
 
-static int nilfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int nilfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 	struct nilfs_transaction_info ti;
@@ -338,9 +340,9 @@ static int nilfs_rmdir(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int nilfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-			struct inode *new_dir,	struct dentry *new_dentry,
-			unsigned int flags)
+static int nilfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *old_inode = d_inode(old_dentry);
 	struct inode *new_inode = d_inode(new_dentry);
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index f8450ee3fd06..045e11f1839c 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -267,9 +267,9 @@ extern struct inode *nilfs_iget_for_gc(struct super_block *sb,
 extern void nilfs_update_inode(struct inode *, struct buffer_head *, int);
 extern void nilfs_truncate(struct inode *);
 extern void nilfs_evict_inode(struct inode *);
-extern int nilfs_setattr(struct dentry *, struct iattr *);
+extern int nilfs_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 extern void nilfs_write_failed(struct address_space *mapping, loff_t to);
-int nilfs_permission(struct inode *inode, int mask);
+int nilfs_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 int nilfs_load_inode_block(struct inode *inode, struct buffer_head **pbh);
 extern int nilfs_inode_dirty(struct inode *);
 int nilfs_set_file_dirty(struct inode *inode, unsigned int nr_dirty);
diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index 7e64dbe93251..02c89b17e03d 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -262,7 +262,8 @@ static int ocfs2_set_acl(handle_t *handle,
 	return ret;
 }
 
-int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int ocfs2_iop_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		      struct posix_acl *acl, int type)
 {
 	struct buffer_head *bh = NULL;
 	int status, had_lock;
diff --git a/fs/ocfs2/acl.h b/fs/ocfs2/acl.h
index 127b13432146..9599bd65d49b 100644
--- a/fs/ocfs2/acl.h
+++ b/fs/ocfs2/acl.h
@@ -19,7 +19,8 @@ struct ocfs2_acl_entry {
 };
 
 struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type);
-int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+int ocfs2_iop_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		      struct posix_acl *acl, int type);
 extern int ocfs2_acl_chmod(struct inode *, struct buffer_head *);
 extern int ocfs2_init_acl(handle_t *, struct inode *, struct inode *,
 			  struct buffer_head *, struct buffer_head *,
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index d5e63ca81afe..a5f5cc1e85f2 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -190,7 +190,8 @@ static int dlmfs_file_release(struct inode *inode,
  * We do ->setattr() just to override size changes.  Our size is the size
  * of the LVB and nothing else.
  */
-static int dlmfs_file_setattr(struct dentry *dentry, struct iattr *attr)
+static int dlmfs_file_setattr(struct user_namespace *user_ns,
+			      struct dentry *dentry, struct iattr *attr)
 {
 	int error;
 	struct inode *inode = d_inode(dentry);
@@ -395,7 +396,8 @@ static struct inode *dlmfs_get_inode(struct inode *parent,
  * File creation. Allocate an inode, and we're done..
  */
 /* SMP-safe */
-static int dlmfs_mkdir(struct inode * dir,
+static int dlmfs_mkdir(struct user_namespace * user_ns,
+		       struct inode * dir,
 		       struct dentry * dentry,
 		       umode_t mode)
 {
@@ -443,7 +445,8 @@ static int dlmfs_mkdir(struct inode * dir,
 	return status;
 }
 
-static int dlmfs_create(struct inode *dir,
+static int dlmfs_create(struct user_namespace *user_ns,
+			struct inode *dir,
 			struct dentry *dentry,
 			umode_t mode,
 			bool excl)
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index a070d4c9b6ed..dae02363a37f 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1112,7 +1112,8 @@ static int ocfs2_extend_file(struct inode *inode,
 	return ret;
 }
 
-int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
+int ocfs2_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *attr)
 {
 	int status = 0, size_change;
 	int inode_locked = 0;
@@ -1330,7 +1331,7 @@ int ocfs2_getattr(const struct path *path, struct kstat *stat,
 	return err;
 }
 
-int ocfs2_permission(struct inode *inode, int mask)
+int ocfs2_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	int ret, had_lock;
 	struct ocfs2_lock_holder oh;
diff --git a/fs/ocfs2/file.h b/fs/ocfs2/file.h
index 4832cbceba5b..1886c3770b49 100644
--- a/fs/ocfs2/file.h
+++ b/fs/ocfs2/file.h
@@ -51,10 +51,11 @@ int ocfs2_extend_no_holes(struct inode *inode, struct buffer_head *di_bh,
 			  u64 new_i_size, u64 zero_to);
 int ocfs2_zero_extend(struct inode *inode, struct buffer_head *di_bh,
 		      loff_t zero_to);
-int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
+int ocfs2_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *attr);
 int ocfs2_getattr(const struct path *path, struct kstat *stat,
 		  u32 request_mask, unsigned int flags);
-int ocfs2_permission(struct inode *inode, int mask);
+int ocfs2_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 
 int ocfs2_should_update_atime(struct inode *inode,
 			      struct vfsmount *vfsmnt);
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 51a80acbb97e..268c75f3503b 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -221,7 +221,8 @@ static void ocfs2_cleanup_add_entry_failure(struct ocfs2_super *osb,
 	iput(inode);
 }
 
-static int ocfs2_mknod(struct inode *dir,
+static int ocfs2_mknod(struct user_namespace *user_ns,
+		       struct inode *dir,
 		       struct dentry *dentry,
 		       umode_t mode,
 		       dev_t dev)
@@ -645,7 +646,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
 	return status;
 }
 
-static int ocfs2_mkdir(struct inode *dir,
+static int ocfs2_mkdir(struct user_namespace *user_ns,
+		       struct inode *dir,
 		       struct dentry *dentry,
 		       umode_t mode)
 {
@@ -653,14 +655,15 @@ static int ocfs2_mkdir(struct inode *dir,
 
 	trace_ocfs2_mkdir(dir, dentry, dentry->d_name.len, dentry->d_name.name,
 			  OCFS2_I(dir)->ip_blkno, mode);
-	ret = ocfs2_mknod(dir, dentry, mode | S_IFDIR, 0);
+	ret = ocfs2_mknod(user_ns, dir, dentry, mode | S_IFDIR, 0);
 	if (ret)
 		mlog_errno(ret);
 
 	return ret;
 }
 
-static int ocfs2_create(struct inode *dir,
+static int ocfs2_create(struct user_namespace *user_ns,
+			struct inode *dir,
 			struct dentry *dentry,
 			umode_t mode,
 			bool excl)
@@ -669,7 +672,7 @@ static int ocfs2_create(struct inode *dir,
 
 	trace_ocfs2_create(dir, dentry, dentry->d_name.len, dentry->d_name.name,
 			   (unsigned long long)OCFS2_I(dir)->ip_blkno, mode);
-	ret = ocfs2_mknod(dir, dentry, mode | S_IFREG, 0);
+	ret = ocfs2_mknod(user_ns, dir, dentry, mode | S_IFREG, 0);
 	if (ret)
 		mlog_errno(ret);
 
@@ -1195,7 +1198,8 @@ static void ocfs2_double_unlock(struct inode *inode1, struct inode *inode2)
 		ocfs2_inode_unlock(inode2, 1);
 }
 
-static int ocfs2_rename(struct inode *old_dir,
+static int ocfs2_rename(struct user_namespace *user_ns,
+			struct inode *old_dir,
 			struct dentry *old_dentry,
 			struct inode *new_dir,
 			struct dentry *new_dentry,
@@ -1784,7 +1788,8 @@ static int ocfs2_create_symlink_data(struct ocfs2_super *osb,
 	return status;
 }
 
-static int ocfs2_symlink(struct inode *dir,
+static int ocfs2_symlink(struct user_namespace *user_ns,
+			 struct inode *dir,
 			 struct dentry *dentry,
 			 const char *symname)
 {
diff --git a/fs/omfs/dir.c b/fs/omfs/dir.c
index a0f45651f3b7..0ccd29c89dcf 100644
--- a/fs/omfs/dir.c
+++ b/fs/omfs/dir.c
@@ -279,13 +279,14 @@ static int omfs_add_node(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return err;
 }
 
-static int omfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int omfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	return omfs_add_node(dir, dentry, mode | S_IFDIR);
 }
 
-static int omfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		bool excl)
+static int omfs_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
 	return omfs_add_node(dir, dentry, mode | S_IFREG);
 }
@@ -369,9 +370,9 @@ static bool omfs_fill_chain(struct inode *dir, struct dir_context *ctx,
 	return true;
 }
 
-static int omfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		       struct inode *new_dir, struct dentry *new_dentry,
-		       unsigned int flags)
+static int omfs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		       struct dentry *old_dentry, struct inode *new_dir,
+		       struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *new_inode = d_inode(new_dentry);
 	struct inode *old_inode = d_inode(old_dentry);
diff --git a/fs/omfs/file.c b/fs/omfs/file.c
index 729339cd7902..cc9e2101737d 100644
--- a/fs/omfs/file.c
+++ b/fs/omfs/file.c
@@ -343,7 +343,8 @@ const struct file_operations omfs_file_operations = {
 	.splice_read = generic_file_splice_read,
 };
 
-static int omfs_setattr(struct dentry *dentry, struct iattr *attr)
+static int omfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
diff --git a/fs/orangefs/acl.c b/fs/orangefs/acl.c
index ba55d61906c2..4fb59f03035c 100644
--- a/fs/orangefs/acl.c
+++ b/fs/orangefs/acl.c
@@ -116,7 +116,8 @@ static int __orangefs_set_acl(struct inode *inode, struct posix_acl *acl,
 	return error;
 }
 
-int orangefs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int orangefs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		     struct posix_acl *acl, int type)
 {
 	int error;
 	struct iattr iattr;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index b94032f77e61..c4693dc1b666 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -871,7 +871,8 @@ int __orangefs_setattr(struct inode *inode, struct iattr *iattr)
 /*
  * Change attributes of an object referenced by dentry.
  */
-int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
+int orangefs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		     struct iattr *iattr)
 {
 	int ret;
 	gossip_debug(GOSSIP_INODE_DEBUG, "__orangefs_setattr: called on %pd\n",
@@ -919,7 +920,7 @@ int orangefs_getattr(const struct path *path, struct kstat *stat,
 	return ret;
 }
 
-int orangefs_permission(struct inode *inode, int mask)
+int orangefs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	int ret;
 
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index 3e7cf3d0a494..9f0789fc8a5f 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -15,7 +15,8 @@
 /*
  * Get a newly allocated inode to go with a negative dentry.
  */
-static int orangefs_create(struct inode *dir,
+static int orangefs_create(struct user_namespace *user_ns,
+			struct inode *dir,
 			struct dentry *dentry,
 			umode_t mode,
 			bool exclusive)
@@ -215,7 +216,8 @@ static int orangefs_unlink(struct inode *dir, struct dentry *dentry)
 	return ret;
 }
 
-static int orangefs_symlink(struct inode *dir,
+static int orangefs_symlink(struct user_namespace *user_ns,
+		         struct inode *dir,
 			 struct dentry *dentry,
 			 const char *symname)
 {
@@ -303,7 +305,8 @@ static int orangefs_symlink(struct inode *dir,
 	return ret;
 }
 
-static int orangefs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int orangefs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			  struct dentry *dentry, umode_t mode)
 {
 	struct orangefs_inode_s *parent = ORANGEFS_I(dir);
 	struct orangefs_kernel_op_s *new_op;
@@ -372,7 +375,8 @@ static int orangefs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 	return ret;
 }
 
-static int orangefs_rename(struct inode *old_dir,
+static int orangefs_rename(struct user_namespace *user_ns,
+			struct inode *old_dir,
 			struct dentry *old_dentry,
 			struct inode *new_dir,
 			struct dentry *new_dentry,
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index e12aeb9623d6..31d3aa505b79 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -107,7 +107,8 @@ extern int orangefs_init_acl(struct inode *inode, struct inode *dir);
 extern const struct xattr_handler *orangefs_xattr_handlers[];
 
 extern struct posix_acl *orangefs_get_acl(struct inode *inode, int type);
-extern int orangefs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+extern int orangefs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+			    struct posix_acl *acl, int type);
 
 /*
  * orangefs data structures
@@ -359,12 +360,12 @@ struct inode *orangefs_new_inode(struct super_block *sb,
 			      struct orangefs_object_kref *ref);
 
 int __orangefs_setattr(struct inode *, struct iattr *);
-int orangefs_setattr(struct dentry *, struct iattr *);
+int orangefs_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 
 int orangefs_getattr(const struct path *path, struct kstat *stat,
 		     u32 request_mask, unsigned int flags);
 
-int orangefs_permission(struct inode *inode, int mask);
+int orangefs_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 
 int orangefs_update_time(struct inode *, struct timespec64 *, int);
 
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index fa6e3fde9a26..b751cba5eaf3 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -650,19 +650,20 @@ static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
 	return err;
 }
 
-static int ovl_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		      bool excl)
+static int ovl_create(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, bool excl)
 {
 	return ovl_create_object(dentry, (mode & 07777) | S_IFREG, 0, NULL);
 }
 
-static int ovl_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ovl_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode)
 {
 	return ovl_create_object(dentry, (mode & 07777) | S_IFDIR, 0, NULL);
 }
 
-static int ovl_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
-		     dev_t rdev)
+static int ovl_mknod(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	/* Don't allow creation of "whiteout" on overlay */
 	if (S_ISCHR(mode) && rdev == WHITEOUT_DEV)
@@ -671,8 +672,8 @@ static int ovl_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
 	return ovl_create_object(dentry, mode, rdev, NULL);
 }
 
-static int ovl_symlink(struct inode *dir, struct dentry *dentry,
-		       const char *link)
+static int ovl_symlink(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, const char *link)
 {
 	return ovl_create_object(dentry, S_IFLNK, 0, link);
 }
@@ -1069,9 +1070,9 @@ static int ovl_set_redirect(struct dentry *dentry, bool samedir)
 	return err;
 }
 
-static int ovl_rename(struct inode *olddir, struct dentry *old,
-		      struct inode *newdir, struct dentry *new,
-		      unsigned int flags)
+static int ovl_rename(struct user_namespace *user_ns, struct inode *olddir,
+		      struct dentry *old, struct inode *newdir,
+		      struct dentry *new, unsigned int flags)
 {
 	int err;
 	struct dentry *old_upperdir;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 2835f33f386a..f819d87b13cb 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -14,7 +14,8 @@
 #include "overlayfs.h"
 
 
-int ovl_setattr(struct dentry *dentry, struct iattr *attr)
+int ovl_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		struct iattr *attr)
 {
 	int err;
 	bool full_copy_up = false;
@@ -277,7 +278,7 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
 	return err;
 }
 
-int ovl_permission(struct inode *inode, int mask)
+int ovl_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct inode *upperinode = ovl_inode_upper(inode);
 	struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 0249ea876dc3..ca935707cceb 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -441,10 +441,11 @@ int ovl_set_nlink_lower(struct dentry *dentry);
 unsigned int ovl_get_nlink(struct ovl_fs *ofs, struct dentry *lowerdentry,
 			   struct dentry *upperdentry,
 			   unsigned int fallback);
-int ovl_setattr(struct dentry *dentry, struct iattr *attr);
+int ovl_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		struct iattr *attr);
 int ovl_getattr(const struct path *path, struct kstat *stat,
 		u32 request_mask, unsigned int flags);
-int ovl_permission(struct inode *inode, int mask);
+int ovl_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 int ovl_xattr_set(struct dentry *dentry, struct inode *inode, const char *name,
 		  const void *value, size_t size, int flags);
 int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4dcd092ca561..0d4f2baf6836 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -976,7 +976,7 @@ ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
 	    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID)) {
 		struct iattr iattr = { .ia_valid = ATTR_KILL_SGID };
 
-		err = ovl_setattr(dentry, &iattr);
+		err = ovl_setattr(user_ns, dentry, &iattr);
 		if (err)
 			return err;
 	}
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index ee6c017802c3..1ae80808b8ba 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -579,7 +579,7 @@ posix_acl_chmod(struct user_namespace *user_ns, struct inode *inode, umode_t mod
 	ret = __posix_acl_chmod(&acl, GFP_KERNEL, mode);
 	if (ret)
 		return ret;
-	ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS);
+	ret = inode->i_op->set_acl(user_ns, inode, acl, ACL_TYPE_ACCESS);
 	posix_acl_release(acl);
 	return ret;
 }
@@ -897,7 +897,7 @@ set_posix_acl(struct user_namespace *user_ns, struct inode *inode,
 		if (ret)
 			return ret;
 	}
-	return inode->i_op->set_acl(inode, acl, type);
+	return inode->i_op->set_acl(user_ns, inode, acl, type);
 }
 EXPORT_SYMBOL(set_posix_acl);
 
@@ -945,12 +945,13 @@ const struct xattr_handler posix_acl_default_xattr_handler = {
 };
 EXPORT_SYMBOL_GPL(posix_acl_default_xattr_handler);
 
-int simple_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+int simple_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		   struct posix_acl *acl, int type)
 {
 	int error;
 
 	if (type == ACL_TYPE_ACCESS) {
-		error = posix_acl_update_mode(&init_user_ns, inode,
+		error = posix_acl_update_mode(user_ns, inode,
 				&inode->i_mode, &acl);
 		if (error)
 			return error;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 541779a52b7d..552d551e1bf9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -685,7 +685,8 @@ static int proc_fd_access_allowed(struct inode *inode)
 	return allowed;
 }
 
-int proc_setattr(struct dentry *dentry, struct iattr *attr)
+int proc_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		 struct iattr *attr)
 {
 	int error;
 	struct inode *inode = d_inode(dentry);
@@ -726,7 +727,7 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info,
 }
 
 
-static int proc_pid_permission(struct inode *inode, int mask)
+static int proc_pid_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
 	struct task_struct *task;
@@ -3468,7 +3469,8 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
  * This function makes sure that the node is always accessible for members of
  * same thread group.
  */
-static int proc_tid_comm_permission(struct inode *inode, int mask)
+static int proc_tid_comm_permission(struct user_namespace *user_ns,
+				    struct inode *inode, int mask)
 {
 	bool is_same_tgroup;
 	struct task_struct *task;
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index ad692a39381d..bc43e45095d7 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -294,7 +294,7 @@ static struct dentry *proc_lookupfd(struct inode *dir, struct dentry *dentry,
  * /proc/pid/fd needs a special permission handler so that a process can still
  * access /proc/self/fd after it has executed a setuid().
  */
-int proc_fd_permission(struct inode *inode, int mask)
+int proc_fd_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	struct task_struct *p;
 	int rv;
diff --git a/fs/proc/fd.h b/fs/proc/fd.h
index f371a602bf58..7da993f9d6d5 100644
--- a/fs/proc/fd.h
+++ b/fs/proc/fd.h
@@ -10,7 +10,8 @@ extern const struct inode_operations proc_fd_inode_operations;
 extern const struct file_operations proc_fdinfo_operations;
 extern const struct inode_operations proc_fdinfo_inode_operations;
 
-extern int proc_fd_permission(struct inode *inode, int mask);
+extern int proc_fd_permission(struct user_namespace *user_ns,
+			      struct inode *inode, int mask);
 
 static inline unsigned int proc_fd(struct inode *inode)
 {
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 6b8094a64176..52737d72e74f 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -115,7 +115,8 @@ static bool pde_subdir_insert(struct proc_dir_entry *dir,
 	return true;
 }
 
-static int proc_notify_change(struct dentry *dentry, struct iattr *iattr)
+static int proc_notify_change(struct user_namespace *user_ns,
+			      struct dentry *dentry, struct iattr *iattr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct proc_dir_entry *de = PDE(inode);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 917cc85e3466..7f787218a5f8 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -163,7 +163,7 @@ extern int proc_pid_statm(struct seq_file *, struct pid_namespace *,
  */
 extern const struct dentry_operations pid_dentry_operations;
 extern int pid_getattr(const struct path *, struct kstat *, u32, unsigned int);
-extern int proc_setattr(struct dentry *, struct iattr *);
+extern int proc_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 extern void proc_pid_evict_inode(struct proc_inode *);
 extern struct inode *proc_pid_make_inode(struct super_block *, struct task_struct *, umode_t);
 extern void pid_update_inode(struct task_struct *, struct inode *);
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 87c828348140..3fee773e1bc1 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -785,7 +785,8 @@ static int proc_sys_readdir(struct file *file, struct dir_context *ctx)
 	return 0;
 }
 
-static int proc_sys_permission(struct inode *inode, int mask)
+static int proc_sys_permission(struct user_namespace *user_ns,
+			       struct inode *inode, int mask)
 {
 	/*
 	 * sysctl entries that are not writeable,
@@ -813,7 +814,8 @@ static int proc_sys_permission(struct inode *inode, int mask)
 	return error;
 }
 
-static int proc_sys_setattr(struct dentry *dentry, struct iattr *attr)
+static int proc_sys_setattr(struct user_namespace *user_ns,
+			    struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index f0358fe410d3..f369e24105b6 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -22,7 +22,7 @@
 #include <linux/uaccess.h>
 #include "internal.h"
 
-static int ramfs_nommu_setattr(struct dentry *, struct iattr *);
+static int ramfs_nommu_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 static unsigned long ramfs_nommu_get_unmapped_area(struct file *file,
 						   unsigned long addr,
 						   unsigned long len,
@@ -158,7 +158,8 @@ static int ramfs_nommu_resize(struct inode *inode, loff_t newsize, loff_t size)
  * handle a change of attributes
  * - we're specifically interested in a change of size
  */
-static int ramfs_nommu_setattr(struct dentry *dentry, struct iattr *ia)
+static int ramfs_nommu_setattr(struct user_namespace *user_ns,
+			       struct dentry *dentry, struct iattr *ia)
 {
 	struct inode *inode = d_inode(dentry);
 	unsigned int old_ia_valid = ia->ia_valid;
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index 83641b9614bd..e0b0bd70dc7e 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -101,7 +101,8 @@ struct inode *ramfs_get_inode(struct super_block *sb,
  */
 /* SMP-safe */
 static int
-ramfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
+ramfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+	    struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	struct inode * inode = ramfs_get_inode(dir->i_sb, dir, mode, dev);
 	int error = -ENOSPC;
@@ -115,20 +116,23 @@ ramfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
 	return error;
 }
 
-static int ramfs_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
+static int ramfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
-	int retval = ramfs_mknod(dir, dentry, mode | S_IFDIR, 0);
+	int retval = ramfs_mknod(user_ns, dir, dentry, mode | S_IFDIR, 0);
 	if (!retval)
 		inc_nlink(dir);
 	return retval;
 }
 
-static int ramfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool excl)
+static int ramfs_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
-	return ramfs_mknod(dir, dentry, mode | S_IFREG, 0);
+	return ramfs_mknod(user_ns, dir, dentry, mode | S_IFREG, 0);
 }
 
-static int ramfs_symlink(struct inode * dir, struct dentry *dentry, const char * symname)
+static int ramfs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, const char *symname)
 {
 	struct inode *inode;
 	int error = -ENOSPC;
diff --git a/fs/reiserfs/acl.h b/fs/reiserfs/acl.h
index 0c1c847f992f..d0db0dd16cba 100644
--- a/fs/reiserfs/acl.h
+++ b/fs/reiserfs/acl.h
@@ -49,7 +49,8 @@ static inline int reiserfs_acl_count(size_t size)
 
 #ifdef CONFIG_REISERFS_FS_POSIX_ACL
 struct posix_acl *reiserfs_get_acl(struct inode *inode, int type);
-int reiserfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+int reiserfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		     struct posix_acl *acl, int type);
 int reiserfs_acl_chmod(struct inode *inode);
 int reiserfs_inherit_default_acl(struct reiserfs_transaction_handle *th,
 				 struct inode *dir, struct dentry *dentry,
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 944f2b487cf8..e53f9460162b 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3282,7 +3282,8 @@ static ssize_t reiserfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	return ret;
 }
 
-int reiserfs_setattr(struct dentry *dentry, struct iattr *attr)
+int reiserfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		     struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	unsigned int ia_valid;
diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
index 6e43aec49b43..dbec92a3a6ab 100644
--- a/fs/reiserfs/namei.c
+++ b/fs/reiserfs/namei.c
@@ -619,8 +619,8 @@ static int new_inode_init(struct inode *inode, struct inode *dir, umode_t mode)
 	return dquot_initialize(inode);
 }
 
-static int reiserfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-			   bool excl)
+static int reiserfs_create(struct user_namespace *user_ns, struct inode *dir,
+			   struct dentry *dentry, umode_t mode, bool excl)
 {
 	int retval;
 	struct inode *inode;
@@ -698,8 +698,8 @@ static int reiserfs_create(struct inode *dir, struct dentry *dentry, umode_t mod
 	return retval;
 }
 
-static int reiserfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
-			  dev_t rdev)
+static int reiserfs_mknod(struct user_namespace *user_ns, struct inode *dir,
+			  struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	int retval;
 	struct inode *inode;
@@ -781,7 +781,8 @@ static int reiserfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode
 	return retval;
 }
 
-static int reiserfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int reiserfs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+			  struct dentry *dentry, umode_t mode)
 {
 	int retval;
 	struct inode *inode;
@@ -1094,8 +1095,9 @@ static int reiserfs_unlink(struct inode *dir, struct dentry *dentry)
 	return retval;
 }
 
-static int reiserfs_symlink(struct inode *parent_dir,
-			    struct dentry *dentry, const char *symname)
+static int reiserfs_symlink(struct user_namespace *user_ns,
+			    struct inode *parent_dir, struct dentry *dentry,
+			    const char *symname)
 {
 	int retval;
 	struct inode *inode;
@@ -1304,7 +1306,8 @@ static void set_ino_in_dir_entry(struct reiserfs_dir_entry *de,
  * one path. If it holds 2 or more, it can get into endless waiting in
  * get_empty_nodes or its clones
  */
-static int reiserfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+static int reiserfs_rename(struct user_namespace *user_ns,
+			   struct inode *old_dir, struct dentry *old_dentry,
 			   struct inode *new_dir, struct dentry *new_dentry,
 			   unsigned int flags)
 {
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index f69871516167..695791a8ca22 100644
--- a/fs/reiserfs/reiserfs.h
+++ b/fs/reiserfs/reiserfs.h
@@ -3102,7 +3102,8 @@ static inline void reiserfs_update_sd(struct reiserfs_transaction_handle *th,
 }
 
 void sd_attrs_to_i_attrs(__u16 sd_attrs, struct inode *inode);
-int reiserfs_setattr(struct dentry *dentry, struct iattr *attr);
+int reiserfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		     struct iattr *attr);
 
 int __reiserfs_write_begin(struct page *page, unsigned from, unsigned len);
 
diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c
index ec440d1957a1..56c104092ace 100644
--- a/fs/reiserfs/xattr.c
+++ b/fs/reiserfs/xattr.c
@@ -66,14 +66,14 @@
 static int xattr_create(struct inode *dir, struct dentry *dentry, int mode)
 {
 	BUG_ON(!inode_is_locked(dir));
-	return dir->i_op->create(dir, dentry, mode, true);
+	return dir->i_op->create(&init_user_ns, dir, dentry, mode, true);
 }
 #endif
 
 static int xattr_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
 	BUG_ON(!inode_is_locked(dir));
-	return dir->i_op->mkdir(dir, dentry, mode);
+	return dir->i_op->mkdir(&init_user_ns, dir, dentry, mode);
 }
 
 /*
@@ -352,7 +352,7 @@ static int chown_one_xattr(struct dentry *dentry, void *data)
 	 * ATTR_MODE is set.
 	 */
 	attrs->ia_valid &= (ATTR_UID|ATTR_GID);
-	err = reiserfs_setattr(dentry, attrs);
+	err = reiserfs_setattr(&init_user_ns, dentry, attrs);
 	attrs->ia_valid = ia_valid;
 
 	return err;
@@ -604,7 +604,7 @@ reiserfs_xattr_set_handle(struct reiserfs_transaction_handle *th,
 		inode_lock_nested(d_inode(dentry), I_MUTEX_XATTR);
 		inode_dio_wait(d_inode(dentry));
 
-		err = reiserfs_setattr(dentry, &newattrs);
+		err = reiserfs_setattr(&init_user_ns, dentry, &newattrs);
 		inode_unlock(d_inode(dentry));
 	} else
 		update_ctime(inode);
@@ -948,7 +948,7 @@ static int xattr_mount_check(struct super_block *s)
 	return 0;
 }
 
-int reiserfs_permission(struct inode *inode, int mask)
+int reiserfs_permission(struct user_namespace *user_ns, struct inode *inode, int mask)
 {
 	/*
 	 * We don't do permission checks on the internal objects.
diff --git a/fs/reiserfs/xattr.h b/fs/reiserfs/xattr.h
index c764352447ba..070c95ed63e8 100644
--- a/fs/reiserfs/xattr.h
+++ b/fs/reiserfs/xattr.h
@@ -16,7 +16,7 @@ int reiserfs_xattr_init(struct super_block *sb, int mount_flags);
 int reiserfs_lookup_privroot(struct super_block *sb);
 int reiserfs_delete_xattrs(struct inode *inode);
 int reiserfs_chown_xattrs(struct inode *inode, struct iattr *attrs);
-int reiserfs_permission(struct inode *inode, int mask);
+int reiserfs_permission(struct user_namespace *user_ns, struct inode *inode, int mask);
 
 #ifdef CONFIG_REISERFS_FS_XATTR
 #define has_xattr_dir(inode) (REISERFS_I(inode)->i_flags & i_has_xattr_dir)
diff --git a/fs/reiserfs/xattr_acl.c b/fs/reiserfs/xattr_acl.c
index b8f397134c17..81410765d1da 100644
--- a/fs/reiserfs/xattr_acl.c
+++ b/fs/reiserfs/xattr_acl.c
@@ -18,7 +18,8 @@ static int __reiserfs_set_acl(struct reiserfs_transaction_handle *th,
 
 
 int
-reiserfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+reiserfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		 struct posix_acl *acl, int type)
 {
 	int error, error2;
 	struct reiserfs_transaction_handle th;
diff --git a/fs/sysv/file.c b/fs/sysv/file.c
index ca7e216b7b9e..dabbe016de3f 100644
--- a/fs/sysv/file.c
+++ b/fs/sysv/file.c
@@ -29,7 +29,8 @@ const struct file_operations sysv_file_operations = {
 	.splice_read	= generic_file_splice_read,
 };
 
-static int sysv_setattr(struct dentry *dentry, struct iattr *attr)
+static int sysv_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	int error;
diff --git a/fs/sysv/namei.c b/fs/sysv/namei.c
index ea2414b385ec..dd0c5d06c35a 100644
--- a/fs/sysv/namei.c
+++ b/fs/sysv/namei.c
@@ -41,7 +41,8 @@ static struct dentry *sysv_lookup(struct inode * dir, struct dentry * dentry, un
 	return d_splice_alias(inode, dentry);
 }
 
-static int sysv_mknod(struct inode * dir, struct dentry * dentry, umode_t mode, dev_t rdev)
+static int sysv_mknod(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct inode * inode;
 	int err;
@@ -60,13 +61,14 @@ static int sysv_mknod(struct inode * dir, struct dentry * dentry, umode_t mode,
 	return err;
 }
 
-static int sysv_create(struct inode * dir, struct dentry * dentry, umode_t mode, bool excl)
+static int sysv_create(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, bool excl)
 {
-	return sysv_mknod(dir, dentry, mode, 0);
+	return sysv_mknod(user_ns, dir, dentry, mode, 0);
 }
 
-static int sysv_symlink(struct inode * dir, struct dentry * dentry, 
-	const char * symname)
+static int sysv_symlink(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, const char *symname)
 {
 	int err = -ENAMETOOLONG;
 	int l = strlen(symname)+1;
@@ -108,7 +110,8 @@ static int sysv_link(struct dentry * old_dentry, struct inode * dir,
 	return add_nondir(dentry, inode);
 }
 
-static int sysv_mkdir(struct inode * dir, struct dentry *dentry, umode_t mode)
+static int sysv_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode)
 {
 	struct inode * inode;
 	int err;
@@ -186,9 +189,9 @@ static int sysv_rmdir(struct inode * dir, struct dentry * dentry)
  * Anybody can rename anything with this: the permission checks are left to the
  * higher-level routines.
  */
-static int sysv_rename(struct inode * old_dir, struct dentry * old_dentry,
-		       struct inode * new_dir, struct dentry * new_dentry,
-		       unsigned int flags)
+static int sysv_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		       struct dentry *old_dentry, struct inode *new_dir,
+		       struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode * old_inode = d_inode(old_dentry);
 	struct inode * new_inode = d_inode(new_dentry);
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 0ee8c6dfb036..a5ffc05d9734 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -67,7 +67,9 @@ static char *get_dname(struct dentry *dentry)
 	return name;
 }
 
-static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umode_t mode)
+static int tracefs_syscall_mkdir(struct user_namespace *user_ns,
+				 struct inode *inode, struct dentry *dentry,
+				 umode_t mode)
 {
 	char *name;
 	int ret;
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 144422c39125..54237ad77a87 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -270,8 +270,8 @@ static struct dentry *ubifs_lookup(struct inode *dir, struct dentry *dentry,
 	return d_splice_alias(inode, dentry);
 }
 
-static int ubifs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-			bool excl)
+static int ubifs_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct inode *inode;
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
@@ -431,8 +431,8 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int ubifs_tmpfile(struct inode *dir, struct dentry *dentry,
-			 umode_t mode)
+static int ubifs_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode)
 {
 	return do_tmpfile(dir, dentry, mode, NULL);
 }
@@ -932,7 +932,8 @@ static int ubifs_rmdir(struct inode *dir, struct dentry *dentry)
 	return err;
 }
 
-static int ubifs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ubifs_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 	struct ubifs_inode *dir_ui = ubifs_inode(dir);
@@ -1003,8 +1004,8 @@ static int ubifs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return err;
 }
 
-static int ubifs_mknod(struct inode *dir, struct dentry *dentry,
-		       umode_t mode, dev_t rdev)
+static int ubifs_mknod(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct inode *inode;
 	struct ubifs_inode *ui;
@@ -1092,8 +1093,8 @@ static int ubifs_mknod(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int ubifs_symlink(struct inode *dir, struct dentry *dentry,
-			 const char *symname)
+static int ubifs_symlink(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, const char *symname)
 {
 	struct inode *inode;
 	struct ubifs_inode *ui;
@@ -1532,9 +1533,9 @@ static int ubifs_xrename(struct inode *old_dir, struct dentry *old_dentry,
 	return err;
 }
 
-static int ubifs_rename(struct inode *old_dir, struct dentry *old_dentry,
-			struct inode *new_dir, struct dentry *new_dentry,
-			unsigned int flags)
+static int ubifs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+			struct dentry *old_dentry, struct inode *new_dir,
+			struct dentry *new_dentry, unsigned int flags)
 {
 	int err;
 	struct ubifs_info *c = old_dir->i_sb->s_fs_info;
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 25041c6d08e3..9c778c5baa13 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1257,7 +1257,8 @@ static int do_setattr(struct ubifs_info *c, struct inode *inode,
 	return err;
 }
 
-int ubifs_setattr(struct dentry *dentry, struct iattr *attr)
+int ubifs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *attr)
 {
 	int err;
 	struct inode *inode = d_inode(dentry);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 4ffd832e3b93..80df64837d6e 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1989,7 +1989,8 @@ int ubifs_calc_dark(const struct ubifs_info *c, int spc);
 
 /* file.c */
 int ubifs_fsync(struct file *file, loff_t start, loff_t end, int datasync);
-int ubifs_setattr(struct dentry *dentry, struct iattr *attr);
+int ubifs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		  struct iattr *attr);
 int ubifs_update_time(struct inode *inode, struct timespec64 *time, int flags);
 
 /* dir.c */
diff --git a/fs/udf/file.c b/fs/udf/file.c
index 0d5ade491b1a..83f6e158ba17 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -253,7 +253,8 @@ const struct file_operations udf_file_operations = {
 	.llseek			= generic_file_llseek,
 };
 
-static int udf_setattr(struct dentry *dentry, struct iattr *attr)
+static int udf_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		       struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct super_block *sb = inode->i_sb;
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index e169d8fe35b5..900bc7957332 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -604,8 +604,8 @@ static int udf_add_nondir(struct dentry *dentry, struct inode *inode)
 	return 0;
 }
 
-static int udf_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		      bool excl)
+static int udf_create(struct user_namespace *user_ns, struct inode *dir,
+		      struct dentry *dentry, umode_t mode, bool excl)
 {
 	struct inode *inode = udf_new_inode(dir, mode);
 
@@ -623,7 +623,8 @@ static int udf_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 	return udf_add_nondir(dentry, inode);
 }
 
-static int udf_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int udf_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode = udf_new_inode(dir, mode);
 
@@ -642,8 +643,8 @@ static int udf_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return 0;
 }
 
-static int udf_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
-		     dev_t rdev)
+static int udf_mknod(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct inode *inode;
 
@@ -658,7 +659,8 @@ static int udf_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
 	return udf_add_nondir(dentry, inode);
 }
 
-static int udf_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int udf_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 	struct udf_fileident_bh fibh;
@@ -877,8 +879,8 @@ static int udf_unlink(struct inode *dir, struct dentry *dentry)
 	return retval;
 }
 
-static int udf_symlink(struct inode *dir, struct dentry *dentry,
-		       const char *symname)
+static int udf_symlink(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, const char *symname)
 {
 	struct inode *inode = udf_new_inode(dir, S_IFLNK | 0777);
 	struct pathComponent *pc;
@@ -1065,9 +1067,9 @@ static int udf_link(struct dentry *old_dentry, struct inode *dir,
 /* Anybody can rename anything with this: the permission checks are left to the
  * higher-level routines.
  */
-static int udf_rename(struct inode *old_dir, struct dentry *old_dentry,
-		      struct inode *new_dir, struct dentry *new_dentry,
-		      unsigned int flags)
+static int udf_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		      struct dentry *old_dentry, struct inode *new_dir,
+		      struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *old_inode = d_inode(old_dentry);
 	struct inode *new_inode = d_inode(new_dentry);
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index 6b51f3b20143..86646db48c12 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -1211,7 +1211,8 @@ static int ufs_truncate(struct inode *inode, loff_t size)
 	return err;
 }
 
-int ufs_setattr(struct dentry *dentry, struct iattr *attr)
+int ufs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	unsigned int ia_valid = attr->ia_valid;
diff --git a/fs/ufs/namei.c b/fs/ufs/namei.c
index 9ef40f100415..126d172b735d 100644
--- a/fs/ufs/namei.c
+++ b/fs/ufs/namei.c
@@ -69,7 +69,8 @@ static struct dentry *ufs_lookup(struct inode * dir, struct dentry *dentry, unsi
  * If the create succeeds, we fill in the inode information
  * with d_instantiate(). 
  */
-static int ufs_create (struct inode * dir, struct dentry * dentry, umode_t mode,
+static int ufs_create (struct user_namespace * user_ns,
+		struct inode * dir, struct dentry * dentry, umode_t mode,
 		bool excl)
 {
 	struct inode *inode;
@@ -85,7 +86,8 @@ static int ufs_create (struct inode * dir, struct dentry * dentry, umode_t mode,
 	return ufs_add_nondir(dentry, inode);
 }
 
-static int ufs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t rdev)
+static int ufs_mknod(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode, dev_t rdev)
 {
 	struct inode *inode;
 	int err;
@@ -104,8 +106,8 @@ static int ufs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev
 	return err;
 }
 
-static int ufs_symlink (struct inode * dir, struct dentry * dentry,
-	const char * symname)
+static int ufs_symlink (struct user_namespace *user_ns, struct inode * dir,
+	struct dentry * dentry, const char * symname)
 {
 	struct super_block * sb = dir->i_sb;
 	int err;
@@ -164,7 +166,8 @@ static int ufs_link (struct dentry * old_dentry, struct inode * dir,
 	return error;
 }
 
-static int ufs_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
+static int ufs_mkdir(struct user_namespace * user_ns, struct inode * dir,
+	struct dentry * dentry, umode_t mode)
 {
 	struct inode * inode;
 	int err;
@@ -240,9 +243,9 @@ static int ufs_rmdir (struct inode * dir, struct dentry *dentry)
 	return err;
 }
 
-static int ufs_rename(struct inode *old_dir, struct dentry *old_dentry,
-		      struct inode *new_dir, struct dentry *new_dentry,
-		      unsigned int flags)
+static int ufs_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		      struct dentry *old_dentry, struct inode *new_dir,
+		      struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *old_inode = d_inode(old_dentry);
 	struct inode *new_inode = d_inode(new_dentry);
diff --git a/fs/ufs/ufs.h b/fs/ufs/ufs.h
index b49e0efdf3d7..89f916280916 100644
--- a/fs/ufs/ufs.h
+++ b/fs/ufs/ufs.h
@@ -123,7 +123,8 @@ extern struct inode *ufs_iget(struct super_block *, unsigned long);
 extern int ufs_write_inode (struct inode *, struct writeback_control *);
 extern int ufs_sync_inode (struct inode *);
 extern void ufs_evict_inode (struct inode *);
-extern int ufs_setattr(struct dentry *dentry, struct iattr *attr);
+extern int ufs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		       struct iattr *attr);
 
 /* namei.c */
 extern const struct file_operations ufs_dir_operations;
diff --git a/fs/vboxsf/dir.c b/fs/vboxsf/dir.c
index 4d569f14a8d8..7e984883f0c7 100644
--- a/fs/vboxsf/dir.c
+++ b/fs/vboxsf/dir.c
@@ -288,13 +288,15 @@ static int vboxsf_dir_create(struct inode *parent, struct dentry *dentry,
 	return 0;
 }
 
-static int vboxsf_dir_mkfile(struct inode *parent, struct dentry *dentry,
+static int vboxsf_dir_mkfile(struct user_namespace *user_ns,
+			     struct inode *parent, struct dentry *dentry,
 			     umode_t mode, bool excl)
 {
 	return vboxsf_dir_create(parent, dentry, mode, 0);
 }
 
-static int vboxsf_dir_mkdir(struct inode *parent, struct dentry *dentry,
+static int vboxsf_dir_mkdir(struct user_namespace *user_ns,
+			    struct inode *parent, struct dentry *dentry,
 			    umode_t mode)
 {
 	return vboxsf_dir_create(parent, dentry, mode, 1);
@@ -332,7 +334,8 @@ static int vboxsf_dir_unlink(struct inode *parent, struct dentry *dentry)
 	return 0;
 }
 
-static int vboxsf_dir_rename(struct inode *old_parent,
+static int vboxsf_dir_rename(struct user_namespace *user_ns,
+			     struct inode *old_parent,
 			     struct dentry *old_dentry,
 			     struct inode *new_parent,
 			     struct dentry *new_dentry,
@@ -374,7 +377,8 @@ static int vboxsf_dir_rename(struct inode *old_parent,
 	return err;
 }
 
-static int vboxsf_dir_symlink(struct inode *parent, struct dentry *dentry,
+static int vboxsf_dir_symlink(struct user_namespace *user_ns,
+			      struct inode *parent, struct dentry *dentry,
 			      const char *symname)
 {
 	struct vboxsf_inode *sf_parent_i = VBOXSF_I(parent);
diff --git a/fs/vboxsf/utils.c b/fs/vboxsf/utils.c
index d2cd1c99f48e..8a1f8ea96549 100644
--- a/fs/vboxsf/utils.c
+++ b/fs/vboxsf/utils.c
@@ -237,7 +237,8 @@ int vboxsf_getattr(const struct path *path, struct kstat *kstat,
 	return 0;
 }
 
-int vboxsf_setattr(struct dentry *dentry, struct iattr *iattr)
+int vboxsf_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		   struct iattr *iattr)
 {
 	struct vboxsf_inode *sf_i = VBOXSF_I(d_inode(dentry));
 	struct vboxsf_sbi *sbi = VBOXSF_SBI(dentry->d_sb);
diff --git a/fs/vboxsf/vfsmod.h b/fs/vboxsf/vfsmod.h
index 18f95b00fc33..5b413687d4fb 100644
--- a/fs/vboxsf/vfsmod.h
+++ b/fs/vboxsf/vfsmod.h
@@ -92,7 +92,8 @@ int vboxsf_stat_dentry(struct dentry *dentry, struct shfl_fsobjinfo *info);
 int vboxsf_inode_revalidate(struct dentry *dentry);
 int vboxsf_getattr(const struct path *path, struct kstat *kstat,
 		   u32 request_mask, unsigned int query_flags);
-int vboxsf_setattr(struct dentry *dentry, struct iattr *iattr);
+int vboxsf_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+		   struct iattr *iattr);
 struct shfl_string *vboxsf_path_from_dentry(struct vboxsf_sbi *sbi,
 					    struct dentry *dentry);
 int vboxsf_nlscpy(struct vboxsf_sbi *sbi, char *name, size_t name_bound_len,
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 2e9d2d2878ce..736e6475205e 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -230,7 +230,8 @@ xfs_set_mode(struct inode *inode, umode_t mode)
 }
 
 int
-xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+xfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+	    struct posix_acl *acl, int type)
 {
 	umode_t mode;
 	bool set_mode = false;
diff --git a/fs/xfs/xfs_acl.h b/fs/xfs/xfs_acl.h
index c042c0868016..7aa7c2a7e9d9 100644
--- a/fs/xfs/xfs_acl.h
+++ b/fs/xfs/xfs_acl.h
@@ -11,7 +11,8 @@ struct posix_acl;
 
 #ifdef CONFIG_XFS_POSIX_ACL
 extern struct posix_acl *xfs_get_acl(struct inode *inode, int type);
-extern int xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+extern int xfs_set_acl(struct user_namespace *user_ns, struct inode *inode,
+		       struct posix_acl *acl, int type);
 extern int __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type);
 void xfs_forget_acl(struct inode *inode, const char *name);
 #else
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 36d583ed8d25..9585d2a29d06 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -222,29 +222,32 @@ xfs_generic_create(
 
 STATIC int
 xfs_vn_mknod(
-	struct inode	*dir,
-	struct dentry	*dentry,
-	umode_t		mode,
-	dev_t		rdev)
+	struct user_namespace	*user_ns,
+	struct inode		*dir,
+	struct dentry		*dentry,
+	umode_t			mode,
+	dev_t			rdev)
 {
 	return xfs_generic_create(dir, dentry, mode, rdev, false);
 }
 
 STATIC int
 xfs_vn_create(
-	struct inode	*dir,
-	struct dentry	*dentry,
-	umode_t		mode,
-	bool		flags)
+	struct user_namespace	*user_ns,
+	struct inode		*dir,
+	struct dentry		*dentry,
+	umode_t			mode,
+	bool			flags)
 {
 	return xfs_generic_create(dir, dentry, mode, 0, false);
 }
 
 STATIC int
 xfs_vn_mkdir(
-	struct inode	*dir,
-	struct dentry	*dentry,
-	umode_t		mode)
+	struct user_namespace	*user_ns,
+	struct inode		*dir,
+	struct dentry		*dentry,
+	umode_t			mode)
 {
 	return xfs_generic_create(dir, dentry, mode | S_IFDIR, 0, false);
 }
@@ -363,9 +366,10 @@ xfs_vn_unlink(
 
 STATIC int
 xfs_vn_symlink(
-	struct inode	*dir,
-	struct dentry	*dentry,
-	const char	*symname)
+	struct user_namespace	*user_ns,
+	struct inode		*dir,
+	struct dentry		*dentry,
+	const char		*symname)
 {
 	struct inode	*inode;
 	struct xfs_inode *cip = NULL;
@@ -405,11 +409,12 @@ xfs_vn_symlink(
 
 STATIC int
 xfs_vn_rename(
-	struct inode	*odir,
-	struct dentry	*odentry,
-	struct inode	*ndir,
-	struct dentry	*ndentry,
-	unsigned int	flags)
+	struct user_namespace	*user_ns,
+	struct inode		*odir,
+	struct dentry		*odentry,
+	struct inode		*ndir,
+	struct dentry		*ndentry,
+	unsigned int		flags)
 {
 	struct inode	*new_inode = d_inode(ndentry);
 	int		omode = 0;
@@ -1056,6 +1061,7 @@ xfs_vn_setattr_size(
 
 STATIC int
 xfs_vn_setattr(
+	struct user_namespace	*user_ns,
 	struct dentry		*dentry,
 	struct iattr		*iattr)
 {
@@ -1149,9 +1155,10 @@ xfs_vn_fiemap(
 
 STATIC int
 xfs_vn_tmpfile(
-	struct inode	*dir,
-	struct dentry	*dentry,
-	umode_t		mode)
+	struct user_namespace	*user_ns,
+	struct inode		*dir,
+	struct dentry		*dentry,
+	umode_t			mode)
 {
 	return xfs_generic_create(dir, dentry, mode, 0, true);
 }
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index f266e28ce00d..e38a71986884 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -480,7 +480,8 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize)
 	return ret;
 }
 
-static int zonefs_inode_setattr(struct dentry *dentry, struct iattr *iattr)
+static int zonefs_inode_setattr(struct user_namespace *user_ns,
+				struct dentry *dentry, struct iattr *iattr)
 {
 	struct inode *inode = d_inode(dentry);
 	int ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1f2ec4c3c70b..d83647b5a299 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1907,21 +1907,21 @@ struct file_operations {
 struct inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
 	const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
-	int (*permission) (struct inode *, int);
+	int (*permission) (struct user_namespace *, struct inode *, int);
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
 	int (*readlink) (struct dentry *, char __user *,int);
 
-	int (*create) (struct inode *,struct dentry *, umode_t, bool);
+	int (*create) (struct user_namespace *, struct inode *,struct dentry *, umode_t, bool);
 	int (*link) (struct dentry *,struct inode *,struct dentry *);
 	int (*unlink) (struct inode *,struct dentry *);
-	int (*symlink) (struct inode *,struct dentry *,const char *);
-	int (*mkdir) (struct inode *,struct dentry *,umode_t);
+	int (*symlink) (struct user_namespace *, struct inode *,struct dentry *,const char *);
+	int (*mkdir) (struct user_namespace *, struct inode *,struct dentry *,umode_t);
 	int (*rmdir) (struct inode *,struct dentry *);
-	int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
-	int (*rename) (struct inode *, struct dentry *,
+	int (*mknod) (struct user_namespace *, struct inode *,struct dentry *,umode_t,dev_t);
+	int (*rename) (struct user_namespace *, struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
-	int (*setattr) (struct dentry *, struct iattr *);
+	int (*setattr) (struct user_namespace *, struct dentry *, struct iattr *);
 	int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
 	ssize_t (*listxattr) (struct dentry *, char *, size_t);
 	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
@@ -1930,8 +1930,8 @@ struct inode_operations {
 	int (*atomic_open)(struct inode *, struct dentry *,
 			   struct file *, unsigned open_flag,
 			   umode_t create_mode);
-	int (*tmpfile) (struct inode *, struct dentry *, umode_t);
-	int (*set_acl)(struct inode *, struct posix_acl *, int);
+	int (*tmpfile) (struct user_namespace *, struct inode *, struct dentry *, umode_t);
+	int (*set_acl)(struct user_namespace *, struct inode *, struct posix_acl *, int);
 } ____cacheline_aligned;
 
 static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
@@ -3192,15 +3192,17 @@ extern int dcache_dir_open(struct inode *, struct file *);
 extern int dcache_dir_close(struct inode *, struct file *);
 extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
 extern int dcache_readdir(struct file *, struct dir_context *);
-extern int simple_setattr(struct dentry *, struct iattr *);
+extern int simple_setattr(struct user_namespace *, struct dentry *,
+			  struct iattr *);
 extern int simple_getattr(const struct path *, struct kstat *, u32, unsigned int);
 extern int simple_statfs(struct dentry *, struct kstatfs *);
 extern int simple_open(struct inode *inode, struct file *file);
 extern int simple_link(struct dentry *, struct inode *, struct dentry *);
 extern int simple_unlink(struct inode *, struct dentry *);
 extern int simple_rmdir(struct inode *, struct dentry *);
-extern int simple_rename(struct inode *, struct dentry *,
-			 struct inode *, struct dentry *, unsigned int);
+extern int simple_rename(struct user_namespace *, struct inode *,
+			 struct dentry *, struct inode *, struct dentry *,
+			 unsigned int);
 extern void simple_recursive_removal(struct dentry *,
                               void (*callback)(struct dentry *));
 extern int noop_fsync(struct file *, loff_t, loff_t, int);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a2c6455ea3fa..887276b27d4b 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -377,7 +377,7 @@ extern int nfs_post_op_update_inode_force_wcc_locked(struct inode *inode, struct
 extern int nfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
 extern void nfs_access_add_cache(struct inode *, struct nfs_access_entry *);
 extern void nfs_access_set_mask(struct nfs_access_entry *, u32);
-extern int nfs_permission(struct inode *, int);
+extern int nfs_permission(struct user_namespace *, struct inode *, int);
 extern int nfs_open(struct inode *, struct file *);
 extern int nfs_attribute_cache_expired(struct inode *inode);
 extern int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode);
@@ -385,7 +385,7 @@ extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
 extern bool nfs_mapping_need_revalidate_inode(struct inode *inode);
 extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
 extern int nfs_revalidate_mapping_rcu(struct inode *inode);
-extern int nfs_setattr(struct dentry *, struct iattr *);
+extern int nfs_setattr(struct user_namespace *, struct dentry *, struct iattr *);
 extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr, struct nfs_fattr *);
 extern void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
 				struct nfs4_label *label);
diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index 28a789218d83..5ed0950fd3c2 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -79,7 +79,7 @@ extern int posix_acl_create(struct inode *, umode_t *, struct posix_acl **,
 		struct posix_acl **);
 extern int posix_acl_update_mode(struct user_namespace *, struct inode *, umode_t *, struct posix_acl **);
 
-extern int simple_set_acl(struct inode *, struct posix_acl *, int);
+extern int simple_set_acl(struct user_namespace *, struct inode *, struct posix_acl *, int);
 extern int simple_acl_create(struct inode *, struct inode *);
 
 struct posix_acl *get_cached_acl(struct inode *inode, int type);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 8b8c300854db..5008e87b026d 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -594,8 +594,8 @@ static int mqueue_create_attr(struct dentry *dentry, umode_t mode, void *arg)
 	return error;
 }
 
-static int mqueue_create(struct inode *dir, struct dentry *dentry,
-				umode_t mode, bool excl)
+static int mqueue_create(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode, bool excl)
 {
 	return mqueue_create_attr(dentry, mode, NULL);
 }
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index cfd2e0868f2d..af416ce033c3 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -152,7 +152,8 @@ static void bpf_dentry_finalize(struct dentry *dentry, struct inode *inode,
 	dir->i_ctime = dir->i_mtime;
 }
 
-static int bpf_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int bpf_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		     struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 
@@ -381,8 +382,8 @@ bpf_lookup(struct inode *dir, struct dentry *dentry, unsigned flags)
 	return simple_lookup(dir, dentry, flags);
 }
 
-static int bpf_symlink(struct inode *dir, struct dentry *dentry,
-		       const char *target)
+static int bpf_symlink(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, const char *target)
 {
 	char *link = kstrdup(target, GFP_USER | __GFP_NOWARN);
 	struct inode *inode;
diff --git a/mm/shmem.c b/mm/shmem.c
index 7c317b0d06f0..0a933d491702 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1080,7 +1080,8 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
-static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
+static int shmem_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			 struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
 	struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2922,7 +2923,8 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
  * File creation. Allocate an inode, and we're done..
  */
 static int
-shmem_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
+shmem_mknod(struct user_namespace *user_ns, struct inode *dir,
+	    struct dentry *dentry, umode_t mode, dev_t dev)
 {
 	struct inode *inode;
 	int error = -ENOSPC;
@@ -2951,7 +2953,8 @@ shmem_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
 }
 
 static int
-shmem_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
+shmem_tmpfile(struct user_namespace *user_ns, struct inode *dir,
+	      struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 	int error = -ENOSPC;
@@ -2974,20 +2977,21 @@ shmem_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return error;
 }
 
-static int shmem_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int shmem_mkdir(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	int error;
 
-	if ((error = shmem_mknod(dir, dentry, mode | S_IFDIR, 0)))
+	if ((error = shmem_mknod(user_ns, dir, dentry, mode | S_IFDIR, 0)))
 		return error;
 	inc_nlink(dir);
 	return 0;
 }
 
-static int shmem_create(struct inode *dir, struct dentry *dentry, umode_t mode,
-		bool excl)
+static int shmem_create(struct user_namespace *user_ns, struct inode *dir,
+			struct dentry *dentry, umode_t mode, bool excl)
 {
-	return shmem_mknod(dir, dentry, mode | S_IFREG, 0);
+	return shmem_mknod(user_ns, dir, dentry, mode | S_IFREG, 0);
 }
 
 /*
@@ -3067,7 +3071,8 @@ static int shmem_exchange(struct inode *old_dir, struct dentry *old_dentry, stru
 	return 0;
 }
 
-static int shmem_whiteout(struct inode *old_dir, struct dentry *old_dentry)
+static int shmem_whiteout(struct user_namespace *user_ns, struct inode *old_dir,
+			  struct dentry *old_dentry)
 {
 	struct dentry *whiteout;
 	int error;
@@ -3076,7 +3081,7 @@ static int shmem_whiteout(struct inode *old_dir, struct dentry *old_dentry)
 	if (!whiteout)
 		return -ENOMEM;
 
-	error = shmem_mknod(old_dir, whiteout,
+	error = shmem_mknod(user_ns, old_dir, whiteout,
 			    S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
 	dput(whiteout);
 	if (error)
@@ -3099,7 +3104,9 @@ static int shmem_whiteout(struct inode *old_dir, struct dentry *old_dentry)
  * it exists so that the VFS layer correctly free's it when it
  * gets overwritten.
  */
-static int shmem_rename2(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags)
+static int shmem_rename2(struct user_namespace *user_ns, struct inode *old_dir,
+			 struct dentry *old_dentry, struct inode *new_dir,
+			 struct dentry *new_dentry, unsigned int flags)
 {
 	struct inode *inode = d_inode(old_dentry);
 	int they_are_dirs = S_ISDIR(inode->i_mode);
@@ -3116,7 +3123,7 @@ static int shmem_rename2(struct inode *old_dir, struct dentry *old_dentry, struc
 	if (flags & RENAME_WHITEOUT) {
 		int error;
 
-		error = shmem_whiteout(old_dir, old_dentry);
+		error = shmem_whiteout(user_ns, old_dir, old_dentry);
 		if (error)
 			return error;
 	}
@@ -3140,7 +3147,8 @@ static int shmem_rename2(struct inode *old_dir, struct dentry *old_dentry, struc
 	return 0;
 }
 
-static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
+static int shmem_symlink(struct user_namespace *user_ns, struct inode *dir,
+			 struct dentry *dentry, const char *symname)
 {
 	int error;
 	int len;
diff --git a/net/socket.c b/net/socket.c
index 3bf36327f531..c98229c1bca9 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -537,9 +537,10 @@ static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer,
 	return used;
 }
 
-static int sockfs_setattr(struct dentry *dentry, struct iattr *iattr)
+static int sockfs_setattr(struct user_namespace *user_ns, struct dentry *dentry,
+			  struct iattr *iattr)
 {
-	int err = simple_setattr(dentry, iattr);
+	int err = simple_setattr(user_ns, dentry, iattr);
 
 	if (!err && (iattr->ia_valid & ATTR_UID)) {
 		struct socket *sock = SOCKET_I(d_inode(dentry));
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 5fd4a64e431f..53100d79228e 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -1773,7 +1773,8 @@ int __aafs_profile_mkdir(struct aa_profile *profile, struct dentry *parent)
 	return error;
 }
 
-static int ns_mkdir_op(struct inode *dir, struct dentry *dentry, umode_t mode)
+static int ns_mkdir_op(struct user_namespace *user_ns, struct inode *dir,
+		       struct dentry *dentry, umode_t mode)
 {
 	struct aa_ns *ns, *parent;
 	/* TODO: improve permission check */
diff --git a/security/integrity/evm/evm_secfs.c b/security/integrity/evm/evm_secfs.c
index cfc3075769bb..bbc85637e18b 100644
--- a/security/integrity/evm/evm_secfs.c
+++ b/security/integrity/evm/evm_secfs.c
@@ -219,7 +219,7 @@ static ssize_t evm_write_xattrs(struct file *file, const char __user *buf,
 		newattrs.ia_valid = ATTR_MODE;
 		inode = evm_xattrs->d_inode;
 		inode_lock(inode);
-		err = simple_setattr(evm_xattrs, &newattrs);
+		err = simple_setattr(&init_user_ns, evm_xattrs, &newattrs);
 		inode_unlock(inode);
 		if (!err)
 			err = count;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 30/39] apparmor: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (28 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 29/39] fs: add helpers for idmap mounts Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 31/39] audit: " Christian Brauner
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner

The i_uid and i_gid are only ever used when logging for AppArmor. This is
already broken in a bunch of places where the global root id is reported
instead of the i_uid or i_gid of the file. Nonetheless, be kind and log the
mapped inode if we're coming from an idmapped mount.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 security/apparmor/domain.c |  9 ++++++---
 security/apparmor/file.c   |  5 ++++-
 security/apparmor/lsm.c    | 12 ++++++++----
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 16f184bc48de..4f997dba4573 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -10,12 +10,14 @@
 
 #include <linux/errno.h>
 #include <linux/fdtable.h>
+#include <linux/fs.h>
 #include <linux/file.h>
 #include <linux/mount.h>
 #include <linux/syscalls.h>
 #include <linux/tracehook.h>
 #include <linux/personality.h>
 #include <linux/xattr.h>
+#include <linux/user_namespace.h>
 
 #include "include/audit.h"
 #include "include/apparmorfs.h"
@@ -858,8 +860,10 @@ int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm)
 	const char *info = NULL;
 	int error = 0;
 	bool unsafe = false;
+	struct user_namespace *user_ns = mnt_user_ns(bprm->file->f_path.mnt);
+	kuid_t i_uid = i_uid_into_mnt(user_ns, file_inode(bprm->file));
 	struct path_cond cond = {
-		file_inode(bprm->file)->i_uid,
+		i_uid,
 		file_inode(bprm->file)->i_mode
 	};
 
@@ -967,8 +971,7 @@ int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm)
 	error = fn_for_each(label, profile,
 			aa_audit_file(profile, &nullperms, OP_EXEC, MAY_EXEC,
 				      bprm->filename, NULL, new,
-				      file_inode(bprm->file)->i_uid, info,
-				      error));
+				      i_uid, info, error));
 	aa_put_label(new);
 	goto done;
 }
diff --git a/security/apparmor/file.c b/security/apparmor/file.c
index 92acf9a49405..d6d9e71f1900 100644
--- a/security/apparmor/file.c
+++ b/security/apparmor/file.c
@@ -11,6 +11,8 @@
 #include <linux/tty.h>
 #include <linux/fdtable.h>
 #include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
 
 #include "include/apparmor.h"
 #include "include/audit.h"
@@ -508,8 +510,9 @@ static int __file_path_perm(const char *op, struct aa_label *label,
 {
 	struct aa_profile *profile;
 	struct aa_perms perms = {};
+	struct user_namespace *user_ns = mnt_user_ns(file->f_path.mnt);
 	struct path_cond cond = {
-		.uid = file_inode(file)->i_uid,
+		.uid = i_uid_into_mnt(user_ns, file_inode(file)),
 		.mode = file_inode(file)->i_mode
 	};
 	char *buffer;
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index ffeaee5ed968..ece9afc3994f 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -224,7 +224,8 @@ static int common_perm(const char *op, const struct path *path, u32 mask,
  */
 static int common_perm_cond(const char *op, const struct path *path, u32 mask)
 {
-	struct path_cond cond = { d_backing_inode(path->dentry)->i_uid,
+	struct user_namespace *user_ns = mnt_user_ns(path->mnt);
+	struct path_cond cond = { i_uid_into_mnt(user_ns, d_backing_inode(path->dentry)),
 				  d_backing_inode(path->dentry)->i_mode
 	};
 
@@ -266,12 +267,13 @@ static int common_perm_rm(const char *op, const struct path *dir,
 			  struct dentry *dentry, u32 mask)
 {
 	struct inode *inode = d_backing_inode(dentry);
+	struct user_namespace *user_ns = mnt_user_ns(dir->mnt);
 	struct path_cond cond = { };
 
 	if (!inode || !path_mediated_fs(dentry))
 		return 0;
 
-	cond.uid = inode->i_uid;
+	cond.uid = i_uid_into_mnt(user_ns, inode);
 	cond.mode = inode->i_mode;
 
 	return common_perm_dir_dentry(op, dir, dentry, mask, &cond);
@@ -361,11 +363,12 @@ static int apparmor_path_rename(const struct path *old_dir, struct dentry *old_d
 
 	label = begin_current_label_crit_section();
 	if (!unconfined(label)) {
+		struct user_namespace *user_ns = mnt_user_ns(old_dir->mnt);
 		struct path old_path = { .mnt = old_dir->mnt,
 					 .dentry = old_dentry };
 		struct path new_path = { .mnt = new_dir->mnt,
 					 .dentry = new_dentry };
-		struct path_cond cond = { d_backing_inode(old_dentry)->i_uid,
+		struct path_cond cond = { i_uid_into_mnt(user_ns, d_backing_inode(old_dentry)),
 					  d_backing_inode(old_dentry)->i_mode
 		};
 
@@ -420,8 +423,9 @@ static int apparmor_file_open(struct file *file)
 
 	label = aa_get_newest_cred_label(file->f_cred);
 	if (!unconfined(label)) {
+		struct user_namespace *user_ns = mnt_user_ns(file->f_path.mnt);
 		struct inode *inode = file_inode(file);
-		struct path_cond cond = { inode->i_uid, inode->i_mode };
+		struct path_cond cond = { i_uid_into_mnt(user_ns, inode), inode->i_mode };
 
 		error = aa_path_perm(OP_OPEN, label, &file->f_path, 0,
 				     aa_map_file_to_perms(file), &cond);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 31/39] audit: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (29 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 30/39] apparmor: handle idmapped mounts Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-22 22:17   ` Paul Moore
  2020-11-15 10:37 ` [PATCH v2 32/39] ima: " Christian Brauner
                   ` (10 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner

Audit will sometimes log the inode's i_uid and i_gid. Enable audit to log the
mapped inode when it is accessed from an idmapped mount.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/namei.c            | 14 +++++++-------
 include/linux/audit.h | 10 ++++++----
 ipc/mqueue.c          |  8 ++++----
 kernel/auditsc.c      | 26 ++++++++++++++------------
 4 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1d6a0da8bf81..976ee05c5027 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -986,7 +986,7 @@ static inline int may_follow_link(struct nameidata *nd, const struct inode *inod
 	if (nd->flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	audit_inode(nd->name, nd->stack[0].link.dentry, 0);
+	audit_inode(nd->name, user_ns, nd->stack[0].link.dentry, 0);
 	audit_log_path_denied(AUDIT_ANOM_LINK, "follow_link");
 	return -EACCES;
 }
@@ -2398,7 +2398,7 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
 		retval = path_lookupat(&nd, flags | LOOKUP_REVAL, path);
 
 	if (likely(!retval))
-		audit_inode(name, path->dentry,
+		audit_inode(name, mnt_user_ns(path->mnt), path->dentry,
 			    flags & LOOKUP_MOUNTPOINT ? AUDIT_INODE_NOEVAL : 0);
 	restore_nameidata();
 	putname(name);
@@ -2440,7 +2440,7 @@ static struct filename *filename_parentat(int dfd, struct filename *name,
 	if (likely(!retval)) {
 		*last = nd.last;
 		*type = nd.last_type;
-		audit_inode(name, parent->dentry, AUDIT_INODE_PARENT);
+		audit_inode(name, mnt_user_ns(parent->mnt), parent->dentry, AUDIT_INODE_PARENT);
 	} else {
 		putname(name);
 		name = ERR_PTR(retval);
@@ -3194,7 +3194,7 @@ static const char *open_last_lookups(struct nameidata *nd,
 			if (unlikely(error))
 				return ERR_PTR(error);
 		}
-		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
+		audit_inode(nd->name, mnt_user_ns(nd->path.mnt), dir, AUDIT_INODE_PARENT);
 		/* trailing slashes? */
 		if (unlikely(nd->last.name[nd->last.len]))
 			return ERR_PTR(-EISDIR);
@@ -3260,7 +3260,7 @@ static int do_open(struct nameidata *nd,
 			return error;
 	}
 	if (!(file->f_mode & FMODE_CREATED))
-		audit_inode(nd->name, nd->path.dentry, 0);
+		audit_inode(nd->name, mnt_user_ns(nd->path.mnt), nd->path.dentry, 0);
 	if (open_flag & O_CREAT) {
 		if ((open_flag & O_EXCL) && !(file->f_mode & FMODE_CREATED))
 			return -EEXIST;
@@ -3362,7 +3362,7 @@ static int do_tmpfile(struct nameidata *nd, unsigned flags,
 		goto out2;
 	dput(path.dentry);
 	path.dentry = child;
-	audit_inode(nd->name, child, 0);
+	audit_inode(nd->name, user_ns, child, 0);
 	/* Don't check for other permissions, the inode was just created */
 	error = may_open(&path, 0, op->open_flag);
 	if (error)
@@ -3381,7 +3381,7 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
 	struct path path;
 	int error = path_lookupat(nd, flags, &path);
 	if (!error) {
-		audit_inode(nd->name, path.dentry, 0);
+		audit_inode(nd->name, mnt_user_ns(path.mnt), path.dentry, 0);
 		error = vfs_open(&path, file);
 		path_put(&path);
 	}
diff --git a/include/linux/audit.h b/include/linux/audit.h
index b3d859831a31..217d2b0c273e 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -293,8 +293,8 @@ extern void __audit_syscall_exit(int ret_success, long ret_value);
 extern struct filename *__audit_reusename(const __user char *uptr);
 extern void __audit_getname(struct filename *name);
 extern void __audit_getcwd(void);
-extern void __audit_inode(struct filename *name, const struct dentry *dentry,
-				unsigned int flags);
+extern void __audit_inode(struct filename *name, struct user_namespace *user_ns,
+			  const struct dentry *dentry, unsigned int flags);
 extern void __audit_file(const struct file *);
 extern void __audit_inode_child(struct inode *parent,
 				const struct dentry *dentry,
@@ -357,10 +357,11 @@ static inline void audit_getcwd(void)
 		__audit_getcwd();
 }
 static inline void audit_inode(struct filename *name,
+				struct user_namespace *user_ns,
 				const struct dentry *dentry,
 				unsigned int aflags) {
 	if (unlikely(!audit_dummy_context()))
-		__audit_inode(name, dentry, aflags);
+		__audit_inode(name, user_ns, dentry, aflags);
 }
 static inline void audit_file(struct file *file)
 {
@@ -371,7 +372,7 @@ static inline void audit_inode_parent_hidden(struct filename *name,
 						const struct dentry *dentry)
 {
 	if (unlikely(!audit_dummy_context()))
-		__audit_inode(name, dentry,
+		__audit_inode(name, &init_user_ns, dentry,
 				AUDIT_INODE_PARENT | AUDIT_INODE_HIDDEN);
 }
 static inline void audit_inode_child(struct inode *parent,
@@ -587,6 +588,7 @@ static inline void audit_getname(struct filename *name)
 static inline void audit_getcwd(void)
 { }
 static inline void audit_inode(struct filename *name,
+				struct user_namespace *user_ns,
 				const struct dentry *dentry,
 				unsigned int aflags)
 { }
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 5008e87b026d..f3943b2cc003 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -849,8 +849,8 @@ static void remove_notification(struct mqueue_inode_info *info)
 	info->notify_user_ns = NULL;
 }
 
-static int prepare_open(struct dentry *dentry, int oflag, int ro,
-			umode_t mode, struct filename *name,
+static int prepare_open(struct user_namespace *user_ns, struct dentry *dentry,
+			int oflag, int ro, umode_t mode, struct filename *name,
 			struct mq_attr *attr)
 {
 	static const int oflag2acc[O_ACCMODE] = { MAY_READ, MAY_WRITE,
@@ -867,7 +867,7 @@ static int prepare_open(struct dentry *dentry, int oflag, int ro,
 				  mqueue_create_attr, attr);
 	}
 	/* it already existed */
-	audit_inode(name, dentry, 0);
+	audit_inode(name, user_ns, dentry, 0);
 	if ((oflag & (O_CREAT|O_EXCL)) == (O_CREAT|O_EXCL))
 		return -EEXIST;
 	if ((oflag & O_ACCMODE) == (O_RDWR | O_WRONLY))
@@ -903,7 +903,7 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
 		goto out_putfd;
 	}
 	path.mnt = mntget(mnt);
-	error = prepare_open(path.dentry, oflag, ro, mode, name, attr);
+	error = prepare_open(mnt_user_ns(path.mnt), path.dentry, oflag, ro, mode, name, attr);
 	if (!error) {
 		struct file *file = dentry_open(&path, oflag, current_cred());
 		if (!IS_ERR(file))
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index ddb9213a3e81..348e6048f1b7 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1936,6 +1936,7 @@ void __audit_getname(struct filename *name)
 }
 
 static inline int audit_copy_fcaps(struct audit_names *name,
+				   struct user_namespace *user_ns,
 				   const struct dentry *dentry)
 {
 	struct cpu_vfs_cap_data caps;
@@ -1944,7 +1945,7 @@ static inline int audit_copy_fcaps(struct audit_names *name,
 	if (!dentry)
 		return 0;
 
-	rc = get_vfs_caps_from_disk(&init_user_ns, dentry, &caps);
+	rc = get_vfs_caps_from_disk(user_ns, dentry, &caps);
 	if (rc)
 		return rc;
 
@@ -1960,21 +1961,22 @@ static inline int audit_copy_fcaps(struct audit_names *name,
 
 /* Copy inode data into an audit_names. */
 static void audit_copy_inode(struct audit_names *name,
-			     const struct dentry *dentry,
-			     struct inode *inode, unsigned int flags)
+			     struct user_namespace *user_ns,
+			     const struct dentry *dentry, struct inode *inode,
+			     unsigned int flags)
 {
 	name->ino   = inode->i_ino;
 	name->dev   = inode->i_sb->s_dev;
 	name->mode  = inode->i_mode;
-	name->uid   = inode->i_uid;
-	name->gid   = inode->i_gid;
+	name->uid   = i_uid_into_mnt(user_ns, inode);
+	name->gid   = i_gid_into_mnt(user_ns, inode);
 	name->rdev  = inode->i_rdev;
 	security_inode_getsecid(inode, &name->osid);
 	if (flags & AUDIT_INODE_NOEVAL) {
 		name->fcap_ver = -1;
 		return;
 	}
-	audit_copy_fcaps(name, dentry);
+	audit_copy_fcaps(name, user_ns, dentry);
 }
 
 /**
@@ -1983,8 +1985,8 @@ static void audit_copy_inode(struct audit_names *name,
  * @dentry: dentry being audited
  * @flags: attributes for this particular entry
  */
-void __audit_inode(struct filename *name, const struct dentry *dentry,
-		   unsigned int flags)
+void __audit_inode(struct filename *name, struct user_namespace *user_ns,
+		   const struct dentry *dentry, unsigned int flags)
 {
 	struct audit_context *context = audit_context();
 	struct inode *inode = d_backing_inode(dentry);
@@ -2078,12 +2080,12 @@ void __audit_inode(struct filename *name, const struct dentry *dentry,
 		n->type = AUDIT_TYPE_NORMAL;
 	}
 	handle_path(dentry);
-	audit_copy_inode(n, dentry, inode, flags & AUDIT_INODE_NOEVAL);
+	audit_copy_inode(n, user_ns, dentry, inode, flags & AUDIT_INODE_NOEVAL);
 }
 
 void __audit_file(const struct file *file)
 {
-	__audit_inode(NULL, file->f_path.dentry, 0);
+	__audit_inode(NULL, mnt_user_ns(file->f_path.mnt), file->f_path.dentry, 0);
 }
 
 /**
@@ -2175,7 +2177,7 @@ void __audit_inode_child(struct inode *parent,
 		n = audit_alloc_name(context, AUDIT_TYPE_PARENT);
 		if (!n)
 			return;
-		audit_copy_inode(n, NULL, parent, 0);
+		audit_copy_inode(n, &init_user_ns, NULL, parent, 0);
 	}
 
 	if (!found_child) {
@@ -2194,7 +2196,7 @@ void __audit_inode_child(struct inode *parent,
 	}
 
 	if (inode)
-		audit_copy_inode(found_child, dentry, inode, 0);
+		audit_copy_inode(found_child, &init_user_ns, dentry, inode, 0);
 	else
 		found_child->ino = AUDIT_INO_UNSET;
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 32/39] ima: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (30 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 31/39] audit: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 33/39] fat: " Christian Brauner
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner

IMA does sometimes access the inode's i_uid and compares it against the rules'
fowner. Enable IMA to handle idmapped mounts by passing down the mount's user
namespace. We simply make use of the helpers we introduced before.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/attr.c                                    |  2 +-
 fs/namei.c                                   |  4 +--
 include/linux/ima.h                          | 15 ++++++-----
 security/integrity/ima/ima.h                 | 19 ++++++++-----
 security/integrity/ima/ima_api.c             | 10 ++++---
 security/integrity/ima/ima_appraise.c        | 14 +++++-----
 security/integrity/ima/ima_asymmetric_keys.c |  2 +-
 security/integrity/ima/ima_main.c            | 28 ++++++++++++--------
 security/integrity/ima/ima_policy.c          | 17 ++++++------
 security/integrity/ima/ima_queue_keys.c      |  2 +-
 10 files changed, 66 insertions(+), 47 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 36383cd3a986..2d55b0c36544 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -357,7 +357,7 @@ int notify_change(struct user_namespace *user_ns, struct dentry *dentry,
 
 	if (!error) {
 		fsnotify_change(dentry, ia_valid);
-		ima_inode_post_setattr(dentry);
+		ima_inode_post_setattr(user_ns, dentry);
 		evm_inode_post_setattr(dentry, ia_valid);
 	}
 
diff --git a/fs/namei.c b/fs/namei.c
index 976ee05c5027..4cebcb002c4f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3333,7 +3333,7 @@ struct dentry *vfs_tmpfile(struct user_namespace *user_ns,
 		inode->i_state |= I_LINKABLE;
 		spin_unlock(&inode->i_lock);
 	}
-	ima_post_create_tmpfile(inode);
+	ima_post_create_tmpfile(user_ns, inode);
 	return child;
 
 out_err:
@@ -3645,7 +3645,7 @@ static long do_mknodat(int dfd, const char __user *filename, umode_t mode,
 			error = vfs_create(user_ns, path.dentry->d_inode,
 					   dentry, mode, true);
 			if (!error)
-				ima_post_path_mknod(dentry);
+				ima_post_path_mknod(user_ns, dentry);
 			break;
 		case S_IFCHR: case S_IFBLK:
 			error = vfs_mknod(user_ns, path.dentry->d_inode, dentry,
diff --git a/include/linux/ima.h b/include/linux/ima.h
index 8fa7bcfb2da2..c3e3c260ad40 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -16,7 +16,7 @@ struct linux_binprm;
 #ifdef CONFIG_IMA
 extern int ima_bprm_check(struct linux_binprm *bprm);
 extern int ima_file_check(struct file *file, int mask);
-extern void ima_post_create_tmpfile(struct inode *inode);
+extern void ima_post_create_tmpfile(struct user_namespace *user_ns, struct inode *inode);
 extern void ima_file_free(struct file *file);
 extern int ima_file_mmap(struct file *file, unsigned long prot);
 extern int ima_file_mprotect(struct vm_area_struct *vma, unsigned long prot);
@@ -27,7 +27,8 @@ extern int ima_read_file(struct file *file, enum kernel_read_file_id id,
 			 bool contents);
 extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
 			      enum kernel_read_file_id id);
-extern void ima_post_path_mknod(struct dentry *dentry);
+extern void ima_post_path_mknod(struct user_namespace *user_ns,
+				struct dentry *dentry);
 extern int ima_file_hash(struct file *file, char *buf, size_t buf_size);
 extern void ima_kexec_cmdline(int kernel_fd, const void *buf, int size);
 
@@ -61,7 +62,8 @@ static inline int ima_file_check(struct file *file, int mask)
 	return 0;
 }
 
-static inline void ima_post_create_tmpfile(struct inode *inode)
+static inline void ima_post_create_tmpfile(struct user_namespace *user_ns,
+					   struct inode *inode)
 {
 }
 
@@ -105,7 +107,8 @@ static inline int ima_post_read_file(struct file *file, void *buf, loff_t size,
 	return 0;
 }
 
-static inline void ima_post_path_mknod(struct dentry *dentry)
+static inline void ima_post_path_mknod(struct user_namespace *user_ns,
+				       struct dentry *dentry)
 {
 	return;
 }
@@ -141,7 +144,7 @@ static inline void ima_post_key_create_or_update(struct key *keyring,
 
 #ifdef CONFIG_IMA_APPRAISE
 extern bool is_ima_appraise_enabled(void);
-extern void ima_inode_post_setattr(struct dentry *dentry);
+extern void ima_inode_post_setattr(struct user_namespace *user_ns, struct dentry *dentry);
 extern int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name,
 		       const void *xattr_value, size_t xattr_value_len);
 extern int ima_inode_removexattr(struct dentry *dentry, const char *xattr_name);
@@ -151,7 +154,7 @@ static inline bool is_ima_appraise_enabled(void)
 	return 0;
 }
 
-static inline void ima_inode_post_setattr(struct dentry *dentry)
+static inline void ima_inode_post_setattr(struct user_namespace *user_ns, struct dentry *dentry)
 {
 	return;
 }
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 6ebefec616e4..54da7b1a2faa 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -252,8 +252,9 @@ static inline void ima_process_queued_keys(void) {}
 #endif /* CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS */
 
 /* LIM API function definitions */
-int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
-		   int mask, enum ima_hooks func, int *pcr,
+int ima_get_action(struct user_namespace *user_ns, struct inode *inode,
+		   const struct cred *cred, u32 secid, int mask,
+		   enum ima_hooks func, int *pcr,
 		   struct ima_template_desc **template_desc,
 		   const char *keyring);
 int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func);
@@ -265,7 +266,8 @@ void ima_store_measurement(struct integrity_iint_cache *iint, struct file *file,
 			   struct evm_ima_xattr_data *xattr_value,
 			   int xattr_len, const struct modsig *modsig, int pcr,
 			   struct ima_template_desc *template_desc);
-void process_buffer_measurement(struct inode *inode, const void *buf, int size,
+void process_buffer_measurement(struct user_namespace *user_ns,
+				struct inode *inode, const void *buf, int size,
 				const char *eventname, enum ima_hooks func,
 				int pcr, const char *keyring);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
@@ -280,8 +282,9 @@ void ima_free_template_entry(struct ima_template_entry *entry);
 const char *ima_d_path(const struct path *path, char **pathbuf, char *filename);
 
 /* IMA policy related functions */
-int ima_match_policy(struct inode *inode, const struct cred *cred, u32 secid,
-		     enum ima_hooks func, int mask, int flags, int *pcr,
+int ima_match_policy(struct user_namespace *user_ns, struct inode *inode,
+		     const struct cred *cred, u32 secid, enum ima_hooks func,
+		     int mask, int flags, int *pcr,
 		     struct ima_template_desc **template_desc,
 		     const char *keyring);
 void ima_init_policy(void);
@@ -312,7 +315,8 @@ int ima_appraise_measurement(enum ima_hooks func,
 			     struct file *file, const unsigned char *filename,
 			     struct evm_ima_xattr_data *xattr_value,
 			     int xattr_len, const struct modsig *modsig);
-int ima_must_appraise(struct inode *inode, int mask, enum ima_hooks func);
+int ima_must_appraise(struct user_namespace *user_ns, struct inode *inode,
+		      int mask, enum ima_hooks func);
 void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file);
 enum integrity_status ima_get_cache_status(struct integrity_iint_cache *iint,
 					   enum ima_hooks func);
@@ -339,7 +343,8 @@ static inline int ima_appraise_measurement(enum ima_hooks func,
 	return INTEGRITY_UNKNOWN;
 }
 
-static inline int ima_must_appraise(struct inode *inode, int mask,
+static inline int ima_must_appraise(struct user_namespace *user_ns,
+				    struct inode *inode, int mask,
 				    enum ima_hooks func)
 {
 	return 0;
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 4f39fb93f278..ec51ada849a5 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -162,6 +162,7 @@ void ima_add_violation(struct file *file, const unsigned char *filename,
 
 /**
  * ima_get_action - appraise & measure decision based on policy.
+ * @user_ns: userns of the mount through which the inode is accessed
  * @inode: pointer to the inode associated with the object being validated
  * @cred: pointer to credentials structure to validate
  * @secid: secid of the task being validated
@@ -183,8 +184,9 @@ void ima_add_violation(struct file *file, const unsigned char *filename,
  * Returns IMA_MEASURE, IMA_APPRAISE mask.
  *
  */
-int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
-		   int mask, enum ima_hooks func, int *pcr,
+int ima_get_action(struct user_namespace *user_ns, struct inode *inode,
+		   const struct cred *cred, u32 secid, int mask,
+		   enum ima_hooks func, int *pcr,
 		   struct ima_template_desc **template_desc,
 		   const char *keyring)
 {
@@ -192,8 +194,8 @@ int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
 
 	flags &= ima_policy_flag;
 
-	return ima_match_policy(inode, cred, secid, func, mask, flags, pcr,
-				template_desc, keyring);
+	return ima_match_policy(user_ns, inode, cred, secid, func, mask, flags,
+				pcr, template_desc, keyring);
 }
 
 /*
diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 3492f0b2da1c..162f00660a4a 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -63,7 +63,8 @@ bool is_ima_appraise_enabled(void)
  *
  * Return 1 to appraise or hash
  */
-int ima_must_appraise(struct inode *inode, int mask, enum ima_hooks func)
+int ima_must_appraise(struct user_namespace *user_ns, struct inode *inode,
+		      int mask, enum ima_hooks func)
 {
 	u32 secid;
 
@@ -71,8 +72,8 @@ int ima_must_appraise(struct inode *inode, int mask, enum ima_hooks func)
 		return 0;
 
 	security_task_getsecid(current, &secid);
-	return ima_match_policy(inode, current_cred(), secid, func, mask,
-				IMA_APPRAISE | IMA_HASH, NULL, NULL, NULL);
+	return ima_match_policy(user_ns, inode, current_cred(), secid, func,
+				mask, IMA_APPRAISE | IMA_HASH, NULL, NULL, NULL);
 }
 
 static int ima_fix_xattr(struct dentry *dentry,
@@ -345,7 +346,7 @@ int ima_check_blacklist(struct integrity_iint_cache *iint,
 
 		rc = is_binary_blacklisted(digest, digestsize);
 		if ((rc == -EPERM) && (iint->flags & IMA_MEASURE))
-			process_buffer_measurement(NULL, digest, digestsize,
+			process_buffer_measurement(NULL, NULL, digest, digestsize,
 						   "blacklisted-hash", NONE,
 						   pcr, NULL);
 	}
@@ -496,6 +497,7 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file)
 
 /**
  * ima_inode_post_setattr - reflect file metadata changes
+ * @user_ns: user namespace of the mount
  * @dentry: pointer to the affected dentry
  *
  * Changes to a dentry's metadata might result in needing to appraise.
@@ -503,7 +505,7 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file)
  * This function is called from notify_change(), which expects the caller
  * to lock the inode's i_mutex.
  */
-void ima_inode_post_setattr(struct dentry *dentry)
+void ima_inode_post_setattr(struct user_namespace *user_ns, struct dentry *dentry)
 {
 	struct inode *inode = d_backing_inode(dentry);
 	struct integrity_iint_cache *iint;
@@ -513,7 +515,7 @@ void ima_inode_post_setattr(struct dentry *dentry)
 	    || !(inode->i_opflags & IOP_XATTR))
 		return;
 
-	action = ima_must_appraise(inode, MAY_ACCESS, POST_SETATTR);
+	action = ima_must_appraise(user_ns, inode, MAY_ACCESS, POST_SETATTR);
 	if (!action)
 		__vfs_removexattr(&init_user_ns, dentry, XATTR_NAME_IMA);
 	iint = integrity_iint_find(inode);
diff --git a/security/integrity/ima/ima_asymmetric_keys.c b/security/integrity/ima/ima_asymmetric_keys.c
index 1c68c500c26f..9810f3bfa57f 100644
--- a/security/integrity/ima/ima_asymmetric_keys.c
+++ b/security/integrity/ima/ima_asymmetric_keys.c
@@ -58,7 +58,7 @@ void ima_post_key_create_or_update(struct key *keyring, struct key *key,
 	 * if the IMA policy is configured to measure a key linked
 	 * to the given keyring.
 	 */
-	process_buffer_measurement(NULL, payload, payload_len,
+	process_buffer_measurement(NULL, NULL, payload, payload_len,
 				   keyring->description, KEY_CHECK, 0,
 				   keyring->description);
 }
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 2d1af8899cab..562003bab943 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -218,8 +218,8 @@ static int process_measurement(struct file *file, const struct cred *cred,
 	 * bitmask based on the appraise/audit/measurement policy.
 	 * Included is the appraise submask.
 	 */
-	action = ima_get_action(inode, cred, secid, mask, func, &pcr,
-				&template_desc, NULL);
+	action = ima_get_action(mnt_user_ns(file->f_path.mnt), inode, cred,
+				secid, mask, func, &pcr, &template_desc, NULL);
 	violation_check = ((func == FILE_CHECK || func == MMAP_CHECK) &&
 			   (ima_policy_flag & IMA_MEASURE));
 	if (!action && !violation_check)
@@ -431,8 +431,9 @@ int ima_file_mprotect(struct vm_area_struct *vma, unsigned long prot)
 
 	security_task_getsecid(current, &secid);
 	inode = file_inode(vma->vm_file);
-	action = ima_get_action(inode, current_cred(), secid, MAY_EXEC,
-				MMAP_CHECK, &pcr, &template, 0);
+	action = ima_get_action(mnt_user_ns(vma->vm_file->f_path.mnt), inode,
+				current_cred(), secid, MAY_EXEC, MMAP_CHECK,
+				&pcr, &template, 0);
 
 	/* Is the mmap'ed file in policy? */
 	if (!(action & (IMA_MEASURE | IMA_APPRAISE_SUBMASK)))
@@ -568,12 +569,13 @@ EXPORT_SYMBOL_GPL(ima_file_hash);
  * Skip calling process_measurement(), but indicate which newly, created
  * tmpfiles are in policy.
  */
-void ima_post_create_tmpfile(struct inode *inode)
+void ima_post_create_tmpfile(struct user_namespace *user_ns,
+			     struct inode *inode)
 {
 	struct integrity_iint_cache *iint;
 	int must_appraise;
 
-	must_appraise = ima_must_appraise(inode, MAY_ACCESS, FILE_CHECK);
+	must_appraise = ima_must_appraise(user_ns, inode, MAY_ACCESS, FILE_CHECK);
 	if (!must_appraise)
 		return;
 
@@ -589,18 +591,19 @@ void ima_post_create_tmpfile(struct inode *inode)
 
 /**
  * ima_post_path_mknod - mark as a new inode
+ * @user_ns: user namespace of the mount
  * @dentry: newly created dentry
  *
  * Mark files created via the mknodat syscall as new, so that the
  * file data can be written later.
  */
-void ima_post_path_mknod(struct dentry *dentry)
+void ima_post_path_mknod(struct user_namespace *user_ns, struct dentry *dentry)
 {
 	struct integrity_iint_cache *iint;
 	struct inode *inode = dentry->d_inode;
 	int must_appraise;
 
-	must_appraise = ima_must_appraise(inode, MAY_ACCESS, FILE_CHECK);
+	must_appraise = ima_must_appraise(user_ns, inode, MAY_ACCESS, FILE_CHECK);
 	if (!must_appraise)
 		return;
 
@@ -780,6 +783,7 @@ int ima_post_load_data(char *buf, loff_t size,
 
 /*
  * process_buffer_measurement - Measure the buffer to ima log.
+ * @userns: user namespace of the mount through which the inode is accessed
  * @inode: inode associated with the object being measured (NULL for KEY_CHECK)
  * @buf: pointer to the buffer that needs to be added to the log.
  * @size: size of buffer(in bytes).
@@ -790,7 +794,8 @@ int ima_post_load_data(char *buf, loff_t size,
  *
  * Based on policy, the buffer is measured into the ima log.
  */
-void process_buffer_measurement(struct inode *inode, const void *buf, int size,
+void process_buffer_measurement(struct user_namespace *user_ns,
+				struct inode *inode, const void *buf, int size,
 				const char *eventname, enum ima_hooks func,
 				int pcr, const char *keyring)
 {
@@ -823,7 +828,7 @@ void process_buffer_measurement(struct inode *inode, const void *buf, int size,
 	 */
 	if (func) {
 		security_task_getsecid(current, &secid);
-		action = ima_get_action(inode, current_cred(), secid, 0, func,
+		action = ima_get_action(user_ns, inode, current_cred(), secid, 0, func,
 					&pcr, &template, keyring);
 		if (!(action & IMA_MEASURE))
 			return;
@@ -895,7 +900,8 @@ void ima_kexec_cmdline(int kernel_fd, const void *buf, int size)
 	if (!f.file)
 		return;
 
-	process_buffer_measurement(file_inode(f.file), buf, size,
+	process_buffer_measurement(mnt_user_ns(f.file->f_path.mnt),
+				   file_inode(f.file), buf, size,
 				   "kexec-cmdline", KEXEC_CMDLINE, 0, NULL);
 	fdput(f);
 }
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 9b5adeaa47fc..003d974ce2f3 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -497,10 +497,10 @@ static bool ima_match_keyring(struct ima_rule_entry *rule,
  *
  * Returns true on rule match, false on failure.
  */
-static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode,
+static bool ima_match_rules(struct ima_rule_entry *rule,
+			    struct user_namespace *user_ns, struct inode *inode,
 			    const struct cred *cred, u32 secid,
-			    enum ima_hooks func, int mask,
-			    const char *keyring)
+			    enum ima_hooks func, int mask, const char *keyring)
 {
 	int i;
 
@@ -539,7 +539,7 @@ static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode,
 	}
 
 	if ((rule->flags & IMA_FOWNER) &&
-	    !rule->fowner_op(inode->i_uid, rule->fowner))
+	    !rule->fowner_op(i_uid_into_mnt(user_ns, inode), rule->fowner))
 		return false;
 	for (i = 0; i < MAX_LSM_RULES; i++) {
 		int rc = 0;
@@ -620,8 +620,9 @@ static int get_subaction(struct ima_rule_entry *rule, enum ima_hooks func)
  * list when walking it.  Reads are many orders of magnitude more numerous
  * than writes so ima_match_policy() is classical RCU candidate.
  */
-int ima_match_policy(struct inode *inode, const struct cred *cred, u32 secid,
-		     enum ima_hooks func, int mask, int flags, int *pcr,
+int ima_match_policy(struct user_namespace *user_ns, struct inode *inode,
+		     const struct cred *cred, u32 secid, enum ima_hooks func,
+		     int mask, int flags, int *pcr,
 		     struct ima_template_desc **template_desc,
 		     const char *keyring)
 {
@@ -637,8 +638,8 @@ int ima_match_policy(struct inode *inode, const struct cred *cred, u32 secid,
 		if (!(entry->action & actmask))
 			continue;
 
-		if (!ima_match_rules(entry, inode, cred, secid, func, mask,
-				     keyring))
+		if (!ima_match_rules(entry, user_ns, inode, cred, secid, func,
+				     mask, keyring))
 			continue;
 
 		action |= entry->flags & IMA_ACTION_FLAGS;
diff --git a/security/integrity/ima/ima_queue_keys.c b/security/integrity/ima/ima_queue_keys.c
index 69a8626a35c0..2bacc4f3e6ba 100644
--- a/security/integrity/ima/ima_queue_keys.c
+++ b/security/integrity/ima/ima_queue_keys.c
@@ -158,7 +158,7 @@ void ima_process_queued_keys(void)
 
 	list_for_each_entry_safe(entry, tmp, &ima_keys, list) {
 		if (!timer_expired)
-			process_buffer_measurement(NULL, entry->payload,
+			process_buffer_measurement(NULL, NULL, entry->payload,
 						   entry->payload_len,
 						   entry->keyring_name,
 						   KEY_CHECK, 0,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 33/39] fat: handle idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (31 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 32/39] ima: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 34/39] ext4: support " Christian Brauner
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Let fat handle idmapped mounts. This allows to have the same fat mount appear
in multiple locations with different id mappings. This allows to expose a vfat
formatted USB stick to multiple user with different ids on the host or in user
namespaces:

mount -o uid=1000,gid=1000 /dev/sdb /mnt

u1001@f2-vm:/lower1$ ls -ln /mnt/
total 4
-rwxr-xr-x 1 1000 1000 4 Oct 28 03:44 aaa
-rwxr-xr-x 1 1000 1000 0 Oct 28 01:09 bbb
-rwxr-xr-x 1 1000 1000 0 Oct 28 01:10 ccc
-rwxr-xr-x 1 1000 1000 0 Oct 28 03:46 ddd
-rwxr-xr-x 1 1000 1000 0 Oct 28 04:01 eee

mount2 --idmap both:1000:1001:1

u1001@f2-vm:/lower1$ ls -ln /lower1/
total 4
-rwxr-xr-x 1 1001 1001 4 Oct 28 03:44 aaa
-rwxr-xr-x 1 1001 1001 0 Oct 28 01:09 bbb
-rwxr-xr-x 1 1001 1001 0 Oct 28 01:10 ccc
-rwxr-xr-x 1 1001 1001 0 Oct 28 03:46 ddd
-rwxr-xr-x 1 1001 1001 0 Oct 28 04:01 eee

u1001@f2-vm:/lower1$ touch /lower1/fff

u1001@f2-vm:/lower1$ ls -ln /lower1/fff
-rwxr-xr-x 1 1001 1001 0 Oct 28 04:03 /lower1/fff

u1001@f2-vm:/lower1$ ls -ln /mnt/fff
-rwxr-xr-x 1 1000 1000 0 Oct 28 04:03 /mnt/fff

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/fat/file.c        | 15 ++++++++-------
 fs/fat/namei_msdos.c |  2 +-
 fs/fat/namei_vfat.c  |  2 +-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/fat/file.c b/fs/fat/file.c
index 5b12cf209801..15dc8d27aa72 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -398,7 +398,7 @@ int fat_getattr(const struct path *path, struct kstat *stat,
 		u32 request_mask, unsigned int flags)
 {
 	struct inode *inode = d_inode(path->dentry);
-	generic_fillattr(&init_user_ns, inode, stat);
+	generic_fillattr(mnt_user_ns(path->mnt), inode, stat);
 	stat->blksize = MSDOS_SB(inode->i_sb)->cluster_size;
 
 	if (MSDOS_SB(inode->i_sb)->options.nfs == FAT_NFS_NOSTALE_RO) {
@@ -447,12 +447,13 @@ static int fat_sanitize_mode(const struct msdos_sb_info *sbi,
 	return 0;
 }
 
-static int fat_allow_set_time(struct msdos_sb_info *sbi, struct inode *inode)
+static int fat_allow_set_time(struct user_namespace *user_ns,
+			      struct msdos_sb_info *sbi, struct inode *inode)
 {
 	umode_t allow_utime = sbi->options.allow_utime;
 
-	if (!uid_eq(current_fsuid(), inode->i_uid)) {
-		if (in_group_p(inode->i_gid))
+	if (!uid_eq(current_fsuid(), i_uid_into_mnt(user_ns, inode))) {
+		if (in_group_p(i_gid_into_mnt(user_ns, inode)))
 			allow_utime >>= 3;
 		if (allow_utime & MAY_WRITE)
 			return 1;
@@ -477,11 +478,11 @@ int fat_setattr(struct user_namespace *user_ns, struct dentry *dentry,
 	/* Check for setting the inode time. */
 	ia_valid = attr->ia_valid;
 	if (ia_valid & TIMES_SET_FLAGS) {
-		if (fat_allow_set_time(sbi, inode))
+		if (fat_allow_set_time(user_ns, sbi, inode))
 			attr->ia_valid &= ~TIMES_SET_FLAGS;
 	}
 
-	error = setattr_prepare(&init_user_ns, dentry, attr);
+	error = setattr_prepare(user_ns, dentry, attr);
 	attr->ia_valid = ia_valid;
 	if (error) {
 		if (sbi->options.quiet)
@@ -551,7 +552,7 @@ int fat_setattr(struct user_namespace *user_ns, struct dentry *dentry,
 		fat_truncate_time(inode, &attr->ia_mtime, S_MTIME);
 	attr->ia_valid &= ~(ATTR_ATIME|ATTR_CTIME|ATTR_MTIME);
 
-	setattr_copy(&init_user_ns, inode, attr);
+	setattr_copy(user_ns, inode, attr);
 	mark_inode_dirty(inode);
 out:
 	return error;
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 608b0606f3ca..0f871a1b6620 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -666,7 +666,7 @@ static struct file_system_type msdos_fs_type = {
 	.name		= "msdos",
 	.mount		= msdos_mount,
 	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
 };
 MODULE_ALIAS_FS("msdos");
 
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 34903d14d6a6..177d939b95da 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1063,7 +1063,7 @@ static struct file_system_type vfat_fs_type = {
 	.name		= "vfat",
 	.mount		= vfat_mount,
 	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
 };
 MODULE_ALIAS_FS("vfat");
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 34/39] ext4: support idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (32 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 33/39] fat: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 35/39] ecryptfs: do not mount on top of " Christian Brauner
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

This enables ext4 to support idmapped mounts. All dedicated helpers we need for
this exist. The vfs will have already made sure that the fsids can be
translated after having been shifted if they are on an idmapped mount.

This implements helpers for the new inode operations that we've added. The core
change is the allocation of a new inode based on the mount's user namespace.
Code duplication is virtually non-existent because we can implement the
non-idmapped mount aware inode methods on top of the idmapped mount aware inode
methods. When the initial user namespace is passed the idmapped mount helpers
are nops and all mounts are marked with the initial user namespace by default.

It is also noteworthy that the idmapped mount implementation allows us to
cleanly handle ioctls() too. I've kept this as a single patch for now since the
change is overall fairly mechanical but I'm happy to split this.

Let's create simple example where we idmap an ext4 filesystem:

 root@f2-vm:~# truncate -s 5G ext4.img

 root@f2-vm:~# mkfs.ext4 ./ext4.img
 mke2fs 1.45.5 (07-Jan-2020)
 Discarding device blocks: done
 Creating filesystem with 1310720 4k blocks and 327680 inodes
 Filesystem UUID: 3fd91794-c6ca-4b0f-9964-289a000919cf
 Superblock backups stored on blocks:
         32768, 98304, 163840, 229376, 294912, 819200, 884736

 Allocating group tables: done
 Writing inode tables: done
 Creating journal (16384 blocks): done
 Writing superblocks and filesystem accounting information: done

 root@f2-vm:~# losetup -f --show ./ext4.img
 /dev/loop0

 root@f2-vm:~# mount /dev/loop0 /mnt

 root@f2-vm:~# ls -al /mnt/
 total 24
 drwxr-xr-x  3 root root  4096 Oct 28 13:34 .
 drwxr-xr-x 30 root root  4096 Oct 28 13:22 ..
 drwx------  2 root root 16384 Oct 28 13:34 lost+found

 # Let's create an idmapped mount at /idmapped1 where we map uid and gid 0 to
 # uid and gid 1000
 root@f2-vm:/# ./mount2 -mb:0:1000:1 /mnt/ /idmapped1/

 root@f2-vm:/# ls -al /idmapped1/
 total 24
 drwxr-xr-x  3 ubuntu ubuntu  4096 Oct 28 13:34 .
 drwxr-xr-x 30 root   root    4096 Oct 28 13:22 ..
 drwx------  2 ubuntu ubuntu 16384 Oct 28 13:34 lost+found

 # Let's create an idmapped mount at /idmapped2 where we map uid and gid 0 to
 # uid and gid 2000
 root@f2-vm:/# ./mount2 -mb:0:2000:1 /mnt/ /idmapped2/

 root@f2-vm:/# ls -al /idmapped2/
 total 24
 drwxr-xr-x  3 2000 2000  4096 Oct 28 13:34 .
 drwxr-xr-x 31 root root  4096 Oct 28 13:39 ..
 drwx------  2 2000 2000 16384 Oct 28 13:34 lost+found

Let's create another example where we idmap the rootfs filesystem without a
mapping for uid 0 and gid 0:

 # Create an idmapped mount of for a full POSIX range of rootfs under /mnt
 # but without a mapping for uid 0 to reduce attack surface

 root@f2-vm:/# ./mount2 -mb:1:1:65536 / /mnt/

 # Since we don't have a mapping for uid and gid 0 all files owned by uid and
 # gid 0 should show up as uid and gid 65534:
 root@f2-vm:/# ls -al /mnt/
 total 664
 drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 .
 drwxr-xr-x 31 root   root      4096 Oct 28 13:39 ..
 lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 bin -> usr/bin
 drwxr-xr-x  4 nobody nogroup   4096 Oct 28 13:17 boot
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:48 dev
 drwxr-xr-x 81 nobody nogroup   4096 Oct 28 04:00 etc
 drwxr-xr-x  4 nobody nogroup   4096 Oct 28 04:00 home
 lrwxrwxrwx  1 nobody nogroup      7 Aug 25 07:44 lib -> usr/lib
 lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib32 -> usr/lib32
 lrwxrwxrwx  1 nobody nogroup      9 Aug 25 07:44 lib64 -> usr/lib64
 lrwxrwxrwx  1 nobody nogroup     10 Aug 25 07:44 libx32 -> usr/libx32
 drwx------  2 nobody nogroup  16384 Aug 25 07:47 lost+found
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 media
 drwxr-xr-x 31 nobody nogroup   4096 Oct 28 13:39 mnt
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 opt
 drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 proc
 drwx--x--x  6 nobody nogroup   4096 Oct 28 13:34 root
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:46 run
 lrwxrwxrwx  1 nobody nogroup      8 Aug 25 07:44 sbin -> usr/sbin
 drwxr-xr-x  2 nobody nogroup   4096 Aug 25 07:44 srv
 drwxr-xr-x  2 nobody nogroup   4096 Apr 15  2020 sys
 drwxrwxrwt 10 nobody nogroup   4096 Oct 28 13:19 tmp
 drwxr-xr-x 14 nobody nogroup   4096 Oct 20 13:00 usr
 drwxr-xr-x 12 nobody nogroup   4096 Aug 25 07:45 var

 # Since we do have a mapping for uid and gid 1000 all files owned by uid and
 # gid 1000 should simply show up as uid and gid 1000:
 root@f2-vm:/# ls -al /mnt/home/ubuntu/
 total 40
 drwxr-xr-x 3 ubuntu ubuntu  4096 Oct 28 00:43 .
 drwxr-xr-x 4 nobody nogroup 4096 Oct 28 04:00 ..
 -rw------- 1 ubuntu ubuntu  2936 Oct 28 12:26 .bash_history
 -rw-r--r-- 1 ubuntu ubuntu   220 Feb 25  2020 .bash_logout
 -rw-r--r-- 1 ubuntu ubuntu  3771 Feb 25  2020 .bashrc
 -rw-r--r-- 1 ubuntu ubuntu   807 Feb 25  2020 .profile
 -rw-r--r-- 1 ubuntu ubuntu     0 Oct 16 16:11 .sudo_as_admin_successful
 -rw------- 1 ubuntu ubuntu  1144 Oct 28 00:43 .viminfo

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
unchanged
---
 fs/ext4/Kconfig  |  9 +++++++++
 fs/ext4/acl.c    |  2 +-
 fs/ext4/ext4.h   | 13 +++++++------
 fs/ext4/ialloc.c |  7 ++++---
 fs/ext4/inode.c  | 11 +++++++----
 fs/ext4/ioctl.c  | 18 ++++++++++--------
 fs/ext4/namei.c  | 30 ++++++++++++++++--------------
 fs/ext4/super.c  |  6 +++++-
 8 files changed, 59 insertions(+), 37 deletions(-)

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 619dd35ddd48..5918c05cfe5b 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -118,3 +118,12 @@ config EXT4_KUNIT_TESTS
 	  to the KUnit documentation in Documentation/dev-tools/kunit/.
 
 	  If unsure, say N.
+
+config EXT4_IDMAP_MOUNTS
+	bool "Support vfs idmapped mounts in ext4"
+	depends on EXT4_FS
+	default n
+	help
+	  The vfs allows to expose a filesystem at different mountpoints with
+	  differnet idmappings. Allow ext4 to be exposed through idmapped
+	  mounts.
diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 3ab0a69b974b..7c413bd4d2f2 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -246,7 +246,7 @@ ext4_set_acl(struct user_namespace *user_ns, struct inode *inode,
 	ext4_fc_start_update(inode);
 
 	if ((type == ACL_TYPE_ACCESS) && acl) {
-		error = posix_acl_update_mode(&init_user_ns, inode, &mode, &acl);
+		error = posix_acl_update_mode(user_ns, inode, &mode, &acl);
 		if (error)
 			goto out_stop;
 		if (mode != inode->i_mode)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4c8bdcea0a0c..aa0ddbc0d8c2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2702,18 +2702,19 @@ extern int ext4fs_dirhash(const struct inode *dir, const char *name, int len,
 
 /* ialloc.c */
 extern int ext4_mark_inode_used(struct super_block *sb, int ino);
-extern struct inode *__ext4_new_inode(handle_t *, struct inode *, umode_t,
+extern struct inode *__ext4_new_inode(struct user_namespace *, handle_t *,
+				      struct inode *, umode_t,
 				      const struct qstr *qstr, __u32 goal,
 				      uid_t *owner, __u32 i_flags,
 				      int handle_type, unsigned int line_no,
 				      int nblocks);
 
-#define ext4_new_inode(handle, dir, mode, qstr, goal, owner, i_flags) \
-	__ext4_new_inode((handle), (dir), (mode), (qstr), (goal), (owner), \
-			 i_flags, 0, 0, 0)
-#define ext4_new_inode_start_handle(dir, mode, qstr, goal, owner, \
+#define ext4_new_inode(handle, dir, mode, qstr, goal, owner, i_flags)          \
+	__ext4_new_inode(&init_user_ns, (handle), (dir), (mode), (qstr),       \
+			 (goal), (owner), i_flags, 0, 0, 0)
+#define ext4_new_inode_start_handle(user_ns, dir, mode, qstr, goal, owner, \
 				    type, nblocks)		    \
-	__ext4_new_inode(NULL, (dir), (mode), (qstr), (goal), (owner), \
+	__ext4_new_inode((user_ns), NULL, (dir), (mode), (qstr), (goal), (owner), \
 			 0, (type), __LINE__, (nblocks))
 
 
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index d91f69282311..93251687c6a2 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -919,7 +919,8 @@ static int ext4_xattr_credits_for_new_inode(struct inode *dir, mode_t mode,
  * For other inodes, search forward from the parent directory's block
  * group to find a free inode.
  */
-struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
+struct inode *__ext4_new_inode(struct user_namespace *user_ns,
+			       handle_t *handle, struct inode *dir,
 			       umode_t mode, const struct qstr *qstr,
 			       __u32 goal, uid_t *owner, __u32 i_flags,
 			       int handle_type, unsigned int line_no,
@@ -969,10 +970,10 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 		i_gid_write(inode, owner[1]);
 	} else if (test_opt(sb, GRPID)) {
 		inode->i_mode = mode;
-		inode->i_uid = current_fsuid();
+		inode->i_uid = fsuid_into_mnt(user_ns);
 		inode->i_gid = dir->i_gid;
 	} else
-		inode_init_owner(inode, &init_user_ns, dir, mode);
+		inode_init_owner(inode, user_ns, dir, mode);
 
 	if (ext4_has_feature_project(sb) &&
 	    ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT))
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 90a8c2f29616..c00ceb15fb1b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -20,6 +20,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/time.h>
 #include <linux/highuid.h>
 #include <linux/pagemap.h>
@@ -5324,7 +5325,7 @@ int ext4_setattr(struct user_namespace *user_ns, struct dentry *dentry,
 				  ATTR_GID | ATTR_TIMES_SET))))
 		return -EPERM;
 
-	error = setattr_prepare(&init_user_ns, dentry, attr);
+	error = setattr_prepare(user_ns, dentry, attr);
 	if (error)
 		return error;
 
@@ -5499,7 +5500,7 @@ int ext4_setattr(struct user_namespace *user_ns, struct dentry *dentry,
 	}
 
 	if (!error) {
-		setattr_copy(&init_user_ns, inode, attr);
+		setattr_copy(user_ns, inode, attr);
 		mark_inode_dirty(inode);
 	}
 
@@ -5511,7 +5512,7 @@ int ext4_setattr(struct user_namespace *user_ns, struct dentry *dentry,
 		ext4_orphan_del(NULL, inode);
 
 	if (!error && (ia_valid & ATTR_MODE))
-		rc = posix_acl_chmod(&init_user_ns, inode, inode->i_mode);
+		rc = posix_acl_chmod(user_ns, inode, inode->i_mode);
 
 err_out:
 	if  (error)
@@ -5525,6 +5526,7 @@ int ext4_setattr(struct user_namespace *user_ns, struct dentry *dentry,
 int ext4_getattr(const struct path *path, struct kstat *stat,
 		 u32 request_mask, unsigned int query_flags)
 {
+	struct user_namespace *user_ns;
 	struct inode *inode = d_inode(path->dentry);
 	struct ext4_inode *raw_inode;
 	struct ext4_inode_info *ei = EXT4_I(inode);
@@ -5558,7 +5560,8 @@ int ext4_getattr(const struct path *path, struct kstat *stat,
 				  STATX_ATTR_NODUMP |
 				  STATX_ATTR_VERITY);
 
-	generic_fillattr(&init_user_ns, inode, stat);
+	user_ns = mnt_user_ns(path->mnt);
+	generic_fillattr(user_ns, inode, stat);
 	return 0;
 }
 
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index e35aba820254..6fff34587f0b 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -111,6 +111,7 @@ void ext4_reset_inode_seed(struct inode *inode)
  *
  */
 static long swap_inode_boot_loader(struct super_block *sb,
+				struct user_namespace *user_ns,
 				struct inode *inode)
 {
 	handle_t *handle;
@@ -139,7 +140,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	}
 
 	if (IS_RDONLY(inode) || IS_APPEND(inode) || IS_IMMUTABLE(inode) ||
-	    !inode_owner_or_capable(&init_user_ns, inode) || !capable(CAP_SYS_ADMIN)) {
+	    !inode_owner_or_capable(user_ns, inode) || !capable(CAP_SYS_ADMIN)) {
 		err = -EPERM;
 		goto journal_err_out;
 	}
@@ -814,6 +815,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	struct inode *inode = file_inode(filp);
 	struct super_block *sb = inode->i_sb;
 	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct user_namespace *user_ns = mnt_user_ns(filp->f_path.mnt);
 	unsigned int flags;
 
 	ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);
@@ -829,7 +831,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case FS_IOC_SETFLAGS: {
 		int err;
 
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EACCES;
 
 		if (get_user(flags, (int __user *) arg))
@@ -871,7 +873,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		__u32 generation;
 		int err;
 
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EPERM;
 
 		if (ext4_has_metadata_csum(inode->i_sb)) {
@@ -1010,7 +1012,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case EXT4_IOC_MIGRATE:
 	{
 		int err;
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EACCES;
 
 		err = mnt_want_write_file(filp);
@@ -1032,7 +1034,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case EXT4_IOC_ALLOC_DA_BLKS:
 	{
 		int err;
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EACCES;
 
 		err = mnt_want_write_file(filp);
@@ -1051,7 +1053,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		err = mnt_want_write_file(filp);
 		if (err)
 			return err;
-		err = swap_inode_boot_loader(sb, inode);
+		err = swap_inode_boot_loader(sb, user_ns, inode);
 		mnt_drop_write_file(filp);
 		return err;
 	}
@@ -1214,7 +1216,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 	case EXT4_IOC_CLEAR_ES_CACHE:
 	{
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EACCES;
 		ext4_clear_inode_es(inode);
 		return 0;
@@ -1260,7 +1262,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 			return -EFAULT;
 
 		/* Make sure caller has proper permission */
-		if (!inode_owner_or_capable(&init_user_ns, inode))
+		if (!inode_owner_or_capable(user_ns, inode))
 			return -EACCES;
 
 		if (fa.fsx_xflags & ~EXT4_SUPPORTED_FS_XFLAGS)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index aa7128f16e10..090526e68a54 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2616,8 +2616,8 @@ static int ext4_create(struct user_namespace *user_ns, struct inode *dir,
 	credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
 		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
 retry:
-	inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
-					    NULL, EXT4_HT_DIR, credits);
+	inode = ext4_new_inode_start_handle(user_ns, dir, mode, &dentry->d_name,
+					    0, NULL, EXT4_HT_DIR, credits);
 	handle = ext4_journal_current_handle();
 	err = PTR_ERR(inode);
 	if (!IS_ERR(inode)) {
@@ -2653,8 +2653,8 @@ static int ext4_mknod(struct user_namespace *user_ns, struct inode *dir,
 	credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
 		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
 retry:
-	inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
-					    NULL, EXT4_HT_DIR, credits);
+	inode = ext4_new_inode_start_handle(user_ns, dir, mode, &dentry->d_name,
+					    0, NULL, EXT4_HT_DIR, credits);
 	handle = ext4_journal_current_handle();
 	err = PTR_ERR(inode);
 	if (!IS_ERR(inode)) {
@@ -2688,7 +2688,7 @@ static int ext4_tmpfile(struct user_namespace *user_ns, struct inode *dir,
 		return err;
 
 retry:
-	inode = ext4_new_inode_start_handle(dir, mode,
+	inode = ext4_new_inode_start_handle(user_ns, dir, mode,
 					    NULL, 0, NULL,
 					    EXT4_HT_DIR,
 			EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) +
@@ -2803,7 +2803,7 @@ static int ext4_mkdir(struct user_namespace *user_ns, struct inode *dir,
 	credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
 		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
 retry:
-	inode = ext4_new_inode_start_handle(dir, S_IFDIR | mode,
+	inode = ext4_new_inode_start_handle(user_ns, dir, S_IFDIR | mode,
 					    &dentry->d_name,
 					    0, NULL, EXT4_HT_DIR, credits);
 	handle = ext4_journal_current_handle();
@@ -3340,7 +3340,7 @@ static int ext4_symlink(struct user_namespace *user_ns, struct inode *dir,
 			  EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
 	}
 
-	inode = ext4_new_inode_start_handle(dir, S_IFLNK|S_IRWXUGO,
+	inode = ext4_new_inode_start_handle(user_ns, dir, S_IFLNK|S_IRWXUGO,
 					    &dentry->d_name, 0, NULL,
 					    EXT4_HT_DIR, credits);
 	handle = ext4_journal_current_handle();
@@ -3672,7 +3672,8 @@ static void ext4_update_dir_count(handle_t *handle, struct ext4_renament *ent)
 	}
 }
 
-static struct inode *ext4_whiteout_for_rename(struct ext4_renament *ent,
+static struct inode *ext4_whiteout_for_rename(struct user_namespace *user_ns,
+					      struct ext4_renament *ent,
 					      int credits, handle_t **h)
 {
 	struct inode *wh;
@@ -3686,7 +3687,8 @@ static struct inode *ext4_whiteout_for_rename(struct ext4_renament *ent,
 	credits += (EXT4_MAXQUOTAS_TRANS_BLOCKS(ent->dir->i_sb) +
 		    EXT4_XATTR_TRANS_BLOCKS + 4);
 retry:
-	wh = ext4_new_inode_start_handle(ent->dir, S_IFCHR | WHITEOUT_MODE,
+	wh = ext4_new_inode_start_handle(user_ns, ent->dir,
+					 S_IFCHR | WHITEOUT_MODE,
 					 &ent->dentry->d_name, 0, NULL,
 					 EXT4_HT_DIR, credits);
 
@@ -3713,9 +3715,9 @@ static struct inode *ext4_whiteout_for_rename(struct ext4_renament *ent,
  * while new_{dentry,inode) refers to the destination dentry/inode
  * This comes from rename(const char *oldpath, const char *newpath)
  */
-static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
-		       struct inode *new_dir, struct dentry *new_dentry,
-		       unsigned int flags)
+static int ext4_rename(struct user_namespace *user_ns, struct inode *old_dir,
+		       struct dentry *old_dentry, struct inode *new_dir,
+		       struct dentry *new_dentry, unsigned int flags)
 {
 	handle_t *handle = NULL;
 	struct ext4_renament old = {
@@ -3799,7 +3801,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 			goto end_rename;
 		}
 	} else {
-		whiteout = ext4_whiteout_for_rename(&old, credits, &handle);
+		whiteout = ext4_whiteout_for_rename(user_ns, &old, credits, &handle);
 		if (IS_ERR(whiteout)) {
 			retval = PTR_ERR(whiteout);
 			whiteout = NULL;
@@ -4113,7 +4115,7 @@ static int ext4_rename2(struct user_namespace *user_ns, struct inode *old_dir,
 					 new_dir, new_dentry);
 	}
 
-	return ext4_rename(old_dir, old_dentry, new_dir, new_dentry, flags);
+	return ext4_rename(user_ns, old_dir, old_dentry, new_dir, new_dentry, flags);
 }
 
 /*
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ef4734b40e2a..ca15283d1552 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5014,7 +5014,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	}
 
 	block = ext4_count_free_clusters(sb);
-	ext4_free_blocks_count_set(sbi->s_es, 
+	ext4_free_blocks_count_set(sbi->s_es,
 				   EXT4_C2B(sbi, block));
 	ext4_superblock_csum_set(sb);
 	err = percpu_counter_init(&sbi->s_freeclusters_counter, block,
@@ -6641,7 +6641,11 @@ static struct file_system_type ext4_fs_type = {
 	.name		= "ext4",
 	.mount		= ext4_mount,
 	.kill_sb	= kill_block_super,
+#ifdef CONFIG_EXT4_IDMAP_MOUNTS
+	.fs_flags	= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+#else
 	.fs_flags	= FS_REQUIRES_DEV,
+#endif
 };
 MODULE_ALIAS_FS("ext4");
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 35/39] ecryptfs: do not mount on top of idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (33 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 34/39] ext4: support " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 36/39] overlayfs: " Christian Brauner
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Prevent ecryptfs from being mounted on top of idmapped mounts until we have
ported it to handle this case and added proper testing for it.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 fs/ecryptfs/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index e63259fdef28..c739f42157db 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -531,6 +531,12 @@ static struct dentry *ecryptfs_mount(struct file_system_type *fs_type, int flags
 		goto out_free;
 	}
 
+	if (mnt_idmapped(path.mnt)) {
+		rc = -EINVAL;
+		printk(KERN_ERR "Mounting on idmapped mounts currently disallowed\n");
+		goto out_free;
+	}
+
 	if (check_ruid && !uid_eq(d_inode(path.dentry)->i_uid, current_uid())) {
 		rc = -EPERM;
 		printk(KERN_ERR "Mount of device (uid: %d) not owned by "
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 36/39] overlayfs: do not mount on top of idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (34 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 35/39] ecryptfs: do not mount on top of " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 12:31   ` Amir Goldstein
  2020-11-15 10:37 ` [PATCH v2 37/39] fs: introduce MOUNT_ATTR_IDMAP Christian Brauner
                   ` (5 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

Prevent overlayfs from being mounted on top of idmapped mounts until we
have ported it to handle this case and added proper testing for it.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 fs/overlayfs/super.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 0d4f2baf6836..3cacc3d3fb65 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1708,6 +1708,12 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
 		if (err)
 			goto out_err;
 
+		if (mnt_idmapped(stack[i].mnt)) {
+			err = -EINVAL;
+			pr_err("idmapped lower layers are currently unsupported\n");
+			goto out_err;
+		}
+
 		lower = strchr(lower, '\0') + 1;
 	}
 
@@ -1939,6 +1945,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 		if (err)
 			goto out_err;
 
+		if (mnt_idmapped(upperpath.mnt)) {
+			err = -EINVAL;
+			pr_err("idmapped lower layers are currently unsupported\n");
+			goto out_err;
+		}
+
 		err = ovl_get_workdir(sb, ofs, &upperpath);
 		if (err)
 			goto out_err;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 37/39] fs: introduce MOUNT_ATTR_IDMAP
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (35 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 36/39] overlayfs: " Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 38/39] selftests: add idmapped mounts xattr selftest Christian Brauner
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig,
	Mauricio Vásquez Bernal

Introduce a new mount bind mount property to allow idmapping mounts. The
MOUNT_ATTR_IDMAP flag can be set via the new mount_setattr() syscall
together with a file descriptor referring to a user namespace.

The user namespace referenced by the namespace file descriptor will be
attached to the bind mount. All interactions with the filesystem going
through that mount will be idmapped according to the mapping specified in
the user namespace attached to it.

Using user namespaces to mark mounts means we can reuse all the existing
infrastructure in the kernel that already exists to handle idmappings and can
also use this for permission checking to allow unprivileged user to create
idmapped mounts in the future.

Idmapping a mount is decoupled from the caller's user and mount namespace.
This means idmapped mounts can be created in the initial user namespace
which is an important use-case for systemd-homed, portable usb-sticks
between systems, sharing data between the initial user namespace and
unprivileged containers, and other use-cases that have been brought up. For
example, assume a home directory where all files are owned by uid and gid
1000 and the home directory is brought to a new laptop where the user has
id 12345. The system administrator can simply create a mount of this home
directory with a mapping of 1000:12345:1 other mappings to indicate the ids
should be kept. (With this it is e.g. also possible to create idmapped
mounts on the host with an identity mapping 1:1:100000 where the root user
is not mapped. A user with root access that e.g. has been pivot rooted into
such a mount on the host will be not be able to execute, read, write, or
create files as root.)

Given that idmapping a mount is decoupled from the caller's user namespace
a sufficiently privileged process such as a container manager can set up a
shifted mount for the container and the container can simply pivot root to
it. There's no need for the container to do anything. The mount will appear
correctly mapped independent of the user namespace the container uses. This
means we don't need to mark a mount as idmappable.

In order to create an idmapped mount the caller must currently be privileged in
the user namespace of the superblock the mount belongs to. Currently, once a
mount has been idmapped we don't allow it to change its mapping. This can be
changed in the future if the use-cases arises.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Mauricio Vásquez Bernal <mauricio@kinvolk.io>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
- Christoph Hellwig <hch@lst.de>:
  - Drop kconfig option to make vfs idmappings unconditional.
  - Move introduction of MOUNT_ATTR_IDMAP to the end of the series after all
    internal changes have been done.
  - Move MOUNT_ATTR_IDMAP handling from build_mount_kattr() to separate
    build_mount_idmapped() helper.
  - Move MNT_IDMAPPED handling from do_mount_setattr() into separate
    do_mount_idmap() helper.
  - Use more helpers instead of one big function for mount attribute changes.
- Mauricio Vásquez Bernal <mauricio@kinvolk.io>:
  - Recalculate flags before checking can_change_locked_flags().
---
 fs/internal.h              |   1 +
 fs/namespace.c             | 122 ++++++++++++++++++++++++++++++++++++-
 fs/proc_namespace.c        |   1 +
 include/linux/fs.h         |   2 +-
 include/linux/mount.h      |   2 +-
 include/uapi/linux/mount.h |   5 +-
 6 files changed, 128 insertions(+), 5 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index a5a6c470dc07..b0978274155f 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -88,6 +88,7 @@ struct mount_kattr {
 	unsigned int propagation;
 	unsigned int lookup_flags;
 	bool recurse;
+	struct user_namespace *mnt_user_ns;
 };
 
 extern struct vfsmount *lookup_mnt(const struct path *);
diff --git a/fs/namespace.c b/fs/namespace.c
index 15fb0ae3f01f..f76292294dbd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -25,6 +25,7 @@
 #include <linux/proc_ns.h>
 #include <linux/magic.h>
 #include <linux/memblock.h>
+#include <linux/proc_fs.h>
 #include <linux/task_work.h>
 #include <linux/sched/task.h>
 #include <uapi/linux/mount.h>
@@ -3465,7 +3466,8 @@ static int build_attr_flags(unsigned int attr_flags, unsigned int *flags)
 			   MOUNT_ATTR_NODEV |
 			   MOUNT_ATTR_NOEXEC |
 			   MOUNT_ATTR__ATIME |
-			   MOUNT_ATTR_NODIRATIME))
+			   MOUNT_ATTR_NODIRATIME |
+			   MOUNT_ATTR_IDMAP))
 		return -EINVAL;
 
 	if (attr_flags & MOUNT_ATTR_RDONLY)
@@ -3478,6 +3480,8 @@ static int build_attr_flags(unsigned int attr_flags, unsigned int *flags)
 		aflags |= MNT_NOEXEC;
 	if (attr_flags & MOUNT_ATTR_NODIRATIME)
 		aflags |= MNT_NODIRATIME;
+	if (attr_flags & MOUNT_ATTR_IDMAP)
+		aflags |= MNT_IDMAPPED;
 
 	*flags = aflags;
 	return 0;
@@ -3505,6 +3509,14 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	if ((flags & ~(FSMOUNT_CLOEXEC)) != 0)
 		return -EINVAL;
 
+	if (attr_flags & ~(MOUNT_ATTR_RDONLY |
+			   MOUNT_ATTR_NOSUID |
+			   MOUNT_ATTR_NODEV |
+			   MOUNT_ATTR_NOEXEC |
+			   MOUNT_ATTR__ATIME |
+			   MOUNT_ATTR_NODIRATIME))
+		return -EINVAL;
+
 	ret = build_attr_flags(attr_flags, &mnt_flags);
 	if (ret)
 		return ret;
@@ -3827,6 +3839,32 @@ static unsigned int recalc_flags(struct mount_kattr *kattr, struct mount *mnt)
 	return flags;
 }
 
+static int can_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
+{
+	struct vfsmount *m = &mnt->mnt;
+
+	if (!(kattr->attr_set & MNT_IDMAPPED))
+		return 0;
+
+	/*
+	 * Once a mount has been idmapped we don't allow it to change its
+	 * mapping. It makes things simpler and callers can just create
+	 * another bind-mount they can idmap if they want to.
+	 */
+	if (mnt_idmapped(m))
+		return -EPERM;
+
+	/* The underlying filesystem doesn't support idmapped mounts yet. */
+	if (!(m->mnt_sb->s_type->fs_flags & FS_ALLOW_IDMAP))
+		return -EINVAL;
+
+	/* We're controlling the superblock. */
+	if (!ns_capable(m->mnt_sb->s_user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	return 0;
+}
+
 static struct mount *mount_setattr_prepare(struct mount_kattr *kattr,
 					   struct mount *mnt, int *err)
 {
@@ -3851,6 +3889,10 @@ static struct mount *mount_setattr_prepare(struct mount_kattr *kattr,
 			goto out;
 		}
 
+		*err = can_idmap_mount(kattr, m);
+		if (*err)
+			goto out;
+
 		last = m;
 
 		if ((kattr->attr_set & MNT_READONLY) &&
@@ -3865,6 +3907,17 @@ static struct mount *mount_setattr_prepare(struct mount_kattr *kattr,
 	return last;
 }
 
+static void do_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
+{
+	struct user_namespace *user_ns;
+
+	if (!(kattr->attr_set & MNT_IDMAPPED))
+		return;
+
+	user_ns = get_user_ns(kattr->mnt_user_ns);
+	WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns);
+}
+
 static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt,
 				 struct mount *last, int err)
 {
@@ -3874,6 +3927,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt,
 		if (!err) {
 			unsigned int flags;
 
+			do_idmap_mount(kattr, m);
 			flags = recalc_flags(kattr, m);
 			WRITE_ONCE(m->mnt.mnt_flags, flags);
 		}
@@ -3946,6 +4000,61 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
 	return err;
 }
 
+static int build_mount_idmapped(const struct mount_attr *attr,
+				struct mount_kattr *kattr, unsigned int flags)
+{
+	int err = 0;
+	struct ns_common *ns;
+	struct user_namespace *user_ns;
+	struct file *file;
+
+	if (!((attr->attr_set | attr->attr_clr) & MOUNT_ATTR_IDMAP))
+		return 0;
+
+	/*
+	 * We currently do not support clearing an idmapped mount. If this ever
+	 * is a use-case we can revisit this but for now let's keep it simple
+	 * and not allow it.
+	 */
+	if (attr->attr_clr & MOUNT_ATTR_IDMAP)
+		return -EINVAL;
+
+	if (attr->userns_fd > INT_MAX)
+		return -EINVAL;
+
+	file = fget(attr->userns_fd);
+	if (!file)
+		return -EBADF;
+
+	if (!proc_ns_file(file)) {
+		err = -EINVAL;
+		goto out_fput;
+	}
+
+	ns = get_proc_ns(file_inode(file));
+	if (ns->ops->type != CLONE_NEWUSER) {
+		err = -EINVAL;
+		goto out_fput;
+	}
+
+	/*
+	 * The init_user_ns is used to indicate that a vfsmount is not idmapped.
+	 * This is simpler than just having to treat NULL as unmapped. Users
+	 * wanting to idmap a mount to init_user_ns can just use a namespace
+	 * with an identity mapping.
+	 */
+	user_ns = container_of(ns, struct user_namespace, ns);
+	if (user_ns == &init_user_ns) {
+		err = -EPERM;
+		goto out_fput;
+	}
+	kattr->mnt_user_ns = get_user_ns(user_ns);
+
+out_fput:
+	fput(file);
+	return err;
+}
+
 static int build_mount_kattr(const struct mount_attr *attr,
 			     struct mount_kattr *kattr, unsigned int flags)
 {
@@ -4025,7 +4134,15 @@ static int build_mount_kattr(const struct mount_attr *attr,
 		}
 	}
 
-	return 0;
+	return build_mount_idmapped(attr, kattr, flags);
+}
+
+static void finish_mount_kattr(struct mount_kattr *kattr)
+{
+	if (kattr->attr_set & MNT_IDMAPPED) {
+		put_user_ns(kattr->mnt_user_ns);
+		kattr->mnt_user_ns = NULL;
+	}
 }
 
 SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path, unsigned int, flags,
@@ -4064,6 +4181,7 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path, unsigned int
 		return err;
 
 	err = do_mount_setattr(&target, &kattr);
+	finish_mount_kattr(&kattr);
 	path_put(&target);
 	return err;
 }
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index e59d4bb3a89e..5ebc337de8a7 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -71,6 +71,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
 		{ MNT_NOSYMFOLLOW, ",nosymfollow" },
+		{ MNT_IDMAPPED, ",idmapped" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_opts *fs_infop;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d83647b5a299..b89fc633ccc3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2277,7 +2277,7 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
-#define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
+#define FS_ALLOW_IDMAP		32	/* FS has been updated to handle vfs idmappings. */
 #define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 3c7ba1bd4a21..d6745a0b90db 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -48,7 +48,7 @@ struct fs_context;
 #define MNT_SHARED_MASK	(MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
 				 | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
-				 | MNT_READONLY | MNT_NOSYMFOLLOW)
+				 | MNT_READONLY | MNT_NOSYMFOLLOW | MNT_IDMAPPED)
 #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
 
 #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index fb3ad26fdebf..525cc67363f8 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -117,6 +117,7 @@ enum fsconfig_command {
 #define MOUNT_ATTR_NOATIME	0x00000010 /* - Do not update access times. */
 #define MOUNT_ATTR_STRICTATIME	0x00000020 /* - Always perform atime updates */
 #define MOUNT_ATTR_NODIRATIME	0x00000080 /* Do not update directory access times */
+#define MOUNT_ATTR_IDMAP	0x00100000 /* Idmap this mount to @userns in mount_attr. */
 
 /*
  * mount_setattr()
@@ -125,6 +126,7 @@ struct mount_attr {
 	__u64 attr_set;
 	__u64 attr_clr;
 	__u64 propagation;
+	__u64 userns_fd;
 };
 
 /* Change propagation through mount_setattr(). */
@@ -138,6 +140,7 @@ enum propagation_type {
 
 /* List of all mount_attr versions. */
 #define MOUNT_ATTR_SIZE_VER0	24 /* sizeof first published struct */
-#define MOUNT_ATTR_SIZE_LATEST	MOUNT_ATTR_SIZE_VER0
+#define MOUNT_ATTR_SIZE_VER1	32 /* sizeof second published struct */
+#define MOUNT_ATTR_SIZE_LATEST	MOUNT_ATTR_SIZE_VER1
 
 #endif /* _UAPI_LINUX_MOUNT_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 38/39] selftests: add idmapped mounts xattr selftest
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (36 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 37/39] fs: introduce MOUNT_ATTR_IDMAP Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-15 10:37 ` [PATCH v2 39/39] tests: add vfs/idmapped mounts test suite Christian Brauner
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Tycho Andersen, Christoph Hellwig, Christian Brauner

From: Tycho Andersen <tycho@tycho.pizza>

Add some tests for setting extended attributes on idmapped mounts.

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 .../testing/selftests/idmap_mounts/.gitignore |   1 +
 tools/testing/selftests/idmap_mounts/Makefile |   9 +
 tools/testing/selftests/idmap_mounts/config   |   1 +
 .../testing/selftests/idmap_mounts/internal.h | 116 +++++++
 tools/testing/selftests/idmap_mounts/xattr.c  | 284 ++++++++++++++++++
 5 files changed, 411 insertions(+)
 create mode 100644 tools/testing/selftests/idmap_mounts/.gitignore
 create mode 100644 tools/testing/selftests/idmap_mounts/Makefile
 create mode 100644 tools/testing/selftests/idmap_mounts/config
 create mode 100644 tools/testing/selftests/idmap_mounts/internal.h
 create mode 100644 tools/testing/selftests/idmap_mounts/xattr.c

diff --git a/tools/testing/selftests/idmap_mounts/.gitignore b/tools/testing/selftests/idmap_mounts/.gitignore
new file mode 100644
index 000000000000..18c5e90522ad
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/.gitignore
@@ -0,0 +1 @@
+xattr
diff --git a/tools/testing/selftests/idmap_mounts/Makefile b/tools/testing/selftests/idmap_mounts/Makefile
new file mode 100644
index 000000000000..1d495c99d924
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for mount selftests.
+CFLAGS = -g -I../../../../usr/include/ -Wall -O2 -pthread
+
+TEST_GEN_FILES += xattr
+
+include ../lib.mk
+
+$(OUTPUT)/xattr: xattr.c internal.h
diff --git a/tools/testing/selftests/idmap_mounts/config b/tools/testing/selftests/idmap_mounts/config
new file mode 100644
index 000000000000..80730abc534b
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/config
@@ -0,0 +1 @@
+CONFIG_IDMAP_MOUNTS=y
diff --git a/tools/testing/selftests/idmap_mounts/internal.h b/tools/testing/selftests/idmap_mounts/internal.h
new file mode 100644
index 000000000000..252803f35d71
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/internal.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __IDMAP_INTERNAL_H
+#define __IDMAP_INTERNAL_H
+
+#define _GNU_SOURCE
+
+#include "../kselftest_harness.h"
+
+#ifndef __NR_mount_setattr
+	#if defined __alpha__
+		#define __NR_mount_setattr 551
+	#elif defined _MIPS_SIM
+		#if _MIPS_SIM == _MIPS_SIM_ABI32	/* o32 */
+			#define __NR_mount_setattr 4441
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_NABI32	/* n32 */
+			#define __NR_mount_setattr 6441
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_ABI64	/* n64 */
+			#define __NR_mount_setattr 5441
+		#endif
+	#elif defined __ia64__
+		#define __NR_mount_setattr (441 + 1024)
+	#else
+		#define __NR_mount_setattr 441
+	#endif
+
+#ifndef __NR_open_tree
+	#if defined __alpha__
+		#define __NR_open_tree 538
+	#elif defined _MIPS_SIM
+		#if _MIPS_SIM == _MIPS_SIM_ABI32	/* o32 */
+			#define __NR_open_tree 4428
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_NABI32	/* n32 */
+			#define __NR_open_tree 6428
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_ABI64	/* n64 */
+			#define __NR_open_tree 5428
+		#endif
+	#elif defined __ia64__
+		#define __NR_open_tree (428 + 1024)
+	#else
+		#define __NR_open_tree 428
+	#endif
+#endif
+
+#ifndef __NR_move_mount
+	#if defined __alpha__
+		#define __NR_move_mount 539
+	#elif defined _MIPS_SIM
+		#if _MIPS_SIM == _MIPS_SIM_ABI32	/* o32 */
+			#define __NR_move_mount 4429
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_NABI32	/* n32 */
+			#define __NR_move_mount 6429
+		#endif
+		#if _MIPS_SIM == _MIPS_SIM_ABI64	/* n64 */
+			#define __NR_move_mount 5429
+		#endif
+	#elif defined __ia64__
+		#define __NR_move_mount (428 + 1024)
+	#else
+		#define __NR_move_mount 429
+	#endif
+#endif
+
+
+struct mount_attr {
+	__u64 attr_set;
+	__u64 attr_clr;
+	__u64 propagation;
+	__u64 userns;
+};
+#endif
+
+#ifndef MOVE_MOUNT_F_EMPTY_PATH
+#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
+#endif
+
+#ifndef MOUNT_ATTR_IDMAP
+#define MOUNT_ATTR_IDMAP 0x00100000
+#endif
+
+#ifndef OPEN_TREE_CLONE
+#define OPEN_TREE_CLONE 1
+#endif
+
+#ifndef OPEN_TREE_CLOEXEC
+#define OPEN_TREE_CLOEXEC O_CLOEXEC
+#endif
+
+#ifndef AT_RECURSIVE
+#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
+#endif
+
+static inline int sys_mount_setattr(int dfd, const char *path, unsigned int flags,
+				    struct mount_attr *attr, size_t size)
+{
+	return syscall(__NR_mount_setattr, dfd, path, flags, attr, size);
+}
+
+static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
+{
+	return syscall(__NR_open_tree, dfd, filename, flags);
+}
+
+static inline int sys_move_mount(int from_dfd, const char *from_pathname, int to_dfd,
+				 const char *to_pathname, unsigned int flags)
+{
+	return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags);
+}
+
+
+#endif /* __IDMAP_INTERNAL_H */
diff --git a/tools/testing/selftests/idmap_mounts/xattr.c b/tools/testing/selftests/idmap_mounts/xattr.c
new file mode 100644
index 000000000000..58e88f92f958
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/xattr.c
@@ -0,0 +1,284 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sched.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <linux/limits.h>
+
+#include "internal.h"
+#include "../kselftest_harness.h"
+
+static ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = write(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+static int write_file(const char *path, const void *buf, size_t count)
+{
+	int fd;
+	ssize_t ret;
+
+	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
+	if (fd < 0)
+		return -1;
+
+	ret = write_nointr(fd, buf, count);
+	close(fd);
+	if (ret < 0 || (size_t)ret != count)
+		return -1;
+
+	return 0;
+}
+
+static int map_ids(pid_t pid, unsigned long nsid, unsigned long hostid,
+		   unsigned long range)
+{
+	char map[100], procfile[256];
+
+	snprintf(procfile, sizeof(procfile), "/proc/%d/setgroups", pid);
+	if (write_file(procfile, "deny", sizeof("deny") - 1) &&
+	    errno != ENOENT)
+		return -1;
+
+	snprintf(procfile, sizeof(procfile), "/proc/%d/uid_map", pid);
+	snprintf(map, sizeof(map), "%lu %lu %lu", nsid, hostid, range);
+	if (write_file(procfile, map, strlen(map)))
+		return -1;
+
+
+	snprintf(procfile, sizeof(procfile), "/proc/%d/gid_map", pid);
+	snprintf(map, sizeof(map), "%lu %lu %lu", nsid, hostid, range);
+	if (write_file(procfile, map, strlen(map)))
+		return -1;
+
+	return 0;
+}
+
+#define __STACK_SIZE (8 * 1024 * 1024)
+static pid_t do_clone(int (*fn)(void *), void *arg, int flags)
+{
+	void *stack;
+
+	stack = malloc(__STACK_SIZE);
+	if (!stack)
+		return -ENOMEM;
+
+#ifdef __ia64__
+	return __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, NULL);
+#else
+	return clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, NULL);
+#endif
+}
+
+static int get_userns_fd_cb(void *data)
+{
+	return kill(getpid(), SIGSTOP);
+}
+
+static int get_userns_fd(unsigned long nsid, unsigned long hostid,
+			 unsigned long range)
+{
+	int ret;
+	pid_t pid;
+	char path[256];
+
+	pid = do_clone(get_userns_fd_cb, NULL, CLONE_NEWUSER | CLONE_NEWNS);
+	if (pid < 0)
+		return -errno;
+
+	ret = map_ids(pid, nsid, hostid, range);
+	if (ret < 0)
+		return ret;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/user", pid);
+	ret = open(path, O_RDONLY | O_CLOEXEC);
+	kill(pid, SIGKILL);
+	return ret;
+}
+
+struct run_as_data {
+	int userns;
+	int (*f)(void *data);
+	void *data;
+};
+
+static int run_in_cb(void *data)
+{
+	struct run_as_data *rad = data;
+
+	if (setns(rad->userns, CLONE_NEWUSER) < 0) {
+		perror("setns");
+		return 1;
+	}
+
+	if (setuid(100010)) {
+		perror("setuid");
+		return 1;
+	}
+
+	if (setgid(100010)) {
+		perror("setgid");
+		return 1;
+	}
+
+	return rad->f(rad->data);
+}
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static int run_in(int userns, int (*f)(void *), void *f_data)
+{
+	pid_t pid;
+	struct run_as_data data;
+
+	data.userns = userns;
+	data.f = f;
+	data.data = f_data;
+	pid = do_clone(run_in_cb, &data, 0);
+	if (pid < 0)
+		return -errno;
+
+	return wait_for_pid(pid);
+}
+
+FIXTURE(ext4_xattr) {};
+
+FIXTURE_SETUP(ext4_xattr)
+{
+	int fd;
+
+	fd = open("/tmp/idmap_mounts.ext4", O_CREAT | O_WRONLY, 0600);
+	ASSERT_GE(fd, 0);
+	ASSERT_EQ(ftruncate(fd, 640 * 1024), 0);
+	ASSERT_EQ(close(fd), 0);
+	ASSERT_EQ(system("mkfs.ext4 /tmp/idmap_mounts.ext4"), 0);
+	ASSERT_EQ(mkdir("/tmp/ext4", 0777), 0);
+	ASSERT_EQ(system("mount -o loop -t ext4 /tmp/idmap_mounts.ext4 /tmp/ext4"), 0);
+}
+
+FIXTURE_TEARDOWN(ext4_xattr)
+{
+	umount("/tmp/ext4/dest");
+	umount("/tmp/ext4");
+	rmdir("/tmp/ext4");
+	unlink("/tmp/idmap_mounts.ext4");
+}
+
+struct getacl_should_be_data {
+	char path[256];
+	uid_t uid;
+};
+
+static int getacl_should_be_uid(void *data)
+{
+	struct getacl_should_be_data *ssb = data;
+	char cmd[512];
+	int ret;
+
+	snprintf(cmd, sizeof(cmd), "getfacl %s | grep user:%u:rwx", ssb->path, ssb->uid);
+	ret = system(cmd);
+	if (ret < 0) {
+		perror("system");
+		return -1;
+	}
+	if (!WIFEXITED(ret))
+		return -1;
+	return WEXITSTATUS(ret);
+}
+
+static int ls_path(void *data)
+{
+	char cmd[PATH_MAX];
+	char *path = data;
+	int ret;
+
+	snprintf(cmd, sizeof(cmd), "ls %s", path);
+	ret = system(cmd);
+	if (ret < 0) {
+		perror("system");
+		return -1;
+	}
+	if (!WIFEXITED(ret))
+		return -1;
+	return WEXITSTATUS(ret);
+}
+
+TEST_F(ext4_xattr, setattr_didnt_work)
+{
+	int mount_fd, ret;
+	struct mount_attr attr = {};
+	struct getacl_should_be_data ssb;
+
+	ASSERT_EQ(mkdir("/tmp/ext4/source", 0777), 0);
+	ASSERT_EQ(mkdir("/tmp/ext4/dest", 0777), 0);
+
+	mount_fd = sys_open_tree(-EBADF, "/tmp/ext4/source",
+				 OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | AT_EMPTY_PATH);
+	ASSERT_GE(mount_fd, 0);
+
+	ASSERT_EQ(sys_move_mount(mount_fd, "", -EBADF, "/tmp/ext4/dest",
+				 MOVE_MOUNT_F_EMPTY_PATH), 0);
+
+	attr.attr_set = MOUNT_ATTR_IDMAP;
+	attr.userns = get_userns_fd(100010, 100020, 5);
+	ASSERT_GE(attr.userns, 0);
+	ret = sys_mount_setattr(mount_fd, "", AT_EMPTY_PATH | AT_RECURSIVE,
+				    &attr, sizeof(attr));
+	ASSERT_EQ(close(mount_fd), 0);
+	ASSERT_EQ(ret, 0);
+
+	ASSERT_EQ(mkdir("/tmp/ext4/source/foo", 0700), 0);
+	ASSERT_EQ(chown("/tmp/ext4/source/foo", 100010, 100010), 0);
+
+	ASSERT_EQ(system("setfacl -m u:100010:rwx /tmp/ext4/source/foo"), 0);
+	EXPECT_EQ(system("getfacl /tmp/ext4/source/foo | grep user:100010:rwx"), 0);
+	EXPECT_EQ(system("getfacl /tmp/ext4/dest/foo | grep user:100020:rwx"), 0);
+
+	snprintf(ssb.path, sizeof(ssb.path), "/tmp/ext4/source/foo");
+	ssb.uid = 4294967295;
+	EXPECT_EQ(run_in(attr.userns, getacl_should_be_uid, &ssb), 0);
+
+	snprintf(ssb.path, sizeof(ssb.path), "/tmp/ext4/dest/foo");
+	ssb.uid = 100010;
+	EXPECT_EQ(run_in(attr.userns, getacl_should_be_uid, &ssb), 0);
+
+	/*
+	 * now, dir is owned by someone else in the user namespace, but we can
+	 * still read it because of acls
+	 */
+	ASSERT_EQ(chown("/tmp/ext4/source/foo", 100012, 100012), 0);
+	EXPECT_EQ(run_in(attr.userns, ls_path, "/tmp/ext4/dest/foo"), 0);
+
+	/*
+	 * if we delete the acls, the ls should fail because it's 700.
+	 */
+	ASSERT_EQ(system("setfacl --remove-all /tmp/ext4/source/foo"), 0);
+	EXPECT_NE(run_in(attr.userns, ls_path, "/tmp/ext4/dest/foo"), 0);
+}
+
+TEST_HARNESS_MAIN
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH v2 39/39] tests: add vfs/idmapped mounts test suite
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (37 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 38/39] selftests: add idmapped mounts xattr selftest Christian Brauner
@ 2020-11-15 10:37 ` Christian Brauner
  2020-11-20 21:15   ` Kees Cook
  2020-11-17 23:54 ` [PATCH v2 00/39] fs: idmapped mounts Jonathan Corbet
                   ` (2 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-15 10:37 UTC (permalink / raw)
  To: Alexander Viro, Christoph Hellwig, linux-fsdevel
  Cc: John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
	Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
	OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
	Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
	David Howells, James Bottomley, Jann Horn, Seth Forshee,
	Stéphane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux,
	Christian Brauner, Christoph Hellwig

This adds a whole test suite for idmapped mounts but in order to ensure that
there are no regression for the vfs itself it also includes tests for correct
functionality on non-idmapped mounts. The following tests are currently
available with more to come in the future:

01. create_delete_rename: test that basic file interactions work for idmapped mounts
02. create_delete_rename_userns: test that basic file interactions work for
    idmapped mounts from within user namespaces
03. hardlinks: verify that hardlinks work correctly
04. rename: verity tat rename works correctly
05. create_userns: verify that file creation in user namespaces works from idmapped mounts
06. create_userns_device_node: verify that device node creation fails inside user
    namespace from idmapped mounts.
07. expected_uid_gid: verify that file ownership works correctly on idmapped mounts
08. expected_uid_gid_userns: verify that file ownership works correctly on
    idmapped mounts inside user namespaces
09. expected_fscaps_userns: verify that filesystem capabilities work correctly on
    idmapped mounts and inside user namespaces
10. expected_fscaps_reverse: verify that filesystem capabilities work correctly
    on idmapped mounts and inside user namespaces where we map from unprivileged
    ids to privileged ids
11. setid_binaries: verify that suid and sgid binaries work correctly on idmapped mounts
12. setid_binaries_reverse: verify that suid and sgid binaries work correctly on
    idmapped mounts where we map from unprivileged ids to privileged ids
13. setid_binaries_userns: verify that suid and sgid binaries work correctly on
    idmapped mounts inside user namespaces
14. idmap_mount_tree: verify that idmapping a whole mount tree works correctly
15. idmap_mount_tree_invalid: verify that idmapping a mount tree with a mount of
    a filesystem that doesn't support being idmapped yet fails
16. sticky_bit_unlink: verify that unlinking in sticky directories works correctly
17. sticky_bit_unlink_idmapped: verify that unlinking in sticky directories works
    correctly on idmapped mounts
18. sticky_bit_unlink_idmapped_userns: verify that unlinking in sticky directories works
    correctly on idmapped mounts inside user namespaces
19. sticky_bit_rename_idmapped: verify that renaminging in sticky directories works
    correctly on idmapped mounts
20. sticky_bit_rename_idmapped_userns: verify that renaming in sticky directories works
    correctly on idmapped mounts inside user namespaces
21. follow_symlinks: test that following protected symlinks works correctly
22. follow_symlinks_idmapped: test that following symlinks works correctly on idmapped mounts
23. follow_symlinks_idmapped_userns: test that following symlinks works correctly
    on idmapped mounts inside user namespaces
24. invalid_fd_negative: test that negative fds are rejected when idmapping mounts
25. invalid_fd_large: test that excessively large fds are rejected when idmappings mounts
26. invalid_fd_closed: test that closed fds are rejected when idmapping mounts
27. invalid_fd_initial_userns: test that fds referencing the initial user namespace are rejected
28. attached_mount_inside_current_mount_namespace: test that attached mounts can be idmapped
29. attached_mount_outside_current_mount_namespace: test that attached mounts
    can't be idmapped if we are in a different user namespace
30. detached_mount_inside_current_mount_namespace: test that detached mounts can be idmapped
31. detached_mount_outside_current_mount_namespace: test that detached mounts can
    be idmapped outside of our current user namespace
32. change_idmapping: test that idmapped mounts can't be changed

Output:
 TAP version 13
 1..33
 # Starting 33 tests from 2 test cases.
 #  RUN           core.invalid_fd_negative ...
 #            OK  core.invalid_fd_negative
 ok 1 core.invalid_fd_negative
 #  RUN           core.invalid_fd_large ...
 #            OK  core.invalid_fd_large
 ok 2 core.invalid_fd_large
 #  RUN           core.invalid_fd_closed ...
 #            OK  core.invalid_fd_closed
 ok 3 core.invalid_fd_closed
 #  RUN           core.invalid_fd_initial_userns ...
 #            OK  core.invalid_fd_initial_userns
 ok 4 core.invalid_fd_initial_userns
 #  RUN           core.attached_mount_inside_current_mount_namespace ...
 #            OK  core.attached_mount_inside_current_mount_namespace
 ok 5 core.attached_mount_inside_current_mount_namespace
 #  RUN           core.attached_mount_outside_current_mount_namespace ...
 #            OK  core.attached_mount_outside_current_mount_namespace
 ok 6 core.attached_mount_outside_current_mount_namespace
 #  RUN           core.detached_mount_inside_current_mount_namespace ...
 #            OK  core.detached_mount_inside_current_mount_namespace
 ok 7 core.detached_mount_inside_current_mount_namespace
 #  RUN           core.detached_mount_outside_current_mount_namespace ...
 #            OK  core.detached_mount_outside_current_mount_namespace
 ok 8 core.detached_mount_outside_current_mount_namespace
 #  RUN           core.change_idmapping ...
 #            OK  core.change_idmapping
 ok 9 core.change_idmapping
 #  RUN           core.create_delete_rename ...
 #            OK  core.create_delete_rename
 ok 10 core.create_delete_rename
 #  RUN           core.create_delete_rename_userns ...
 #            OK  core.create_delete_rename_userns
 ok 11 core.create_delete_rename_userns
 #  RUN           core.hardlinks ...
 #            OK  core.hardlinks
 ok 12 core.hardlinks
 #  RUN           core.rename ...
 #            OK  core.rename
 ok 13 core.rename
 #  RUN           core.create_userns ...
 #            OK  core.create_userns
 ok 14 core.create_userns
 #  RUN           core.create_userns_device_node ...
 #            OK  core.create_userns_device_node
 ok 15 core.create_userns_device_node
 #  RUN           core.expected_uid_gid ...
 #            OK  core.expected_uid_gid
 ok 16 core.expected_uid_gid
 #  RUN           core.expected_uid_gid_userns ...
 #            OK  core.expected_uid_gid_userns
 ok 17 core.expected_uid_gid_userns
 #  RUN           core.expected_fscaps_userns ...
 #            OK  core.expected_fscaps_userns
 ok 18 core.expected_fscaps_userns
 #  RUN           core.expected_fscaps_reverse ...
 #            OK  core.expected_fscaps_reverse
 ok 19 core.expected_fscaps_reverse
 #  RUN           core.setid_binaries ...
 #            OK  core.setid_binaries
 ok 20 core.setid_binaries
 #  RUN           core.setid_binaries_reverse ...
 #            OK  core.setid_binaries_reverse
 ok 21 core.setid_binaries_reverse
 #  RUN           core.setid_binaries_userns ...
 #            OK  core.setid_binaries_userns
 ok 22 core.setid_binaries_userns
 #  RUN           core.idmap_mount_tree ...
 #            OK  core.idmap_mount_tree
 ok 23 core.idmap_mount_tree
 #  RUN           core.idmap_mount_tree_invalid ...
 #            OK  core.idmap_mount_tree_invalid
 ok 24 core.idmap_mount_tree_invalid
 #  RUN           core.sticky_bit_unlink ...
 #            OK  core.sticky_bit_unlink
 ok 25 core.sticky_bit_unlink
 #  RUN           core.sticky_bit_unlink_idmapped ...
 #            OK  core.sticky_bit_unlink_idmapped
 ok 26 core.sticky_bit_unlink_idmapped
 #  RUN           core.sticky_bit_unlink_idmapped_userns ...
 #            OK  core.sticky_bit_unlink_idmapped_userns
 ok 27 core.sticky_bit_unlink_idmapped_userns
 #  RUN           core.sticky_bit_rename ...
 #            OK  core.sticky_bit_rename
 ok 28 core.sticky_bit_rename
 #  RUN           core.sticky_bit_rename_idmapped ...
 #            OK  core.sticky_bit_rename_idmapped
 ok 29 core.sticky_bit_rename_idmapped
 #  RUN           core.sticky_bit_rename_idmapped_userns ...
 #            OK  core.sticky_bit_rename_idmapped_userns
 ok 30 core.sticky_bit_rename_idmapped_userns
 #  RUN           core.follow_symlinks ...
 #            OK  core.follow_symlinks
 ok 31 core.follow_symlinks
 #  RUN           core.follow_symlinks_idmapped ...
 #            OK  core.follow_symlinks_idmapped
 ok 32 core.follow_symlinks_idmapped
 #  RUN           core.follow_symlinks_idmapped_userns ...
 #            OK  core.follow_symlinks_idmapped_userns
 ok 33 core.follow_symlinks_idmapped_userns
 # PASSED: 33 / 33 tests passed.
 # Totals: pass:33 fail:0 xfail:0 xpass:0 skip:0 error:0

Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
/* v2 */
patch introduced
---
 .../testing/selftests/idmap_mounts/.gitignore |    1 +
 tools/testing/selftests/idmap_mounts/Makefile |    7 +-
 tools/testing/selftests/idmap_mounts/core.c   | 3476 +++++++++++++++++
 .../testing/selftests/idmap_mounts/internal.h |   33 +-
 tools/testing/selftests/idmap_mounts/utils.c  |  136 +
 tools/testing/selftests/idmap_mounts/utils.h  |   17 +
 tools/testing/selftests/idmap_mounts/xattr.c  |  126 +-
 7 files changed, 3664 insertions(+), 132 deletions(-)
 create mode 100644 tools/testing/selftests/idmap_mounts/core.c
 create mode 100644 tools/testing/selftests/idmap_mounts/utils.c
 create mode 100644 tools/testing/selftests/idmap_mounts/utils.h

diff --git a/tools/testing/selftests/idmap_mounts/.gitignore b/tools/testing/selftests/idmap_mounts/.gitignore
index 18c5e90522ad..03c7198482d2 100644
--- a/tools/testing/selftests/idmap_mounts/.gitignore
+++ b/tools/testing/selftests/idmap_mounts/.gitignore
@@ -1 +1,2 @@
+core
 xattr
diff --git a/tools/testing/selftests/idmap_mounts/Makefile b/tools/testing/selftests/idmap_mounts/Makefile
index 1d495c99d924..67697b788353 100644
--- a/tools/testing/selftests/idmap_mounts/Makefile
+++ b/tools/testing/selftests/idmap_mounts/Makefile
@@ -1,9 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0
 # Makefile for mount selftests.
 CFLAGS = -g -I../../../../usr/include/ -Wall -O2 -pthread
+LDLIBS += -lcap
 
-TEST_GEN_FILES += xattr
+TEST_GEN_FILES = xattr
+TEST_GEN_FILES += core
 
 include ../lib.mk
 
-$(OUTPUT)/xattr: xattr.c internal.h
+$(OUTPUT)/xattr: xattr.c internal.h utils.c utils.h
+$(OUTPUT)/core: core.c internal.h utils.c utils.h
diff --git a/tools/testing/selftests/idmap_mounts/core.c b/tools/testing/selftests/idmap_mounts/core.c
new file mode 100644
index 000000000000..23909a55beb8
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/core.c
@@ -0,0 +1,3476 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <grp.h>
+#include <limits.h>
+#include <linux/capability.h>
+#include <sys/capability.h>
+#include <linux/limits.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <sys/fsuid.h>
+#include <sys/sysmacros.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/acl.h>
+#include <sys/xattr.h>
+
+#include "internal.h"
+#include "utils.h"
+#include "../kselftest_harness.h"
+
+#define IMAGE_FILE1 "ext4_1.img"
+#define IMAGE_FILE2 "ext4_2.img"
+#define IMAGE_ROOT_MNT1 "mnt_root_1"
+#define IMAGE_ROOT_MNT2_RELATIVE "mnt_root_2"
+#define IMAGE_ROOT_MNT2 IMAGE_ROOT_MNT1 "/" IMAGE_ROOT_MNT2_RELATIVE
+#define FILESYSTEM_MOUNT1 IMAGE_ROOT_MNT2 "/fs_root_1"
+#define MNT_TARGET1 "mnt_target1"
+#define MNT_TARGET2 "mnt_target2"
+#define FILE1 "file1"
+#define FILE1_RENAME "file1_rename"
+#define FILE2 "file2"
+#define FILE2_RENAME "file2_rename"
+#define DIR1 "dir1"
+#define DIR1_RENAME "dir1_rename"
+#define HARDLINK1 "hardlink1"
+#define SYMLINK1 "symlink1"
+#define SYMLINK_USER1 "symlink_user1"
+#define SYMLINK_USER2 "symlink_user2"
+#define SYMLINK_USER3 "symlink_user3"
+#define CHRDEV1 "chrdev1"
+
+/* Attempt to de-conflict with the selftests tree. */
+#ifndef SKIP
+#define SKIP(s, ...)	XFAIL(s, ##__VA_ARGS__)
+#endif
+
+static bool symlinks_protected(void)
+{
+	int fd;
+	ssize_t ret;
+	char buf[256];
+
+	fd = open("/proc/sys/fs/protected_symlinks", O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		return false;
+
+	ret = read(fd, buf, sizeof(buf));
+	close(fd);
+	if (ret < sizeof(int))
+		return false;
+
+	return atoi(buf) >= 1;
+}
+
+/**
+ * caps_down - lower all effective caps
+ */
+static bool caps_down(void)
+{
+	cap_t caps = NULL;
+	int ret = -1;
+	bool fret = false;
+
+	caps = cap_get_proc();
+	if (!caps)
+		goto out;
+
+	ret = cap_clear_flag(caps, CAP_EFFECTIVE);
+	if (ret)
+		goto out;
+
+	ret = cap_set_proc(caps);
+	if (ret)
+		goto out;
+
+	fret = true;
+
+out:
+	cap_free(caps);
+	return fret;
+}
+
+/**
+ * expected_uid_gid - check whether file is owned by the provided uid and gid
+ */
+static bool expected_uid_gid(int dfd, const char *path, int flags,
+			     uid_t expected_uid, gid_t expected_gid)
+{
+	int ret;
+	struct stat st;
+
+	ret = fstatat(dfd, path, &st, flags);
+	if (ret < 0)
+		return false;
+
+	return st.st_uid == expected_uid && st.st_gid == expected_gid;
+}
+
+/**
+ * is_setid - check whether file is S_ISUID and S_ISGID
+ */
+static bool is_setid(int dfd, const char *path, int flags)
+{
+	int ret;
+	struct stat st;
+
+	ret = fstatat(dfd, path, &st, flags);
+	if (ret < 0)
+		return false;
+
+	return (st.st_mode & S_ISUID) || (st.st_mode & S_ISGID);
+}
+
+/**
+ * is_sticky - check whether file is S_ISUID and S_ISGID
+ */
+static bool is_sticky(int dfd, const char *path, int flags)
+{
+	int ret;
+	struct stat st;
+
+	ret = fstatat(dfd, path, &st, flags);
+	if (ret < 0)
+		return false;
+
+	return (st.st_mode & S_ISVTX) > 0;
+}
+
+/**
+ * rm_r - recursively remove all files
+ */
+static int rm_r(const char *dirname)
+{
+	DIR *dir;
+	int ret;
+	struct dirent *direntp;
+
+	dir = opendir(dirname);
+	if (!dir)
+		return -1;
+
+	while ((direntp = readdir(dir))) {
+		char buf[PATH_MAX];
+		struct stat st;
+
+		if (!strcmp(direntp->d_name, ".") ||
+		    !strcmp(direntp->d_name, ".."))
+			continue;
+
+		snprintf(buf, sizeof(buf), "%s/%s", dirname, direntp->d_name);
+		ret = lstat(buf, &st);
+		if (ret < 0 && errno != ENOENT)
+			break;
+
+		if (S_ISDIR(st.st_mode))
+			ret = rm_r(buf);
+		else
+			ret = unlink(buf);
+		if (ret < 0 && errno != ENOENT)
+			break;
+	}
+
+	ret = rmdir(dirname);
+	closedir(dir);
+	return ret;
+}
+
+/**
+ * umount_r - recursively umount all mounts
+ */
+static void umount_r(const char *path)
+{
+	DIR *dir;
+	int ret;
+	struct dirent *direntp;
+
+	dir = opendir(path);
+	if (!dir)
+		return;
+
+	while ((direntp = readdir(dir))) {
+		char buf[PATH_MAX];
+		struct stat st;
+
+		if (!strcmp(direntp->d_name, ".") ||
+		    !strcmp(direntp->d_name, ".."))
+			continue;
+
+		snprintf(buf, sizeof(buf), "%s/%s", path, direntp->d_name);
+		umount2(buf, MNT_DETACH);
+
+		ret = lstat(buf, &st);
+		if (ret < 0 && errno != ENOENT)
+			break;
+
+		if (!S_ISDIR(st.st_mode))
+			continue;
+
+		umount_r(buf);
+	}
+
+	umount2(path, MNT_DETACH);
+	closedir(dir);
+}
+
+/**
+ * chown_r - recursively change ownership of all files
+ */
+static int chown_r(const char *dirname, uid_t uid, gid_t gid)
+{
+	DIR *dir;
+	int ret;
+	struct dirent *direntp;
+
+	dir = opendir(dirname);
+	if (!dir)
+		return -1;
+
+	while ((direntp = readdir(dir))) {
+		char buf[PATH_MAX];
+		struct stat st;
+
+		if (!strcmp(direntp->d_name, ".") ||
+		    !strcmp(direntp->d_name, ".."))
+			continue;
+
+		snprintf(buf, sizeof(buf), "%s/%s", dirname, direntp->d_name);
+		ret = lstat(buf, &st);
+		if (ret < 0 && errno != ENOENT)
+			break;
+
+		if (S_ISDIR(st.st_mode))
+			ret = chown_r(buf, uid, gid);
+		else
+			ret = chown(buf, uid, gid);
+		if (ret < 0 && errno != ENOENT)
+			break;
+	}
+
+	ret = chown(dirname, uid, gid);
+	closedir(dir);
+	return ret;
+}
+
+/**
+ * fd_to_fd - transfer data from one fd to another
+ */
+static int fd_to_fd(int from, int to)
+{
+	for (;;) {
+		uint8_t buf[PATH_MAX];
+		uint8_t *p = buf;
+		ssize_t bytes_to_write;
+		ssize_t bytes_read;
+
+		bytes_read = read_nointr(from, buf, sizeof buf);
+		if (bytes_read < 0)
+			return -1;
+		if (bytes_read == 0)
+			break;
+
+		bytes_to_write = (size_t)bytes_read;
+		do {
+			ssize_t bytes_written;
+
+			bytes_written = write_nointr(to, p, bytes_to_write);
+			if (bytes_written < 0)
+				return -1;
+
+			bytes_to_write -= bytes_written;
+			p += bytes_written;
+		} while (bytes_to_write > 0);
+	}
+
+	return 0;
+}
+
+static int sys_execveat(int fd, const char *path, char **argv, char **envp,
+			int flags)
+{
+#ifdef __NR_execveat
+	return syscall(__NR_execveat, fd, path, argv, envp, flags);
+#else
+	errno = ENOSYS;
+	return -1;
+#endif
+}
+
+#ifndef VFS_CAP_U32_3
+#define VFS_CAP_U32_3 2
+#endif
+
+#ifndef VFS_CAP_U32
+#define VFS_CAP_U32 VFS_CAP_U32_3
+#endif
+
+#ifndef VFS_CAP_REVISION_1
+#define VFS_CAP_REVISION_1 0x01000000
+#endif
+
+#ifndef VFS_CAP_REVISION_2
+#define VFS_CAP_REVISION_2 0x02000000
+#endif
+
+#ifndef VFS_CAP_REVISION_3
+#define VFS_CAP_REVISION_3 0x03000000
+struct vfs_ns_cap_data {
+	__le32 magic_etc;
+	struct {
+		__le32 permitted;
+		__le32 inheritable;
+	} data[VFS_CAP_U32];
+	__le32 rootid;
+};
+#endif
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define cpu_to_le16(w16) le16_to_cpu(w16)
+#define le16_to_cpu(w16) ((u_int16_t)((u_int16_t)(w16) >> 8) | (u_int16_t)((u_int16_t)(w16) << 8))
+#define cpu_to_le32(w32) le32_to_cpu(w32)
+#define le32_to_cpu(w32)                                                                       \
+	((u_int32_t)((u_int32_t)(w32) >> 24) | (u_int32_t)(((u_int32_t)(w32) >> 8) & 0xFF00) | \
+	 (u_int32_t)(((u_int32_t)(w32) << 8) & 0xFF0000) | (u_int32_t)((u_int32_t)(w32) << 24))
+#elif __BYTE_ORDER == __LITTLE_ENDIAN
+#define cpu_to_le16(w16) ((u_int16_t)(w16))
+#define le16_to_cpu(w16) ((u_int16_t)(w16))
+#define cpu_to_le32(w32) ((u_int32_t)(w32))
+#define le32_to_cpu(w32) ((u_int32_t)(w32))
+#else
+#error Expected endianess macro to be set
+#endif
+
+/**
+ * expected_dummy_vfs_caps_uid - check vfs caps are stored with the provided uid
+ */
+static bool expected_dummy_vfs_caps_uid(int fd, uid_t expected_uid)
+{
+#define __cap_raised_permitted(x, ns_cap_data)                                 \
+	((ns_cap_data.data[(x) >> 5].permitted) & (1 << ((x)&31)))
+	struct vfs_ns_cap_data ns_xattr = {};
+	ssize_t ret;
+
+	ret = fgetxattr(fd, "security.capability", &ns_xattr, sizeof(ns_xattr));
+	if (ret < 0 || ret == 0)
+		return false;
+
+	if (ns_xattr.magic_etc & VFS_CAP_REVISION_3)
+		return (le32_to_cpu(ns_xattr.rootid) == expected_uid) && (__cap_raised_permitted(CAP_NET_RAW, ns_xattr) > 0);
+
+	return false;
+}
+
+/**
+ * set_dummy_vfs_caps - set dummy vfs caps for the provided uid
+ */
+static int set_dummy_vfs_caps(int fd, int flags, int rootuid)
+{
+#define __raise_cap_permitted(x, ns_cap_data)                                  \
+	ns_cap_data.data[(x) >> 5].permitted |= (1 << ((x)&31))
+
+	struct vfs_ns_cap_data ns_xattr;
+
+	memset(&ns_xattr, 0, sizeof(ns_xattr));
+	__raise_cap_permitted(CAP_NET_RAW, ns_xattr);
+	ns_xattr.magic_etc |= VFS_CAP_REVISION_3 | VFS_CAP_FLAGS_EFFECTIVE;
+	ns_xattr.rootid = cpu_to_le32(rootuid);
+
+	return fsetxattr(fd, "security.capability",
+			 &ns_xattr, sizeof(ns_xattr), flags);
+}
+
+FIXTURE(core)
+{
+	/* fd for the main test directory */
+	int test_dir_fd;
+	/* the absolute path to the main test directory */
+	char test_dir_path[PATH_MAX];
+
+	/* fd for test test filesystem image */
+	int img_fd;
+	/* fd for the mountpoint of the test fs image */
+	int img_mnt_fd;
+
+	/* temporary buffer */
+	char cmdline[3 * PATH_MAX];
+
+	/* open_tree fd for the mountpoint of the test fs image */
+	int target1_mnt_fd_attached;
+
+	/* detached open_tree fd for the mountpoint of the test fs image */
+	int target1_mnt_fd_detached;
+};
+
+FIXTURE_SETUP(core)
+{
+	struct mount_attr attr = {
+		.attr_set	= 0,
+		.attr_clr	= 0,
+		.propagation	= MAKE_PROPAGATION_PRIVATE,
+	};
+
+	self->img_fd = -EBADF;
+	self->img_mnt_fd = -EBADF;
+
+	/* create separate mount namespace with mount propagation turned off */
+	ASSERT_EQ(unshare(CLONE_NEWNS), 0);
+	ASSERT_EQ(sys_mount_setattr(-1, "/", AT_RECURSIVE, &attr, sizeof(attr)), 0);
+
+	/* create unique test directory */
+	snprintf(self->test_dir_path, sizeof(self->test_dir_path),
+		 "/idmap_mount_core_XXXXXX");
+	ASSERT_NE(mkdtemp(self->test_dir_path), NULL);
+	self->test_dir_fd = open(self->test_dir_path, O_CLOEXEC | O_DIRECTORY);
+	ASSERT_GE(self->test_dir_fd, 0);
+	ASSERT_EQ(fchmod(self->test_dir_fd, 0777), 0);
+
+	/* create filesystem image */
+	self->img_fd = openat(self->test_dir_fd, IMAGE_FILE1, O_CREAT | O_WRONLY, 0600);
+	ASSERT_GE(self->img_fd, 0);
+	ASSERT_EQ(ftruncate(self->img_fd, 1024 * 2048), 0);
+	snprintf(self->cmdline, sizeof(self->cmdline),
+		 "mkfs.ext4 -q %s/" IMAGE_FILE1, self->test_dir_path);
+	ASSERT_EQ(system(self->cmdline), 0);
+
+	/* create mountpoint for image */
+	ASSERT_EQ(mkdirat(self->test_dir_fd, IMAGE_ROOT_MNT1, 0777), 0);
+
+	/* mount image */
+	snprintf(self->cmdline, sizeof(self->cmdline),
+		 "mount -o loop -t ext4 %s/" IMAGE_FILE1 " %s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path, self->test_dir_path);
+	ASSERT_EQ(system(self->cmdline), 0);
+
+	/* stash fd for the image mountpoint */
+	self->img_mnt_fd = openat(self->test_dir_fd, IMAGE_ROOT_MNT1, O_DIRECTORY | O_CLOEXEC, 0);
+	ASSERT_GE(self->img_mnt_fd, 0);
+
+	/* create mountpoint for bind-mount */
+	ASSERT_EQ(mkdirat(self->test_dir_fd, MNT_TARGET1, 0777), 0);
+
+	/* create new detached mount from image mountpoint */
+	self->target1_mnt_fd_detached = sys_open_tree(self->test_dir_fd,
+						      IMAGE_ROOT_MNT1,
+						      AT_NO_AUTOMOUNT |
+						      AT_SYMLINK_NOFOLLOW |
+						      OPEN_TREE_CLOEXEC |
+						      OPEN_TREE_CLONE);
+	ASSERT_GE(self->target1_mnt_fd_detached, 0);
+
+	/* create attached mount from image mountpoint */
+	self->target1_mnt_fd_attached = sys_open_tree(self->test_dir_fd,
+						      IMAGE_ROOT_MNT1,
+						      AT_NO_AUTOMOUNT |
+						      AT_SYMLINK_NOFOLLOW |
+						      OPEN_TREE_CLOEXEC);
+	ASSERT_GE(self->target1_mnt_fd_attached, 0);
+}
+
+FIXTURE_TEARDOWN(core)
+{
+	EXPECT_EQ(close(self->test_dir_fd), 0);
+	EXPECT_EQ(close(self->img_fd), 0);
+	EXPECT_EQ(close(self->img_mnt_fd), 0);
+	EXPECT_EQ(close(self->target1_mnt_fd_attached), 0);
+	EXPECT_EQ(close(self->target1_mnt_fd_detached), 0);
+	umount_r(self->test_dir_path);
+	rm_r(self->test_dir_path);
+}
+
+/**
+ * Validate that negative fd values are rejected.
+ */
+TEST_F(core, invalid_fd_negative)
+{
+	struct mount_attr attr = {
+		.attr_set	= MOUNT_ATTR_IDMAP,
+		.userns_fd	= -EBADF,
+	};
+
+	ASSERT_NE(sys_mount_setattr(-1, "/", 0, &attr, sizeof(attr)), 0) {
+		TH_LOG("failure: created idmapped mount with negative fd");
+	}
+}
+
+/**
+ * Validate that excessively large fd values are rejected.
+ */
+TEST_F(core, invalid_fd_large)
+{
+	struct mount_attr attr = {
+		.attr_set	= MOUNT_ATTR_IDMAP,
+		.userns_fd	= INT64_MAX,
+	};
+
+	ASSERT_NE(sys_mount_setattr(-1, "/", 0, &attr, sizeof(attr)), 0) {
+		TH_LOG("failure: created idmapped mount with too large fd value");
+	}
+}
+
+/**
+ * Validate that closed fd values are rejected.
+ */
+TEST_F(core, invalid_fd_closed)
+{
+	int fd;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
+	ASSERT_GE(fd, 0);
+	ASSERT_GE(close(fd), 0);
+
+	attr.userns_fd = fd;
+	ASSERT_NE(sys_mount_setattr(-1, "/", 0, &attr, sizeof(attr)), 0) {
+		TH_LOG("failure: created idmapped mount with closed fd");
+	}
+}
+
+/**
+ * Validate that the initial user namespace is rejected.
+ */
+TEST_F(core, invalid_fd_initial_userns)
+{
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	attr.userns_fd = open("/proc/1/ns/user", O_RDONLY | O_CLOEXEC);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_NE(sys_mount_setattr(-1, "/", 0, &attr, sizeof(attr)), 0) {
+		TH_LOG("failure: created idmapped mount with initial user namespace");
+	}
+	ASSERT_EQ(errno, EPERM);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that an attached mount in our mount namespace can be idmapped.
+ * (The kernel enforces that the mount's mount namespace and the caller's mount
+ *  namespace match.)
+ */
+TEST_F(core, attached_mount_inside_current_mount_namespace)
+{
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_attached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_attached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that idmapping a mount is rejected if the mount's mount namespace
+ * and our mount namespace don't match.
+ * (The kernel enforces that the mount's mount namespace and the caller's mount
+ *  namespace match.)
+ */
+TEST_F(core, attached_mount_outside_current_mount_namespace)
+{
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	ASSERT_EQ(unshare(CLONE_NEWNS), 0);
+
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_NE(sys_mount_setattr(self->target1_mnt_fd_attached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: managed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_attached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that an attached mount in our mount namespace can be idmapped.
+ */
+TEST_F(core, detached_mount_inside_current_mount_namespace)
+{
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that a detached mount not in our mount namespace can be idmapped.
+ */
+TEST_F(core, detached_mount_outside_current_mount_namespace)
+{
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	ASSERT_EQ(unshare(CLONE_NEWNS), 0);
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that currently changing the idmapping of an idmapped mount fails.
+ */
+TEST_F(core, change_idmapping)
+{
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+
+	/* Change idmapping on a detached mount that is already idmapped. */
+	attr.userns_fd	= get_userns_fd(0, 20000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_NE(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("failure: managed to change idmapping of already idmapped mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that basic file operations on idmapped mounts.
+ */
+TEST_F(core, create_delete_rename)
+{
+	int file1_fd = -EBADF, hardlink_target_fd = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create hardlink target */
+	hardlink_target_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(hardlink_target_fd, 0);
+	ASSERT_EQ(close(hardlink_target_fd), 0);
+
+	/* create directory for rename test */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0700), 0);
+
+	/* change ownership of all files to uid 0 */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 0, 0), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+
+	/*
+	 * The caller's fsids don't have a mappings in the idmapped mount so
+	 * any file creation must fail.
+	 */
+
+	/* create hardlink */
+	ASSERT_NE(linkat(self->target1_mnt_fd_detached, FILE1, self->target1_mnt_fd_detached, HARDLINK1, 0), 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+
+	/* try to rename a file */
+	ASSERT_NE(renameat2(self->target1_mnt_fd_detached, FILE1,
+			    self->target1_mnt_fd_detached, FILE1_RENAME, 0), 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+
+	/* try to rename a directory */
+	ASSERT_NE(renameat2(self->target1_mnt_fd_detached, DIR1,
+			    self->target1_mnt_fd_detached, DIR1_RENAME, 0), 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+
+	/*
+	 * The caller is privileged over the inode so file deletion must work.
+	 */
+
+	/* remove file */
+	ASSERT_EQ(unlinkat(self->target1_mnt_fd_detached, FILE1, 0), 0);
+
+	/* remove directory */
+	ASSERT_EQ(unlinkat(self->target1_mnt_fd_detached, DIR1, AT_REMOVEDIR), 0);
+
+	/*
+	 * The caller's fsids don't have a mappings in the idmapped mount so
+	 * any file creation must fail.
+	 */
+
+	/* create regular file via open() */
+	file1_fd = openat(self->target1_mnt_fd_detached, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_LT(file1_fd, 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+
+	/* create regular file via mknod */
+	ASSERT_NE(mknodat(self->target1_mnt_fd_detached, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+
+	/* create character device */
+	ASSERT_NE(mknodat(self->target1_mnt_fd_detached, CHRDEV1, S_IFCHR | 0644,
+			  makedev(5, 1)), 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+
+	/* create symlink */
+	ASSERT_NE(symlinkat(FILE2, self->target1_mnt_fd_detached, SYMLINK1), 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+
+	/* create directory */
+	ASSERT_NE(mkdirat(self->target1_mnt_fd_detached, DIR1, 0700), 0);
+	ASSERT_EQ(errno, EOVERFLOW);
+}
+
+/**
+ * Validate that basic file operations on idmapped mounts from a user
+ * namespace.
+ */
+TEST_F(core, create_delete_rename_userns)
+{
+	int file1_fd = -EBADF;
+	pid_t pid;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* change ownership of all files to uid 0 */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 0, 0), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* Switch to  user namespace where uid 10000 maps to 0. */
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(0, 0, 0), 0);
+		ASSERT_EQ(setresuid(0, 0, 0), 0);
+
+		/* create regular file via open() */
+		file1_fd = openat(self->target1_mnt_fd_detached,
+				  FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+		ASSERT_GT(file1_fd, 0);
+		ASSERT_EQ(close(file1_fd), 0);
+
+		/* create regular file via mknod */
+		ASSERT_EQ(mknodat(self->target1_mnt_fd_detached,
+				  FILE2, S_IFREG | 0000, 0), 0);
+
+		/* create symlink */
+		ASSERT_EQ(symlinkat(FILE2, self->target1_mnt_fd_detached,
+				    SYMLINK1), 0);
+
+		/* create directory */
+		ASSERT_EQ(mkdirat(self->target1_mnt_fd_detached,
+				  DIR1, 0700), 0);
+
+		/* try to rename a file */
+		ASSERT_EQ(renameat2(self->target1_mnt_fd_detached, FILE1,
+				    self->target1_mnt_fd_detached, FILE1_RENAME,
+				    0), 0);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   FILE1_RENAME, 0, 0, 0), true);
+
+		/* try to rename a file */
+		ASSERT_EQ(renameat2(self->target1_mnt_fd_detached, DIR1,
+				    self->target1_mnt_fd_detached, DIR1_RENAME,
+				    0), 0);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   DIR1_RENAME, 0, 0, 0), true);
+
+		/* remove file */
+		ASSERT_EQ(unlinkat(self->target1_mnt_fd_detached,
+				   FILE1_RENAME, 0), 0);
+
+		/* remove directory */
+		ASSERT_EQ(unlinkat(self->target1_mnt_fd_detached,
+				   DIR1_RENAME, AT_REMOVEDIR), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+TEST_F(core, hardlinks)
+{
+	int file1_fd = -EBADF, open_tree_fd = -EBADF;
+	struct mount_attr attr1 = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	struct mount_attr attr2 = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/* Chown all files to an unprivileged user. */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 10000, 10000), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr1.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr1.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr1, sizeof(attr1)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->test_dir_fd, IMAGE_ROOT_MNT1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr2.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr2.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH, &attr2, sizeof(attr2)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET2 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/* we're crossing a mountpoint so this must fail
+	 *
+	 * Note that this must also fail for non-idmapped mounts but here we're
+	 * interested in making sure we're not introducing an accidental way to
+	 * violate that restriction or that suddenly this becomes possible.
+	 */
+	ASSERT_NE(linkat(self->target1_mnt_fd_detached, FILE1, open_tree_fd, HARDLINK1, 0), 0);
+	ASSERT_EQ(errno, EXDEV);
+
+	ASSERT_EQ(close(attr1.userns_fd), 0);
+	ASSERT_EQ(close(attr2.userns_fd), 0);
+}
+
+TEST_F(core, rename)
+{
+	int file1_fd = -EBADF, open_tree_fd = -EBADF;
+	struct mount_attr attr1 = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	struct mount_attr attr2 = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/* Chown all files to an unprivileged user. */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 10000, 10000), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr1.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr1.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr1, sizeof(attr1)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->test_dir_fd, IMAGE_ROOT_MNT1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr2.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr2.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH, &attr2, sizeof(attr2)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET2 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/* we're crossing a mountpoint so this must fail
+	 *
+	 * Note that this must also fail for non-idmapped mounts but here we're
+	 * interested in making sure we're not introducing an accidental way to
+	 * violate that restriction or that suddenly this becomes possible.
+	 */
+	ASSERT_NE(renameat2(self->target1_mnt_fd_detached, FILE1,
+			    open_tree_fd, FILE1_RENAME, 0), 0);
+
+	ASSERT_EQ(close(attr1.userns_fd), 0);
+	ASSERT_EQ(close(attr2.userns_fd), 0);
+}
+
+/**
+ * Validate that a caller whose fsids map into the idmapped mount within it's
+ * user namespace can create files.
+ */
+TEST_F(core, create_userns)
+{
+	int file1_fd = -EBADF;
+	pid_t pid;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* Switch to user namespace where uid 10000 maps to 0. */
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(0, 0, 0), 0);
+		ASSERT_EQ(setresuid(0, 0, 0), 0);
+
+		/* create regular file via open() */
+		file1_fd = openat(self->target1_mnt_fd_detached,
+				  FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+		ASSERT_GE(file1_fd, 0);
+		ASSERT_EQ(close(file1_fd), 0);
+
+		/* create regular file via mknod */
+		ASSERT_EQ(mknodat(self->target1_mnt_fd_detached,
+				  FILE2, S_IFREG | 0000, 0), 0);
+
+		/* create hardlink */
+		ASSERT_EQ(linkat(self->target1_mnt_fd_detached,
+				 FILE1,
+				 self->target1_mnt_fd_detached,
+				 HARDLINK1, 0), 0);
+
+		/* create symlink */
+		ASSERT_EQ(symlinkat(FILE2, self->target1_mnt_fd_detached,
+				    SYMLINK1), 0);
+
+		/* create directory */
+		ASSERT_EQ(mkdirat(self->target1_mnt_fd_detached,
+				  DIR1, 0700), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that a caller whose fsids map into the idmapped mount within it's
+ * user namespace cannot create any device nodes.
+ */
+TEST_F(core, create_userns_device_node)
+{
+	pid_t pid;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* Switch to user namespace where uid 10000 maps to 0. */
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(0, 0, 0), 0);
+		ASSERT_EQ(setresuid(0, 0, 0), 0);
+
+		/* create character device */
+		ASSERT_NE(mknodat(self->target1_mnt_fd_detached,
+				  CHRDEV1, S_IFCHR | 0644, makedev(5, 1)), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that changing file ownership works correctly on idmapped mounts.
+ */
+TEST_F(core, expected_uid_gid)
+{
+	int file1_fd = -EBADF;
+	uid_t fsuid;
+	gid_t fsgid;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create regular file via open() */
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(self->img_mnt_fd, FILE2, S_IFREG | 0000, 0), 0);
+
+	/* create character device */
+	ASSERT_EQ(mknodat(self->img_mnt_fd, CHRDEV1, S_IFCHR | 0644,
+			  makedev(5, 1)), 0);
+
+	/* create hardlink */
+	ASSERT_EQ(linkat(self->img_mnt_fd, FILE1, self->img_mnt_fd, HARDLINK1, 0), 0);
+
+	/* create symlink */
+	ASSERT_EQ(symlinkat(FILE2, self->img_mnt_fd, SYMLINK1), 0);
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0700), 0);
+
+	/* retrieve fsids */
+	fsuid = setfsuid(-1);
+	fsgid = setfsgid(-1);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+
+	/*
+	 * Validate that all files created through the image mountpoint are
+	 * owned by the callers fsuid and fsgid.
+	 */
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, FILE1,
+		  0, fsuid, fsgid), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, FILE2,
+		  0, fsuid, fsgid), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, HARDLINK1,
+		  0, fsuid, fsgid), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, CHRDEV1,
+		  0, fsuid, fsgid), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, SYMLINK1,
+		  0, fsuid, fsgid), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, DIR1,
+		  0, fsuid, fsgid), true);
+
+	/*
+	 * Validate that all files are owned by the uid and gid specified in
+	 * the idmapping of the mount they are accessed from.
+	 */
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE1,
+		  0, 10000, 10000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE2,
+		  0, 10000, 10000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, HARDLINK1,
+		  0, 10000, 10000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, CHRDEV1,
+		  0, 10000, 10000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, SYMLINK1,
+		  0, 10000, 10000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, DIR1,
+		  0, 10000, 10000), true);
+
+	/* Change ownership throught original image mountpoint. */
+	ASSERT_EQ(fchownat(self->img_mnt_fd, FILE1, 1000, 1000, 0), 0);
+
+	/* Verify correct ownership through original image mountpoint. */
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, FILE1,
+		  0, 1000, 1000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, HARDLINK1,
+		  0, 1000, 1000), true);
+
+	/* Verify correct ownership through idmapped mountpoint. */
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE1,
+		  0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, HARDLINK1,
+		  0, 11000, 11000), true);
+
+	/* Change ownership throught idmapped mountpoint. */
+	ASSERT_EQ(fchownat(self->target1_mnt_fd_detached, DIR1,
+			   11000, 11000, 0), 0);
+
+	/* Verify correct ownership through original image mountpoint. */
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, DIR1,
+		  0, 1000, 1000), true);
+
+	/* Verify correct ownership through idmapped mountpoint. */
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, DIR1,
+		  0, 11000, 11000), true);
+
+	/* Change ownership throught idmapped mountpoint. */
+	ASSERT_EQ(fchownat(self->target1_mnt_fd_detached, FILE1,
+			   11000, 11000, 0), 0);
+	ASSERT_EQ(fchownat(self->target1_mnt_fd_detached, FILE2,
+			   11000, 11000, 0), 0);
+	ASSERT_EQ(fchownat(self->target1_mnt_fd_detached, CHRDEV1,
+			   11000, 11000, 0), 0);
+	ASSERT_EQ(fchownat(self->target1_mnt_fd_detached, SYMLINK1,
+			   11000, 11000, 0), 0);
+	ASSERT_EQ(fchownat(self->target1_mnt_fd_detached, DIR1,
+			   11000, 11000, 0), 0);
+
+	/*
+	 * Validate that all files are owned by the uid and gid specified in
+	 * the idmapping of the mount they are accessed from.
+	 */
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE1,
+		  0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE2,
+		  0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, HARDLINK1,
+		  0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, CHRDEV1,
+		  0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, SYMLINK1,
+		  0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, DIR1,
+		  0, 11000, 11000), true);
+
+	/*
+	 * Try to change to an id for which we do not have an idmapping from
+	 * the idmapped mountpoint. This must fail.
+	 */
+	ASSERT_NE(fchownat(self->target1_mnt_fd_detached, FILE1,
+			   30000, 30000, 0), 0);
+	ASSERT_NE(fchownat(self->target1_mnt_fd_detached, FILE2,
+			   30000, 30000, 0), 0);
+	ASSERT_NE(fchownat(self->target1_mnt_fd_detached, CHRDEV1,
+			   30000, 30000, 0), 0);
+	ASSERT_NE(fchownat(self->target1_mnt_fd_detached, SYMLINK1,
+			   30000, 30000, 0), 0);
+	ASSERT_NE(fchownat(self->target1_mnt_fd_detached, DIR1,
+			   30000, 30000, 0), 0);
+
+	ASSERT_EQ(close(file1_fd), 0);
+}
+
+/**
+ * Validate that changing file ownership in a user namespace with a matching
+ * idmapping works correctly on idmapped mounts.
+ */
+TEST_F(core, expected_uid_gid_userns)
+{
+	int file1_fd = -EBADF;
+	pid_t pid;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create regular file via open() */
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(self->img_mnt_fd, FILE2, S_IFREG | 0000, 0), 0);
+
+	/* create character device */
+	ASSERT_EQ(mknodat(self->img_mnt_fd, CHRDEV1, S_IFCHR | 0644,
+			  makedev(5, 1)), 0);
+
+	/* create hardlink */
+	ASSERT_EQ(linkat(self->img_mnt_fd, FILE1, self->img_mnt_fd, HARDLINK1, 0), 0);
+
+	/* create symlink */
+	ASSERT_EQ(symlinkat(FILE2, self->img_mnt_fd, SYMLINK1), 0);
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0700), 0);
+
+	/* change ownership of all files to uid 0 */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 0, 0), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* Switch to  user namespace where uid 10000 maps to 0. */
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(0, 0, 0), 0);
+		ASSERT_EQ(setresuid(0, 0, 0), 0);
+
+		/*
+		 * All files are now owned by uid 0 if accessed through the
+		 * idmapped mountpoint.
+		 */
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   FILE1, 0, 0, 0), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   FILE2, 0, 0, 0), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   HARDLINK1, 0, 0, 0), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   CHRDEV1, 0, 0, 0), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   SYMLINK1, 0, 0, 0), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   DIR1, 0, 0, 0), true);
+
+		/* Change ownership throught idmapped mountpoint. */
+		ASSERT_EQ(fchownat(self->target1_mnt_fd_detached,
+				   FILE1, 1000, 1000, 0), 0);
+		ASSERT_EQ(fchownat(self->target1_mnt_fd_detached,
+				   FILE2, 1000, 1000, 0), 0);
+		ASSERT_EQ(fchownat(self->target1_mnt_fd_detached,
+				   CHRDEV1, 1000, 1000, 0), 0);
+		ASSERT_EQ(fchownat(self->target1_mnt_fd_detached,
+				   SYMLINK1, 1000, 1000, 0), 0);
+		ASSERT_EQ(fchownat(self->target1_mnt_fd_detached,
+				   DIR1, 1000, 1000, 0), 0);
+
+		/*
+		 * All files are now owned by uid 1000 if accessed through the
+		 * idmapped mountpoint.
+		 */
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   FILE1, 0, 1000, 1000), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   FILE2, 0, 1000, 1000), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   HARDLINK1, 0, 1000, 1000), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   CHRDEV1, 0, 1000, 1000), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   SYMLINK1, 0, 1000, 1000), true);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached,
+					   DIR1, 0, 1000, 1000), true);
+
+		/*
+		 * Try to change to an id for which we do not have an idmapping
+		 * from the idmapped mountpoint. This must fail.
+		 */
+		ASSERT_NE(fchownat(self->target1_mnt_fd_detached,
+				   FILE1, 30000, 30000, 0), 0);
+		ASSERT_NE(fchownat(self->target1_mnt_fd_detached,
+				   FILE2, 30000, 30000, 0), 0);
+		ASSERT_NE(fchownat(self->target1_mnt_fd_detached,
+				   CHRDEV1, 30000, 30000, 0), 0);
+		ASSERT_NE(fchownat(self->target1_mnt_fd_detached,
+				   SYMLINK1, 30000, 30000, 0), 0);
+		ASSERT_NE(fchownat(self->target1_mnt_fd_detached,
+				   DIR1, 30000, 30000, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * All files are owned by uid 10000 if accessed through the image
+	 * mountpoint.
+	 */
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd,
+				  FILE1, 0, 1000, 1000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd,
+				   FILE2, 0, 1000, 1000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd,
+				   HARDLINK1, 0, 1000, 1000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd,
+				   CHRDEV1, 0, 1000, 1000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd,
+				   SYMLINK1, 0, 1000, 1000), true);
+
+	ASSERT_EQ(expected_uid_gid(self->img_mnt_fd,
+				   DIR1, 0, 1000, 1000), true);
+
+	ASSERT_EQ(close(file1_fd), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that setting namespaced filesystem capabilities works correctly on
+ * idmapped mounts.
+ */
+TEST_F(core, expected_fscaps_userns)
+{
+	pid_t pid;
+	int file1_fd = -EBADF, file1_fd2 = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	file1_fd = openat(self->target1_mnt_fd_detached, FILE1, O_RDWR | O_CLOEXEC, 0);
+	ASSERT_GE(file1_fd, 0);
+
+	/*
+	 * uid 10000 maps to 0 in the mount's user namespace and uid 0 has a
+	 * mapping in our current user namespace and in the superblock's
+	 * namespace so this must succeed.
+	 */
+	ASSERT_EQ(set_dummy_vfs_caps(file1_fd, 0, 10000), 0);
+	ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd, 10000), true);
+
+	/*
+	 * uid 0 maps to 10000 in the mount's user namespace but uid 10000
+	 * doesn't have a mapping in our current user namespace so this must
+	 * fails.
+	 */
+	ASSERT_NE(set_dummy_vfs_caps(file1_fd, 0, 0), 0);
+
+	file1_fd2 = openat(self->img_mnt_fd, FILE1, O_RDWR | O_CLOEXEC, 0);
+	ASSERT_GE(file1_fd2, 0);
+	ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd2, 0), true);
+
+	ASSERT_EQ(fremovexattr(file1_fd, "security.capability"), 0);
+	ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd2, 0), false);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* Switch to a user namespace where uid 10000 maps to 0. */
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(0, 0, 0), 0);
+		ASSERT_EQ(setresuid(0, 0, 0), 0);
+
+		/*
+		 * uid 0 maps to 10000 in the mount's user namespace and uid 10000
+		 * has a mapping in our current user namespace and in the superblock's
+		 * user namespace so this must succeed.
+		 */
+		ASSERT_EQ(set_dummy_vfs_caps(file1_fd, 0, 0), 0);
+		ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd, 0), true);
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd2, 0), true);
+
+	ASSERT_EQ(close(file1_fd), 0);
+	ASSERT_EQ(close(file1_fd2), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that setting filesystem capabilities works correctly on idmapped
+ * mounts where all files on disk are owned by uid and gid 10000.
+ */
+TEST_F(core, expected_fscaps_reverse)
+{
+	int file1_fd = -EBADF, file1_fd2 = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/* Chown all files to an unprivileged user. */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 10000, 10000), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+
+	ASSERT_EQ(sys_move_mount(self->target1_mnt_fd_detached, "",
+				 self->test_dir_fd, MNT_TARGET1,
+				 MOVE_MOUNT_F_EMPTY_PATH), 0) {
+		TH_LOG("%m - Failed to attached detached mount %d(%s/" IMAGE_FILE1 ") to %s/" MNT_TARGET1,
+		       self->target1_mnt_fd_detached, self->test_dir_path,
+		       self->test_dir_path);
+	}
+
+	snprintf(self->cmdline, sizeof(self->cmdline),
+		 "setcap cap_dac_override,cap_sys_tty_config+ep %s/" MNT_TARGET1 "/" FILE1,
+		 self->test_dir_path);
+	ASSERT_EQ(system(self->cmdline), 0);
+
+	file1_fd = openat(self->target1_mnt_fd_detached, FILE1, O_RDWR | O_CLOEXEC, 0);
+	ASSERT_GE(file1_fd, 0);
+	ASSERT_EQ(set_dummy_vfs_caps(file1_fd, 0, 0), 0);
+	ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd, 0), true);
+
+	ASSERT_NE(set_dummy_vfs_caps(file1_fd, 0, 10000), 0);
+
+	file1_fd2 = openat(self->img_mnt_fd, FILE1, O_RDWR | O_CLOEXEC, 0);
+	ASSERT_GE(file1_fd2, 0);
+	ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd2, 10000), true);
+
+	ASSERT_EQ(fremovexattr(file1_fd, "security.capability"), 0);
+	ASSERT_EQ(expected_dummy_vfs_caps_uid(file1_fd2, 0), false);
+
+	ASSERT_EQ(close(file1_fd), 0);
+	ASSERT_EQ(close(file1_fd2), 0);
+}
+
+/**
+ * Validate that when the IDMAP_MOUNT_TEST_RUN_SETID environment variable is
+ * set to 1 that we are executed with setid privileges and if set to 0 we are
+ * not. If the env variable isn't set the tests are not run.
+ */
+static void __attribute__((constructor)) setuid_rexec(void)
+{
+	const char *expected_euid_str, *expected_egid_str, *rexec;
+
+	rexec = getenv("IDMAP_MOUNT_TEST_RUN_SETID");
+	/* This is a regular test-suite run. */
+	if (!rexec)
+		return;
+
+	expected_euid_str = getenv("EXPECTED_EUID");
+	expected_egid_str = getenv("EXPECTED_EGID");
+
+	if (expected_euid_str && expected_egid_str) {
+		uid_t expected_euid;
+		gid_t expected_egid;
+
+		expected_euid = atoi(expected_euid_str);
+		expected_egid = atoi(expected_egid_str);
+
+		if (strcmp(rexec, "1") == 0) {
+			/* we're expecting to run setid */
+			if ((getuid() != geteuid()) &&
+			    (expected_euid == geteuid()) &&
+			    (getgid() != getegid()) &&
+			    (expected_egid == getegid()))
+				exit(EXIT_SUCCESS);
+		} else if (strcmp(rexec, "0") == 0) {
+			/* we're expecting to not run setid */
+			if ((getuid() == geteuid()) &&
+			    (expected_euid == geteuid()) &&
+			    (getgid() == getegid()) &&
+			    (expected_egid == getegid()))
+				exit(EXIT_SUCCESS);
+		}
+	}
+
+	exit(EXIT_FAILURE);
+}
+
+/**
+ * Validate that setid transitions are handled correctly on idmapped mounts.
+ */
+TEST_F(core, setid_binaries)
+{
+	int file1_fd = -EBADF, exec_fd = -EBADF;
+	pid_t pid;
+
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create a file to be used as setuid binary */
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_RDWR | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+
+	/* open our own executable */
+	exec_fd = openat(-EBADF, "/proc/self/exe", O_RDONLY | O_CLOEXEC, 0000);
+	ASSERT_GE(exec_fd, 0);
+
+	/* copy our own executable into the file we created */
+	ASSERT_EQ(fd_to_fd(exec_fd, file1_fd), 0);
+
+	/* Chown all files to an unprivileged user. */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 10000, 10000), 0);
+
+	/* chown the file to the uid and gid we want to assume */
+	ASSERT_EQ(fchown(file1_fd, 5000, 5000), 0);
+
+	/* set the setid bits and grant execute permissions to the group */
+	ASSERT_EQ(fchmod(file1_fd, S_IXGRP | S_IEXEC | S_ISUID | S_ISGID), 0);
+
+	/* Verify that the sid bits got raised. */
+	ASSERT_EQ(is_setid(self->img_mnt_fd, FILE1, 0), true);
+
+	ASSERT_EQ(close(exec_fd), 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+
+	/* Verify we run setid binary as uid and gid 5000 from original image mount. */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		static char *envp[] = {
+			"IDMAP_MOUNT_TEST_RUN_SETID=1",
+			"EXPECTED_EUID=5000",
+			"EXPECTED_EGID=5000",
+			NULL,
+		};
+		static char *argv[] = {
+			NULL,
+		};
+
+		ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, FILE1, 0, 5000, 5000), true);
+		ASSERT_EQ(sys_execveat(self->img_mnt_fd, FILE1, argv, envp, 0), 0) {
+			TH_LOG("%m - failure: failed to execute setuid binary");
+		}
+
+		exit(EXIT_FAILURE);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * A detached mount will have an anonymous mount namespace attached to
+	 * it. This means that we can't execute setid binaries on a detached
+	 * mount because the mnt_may_suid() helper will fail the check_mount()
+	 * part of its check which compares the caller's mount namespace to the
+	 * detached mount's mount namespace. Since by definition an anonymous
+	 * mount namespace is not equale to any mount namespace currently in
+	 * use this can't work. So attach the mount to the filesystem first
+	 * before performing this check.
+	 */
+	ASSERT_EQ(sys_move_mount(self->target1_mnt_fd_detached, "",
+				 self->test_dir_fd, MNT_TARGET1,
+				 MOVE_MOUNT_F_EMPTY_PATH), 0) {
+		TH_LOG("%m - Failed to attached detached mount %d(%s/" IMAGE_FILE1 ") to %s/" MNT_TARGET1,
+		       self->target1_mnt_fd_detached, self->test_dir_path,
+		       self->test_dir_path);
+	}
+
+	/* Verify we run setid binary as uid and gid 10000 from idmapped mount mount. */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		static char *envp[] = {
+			"IDMAP_MOUNT_TEST_RUN_SETID=1",
+			"EXPECTED_EUID=15000",
+			"EXPECTED_EGID=15000",
+			NULL,
+		};
+		static char *argv[] = {
+			NULL,
+		};
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE1, 0, 15000, 15000), true);
+		ASSERT_EQ(sys_execveat(self->target1_mnt_fd_detached, FILE1, argv, envp, 0), 0) {
+			TH_LOG("%m - Failed to execute setuid binary");
+		}
+
+		exit(EXIT_FAILURE);
+	}
+
+	ASSERT_GE(wait_for_pid(pid), 0);
+}
+
+/**
+ * Validate that setid transitions are handled correctly on idmapped
+ * mounts where all files on disk are owned by uid and gid 10000.
+ */
+TEST_F(core, setid_binaries_reverse)
+{
+	int file1_fd = -EBADF, exec_fd = -EBADF;
+	pid_t pid;
+
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create a file to be used as setuid binary */
+	file1_fd = openat(self->img_mnt_fd, FILE1,
+			  O_CREAT | O_EXCL | O_RDWR | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+
+	/* open our own executable */
+	exec_fd = openat(-EBADF, "/proc/self/exe", O_RDONLY | O_CLOEXEC, 0000);
+	ASSERT_GE(exec_fd, 0);
+
+	ASSERT_EQ(fd_to_fd(exec_fd, file1_fd), 0);
+	ASSERT_EQ(close(exec_fd), 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/* chown all files to uid and gid 15000 */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 15000, 15000), 0);
+
+	/* Set setid bits on the newly chowned binary. */
+	ASSERT_EQ(fchmodat(self->img_mnt_fd, FILE1,
+			   S_IXGRP | S_IEXEC | S_ISUID | S_ISGID, 0), 0);
+
+	/* Verify that the sid bits got raised. */
+	ASSERT_EQ(is_setid(self->img_mnt_fd, FILE1, 0), true);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+
+	/* Attach the mount to the filesystem. */
+	ASSERT_EQ(sys_move_mount(self->target1_mnt_fd_detached, "",
+				 self->test_dir_fd, MNT_TARGET1,
+				 MOVE_MOUNT_F_EMPTY_PATH), 0) {
+		TH_LOG("%m - failure: failed to attached detached mount %d(%s/" IMAGE_FILE1 ") to %s/" MNT_TARGET1,
+		       self->target1_mnt_fd_detached, self->test_dir_path,
+		       self->test_dir_path);
+	}
+
+	/* Verify we run setid binary as uid and gid 5000 from idmapped mount mount. */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		static char *envp[] = {
+			"IDMAP_MOUNT_TEST_RUN_SETID=1",
+			"EXPECTED_EUID=5000",
+			"EXPECTED_EGID=5000",
+			NULL,
+		};
+		static char *argv[] = {
+			NULL,
+		};
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE1, 0, 5000, 5000), true);
+		ASSERT_EQ(sys_execveat(self->target1_mnt_fd_detached, FILE1, argv, envp, 0), 0) {
+			TH_LOG("%m - Failed to execute setuid binary");
+		}
+
+		exit(EXIT_FAILURE);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* Verify we run setid binary as uid and gid 15000 from original image mount. */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		static char *envp[] = {
+			"IDMAP_MOUNT_TEST_RUN_SETID=1",
+			"EXPECTED_EUID=15000",
+			"EXPECTED_EGID=15000",
+			NULL,
+		};
+		static char *argv[] = {
+			NULL,
+		};
+
+		ASSERT_EQ(expected_uid_gid(self->img_mnt_fd, FILE1, 0, 15000, 15000), true);
+		ASSERT_EQ(sys_execveat(self->img_mnt_fd, FILE1, argv, envp, 0), 0) {
+			TH_LOG("%m - Failed to execute setuid binary");
+		}
+
+		exit(EXIT_FAILURE);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+}
+
+/**
+ * Validate that setid transitions are handled correctly on idmapped mounts
+ * running in a user namespace where the uid and gid of the setid binary have
+ * no mapping.
+ */
+TEST_F(core, setid_binaries_userns)
+{
+	int file1_fd = -EBADF, exec_fd = -EBADF;
+	pid_t pid;
+
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create a file to be used as setuid binary */
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_RDWR | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+
+	/* open our own executable */
+	exec_fd = openat(-EBADF, "/proc/self/exe", O_RDONLY | O_CLOEXEC, 0000);
+	ASSERT_GE(exec_fd, 0);
+
+	/* copy our own executable into the file we created */
+	ASSERT_EQ(fd_to_fd(exec_fd, file1_fd), 0);
+
+	/* chown the file to the uid and gid we want to assume */
+	ASSERT_EQ(fchown(file1_fd, 5000, 5000), 0);
+
+	/* set the setid bits and grant execute permissions to the group */
+	ASSERT_EQ(fchmod(file1_fd, S_IXGRP | S_IEXEC | S_ISUID | S_ISGID), 0);
+
+	/* Verify that the sid bits got raised. */
+	ASSERT_EQ(is_setid(self->img_mnt_fd, FILE1, 0), true);
+
+	ASSERT_EQ(close(exec_fd), 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(self->target1_mnt_fd_detached, "",
+				    AT_EMPTY_PATH, &attr, sizeof(attr)), 0) {
+		TH_LOG("%m - failure: failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       self->target1_mnt_fd_detached, self->test_dir_path);
+	}
+
+	/*
+	 * A detached mount will have an anonymous mount namespace attached to
+	 * it. This means that we can't execute setid binaries on a detached
+	 * mount because the mnt_may_suid() helper will fail the check_mount()
+	 * part of its check which compares the caller's mount namespace to the
+	 * detached mount's mount namespace. Since by definition an anonymous
+	 * mount namespace is not equale to any mount namespace currently in
+	 * use this can't work. So attach the mount to the filesystem first
+	 * before performing this check.
+	 */
+	ASSERT_EQ(sys_move_mount(self->target1_mnt_fd_detached, "",
+				 self->test_dir_fd, MNT_TARGET1,
+				 MOVE_MOUNT_F_EMPTY_PATH), 0) {
+		TH_LOG("%m - Failed to attached detached mount %d(%s/" IMAGE_FILE1 ") to %s/" MNT_TARGET1,
+		       self->target1_mnt_fd_detached, self->test_dir_path,
+		       self->test_dir_path);
+	}
+
+	/* Verify we run setid binary as uid and gid 5000 from original image mount. */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		static char *envp[] = {
+			"IDMAP_MOUNT_TEST_RUN_SETID=1",
+			"EXPECTED_EUID=5000",
+			"EXPECTED_EGID=5000",
+			NULL,
+		};
+		static char *argv[] = {
+			NULL,
+		};
+
+		/* Switch to user namespace where uid 10000 maps to 0. */
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(0, 0, 0), 0);
+		ASSERT_EQ(setresuid(0, 0, 0), 0);
+
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE1, 0, 5000, 5000), true);
+		ASSERT_EQ(sys_execveat(self->target1_mnt_fd_detached, FILE1, argv, envp, 0), 0) {
+			TH_LOG("%m - failure: failed to execute setuid binary");
+		}
+
+		exit(EXIT_FAILURE);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_RDWR | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+
+	/* chown the file to the uid and gid we want to assume */
+	ASSERT_EQ(fchown(file1_fd, 30000, 30000), 0);
+
+	/* set the setid bits and grant execute permissions to other users */
+	ASSERT_EQ(fchmod(file1_fd, S_IXOTH | S_IEXEC | S_ISUID | S_ISGID), 0);
+
+	ASSERT_EQ(close(file1_fd), 0);
+
+	/*
+	 * Verify that we can't assume a uid and gid of a setid binary for
+	 * which we have no mapping in our user namespace.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		char expected_euid[100];
+		char expected_egid[100];
+		static char *envp[4] = {
+			NULL,
+			NULL,
+			NULL,
+			NULL,
+		};
+		static char *argv[] = {
+			NULL,
+		};
+
+		/* Switch to user namespace where uid 10000 maps to 0. */
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(0, 0, 0), 0);
+		ASSERT_EQ(setresuid(0, 0, 0), 0);
+
+		envp[0] = "IDMAP_MOUNT_TEST_RUN_SETID=0";
+		snprintf(expected_euid, sizeof(expected_euid), "EXPECTED_EUID=%d", geteuid());
+		envp[1] = expected_euid;
+		snprintf(expected_egid, sizeof(expected_egid), "EXPECTED_egid=%d", getegid());
+		envp[2] = expected_egid;
+		ASSERT_EQ(expected_uid_gid(self->target1_mnt_fd_detached, FILE1, 0, 65534, 65534), true);
+		ASSERT_EQ(sys_execveat(self->target1_mnt_fd_detached, FILE1, argv, envp, 0), 0) {
+			TH_LOG("%m - Failed to execute setuid binary");
+		}
+
+		exit(EXIT_FAILURE);
+	}
+	ASSERT_GE(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that idmapping a whole mount tree works correctly.
+ */
+TEST_F(core, idmap_mount_tree)
+{
+	int img_fd2 = -EBADF, img_mnt_fd2 = -EBADF, file1_fd = -EBADF,
+	    open_tree_fd = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create filesystem image */
+	img_fd2 = openat(self->test_dir_fd, IMAGE_FILE2,
+			 O_CREAT | O_WRONLY, 0600);
+	ASSERT_GE(img_fd2, 0);
+	ASSERT_EQ(ftruncate(img_fd2, 1024 * 2048), 0);
+	ASSERT_EQ(close(img_fd2), 0);
+	snprintf(self->cmdline, sizeof(self->cmdline),
+		 "mkfs.ext4 -q %s/" IMAGE_FILE2, self->test_dir_path);
+	ASSERT_EQ(system(self->cmdline), 0);
+
+	/* create mountpoint for image */
+	ASSERT_EQ(mkdirat(self->test_dir_fd, IMAGE_ROOT_MNT2, 0777), 0);
+	snprintf(self->cmdline, sizeof(self->cmdline),
+		 "mount -o loop -t ext4 %s/" IMAGE_FILE2 " %s/" IMAGE_ROOT_MNT2,
+		 self->test_dir_path, self->test_dir_path);
+	ASSERT_EQ(system(self->cmdline), 0);
+	img_mnt_fd2 = openat(self->test_dir_fd, IMAGE_ROOT_MNT2,
+			     O_DIRECTORY | O_CLOEXEC, 0);
+	ASSERT_GE(img_mnt_fd2, 0);
+
+	/* Create files in first filesystem. */
+	file1_fd = openat(self->img_mnt_fd, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	ASSERT_EQ(mknodat(self->img_mnt_fd, FILE2, S_IFREG | 0000, 0), 0);
+
+	ASSERT_EQ(mknodat(self->img_mnt_fd, CHRDEV1, S_IFCHR | 0644,
+			  makedev(5, 1)), 0);
+
+	ASSERT_EQ(linkat(self->img_mnt_fd, FILE1, self->img_mnt_fd, HARDLINK1, 0), 0);
+
+	ASSERT_EQ(symlinkat(FILE2, self->img_mnt_fd, SYMLINK1), 0);
+
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0700), 0);
+
+	/* Chown all files to 1000. */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT1,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 1000, 1000), 0);
+
+	/* Create files in second filesystem. */
+	file1_fd = openat(img_mnt_fd2, FILE1, O_CREAT | O_EXCL | O_CLOEXEC, 0644);
+	ASSERT_GE(file1_fd, 0);
+	ASSERT_EQ(close(file1_fd), 0);
+
+	ASSERT_EQ(mknodat(img_mnt_fd2, FILE2, S_IFREG | 0000, 0), 0);
+
+	ASSERT_EQ(mknodat(img_mnt_fd2, CHRDEV1, S_IFCHR | 0644,
+			  makedev(5, 1)), 0);
+
+	ASSERT_EQ(linkat(img_mnt_fd2, FILE1, img_mnt_fd2, HARDLINK1, 0), 0);
+
+	ASSERT_EQ(symlinkat(FILE2, img_mnt_fd2, SYMLINK1), 0);
+
+	ASSERT_EQ(mkdirat(img_mnt_fd2, DIR1, 0700), 0);
+	ASSERT_EQ(close(img_mnt_fd2), 0);
+
+	/* Chown all files to 1000. */
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" IMAGE_ROOT_MNT2,
+		 self->test_dir_path);
+	ASSERT_EQ(chown_r(self->cmdline, 1000, 1000), 0);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->test_dir_fd, IMAGE_ROOT_MNT1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+
+	/*
+	 * All files created through the original image mountpoint  are owned
+	 * by uid 0.
+	 */
+	ASSERT_EQ(expected_uid_gid(open_tree_fd, FILE1, 0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd, FILE2, 0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd, HARDLINK1, 0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd, CHRDEV1, 0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd, SYMLINK1, 0, 11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd, DIR1, 0, 11000, 11000), true);
+
+	/*
+	 * All files created through the original image mountpoint  are owned
+	 * by uid 0.
+	 */
+	ASSERT_EQ(expected_uid_gid(open_tree_fd,
+				   IMAGE_ROOT_MNT2_RELATIVE "/" FILE1, 0, 11000,
+				   11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd,
+				   IMAGE_ROOT_MNT2_RELATIVE "/" FILE2, 0, 11000,
+				   11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd,
+				   IMAGE_ROOT_MNT2_RELATIVE "/" HARDLINK1, 0,
+				   11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd,
+				   IMAGE_ROOT_MNT2_RELATIVE "/" CHRDEV1, 0,
+				   11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd,
+				   IMAGE_ROOT_MNT2_RELATIVE "/" SYMLINK1, 0,
+				   11000, 11000), true);
+
+	ASSERT_EQ(expected_uid_gid(open_tree_fd,
+				   IMAGE_ROOT_MNT2_RELATIVE "/" DIR1, 0, 11000,
+				   11000), true);
+
+	ASSERT_EQ(close(open_tree_fd), 0);
+}
+
+/**
+ * Validate that idmapping a mount tree with an unsupported filesystem
+ * somehwere in the tree fails.
+ */
+TEST_F(core, idmap_mount_tree_invalid)
+{
+	int img_fd2 = -EBADF, img_mnt_fd2 = -EBADF, open_tree_fd = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create filesystem image */
+	img_fd2 = openat(self->test_dir_fd, IMAGE_FILE2,
+			 O_CREAT | O_WRONLY, 0600);
+	ASSERT_GE(img_fd2, 0);
+	ASSERT_EQ(ftruncate(img_fd2, 1024 * 2048), 0);
+	ASSERT_EQ(close(img_fd2), 0);
+	snprintf(self->cmdline, sizeof(self->cmdline),
+		 "mkfs.ext4 -q %s/" IMAGE_FILE2, self->test_dir_path);
+	ASSERT_EQ(system(self->cmdline), 0);
+
+	/* create mountpoint for image */
+	ASSERT_EQ(mkdirat(self->test_dir_fd, IMAGE_ROOT_MNT2, 0777), 0);
+	snprintf(self->cmdline, sizeof(self->cmdline),
+		 "mount -o loop -t ext4 %s/" IMAGE_FILE2 " %s/" IMAGE_ROOT_MNT2,
+		 self->test_dir_path, self->test_dir_path);
+	ASSERT_EQ(system(self->cmdline), 0);
+	img_mnt_fd2 = openat(self->test_dir_fd, IMAGE_ROOT_MNT2,
+			     O_DIRECTORY | O_CLOEXEC, 0);
+	ASSERT_GE(img_mnt_fd2, 0);
+
+	/* create mount of currently unsupported filesystem */
+	ASSERT_EQ(mkdirat(self->test_dir_fd, FILESYSTEM_MOUNT1, 0777), 0);
+	snprintf(self->cmdline, sizeof(self->cmdline), "%s/" FILESYSTEM_MOUNT1, self->test_dir_path);
+	ASSERT_EQ(mount(NULL, self->cmdline, "tmpfs", 0, NULL), 0);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->test_dir_fd, IMAGE_ROOT_MNT1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_NE(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("Managed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that the sticky bit behaves correctly on regular mounts for unlink
+ * operations.
+ */
+TEST_F(core, sticky_bit_unlink)
+{
+	pid_t pid;
+	int dir_fd = -EBADF;
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 0, 0), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is not set so we must be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(unlinkat(dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(dir_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* set sticky bit */
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set so we must not be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_NE(unlinkat(dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(unlinkat(dir_fd, FILE2, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * The sticky bit is set and we own the files so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* change ownership */
+		ASSERT_EQ(fchownat(dir_fd, FILE1, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 1000, 0), true);
+		ASSERT_EQ(fchownat(dir_fd, FILE2, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE2, 0, 1000, 2000), true);
+
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(unlinkat(dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(dir_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* change uid to unprivileged user */
+	ASSERT_EQ(fchown(dir_fd, 1000, -1), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set and we own the directory so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(unlinkat(dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(dir_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+}
+
+/**
+ * Validate that the sticky bit behaves correctly on idmapped mounts for unlink
+ * operations.
+ */
+TEST_F(core, sticky_bit_unlink_idmapped)
+{
+	pid_t pid;
+	int dir_fd = -EBADF, open_tree_fd = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 10000, 10000), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 10000, 10000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 12000, 12000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->img_mnt_fd, DIR1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/*
+	 * The sticky bit is not set so we must be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE2, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* set sticky bit */
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 10000, 10000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 12000, 12000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set so we must not be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_NE(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(unlinkat(open_tree_fd, FILE2, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * The sticky bit is set and we own the files so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* change ownership */
+		ASSERT_EQ(fchownat(dir_fd, FILE1, 11000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 11000, 10000), true);
+		ASSERT_EQ(fchownat(dir_fd, FILE2, 11000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE2, 0, 11000, 12000), true);
+
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* change uid to unprivileged user */
+	ASSERT_EQ(fchown(dir_fd, 11000, -1), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 10000, 10000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 12000, 12000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set and we own the directory so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that the sticky bit behaves correctly on idmapped mounts for unlink
+ * operations in a user namespace.
+ */
+TEST_F(core, sticky_bit_unlink_idmapped_userns)
+{
+	pid_t pid;
+	int dir_fd = -EBADF, open_tree_fd = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 0, 0), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->img_mnt_fd, DIR1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/*
+	 * The sticky bit is not set so we must be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* set sticky bit */
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set so we must not be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_NE(unlinkat(dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(unlinkat(dir_fd, FILE2, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		ASSERT_NE(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(unlinkat(open_tree_fd, FILE2, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * The sticky bit is set and we own the files so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* change ownership */
+		ASSERT_EQ(fchownat(dir_fd, FILE1, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 1000, 0), true);
+		ASSERT_EQ(fchownat(dir_fd, FILE2, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE2, 0, 1000, 2000), true);
+
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* we don't own the file from the original mount */
+		ASSERT_NE(unlinkat(dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(unlinkat(dir_fd, FILE2, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		/* we own the file from the idmapped mount */
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* change uid to unprivileged user */
+	ASSERT_EQ(fchown(dir_fd, 1000, -1), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set and we own the directory so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* we don't own the directory from the original mount */
+		ASSERT_NE(unlinkat(dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(unlinkat(dir_fd, FILE2, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		/* we own the file from the idmapped mount */
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(unlinkat(open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that the sticky bit behaves correctly on regular mounts for rename
+ * operations.
+ */
+TEST_F(core, sticky_bit_rename)
+{
+	pid_t pid;
+	int dir_fd = -EBADF;
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 0, 0), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is not set so we must be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(renameat2(dir_fd, FILE1, dir_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE1_RENAME, dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE2, dir_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE2_RENAME, dir_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* set sticky bit */
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/*
+	 * The sticky bit is set so we must not be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_NE(renameat2(dir_fd, FILE1, dir_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(renameat2(dir_fd, FILE2, dir_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * The sticky bit is set and we own the files so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* change ownership */
+		ASSERT_EQ(fchownat(dir_fd, FILE1, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 1000, 0), true);
+		ASSERT_EQ(fchownat(dir_fd, FILE2, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE2, 0, 1000, 2000), true);
+
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(renameat2(dir_fd, FILE1, dir_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE1_RENAME, dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE2, dir_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE2_RENAME, dir_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* change uid to unprivileged user */
+	ASSERT_EQ(fchown(dir_fd, 1000, -1), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set and we own the directory so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+
+		ASSERT_EQ(renameat2(dir_fd, FILE1, dir_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE1_RENAME, dir_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE2, dir_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(dir_fd, FILE2_RENAME, dir_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+}
+
+/**
+ * Validate that the sticky bit behaves correctly on idmapped mounts for rename
+ * operations.
+ */
+TEST_F(core, sticky_bit_rename_idmapped)
+{
+	pid_t pid;
+	int dir_fd = -EBADF, open_tree_fd = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 10000, 10000), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 10000, 10000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 12000, 12000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->img_mnt_fd, DIR1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/*
+	 * The sticky bit is not set so we must be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1_RENAME, open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2_RENAME, open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* set sticky bit */
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 10000, 10000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 12000, 12000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set so we must not be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_NE(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * The sticky bit is set and we own the files so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* change ownership */
+		ASSERT_EQ(fchownat(dir_fd, FILE1, 11000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 11000, 10000), true);
+		ASSERT_EQ(fchownat(dir_fd, FILE2, 11000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE2, 0, 11000, 12000), true);
+
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1_RENAME, open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2_RENAME, open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* change uid to unprivileged user */
+	ASSERT_EQ(fchown(dir_fd, 11000, -1), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 10000, 10000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 12000, 12000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set and we own the directory so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1_RENAME, open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2_RENAME, open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that the sticky bit behaves correctly on idmapped mounts for rename
+ * operations in a user namespace.
+ */
+TEST_F(core, sticky_bit_rename_idmapped_userns)
+{
+	pid_t pid;
+	int dir_fd = -EBADF, open_tree_fd = -EBADF;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 0, 0), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE2, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->img_mnt_fd, DIR1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/*
+	 * The sticky bit is not set so we must be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1_RENAME, open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2_RENAME, open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* set sticky bit */
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set so we must not be able to delete files not
+	 * owned by us.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		ASSERT_NE(renameat2(dir_fd, FILE1, dir_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(renameat2(dir_fd, FILE2, dir_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		ASSERT_NE(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/*
+	 * The sticky bit is set and we own the files so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		/* change ownership */
+		ASSERT_EQ(fchownat(dir_fd, FILE1, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 1000, 0), true);
+		ASSERT_EQ(fchownat(dir_fd, FILE2, 1000, -1, 0), 0);
+		ASSERT_EQ(expected_uid_gid(dir_fd, FILE2, 0, 1000, 2000), true);
+
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* we don't own the file from the original mount */
+		ASSERT_NE(renameat2(dir_fd, FILE1, dir_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(renameat2(dir_fd, FILE2, dir_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		/* we own the file from the idmapped mount */
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1_RENAME, open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2_RENAME, open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	/* change uid to unprivileged user */
+	ASSERT_EQ(fchown(dir_fd, 1000, -1), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	ASSERT_EQ(fchownat(dir_fd, FILE2, 2000, 2000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE2, 0644, 0), 0);
+
+	/*
+	 * The sticky bit is set and we own the directory so we must be able to
+	 * delete the files now.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* we don't own the directory from the original mount */
+		ASSERT_NE(renameat2(dir_fd, FILE1, dir_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+		ASSERT_NE(renameat2(dir_fd, FILE2, dir_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(errno, EPERM);
+
+		/* we own the file from the idmapped mount */
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1, open_tree_fd, FILE1_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE1_RENAME, open_tree_fd, FILE1, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2, open_tree_fd, FILE2_RENAME, 0), 0);
+		ASSERT_EQ(renameat2(open_tree_fd, FILE2_RENAME, open_tree_fd, FILE2, 0), 0);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that protected symlinks work correctly.
+ */
+TEST_F(core, follow_symlinks)
+{
+	int dir_fd = -EBADF, fd = -EBADF;
+	pid_t pid;
+
+	if (!symlinks_protected())
+		SKIP(return, "Symlinks are not protected. Skipping test");
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 0, 0), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create symlinks */
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER1), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER1, 0, 0, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER1, AT_SYMLINK_NOFOLLOW, 0, 0), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 0, 0), true);
+
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER2), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER2, 1000, 1000, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER2, AT_SYMLINK_NOFOLLOW, 1000, 1000), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 0, 0), true);
+
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER3), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER3, 2000, 2000, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER3, AT_SYMLINK_NOFOLLOW, 2000, 2000), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 0, 0), true);
+
+	/* validate file can be directly read */
+	fd = openat(dir_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+	ASSERT_GE(fd, 0);
+	ASSERT_EQ(close(fd), 0);
+
+	/* validate file can be read through own symlink */
+	fd = openat(dir_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+	ASSERT_GE(fd, 0);
+	ASSERT_EQ(close(fd), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* validate file can be directly read */
+		fd = openat(dir_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through own symlink */
+		fd = openat(dir_fd, SYMLINK_USER2, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through root symlink */
+		fd = openat(dir_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can't be read through other users symlink */
+		fd = openat(dir_fd, SYMLINK_USER3, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_LT(fd, 0);
+		ASSERT_EQ(errno, EACCES);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(2000, 2000, 2000), 0);
+		ASSERT_EQ(setresuid(2000, 2000, 2000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* validate file can be directly read */
+		fd = openat(dir_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through own symlink */
+		fd = openat(dir_fd, SYMLINK_USER3, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through root symlink */
+		fd = openat(dir_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can't be read through other users symlink */
+		fd = openat(dir_fd, SYMLINK_USER2, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_LT(fd, 0);
+		ASSERT_EQ(errno, EACCES);
+
+		exit(EXIT_SUCCESS);
+	}
+
+	ASSERT_EQ(wait_for_pid(pid), 0);
+}
+
+/**
+ * Validate that protected symlinks work correctly on idmapped mounts.
+ */
+TEST_F(core, follow_symlinks_idmapped)
+{
+	int dir_fd = -EBADF, fd = -EBADF, open_tree_fd = -EBADF;
+	pid_t pid;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	if (!symlinks_protected())
+		SKIP(return, "Symlinks are not protected. Skipping test");
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 10000, 10000), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 10000, 10000, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create symlinks */
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER1), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER1, 10000, 10000, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER1, AT_SYMLINK_NOFOLLOW, 10000, 10000), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 10000, 10000), true);
+
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER2), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER2, 11000, 11000, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER2, AT_SYMLINK_NOFOLLOW, 11000, 11000), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 10000, 10000), true);
+
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER3), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER3, 12000, 12000, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER3, AT_SYMLINK_NOFOLLOW, 12000, 12000), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 10000, 10000), true);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->img_mnt_fd, DIR1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(10000, 0, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/* validate file can be directly read */
+	fd = openat(open_tree_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+	ASSERT_GE(fd, 0);
+	ASSERT_EQ(close(fd), 0);
+
+	/* validate file can be read through own symlink */
+	fd = openat(open_tree_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+	ASSERT_GE(fd, 0);
+	ASSERT_EQ(close(fd), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* validate file can be directly read */
+		fd = openat(open_tree_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through own symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER2, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through root symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can't be read through other users symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER3, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_LT(fd, 0);
+		ASSERT_EQ(errno, EACCES);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setresgid(2000, 2000, 2000), 0);
+		ASSERT_EQ(setresuid(2000, 2000, 2000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* validate file can be directly read */
+		fd = openat(open_tree_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through own symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER3, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through root symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can't be read through other users symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER2, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_LT(fd, 0);
+		ASSERT_EQ(errno, EACCES);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+	ASSERT_EQ(close(open_tree_fd), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+/**
+ * Validate that protected symlinks work correctly on idmapped mounts inside a
+ * user namespace.
+ */
+TEST_F(core, follow_symlinks_idmapped_userns)
+{
+	int dir_fd = -EBADF, fd = -EBADF, open_tree_fd = -EBADF;
+	pid_t pid;
+	struct mount_attr attr = {
+		.attr_set = MOUNT_ATTR_IDMAP,
+	};
+
+	if (!symlinks_protected())
+		SKIP(return, "Symlinks are not protected. Skipping test");
+
+	/* create directory */
+	ASSERT_EQ(mkdirat(self->img_mnt_fd, DIR1, 0000), 0);
+	dir_fd = openat(self->img_mnt_fd, DIR1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_GE(dir_fd, 0);
+	ASSERT_EQ(fchown(dir_fd, 0, 0), 0);
+	ASSERT_EQ(fchmod(dir_fd, 0777 | S_ISVTX), 0);
+	/* validate sticky bit is set */
+	ASSERT_EQ(is_sticky(self->img_mnt_fd, DIR1, 0), true);
+
+	/* create regular file via mknod */
+	ASSERT_EQ(mknodat(dir_fd, FILE1, S_IFREG | 0000, 0), 0);
+	ASSERT_EQ(fchownat(dir_fd, FILE1, 0, 0, 0), 0);
+	ASSERT_EQ(fchmodat(dir_fd, FILE1, 0644, 0), 0);
+
+	/* create symlinks */
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER1), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER1, 0, 0, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER1, AT_SYMLINK_NOFOLLOW, 0, 0), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 0, 0), true);
+
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER2), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER2, 1000, 1000, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER2, AT_SYMLINK_NOFOLLOW, 1000, 1000), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 0, 0), true);
+
+	ASSERT_EQ(symlinkat(FILE1, dir_fd, SYMLINK_USER3), 0);
+	ASSERT_EQ(fchownat(dir_fd, SYMLINK_USER3, 2000, 2000, AT_SYMLINK_NOFOLLOW), 0);
+	ASSERT_EQ(expected_uid_gid(dir_fd, SYMLINK_USER3, AT_SYMLINK_NOFOLLOW, 2000, 2000), true);
+	ASSERT_EQ(expected_uid_gid(dir_fd, FILE1, 0, 0, 0), true);
+
+	/* Create detached mount. */
+	open_tree_fd = sys_open_tree(self->img_mnt_fd, DIR1,
+				     AT_NO_AUTOMOUNT | AT_SYMLINK_NOFOLLOW |
+				     OPEN_TREE_CLOEXEC | OPEN_TREE_CLONE |
+				     AT_RECURSIVE);
+	ASSERT_GE(open_tree_fd, 0);
+
+	/* Changing mount properties on a detached mount. */
+	attr.userns_fd	= get_userns_fd(0, 10000, 10000);
+	ASSERT_GE(attr.userns_fd, 0);
+	ASSERT_EQ(sys_mount_setattr(open_tree_fd, "",
+				    AT_EMPTY_PATH | AT_RECURSIVE, &attr,
+				    sizeof(attr)), 0) {
+		TH_LOG("%m - Failed to idmap mount %d(%s/" MNT_TARGET1 ")",
+		       open_tree_fd, self->test_dir_path);
+	}
+
+	/* validate file can be directly read */
+	fd = openat(open_tree_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+	ASSERT_GE(fd, 0);
+	ASSERT_EQ(close(fd), 0);
+
+	/* validate file can be read through own symlink */
+	fd = openat(open_tree_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+	ASSERT_GE(fd, 0);
+	ASSERT_EQ(close(fd), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(1000, 1000, 1000), 0);
+		ASSERT_EQ(setresuid(1000, 1000, 1000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* validate file can be directly read */
+		fd = openat(open_tree_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through own symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER2, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through root symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can't be read through other users symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER3, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_LT(fd, 0);
+		ASSERT_EQ(errno, EACCES);
+
+		exit(EXIT_SUCCESS);
+	}
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ASSERT_EQ(setns(attr.userns_fd, CLONE_NEWUSER), 0);
+		ASSERT_EQ(setresgid(2000, 2000, 2000), 0);
+		ASSERT_EQ(setresuid(2000, 2000, 2000), 0);
+		ASSERT_EQ(caps_down(), true);
+
+		/* validate file can be directly read */
+		fd = openat(open_tree_fd, FILE1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through own symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER3, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can be read through root symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER1, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_GE(fd, 0);
+		ASSERT_EQ(close(fd), 0);
+
+		/* validate file can't be read through other users symlink */
+		fd = openat(open_tree_fd, SYMLINK_USER2, O_RDONLY | O_CLOEXEC, 0);
+		ASSERT_LT(fd, 0);
+		ASSERT_EQ(errno, EACCES);
+
+		exit(EXIT_SUCCESS);
+	}
+
+	ASSERT_EQ(wait_for_pid(pid), 0);
+
+	ASSERT_EQ(close(dir_fd), 0);
+	ASSERT_EQ(close(open_tree_fd), 0);
+	ASSERT_EQ(close(attr.userns_fd), 0);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/idmap_mounts/internal.h b/tools/testing/selftests/idmap_mounts/internal.h
index 252803f35d71..a7d648c305cc 100644
--- a/tools/testing/selftests/idmap_mounts/internal.h
+++ b/tools/testing/selftests/idmap_mounts/internal.h
@@ -4,8 +4,16 @@
 #define __IDMAP_INTERNAL_H
 
 #define _GNU_SOURCE
-
-#include "../kselftest_harness.h"
+#include <errno.h>
+#include <linux/types.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
 
 #ifndef __NR_mount_setattr
 	#if defined __alpha__
@@ -26,6 +34,14 @@
 		#define __NR_mount_setattr 441
 	#endif
 
+struct mount_attr {
+	__u64 attr_set;
+	__u64 attr_clr;
+	__u64 propagation;
+	__u64 userns_fd;
+};
+#endif
+
 #ifndef __NR_open_tree
 	#if defined __alpha__
 		#define __NR_open_tree 538
@@ -66,15 +82,6 @@
 	#endif
 #endif
 
-
-struct mount_attr {
-	__u64 attr_set;
-	__u64 attr_clr;
-	__u64 propagation;
-	__u64 userns;
-};
-#endif
-
 #ifndef MOVE_MOUNT_F_EMPTY_PATH
 #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
 #endif
@@ -95,6 +102,10 @@ struct mount_attr {
 #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
 #endif
 
+#ifndef MAKE_PROPAGATION_PRIVATE
+#define MAKE_PROPAGATION_PRIVATE 2
+#endif
+
 static inline int sys_mount_setattr(int dfd, const char *path, unsigned int flags,
 				    struct mount_attr *attr, size_t size)
 {
diff --git a/tools/testing/selftests/idmap_mounts/utils.c b/tools/testing/selftests/idmap_mounts/utils.c
new file mode 100644
index 000000000000..34ea3a7f9393
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/utils.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <linux/limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "internal.h"
+
+ssize_t read_nointr(int fd, void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = read(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = write(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+static int write_file(const char *path, const void *buf, size_t count)
+{
+	int fd;
+	ssize_t ret;
+
+	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
+	if (fd < 0)
+		return -1;
+
+	ret = write_nointr(fd, buf, count);
+	close(fd);
+	if (ret < 0 || (size_t)ret != count)
+		return -1;
+
+	return 0;
+}
+
+static int map_ids(pid_t pid, unsigned long nsid, unsigned long hostid,
+		   unsigned long range)
+{
+	char map[100], procfile[256];
+
+	snprintf(procfile, sizeof(procfile), "/proc/%d/setgroups", pid);
+	if (write_file(procfile, "deny", sizeof("deny") - 1) &&
+	    errno != ENOENT)
+		return -1;
+
+	snprintf(procfile, sizeof(procfile), "/proc/%d/uid_map", pid);
+	snprintf(map, sizeof(map), "%lu %lu %lu", nsid, hostid, range);
+	if (write_file(procfile, map, strlen(map)))
+		return -1;
+
+
+	snprintf(procfile, sizeof(procfile), "/proc/%d/gid_map", pid);
+	snprintf(map, sizeof(map), "%lu %lu %lu", nsid, hostid, range);
+	if (write_file(procfile, map, strlen(map)))
+		return -1;
+
+	return 0;
+}
+
+#define __STACK_SIZE (8 * 1024 * 1024)
+pid_t do_clone(int (*fn)(void *), void *arg, int flags)
+{
+	void *stack;
+
+	stack = malloc(__STACK_SIZE);
+	if (!stack)
+		return -ENOMEM;
+
+#ifdef __ia64__
+	return __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, NULL);
+#else
+	return clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, NULL);
+#endif
+}
+
+static int get_userns_fd_cb(void *data)
+{
+	return kill(getpid(), SIGSTOP);
+}
+
+int get_userns_fd(unsigned long nsid, unsigned long hostid, unsigned long range)
+{
+	int ret;
+	pid_t pid;
+	char path[256];
+
+	pid = do_clone(get_userns_fd_cb, NULL, CLONE_NEWUSER | CLONE_NEWNS);
+	if (pid < 0)
+		return -errno;
+
+	ret = map_ids(pid, nsid, hostid, range);
+	if (ret < 0)
+		return ret;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/user", pid);
+	ret = open(path, O_RDONLY | O_CLOEXEC);
+	kill(pid, SIGKILL);
+	return ret;
+}
+
+int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
diff --git a/tools/testing/selftests/idmap_mounts/utils.h b/tools/testing/selftests/idmap_mounts/utils.h
new file mode 100644
index 000000000000..1f52997af99d
--- /dev/null
+++ b/tools/testing/selftests/idmap_mounts/utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __IDMAP_UTILS_H
+#define __IDMAP_UTILS_H
+
+#define _GNU_SOURCE
+
+#include "../kselftest_harness.h"
+
+extern pid_t do_clone(int (*fn)(void *), void *arg, int flags);
+extern int get_userns_fd(unsigned long nsid, unsigned long hostid,
+			 unsigned long range);
+extern ssize_t read_nointr(int fd, void *buf, size_t count);
+extern int wait_for_pid(pid_t pid);
+extern ssize_t write_nointr(int fd, const void *buf, size_t count);
+
+#endif /* __IDMAP_UTILS_H */
diff --git a/tools/testing/selftests/idmap_mounts/xattr.c b/tools/testing/selftests/idmap_mounts/xattr.c
index 58e88f92f958..0625e3fe53ac 100644
--- a/tools/testing/selftests/idmap_mounts/xattr.c
+++ b/tools/testing/selftests/idmap_mounts/xattr.c
@@ -8,102 +8,9 @@
 #include <linux/limits.h>
 
 #include "internal.h"
+#include "utils.h"
 #include "../kselftest_harness.h"
 
-static ssize_t write_nointr(int fd, const void *buf, size_t count)
-{
-	ssize_t ret;
-
-	do {
-		ret = write(fd, buf, count);
-	} while (ret < 0 && errno == EINTR);
-
-	return ret;
-}
-
-static int write_file(const char *path, const void *buf, size_t count)
-{
-	int fd;
-	ssize_t ret;
-
-	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
-	if (fd < 0)
-		return -1;
-
-	ret = write_nointr(fd, buf, count);
-	close(fd);
-	if (ret < 0 || (size_t)ret != count)
-		return -1;
-
-	return 0;
-}
-
-static int map_ids(pid_t pid, unsigned long nsid, unsigned long hostid,
-		   unsigned long range)
-{
-	char map[100], procfile[256];
-
-	snprintf(procfile, sizeof(procfile), "/proc/%d/setgroups", pid);
-	if (write_file(procfile, "deny", sizeof("deny") - 1) &&
-	    errno != ENOENT)
-		return -1;
-
-	snprintf(procfile, sizeof(procfile), "/proc/%d/uid_map", pid);
-	snprintf(map, sizeof(map), "%lu %lu %lu", nsid, hostid, range);
-	if (write_file(procfile, map, strlen(map)))
-		return -1;
-
-
-	snprintf(procfile, sizeof(procfile), "/proc/%d/gid_map", pid);
-	snprintf(map, sizeof(map), "%lu %lu %lu", nsid, hostid, range);
-	if (write_file(procfile, map, strlen(map)))
-		return -1;
-
-	return 0;
-}
-
-#define __STACK_SIZE (8 * 1024 * 1024)
-static pid_t do_clone(int (*fn)(void *), void *arg, int flags)
-{
-	void *stack;
-
-	stack = malloc(__STACK_SIZE);
-	if (!stack)
-		return -ENOMEM;
-
-#ifdef __ia64__
-	return __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, NULL);
-#else
-	return clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, NULL);
-#endif
-}
-
-static int get_userns_fd_cb(void *data)
-{
-	return kill(getpid(), SIGSTOP);
-}
-
-static int get_userns_fd(unsigned long nsid, unsigned long hostid,
-			 unsigned long range)
-{
-	int ret;
-	pid_t pid;
-	char path[256];
-
-	pid = do_clone(get_userns_fd_cb, NULL, CLONE_NEWUSER | CLONE_NEWNS);
-	if (pid < 0)
-		return -errno;
-
-	ret = map_ids(pid, nsid, hostid, range);
-	if (ret < 0)
-		return ret;
-
-	snprintf(path, sizeof(path), "/proc/%d/ns/user", pid);
-	ret = open(path, O_RDONLY | O_CLOEXEC);
-	kill(pid, SIGKILL);
-	return ret;
-}
-
 struct run_as_data {
 	int userns;
 	int (*f)(void *data);
@@ -132,25 +39,6 @@ static int run_in_cb(void *data)
 	return rad->f(rad->data);
 }
 
-static int wait_for_pid(pid_t pid)
-{
-	int status, ret;
-
-again:
-	ret = waitpid(pid, &status, 0);
-	if (ret == -1) {
-		if (errno == EINTR)
-			goto again;
-
-		return -1;
-	}
-
-	if (!WIFEXITED(status))
-		return -1;
-
-	return WEXITSTATUS(status);
-}
-
 static int run_in(int userns, int (*f)(void *), void *f_data)
 {
 	pid_t pid;
@@ -245,8 +133,8 @@ TEST_F(ext4_xattr, setattr_didnt_work)
 				 MOVE_MOUNT_F_EMPTY_PATH), 0);
 
 	attr.attr_set = MOUNT_ATTR_IDMAP;
-	attr.userns = get_userns_fd(100010, 100020, 5);
-	ASSERT_GE(attr.userns, 0);
+	attr.userns_fd = get_userns_fd(100010, 100020, 5);
+	ASSERT_GE(attr.userns_fd, 0);
 	ret = sys_mount_setattr(mount_fd, "", AT_EMPTY_PATH | AT_RECURSIVE,
 				    &attr, sizeof(attr));
 	ASSERT_EQ(close(mount_fd), 0);
@@ -261,24 +149,24 @@ TEST_F(ext4_xattr, setattr_didnt_work)
 
 	snprintf(ssb.path, sizeof(ssb.path), "/tmp/ext4/source/foo");
 	ssb.uid = 4294967295;
-	EXPECT_EQ(run_in(attr.userns, getacl_should_be_uid, &ssb), 0);
+	EXPECT_EQ(run_in(attr.userns_fd, getacl_should_be_uid, &ssb), 0);
 
 	snprintf(ssb.path, sizeof(ssb.path), "/tmp/ext4/dest/foo");
 	ssb.uid = 100010;
-	EXPECT_EQ(run_in(attr.userns, getacl_should_be_uid, &ssb), 0);
+	EXPECT_EQ(run_in(attr.userns_fd, getacl_should_be_uid, &ssb), 0);
 
 	/*
 	 * now, dir is owned by someone else in the user namespace, but we can
 	 * still read it because of acls
 	 */
 	ASSERT_EQ(chown("/tmp/ext4/source/foo", 100012, 100012), 0);
-	EXPECT_EQ(run_in(attr.userns, ls_path, "/tmp/ext4/dest/foo"), 0);
+	EXPECT_EQ(run_in(attr.userns_fd, ls_path, "/tmp/ext4/dest/foo"), 0);
 
 	/*
 	 * if we delete the acls, the ls should fail because it's 700.
 	 */
 	ASSERT_EQ(system("setfacl --remove-all /tmp/ext4/source/foo"), 0);
-	EXPECT_NE(run_in(attr.userns, ls_path, "/tmp/ext4/dest/foo"), 0);
+	EXPECT_NE(run_in(attr.userns_fd, ls_path, "/tmp/ext4/dest/foo"), 0);
 }
 
 TEST_HARNESS_MAIN
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 36/39] overlayfs: do not mount on top of idmapped mounts
  2020-11-15 10:37 ` [PATCH v2 36/39] overlayfs: " Christian Brauner
@ 2020-11-15 12:31   ` Amir Goldstein
  2020-11-18 10:26     ` Christian Brauner
  0 siblings, 1 reply; 63+ messages in thread
From: Amir Goldstein @ 2020-11-15 12:31 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, Miklos Szeredi,
	overlayfs

On Sun, Nov 15, 2020 at 12:42 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> Prevent overlayfs from being mounted on top of idmapped mounts until we
> have ported it to handle this case and added proper testing for it.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---
> /* v2 */
> patch introduced
> ---
>  fs/overlayfs/super.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 0d4f2baf6836..3cacc3d3fb65 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -1708,6 +1708,12 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
>                 if (err)
>                         goto out_err;
>
> +               if (mnt_idmapped(stack[i].mnt)) {
> +                       err = -EINVAL;
> +                       pr_err("idmapped lower layers are currently unsupported\n");
> +                       goto out_err;
> +               }
> +
>                 lower = strchr(lower, '\0') + 1;
>         }
>
> @@ -1939,6 +1945,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>                 if (err)
>                         goto out_err;
>
> +               if (mnt_idmapped(upperpath.mnt)) {
> +                       err = -EINVAL;
> +                       pr_err("idmapped lower layers are currently unsupported\n");
> +                       goto out_err;
> +               }
> +

Both checks should be replaced with one check in ovl_mount_dir_noesc()
right next to ovl_dentry_weird() check and FWIW the error above about
"lower layers" when referring to upperpath.mnt is confusing.
"idmapped layers..." should be fine.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/39] fs: idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (38 preceding siblings ...)
  2020-11-15 10:37 ` [PATCH v2 39/39] tests: add vfs/idmapped mounts test suite Christian Brauner
@ 2020-11-17 23:54 ` Jonathan Corbet
  2020-11-18  9:45   ` Christian Brauner
  2020-11-18  3:51 ` Stephen Barber
  2020-11-20  2:33 ` Darrick J. Wong
  41 siblings, 1 reply; 63+ messages in thread
From: Jonathan Corbet @ 2020-11-17 23:54 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, containers,
	linux-security-module, linux-api, linux-ext4, linux-audit,
	linux-integrity, selinux

On Sun, 15 Nov 2020 11:36:39 +0100
Christian Brauner <christian.brauner@ubuntu.com> wrote:

One quick question...

> I have written a simple tool available at
> https://github.com/brauner/mount-idmapped that allows to create idmapped
> mounts so people can play with this patch series.

I spent a while looking at that tool.  When actually setting the namespace
for the mapping, it uses MOUNT_ATTR_SHIFT rather than MOUNT_ATTR_IDMAP.
The value is the same, so I expect it works...:)  But did that perhaps not
get updated to reflect a name change?

Thanks,

jon

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/39] fs: idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (39 preceding siblings ...)
  2020-11-17 23:54 ` [PATCH v2 00/39] fs: idmapped mounts Jonathan Corbet
@ 2020-11-18  3:51 ` Stephen Barber
  2020-11-20  2:33 ` Darrick J. Wong
  41 siblings, 0 replies; 63+ messages in thread
From: Stephen Barber @ 2020-11-18  3:51 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, Phil Estes,
	Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet, containers,
	linux-security-module, linux-api, linux-ext4, linux-audit,
	linux-integrity, selinux

Hi Christian,

Thanks for working on this and respinning this patch series - wanted
to chime in below that we on Chrome OS also have lots of exciting use
cases for this.

On Sun, Nov 15, 2020 at 2:38 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> Hey everyone,
>
> This is v2. It is reworked according to the reviews coming from
> Christoph and others to adapt all relevant helpers and inode_operations
> methods to account for idmapped mounts instead of introducing new
> helpers and methods specific to idmapped mounts like we did before.
> We've also moved the overlayfs conversion to handle idmapped mounts into
> a separate patchset that will be sent out separately after the core
> changes. The converted filesytems in this series include fat and ext4.
> The config option to disable idmapped mounts has been moved from a a vfs
> config ption to a per-filesystem option. They default to off. Having a
> config option allows us to gain some confidence in the patchset over
> multiple kernel releases.
>
> There are two noteable things about this version. First, that it comes
> with a really large test-suite to test current vfs behavior and
> idmapped mounts behavior. We intend this test-suite to grow over time
> and at some point cover most basic core vfs functionality that isn't
> covered in xfstests and have it be part of the selftests.
> Second, while while working on adapting this patchset to the requested
> changes, the runC and containerd crowd was nice enough to adapt
> containerd to this patchset to make use of idmapped mounts in one of the
> most widely used container runtimes:
> https://github.com/containerd/containerd/pull/4734
>
> With this patchset we make it possible to attach idmappings to bind
> mounts. This handles several common use-cases. Here are just a few:
> - Shifting of a container rootfs or base image without having to mangle
>   every file (runc, Docker, containerd, k8s, LXD, systemd ...)

This is one that has hit us repeatedly with Crostini via LXD. We need to
change uids/gids to share files and directories with the host.

> - Sharing of data between host or privileged containers with
>   underprivileged containers (runc, Docker, containerd, k8s, LXD, ...)

Also useful for sharing data with unprivileged sandboxes like Chrome
OS's minijail.

> - Shifting of subset of ownership-less filesystems (vfat) for use by
>   multiple users, effectively allowing for DAC on such devices (systemd,
>   Android, ...)
> - Data sharing between multiple user namespaces with incompatible maps
>   (LXD, k8s, ...)

This may also prove useful sharing data between Chrome OS, Android,
and Crostini on the same device. Basically the same use case, although
VM boundaries are involved.

> Making it possible to share directories and mounts between users with
> different uids and gids is itself quite an important use-case in
> distributed systems environments. It's of course especially useful in
> general for portable usb sticks, sharing data between multiple users in
> general, and sharing home directories between multiple users. The last
> example is now elegantly expressed in systemd's homed concept for
> portable home directories. As mentioned above, idmapped mounts also
> allow data from the host to be shared with unprivileged containers,
> between privileged and unprivileged containers simultaneously and in
> addition also between unprivileged containers with different idmappings
> whenever they are used to isolate one container completely from another
> container.
> As can be seen from answers to earlier threads of this patchset and from
> the list of potential users interest in this patchset is fairly
> widespread.
>
> We have implemented and proposed multiple solutions to this before. This
> included the introduction of fsid mappings, a tiny filesystem that is
> currently carried in Ubuntu that has shown it's limitations, and an
> approach to call override creds in the vfs. None of these solutions have
> covered all of the above use-cases. Some of them have been fairly hacky
> too by e.g. violating how things should be passed down to the individual
> filesystems.
> The solution proposed here has it's origins in multiple discussions
> during Linux Plumbers 2017 during and after the end of the containers
> microconference.
> To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
> James, and myself. The original idea or a variant thereof has been
> discussed, again to the best of my knowledge, after a Linux conference
> in St. Petersburg in Russia in 2017 between Christoph, Tycho, and
> myself.
> We've taken the time to implement a working version of this solution
> over the last weeks to the best of my abilities. Tycho has signed up
> for this sligthly crazy endeavour as well and he has helped with the
> conversion of the xattr codepaths and will be involved with others in
> converting additional filesystems.
>
> This series makes idmappings a property of struct vfsmount instead of
> tying it to a process being inside of a user namespace which has been
> the case for all other proposed approaches. It also allows to pass down
> the user namespace into the filesystms which is a clean way instead of
> violating calling conventions by strapping the user namespace
> information that is a property of the mount to the caller's credentials
> or similar hacks.
> With this idmappings become a property of bind-mounts, i.e. each
> bind-mount can have a separate idmapping. Such idmapped mounts can even
> be created inside of the initial user namespace.
>
> The vfsmount struct gains a new struct user_namespace member. The
> idmapping of the user namespace becomes the idmapping of the mount. A
> caller that is privileged with respect to the user namespace of
> the superblock of the underlying filesystem can create an idmapped
> bind-mount. In the future, we can enable unprivileged use-cases by
> checking whether the caller is privileged wrt to the user namespace an
> already idmapped mount has been marked with, allowing them to change the
> idmapping. For now, keep things simple until we feel sure enough. Note,
> that with syscall interception it is already possible to intercept
> idmapped mount requests from unprivileged containers and handle them in
> a sufficiently privileged container manager. Support for this is
> already available in LXD and will be available in runC were syscall
> interception is currently becoming part of the runtime spec (see
> https://github.com/opencontainers/runtime-spec/pull/1074).
>
> The user namespace the mount will be marked with can be specified by
> passing a file descriptor refering to the user namespace as an argument
> to the new mount_setattr() syscall together with the new
> MOUNT_ATTR_IDMAP flag. By default vfsmounts are marked with the initial
> user namespace and no behavioral or performance changes should be
> observed. All mapping operations are nops for the initial user
> namespace. When a file/inode is accessed through an idmapped mount the
> i_uid and i_gid of the inode will be remapped according to the user
> namespace the mount has been marked with.
>
> In order to support idmapped mounts, filesystems need to be changed and
> mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The initial
> version contains fat and ext4 including a list of examples. But patches
> for other filesystems are actively worked on but will be sent out
> separately. We are here to see this through and there are multiple
> people involved in converting filesystems. So filesystem developers are
> not left alone with this.
>
> I have written a simple tool available at
> https://github.com/brauner/mount-idmapped that allows to create idmapped
> mounts so people can play with this patch series. Here are a few
> illustrations:
>
> 1. Create a simple idmapped mount of another user's home directory
>
> u1001@f2-vm:/$ sudo ./mount-idmapped --map-mount b:1000:1001:1 /home/ubuntu/ /mnt
> u1001@f2-vm:/$ ls -al /home/ubuntu/
> total 28
> drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
> drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
> -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
> -rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
> -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
> -rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
> -rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
> -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
> u1001@f2-vm:/$ ls -al /mnt/
> total 28
> drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
> drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
> -rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
> -rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
> -rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
> -rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
> -rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
> -rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo
> u1001@f2-vm:/$ touch /mnt/my-file
> u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
> u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
> u1001@f2-vm:/$ ls -al /mnt/my-file
> -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
> u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
> -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
> u1001@f2-vm:/$ getfacl /mnt/my-file
> getfacl: Removing leading '/' from absolute path names
> # file: mnt/my-file
> # owner: u1001
> # group: u1001
> user::rw-
> user:u1001:rwx
> group::rw-
> mask::rwx
> other::r--
> u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
> getfacl: Removing leading '/' from absolute path names
> # file: home/ubuntu/my-file
> # owner: ubuntu
> # group: ubuntu
> user::rw-
> user:ubuntu:rwx
> group::rw-
> mask::rwx
> other::r--
>
> 2. Create mapping of the whole ext4 rootfs without a mapping for uid and gid 0
>
> ubuntu@f2-vm:~$ sudo /mount-idmapped --map-mount b:1:1:65536 / /mnt/
> ubuntu@f2-vm:~$ findmnt | grep mnt
> └─/mnt                                /dev/sda2  ext4       rw,relatime
>   └─/mnt/mnt                          /dev/sda2  ext4       rw,relatime
> ubuntu@f2-vm:~$ sudo mkdir /AS-ROOT-CAN-CREATE
> ubuntu@f2-vm:~$ sudo mkdir /mnt/AS-ROOT-CANT-CREATE
> mkdir: cannot create directory ‘/mnt/AS-ROOT-CANT-CREATE’: Value too large for defined data type
> ubuntu@f2-vm:~$ mkdir /mnt/home/ubuntu/AS-USER-1000-CAN-CREATE
>
> 3. Create a vfat usb mount and expose to user 1001 and 5000
>
> ubuntu@f2-vm:/$ sudo mount /dev/sdb /mnt
> ubuntu@f2-vm:/$ findmnt  | grep mnt
> └─/mnt                                /dev/sdb vfat       rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
> ubuntu@f2-vm:/$ ls -al /mnt
> total 12
> drwxr-xr-x  2 root root 4096 Jan  1  1970 .
> drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
> -rwxr-xr-x  1 root root    4 Oct 28 03:44 aaa
> -rwxr-xr-x  1 root root    0 Oct 28 01:09 bbb
> ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:1001:1 /mnt /mnt-1001/
> ubuntu@f2-vm:/$ ls -al /mnt-1001/
> total 12
> drwxr-xr-x  2 u1001 u1001 4096 Jan  1  1970 .
> drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
> -rwxr-xr-x  1 u1001 u1001    4 Oct 28 03:44 aaa
> -rwxr-xr-x  1 u1001 u1001    0 Oct 28 01:09 bbb
> ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:5000:1 /mnt /mnt-5000/
> ubuntu@f2-vm:/$ ls -al /mnt-5000/
> total 12
> drwxr-xr-x  2 5000 5000 4096 Jan  1  1970 .
> drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
> -rwxr-xr-x  1 5000 5000    4 Oct 28 03:44 aaa
> -rwxr-xr-x  1 5000 5000    0 Oct 28 01:09 bbb
>
> 4. Create an idmapped rootfs mount for a container
>
> root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/
> total 68
> drwxr-xr-x 17 20000 20000 4096 Sep 24 07:48 .
> drwxrwx---  3 20000 20000 4096 Oct 16 19:26 ..
> lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 bin -> usr/bin
> drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 boot
> drwxr-xr-x  3 20000 20000 4096 Oct 16 19:26 dev
> drwxr-xr-x 61 20000 20000 4096 Oct 16 19:26 etc
> drwxr-xr-x  3 20000 20000 4096 Sep 24 07:45 home
> lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 lib -> usr/lib
> lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib32 -> usr/lib32
> lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib64 -> usr/lib64
> lrwxrwxrwx  1 20000 20000   10 Sep 24 07:43 libx32 -> usr/libx32
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 media
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 mnt
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 opt
> drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 proc
> drwx------  2 20000 20000 4096 Sep 24 07:43 root
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:45 run
> lrwxrwxrwx  1 20000 20000    8 Sep 24 07:43 sbin -> usr/sbin
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 srv
> drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 sys
> drwxrwxrwt  2 20000 20000 4096 Sep 24 07:44 tmp
> drwxr-xr-x 13 20000 20000 4096 Sep 24 07:43 usr
> drwxr-xr-x 12 20000 20000 4096 Sep 24 07:44 var
> root@f2-vm:~# /mount-idmapped --map-mount b:20000:10000:100000 /var/lib/lxc/f2/rootfs/ /mnt
> root@f2-vm:~# ls -al /mnt
> total 68
> drwxr-xr-x 17 10000 10000 4096 Sep 24 07:48 .
> drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
> lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 bin -> usr/bin
> drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 boot
> drwxr-xr-x  3 10000 10000 4096 Oct 16 19:26 dev
> drwxr-xr-x 61 10000 10000 4096 Oct 16 19:26 etc
> drwxr-xr-x  3 10000 10000 4096 Sep 24 07:45 home
> lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 lib -> usr/lib
> lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib32 -> usr/lib32
> lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib64 -> usr/lib64
> lrwxrwxrwx  1 10000 10000   10 Sep 24 07:43 libx32 -> usr/libx32
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 media
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 mnt
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 opt
> drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 proc
> drwx------  2 10000 10000 4096 Sep 24 07:43 root
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:45 run
> lrwxrwxrwx  1 10000 10000    8 Sep 24 07:43 sbin -> usr/sbin
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 srv
> drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 sys
> drwxrwxrwt  2 10000 10000 4096 Sep 24 07:44 tmp
> drwxr-xr-x 13 10000 10000 4096 Sep 24 07:43 usr
> drwxr-xr-x 12 10000 10000 4096 Sep 24 07:44 var
> root@f2-vm:~# lxc-start f2 # uses /mnt as rootfs
> root@f2-vm:~# lxc-attach f2 -- cat /proc/1/uid_map
>          0      10000      10000
> root@f2-vm:~# lxc-attach f2 -- cat /proc/1/gid_map
>          0      10000      10000
> root@f2-vm:~# lxc-attach f2 -- ls -al /
> total 52
> drwxr-xr-x  17 root   root    4096 Sep 24 07:48 .
> drwxr-xr-x  17 root   root    4096 Sep 24 07:48 ..
> lrwxrwxrwx   1 root   root       7 Sep 24 07:43 bin -> usr/bin
> drwxr-xr-x   2 root   root    4096 Apr 15  2020 boot
> drwxr-xr-x   5 root   root     500 Oct 28 23:39 dev
> drwxr-xr-x  61 root   root    4096 Oct 28 23:39 etc
> drwxr-xr-x   3 root   root    4096 Sep 24 07:45 home
> lrwxrwxrwx   1 root   root       7 Sep 24 07:43 lib -> usr/lib
> lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib32 -> usr/lib32
> lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib64 -> usr/lib64
> lrwxrwxrwx   1 root   root      10 Sep 24 07:43 libx32 -> usr/libx32
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 media
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 mnt
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 opt
> dr-xr-xr-x 232 nobody nogroup    0 Oct 28 23:39 proc
> drwx------   2 root   root    4096 Oct 28 23:41 root
> drwxr-xr-x  12 root   root     360 Oct 28 23:39 run
> lrwxrwxrwx   1 root   root       8 Sep 24 07:43 sbin -> usr/sbin
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 srv
> dr-xr-xr-x  13 nobody nogroup    0 Oct 28 23:39 sys
> drwxrwxrwt  11 root   root    4096 Oct 28 23:40 tmp
> drwxr-xr-x  13 root   root    4096 Sep 24 07:43 usr
> drwxr-xr-x  12 root   root    4096 Sep 24 07:44 var
> root@f2-vm:~# lxc-attach f2 -- ls -al /my-file
> -rw-r--r-- 1 root root 0 Oct 28 23:43 /my-file
> root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/my-file
> -rw-r--r-- 1 20000 20000 0 Oct 28 23:43 /var/lib/lxc/f2/rootfs/my-file
>
> I'd like to say thanks to:
> Al for pointing me into the direction to avoid inode alias issues during
> lookup. David for various discussions around this. Christoph for proving
> a first proper review and for being involved in the original idea. Tycho
> for helping with this series and on future patches to convert
> filesystems. Alban Crequy and the Kinvolk located just a few streets
> away from me in Berlin for providing use-case discussions and writing
> patches for containerd! Stéphane for his invaluable input on many things
> and level head and enabling me to work on this. Amir for explaining and
> discussing aspects of overlayfs with me. I'd like to especially thank
> Seth Forshee because he provided a lot of good analysis, suggestions,
> and participated in short-notice discussions in both chat and video for
> some nitty-gritty technical details.
>
> This series can be found and pulled from the three usual locations:
> https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=idmapped_mounts
> https://github.com/brauner/linux/tree/idmapped_mounts
> https://gitlab.com/brauner/linux/-/commits/idmapped_mounts
>
> Thanks!
> Christian
>
> Christian Brauner (37):
>   namespace: take lock_mount_hash() directly when changing flags
>   mount: make {lock,unlock}_mount_hash() static
>   namespace: only take read lock in do_reconfigure_mnt()
>   fs: add mount_setattr()
>   tests: add mount_setattr() selftests
>   fs: add id translation helpers
>   mount: attach mappings to mounts
>   capability: handle idmapped mounts
>   namei: add idmapped mount aware permission helpers
>   inode: add idmapped mount aware init and permission helpers
>   attr: handle idmapped mounts
>   acl: handle idmapped mounts
>   commoncap: handle idmapped mounts
>   stat: handle idmapped mounts
>   namei: handle idmapped mounts in may_*() helpers
>   namei: introduce struct renamedata
>   namei: prepare for idmapped mounts
>   open: handle idmapped mounts in do_truncate()
>   open: handle idmapped mounts
>   af_unix: handle idmapped mounts
>   utimes: handle idmapped mounts
>   fcntl: handle idmapped mounts
>   notify: handle idmapped mounts
>   init: handle idmapped mounts
>   ioctl: handle idmapped mounts
>   would_dump: handle idmapped mounts
>   exec: handle idmapped mounts
>   fs: add helpers for idmap mounts
>   apparmor: handle idmapped mounts
>   audit: handle idmapped mounts
>   ima: handle idmapped mounts
>   fat: handle idmapped mounts
>   ext4: support idmapped mounts
>   ecryptfs: do not mount on top of idmapped mounts
>   overlayfs: do not mount on top of idmapped mounts
>   fs: introduce MOUNT_ATTR_IDMAP
>   tests: add vfs/idmapped mounts test suite
>
> Tycho Andersen (2):
>   xattr: handle idmapped mounts
>   selftests: add idmapped mounts xattr selftest
>
>  Documentation/filesystems/locking.rst         |    6 +-
>  Documentation/filesystems/porting.rst         |    2 +
>  Documentation/filesystems/vfs.rst             |   17 +-
>  arch/alpha/kernel/syscalls/syscall.tbl        |    1 +
>  arch/arm/tools/syscall.tbl                    |    1 +
>  arch/arm64/include/asm/unistd32.h             |    2 +
>  arch/ia64/kernel/syscalls/syscall.tbl         |    1 +
>  arch/m68k/kernel/syscalls/syscall.tbl         |    1 +
>  arch/microblaze/kernel/syscalls/syscall.tbl   |    1 +
>  arch/mips/kernel/syscalls/syscall_n32.tbl     |    1 +
>  arch/mips/kernel/syscalls/syscall_n64.tbl     |    1 +
>  arch/mips/kernel/syscalls/syscall_o32.tbl     |    1 +
>  arch/parisc/kernel/syscalls/syscall.tbl       |    1 +
>  arch/powerpc/kernel/syscalls/syscall.tbl      |    1 +
>  arch/powerpc/platforms/cell/spufs/inode.c     |    5 +-
>  arch/s390/kernel/syscalls/syscall.tbl         |    1 +
>  arch/sh/kernel/syscalls/syscall.tbl           |    1 +
>  arch/sparc/kernel/syscalls/syscall.tbl        |    1 +
>  arch/x86/entry/syscalls/syscall_32.tbl        |    1 +
>  arch/x86/entry/syscalls/syscall_64.tbl        |    1 +
>  arch/xtensa/kernel/syscalls/syscall.tbl       |    1 +
>  drivers/android/binderfs.c                    |    3 +-
>  drivers/base/devtmpfs.c                       |   12 +-
>  fs/9p/acl.c                                   |    7 +-
>  fs/9p/v9fs.h                                  |    3 +-
>  fs/9p/v9fs_vfs.h                              |    2 +-
>  fs/9p/vfs_inode.c                             |   32 +-
>  fs/9p/vfs_inode_dotl.c                        |   34 +-
>  fs/9p/xattr.c                                 |    1 +
>  fs/adfs/adfs.h                                |    3 +-
>  fs/adfs/inode.c                               |    5 +-
>  fs/affs/affs.h                                |   10 +-
>  fs/affs/inode.c                               |    7 +-
>  fs/affs/namei.c                               |   15 +-
>  fs/afs/dir.c                                  |   34 +-
>  fs/afs/inode.c                                |    5 +-
>  fs/afs/internal.h                             |    4 +-
>  fs/afs/security.c                             |    2 +-
>  fs/afs/xattr.c                                |    2 +
>  fs/attr.c                                     |   78 +-
>  fs/autofs/root.c                              |   13 +-
>  fs/bad_inode.c                                |   31 +-
>  fs/bfs/dir.c                                  |   12 +-
>  fs/btrfs/acl.c                                |    5 +-
>  fs/btrfs/ctree.h                              |    3 +-
>  fs/btrfs/inode.c                              |   41 +-
>  fs/btrfs/ioctl.c                              |   25 +-
>  fs/btrfs/tests/btrfs-tests.c                  |    2 +-
>  fs/btrfs/xattr.c                              |    2 +
>  fs/cachefiles/interface.c                     |    4 +-
>  fs/cachefiles/namei.c                         |   19 +-
>  fs/cachefiles/xattr.c                         |   16 +-
>  fs/ceph/acl.c                                 |    5 +-
>  fs/ceph/dir.c                                 |   23 +-
>  fs/ceph/inode.c                               |   13 +-
>  fs/ceph/super.h                               |    8 +-
>  fs/ceph/xattr.c                               |    1 +
>  fs/cifs/cifsfs.c                              |    4 +-
>  fs/cifs/cifsfs.h                              |   18 +-
>  fs/cifs/dir.c                                 |    8 +-
>  fs/cifs/inode.c                               |   22 +-
>  fs/cifs/link.c                                |    3 +-
>  fs/cifs/xattr.c                               |    1 +
>  fs/coda/coda_linux.h                          |    4 +-
>  fs/coda/dir.c                                 |   18 +-
>  fs/coda/inode.c                               |    5 +-
>  fs/coda/pioctl.c                              |    6 +-
>  fs/configfs/configfs_internal.h               |    7 +-
>  fs/configfs/dir.c                             |    3 +-
>  fs/configfs/inode.c                           |    5 +-
>  fs/configfs/symlink.c                         |    5 +-
>  fs/coredump.c                                 |   12 +-
>  fs/crypto/policy.c                            |    2 +-
>  fs/debugfs/inode.c                            |    9 +-
>  fs/ecryptfs/crypto.c                          |    4 +-
>  fs/ecryptfs/inode.c                           |   74 +-
>  fs/ecryptfs/main.c                            |    6 +
>  fs/ecryptfs/mmap.c                            |    4 +-
>  fs/efivarfs/file.c                            |    2 +-
>  fs/efivarfs/inode.c                           |    4 +-
>  fs/erofs/inode.c                              |    2 +-
>  fs/exec.c                                     |   12 +-
>  fs/exfat/exfat_fs.h                           |    3 +-
>  fs/exfat/file.c                               |    9 +-
>  fs/exfat/namei.c                              |   13 +-
>  fs/ext2/acl.c                                 |    5 +-
>  fs/ext2/acl.h                                 |    3 +-
>  fs/ext2/ext2.h                                |    2 +-
>  fs/ext2/ialloc.c                              |    2 +-
>  fs/ext2/inode.c                               |   11 +-
>  fs/ext2/ioctl.c                               |    6 +-
>  fs/ext2/namei.c                               |   22 +-
>  fs/ext2/xattr_security.c                      |    1 +
>  fs/ext2/xattr_trusted.c                       |    1 +
>  fs/ext2/xattr_user.c                          |    1 +
>  fs/ext4/Kconfig                               |    9 +
>  fs/ext4/acl.c                                 |    5 +-
>  fs/ext4/acl.h                                 |    3 +-
>  fs/ext4/ext4.h                                |   15 +-
>  fs/ext4/ialloc.c                              |    7 +-
>  fs/ext4/inode.c                               |   14 +-
>  fs/ext4/ioctl.c                               |   18 +-
>  fs/ext4/namei.c                               |   52 +-
>  fs/ext4/super.c                               |    6 +-
>  fs/ext4/xattr_hurd.c                          |    1 +
>  fs/ext4/xattr_security.c                      |    1 +
>  fs/ext4/xattr_trusted.c                       |    1 +
>  fs/ext4/xattr_user.c                          |    1 +
>  fs/f2fs/acl.c                                 |    5 +-
>  fs/f2fs/acl.h                                 |    3 +-
>  fs/f2fs/f2fs.h                                |    3 +-
>  fs/f2fs/file.c                                |   23 +-
>  fs/f2fs/namei.c                               |   26 +-
>  fs/f2fs/xattr.c                               |    4 +-
>  fs/fat/fat.h                                  |    3 +-
>  fs/fat/file.c                                 |   20 +-
>  fs/fat/namei_msdos.c                          |   15 +-
>  fs/fat/namei_vfat.c                           |   15 +-
>  fs/fcntl.c                                    |    3 +-
>  fs/fuse/acl.c                                 |    3 +-
>  fs/fuse/dir.c                                 |   41 +-
>  fs/fuse/fuse_i.h                              |    4 +-
>  fs/fuse/xattr.c                               |    2 +
>  fs/gfs2/acl.c                                 |    5 +-
>  fs/gfs2/acl.h                                 |    3 +-
>  fs/gfs2/file.c                                |    4 +-
>  fs/gfs2/inode.c                               |   54 +-
>  fs/gfs2/inode.h                               |    2 +-
>  fs/gfs2/xattr.c                               |    1 +
>  fs/hfs/attr.c                                 |    1 +
>  fs/hfs/dir.c                                  |   13 +-
>  fs/hfs/hfs_fs.h                               |    2 +-
>  fs/hfs/inode.c                                |    7 +-
>  fs/hfsplus/dir.c                              |   25 +-
>  fs/hfsplus/inode.c                            |   11 +-
>  fs/hfsplus/ioctl.c                            |    2 +-
>  fs/hfsplus/xattr.c                            |    1 +
>  fs/hfsplus/xattr_security.c                   |    1 +
>  fs/hfsplus/xattr_trusted.c                    |    1 +
>  fs/hfsplus/xattr_user.c                       |    1 +
>  fs/hostfs/hostfs_kern.c                       |   31 +-
>  fs/hpfs/hpfs_fn.h                             |    2 +-
>  fs/hpfs/inode.c                               |    7 +-
>  fs/hpfs/namei.c                               |   20 +-
>  fs/hugetlbfs/inode.c                          |   31 +-
>  fs/init.c                                     |   21 +-
>  fs/inode.c                                    |   38 +-
>  fs/internal.h                                 |    9 +
>  fs/jffs2/acl.c                                |    5 +-
>  fs/jffs2/acl.h                                |    3 +-
>  fs/jffs2/dir.c                                |   32 +-
>  fs/jffs2/fs.c                                 |    7 +-
>  fs/jffs2/os-linux.h                           |    2 +-
>  fs/jffs2/security.c                           |    1 +
>  fs/jffs2/xattr_trusted.c                      |    1 +
>  fs/jffs2/xattr_user.c                         |    1 +
>  fs/jfs/acl.c                                  |    5 +-
>  fs/jfs/file.c                                 |    9 +-
>  fs/jfs/ioctl.c                                |    2 +-
>  fs/jfs/jfs_acl.h                              |    3 +-
>  fs/jfs/jfs_inode.c                            |    2 +-
>  fs/jfs/jfs_inode.h                            |    2 +-
>  fs/jfs/namei.c                                |   21 +-
>  fs/jfs/xattr.c                                |    2 +
>  fs/kernfs/dir.c                               |    7 +-
>  fs/kernfs/inode.c                             |   15 +-
>  fs/kernfs/kernfs-internal.h                   |    5 +-
>  fs/libfs.c                                    |   20 +-
>  fs/minix/bitmap.c                             |    2 +-
>  fs/minix/file.c                               |    7 +-
>  fs/minix/inode.c                              |    2 +-
>  fs/minix/namei.c                              |   25 +-
>  fs/mount.h                                    |   10 -
>  fs/namei.c                                    |  317 +-
>  fs/namespace.c                                |  473 ++-
>  fs/nfs/dir.c                                  |   23 +-
>  fs/nfs/inode.c                                |    5 +-
>  fs/nfs/internal.h                             |   10 +-
>  fs/nfs/namespace.c                            |    7 +-
>  fs/nfs/nfs3_fs.h                              |    3 +-
>  fs/nfs/nfs3acl.c                              |    3 +-
>  fs/nfs/nfs4proc.c                             |    3 +
>  fs/nfsd/nfs2acl.c                             |    4 +-
>  fs/nfsd/nfs3acl.c                             |    4 +-
>  fs/nfsd/nfs4acl.c                             |    4 +-
>  fs/nfsd/nfs4recover.c                         |    6 +-
>  fs/nfsd/nfsfh.c                               |    2 +-
>  fs/nfsd/nfsproc.c                             |    2 +-
>  fs/nfsd/vfs.c                                 |   47 +-
>  fs/nilfs2/inode.c                             |   13 +-
>  fs/nilfs2/ioctl.c                             |    2 +-
>  fs/nilfs2/namei.c                             |   20 +-
>  fs/nilfs2/nilfs.h                             |    4 +-
>  fs/notify/fanotify/fanotify_user.c            |    2 +-
>  fs/notify/inotify/inotify_user.c              |    3 +-
>  fs/ntfs/inode.c                               |    2 +-
>  fs/ocfs2/acl.c                                |    5 +-
>  fs/ocfs2/acl.h                                |    3 +-
>  fs/ocfs2/dlmfs/dlmfs.c                        |   17 +-
>  fs/ocfs2/file.c                               |   13 +-
>  fs/ocfs2/file.h                               |    5 +-
>  fs/ocfs2/ioctl.c                              |    2 +-
>  fs/ocfs2/namei.c                              |   21 +-
>  fs/ocfs2/refcounttree.c                       |    4 +-
>  fs/ocfs2/xattr.c                              |    3 +
>  fs/omfs/dir.c                                 |   13 +-
>  fs/omfs/file.c                                |    7 +-
>  fs/omfs/inode.c                               |    2 +-
>  fs/open.c                                     |   52 +-
>  fs/orangefs/acl.c                             |    5 +-
>  fs/orangefs/inode.c                           |   15 +-
>  fs/orangefs/namei.c                           |   12 +-
>  fs/orangefs/orangefs-kernel.h                 |    7 +-
>  fs/orangefs/xattr.c                           |    1 +
>  fs/overlayfs/copy_up.c                        |   20 +-
>  fs/overlayfs/dir.c                            |   31 +-
>  fs/overlayfs/file.c                           |    6 +-
>  fs/overlayfs/inode.c                          |   21 +-
>  fs/overlayfs/overlayfs.h                      |   40 +-
>  fs/overlayfs/super.c                          |   27 +-
>  fs/overlayfs/util.c                           |    4 +-
>  fs/posix_acl.c                                |   78 +-
>  fs/proc/base.c                                |   24 +-
>  fs/proc/fd.c                                  |    4 +-
>  fs/proc/fd.h                                  |    3 +-
>  fs/proc/generic.c                             |    9 +-
>  fs/proc/internal.h                            |    2 +-
>  fs/proc/proc_net.c                            |    2 +-
>  fs/proc/proc_sysctl.c                         |   12 +-
>  fs/proc/root.c                                |    2 +-
>  fs/proc_namespace.c                           |    1 +
>  fs/ramfs/file-nommu.c                         |    9 +-
>  fs/ramfs/inode.c                              |   18 +-
>  fs/reiserfs/acl.h                             |    3 +-
>  fs/reiserfs/inode.c                           |    7 +-
>  fs/reiserfs/ioctl.c                           |    4 +-
>  fs/reiserfs/namei.c                           |   21 +-
>  fs/reiserfs/reiserfs.h                        |    3 +-
>  fs/reiserfs/xattr.c                           |   12 +-
>  fs/reiserfs/xattr.h                           |    2 +-
>  fs/reiserfs/xattr_acl.c                       |    7 +-
>  fs/reiserfs/xattr_security.c                  |    3 +-
>  fs/reiserfs/xattr_trusted.c                   |    3 +-
>  fs/reiserfs/xattr_user.c                      |    3 +-
>  fs/remap_range.c                              |    7 +-
>  fs/stat.c                                     |   10 +-
>  fs/sysv/file.c                                |    7 +-
>  fs/sysv/ialloc.c                              |    2 +-
>  fs/sysv/itree.c                               |    2 +-
>  fs/sysv/namei.c                               |   21 +-
>  fs/tracefs/inode.c                            |    4 +-
>  fs/ubifs/dir.c                                |   29 +-
>  fs/ubifs/file.c                               |    5 +-
>  fs/ubifs/ioctl.c                              |    2 +-
>  fs/ubifs/ubifs.h                              |    3 +-
>  fs/ubifs/xattr.c                              |    1 +
>  fs/udf/file.c                                 |    9 +-
>  fs/udf/ialloc.c                               |    2 +-
>  fs/udf/namei.c                                |   24 +-
>  fs/udf/symlink.c                              |    2 +-
>  fs/ufs/ialloc.c                               |    2 +-
>  fs/ufs/inode.c                                |    7 +-
>  fs/ufs/namei.c                                |   19 +-
>  fs/ufs/ufs.h                                  |    3 +-
>  fs/utimes.c                                   |    4 +-
>  fs/vboxsf/dir.c                               |   12 +-
>  fs/vboxsf/utils.c                             |    5 +-
>  fs/vboxsf/vfsmod.h                            |    3 +-
>  fs/verity/enable.c                            |    2 +-
>  fs/xattr.c                                    |  136 +-
>  fs/xfs/xfs_acl.c                              |    5 +-
>  fs/xfs/xfs_acl.h                              |    3 +-
>  fs/xfs/xfs_ioctl.c                            |    4 +-
>  fs/xfs/xfs_iops.c                             |   55 +-
>  fs/xfs/xfs_xattr.c                            |    3 +-
>  fs/zonefs/super.c                             |    9 +-
>  include/linux/audit.h                         |   10 +-
>  include/linux/capability.h                    |   14 +-
>  include/linux/fs.h                            |  145 +-
>  include/linux/ima.h                           |   15 +-
>  include/linux/lsm_hook_defs.h                 |   15 +-
>  include/linux/lsm_hooks.h                     |    1 +
>  include/linux/mount.h                         |   14 +-
>  include/linux/nfs_fs.h                        |    4 +-
>  include/linux/posix_acl.h                     |   15 +-
>  include/linux/posix_acl_xattr.h               |   12 +-
>  include/linux/security.h                      |   44 +-
>  include/linux/syscalls.h                      |    3 +
>  include/linux/xattr.h                         |   30 +-
>  include/uapi/asm-generic/unistd.h             |    4 +-
>  include/uapi/linux/mount.h                    |   25 +
>  ipc/mqueue.c                                  |   16 +-
>  kernel/auditsc.c                              |   29 +-
>  kernel/bpf/inode.c                            |   13 +-
>  kernel/capability.c                           |   14 +-
>  kernel/cgroup/cgroup.c                        |    2 +-
>  kernel/sys.c                                  |    2 +-
>  mm/madvise.c                                  |    4 +-
>  mm/memcontrol.c                               |    2 +-
>  mm/mincore.c                                  |    4 +-
>  mm/shmem.c                                    |   45 +-
>  net/socket.c                                  |    6 +-
>  net/unix/af_unix.c                            |    4 +-
>  security/apparmor/apparmorfs.c                |    3 +-
>  security/apparmor/domain.c                    |   13 +-
>  security/apparmor/file.c                      |    5 +-
>  security/apparmor/lsm.c                       |   12 +-
>  security/commoncap.c                          |   46 +-
>  security/integrity/evm/evm_crypto.c           |   11 +-
>  security/integrity/evm/evm_main.c             |    4 +-
>  security/integrity/evm/evm_secfs.c            |    2 +-
>  security/integrity/ima/ima.h                  |   19 +-
>  security/integrity/ima/ima_api.c              |   10 +-
>  security/integrity/ima/ima_appraise.c         |   22 +-
>  security/integrity/ima/ima_asymmetric_keys.c  |    2 +-
>  security/integrity/ima/ima_main.c             |   28 +-
>  security/integrity/ima/ima_policy.c           |   17 +-
>  security/integrity/ima/ima_queue_keys.c       |    2 +-
>  security/security.c                           |   25 +-
>  security/selinux/hooks.c                      |   22 +-
>  security/smack/smack_lsm.c                    |   18 +-
>  tools/include/uapi/asm-generic/unistd.h       |    4 +-
>  tools/testing/selftests/Makefile              |    1 +
>  .../testing/selftests/idmap_mounts/.gitignore |    2 +
>  tools/testing/selftests/idmap_mounts/Makefile |   12 +
>  tools/testing/selftests/idmap_mounts/config   |    1 +
>  tools/testing/selftests/idmap_mounts/core.c   | 3476 +++++++++++++++++
>  .../testing/selftests/idmap_mounts/internal.h |  127 +
>  tools/testing/selftests/idmap_mounts/utils.c  |  136 +
>  tools/testing/selftests/idmap_mounts/utils.h  |   17 +
>  tools/testing/selftests/idmap_mounts/xattr.c  |  172 +
>  .../selftests/mount_setattr/.gitignore        |    1 +
>  .../testing/selftests/mount_setattr/Makefile  |    7 +
>  tools/testing/selftests/mount_setattr/config  |    1 +
>  .../mount_setattr/mount_setattr_test.c        |  889 +++++
>  335 files changed, 7594 insertions(+), 1570 deletions(-)
>  create mode 100644 tools/testing/selftests/idmap_mounts/.gitignore
>  create mode 100644 tools/testing/selftests/idmap_mounts/Makefile
>  create mode 100644 tools/testing/selftests/idmap_mounts/config
>  create mode 100644 tools/testing/selftests/idmap_mounts/core.c
>  create mode 100644 tools/testing/selftests/idmap_mounts/internal.h
>  create mode 100644 tools/testing/selftests/idmap_mounts/utils.c
>  create mode 100644 tools/testing/selftests/idmap_mounts/utils.h
>  create mode 100644 tools/testing/selftests/idmap_mounts/xattr.c
>  create mode 100644 tools/testing/selftests/mount_setattr/.gitignore
>  create mode 100644 tools/testing/selftests/mount_setattr/Makefile
>  create mode 100644 tools/testing/selftests/mount_setattr/config
>  create mode 100644 tools/testing/selftests/mount_setattr/mount_setattr_test.c
>
>
> base-commit: 3cea11cd5e3b00d91caf0b4730194039b45c5891
> --
> 2.29.2
>

Thanks!
Steve

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/39] fs: idmapped mounts
  2020-11-17 23:54 ` [PATCH v2 00/39] fs: idmapped mounts Jonathan Corbet
@ 2020-11-18  9:45   ` Christian Brauner
  0 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-18  9:45 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, containers,
	linux-security-module, linux-api, linux-ext4, linux-audit,
	linux-integrity, selinux

On Tue, Nov 17, 2020 at 04:54:33PM -0700, Jonathan Corbet wrote:
> On Sun, 15 Nov 2020 11:36:39 +0100
> Christian Brauner <christian.brauner@ubuntu.com> wrote:
> 
> One quick question...
> 
> > I have written a simple tool available at
> > https://github.com/brauner/mount-idmapped that allows to create idmapped
> > mounts so people can play with this patch series.
> 
> I spent a while looking at that tool.  When actually setting the namespace
> for the mapping, it uses MOUNT_ATTR_SHIFT rather than MOUNT_ATTR_IDMAP.
> The value is the same, so I expect it works...:)  But did that perhaps not
> get updated to reflect a name change?

Yep, that was my mistake. I'll fix it up in the repo for that tool now
and maybe improve it a little too! :)

Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 36/39] overlayfs: do not mount on top of idmapped mounts
  2020-11-15 12:31   ` Amir Goldstein
@ 2020-11-18 10:26     ` Christian Brauner
  0 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-18 10:26 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, Miklos Szeredi,
	overlayfs

On Sun, Nov 15, 2020 at 02:31:46PM +0200, Amir Goldstein wrote:
> On Sun, Nov 15, 2020 at 12:42 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > Prevent overlayfs from being mounted on top of idmapped mounts until we
> > have ported it to handle this case and added proper testing for it.
> >
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: linux-fsdevel@vger.kernel.org
> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> > ---
> > /* v2 */
> > patch introduced
> > ---
> >  fs/overlayfs/super.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 0d4f2baf6836..3cacc3d3fb65 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -1708,6 +1708,12 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
> >                 if (err)
> >                         goto out_err;
> >
> > +               if (mnt_idmapped(stack[i].mnt)) {
> > +                       err = -EINVAL;
> > +                       pr_err("idmapped lower layers are currently unsupported\n");
> > +                       goto out_err;
> > +               }
> > +
> >                 lower = strchr(lower, '\0') + 1;
> >         }
> >
> > @@ -1939,6 +1945,12 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >                 if (err)
> >                         goto out_err;
> >
> > +               if (mnt_idmapped(upperpath.mnt)) {
> > +                       err = -EINVAL;
> > +                       pr_err("idmapped lower layers are currently unsupported\n");
> > +                       goto out_err;
> > +               }
> > +
> 
> Both checks should be replaced with one check in ovl_mount_dir_noesc()
> right next to ovl_dentry_weird() check and FWIW the error above about
> "lower layers" when referring to upperpath.mnt is confusing.
> "idmapped layers..." should be fine.

Noted, thanks!

Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/39] fs: idmapped mounts
  2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
                   ` (40 preceding siblings ...)
  2020-11-18  3:51 ` Stephen Barber
@ 2020-11-20  2:33 ` Darrick J. Wong
  2020-11-20  9:10   ` Christian Brauner
  41 siblings, 1 reply; 63+ messages in thread
From: Darrick J. Wong @ 2020-11-20  2:33 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux

On Sun, Nov 15, 2020 at 11:36:39AM +0100, Christian Brauner wrote:
> Hey everyone,
> 
> This is v2. It is reworked according to the reviews coming from
> Christoph and others to adapt all relevant helpers and inode_operations
> methods to account for idmapped mounts instead of introducing new
> helpers and methods specific to idmapped mounts like we did before.
> We've also moved the overlayfs conversion to handle idmapped mounts into
> a separate patchset that will be sent out separately after the core
> changes. The converted filesytems in this series include fat and ext4.
> The config option to disable idmapped mounts has been moved from a a vfs
> config ption to a per-filesystem option. They default to off. Having a
> config option allows us to gain some confidence in the patchset over
> multiple kernel releases.

So uh I noticed that the new user_ns parameter passed to the xfs
functions don't actually get used for anything.  Did I miss something
when I pulled the branch, or does this simply reflect xfs not having any
idmap support?  And how would one add such a thing?  Replace the
&init_user_ns with the passed-in user_ns parameter?

> There are two noteable things about this version. First, that it comes
> with a really large test-suite to test current vfs behavior and
> idmapped mounts behavior. We intend this test-suite to grow over time
> and at some point cover most basic core vfs functionality that isn't
> covered in xfstests and have it be part of the selftests.

Please put enough of a test in fstests to do basic validation that we
filesystem developers didn't accidentally screw things up.

--D

> Second, while while working on adapting this patchset to the requested
> changes, the runC and containerd crowd was nice enough to adapt
> containerd to this patchset to make use of idmapped mounts in one of the
> most widely used container runtimes:
> https://github.com/containerd/containerd/pull/4734
> 
> With this patchset we make it possible to attach idmappings to bind
> mounts. This handles several common use-cases. Here are just a few:
> - Shifting of a container rootfs or base image without having to mangle
>   every file (runc, Docker, containerd, k8s, LXD, systemd ...)
> - Sharing of data between host or privileged containers with
>   underprivileged containers (runc, Docker, containerd, k8s, LXD, ...)
> - Shifting of subset of ownership-less filesystems (vfat) for use by
>   multiple users, effectively allowing for DAC on such devices (systemd,
>   Android, ...)
> - Data sharing between multiple user namespaces with incompatible maps
>   (LXD, k8s, ...)
> Making it possible to share directories and mounts between users with
> different uids and gids is itself quite an important use-case in
> distributed systems environments. It's of course especially useful in
> general for portable usb sticks, sharing data between multiple users in
> general, and sharing home directories between multiple users. The last
> example is now elegantly expressed in systemd's homed concept for
> portable home directories. As mentioned above, idmapped mounts also
> allow data from the host to be shared with unprivileged containers,
> between privileged and unprivileged containers simultaneously and in
> addition also between unprivileged containers with different idmappings
> whenever they are used to isolate one container completely from another
> container.
> As can be seen from answers to earlier threads of this patchset and from
> the list of potential users interest in this patchset is fairly
> widespread.
> 
> We have implemented and proposed multiple solutions to this before. This
> included the introduction of fsid mappings, a tiny filesystem that is
> currently carried in Ubuntu that has shown it's limitations, and an
> approach to call override creds in the vfs. None of these solutions have
> covered all of the above use-cases. Some of them have been fairly hacky
> too by e.g. violating how things should be passed down to the individual
> filesystems.
> The solution proposed here has it's origins in multiple discussions
> during Linux Plumbers 2017 during and after the end of the containers
> microconference.
> To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
> James, and myself. The original idea or a variant thereof has been
> discussed, again to the best of my knowledge, after a Linux conference
> in St. Petersburg in Russia in 2017 between Christoph, Tycho, and
> myself.
> We've taken the time to implement a working version of this solution
> over the last weeks to the best of my abilities. Tycho has signed up
> for this sligthly crazy endeavour as well and he has helped with the
> conversion of the xattr codepaths and will be involved with others in
> converting additional filesystems.
> 
> This series makes idmappings a property of struct vfsmount instead of
> tying it to a process being inside of a user namespace which has been
> the case for all other proposed approaches. It also allows to pass down
> the user namespace into the filesystms which is a clean way instead of
> violating calling conventions by strapping the user namespace
> information that is a property of the mount to the caller's credentials
> or similar hacks.
> With this idmappings become a property of bind-mounts, i.e. each
> bind-mount can have a separate idmapping. Such idmapped mounts can even
> be created inside of the initial user namespace.
> 
> The vfsmount struct gains a new struct user_namespace member. The
> idmapping of the user namespace becomes the idmapping of the mount. A
> caller that is privileged with respect to the user namespace of
> the superblock of the underlying filesystem can create an idmapped
> bind-mount. In the future, we can enable unprivileged use-cases by
> checking whether the caller is privileged wrt to the user namespace an
> already idmapped mount has been marked with, allowing them to change the
> idmapping. For now, keep things simple until we feel sure enough. Note,
> that with syscall interception it is already possible to intercept
> idmapped mount requests from unprivileged containers and handle them in
> a sufficiently privileged container manager. Support for this is
> already available in LXD and will be available in runC were syscall
> interception is currently becoming part of the runtime spec (see
> https://github.com/opencontainers/runtime-spec/pull/1074).
> 
> The user namespace the mount will be marked with can be specified by
> passing a file descriptor refering to the user namespace as an argument
> to the new mount_setattr() syscall together with the new
> MOUNT_ATTR_IDMAP flag. By default vfsmounts are marked with the initial
> user namespace and no behavioral or performance changes should be
> observed. All mapping operations are nops for the initial user
> namespace. When a file/inode is accessed through an idmapped mount the
> i_uid and i_gid of the inode will be remapped according to the user
> namespace the mount has been marked with.
> 
> In order to support idmapped mounts, filesystems need to be changed and
> mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The initial
> version contains fat and ext4 including a list of examples. But patches
> for other filesystems are actively worked on but will be sent out
> separately. We are here to see this through and there are multiple
> people involved in converting filesystems. So filesystem developers are
> not left alone with this.
> 
> I have written a simple tool available at
> https://github.com/brauner/mount-idmapped that allows to create idmapped
> mounts so people can play with this patch series. Here are a few
> illustrations:
> 
> 1. Create a simple idmapped mount of another user's home directory
> 
> u1001@f2-vm:/$ sudo ./mount-idmapped --map-mount b:1000:1001:1 /home/ubuntu/ /mnt
> u1001@f2-vm:/$ ls -al /home/ubuntu/
> total 28
> drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
> drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
> -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
> -rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
> -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
> -rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
> -rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
> -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
> u1001@f2-vm:/$ ls -al /mnt/
> total 28
> drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
> drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
> -rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
> -rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
> -rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
> -rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
> -rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
> -rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo
> u1001@f2-vm:/$ touch /mnt/my-file
> u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
> u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
> u1001@f2-vm:/$ ls -al /mnt/my-file
> -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
> u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
> -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
> u1001@f2-vm:/$ getfacl /mnt/my-file
> getfacl: Removing leading '/' from absolute path names
> # file: mnt/my-file
> # owner: u1001
> # group: u1001
> user::rw-
> user:u1001:rwx
> group::rw-
> mask::rwx
> other::r--
> u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
> getfacl: Removing leading '/' from absolute path names
> # file: home/ubuntu/my-file
> # owner: ubuntu
> # group: ubuntu
> user::rw-
> user:ubuntu:rwx
> group::rw-
> mask::rwx
> other::r--
> 
> 2. Create mapping of the whole ext4 rootfs without a mapping for uid and gid 0
> 
> ubuntu@f2-vm:~$ sudo /mount-idmapped --map-mount b:1:1:65536 / /mnt/
> ubuntu@f2-vm:~$ findmnt | grep mnt
> └─/mnt                                /dev/sda2  ext4       rw,relatime
>   └─/mnt/mnt                          /dev/sda2  ext4       rw,relatime
> ubuntu@f2-vm:~$ sudo mkdir /AS-ROOT-CAN-CREATE
> ubuntu@f2-vm:~$ sudo mkdir /mnt/AS-ROOT-CANT-CREATE
> mkdir: cannot create directory ‘/mnt/AS-ROOT-CANT-CREATE’: Value too large for defined data type
> ubuntu@f2-vm:~$ mkdir /mnt/home/ubuntu/AS-USER-1000-CAN-CREATE
> 
> 3. Create a vfat usb mount and expose to user 1001 and 5000
> 
> ubuntu@f2-vm:/$ sudo mount /dev/sdb /mnt
> ubuntu@f2-vm:/$ findmnt  | grep mnt
> └─/mnt                                /dev/sdb vfat       rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
> ubuntu@f2-vm:/$ ls -al /mnt
> total 12
> drwxr-xr-x  2 root root 4096 Jan  1  1970 .
> drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
> -rwxr-xr-x  1 root root    4 Oct 28 03:44 aaa
> -rwxr-xr-x  1 root root    0 Oct 28 01:09 bbb
> ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:1001:1 /mnt /mnt-1001/
> ubuntu@f2-vm:/$ ls -al /mnt-1001/
> total 12
> drwxr-xr-x  2 u1001 u1001 4096 Jan  1  1970 .
> drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
> -rwxr-xr-x  1 u1001 u1001    4 Oct 28 03:44 aaa
> -rwxr-xr-x  1 u1001 u1001    0 Oct 28 01:09 bbb
> ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:5000:1 /mnt /mnt-5000/
> ubuntu@f2-vm:/$ ls -al /mnt-5000/
> total 12
> drwxr-xr-x  2 5000 5000 4096 Jan  1  1970 .
> drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
> -rwxr-xr-x  1 5000 5000    4 Oct 28 03:44 aaa
> -rwxr-xr-x  1 5000 5000    0 Oct 28 01:09 bbb
> 
> 4. Create an idmapped rootfs mount for a container
> 
> root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/
> total 68
> drwxr-xr-x 17 20000 20000 4096 Sep 24 07:48 .
> drwxrwx---  3 20000 20000 4096 Oct 16 19:26 ..
> lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 bin -> usr/bin
> drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 boot
> drwxr-xr-x  3 20000 20000 4096 Oct 16 19:26 dev
> drwxr-xr-x 61 20000 20000 4096 Oct 16 19:26 etc
> drwxr-xr-x  3 20000 20000 4096 Sep 24 07:45 home
> lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 lib -> usr/lib
> lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib32 -> usr/lib32
> lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib64 -> usr/lib64
> lrwxrwxrwx  1 20000 20000   10 Sep 24 07:43 libx32 -> usr/libx32
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 media
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 mnt
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 opt
> drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 proc
> drwx------  2 20000 20000 4096 Sep 24 07:43 root
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:45 run
> lrwxrwxrwx  1 20000 20000    8 Sep 24 07:43 sbin -> usr/sbin
> drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 srv
> drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 sys
> drwxrwxrwt  2 20000 20000 4096 Sep 24 07:44 tmp
> drwxr-xr-x 13 20000 20000 4096 Sep 24 07:43 usr
> drwxr-xr-x 12 20000 20000 4096 Sep 24 07:44 var
> root@f2-vm:~# /mount-idmapped --map-mount b:20000:10000:100000 /var/lib/lxc/f2/rootfs/ /mnt
> root@f2-vm:~# ls -al /mnt
> total 68
> drwxr-xr-x 17 10000 10000 4096 Sep 24 07:48 .
> drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
> lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 bin -> usr/bin
> drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 boot
> drwxr-xr-x  3 10000 10000 4096 Oct 16 19:26 dev
> drwxr-xr-x 61 10000 10000 4096 Oct 16 19:26 etc
> drwxr-xr-x  3 10000 10000 4096 Sep 24 07:45 home
> lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 lib -> usr/lib
> lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib32 -> usr/lib32
> lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib64 -> usr/lib64
> lrwxrwxrwx  1 10000 10000   10 Sep 24 07:43 libx32 -> usr/libx32
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 media
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 mnt
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 opt
> drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 proc
> drwx------  2 10000 10000 4096 Sep 24 07:43 root
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:45 run
> lrwxrwxrwx  1 10000 10000    8 Sep 24 07:43 sbin -> usr/sbin
> drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 srv
> drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 sys
> drwxrwxrwt  2 10000 10000 4096 Sep 24 07:44 tmp
> drwxr-xr-x 13 10000 10000 4096 Sep 24 07:43 usr
> drwxr-xr-x 12 10000 10000 4096 Sep 24 07:44 var
> root@f2-vm:~# lxc-start f2 # uses /mnt as rootfs
> root@f2-vm:~# lxc-attach f2 -- cat /proc/1/uid_map
>          0      10000      10000
> root@f2-vm:~# lxc-attach f2 -- cat /proc/1/gid_map
>          0      10000      10000
> root@f2-vm:~# lxc-attach f2 -- ls -al /
> total 52
> drwxr-xr-x  17 root   root    4096 Sep 24 07:48 .
> drwxr-xr-x  17 root   root    4096 Sep 24 07:48 ..
> lrwxrwxrwx   1 root   root       7 Sep 24 07:43 bin -> usr/bin
> drwxr-xr-x   2 root   root    4096 Apr 15  2020 boot
> drwxr-xr-x   5 root   root     500 Oct 28 23:39 dev
> drwxr-xr-x  61 root   root    4096 Oct 28 23:39 etc
> drwxr-xr-x   3 root   root    4096 Sep 24 07:45 home
> lrwxrwxrwx   1 root   root       7 Sep 24 07:43 lib -> usr/lib
> lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib32 -> usr/lib32
> lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib64 -> usr/lib64
> lrwxrwxrwx   1 root   root      10 Sep 24 07:43 libx32 -> usr/libx32
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 media
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 mnt
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 opt
> dr-xr-xr-x 232 nobody nogroup    0 Oct 28 23:39 proc
> drwx------   2 root   root    4096 Oct 28 23:41 root
> drwxr-xr-x  12 root   root     360 Oct 28 23:39 run
> lrwxrwxrwx   1 root   root       8 Sep 24 07:43 sbin -> usr/sbin
> drwxr-xr-x   2 root   root    4096 Sep 24 07:43 srv
> dr-xr-xr-x  13 nobody nogroup    0 Oct 28 23:39 sys
> drwxrwxrwt  11 root   root    4096 Oct 28 23:40 tmp
> drwxr-xr-x  13 root   root    4096 Sep 24 07:43 usr
> drwxr-xr-x  12 root   root    4096 Sep 24 07:44 var
> root@f2-vm:~# lxc-attach f2 -- ls -al /my-file
> -rw-r--r-- 1 root root 0 Oct 28 23:43 /my-file
> root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/my-file
> -rw-r--r-- 1 20000 20000 0 Oct 28 23:43 /var/lib/lxc/f2/rootfs/my-file
> 
> I'd like to say thanks to:
> Al for pointing me into the direction to avoid inode alias issues during
> lookup. David for various discussions around this. Christoph for proving
> a first proper review and for being involved in the original idea. Tycho
> for helping with this series and on future patches to convert
> filesystems. Alban Crequy and the Kinvolk located just a few streets
> away from me in Berlin for providing use-case discussions and writing
> patches for containerd! Stéphane for his invaluable input on many things
> and level head and enabling me to work on this. Amir for explaining and
> discussing aspects of overlayfs with me. I'd like to especially thank
> Seth Forshee because he provided a lot of good analysis, suggestions,
> and participated in short-notice discussions in both chat and video for
> some nitty-gritty technical details.
> 
> This series can be found and pulled from the three usual locations:
> https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=idmapped_mounts
> https://github.com/brauner/linux/tree/idmapped_mounts
> https://gitlab.com/brauner/linux/-/commits/idmapped_mounts
> 
> Thanks!
> Christian
> 
> Christian Brauner (37):
>   namespace: take lock_mount_hash() directly when changing flags
>   mount: make {lock,unlock}_mount_hash() static
>   namespace: only take read lock in do_reconfigure_mnt()
>   fs: add mount_setattr()
>   tests: add mount_setattr() selftests
>   fs: add id translation helpers
>   mount: attach mappings to mounts
>   capability: handle idmapped mounts
>   namei: add idmapped mount aware permission helpers
>   inode: add idmapped mount aware init and permission helpers
>   attr: handle idmapped mounts
>   acl: handle idmapped mounts
>   commoncap: handle idmapped mounts
>   stat: handle idmapped mounts
>   namei: handle idmapped mounts in may_*() helpers
>   namei: introduce struct renamedata
>   namei: prepare for idmapped mounts
>   open: handle idmapped mounts in do_truncate()
>   open: handle idmapped mounts
>   af_unix: handle idmapped mounts
>   utimes: handle idmapped mounts
>   fcntl: handle idmapped mounts
>   notify: handle idmapped mounts
>   init: handle idmapped mounts
>   ioctl: handle idmapped mounts
>   would_dump: handle idmapped mounts
>   exec: handle idmapped mounts
>   fs: add helpers for idmap mounts
>   apparmor: handle idmapped mounts
>   audit: handle idmapped mounts
>   ima: handle idmapped mounts
>   fat: handle idmapped mounts
>   ext4: support idmapped mounts
>   ecryptfs: do not mount on top of idmapped mounts
>   overlayfs: do not mount on top of idmapped mounts
>   fs: introduce MOUNT_ATTR_IDMAP
>   tests: add vfs/idmapped mounts test suite
> 
> Tycho Andersen (2):
>   xattr: handle idmapped mounts
>   selftests: add idmapped mounts xattr selftest
> 
>  Documentation/filesystems/locking.rst         |    6 +-
>  Documentation/filesystems/porting.rst         |    2 +
>  Documentation/filesystems/vfs.rst             |   17 +-
>  arch/alpha/kernel/syscalls/syscall.tbl        |    1 +
>  arch/arm/tools/syscall.tbl                    |    1 +
>  arch/arm64/include/asm/unistd32.h             |    2 +
>  arch/ia64/kernel/syscalls/syscall.tbl         |    1 +
>  arch/m68k/kernel/syscalls/syscall.tbl         |    1 +
>  arch/microblaze/kernel/syscalls/syscall.tbl   |    1 +
>  arch/mips/kernel/syscalls/syscall_n32.tbl     |    1 +
>  arch/mips/kernel/syscalls/syscall_n64.tbl     |    1 +
>  arch/mips/kernel/syscalls/syscall_o32.tbl     |    1 +
>  arch/parisc/kernel/syscalls/syscall.tbl       |    1 +
>  arch/powerpc/kernel/syscalls/syscall.tbl      |    1 +
>  arch/powerpc/platforms/cell/spufs/inode.c     |    5 +-
>  arch/s390/kernel/syscalls/syscall.tbl         |    1 +
>  arch/sh/kernel/syscalls/syscall.tbl           |    1 +
>  arch/sparc/kernel/syscalls/syscall.tbl        |    1 +
>  arch/x86/entry/syscalls/syscall_32.tbl        |    1 +
>  arch/x86/entry/syscalls/syscall_64.tbl        |    1 +
>  arch/xtensa/kernel/syscalls/syscall.tbl       |    1 +
>  drivers/android/binderfs.c                    |    3 +-
>  drivers/base/devtmpfs.c                       |   12 +-
>  fs/9p/acl.c                                   |    7 +-
>  fs/9p/v9fs.h                                  |    3 +-
>  fs/9p/v9fs_vfs.h                              |    2 +-
>  fs/9p/vfs_inode.c                             |   32 +-
>  fs/9p/vfs_inode_dotl.c                        |   34 +-
>  fs/9p/xattr.c                                 |    1 +
>  fs/adfs/adfs.h                                |    3 +-
>  fs/adfs/inode.c                               |    5 +-
>  fs/affs/affs.h                                |   10 +-
>  fs/affs/inode.c                               |    7 +-
>  fs/affs/namei.c                               |   15 +-
>  fs/afs/dir.c                                  |   34 +-
>  fs/afs/inode.c                                |    5 +-
>  fs/afs/internal.h                             |    4 +-
>  fs/afs/security.c                             |    2 +-
>  fs/afs/xattr.c                                |    2 +
>  fs/attr.c                                     |   78 +-
>  fs/autofs/root.c                              |   13 +-
>  fs/bad_inode.c                                |   31 +-
>  fs/bfs/dir.c                                  |   12 +-
>  fs/btrfs/acl.c                                |    5 +-
>  fs/btrfs/ctree.h                              |    3 +-
>  fs/btrfs/inode.c                              |   41 +-
>  fs/btrfs/ioctl.c                              |   25 +-
>  fs/btrfs/tests/btrfs-tests.c                  |    2 +-
>  fs/btrfs/xattr.c                              |    2 +
>  fs/cachefiles/interface.c                     |    4 +-
>  fs/cachefiles/namei.c                         |   19 +-
>  fs/cachefiles/xattr.c                         |   16 +-
>  fs/ceph/acl.c                                 |    5 +-
>  fs/ceph/dir.c                                 |   23 +-
>  fs/ceph/inode.c                               |   13 +-
>  fs/ceph/super.h                               |    8 +-
>  fs/ceph/xattr.c                               |    1 +
>  fs/cifs/cifsfs.c                              |    4 +-
>  fs/cifs/cifsfs.h                              |   18 +-
>  fs/cifs/dir.c                                 |    8 +-
>  fs/cifs/inode.c                               |   22 +-
>  fs/cifs/link.c                                |    3 +-
>  fs/cifs/xattr.c                               |    1 +
>  fs/coda/coda_linux.h                          |    4 +-
>  fs/coda/dir.c                                 |   18 +-
>  fs/coda/inode.c                               |    5 +-
>  fs/coda/pioctl.c                              |    6 +-
>  fs/configfs/configfs_internal.h               |    7 +-
>  fs/configfs/dir.c                             |    3 +-
>  fs/configfs/inode.c                           |    5 +-
>  fs/configfs/symlink.c                         |    5 +-
>  fs/coredump.c                                 |   12 +-
>  fs/crypto/policy.c                            |    2 +-
>  fs/debugfs/inode.c                            |    9 +-
>  fs/ecryptfs/crypto.c                          |    4 +-
>  fs/ecryptfs/inode.c                           |   74 +-
>  fs/ecryptfs/main.c                            |    6 +
>  fs/ecryptfs/mmap.c                            |    4 +-
>  fs/efivarfs/file.c                            |    2 +-
>  fs/efivarfs/inode.c                           |    4 +-
>  fs/erofs/inode.c                              |    2 +-
>  fs/exec.c                                     |   12 +-
>  fs/exfat/exfat_fs.h                           |    3 +-
>  fs/exfat/file.c                               |    9 +-
>  fs/exfat/namei.c                              |   13 +-
>  fs/ext2/acl.c                                 |    5 +-
>  fs/ext2/acl.h                                 |    3 +-
>  fs/ext2/ext2.h                                |    2 +-
>  fs/ext2/ialloc.c                              |    2 +-
>  fs/ext2/inode.c                               |   11 +-
>  fs/ext2/ioctl.c                               |    6 +-
>  fs/ext2/namei.c                               |   22 +-
>  fs/ext2/xattr_security.c                      |    1 +
>  fs/ext2/xattr_trusted.c                       |    1 +
>  fs/ext2/xattr_user.c                          |    1 +
>  fs/ext4/Kconfig                               |    9 +
>  fs/ext4/acl.c                                 |    5 +-
>  fs/ext4/acl.h                                 |    3 +-
>  fs/ext4/ext4.h                                |   15 +-
>  fs/ext4/ialloc.c                              |    7 +-
>  fs/ext4/inode.c                               |   14 +-
>  fs/ext4/ioctl.c                               |   18 +-
>  fs/ext4/namei.c                               |   52 +-
>  fs/ext4/super.c                               |    6 +-
>  fs/ext4/xattr_hurd.c                          |    1 +
>  fs/ext4/xattr_security.c                      |    1 +
>  fs/ext4/xattr_trusted.c                       |    1 +
>  fs/ext4/xattr_user.c                          |    1 +
>  fs/f2fs/acl.c                                 |    5 +-
>  fs/f2fs/acl.h                                 |    3 +-
>  fs/f2fs/f2fs.h                                |    3 +-
>  fs/f2fs/file.c                                |   23 +-
>  fs/f2fs/namei.c                               |   26 +-
>  fs/f2fs/xattr.c                               |    4 +-
>  fs/fat/fat.h                                  |    3 +-
>  fs/fat/file.c                                 |   20 +-
>  fs/fat/namei_msdos.c                          |   15 +-
>  fs/fat/namei_vfat.c                           |   15 +-
>  fs/fcntl.c                                    |    3 +-
>  fs/fuse/acl.c                                 |    3 +-
>  fs/fuse/dir.c                                 |   41 +-
>  fs/fuse/fuse_i.h                              |    4 +-
>  fs/fuse/xattr.c                               |    2 +
>  fs/gfs2/acl.c                                 |    5 +-
>  fs/gfs2/acl.h                                 |    3 +-
>  fs/gfs2/file.c                                |    4 +-
>  fs/gfs2/inode.c                               |   54 +-
>  fs/gfs2/inode.h                               |    2 +-
>  fs/gfs2/xattr.c                               |    1 +
>  fs/hfs/attr.c                                 |    1 +
>  fs/hfs/dir.c                                  |   13 +-
>  fs/hfs/hfs_fs.h                               |    2 +-
>  fs/hfs/inode.c                                |    7 +-
>  fs/hfsplus/dir.c                              |   25 +-
>  fs/hfsplus/inode.c                            |   11 +-
>  fs/hfsplus/ioctl.c                            |    2 +-
>  fs/hfsplus/xattr.c                            |    1 +
>  fs/hfsplus/xattr_security.c                   |    1 +
>  fs/hfsplus/xattr_trusted.c                    |    1 +
>  fs/hfsplus/xattr_user.c                       |    1 +
>  fs/hostfs/hostfs_kern.c                       |   31 +-
>  fs/hpfs/hpfs_fn.h                             |    2 +-
>  fs/hpfs/inode.c                               |    7 +-
>  fs/hpfs/namei.c                               |   20 +-
>  fs/hugetlbfs/inode.c                          |   31 +-
>  fs/init.c                                     |   21 +-
>  fs/inode.c                                    |   38 +-
>  fs/internal.h                                 |    9 +
>  fs/jffs2/acl.c                                |    5 +-
>  fs/jffs2/acl.h                                |    3 +-
>  fs/jffs2/dir.c                                |   32 +-
>  fs/jffs2/fs.c                                 |    7 +-
>  fs/jffs2/os-linux.h                           |    2 +-
>  fs/jffs2/security.c                           |    1 +
>  fs/jffs2/xattr_trusted.c                      |    1 +
>  fs/jffs2/xattr_user.c                         |    1 +
>  fs/jfs/acl.c                                  |    5 +-
>  fs/jfs/file.c                                 |    9 +-
>  fs/jfs/ioctl.c                                |    2 +-
>  fs/jfs/jfs_acl.h                              |    3 +-
>  fs/jfs/jfs_inode.c                            |    2 +-
>  fs/jfs/jfs_inode.h                            |    2 +-
>  fs/jfs/namei.c                                |   21 +-
>  fs/jfs/xattr.c                                |    2 +
>  fs/kernfs/dir.c                               |    7 +-
>  fs/kernfs/inode.c                             |   15 +-
>  fs/kernfs/kernfs-internal.h                   |    5 +-
>  fs/libfs.c                                    |   20 +-
>  fs/minix/bitmap.c                             |    2 +-
>  fs/minix/file.c                               |    7 +-
>  fs/minix/inode.c                              |    2 +-
>  fs/minix/namei.c                              |   25 +-
>  fs/mount.h                                    |   10 -
>  fs/namei.c                                    |  317 +-
>  fs/namespace.c                                |  473 ++-
>  fs/nfs/dir.c                                  |   23 +-
>  fs/nfs/inode.c                                |    5 +-
>  fs/nfs/internal.h                             |   10 +-
>  fs/nfs/namespace.c                            |    7 +-
>  fs/nfs/nfs3_fs.h                              |    3 +-
>  fs/nfs/nfs3acl.c                              |    3 +-
>  fs/nfs/nfs4proc.c                             |    3 +
>  fs/nfsd/nfs2acl.c                             |    4 +-
>  fs/nfsd/nfs3acl.c                             |    4 +-
>  fs/nfsd/nfs4acl.c                             |    4 +-
>  fs/nfsd/nfs4recover.c                         |    6 +-
>  fs/nfsd/nfsfh.c                               |    2 +-
>  fs/nfsd/nfsproc.c                             |    2 +-
>  fs/nfsd/vfs.c                                 |   47 +-
>  fs/nilfs2/inode.c                             |   13 +-
>  fs/nilfs2/ioctl.c                             |    2 +-
>  fs/nilfs2/namei.c                             |   20 +-
>  fs/nilfs2/nilfs.h                             |    4 +-
>  fs/notify/fanotify/fanotify_user.c            |    2 +-
>  fs/notify/inotify/inotify_user.c              |    3 +-
>  fs/ntfs/inode.c                               |    2 +-
>  fs/ocfs2/acl.c                                |    5 +-
>  fs/ocfs2/acl.h                                |    3 +-
>  fs/ocfs2/dlmfs/dlmfs.c                        |   17 +-
>  fs/ocfs2/file.c                               |   13 +-
>  fs/ocfs2/file.h                               |    5 +-
>  fs/ocfs2/ioctl.c                              |    2 +-
>  fs/ocfs2/namei.c                              |   21 +-
>  fs/ocfs2/refcounttree.c                       |    4 +-
>  fs/ocfs2/xattr.c                              |    3 +
>  fs/omfs/dir.c                                 |   13 +-
>  fs/omfs/file.c                                |    7 +-
>  fs/omfs/inode.c                               |    2 +-
>  fs/open.c                                     |   52 +-
>  fs/orangefs/acl.c                             |    5 +-
>  fs/orangefs/inode.c                           |   15 +-
>  fs/orangefs/namei.c                           |   12 +-
>  fs/orangefs/orangefs-kernel.h                 |    7 +-
>  fs/orangefs/xattr.c                           |    1 +
>  fs/overlayfs/copy_up.c                        |   20 +-
>  fs/overlayfs/dir.c                            |   31 +-
>  fs/overlayfs/file.c                           |    6 +-
>  fs/overlayfs/inode.c                          |   21 +-
>  fs/overlayfs/overlayfs.h                      |   40 +-
>  fs/overlayfs/super.c                          |   27 +-
>  fs/overlayfs/util.c                           |    4 +-
>  fs/posix_acl.c                                |   78 +-
>  fs/proc/base.c                                |   24 +-
>  fs/proc/fd.c                                  |    4 +-
>  fs/proc/fd.h                                  |    3 +-
>  fs/proc/generic.c                             |    9 +-
>  fs/proc/internal.h                            |    2 +-
>  fs/proc/proc_net.c                            |    2 +-
>  fs/proc/proc_sysctl.c                         |   12 +-
>  fs/proc/root.c                                |    2 +-
>  fs/proc_namespace.c                           |    1 +
>  fs/ramfs/file-nommu.c                         |    9 +-
>  fs/ramfs/inode.c                              |   18 +-
>  fs/reiserfs/acl.h                             |    3 +-
>  fs/reiserfs/inode.c                           |    7 +-
>  fs/reiserfs/ioctl.c                           |    4 +-
>  fs/reiserfs/namei.c                           |   21 +-
>  fs/reiserfs/reiserfs.h                        |    3 +-
>  fs/reiserfs/xattr.c                           |   12 +-
>  fs/reiserfs/xattr.h                           |    2 +-
>  fs/reiserfs/xattr_acl.c                       |    7 +-
>  fs/reiserfs/xattr_security.c                  |    3 +-
>  fs/reiserfs/xattr_trusted.c                   |    3 +-
>  fs/reiserfs/xattr_user.c                      |    3 +-
>  fs/remap_range.c                              |    7 +-
>  fs/stat.c                                     |   10 +-
>  fs/sysv/file.c                                |    7 +-
>  fs/sysv/ialloc.c                              |    2 +-
>  fs/sysv/itree.c                               |    2 +-
>  fs/sysv/namei.c                               |   21 +-
>  fs/tracefs/inode.c                            |    4 +-
>  fs/ubifs/dir.c                                |   29 +-
>  fs/ubifs/file.c                               |    5 +-
>  fs/ubifs/ioctl.c                              |    2 +-
>  fs/ubifs/ubifs.h                              |    3 +-
>  fs/ubifs/xattr.c                              |    1 +
>  fs/udf/file.c                                 |    9 +-
>  fs/udf/ialloc.c                               |    2 +-
>  fs/udf/namei.c                                |   24 +-
>  fs/udf/symlink.c                              |    2 +-
>  fs/ufs/ialloc.c                               |    2 +-
>  fs/ufs/inode.c                                |    7 +-
>  fs/ufs/namei.c                                |   19 +-
>  fs/ufs/ufs.h                                  |    3 +-
>  fs/utimes.c                                   |    4 +-
>  fs/vboxsf/dir.c                               |   12 +-
>  fs/vboxsf/utils.c                             |    5 +-
>  fs/vboxsf/vfsmod.h                            |    3 +-
>  fs/verity/enable.c                            |    2 +-
>  fs/xattr.c                                    |  136 +-
>  fs/xfs/xfs_acl.c                              |    5 +-
>  fs/xfs/xfs_acl.h                              |    3 +-
>  fs/xfs/xfs_ioctl.c                            |    4 +-
>  fs/xfs/xfs_iops.c                             |   55 +-
>  fs/xfs/xfs_xattr.c                            |    3 +-
>  fs/zonefs/super.c                             |    9 +-
>  include/linux/audit.h                         |   10 +-
>  include/linux/capability.h                    |   14 +-
>  include/linux/fs.h                            |  145 +-
>  include/linux/ima.h                           |   15 +-
>  include/linux/lsm_hook_defs.h                 |   15 +-
>  include/linux/lsm_hooks.h                     |    1 +
>  include/linux/mount.h                         |   14 +-
>  include/linux/nfs_fs.h                        |    4 +-
>  include/linux/posix_acl.h                     |   15 +-
>  include/linux/posix_acl_xattr.h               |   12 +-
>  include/linux/security.h                      |   44 +-
>  include/linux/syscalls.h                      |    3 +
>  include/linux/xattr.h                         |   30 +-
>  include/uapi/asm-generic/unistd.h             |    4 +-
>  include/uapi/linux/mount.h                    |   25 +
>  ipc/mqueue.c                                  |   16 +-
>  kernel/auditsc.c                              |   29 +-
>  kernel/bpf/inode.c                            |   13 +-
>  kernel/capability.c                           |   14 +-
>  kernel/cgroup/cgroup.c                        |    2 +-
>  kernel/sys.c                                  |    2 +-
>  mm/madvise.c                                  |    4 +-
>  mm/memcontrol.c                               |    2 +-
>  mm/mincore.c                                  |    4 +-
>  mm/shmem.c                                    |   45 +-
>  net/socket.c                                  |    6 +-
>  net/unix/af_unix.c                            |    4 +-
>  security/apparmor/apparmorfs.c                |    3 +-
>  security/apparmor/domain.c                    |   13 +-
>  security/apparmor/file.c                      |    5 +-
>  security/apparmor/lsm.c                       |   12 +-
>  security/commoncap.c                          |   46 +-
>  security/integrity/evm/evm_crypto.c           |   11 +-
>  security/integrity/evm/evm_main.c             |    4 +-
>  security/integrity/evm/evm_secfs.c            |    2 +-
>  security/integrity/ima/ima.h                  |   19 +-
>  security/integrity/ima/ima_api.c              |   10 +-
>  security/integrity/ima/ima_appraise.c         |   22 +-
>  security/integrity/ima/ima_asymmetric_keys.c  |    2 +-
>  security/integrity/ima/ima_main.c             |   28 +-
>  security/integrity/ima/ima_policy.c           |   17 +-
>  security/integrity/ima/ima_queue_keys.c       |    2 +-
>  security/security.c                           |   25 +-
>  security/selinux/hooks.c                      |   22 +-
>  security/smack/smack_lsm.c                    |   18 +-
>  tools/include/uapi/asm-generic/unistd.h       |    4 +-
>  tools/testing/selftests/Makefile              |    1 +
>  .../testing/selftests/idmap_mounts/.gitignore |    2 +
>  tools/testing/selftests/idmap_mounts/Makefile |   12 +
>  tools/testing/selftests/idmap_mounts/config   |    1 +
>  tools/testing/selftests/idmap_mounts/core.c   | 3476 +++++++++++++++++
>  .../testing/selftests/idmap_mounts/internal.h |  127 +
>  tools/testing/selftests/idmap_mounts/utils.c  |  136 +
>  tools/testing/selftests/idmap_mounts/utils.h  |   17 +
>  tools/testing/selftests/idmap_mounts/xattr.c  |  172 +
>  .../selftests/mount_setattr/.gitignore        |    1 +
>  .../testing/selftests/mount_setattr/Makefile  |    7 +
>  tools/testing/selftests/mount_setattr/config  |    1 +
>  .../mount_setattr/mount_setattr_test.c        |  889 +++++
>  335 files changed, 7594 insertions(+), 1570 deletions(-)
>  create mode 100644 tools/testing/selftests/idmap_mounts/.gitignore
>  create mode 100644 tools/testing/selftests/idmap_mounts/Makefile
>  create mode 100644 tools/testing/selftests/idmap_mounts/config
>  create mode 100644 tools/testing/selftests/idmap_mounts/core.c
>  create mode 100644 tools/testing/selftests/idmap_mounts/internal.h
>  create mode 100644 tools/testing/selftests/idmap_mounts/utils.c
>  create mode 100644 tools/testing/selftests/idmap_mounts/utils.h
>  create mode 100644 tools/testing/selftests/idmap_mounts/xattr.c
>  create mode 100644 tools/testing/selftests/mount_setattr/.gitignore
>  create mode 100644 tools/testing/selftests/mount_setattr/Makefile
>  create mode 100644 tools/testing/selftests/mount_setattr/config
>  create mode 100644 tools/testing/selftests/mount_setattr/mount_setattr_test.c
> 
> 
> base-commit: 3cea11cd5e3b00d91caf0b4730194039b45c5891
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/39] fs: idmapped mounts
  2020-11-20  2:33 ` Darrick J. Wong
@ 2020-11-20  9:10   ` Christian Brauner
  2020-11-20  9:12     ` Christoph Hellwig
  0 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-20  9:10 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux

On Thu, Nov 19, 2020 at 06:33:09PM -0800, Darrick J. Wong wrote:
> On Sun, Nov 15, 2020 at 11:36:39AM +0100, Christian Brauner wrote:
> > Hey everyone,
> > 
> > This is v2. It is reworked according to the reviews coming from
> > Christoph and others to adapt all relevant helpers and inode_operations
> > methods to account for idmapped mounts instead of introducing new
> > helpers and methods specific to idmapped mounts like we did before.
> > We've also moved the overlayfs conversion to handle idmapped mounts into
> > a separate patchset that will be sent out separately after the core
> > changes. The converted filesytems in this series include fat and ext4.
> > The config option to disable idmapped mounts has been moved from a a vfs
> > config ption to a per-filesystem option. They default to off. Having a
> > config option allows us to gain some confidence in the patchset over
> > multiple kernel releases.
> 
> So uh I noticed that the new user_ns parameter passed to the xfs
> functions don't actually get used for anything.  Did I miss something
> when I pulled the branch, or does this simply reflect xfs not having any
> idmap support?  And how would one add such a thing?  Replace the
> &init_user_ns with the passed-in user_ns parameter?

Sorry, maybe you missed it but that's mentioned further below in the
commit messages:

"In order to support idmapped mounts, filesystems need to be changed and
mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The initial
version contains fat and ext4 including a list of examples. But patches
for other filesystems are actively worked on but will be sent out
separately. We are here to see this through and there are multiple
people involved in converting filesystems. So filesystem developers are
not left alone with this."

> 
> > There are two noteable things about this version. First, that it comes
> > with a really large test-suite to test current vfs behavior and
> > idmapped mounts behavior. We intend this test-suite to grow over time
> > and at some point cover most basic core vfs functionality that isn't
> > covered in xfstests and have it be part of the selftests.
> 
> Please put enough of a test in fstests to do basic validation that we
> filesystem developers didn't accidentally screw things up.

Maybe you didn't see this or you're referring to xfstests but this
series contains a >=4000 lines long test-suite that validates all core
features with and without idmapped mounts. It's the last patch in this
version of the series and it's located in:
tools/testing/selftests/idmap_mounts.

Everytime a filesystem is added this test-suite will be updated. We
would perfer if this test would be shipped with the kernel itself and
not in a separate test-suite such as xfstests. But we're happy to add
patches for the latter at some point too.

Christian

> 
> --D
> 
> > Second, while while working on adapting this patchset to the requested
> > changes, the runC and containerd crowd was nice enough to adapt
> > containerd to this patchset to make use of idmapped mounts in one of the
> > most widely used container runtimes:
> > https://github.com/containerd/containerd/pull/4734
> > 
> > With this patchset we make it possible to attach idmappings to bind
> > mounts. This handles several common use-cases. Here are just a few:
> > - Shifting of a container rootfs or base image without having to mangle
> >   every file (runc, Docker, containerd, k8s, LXD, systemd ...)
> > - Sharing of data between host or privileged containers with
> >   underprivileged containers (runc, Docker, containerd, k8s, LXD, ...)
> > - Shifting of subset of ownership-less filesystems (vfat) for use by
> >   multiple users, effectively allowing for DAC on such devices (systemd,
> >   Android, ...)
> > - Data sharing between multiple user namespaces with incompatible maps
> >   (LXD, k8s, ...)
> > Making it possible to share directories and mounts between users with
> > different uids and gids is itself quite an important use-case in
> > distributed systems environments. It's of course especially useful in
> > general for portable usb sticks, sharing data between multiple users in
> > general, and sharing home directories between multiple users. The last
> > example is now elegantly expressed in systemd's homed concept for
> > portable home directories. As mentioned above, idmapped mounts also
> > allow data from the host to be shared with unprivileged containers,
> > between privileged and unprivileged containers simultaneously and in
> > addition also between unprivileged containers with different idmappings
> > whenever they are used to isolate one container completely from another
> > container.
> > As can be seen from answers to earlier threads of this patchset and from
> > the list of potential users interest in this patchset is fairly
> > widespread.
> > 
> > We have implemented and proposed multiple solutions to this before. This
> > included the introduction of fsid mappings, a tiny filesystem that is
> > currently carried in Ubuntu that has shown it's limitations, and an
> > approach to call override creds in the vfs. None of these solutions have
> > covered all of the above use-cases. Some of them have been fairly hacky
> > too by e.g. violating how things should be passed down to the individual
> > filesystems.
> > The solution proposed here has it's origins in multiple discussions
> > during Linux Plumbers 2017 during and after the end of the containers
> > microconference.
> > To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
> > James, and myself. The original idea or a variant thereof has been
> > discussed, again to the best of my knowledge, after a Linux conference
> > in St. Petersburg in Russia in 2017 between Christoph, Tycho, and
> > myself.
> > We've taken the time to implement a working version of this solution
> > over the last weeks to the best of my abilities. Tycho has signed up
> > for this sligthly crazy endeavour as well and he has helped with the
> > conversion of the xattr codepaths and will be involved with others in
> > converting additional filesystems.
> > 
> > This series makes idmappings a property of struct vfsmount instead of
> > tying it to a process being inside of a user namespace which has been
> > the case for all other proposed approaches. It also allows to pass down
> > the user namespace into the filesystms which is a clean way instead of
> > violating calling conventions by strapping the user namespace
> > information that is a property of the mount to the caller's credentials
> > or similar hacks.
> > With this idmappings become a property of bind-mounts, i.e. each
> > bind-mount can have a separate idmapping. Such idmapped mounts can even
> > be created inside of the initial user namespace.
> > 
> > The vfsmount struct gains a new struct user_namespace member. The
> > idmapping of the user namespace becomes the idmapping of the mount. A
> > caller that is privileged with respect to the user namespace of
> > the superblock of the underlying filesystem can create an idmapped
> > bind-mount. In the future, we can enable unprivileged use-cases by
> > checking whether the caller is privileged wrt to the user namespace an
> > already idmapped mount has been marked with, allowing them to change the
> > idmapping. For now, keep things simple until we feel sure enough. Note,
> > that with syscall interception it is already possible to intercept
> > idmapped mount requests from unprivileged containers and handle them in
> > a sufficiently privileged container manager. Support for this is
> > already available in LXD and will be available in runC were syscall
> > interception is currently becoming part of the runtime spec (see
> > https://github.com/opencontainers/runtime-spec/pull/1074).
> > 
> > The user namespace the mount will be marked with can be specified by
> > passing a file descriptor refering to the user namespace as an argument
> > to the new mount_setattr() syscall together with the new
> > MOUNT_ATTR_IDMAP flag. By default vfsmounts are marked with the initial
> > user namespace and no behavioral or performance changes should be
> > observed. All mapping operations are nops for the initial user
> > namespace. When a file/inode is accessed through an idmapped mount the
> > i_uid and i_gid of the inode will be remapped according to the user
> > namespace the mount has been marked with.
> > 
> > In order to support idmapped mounts, filesystems need to be changed and
> > mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The initial
> > version contains fat and ext4 including a list of examples. But patches
> > for other filesystems are actively worked on but will be sent out
> > separately. We are here to see this through and there are multiple
> > people involved in converting filesystems. So filesystem developers are
> > not left alone with this.
> > 
> > I have written a simple tool available at
> > https://github.com/brauner/mount-idmapped that allows to create idmapped
> > mounts so people can play with this patch series. Here are a few
> > illustrations:
> > 
> > 1. Create a simple idmapped mount of another user's home directory
> > 
> > u1001@f2-vm:/$ sudo ./mount-idmapped --map-mount b:1000:1001:1 /home/ubuntu/ /mnt
> > u1001@f2-vm:/$ ls -al /home/ubuntu/
> > total 28
> > drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
> > drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
> > -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
> > -rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
> > -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
> > -rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
> > -rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
> > -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
> > u1001@f2-vm:/$ ls -al /mnt/
> > total 28
> > drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
> > drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
> > -rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
> > -rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
> > -rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
> > -rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
> > -rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
> > -rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo
> > u1001@f2-vm:/$ touch /mnt/my-file
> > u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
> > u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
> > u1001@f2-vm:/$ ls -al /mnt/my-file
> > -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
> > u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
> > -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
> > u1001@f2-vm:/$ getfacl /mnt/my-file
> > getfacl: Removing leading '/' from absolute path names
> > # file: mnt/my-file
> > # owner: u1001
> > # group: u1001
> > user::rw-
> > user:u1001:rwx
> > group::rw-
> > mask::rwx
> > other::r--
> > u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
> > getfacl: Removing leading '/' from absolute path names
> > # file: home/ubuntu/my-file
> > # owner: ubuntu
> > # group: ubuntu
> > user::rw-
> > user:ubuntu:rwx
> > group::rw-
> > mask::rwx
> > other::r--
> > 
> > 2. Create mapping of the whole ext4 rootfs without a mapping for uid and gid 0
> > 
> > ubuntu@f2-vm:~$ sudo /mount-idmapped --map-mount b:1:1:65536 / /mnt/
> > ubuntu@f2-vm:~$ findmnt | grep mnt
> > └─/mnt                                /dev/sda2  ext4       rw,relatime
> >   └─/mnt/mnt                          /dev/sda2  ext4       rw,relatime
> > ubuntu@f2-vm:~$ sudo mkdir /AS-ROOT-CAN-CREATE
> > ubuntu@f2-vm:~$ sudo mkdir /mnt/AS-ROOT-CANT-CREATE
> > mkdir: cannot create directory ‘/mnt/AS-ROOT-CANT-CREATE’: Value too large for defined data type
> > ubuntu@f2-vm:~$ mkdir /mnt/home/ubuntu/AS-USER-1000-CAN-CREATE
> > 
> > 3. Create a vfat usb mount and expose to user 1001 and 5000
> > 
> > ubuntu@f2-vm:/$ sudo mount /dev/sdb /mnt
> > ubuntu@f2-vm:/$ findmnt  | grep mnt
> > └─/mnt                                /dev/sdb vfat       rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
> > ubuntu@f2-vm:/$ ls -al /mnt
> > total 12
> > drwxr-xr-x  2 root root 4096 Jan  1  1970 .
> > drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
> > -rwxr-xr-x  1 root root    4 Oct 28 03:44 aaa
> > -rwxr-xr-x  1 root root    0 Oct 28 01:09 bbb
> > ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:1001:1 /mnt /mnt-1001/
> > ubuntu@f2-vm:/$ ls -al /mnt-1001/
> > total 12
> > drwxr-xr-x  2 u1001 u1001 4096 Jan  1  1970 .
> > drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
> > -rwxr-xr-x  1 u1001 u1001    4 Oct 28 03:44 aaa
> > -rwxr-xr-x  1 u1001 u1001    0 Oct 28 01:09 bbb
> > ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:5000:1 /mnt /mnt-5000/
> > ubuntu@f2-vm:/$ ls -al /mnt-5000/
> > total 12
> > drwxr-xr-x  2 5000 5000 4096 Jan  1  1970 .
> > drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
> > -rwxr-xr-x  1 5000 5000    4 Oct 28 03:44 aaa
> > -rwxr-xr-x  1 5000 5000    0 Oct 28 01:09 bbb
> > 
> > 4. Create an idmapped rootfs mount for a container
> > 
> > root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/
> > total 68
> > drwxr-xr-x 17 20000 20000 4096 Sep 24 07:48 .
> > drwxrwx---  3 20000 20000 4096 Oct 16 19:26 ..
> > lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 bin -> usr/bin
> > drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 boot
> > drwxr-xr-x  3 20000 20000 4096 Oct 16 19:26 dev
> > drwxr-xr-x 61 20000 20000 4096 Oct 16 19:26 etc
> > drwxr-xr-x  3 20000 20000 4096 Sep 24 07:45 home
> > lrwxrwxrwx  1 20000 20000    7 Sep 24 07:43 lib -> usr/lib
> > lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib32 -> usr/lib32
> > lrwxrwxrwx  1 20000 20000    9 Sep 24 07:43 lib64 -> usr/lib64
> > lrwxrwxrwx  1 20000 20000   10 Sep 24 07:43 libx32 -> usr/libx32
> > drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 media
> > drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 mnt
> > drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 opt
> > drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 proc
> > drwx------  2 20000 20000 4096 Sep 24 07:43 root
> > drwxr-xr-x  2 20000 20000 4096 Sep 24 07:45 run
> > lrwxrwxrwx  1 20000 20000    8 Sep 24 07:43 sbin -> usr/sbin
> > drwxr-xr-x  2 20000 20000 4096 Sep 24 07:43 srv
> > drwxr-xr-x  2 20000 20000 4096 Apr 15  2020 sys
> > drwxrwxrwt  2 20000 20000 4096 Sep 24 07:44 tmp
> > drwxr-xr-x 13 20000 20000 4096 Sep 24 07:43 usr
> > drwxr-xr-x 12 20000 20000 4096 Sep 24 07:44 var
> > root@f2-vm:~# /mount-idmapped --map-mount b:20000:10000:100000 /var/lib/lxc/f2/rootfs/ /mnt
> > root@f2-vm:~# ls -al /mnt
> > total 68
> > drwxr-xr-x 17 10000 10000 4096 Sep 24 07:48 .
> > drwxr-xr-x 34 root  root  4096 Oct 28 22:24 ..
> > lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 bin -> usr/bin
> > drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 boot
> > drwxr-xr-x  3 10000 10000 4096 Oct 16 19:26 dev
> > drwxr-xr-x 61 10000 10000 4096 Oct 16 19:26 etc
> > drwxr-xr-x  3 10000 10000 4096 Sep 24 07:45 home
> > lrwxrwxrwx  1 10000 10000    7 Sep 24 07:43 lib -> usr/lib
> > lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib32 -> usr/lib32
> > lrwxrwxrwx  1 10000 10000    9 Sep 24 07:43 lib64 -> usr/lib64
> > lrwxrwxrwx  1 10000 10000   10 Sep 24 07:43 libx32 -> usr/libx32
> > drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 media
> > drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 mnt
> > drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 opt
> > drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 proc
> > drwx------  2 10000 10000 4096 Sep 24 07:43 root
> > drwxr-xr-x  2 10000 10000 4096 Sep 24 07:45 run
> > lrwxrwxrwx  1 10000 10000    8 Sep 24 07:43 sbin -> usr/sbin
> > drwxr-xr-x  2 10000 10000 4096 Sep 24 07:43 srv
> > drwxr-xr-x  2 10000 10000 4096 Apr 15  2020 sys
> > drwxrwxrwt  2 10000 10000 4096 Sep 24 07:44 tmp
> > drwxr-xr-x 13 10000 10000 4096 Sep 24 07:43 usr
> > drwxr-xr-x 12 10000 10000 4096 Sep 24 07:44 var
> > root@f2-vm:~# lxc-start f2 # uses /mnt as rootfs
> > root@f2-vm:~# lxc-attach f2 -- cat /proc/1/uid_map
> >          0      10000      10000
> > root@f2-vm:~# lxc-attach f2 -- cat /proc/1/gid_map
> >          0      10000      10000
> > root@f2-vm:~# lxc-attach f2 -- ls -al /
> > total 52
> > drwxr-xr-x  17 root   root    4096 Sep 24 07:48 .
> > drwxr-xr-x  17 root   root    4096 Sep 24 07:48 ..
> > lrwxrwxrwx   1 root   root       7 Sep 24 07:43 bin -> usr/bin
> > drwxr-xr-x   2 root   root    4096 Apr 15  2020 boot
> > drwxr-xr-x   5 root   root     500 Oct 28 23:39 dev
> > drwxr-xr-x  61 root   root    4096 Oct 28 23:39 etc
> > drwxr-xr-x   3 root   root    4096 Sep 24 07:45 home
> > lrwxrwxrwx   1 root   root       7 Sep 24 07:43 lib -> usr/lib
> > lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib32 -> usr/lib32
> > lrwxrwxrwx   1 root   root       9 Sep 24 07:43 lib64 -> usr/lib64
> > lrwxrwxrwx   1 root   root      10 Sep 24 07:43 libx32 -> usr/libx32
> > drwxr-xr-x   2 root   root    4096 Sep 24 07:43 media
> > drwxr-xr-x   2 root   root    4096 Sep 24 07:43 mnt
> > drwxr-xr-x   2 root   root    4096 Sep 24 07:43 opt
> > dr-xr-xr-x 232 nobody nogroup    0 Oct 28 23:39 proc
> > drwx------   2 root   root    4096 Oct 28 23:41 root
> > drwxr-xr-x  12 root   root     360 Oct 28 23:39 run
> > lrwxrwxrwx   1 root   root       8 Sep 24 07:43 sbin -> usr/sbin
> > drwxr-xr-x   2 root   root    4096 Sep 24 07:43 srv
> > dr-xr-xr-x  13 nobody nogroup    0 Oct 28 23:39 sys
> > drwxrwxrwt  11 root   root    4096 Oct 28 23:40 tmp
> > drwxr-xr-x  13 root   root    4096 Sep 24 07:43 usr
> > drwxr-xr-x  12 root   root    4096 Sep 24 07:44 var
> > root@f2-vm:~# lxc-attach f2 -- ls -al /my-file
> > -rw-r--r-- 1 root root 0 Oct 28 23:43 /my-file
> > root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/my-file
> > -rw-r--r-- 1 20000 20000 0 Oct 28 23:43 /var/lib/lxc/f2/rootfs/my-file
> > 
> > I'd like to say thanks to:
> > Al for pointing me into the direction to avoid inode alias issues during
> > lookup. David for various discussions around this. Christoph for proving
> > a first proper review and for being involved in the original idea. Tycho
> > for helping with this series and on future patches to convert
> > filesystems. Alban Crequy and the Kinvolk located just a few streets
> > away from me in Berlin for providing use-case discussions and writing
> > patches for containerd! Stéphane for his invaluable input on many things
> > and level head and enabling me to work on this. Amir for explaining and
> > discussing aspects of overlayfs with me. I'd like to especially thank
> > Seth Forshee because he provided a lot of good analysis, suggestions,
> > and participated in short-notice discussions in both chat and video for
> > some nitty-gritty technical details.
> > 
> > This series can be found and pulled from the three usual locations:
> > https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=idmapped_mounts
> > https://github.com/brauner/linux/tree/idmapped_mounts
> > https://gitlab.com/brauner/linux/-/commits/idmapped_mounts
> > 
> > Thanks!
> > Christian
> > 
> > Christian Brauner (37):
> >   namespace: take lock_mount_hash() directly when changing flags
> >   mount: make {lock,unlock}_mount_hash() static
> >   namespace: only take read lock in do_reconfigure_mnt()
> >   fs: add mount_setattr()
> >   tests: add mount_setattr() selftests
> >   fs: add id translation helpers
> >   mount: attach mappings to mounts
> >   capability: handle idmapped mounts
> >   namei: add idmapped mount aware permission helpers
> >   inode: add idmapped mount aware init and permission helpers
> >   attr: handle idmapped mounts
> >   acl: handle idmapped mounts
> >   commoncap: handle idmapped mounts
> >   stat: handle idmapped mounts
> >   namei: handle idmapped mounts in may_*() helpers
> >   namei: introduce struct renamedata
> >   namei: prepare for idmapped mounts
> >   open: handle idmapped mounts in do_truncate()
> >   open: handle idmapped mounts
> >   af_unix: handle idmapped mounts
> >   utimes: handle idmapped mounts
> >   fcntl: handle idmapped mounts
> >   notify: handle idmapped mounts
> >   init: handle idmapped mounts
> >   ioctl: handle idmapped mounts
> >   would_dump: handle idmapped mounts
> >   exec: handle idmapped mounts
> >   fs: add helpers for idmap mounts
> >   apparmor: handle idmapped mounts
> >   audit: handle idmapped mounts
> >   ima: handle idmapped mounts
> >   fat: handle idmapped mounts
> >   ext4: support idmapped mounts
> >   ecryptfs: do not mount on top of idmapped mounts
> >   overlayfs: do not mount on top of idmapped mounts
> >   fs: introduce MOUNT_ATTR_IDMAP
> >   tests: add vfs/idmapped mounts test suite
> > 
> > Tycho Andersen (2):
> >   xattr: handle idmapped mounts
> >   selftests: add idmapped mounts xattr selftest
> > 
> >  Documentation/filesystems/locking.rst         |    6 +-
> >  Documentation/filesystems/porting.rst         |    2 +
> >  Documentation/filesystems/vfs.rst             |   17 +-
> >  arch/alpha/kernel/syscalls/syscall.tbl        |    1 +
> >  arch/arm/tools/syscall.tbl                    |    1 +
> >  arch/arm64/include/asm/unistd32.h             |    2 +
> >  arch/ia64/kernel/syscalls/syscall.tbl         |    1 +
> >  arch/m68k/kernel/syscalls/syscall.tbl         |    1 +
> >  arch/microblaze/kernel/syscalls/syscall.tbl   |    1 +
> >  arch/mips/kernel/syscalls/syscall_n32.tbl     |    1 +
> >  arch/mips/kernel/syscalls/syscall_n64.tbl     |    1 +
> >  arch/mips/kernel/syscalls/syscall_o32.tbl     |    1 +
> >  arch/parisc/kernel/syscalls/syscall.tbl       |    1 +
> >  arch/powerpc/kernel/syscalls/syscall.tbl      |    1 +
> >  arch/powerpc/platforms/cell/spufs/inode.c     |    5 +-
> >  arch/s390/kernel/syscalls/syscall.tbl         |    1 +
> >  arch/sh/kernel/syscalls/syscall.tbl           |    1 +
> >  arch/sparc/kernel/syscalls/syscall.tbl        |    1 +
> >  arch/x86/entry/syscalls/syscall_32.tbl        |    1 +
> >  arch/x86/entry/syscalls/syscall_64.tbl        |    1 +
> >  arch/xtensa/kernel/syscalls/syscall.tbl       |    1 +
> >  drivers/android/binderfs.c                    |    3 +-
> >  drivers/base/devtmpfs.c                       |   12 +-
> >  fs/9p/acl.c                                   |    7 +-
> >  fs/9p/v9fs.h                                  |    3 +-
> >  fs/9p/v9fs_vfs.h                              |    2 +-
> >  fs/9p/vfs_inode.c                             |   32 +-
> >  fs/9p/vfs_inode_dotl.c                        |   34 +-
> >  fs/9p/xattr.c                                 |    1 +
> >  fs/adfs/adfs.h                                |    3 +-
> >  fs/adfs/inode.c                               |    5 +-
> >  fs/affs/affs.h                                |   10 +-
> >  fs/affs/inode.c                               |    7 +-
> >  fs/affs/namei.c                               |   15 +-
> >  fs/afs/dir.c                                  |   34 +-
> >  fs/afs/inode.c                                |    5 +-
> >  fs/afs/internal.h                             |    4 +-
> >  fs/afs/security.c                             |    2 +-
> >  fs/afs/xattr.c                                |    2 +
> >  fs/attr.c                                     |   78 +-
> >  fs/autofs/root.c                              |   13 +-
> >  fs/bad_inode.c                                |   31 +-
> >  fs/bfs/dir.c                                  |   12 +-
> >  fs/btrfs/acl.c                                |    5 +-
> >  fs/btrfs/ctree.h                              |    3 +-
> >  fs/btrfs/inode.c                              |   41 +-
> >  fs/btrfs/ioctl.c                              |   25 +-
> >  fs/btrfs/tests/btrfs-tests.c                  |    2 +-
> >  fs/btrfs/xattr.c                              |    2 +
> >  fs/cachefiles/interface.c                     |    4 +-
> >  fs/cachefiles/namei.c                         |   19 +-
> >  fs/cachefiles/xattr.c                         |   16 +-
> >  fs/ceph/acl.c                                 |    5 +-
> >  fs/ceph/dir.c                                 |   23 +-
> >  fs/ceph/inode.c                               |   13 +-
> >  fs/ceph/super.h                               |    8 +-
> >  fs/ceph/xattr.c                               |    1 +
> >  fs/cifs/cifsfs.c                              |    4 +-
> >  fs/cifs/cifsfs.h                              |   18 +-
> >  fs/cifs/dir.c                                 |    8 +-
> >  fs/cifs/inode.c                               |   22 +-
> >  fs/cifs/link.c                                |    3 +-
> >  fs/cifs/xattr.c                               |    1 +
> >  fs/coda/coda_linux.h                          |    4 +-
> >  fs/coda/dir.c                                 |   18 +-
> >  fs/coda/inode.c                               |    5 +-
> >  fs/coda/pioctl.c                              |    6 +-
> >  fs/configfs/configfs_internal.h               |    7 +-
> >  fs/configfs/dir.c                             |    3 +-
> >  fs/configfs/inode.c                           |    5 +-
> >  fs/configfs/symlink.c                         |    5 +-
> >  fs/coredump.c                                 |   12 +-
> >  fs/crypto/policy.c                            |    2 +-
> >  fs/debugfs/inode.c                            |    9 +-
> >  fs/ecryptfs/crypto.c                          |    4 +-
> >  fs/ecryptfs/inode.c                           |   74 +-
> >  fs/ecryptfs/main.c                            |    6 +
> >  fs/ecryptfs/mmap.c                            |    4 +-
> >  fs/efivarfs/file.c                            |    2 +-
> >  fs/efivarfs/inode.c                           |    4 +-
> >  fs/erofs/inode.c                              |    2 +-
> >  fs/exec.c                                     |   12 +-
> >  fs/exfat/exfat_fs.h                           |    3 +-
> >  fs/exfat/file.c                               |    9 +-
> >  fs/exfat/namei.c                              |   13 +-
> >  fs/ext2/acl.c                                 |    5 +-
> >  fs/ext2/acl.h                                 |    3 +-
> >  fs/ext2/ext2.h                                |    2 +-
> >  fs/ext2/ialloc.c                              |    2 +-
> >  fs/ext2/inode.c                               |   11 +-
> >  fs/ext2/ioctl.c                               |    6 +-
> >  fs/ext2/namei.c                               |   22 +-
> >  fs/ext2/xattr_security.c                      |    1 +
> >  fs/ext2/xattr_trusted.c                       |    1 +
> >  fs/ext2/xattr_user.c                          |    1 +
> >  fs/ext4/Kconfig                               |    9 +
> >  fs/ext4/acl.c                                 |    5 +-
> >  fs/ext4/acl.h                                 |    3 +-
> >  fs/ext4/ext4.h                                |   15 +-
> >  fs/ext4/ialloc.c                              |    7 +-
> >  fs/ext4/inode.c                               |   14 +-
> >  fs/ext4/ioctl.c                               |   18 +-
> >  fs/ext4/namei.c                               |   52 +-
> >  fs/ext4/super.c                               |    6 +-
> >  fs/ext4/xattr_hurd.c                          |    1 +
> >  fs/ext4/xattr_security.c                      |    1 +
> >  fs/ext4/xattr_trusted.c                       |    1 +
> >  fs/ext4/xattr_user.c                          |    1 +
> >  fs/f2fs/acl.c                                 |    5 +-
> >  fs/f2fs/acl.h                                 |    3 +-
> >  fs/f2fs/f2fs.h                                |    3 +-
> >  fs/f2fs/file.c                                |   23 +-
> >  fs/f2fs/namei.c                               |   26 +-
> >  fs/f2fs/xattr.c                               |    4 +-
> >  fs/fat/fat.h                                  |    3 +-
> >  fs/fat/file.c                                 |   20 +-
> >  fs/fat/namei_msdos.c                          |   15 +-
> >  fs/fat/namei_vfat.c                           |   15 +-
> >  fs/fcntl.c                                    |    3 +-
> >  fs/fuse/acl.c                                 |    3 +-
> >  fs/fuse/dir.c                                 |   41 +-
> >  fs/fuse/fuse_i.h                              |    4 +-
> >  fs/fuse/xattr.c                               |    2 +
> >  fs/gfs2/acl.c                                 |    5 +-
> >  fs/gfs2/acl.h                                 |    3 +-
> >  fs/gfs2/file.c                                |    4 +-
> >  fs/gfs2/inode.c                               |   54 +-
> >  fs/gfs2/inode.h                               |    2 +-
> >  fs/gfs2/xattr.c                               |    1 +
> >  fs/hfs/attr.c                                 |    1 +
> >  fs/hfs/dir.c                                  |   13 +-
> >  fs/hfs/hfs_fs.h                               |    2 +-
> >  fs/hfs/inode.c                                |    7 +-
> >  fs/hfsplus/dir.c                              |   25 +-
> >  fs/hfsplus/inode.c                            |   11 +-
> >  fs/hfsplus/ioctl.c                            |    2 +-
> >  fs/hfsplus/xattr.c                            |    1 +
> >  fs/hfsplus/xattr_security.c                   |    1 +
> >  fs/hfsplus/xattr_trusted.c                    |    1 +
> >  fs/hfsplus/xattr_user.c                       |    1 +
> >  fs/hostfs/hostfs_kern.c                       |   31 +-
> >  fs/hpfs/hpfs_fn.h                             |    2 +-
> >  fs/hpfs/inode.c                               |    7 +-
> >  fs/hpfs/namei.c                               |   20 +-
> >  fs/hugetlbfs/inode.c                          |   31 +-
> >  fs/init.c                                     |   21 +-
> >  fs/inode.c                                    |   38 +-
> >  fs/internal.h                                 |    9 +
> >  fs/jffs2/acl.c                                |    5 +-
> >  fs/jffs2/acl.h                                |    3 +-
> >  fs/jffs2/dir.c                                |   32 +-
> >  fs/jffs2/fs.c                                 |    7 +-
> >  fs/jffs2/os-linux.h                           |    2 +-
> >  fs/jffs2/security.c                           |    1 +
> >  fs/jffs2/xattr_trusted.c                      |    1 +
> >  fs/jffs2/xattr_user.c                         |    1 +
> >  fs/jfs/acl.c                                  |    5 +-
> >  fs/jfs/file.c                                 |    9 +-
> >  fs/jfs/ioctl.c                                |    2 +-
> >  fs/jfs/jfs_acl.h                              |    3 +-
> >  fs/jfs/jfs_inode.c                            |    2 +-
> >  fs/jfs/jfs_inode.h                            |    2 +-
> >  fs/jfs/namei.c                                |   21 +-
> >  fs/jfs/xattr.c                                |    2 +
> >  fs/kernfs/dir.c                               |    7 +-
> >  fs/kernfs/inode.c                             |   15 +-
> >  fs/kernfs/kernfs-internal.h                   |    5 +-
> >  fs/libfs.c                                    |   20 +-
> >  fs/minix/bitmap.c                             |    2 +-
> >  fs/minix/file.c                               |    7 +-
> >  fs/minix/inode.c                              |    2 +-
> >  fs/minix/namei.c                              |   25 +-
> >  fs/mount.h                                    |   10 -
> >  fs/namei.c                                    |  317 +-
> >  fs/namespace.c                                |  473 ++-
> >  fs/nfs/dir.c                                  |   23 +-
> >  fs/nfs/inode.c                                |    5 +-
> >  fs/nfs/internal.h                             |   10 +-
> >  fs/nfs/namespace.c                            |    7 +-
> >  fs/nfs/nfs3_fs.h                              |    3 +-
> >  fs/nfs/nfs3acl.c                              |    3 +-
> >  fs/nfs/nfs4proc.c                             |    3 +
> >  fs/nfsd/nfs2acl.c                             |    4 +-
> >  fs/nfsd/nfs3acl.c                             |    4 +-
> >  fs/nfsd/nfs4acl.c                             |    4 +-
> >  fs/nfsd/nfs4recover.c                         |    6 +-
> >  fs/nfsd/nfsfh.c                               |    2 +-
> >  fs/nfsd/nfsproc.c                             |    2 +-
> >  fs/nfsd/vfs.c                                 |   47 +-
> >  fs/nilfs2/inode.c                             |   13 +-
> >  fs/nilfs2/ioctl.c                             |    2 +-
> >  fs/nilfs2/namei.c                             |   20 +-
> >  fs/nilfs2/nilfs.h                             |    4 +-
> >  fs/notify/fanotify/fanotify_user.c            |    2 +-
> >  fs/notify/inotify/inotify_user.c              |    3 +-
> >  fs/ntfs/inode.c                               |    2 +-
> >  fs/ocfs2/acl.c                                |    5 +-
> >  fs/ocfs2/acl.h                                |    3 +-
> >  fs/ocfs2/dlmfs/dlmfs.c                        |   17 +-
> >  fs/ocfs2/file.c                               |   13 +-
> >  fs/ocfs2/file.h                               |    5 +-
> >  fs/ocfs2/ioctl.c                              |    2 +-
> >  fs/ocfs2/namei.c                              |   21 +-
> >  fs/ocfs2/refcounttree.c                       |    4 +-
> >  fs/ocfs2/xattr.c                              |    3 +
> >  fs/omfs/dir.c                                 |   13 +-
> >  fs/omfs/file.c                                |    7 +-
> >  fs/omfs/inode.c                               |    2 +-
> >  fs/open.c                                     |   52 +-
> >  fs/orangefs/acl.c                             |    5 +-
> >  fs/orangefs/inode.c                           |   15 +-
> >  fs/orangefs/namei.c                           |   12 +-
> >  fs/orangefs/orangefs-kernel.h                 |    7 +-
> >  fs/orangefs/xattr.c                           |    1 +
> >  fs/overlayfs/copy_up.c                        |   20 +-
> >  fs/overlayfs/dir.c                            |   31 +-
> >  fs/overlayfs/file.c                           |    6 +-
> >  fs/overlayfs/inode.c                          |   21 +-
> >  fs/overlayfs/overlayfs.h                      |   40 +-
> >  fs/overlayfs/super.c                          |   27 +-
> >  fs/overlayfs/util.c                           |    4 +-
> >  fs/posix_acl.c                                |   78 +-
> >  fs/proc/base.c                                |   24 +-
> >  fs/proc/fd.c                                  |    4 +-
> >  fs/proc/fd.h                                  |    3 +-
> >  fs/proc/generic.c                             |    9 +-
> >  fs/proc/internal.h                            |    2 +-
> >  fs/proc/proc_net.c                            |    2 +-
> >  fs/proc/proc_sysctl.c                         |   12 +-
> >  fs/proc/root.c                                |    2 +-
> >  fs/proc_namespace.c                           |    1 +
> >  fs/ramfs/file-nommu.c                         |    9 +-
> >  fs/ramfs/inode.c                              |   18 +-
> >  fs/reiserfs/acl.h                             |    3 +-
> >  fs/reiserfs/inode.c                           |    7 +-
> >  fs/reiserfs/ioctl.c                           |    4 +-
> >  fs/reiserfs/namei.c                           |   21 +-
> >  fs/reiserfs/reiserfs.h                        |    3 +-
> >  fs/reiserfs/xattr.c                           |   12 +-
> >  fs/reiserfs/xattr.h                           |    2 +-
> >  fs/reiserfs/xattr_acl.c                       |    7 +-
> >  fs/reiserfs/xattr_security.c                  |    3 +-
> >  fs/reiserfs/xattr_trusted.c                   |    3 +-
> >  fs/reiserfs/xattr_user.c                      |    3 +-
> >  fs/remap_range.c                              |    7 +-
> >  fs/stat.c                                     |   10 +-
> >  fs/sysv/file.c                                |    7 +-
> >  fs/sysv/ialloc.c                              |    2 +-
> >  fs/sysv/itree.c                               |    2 +-
> >  fs/sysv/namei.c                               |   21 +-
> >  fs/tracefs/inode.c                            |    4 +-
> >  fs/ubifs/dir.c                                |   29 +-
> >  fs/ubifs/file.c                               |    5 +-
> >  fs/ubifs/ioctl.c                              |    2 +-
> >  fs/ubifs/ubifs.h                              |    3 +-
> >  fs/ubifs/xattr.c                              |    1 +
> >  fs/udf/file.c                                 |    9 +-
> >  fs/udf/ialloc.c                               |    2 +-
> >  fs/udf/namei.c                                |   24 +-
> >  fs/udf/symlink.c                              |    2 +-
> >  fs/ufs/ialloc.c                               |    2 +-
> >  fs/ufs/inode.c                                |    7 +-
> >  fs/ufs/namei.c                                |   19 +-
> >  fs/ufs/ufs.h                                  |    3 +-
> >  fs/utimes.c                                   |    4 +-
> >  fs/vboxsf/dir.c                               |   12 +-
> >  fs/vboxsf/utils.c                             |    5 +-
> >  fs/vboxsf/vfsmod.h                            |    3 +-
> >  fs/verity/enable.c                            |    2 +-
> >  fs/xattr.c                                    |  136 +-
> >  fs/xfs/xfs_acl.c                              |    5 +-
> >  fs/xfs/xfs_acl.h                              |    3 +-
> >  fs/xfs/xfs_ioctl.c                            |    4 +-
> >  fs/xfs/xfs_iops.c                             |   55 +-
> >  fs/xfs/xfs_xattr.c                            |    3 +-
> >  fs/zonefs/super.c                             |    9 +-
> >  include/linux/audit.h                         |   10 +-
> >  include/linux/capability.h                    |   14 +-
> >  include/linux/fs.h                            |  145 +-
> >  include/linux/ima.h                           |   15 +-
> >  include/linux/lsm_hook_defs.h                 |   15 +-
> >  include/linux/lsm_hooks.h                     |    1 +
> >  include/linux/mount.h                         |   14 +-
> >  include/linux/nfs_fs.h                        |    4 +-
> >  include/linux/posix_acl.h                     |   15 +-
> >  include/linux/posix_acl_xattr.h               |   12 +-
> >  include/linux/security.h                      |   44 +-
> >  include/linux/syscalls.h                      |    3 +
> >  include/linux/xattr.h                         |   30 +-
> >  include/uapi/asm-generic/unistd.h             |    4 +-
> >  include/uapi/linux/mount.h                    |   25 +
> >  ipc/mqueue.c                                  |   16 +-
> >  kernel/auditsc.c                              |   29 +-
> >  kernel/bpf/inode.c                            |   13 +-
> >  kernel/capability.c                           |   14 +-
> >  kernel/cgroup/cgroup.c                        |    2 +-
> >  kernel/sys.c                                  |    2 +-
> >  mm/madvise.c                                  |    4 +-
> >  mm/memcontrol.c                               |    2 +-
> >  mm/mincore.c                                  |    4 +-
> >  mm/shmem.c                                    |   45 +-
> >  net/socket.c                                  |    6 +-
> >  net/unix/af_unix.c                            |    4 +-
> >  security/apparmor/apparmorfs.c                |    3 +-
> >  security/apparmor/domain.c                    |   13 +-
> >  security/apparmor/file.c                      |    5 +-
> >  security/apparmor/lsm.c                       |   12 +-
> >  security/commoncap.c                          |   46 +-
> >  security/integrity/evm/evm_crypto.c           |   11 +-
> >  security/integrity/evm/evm_main.c             |    4 +-
> >  security/integrity/evm/evm_secfs.c            |    2 +-
> >  security/integrity/ima/ima.h                  |   19 +-
> >  security/integrity/ima/ima_api.c              |   10 +-
> >  security/integrity/ima/ima_appraise.c         |   22 +-
> >  security/integrity/ima/ima_asymmetric_keys.c  |    2 +-
> >  security/integrity/ima/ima_main.c             |   28 +-
> >  security/integrity/ima/ima_policy.c           |   17 +-
> >  security/integrity/ima/ima_queue_keys.c       |    2 +-
> >  security/security.c                           |   25 +-
> >  security/selinux/hooks.c                      |   22 +-
> >  security/smack/smack_lsm.c                    |   18 +-
> >  tools/include/uapi/asm-generic/unistd.h       |    4 +-
> >  tools/testing/selftests/Makefile              |    1 +
> >  .../testing/selftests/idmap_mounts/.gitignore |    2 +
> >  tools/testing/selftests/idmap_mounts/Makefile |   12 +
> >  tools/testing/selftests/idmap_mounts/config   |    1 +
> >  tools/testing/selftests/idmap_mounts/core.c   | 3476 +++++++++++++++++
> >  .../testing/selftests/idmap_mounts/internal.h |  127 +
> >  tools/testing/selftests/idmap_mounts/utils.c  |  136 +
> >  tools/testing/selftests/idmap_mounts/utils.h  |   17 +
> >  tools/testing/selftests/idmap_mounts/xattr.c  |  172 +
> >  .../selftests/mount_setattr/.gitignore        |    1 +
> >  .../testing/selftests/mount_setattr/Makefile  |    7 +
> >  tools/testing/selftests/mount_setattr/config  |    1 +
> >  .../mount_setattr/mount_setattr_test.c        |  889 +++++
> >  335 files changed, 7594 insertions(+), 1570 deletions(-)
> >  create mode 100644 tools/testing/selftests/idmap_mounts/.gitignore
> >  create mode 100644 tools/testing/selftests/idmap_mounts/Makefile
> >  create mode 100644 tools/testing/selftests/idmap_mounts/config
> >  create mode 100644 tools/testing/selftests/idmap_mounts/core.c
> >  create mode 100644 tools/testing/selftests/idmap_mounts/internal.h
> >  create mode 100644 tools/testing/selftests/idmap_mounts/utils.c
> >  create mode 100644 tools/testing/selftests/idmap_mounts/utils.h
> >  create mode 100644 tools/testing/selftests/idmap_mounts/xattr.c
> >  create mode 100644 tools/testing/selftests/mount_setattr/.gitignore
> >  create mode 100644 tools/testing/selftests/mount_setattr/Makefile
> >  create mode 100644 tools/testing/selftests/mount_setattr/config
> >  create mode 100644 tools/testing/selftests/mount_setattr/mount_setattr_test.c
> > 
> > 
> > base-commit: 3cea11cd5e3b00d91caf0b4730194039b45c5891
> > -- 
> > 2.29.2
> > 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/39] fs: idmapped mounts
  2020-11-20  9:10   ` Christian Brauner
@ 2020-11-20  9:12     ` Christoph Hellwig
  2020-11-20 11:58       ` Christian Brauner
  0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2020-11-20  9:12 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Darrick J. Wong, Alexander Viro, Christoph Hellwig,
	linux-fsdevel, John Johansen, James Morris, Mimi Zohar,
	Dmitry Kasatkin, Stephen Smalley, Casey Schaufler, Arnd Bergmann,
	Andreas Dilger, OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel,
	Josh Triplett, Andy Lutomirski, Theodore Tso, Alban Crequy,
	Tycho Andersen, David Howells, James Bottomley, Jann Horn,
	Seth Forshee, St??phane Graber, Aleksa Sarai, Lennart Poettering,
	Eric W. Biederman, smbarber, Phil Estes, Serge Hallyn, Kees Cook,
	Todd Kjos, Jonathan Corbet, containers, linux-security-module,
	linux-api, linux-ext4, linux-audit, linux-integrity, selinux

On Fri, Nov 20, 2020 at 10:10:44AM +0100, Christian Brauner wrote:
> Maybe you didn't see this or you're referring to xfstests but this
> series contains a >=4000 lines long test-suite that validates all core
> features with and without idmapped mounts. It's the last patch in this
> version of the series and it's located in:
> tools/testing/selftests/idmap_mounts.
> 
> Everytime a filesystem is added this test-suite will be updated. We
> would perfer if this test would be shipped with the kernel itself and
> not in a separate test-suite such as xfstests. But we're happy to add
> patches for the latter at some point too.

selftests is a complete pain to use, partialy because it is not
integrated with the framework we file system developers use (xfstests)
and partially because having the test suite in the kernel tree really
breaks a lot of the typical use cases.  So I think we'll need to wire
this up in the proper place instead.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 00/39] fs: idmapped mounts
  2020-11-20  9:12     ` Christoph Hellwig
@ 2020-11-20 11:58       ` Christian Brauner
  0 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-20 11:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Alexander Viro, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, St??phane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux

On Fri, Nov 20, 2020 at 09:12:47AM +0000, Christoph Hellwig wrote:
> On Fri, Nov 20, 2020 at 10:10:44AM +0100, Christian Brauner wrote:
> > Maybe you didn't see this or you're referring to xfstests but this
> > series contains a >=4000 lines long test-suite that validates all core
> > features with and without idmapped mounts. It's the last patch in this
> > version of the series and it's located in:
> > tools/testing/selftests/idmap_mounts.
> > 
> > Everytime a filesystem is added this test-suite will be updated. We
> > would perfer if this test would be shipped with the kernel itself and
> > not in a separate test-suite such as xfstests. But we're happy to add
> > patches for the latter at some point too.
> 
> selftests is a complete pain to use, partialy because it is not
> integrated with the framework we file system developers use (xfstests)
> and partially because having the test suite in the kernel tree really
> breaks a lot of the typical use cases.  So I think we'll need to wire
> this up in the proper place instead.

Ok, I think I can basically port the test-suite at the end of this patch
series so that it can be carried in xfstests/src/idmapped_mounts.c

I'll start doing that now.
It would make it a bit easier if we could carry it as a single file for
now.

Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 39/39] tests: add vfs/idmapped mounts test suite
  2020-11-15 10:37 ` [PATCH v2 39/39] tests: add vfs/idmapped mounts test suite Christian Brauner
@ 2020-11-20 21:15   ` Kees Cook
  0 siblings, 0 replies; 63+ messages in thread
From: Kees Cook @ 2020-11-20 21:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Todd Kjos, Jonathan Corbet, containers,
	linux-security-module, linux-api, linux-ext4, linux-audit,
	linux-integrity, selinux, Christoph Hellwig

On Sun, Nov 15, 2020 at 11:37:18AM +0100, Christian Brauner wrote:
> This adds a whole test suite for idmapped mounts but in order to ensure that
> there are no regression for the vfs itself it also includes tests for correct
> functionality on non-idmapped mounts. The following tests are currently
> available with more to come in the future:

Awesome! :)

Some glitches in the build, though... something about the ordering or
the Make rules produces odd results on a failure:

$ make
gcc -g -I../../../../usr/include/ -Wall -O2 -pthread    xattr.c internal.h utils.c utils.h -lcap -o /home/kees/src/linux-build/seccomp/tools/testing/selftests/idmap_mounts/xattr
gcc -g -I../../../../usr/include/ -Wall -O2 -pthread    core.c internal.h utils.c utils.h -lcap -o /home/kees/src/linux-build/seccomp/tools/testing/selftests/idmap_mounts/core
core.c:19:10: fatal error: sys/acl.h: No such file or directory
   19 | #include <sys/acl.h>
      |          ^~~~~~~~~~~
compilation terminated.
make: *** [../lib.mk:139: /home/kees/src/linux-build/seccomp/tools/testing/selftests/idmap_mounts/core]
Error 1
$ make
make: Nothing to be done for 'all'.
$ file xattr core
xattr: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7a3c1951e54f20e657b4181c1be77c7183a54f81, for GNU/Linux 3.2.0, with debug_info, not stripped
core:  GCC precompiled header (version 014) for C

Even after I install libacl1-dev, I still get a "core" file output which
breaks attempts to build again. :)


Is there any way to have the test suite not depend on
__NR_mount_setattr? Running this test on older kernels fails everything.


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 14/39] commoncap: handle idmapped mounts
  2020-11-15 10:36 ` [PATCH v2 14/39] commoncap: " Christian Brauner
@ 2020-11-22 21:18   ` Paul Moore
  2020-11-23  7:45     ` Christian Brauner
  0 siblings, 1 reply; 63+ messages in thread
From: Paul Moore @ 2020-11-22 21:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux, Christoph Hellwig

On Sun, Nov 15, 2020 at 5:39 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> When interacting with user namespace and non-user namespace aware
> filesystem capabilities the vfs will perform various security checks to
> determine whether or not the filesystem capabilities can be used by the
> caller (e.g. during exec), or even whether they need to be removed. The
> main infrastructure for this resides in the capability codepaths but they
> are called through the LSM security infrastructure even though they are not
> technically an LSM or optional. This extends the existing security hooks
> security_inode_removexattr(), security_inode_killpriv(),
> security_inode_getsecurity() to pass down the mount's user namespace and
> makes them aware of idmapped mounts.
> In order to actually get filesystem capabilities from disk the capability
> infrastructure exposes the get_vfs_caps_from_disk() helper. For user
> namespace aware filesystem capabilities a root uid is stored alongside the
> capabilities.
> In order to determine whether the caller can make use of the filesystem
> capability or whether it needs to be ignored it is translated according to
> the superblock's user namespace. If it can be translated to uid 0 according
> to that id mapping the caller can use the filesystem capabilities stored on
> disk. If we are accessing the inode that holds the filesystem capabilities
> through an idmapped mount we need to map the root uid according to the
> mount's user namespace.
> Afterwards the checks are identical to non-idmapped mounts. Reading
> filesystem caps from disk enforces that the root uid associated with the
> filesystem capability must have a mapping in the superblock's user
> namespace and that the caller is either in the same user namespace or is a
> descendant of the superblock's user namespace. For filesystems that are
> mountable inside user namespace the container can just mount the filesystem
> and won't usually need to idmap it. If it does create an idmapped mount it
> can mark it with a user namespace it has created and which is therefore a
> descendant of the s_user_ns. For filesystems that are not mountable inside
> user namespaces the descendant rule is trivially true because the s_user_ns
> will be the initial user namespace.
>
> If the initial user namespace is passed all operations are a nop so
> non-idmapped mounts will not see a change in behavior and will also not see
> any performance impact.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

...

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 8dba8f0983b5..ddb9213a3e81 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -1944,7 +1944,7 @@ static inline int audit_copy_fcaps(struct audit_names *name,
>         if (!dentry)
>                 return 0;
>
> -       rc = get_vfs_caps_from_disk(dentry, &caps);
> +       rc = get_vfs_caps_from_disk(&init_user_ns, dentry, &caps);
>         if (rc)
>                 return rc;
>
> @@ -2495,7 +2495,8 @@ int __audit_log_bprm_fcaps(struct linux_binprm *bprm,
>         ax->d.next = context->aux;
>         context->aux = (void *)ax;
>
> -       get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
> +       get_vfs_caps_from_disk(mnt_user_ns(bprm->file->f_path.mnt),
> +                              bprm->file->f_path.dentry, &vcaps);

As audit currently records information in the context of the
initial/host namespace I'm guessing we don't want the mnt_user_ns()
call above; it seems like &init_user_ns would be the right choice
(similar to audit_copy_fcaps()), yes?

>         ax->fcap.permitted = vcaps.permitted;
>         ax->fcap.inheritable = vcaps.inheritable;

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 31/39] audit: handle idmapped mounts
  2020-11-15 10:37 ` [PATCH v2 31/39] audit: " Christian Brauner
@ 2020-11-22 22:17   ` Paul Moore
  2020-11-23  7:41     ` Christian Brauner
  0 siblings, 1 reply; 63+ messages in thread
From: Paul Moore @ 2020-11-22 22:17 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux

On Sun, Nov 15, 2020 at 5:43 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> Audit will sometimes log the inode's i_uid and i_gid. Enable audit to log the
> mapped inode when it is accessed from an idmapped mount.

I mentioned this in an earlier patch in this patchset, but it is worth
repeating here: audit currently records information in the context of
the initial/host namespace and I believe it should probably stay that
way until the rest of the namespace smarts that Richard is working on
is merged.  If we do change the context of the inode's UID and GID
information it has the potential to create a rather odd looking audit
record with inconsistent credentials and the filters would yield some
very interesting results.

> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---
> /* v2 */
> unchanged
> ---
>  fs/namei.c            | 14 +++++++-------
>  include/linux/audit.h | 10 ++++++----
>  ipc/mqueue.c          |  8 ++++----
>  kernel/auditsc.c      | 26 ++++++++++++++------------
>  4 files changed, 31 insertions(+), 27 deletions(-)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 31/39] audit: handle idmapped mounts
  2020-11-22 22:17   ` Paul Moore
@ 2020-11-23  7:41     ` Christian Brauner
  2020-11-23 22:06       ` Paul Moore
  0 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-23  7:41 UTC (permalink / raw)
  To: Paul Moore
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux

On Sun, Nov 22, 2020 at 05:17:39PM -0500, Paul Moore wrote:
> On Sun, Nov 15, 2020 at 5:43 AM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > Audit will sometimes log the inode's i_uid and i_gid. Enable audit to log the
> > mapped inode when it is accessed from an idmapped mount.
> 
> I mentioned this in an earlier patch in this patchset, but it is worth

I did not receive that message.

> repeating here: audit currently records information in the context of
> the initial/host namespace and I believe it should probably stay that
> way until the rest of the namespace smarts that Richard is working on

Ah, that's good to know and makes the patchset simpler so I'm all for
it.

Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 14/39] commoncap: handle idmapped mounts
  2020-11-22 21:18   ` Paul Moore
@ 2020-11-23  7:45     ` Christian Brauner
  0 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-23  7:45 UTC (permalink / raw)
  To: Paul Moore
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux, Christoph Hellwig

On Sun, Nov 22, 2020 at 04:18:55PM -0500, Paul Moore wrote:
> On Sun, Nov 15, 2020 at 5:39 AM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> > When interacting with user namespace and non-user namespace aware
> > filesystem capabilities the vfs will perform various security checks to
> > determine whether or not the filesystem capabilities can be used by the
> > caller (e.g. during exec), or even whether they need to be removed. The
> > main infrastructure for this resides in the capability codepaths but they
> > are called through the LSM security infrastructure even though they are not
> > technically an LSM or optional. This extends the existing security hooks
> > security_inode_removexattr(), security_inode_killpriv(),
> > security_inode_getsecurity() to pass down the mount's user namespace and
> > makes them aware of idmapped mounts.
> > In order to actually get filesystem capabilities from disk the capability
> > infrastructure exposes the get_vfs_caps_from_disk() helper. For user
> > namespace aware filesystem capabilities a root uid is stored alongside the
> > capabilities.
> > In order to determine whether the caller can make use of the filesystem
> > capability or whether it needs to be ignored it is translated according to
> > the superblock's user namespace. If it can be translated to uid 0 according
> > to that id mapping the caller can use the filesystem capabilities stored on
> > disk. If we are accessing the inode that holds the filesystem capabilities
> > through an idmapped mount we need to map the root uid according to the
> > mount's user namespace.
> > Afterwards the checks are identical to non-idmapped mounts. Reading
> > filesystem caps from disk enforces that the root uid associated with the
> > filesystem capability must have a mapping in the superblock's user
> > namespace and that the caller is either in the same user namespace or is a
> > descendant of the superblock's user namespace. For filesystems that are
> > mountable inside user namespace the container can just mount the filesystem
> > and won't usually need to idmap it. If it does create an idmapped mount it
> > can mark it with a user namespace it has created and which is therefore a
> > descendant of the s_user_ns. For filesystems that are not mountable inside
> > user namespaces the descendant rule is trivially true because the s_user_ns
> > will be the initial user namespace.
> >
> > If the initial user namespace is passed all operations are a nop so
> > non-idmapped mounts will not see a change in behavior and will also not see
> > any performance impact.
> >
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: linux-fsdevel@vger.kernel.org
> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> ...
> 
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 8dba8f0983b5..ddb9213a3e81 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -1944,7 +1944,7 @@ static inline int audit_copy_fcaps(struct audit_names *name,
> >         if (!dentry)
> >                 return 0;
> >
> > -       rc = get_vfs_caps_from_disk(dentry, &caps);
> > +       rc = get_vfs_caps_from_disk(&init_user_ns, dentry, &caps);
> >         if (rc)
> >                 return rc;
> >
> > @@ -2495,7 +2495,8 @@ int __audit_log_bprm_fcaps(struct linux_binprm *bprm,
> >         ax->d.next = context->aux;
> >         context->aux = (void *)ax;
> >
> > -       get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
> > +       get_vfs_caps_from_disk(mnt_user_ns(bprm->file->f_path.mnt),
> > +                              bprm->file->f_path.dentry, &vcaps);
> 
> As audit currently records information in the context of the
> initial/host namespace I'm guessing we don't want the mnt_user_ns()
> call above; it seems like &init_user_ns would be the right choice
> (similar to audit_copy_fcaps()), yes?

Ok, sounds good. It also makes the patchset simpler.
Note that I'm currently not on the audit mailing list so this is likely
not going to show up there.

(Fwiw, I responded to you in your other mail too.)

Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-15 10:36 ` [PATCH v2 07/39] mount: attach mappings to mounts Christian Brauner
@ 2020-11-23 15:47   ` Tycho Andersen
  2020-11-23 16:24     ` Tycho Andersen
  0 siblings, 1 reply; 63+ messages in thread
From: Tycho Andersen @ 2020-11-23 15:47 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel,
	Lennart Poettering, Mimi Zohar, James Bottomley, Andreas Dilger,
	containers, Christoph Hellwig, Jonathan Corbet, smbarber,
	linux-ext4, Mrunal Patel, Kees Cook, Arnd Bergmann, Jann Horn,
	selinux, Josh Triplett, Seth Forshee, Aleksa Sarai,
	Andy Lutomirski, OGAWA Hirofumi, Geoffrey Thomas, David Howells,
	John Johansen, Theodore Tso, Dmitry Kasatkin, Stephen Smalley,
	linux-security-module, linux-audit, Eric W. Biederman, linux-api,
	Casey Schaufler, Alban Crequy, linux-integrity, Todd Kjos

On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote:
> +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
> +{
> +	return mnt->mnt_user_ns;
> +}

I think you might want a READ_ONCE() here. Right now it seems ok, since the
mnt_user_ns can't change, but if we ever allow it to change (and I see you have
a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree
:D), the pattern of,

        user_ns = mnt_user_ns(path->mnt);
        if (mnt_idmapped(path->mnt)) {
                uid = kuid_from_mnt(user_ns, uid);
                gid = kgid_from_mnt(user_ns, gid);
        }

could race.

Tycho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-23 15:47   ` Tycho Andersen
@ 2020-11-23 16:24     ` Tycho Andersen
  2020-11-24 12:30       ` Christian Brauner
  0 siblings, 1 reply; 63+ messages in thread
From: Tycho Andersen @ 2020-11-23 16:24 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andy Lutomirski, Mimi Zohar, James Bottomley, Andreas Dilger,
	Christoph Hellwig, Jonathan Corbet, smbarber, Christoph Hellwig,
	Casey Schaufler, linux-ext4, Mrunal Patel, Kees Cook,
	Arnd Bergmann, Jann Horn, selinux, Josh Triplett, Seth Forshee,
	Aleksa Sarai, Alexander Viro, Lennart Poettering, OGAWA Hirofumi,
	Geoffrey Thomas, David Howells, John Johansen, Theodore Tso,
	Dmitry Kasatkin, containers, linux-security-module, linux-audit,
	Eric W. Biederman, linux-api, linux-fsdevel, Alban Crequy,
	linux-integrity, Stephen Smalley, Todd Kjos

On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote:
> On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote:
> > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
> > +{
> > +	return mnt->mnt_user_ns;
> > +}
> 
> I think you might want a READ_ONCE() here. Right now it seems ok, since the
> mnt_user_ns can't change, but if we ever allow it to change (and I see you have
> a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree
> :D), the pattern of,
> 
>         user_ns = mnt_user_ns(path->mnt);
>         if (mnt_idmapped(path->mnt)) {
>                 uid = kuid_from_mnt(user_ns, uid);
>                 gid = kgid_from_mnt(user_ns, gid);
>         }
> 
> could race.

Actually, isn't a race possible now?

kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */);
WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns);
WRITE_ONCE(m->mnt.mnt_flags, flags);
kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */);

So maybe it should be:

         if (mnt_idmapped(path->mnt)) {
                 barrier();
                 user_ns = mnt_user_ns(path->mnt);
                 uid = kuid_from_mnt(user_ns, uid);
                 gid = kgid_from_mnt(user_ns, gid);
         }

since there's no data dependency between mnt_idmapped() and
mnt_user_ns()?

Tycho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 31/39] audit: handle idmapped mounts
  2020-11-23  7:41     ` Christian Brauner
@ 2020-11-23 22:06       ` Paul Moore
  0 siblings, 0 replies; 63+ messages in thread
From: Paul Moore @ 2020-11-23 22:06 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux

On Mon, Nov 23, 2020 at 2:42 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> On Sun, Nov 22, 2020 at 05:17:39PM -0500, Paul Moore wrote:
> > On Sun, Nov 15, 2020 at 5:43 AM Christian Brauner
> > <christian.brauner@ubuntu.com> wrote:
> > >
> > > Audit will sometimes log the inode's i_uid and i_gid. Enable audit to log the
> > > mapped inode when it is accessed from an idmapped mount.
> >
> > I mentioned this in an earlier patch in this patchset, but it is worth
>
> I did not receive that message.

I'm guessing just a slow mail relay somewhere as you responded to both
of my emails on this patchset, I think we're all set for now :)

Thanks.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-23 16:24     ` Tycho Andersen
@ 2020-11-24 12:30       ` Christian Brauner
  2020-11-24 13:37         ` Tycho Andersen
  0 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-24 12:30 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Andy Lutomirski, Mimi Zohar, James Bottomley, Andreas Dilger,
	Christoph Hellwig, Jonathan Corbet, smbarber, Christoph Hellwig,
	Casey Schaufler, linux-ext4, Mrunal Patel, Kees Cook,
	Arnd Bergmann, Jann Horn, selinux, Josh Triplett, Seth Forshee,
	Aleksa Sarai, Alexander Viro, Lennart Poettering, OGAWA Hirofumi,
	Geoffrey Thomas, David Howells, John Johansen, Theodore Tso,
	Dmitry Kasatkin, containers, linux-security-module, linux-audit,
	Eric W. Biederman, linux-api, linux-fsdevel, Alban Crequy,
	linux-integrity, Stephen Smalley, Todd Kjos

On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote:
> On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote:
> > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote:
> > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
> > > +{
> > > +	return mnt->mnt_user_ns;
> > > +}
> > 
> > I think you might want a READ_ONCE() here. Right now it seems ok, since the
> > mnt_user_ns can't change, but if we ever allow it to change (and I see you have
> > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree
> > :D), the pattern of,
> > 
> >         user_ns = mnt_user_ns(path->mnt);
> >         if (mnt_idmapped(path->mnt)) {
> >                 uid = kuid_from_mnt(user_ns, uid);
> >                 gid = kgid_from_mnt(user_ns, gid);
> >         }
> > 
> > could race.
> 
> Actually, isn't a race possible now?
> 
> kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */);
> WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns);
> WRITE_ONCE(m->mnt.mnt_flags, flags);
> kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */);
> 
> So maybe it should be:
> 
>          if (mnt_idmapped(path->mnt)) {
>                  barrier();
>                  user_ns = mnt_user_ns(path->mnt);
>                  uid = kuid_from_mnt(user_ns, uid);
>                  gid = kgid_from_mnt(user_ns, gid);
>          }
> 
> since there's no data dependency between mnt_idmapped() and
> mnt_user_ns()?

I think I had something to handle this case in another branch of mine.
The READ_ONCE() you mentioned in another patch I had originally dropped
because I wasn't sure whether it works on pointers but after talking to
Jann and David it seems that it handles pointers fine.
Let me take a look and fix it in the next version. I just finished
porting the test suite to xfstests as Christoph requested and I'm
looking at this now.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-24 12:30       ` Christian Brauner
@ 2020-11-24 13:37         ` Tycho Andersen
  2020-11-24 13:40           ` Christian Brauner
  0 siblings, 1 reply; 63+ messages in thread
From: Tycho Andersen @ 2020-11-24 13:37 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andy Lutomirski, Mimi Zohar, James Bottomley, Andreas Dilger,
	Christoph Hellwig, Jonathan Corbet, smbarber, Christoph Hellwig,
	Casey Schaufler, linux-ext4, Mrunal Patel, Kees Cook,
	Arnd Bergmann, Jann Horn, selinux, Josh Triplett, Seth Forshee,
	Aleksa Sarai, Alexander Viro, Lennart Poettering, OGAWA Hirofumi,
	Geoffrey Thomas, David Howells, John Johansen, Theodore Tso,
	Dmitry Kasatkin, containers, linux-security-module, linux-audit,
	Eric W. Biederman, linux-api, linux-fsdevel, Alban Crequy,
	linux-integrity, Stephen Smalley, Todd Kjos

On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote:
> On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote:
> > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote:
> > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote:
> > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
> > > > +{
> > > > +	return mnt->mnt_user_ns;
> > > > +}
> > > 
> > > I think you might want a READ_ONCE() here. Right now it seems ok, since the
> > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have
> > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree
> > > :D), the pattern of,
> > > 
> > >         user_ns = mnt_user_ns(path->mnt);
> > >         if (mnt_idmapped(path->mnt)) {
> > >                 uid = kuid_from_mnt(user_ns, uid);
> > >                 gid = kgid_from_mnt(user_ns, gid);
> > >         }
> > > 
> > > could race.
> > 
> > Actually, isn't a race possible now?
> > 
> > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */);
> > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns);
> > WRITE_ONCE(m->mnt.mnt_flags, flags);
> > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */);
> > 
> > So maybe it should be:
> > 
> >          if (mnt_idmapped(path->mnt)) {
> >                  barrier();
> >                  user_ns = mnt_user_ns(path->mnt);
> >                  uid = kuid_from_mnt(user_ns, uid);
> >                  gid = kgid_from_mnt(user_ns, gid);
> >          }
> > 
> > since there's no data dependency between mnt_idmapped() and
> > mnt_user_ns()?
> 
> I think I had something to handle this case in another branch of mine.
> The READ_ONCE() you mentioned in another patch I had originally dropped
> because I wasn't sure whether it works on pointers but after talking to
> Jann and David it seems that it handles pointers fine.
> Let me take a look and fix it in the next version. I just finished
> porting the test suite to xfstests as Christoph requested and I'm
> looking at this now.

Another way would be to just have mnt_idmapped() test
mnt_user_ns() != &init_user_ns instead of the flags; then I think you
get the data dependency and thus correct ordering for free.

Tycho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-24 13:37         ` Tycho Andersen
@ 2020-11-24 13:40           ` Christian Brauner
  2020-11-24 13:44             ` Tycho Andersen
  0 siblings, 1 reply; 63+ messages in thread
From: Christian Brauner @ 2020-11-24 13:40 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Andy Lutomirski, Mimi Zohar, James Bottomley, Andreas Dilger,
	Christoph Hellwig, Jonathan Corbet, smbarber, Christoph Hellwig,
	Casey Schaufler, linux-ext4, Mrunal Patel, Kees Cook,
	Arnd Bergmann, Jann Horn, selinux, Josh Triplett, Seth Forshee,
	Aleksa Sarai, Alexander Viro, Lennart Poettering, OGAWA Hirofumi,
	Geoffrey Thomas, David Howells, John Johansen, Theodore Tso,
	Dmitry Kasatkin, containers, linux-security-module, linux-audit,
	Eric W. Biederman, linux-api, linux-fsdevel, Alban Crequy,
	linux-integrity, Stephen Smalley, Todd Kjos

On Tue, Nov 24, 2020 at 08:37:40AM -0500, Tycho Andersen wrote:
> On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote:
> > On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote:
> > > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote:
> > > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote:
> > > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
> > > > > +{
> > > > > +	return mnt->mnt_user_ns;
> > > > > +}
> > > > 
> > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the
> > > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have
> > > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree
> > > > :D), the pattern of,
> > > > 
> > > >         user_ns = mnt_user_ns(path->mnt);
> > > >         if (mnt_idmapped(path->mnt)) {
> > > >                 uid = kuid_from_mnt(user_ns, uid);
> > > >                 gid = kgid_from_mnt(user_ns, gid);
> > > >         }
> > > > 
> > > > could race.
> > > 
> > > Actually, isn't a race possible now?
> > > 
> > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */);
> > > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns);
> > > WRITE_ONCE(m->mnt.mnt_flags, flags);
> > > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */);
> > > 
> > > So maybe it should be:
> > > 
> > >          if (mnt_idmapped(path->mnt)) {
> > >                  barrier();
> > >                  user_ns = mnt_user_ns(path->mnt);
> > >                  uid = kuid_from_mnt(user_ns, uid);
> > >                  gid = kgid_from_mnt(user_ns, gid);
> > >          }
> > > 
> > > since there's no data dependency between mnt_idmapped() and
> > > mnt_user_ns()?
> > 
> > I think I had something to handle this case in another branch of mine.
> > The READ_ONCE() you mentioned in another patch I had originally dropped
> > because I wasn't sure whether it works on pointers but after talking to
> > Jann and David it seems that it handles pointers fine.
> > Let me take a look and fix it in the next version. I just finished
> > porting the test suite to xfstests as Christoph requested and I'm
> > looking at this now.
> 
> Another way would be to just have mnt_idmapped() test
> mnt_user_ns() != &init_user_ns instead of the flags; then I think you
> get the data dependency and thus correct ordering for free.

I indeed dropped mnt_idmapped() which is unnecessary. :)
I think we should still use smp_store_release() in mnt_user_ns() paired
with smp_load_acquire() in do_idmap_mount() thought.

Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-24 13:40           ` Christian Brauner
@ 2020-11-24 13:44             ` Tycho Andersen
  2020-11-24 13:59               ` Christian Brauner
  0 siblings, 1 reply; 63+ messages in thread
From: Tycho Andersen @ 2020-11-24 13:44 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andy Lutomirski, Mimi Zohar, James Bottomley, Andreas Dilger,
	Christoph Hellwig, Jonathan Corbet, smbarber, Christoph Hellwig,
	Casey Schaufler, linux-ext4, Mrunal Patel, Kees Cook,
	Arnd Bergmann, Jann Horn, selinux, Josh Triplett, Seth Forshee,
	Aleksa Sarai, Alexander Viro, Lennart Poettering, OGAWA Hirofumi,
	Geoffrey Thomas, David Howells, John Johansen, Theodore Tso,
	Dmitry Kasatkin, containers, linux-security-module, linux-audit,
	Eric W. Biederman, linux-api, linux-fsdevel, Alban Crequy,
	linux-integrity, Stephen Smalley, Todd Kjos

On Tue, Nov 24, 2020 at 02:40:35PM +0100, Christian Brauner wrote:
> On Tue, Nov 24, 2020 at 08:37:40AM -0500, Tycho Andersen wrote:
> > On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote:
> > > On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote:
> > > > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote:
> > > > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote:
> > > > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
> > > > > > +{
> > > > > > +	return mnt->mnt_user_ns;
> > > > > > +}
> > > > > 
> > > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the
> > > > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have
> > > > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree
> > > > > :D), the pattern of,
> > > > > 
> > > > >         user_ns = mnt_user_ns(path->mnt);
> > > > >         if (mnt_idmapped(path->mnt)) {
> > > > >                 uid = kuid_from_mnt(user_ns, uid);
> > > > >                 gid = kgid_from_mnt(user_ns, gid);
> > > > >         }
> > > > > 
> > > > > could race.
> > > > 
> > > > Actually, isn't a race possible now?
> > > > 
> > > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */);
> > > > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns);
> > > > WRITE_ONCE(m->mnt.mnt_flags, flags);
> > > > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */);
> > > > 
> > > > So maybe it should be:
> > > > 
> > > >          if (mnt_idmapped(path->mnt)) {
> > > >                  barrier();
> > > >                  user_ns = mnt_user_ns(path->mnt);
> > > >                  uid = kuid_from_mnt(user_ns, uid);
> > > >                  gid = kgid_from_mnt(user_ns, gid);
> > > >          }
> > > > 
> > > > since there's no data dependency between mnt_idmapped() and
> > > > mnt_user_ns()?
> > > 
> > > I think I had something to handle this case in another branch of mine.
> > > The READ_ONCE() you mentioned in another patch I had originally dropped
> > > because I wasn't sure whether it works on pointers but after talking to
> > > Jann and David it seems that it handles pointers fine.
> > > Let me take a look and fix it in the next version. I just finished
> > > porting the test suite to xfstests as Christoph requested and I'm
> > > looking at this now.
> > 
> > Another way would be to just have mnt_idmapped() test
> > mnt_user_ns() != &init_user_ns instead of the flags; then I think you
> > get the data dependency and thus correct ordering for free.
> 
> I indeed dropped mnt_idmapped() which is unnecessary. :)

It still might be a nice helper to prevent people from checking the
flags and forgetting that there's a memory ordering issue, though.

> I think we should still use smp_store_release() in mnt_user_ns() paired
> with smp_load_acquire() in do_idmap_mount() thought.

Sounds reasonable.

Tycho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 07/39] mount: attach mappings to mounts
  2020-11-24 13:44             ` Tycho Andersen
@ 2020-11-24 13:59               ` Christian Brauner
  0 siblings, 0 replies; 63+ messages in thread
From: Christian Brauner @ 2020-11-24 13:59 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Andy Lutomirski, Mimi Zohar, James Bottomley, Andreas Dilger,
	Christoph Hellwig, Jonathan Corbet, smbarber, Christoph Hellwig,
	Casey Schaufler, linux-ext4, Mrunal Patel, Kees Cook,
	Arnd Bergmann, Jann Horn, selinux, Josh Triplett, Seth Forshee,
	Aleksa Sarai, Alexander Viro, Lennart Poettering, OGAWA Hirofumi,
	Geoffrey Thomas, David Howells, John Johansen, Theodore Tso,
	Dmitry Kasatkin, containers, linux-security-module, linux-audit,
	Eric W. Biederman, linux-api, linux-fsdevel, Alban Crequy,
	linux-integrity, Stephen Smalley, Todd Kjos

On Tue, Nov 24, 2020 at 08:44:59AM -0500, Tycho Andersen wrote:
> On Tue, Nov 24, 2020 at 02:40:35PM +0100, Christian Brauner wrote:
> > On Tue, Nov 24, 2020 at 08:37:40AM -0500, Tycho Andersen wrote:
> > > On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote:
> > > > On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote:
> > > > > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote:
> > > > > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote:
> > > > > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
> > > > > > > +{
> > > > > > > +	return mnt->mnt_user_ns;
> > > > > > > +}
> > > > > > 
> > > > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the
> > > > > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have
> > > > > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree
> > > > > > :D), the pattern of,
> > > > > > 
> > > > > >         user_ns = mnt_user_ns(path->mnt);
> > > > > >         if (mnt_idmapped(path->mnt)) {
> > > > > >                 uid = kuid_from_mnt(user_ns, uid);
> > > > > >                 gid = kgid_from_mnt(user_ns, gid);
> > > > > >         }
> > > > > > 
> > > > > > could race.
> > > > > 
> > > > > Actually, isn't a race possible now?
> > > > > 
> > > > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */);
> > > > > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns);
> > > > > WRITE_ONCE(m->mnt.mnt_flags, flags);
> > > > > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */);
> > > > > 
> > > > > So maybe it should be:
> > > > > 
> > > > >          if (mnt_idmapped(path->mnt)) {
> > > > >                  barrier();
> > > > >                  user_ns = mnt_user_ns(path->mnt);
> > > > >                  uid = kuid_from_mnt(user_ns, uid);
> > > > >                  gid = kgid_from_mnt(user_ns, gid);
> > > > >          }
> > > > > 
> > > > > since there's no data dependency between mnt_idmapped() and
> > > > > mnt_user_ns()?
> > > > 
> > > > I think I had something to handle this case in another branch of mine.
> > > > The READ_ONCE() you mentioned in another patch I had originally dropped
> > > > because I wasn't sure whether it works on pointers but after talking to
> > > > Jann and David it seems that it handles pointers fine.
> > > > Let me take a look and fix it in the next version. I just finished
> > > > porting the test suite to xfstests as Christoph requested and I'm
> > > > looking at this now.
> > > 
> > > Another way would be to just have mnt_idmapped() test
> > > mnt_user_ns() != &init_user_ns instead of the flags; then I think you
> > > get the data dependency and thus correct ordering for free.
> > 
> > I indeed dropped mnt_idmapped() which is unnecessary. :)
> 
> It still might be a nice helper to prevent people from checking the
> flags and forgetting that there's a memory ordering issue, though.

I just mentioned this offline but for the record: the flag is gone since
we can rely on the pointer alone. :)

Christian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v2 10/39] inode: add idmapped mount aware init and permission helpers
  2020-11-15 10:36 ` [PATCH v2 10/39] inode: add idmapped mount aware init and " Christian Brauner
@ 2020-11-28 18:12   ` Serge E. Hallyn
  0 siblings, 0 replies; 63+ messages in thread
From: Serge E. Hallyn @ 2020-11-28 18:12 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel, John Johansen,
	James Morris, Mimi Zohar, Dmitry Kasatkin, Stephen Smalley,
	Casey Schaufler, Arnd Bergmann, Andreas Dilger, OGAWA Hirofumi,
	Geoffrey Thomas, Mrunal Patel, Josh Triplett, Andy Lutomirski,
	Theodore Tso, Alban Crequy, Tycho Andersen, David Howells,
	James Bottomley, Jann Horn, Seth Forshee, Stéphane Graber,
	Aleksa Sarai, Lennart Poettering, Eric W. Biederman, smbarber,
	Phil Estes, Serge Hallyn, Kees Cook, Todd Kjos, Jonathan Corbet,
	containers, linux-security-module, linux-api, linux-ext4,
	linux-audit, linux-integrity, selinux, Christoph Hellwig

On Sun, Nov 15, 2020 at 11:36:49AM +0100, Christian Brauner wrote:
> The inode_owner_or_capable() helper determines whether the caller is the
> owner of the inode or is capable with respect to that inode. Allow it to
> handle idmapped mounts. If the inode is accessed through an idmapped mount
> we first need to map it according to the mount's user namespace.
> Afterwards the checks are identical to non-idmapped mounts. If the initial
> user namespace is passed all operations are a nop so non-idmapped mounts
> will not see a change in behavior and will not see any performance impact.
> 
> Similarly, we allow the inode_init_owner() helper to handle idmapped
> mounts. It initializes a new inode on idmapped mounts by mapping the fsuid
> and fsgid of the caller from the mount's user namespace. If the initial
> user namespace is passed all operations are a nop so non-idmapped mounts
> will not see a change in behavior and will also not see any performance
> impact.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---
> /* v2 */
> - Christoph Hellwig:
>   - Don't pollute the vfs with additional helpers simply extend the existing
>     helpers with an additional argument and switch all callers.
> ---
>  fs/9p/acl.c                  |  2 +-
>  fs/9p/vfs_inode.c            |  2 +-
>  fs/attr.c                    |  6 +++---
>  fs/bfs/dir.c                 |  2 +-
>  fs/btrfs/inode.c             |  2 +-
>  fs/btrfs/ioctl.c             | 10 +++++-----
>  fs/btrfs/tests/btrfs-tests.c |  2 +-
>  fs/crypto/policy.c           |  2 +-
>  fs/efivarfs/file.c           |  2 +-
>  fs/ext2/ialloc.c             |  2 +-
>  fs/ext2/ioctl.c              |  6 +++---
>  fs/ext4/ialloc.c             |  2 +-
>  fs/ext4/ioctl.c              | 14 +++++++-------
>  fs/f2fs/file.c               | 14 +++++++-------
>  fs/f2fs/namei.c              |  2 +-
>  fs/f2fs/xattr.c              |  2 +-
>  fs/fcntl.c                   |  2 +-
>  fs/gfs2/file.c               |  2 +-
>  fs/hfsplus/inode.c           |  2 +-
>  fs/hfsplus/ioctl.c           |  2 +-
>  fs/hugetlbfs/inode.c         |  2 +-
>  fs/inode.c                   | 23 ++++++++++++++---------
>  fs/jfs/ioctl.c               |  2 +-
>  fs/jfs/jfs_inode.c           |  2 +-
>  fs/minix/bitmap.c            |  2 +-
>  fs/namei.c                   |  4 ++--
>  fs/nilfs2/inode.c            |  2 +-
>  fs/nilfs2/ioctl.c            |  2 +-
>  fs/ocfs2/dlmfs/dlmfs.c       |  4 ++--
>  fs/ocfs2/ioctl.c             |  2 +-
>  fs/ocfs2/namei.c             |  2 +-
>  fs/omfs/inode.c              |  2 +-
>  fs/overlayfs/dir.c           |  2 +-
>  fs/overlayfs/file.c          |  4 ++--
>  fs/overlayfs/super.c         |  2 +-
>  fs/overlayfs/util.c          |  2 +-
>  fs/posix_acl.c               |  2 +-
>  fs/ramfs/inode.c             |  2 +-
>  fs/reiserfs/ioctl.c          |  4 ++--
>  fs/reiserfs/namei.c          |  2 +-
>  fs/sysv/ialloc.c             |  2 +-
>  fs/ubifs/dir.c               |  2 +-
>  fs/ubifs/ioctl.c             |  2 +-
>  fs/udf/ialloc.c              |  2 +-
>  fs/ufs/ialloc.c              |  2 +-
>  fs/xattr.c                   |  2 +-
>  fs/xfs/xfs_ioctl.c           |  2 +-
>  fs/zonefs/super.c            |  2 +-
>  include/linux/fs.h           |  7 ++++---
>  kernel/bpf/inode.c           |  2 +-
>  mm/madvise.c                 |  2 +-
>  mm/mincore.c                 |  2 +-
>  mm/shmem.c                   |  2 +-
>  security/selinux/hooks.c     |  4 ++--
>  54 files changed, 95 insertions(+), 89 deletions(-)
> 
> diff --git a/fs/9p/acl.c b/fs/9p/acl.c
> index 6261719f6f2a..d77b28e8d57a 100644
> --- a/fs/9p/acl.c
> +++ b/fs/9p/acl.c
> @@ -258,7 +258,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler,
>  
>  	if (S_ISLNK(inode->i_mode))
>  		return -EOPNOTSUPP;
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  	if (value) {
>  		/* update the cached acl value */
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index ae0c38ad1fcb..f058e89df30f 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -251,7 +251,7 @@ int v9fs_init_inode(struct v9fs_session_info *v9ses,
>  {
>  	int err = 0;
>  
> -	inode_init_owner(inode, NULL, mode);
> +	inode_init_owner(inode, &init_user_ns, NULL, mode);
>  	inode->i_blocks = 0;
>  	inode->i_rdev = rdev;
>  	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
> diff --git a/fs/attr.c b/fs/attr.c
> index c9e29e589cec..00ae0b000146 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -87,7 +87,7 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
>  
>  	/* Make sure a caller can chmod. */
>  	if (ia_valid & ATTR_MODE) {
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EPERM;
>  		/* Also check the setgid bit! */
>  		if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
> @@ -98,7 +98,7 @@ int setattr_prepare(struct dentry *dentry, struct iattr *attr)
>  
>  	/* Check for setting the inode time. */
>  	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET | ATTR_TIMES_SET)) {
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EPERM;
>  	}
>  
> @@ -243,7 +243,7 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de
>  		if (IS_IMMUTABLE(inode))
>  			return -EPERM;
>  
> -		if (!inode_owner_or_capable(inode)) {
> +		if (!inode_owner_or_capable(&init_user_ns, inode)) {
>  			error = inode_permission(&init_user_ns, inode,
>  						 MAY_WRITE);
>  			if (error)
> diff --git a/fs/bfs/dir.c b/fs/bfs/dir.c
> index d8dfe3a0cb39..c5ae76a87be5 100644
> --- a/fs/bfs/dir.c
> +++ b/fs/bfs/dir.c
> @@ -96,7 +96,7 @@ static int bfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
>  	}
>  	set_bit(ino, info->si_imap);
>  	info->si_freei--;
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
>  	inode->i_blocks = 0;
>  	inode->i_op = &bfs_file_inops;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 32e3bf88d4f7..ed1a5bf5f068 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6015,7 +6015,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
>  	if (ret != 0)
>  		goto fail_unlock;
>  
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	inode_set_bytes(inode, 0);
>  
>  	inode->i_mtime = current_time(inode);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 771ee08920ed..39f25b5d06ed 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -205,7 +205,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
>  	const char *comp = NULL;
>  	u32 binode_flags;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  
>  	if (btrfs_root_readonly(root))
> @@ -417,7 +417,7 @@ static int btrfs_ioctl_fssetxattr(struct file *file, void __user *arg)
>  	unsigned old_i_flags;
>  	int ret = 0;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  
>  	if (btrfs_root_readonly(root))
> @@ -1813,7 +1813,7 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
>  			btrfs_info(BTRFS_I(file_inode(file))->root->fs_info,
>  				   "Snapshot src from another FS");
>  			ret = -EXDEV;
> -		} else if (!inode_owner_or_capable(src_inode)) {
> +		} else if (!inode_owner_or_capable(&init_user_ns, src_inode)) {
>  			/*
>  			 * Subvolume creation is not restricted, but snapshots
>  			 * are limited to own subvolumes only
> @@ -1933,7 +1933,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>  	u64 flags;
>  	int ret = 0;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  
>  	ret = mnt_want_write_file(file);
> @@ -4403,7 +4403,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file,
>  	int ret = 0;
>  	int received_uuid_changed;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  
>  	ret = mnt_want_write_file(file);
> diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
> index 999c14e5d0bd..ac8e604d44e3 100644
> --- a/fs/btrfs/tests/btrfs-tests.c
> +++ b/fs/btrfs/tests/btrfs-tests.c
> @@ -56,7 +56,7 @@ struct inode *btrfs_new_test_inode(void)
>  
>  	inode = new_inode(test_mnt->mnt_sb);
>  	if (inode)
> -		inode_init_owner(inode, NULL, S_IFREG);
> +		inode_init_owner(inode, &init_user_ns, NULL, S_IFREG);
>  
>  	return inode;
>  }
> diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
> index 4441d9944b9e..6ddd9f0d8b36 100644
> --- a/fs/crypto/policy.c
> +++ b/fs/crypto/policy.c
> @@ -462,7 +462,7 @@ int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg)
>  		return -EFAULT;
>  	policy.version = version;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	ret = mnt_want_write_file(filp);
> diff --git a/fs/efivarfs/file.c b/fs/efivarfs/file.c
> index feaa5e182b7b..e6bc0302643b 100644
> --- a/fs/efivarfs/file.c
> +++ b/fs/efivarfs/file.c
> @@ -137,7 +137,7 @@ efivarfs_ioc_setxflags(struct file *file, void __user *arg)
>  	unsigned int oldflags = efivarfs_getflags(inode);
>  	int error;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	if (copy_from_user(&flags, arg, sizeof(flags)))
> diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
> index 432c3febea6d..5081f2dd8a20 100644
> --- a/fs/ext2/ialloc.c
> +++ b/fs/ext2/ialloc.c
> @@ -551,7 +551,7 @@ struct inode *ext2_new_inode(struct inode *dir, umode_t mode,
>  		inode->i_uid = current_fsuid();
>  		inode->i_gid = dir->i_gid;
>  	} else
> -		inode_init_owner(inode, dir, mode);
> +		inode_init_owner(inode, &init_user_ns, dir, mode);
>  
>  	inode->i_ino = ino;
>  	inode->i_blocks = 0;
> diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
> index 32a8d10b579d..b399cbb7022d 100644
> --- a/fs/ext2/ioctl.c
> +++ b/fs/ext2/ioctl.c
> @@ -39,7 +39,7 @@ long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		if (ret)
>  			return ret;
>  
> -		if (!inode_owner_or_capable(inode)) {
> +		if (!inode_owner_or_capable(&init_user_ns, inode)) {
>  			ret = -EACCES;
>  			goto setflags_out;
>  		}
> @@ -84,7 +84,7 @@ long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	case EXT2_IOC_SETVERSION: {
>  		__u32 generation;
>  
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EPERM;
>  		ret = mnt_want_write_file(filp);
>  		if (ret)
> @@ -117,7 +117,7 @@ long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
>  			return -ENOTTY;
>  
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EACCES;
>  
>  		if (get_user(rsv_window_size, (int __user *)arg))
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index b215c564bc31..d91f69282311 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -972,7 +972,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
>  		inode->i_uid = current_fsuid();
>  		inode->i_gid = dir->i_gid;
>  	} else
> -		inode_init_owner(inode, dir, mode);
> +		inode_init_owner(inode, &init_user_ns, dir, mode);
>  
>  	if (ext4_has_feature_project(sb) &&
>  	    ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT))
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index f0381876a7e5..e35aba820254 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -139,7 +139,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
>  	}
>  
>  	if (IS_RDONLY(inode) || IS_APPEND(inode) || IS_IMMUTABLE(inode) ||
> -	    !inode_owner_or_capable(inode) || !capable(CAP_SYS_ADMIN)) {
> +	    !inode_owner_or_capable(&init_user_ns, inode) || !capable(CAP_SYS_ADMIN)) {
>  		err = -EPERM;
>  		goto journal_err_out;
>  	}
> @@ -829,7 +829,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	case FS_IOC_SETFLAGS: {
>  		int err;
>  
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EACCES;
>  
>  		if (get_user(flags, (int __user *) arg))
> @@ -871,7 +871,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		__u32 generation;
>  		int err;
>  
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EPERM;
>  
>  		if (ext4_has_metadata_csum(inode->i_sb)) {
> @@ -1010,7 +1010,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	case EXT4_IOC_MIGRATE:
>  	{
>  		int err;
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EACCES;
>  
>  		err = mnt_want_write_file(filp);
> @@ -1032,7 +1032,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	case EXT4_IOC_ALLOC_DA_BLKS:
>  	{
>  		int err;
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EACCES;
>  
>  		err = mnt_want_write_file(filp);
> @@ -1214,7 +1214,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  
>  	case EXT4_IOC_CLEAR_ES_CACHE:
>  	{
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EACCES;
>  		ext4_clear_inode_es(inode);
>  		return 0;
> @@ -1260,7 +1260,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  			return -EFAULT;
>  
>  		/* Make sure caller has proper permission */
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EACCES;
>  
>  		if (fa.fsx_xflags & ~EXT4_SUPPORTED_FS_XFLAGS)
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index ee861c6d9ff0..333442e96cc4 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1955,7 +1955,7 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
>  	u32 iflags;
>  	int ret;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	if (get_user(fsflags, (int __user *)arg))
> @@ -2002,7 +2002,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
>  	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>  	int ret;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	if (!S_ISREG(inode->i_mode))
> @@ -2069,7 +2069,7 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp)
>  	struct inode *inode = file_inode(filp);
>  	int ret;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	ret = mnt_want_write_file(filp);
> @@ -2111,7 +2111,7 @@ static int f2fs_ioc_start_volatile_write(struct file *filp)
>  	struct inode *inode = file_inode(filp);
>  	int ret;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	if (!S_ISREG(inode->i_mode))
> @@ -2146,7 +2146,7 @@ static int f2fs_ioc_release_volatile_write(struct file *filp)
>  	struct inode *inode = file_inode(filp);
>  	int ret;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	ret = mnt_want_write_file(filp);
> @@ -2175,7 +2175,7 @@ static int f2fs_ioc_abort_volatile_write(struct file *filp)
>  	struct inode *inode = file_inode(filp);
>  	int ret;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	ret = mnt_want_write_file(filp);
> @@ -3153,7 +3153,7 @@ static int f2fs_ioc_fssetxattr(struct file *filp, unsigned long arg)
>  		return -EFAULT;
>  
>  	/* Make sure caller has proper permission */
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	if (fa.fsx_xflags & ~F2FS_SUPPORTED_XFLAGS)
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> index 8fa37d1434de..66b522e61e50 100644
> --- a/fs/f2fs/namei.c
> +++ b/fs/f2fs/namei.c
> @@ -46,7 +46,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
>  
>  	nid_free = true;
>  
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  
>  	inode->i_ino = ino;
>  	inode->i_blocks = 0;
> diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
> index 65afcc3cc68a..d772bf13a814 100644
> --- a/fs/f2fs/xattr.c
> +++ b/fs/f2fs/xattr.c
> @@ -114,7 +114,7 @@ static int f2fs_xattr_advise_set(const struct xattr_handler *handler,
>  	unsigned char old_advise = F2FS_I(inode)->i_advise;
>  	unsigned char new_advise;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  	if (value == NULL)
>  		return -EINVAL;
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 19ac5baad50f..df091d435603 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -46,7 +46,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
>  
>  	/* O_NOATIME can only be set by the owner or superuser */
>  	if ((arg & O_NOATIME) && !(filp->f_flags & O_NOATIME))
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EPERM;
>  
>  	/* required for strict SunOS emulation */
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index b39b339feddc..1d994bdfffaa 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -238,7 +238,7 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask,
>  		goto out;
>  
>  	error = -EACCES;
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		goto out;
>  
>  	error = 0;
> diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
> index e3da9e96b835..02d51cbcff04 100644
> --- a/fs/hfsplus/inode.c
> +++ b/fs/hfsplus/inode.c
> @@ -376,7 +376,7 @@ struct inode *hfsplus_new_inode(struct super_block *sb, struct inode *dir,
>  		return NULL;
>  
>  	inode->i_ino = sbi->next_cnid++;
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	set_nlink(inode, 1);
>  	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
>  
> diff --git a/fs/hfsplus/ioctl.c b/fs/hfsplus/ioctl.c
> index ce15b9496b77..3edb1926d127 100644
> --- a/fs/hfsplus/ioctl.c
> +++ b/fs/hfsplus/ioctl.c
> @@ -91,7 +91,7 @@ static int hfsplus_ioctl_setflags(struct file *file, int __user *user_flags)
>  	if (err)
>  		goto out;
>  
> -	if (!inode_owner_or_capable(inode)) {
> +	if (!inode_owner_or_capable(&init_user_ns, inode)) {
>  		err = -EACCES;
>  		goto out_drop_write;
>  	}
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..fed6ddfc3f3a 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -836,7 +836,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
>  		struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
>  
>  		inode->i_ino = get_next_ino();
> -		inode_init_owner(inode, dir, mode);
> +		inode_init_owner(inode, &init_user_ns, dir, mode);
>  		lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
>  				&hugetlbfs_i_mmap_rwsem_key);
>  		inode->i_mapping->a_ops = &hugetlbfs_aops;
> diff --git a/fs/inode.c b/fs/inode.c
> index 7a15372d9c2d..d6dfa876c58d 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2132,13 +2132,14 @@ EXPORT_SYMBOL(init_special_inode);
>  /**
>   * inode_init_owner - Init uid,gid,mode for new inode according to posix standards
>   * @inode: New inode
> + * @user_ns: User namespace the inode is accessed from
>   * @dir: Directory inode
>   * @mode: mode of the new inode
>   */
> -void inode_init_owner(struct inode *inode, const struct inode *dir,
> -			umode_t mode)
> +void inode_init_owner(struct inode *inode, struct user_namespace *user_ns,
> +		      const struct inode *dir, umode_t mode)
>  {
> -	inode->i_uid = current_fsuid();
> +	inode->i_uid = fsuid_into_mnt(user_ns);
>  	if (dir && dir->i_mode & S_ISGID) {
>  		inode->i_gid = dir->i_gid;
>  
> @@ -2146,31 +2147,35 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
>  		if (S_ISDIR(mode))
>  			mode |= S_ISGID;
>  		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
> -			 !in_group_p(inode->i_gid) &&
> -			 !capable_wrt_inode_uidgid(&init_user_ns, dir, CAP_FSETID))
> +			 !in_group_p(i_gid_into_mnt(user_ns, inode)) &&
> +			 !capable_wrt_inode_uidgid(user_ns, dir, CAP_FSETID))
>  			mode &= ~S_ISGID;
>  	} else
> -		inode->i_gid = current_fsgid();
> +		inode->i_gid = fsgid_into_mnt(user_ns);
>  	inode->i_mode = mode;
>  }
>  EXPORT_SYMBOL(inode_init_owner);
>  
>  /**
>   * inode_owner_or_capable - check current task permissions to inode
> + * @user_ns: User namespace the inode is accessed from
>   * @inode: inode being checked
>   *
>   * Return true if current either has CAP_FOWNER in a namespace with the
>   * inode owner uid mapped, or owns the file.
>   */
> -bool inode_owner_or_capable(const struct inode *inode)
> +bool inode_owner_or_capable(struct user_namespace *user_ns,
> +			    const struct inode *inode)
>  {
> +	kuid_t i_uid;
>  	struct user_namespace *ns;
>  
> -	if (uid_eq(current_fsuid(), inode->i_uid))
> +	i_uid = i_uid_into_mnt(user_ns, inode);

Is there a way to end up in a situation where current_fsuid() is
INVALID_UID?  The only way I can think of is to enter a userns
which requires CAP_SYS_ADMIN to it in the first place.  But actually
that suffices here.  If inode->i_uid is invalid in user_ns, and
I enter my userns without setting a uid, the uid_eq() below will
pass and I'll get privilege over someone else's inode, right?

> +	if (uid_eq(current_fsuid(), i_uid))
>  		return true;
>  
>  	ns = current_user_ns();
> -	if (kuid_has_mapping(ns, inode->i_uid) && ns_capable(ns, CAP_FOWNER))
> +	if (kuid_has_mapping(ns, i_uid) && ns_capable(ns, CAP_FOWNER))
>  		return true;
>  	return false;
>  }
> diff --git a/fs/jfs/ioctl.c b/fs/jfs/ioctl.c
> index 10ee0ecca1a8..2581d4db58ff 100644
> --- a/fs/jfs/ioctl.c
> +++ b/fs/jfs/ioctl.c
> @@ -76,7 +76,7 @@ long jfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		if (err)
>  			return err;
>  
> -		if (!inode_owner_or_capable(inode)) {
> +		if (!inode_owner_or_capable(&init_user_ns, inode)) {
>  			err = -EACCES;
>  			goto setflags_out;
>  		}
> diff --git a/fs/jfs/jfs_inode.c b/fs/jfs/jfs_inode.c
> index 4cef170630db..282a785bbf29 100644
> --- a/fs/jfs/jfs_inode.c
> +++ b/fs/jfs/jfs_inode.c
> @@ -64,7 +64,7 @@ struct inode *ialloc(struct inode *parent, umode_t mode)
>  		goto fail_put;
>  	}
>  
> -	inode_init_owner(inode, parent, mode);
> +	inode_init_owner(inode, &init_user_ns, parent, mode);
>  	/*
>  	 * New inodes need to save sane values on disk when
>  	 * uid & gid mount options are used
> diff --git a/fs/minix/bitmap.c b/fs/minix/bitmap.c
> index f4e5e5181a14..d99a78c83fbc 100644
> --- a/fs/minix/bitmap.c
> +++ b/fs/minix/bitmap.c
> @@ -252,7 +252,7 @@ struct inode *minix_new_inode(const struct inode *dir, umode_t mode, int *error)
>  		iput(inode);
>  		return NULL;
>  	}
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	inode->i_ino = j;
>  	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
>  	inode->i_blocks = 0;
> diff --git a/fs/namei.c b/fs/namei.c
> index 38222f92efb6..35952c28ee29 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1047,7 +1047,7 @@ int may_linkat(struct path *link)
>  	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
>  	 * otherwise, it must be a safe source.
>  	 */
> -	if (safe_hardlink_source(inode) || inode_owner_or_capable(inode))
> +	if (safe_hardlink_source(inode) || inode_owner_or_capable(&init_user_ns, inode))
>  		return 0;
>  
>  	audit_log_path_denied(AUDIT_ANOM_LINK, "linkat");
> @@ -2897,7 +2897,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
>  	}
>  
>  	/* O_NOATIME can only be set by the owner or superuser */
> -	if (flag & O_NOATIME && !inode_owner_or_capable(inode))
> +	if (flag & O_NOATIME && !inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  
>  	return 0;
> diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
> index b6517220cad5..d286c3bf7d43 100644
> --- a/fs/nilfs2/inode.c
> +++ b/fs/nilfs2/inode.c
> @@ -348,7 +348,7 @@ struct inode *nilfs_new_inode(struct inode *dir, umode_t mode)
>  	/* reference count of i_bh inherits from nilfs_mdt_read_block() */
>  
>  	atomic64_inc(&root->inodes_count);
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	inode->i_ino = ino;
>  	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
>  
> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index 07d26f61f22a..b053b40315bf 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -132,7 +132,7 @@ static int nilfs_ioctl_setflags(struct inode *inode, struct file *filp,
>  	unsigned int flags, oldflags;
>  	int ret;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	if (get_user(flags, (int __user *)argp))
> diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
> index 583820ec63e2..64491af88239 100644
> --- a/fs/ocfs2/dlmfs/dlmfs.c
> +++ b/fs/ocfs2/dlmfs/dlmfs.c
> @@ -329,7 +329,7 @@ static struct inode *dlmfs_get_root_inode(struct super_block *sb)
>  
>  	if (inode) {
>  		inode->i_ino = get_next_ino();
> -		inode_init_owner(inode, NULL, mode);
> +		inode_init_owner(inode, &init_user_ns, NULL, mode);
>  		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
>  		inc_nlink(inode);
>  
> @@ -352,7 +352,7 @@ static struct inode *dlmfs_get_inode(struct inode *parent,
>  		return NULL;
>  
>  	inode->i_ino = get_next_ino();
> -	inode_init_owner(inode, parent, mode);
> +	inode_init_owner(inode, &init_user_ns, parent, mode);
>  	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
>  
>  	ip = DLMFS_I(inode);
> diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
> index 89984172fc4a..50c9b30ee9f6 100644
> --- a/fs/ocfs2/ioctl.c
> +++ b/fs/ocfs2/ioctl.c
> @@ -96,7 +96,7 @@ static int ocfs2_set_inode_attr(struct inode *inode, unsigned flags,
>  	}
>  
>  	status = -EACCES;
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		goto bail_unlock;
>  
>  	if (!S_ISDIR(inode->i_mode))
> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index c46bf7f581a1..51a80acbb97e 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -198,7 +198,7 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
>  	 * callers. */
>  	if (S_ISDIR(mode))
>  		set_nlink(inode, 2);
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	status = dquot_initialize(inode);
>  	if (status)
>  		return ERR_PTR(status);
> diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c
> index ce93ccca8639..eed9e1273104 100644
> --- a/fs/omfs/inode.c
> +++ b/fs/omfs/inode.c
> @@ -48,7 +48,7 @@ struct inode *omfs_new_inode(struct inode *dir, umode_t mode)
>  		goto fail;
>  
>  	inode->i_ino = new_block;
> -	inode_init_owner(inode, NULL, mode);
> +	inode_init_owner(inode, &init_user_ns, NULL, mode);
>  	inode->i_mapping->a_ops = &omfs_aops;
>  
>  	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 28a075b5f5b2..80b2fab73df7 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -636,7 +636,7 @@ static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev,
>  	inode->i_state |= I_CREATING;
>  	spin_unlock(&inode->i_lock);
>  
> -	inode_init_owner(inode, dentry->d_parent->d_inode, mode);
> +	inode_init_owner(inode, &init_user_ns, dentry->d_parent->d_inode, mode);
>  	attr.mode = inode->i_mode;
>  
>  	err = ovl_create_or_link(dentry, inode, &attr, false);
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index f966b5108358..d58b49a1ea3b 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -53,7 +53,7 @@ static struct file *ovl_open_realfile(const struct file *file,
>  	err = inode_permission(&init_user_ns, realinode, MAY_OPEN | acc_mode);
>  	if (err) {
>  		realfile = ERR_PTR(err);
> -	} else if (!inode_owner_or_capable(realinode)) {
> +	} else if (!inode_owner_or_capable(&init_user_ns, realinode)) {
>  		realfile = ERR_PTR(-EPERM);
>  	} else {
>  		realfile = open_with_fake_path(&file->f_path, flags, realinode,
> @@ -582,7 +582,7 @@ static long ovl_ioctl_set_flags(struct file *file, unsigned int cmd,
>  	struct inode *inode = file_inode(file);
>  	unsigned int oldflags;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EACCES;
>  
>  	ret = mnt_want_write_file(file);
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 196fe3e3f02b..82f2c35894e4 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -960,7 +960,7 @@ ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
>  		goto out_acl_release;
>  	}
>  	err = -EPERM;
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		goto out_acl_release;
>  
>  	posix_acl_release(acl);
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index ff67da201303..0a1f4bccb5da 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -481,7 +481,7 @@ struct file *ovl_path_open(struct path *path, int flags)
>  		return ERR_PTR(err);
>  
>  	/* O_NOATIME is an optimization, don't fail if not permitted */
> -	if (inode_owner_or_capable(inode))
> +	if (inode_owner_or_capable(&init_user_ns, inode))
>  		flags |= O_NOATIME;
>  
>  	return dentry_open(path, flags, current_cred());
> diff --git a/fs/posix_acl.c b/fs/posix_acl.c
> index 5b6296cc89c4..87b5ec67000b 100644
> --- a/fs/posix_acl.c
> +++ b/fs/posix_acl.c
> @@ -874,7 +874,7 @@ set_posix_acl(struct inode *inode, int type, struct posix_acl *acl)
>  
>  	if (type == ACL_TYPE_DEFAULT && !S_ISDIR(inode->i_mode))
>  		return acl ? -EACCES : 0;
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  
>  	if (acl) {
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index ee179a81b3da..83641b9614bd 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -67,7 +67,7 @@ struct inode *ramfs_get_inode(struct super_block *sb,
>  
>  	if (inode) {
>  		inode->i_ino = get_next_ino();
> -		inode_init_owner(inode, dir, mode);
> +		inode_init_owner(inode, &init_user_ns, dir, mode);
>  		inode->i_mapping->a_ops = &ramfs_aops;
>  		mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
>  		mapping_set_unevictable(inode->i_mapping);
> diff --git a/fs/reiserfs/ioctl.c b/fs/reiserfs/ioctl.c
> index adb21bea3d60..4f1cbd930179 100644
> --- a/fs/reiserfs/ioctl.c
> +++ b/fs/reiserfs/ioctl.c
> @@ -59,7 +59,7 @@ long reiserfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  			if (err)
>  				break;
>  
> -			if (!inode_owner_or_capable(inode)) {
> +			if (!inode_owner_or_capable(&init_user_ns, inode)) {
>  				err = -EPERM;
>  				goto setflags_out;
>  			}
> @@ -101,7 +101,7 @@ long reiserfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		err = put_user(inode->i_generation, (int __user *)arg);
>  		break;
>  	case REISERFS_IOC_SETVERSION:
> -		if (!inode_owner_or_capable(inode)) {
> +		if (!inode_owner_or_capable(&init_user_ns, inode)) {
>  			err = -EPERM;
>  			break;
>  		}
> diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
> index 1594687582f0..6e43aec49b43 100644
> --- a/fs/reiserfs/namei.c
> +++ b/fs/reiserfs/namei.c
> @@ -615,7 +615,7 @@ static int new_inode_init(struct inode *inode, struct inode *dir, umode_t mode)
>  	 * the quota init calls have to know who to charge the quota to, so
>  	 * we have to set uid and gid here
>  	 */
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	return dquot_initialize(inode);
>  }
>  
> diff --git a/fs/sysv/ialloc.c b/fs/sysv/ialloc.c
> index 6c9801986af6..96288d35dcb9 100644
> --- a/fs/sysv/ialloc.c
> +++ b/fs/sysv/ialloc.c
> @@ -163,7 +163,7 @@ struct inode * sysv_new_inode(const struct inode * dir, umode_t mode)
>  	*sbi->s_sb_fic_count = cpu_to_fs16(sbi, count);
>  	fs16_add(sbi, sbi->s_sb_total_free_inodes, -1);
>  	dirty_sb(sb);
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	inode->i_ino = fs16_to_cpu(sbi, ino);
>  	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
>  	inode->i_blocks = 0;
> diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
> index 155521e51ac5..1639331f9543 100644
> --- a/fs/ubifs/dir.c
> +++ b/fs/ubifs/dir.c
> @@ -94,7 +94,7 @@ struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir,
>  	 */
>  	inode->i_flags |= S_NOCMTIME;
>  
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	inode->i_mtime = inode->i_atime = inode->i_ctime =
>  			 current_time(inode);
>  	inode->i_mapping->nrpages = 0;
> diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
> index 4363d85a3fd4..2326d5122beb 100644
> --- a/fs/ubifs/ioctl.c
> +++ b/fs/ubifs/ioctl.c
> @@ -155,7 +155,7 @@ long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  		if (IS_RDONLY(inode))
>  			return -EROFS;
>  
> -		if (!inode_owner_or_capable(inode))
> +		if (!inode_owner_or_capable(&init_user_ns, inode))
>  			return -EACCES;
>  
>  		if (get_user(flags, (int __user *) arg))
> diff --git a/fs/udf/ialloc.c b/fs/udf/ialloc.c
> index 84ed23edebfd..e2d07cc1d3c3 100644
> --- a/fs/udf/ialloc.c
> +++ b/fs/udf/ialloc.c
> @@ -103,7 +103,7 @@ struct inode *udf_new_inode(struct inode *dir, umode_t mode)
>  		mutex_unlock(&sbi->s_alloc_mutex);
>  	}
>  
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UID_SET))
>  		inode->i_uid = sbi->s_uid;
>  	if (UDF_QUERY_FLAG(sb, UDF_FLAG_GID_SET))
> diff --git a/fs/ufs/ialloc.c b/fs/ufs/ialloc.c
> index 969fd60436d3..a04c6ea490a0 100644
> --- a/fs/ufs/ialloc.c
> +++ b/fs/ufs/ialloc.c
> @@ -289,7 +289,7 @@ struct inode *ufs_new_inode(struct inode *dir, umode_t mode)
>  	ufs_mark_sb_dirty(sb);
>  
>  	inode->i_ino = cg * uspi->s_ipg + bit;
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  	inode->i_blocks = 0;
>  	inode->i_generation = 0;
>  	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 61a9947f62f4..fcc79c2a1ea1 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -127,7 +127,7 @@ xattr_permission(struct inode *inode, const char *name, int mask)
>  		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
>  			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
>  		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
> -		    (mask & MAY_WRITE) && !inode_owner_or_capable(inode))
> +		    (mask & MAY_WRITE) && !inode_owner_or_capable(&init_user_ns, inode))
>  			return -EPERM;
>  	}
>  
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 97bd29fc8c43..218e80afc859 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1300,7 +1300,7 @@ xfs_ioctl_setattr_get_trans(
>  	 * The user ID of the calling process must be equal to the file owner
>  	 * ID, except in cases where the CAP_FSETID capability is applicable.
>  	 */
> -	if (!inode_owner_or_capable(VFS_I(ip))) {
> +	if (!inode_owner_or_capable(&init_user_ns, VFS_I(ip))) {
>  		error = -EPERM;
>  		goto out_cancel;
>  	}
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index ff5930be096c..5021a41e880c 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -1221,7 +1221,7 @@ static void zonefs_init_dir_inode(struct inode *parent, struct inode *inode,
>  	struct super_block *sb = parent->i_sb;
>  
>  	inode->i_ino = blkdev_nr_zones(sb->s_bdev->bd_disk) + type + 1;
> -	inode_init_owner(inode, parent, S_IFDIR | 0555);
> +	inode_init_owner(inode, &init_user_ns, parent, S_IFDIR | 0555);
>  	inode->i_op = &zonefs_dir_inode_operations;
>  	inode->i_fop = &simple_dir_operations;
>  	set_nlink(inode, 2);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8f6fb065450b..a5845a67a34b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1744,7 +1744,7 @@ static inline int sb_start_intwrite_trylock(struct super_block *sb)
>  }
>  
>  
> -extern bool inode_owner_or_capable(const struct inode *inode);
> +extern bool inode_owner_or_capable(struct user_namespace *user_ns, const struct inode *inode);
>  
>  /*
>   * VFS helper functions..
> @@ -1786,8 +1786,9 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd,
>  /*
>   * VFS file helper functions.
>   */
> -extern void inode_init_owner(struct inode *inode, const struct inode *dir,
> -			umode_t mode);
> +extern void inode_init_owner(struct inode *inode,
> +			     struct user_namespace *user_ns,
> +			     const struct inode *dir, umode_t mode);
>  extern bool may_open_dev(const struct path *path);
>  
>  /*
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index f1c393e5d47d..cfd2e0868f2d 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -122,7 +122,7 @@ static struct inode *bpf_get_inode(struct super_block *sb,
>  	inode->i_mtime = inode->i_atime;
>  	inode->i_ctime = inode->i_atime;
>  
> -	inode_init_owner(inode, dir, mode);
> +	inode_init_owner(inode, &init_user_ns, dir, mode);
>  
>  	return inode;
>  }
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 8afabc363b6b..a3ab05c08c28 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -539,7 +539,7 @@ static inline bool can_do_pageout(struct vm_area_struct *vma)
>  	 * otherwise we'd be including shared non-exclusive mappings, which
>  	 * opens a side channel.
>  	 */
> -	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
> +	return inode_owner_or_capable(&init_user_ns, file_inode(vma->vm_file)) ||
>  		inode_permission(&init_user_ns, file_inode(vma->vm_file), MAY_WRITE) == 0;
>  }
>  
> diff --git a/mm/mincore.c b/mm/mincore.c
> index d5a58e61eac6..ad2dfb7a4500 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -166,7 +166,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma)
>  	 * for writing; otherwise we'd be including shared non-exclusive
>  	 * mappings, which opens a side channel.
>  	 */
> -	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
> +	return inode_owner_or_capable(&init_user_ns, file_inode(vma->vm_file)) ||
>  		inode_permission(&init_user_ns, file_inode(vma->vm_file), MAY_WRITE) == 0;
>  }
>  
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 537c137698f8..1bd6a9487222 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2303,7 +2303,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
>  	inode = new_inode(sb);
>  	if (inode) {
>  		inode->i_ino = ino;
> -		inode_init_owner(inode, dir, mode);
> +		inode_init_owner(inode, &init_user_ns, dir, mode);
>  		inode->i_blocks = 0;
>  		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
>  		inode->i_generation = prandom_u32();
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 6b1826fc3658..14a195fa55eb 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -3133,13 +3133,13 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
>  	}
>  
>  	if (!selinux_initialized(&selinux_state))
> -		return (inode_owner_or_capable(inode) ? 0 : -EPERM);
> +		return (inode_owner_or_capable(&init_user_ns, inode) ? 0 : -EPERM);
>  
>  	sbsec = inode->i_sb->s_security;
>  	if (!(sbsec->flags & SBLABEL_MNT))
>  		return -EOPNOTSUPP;
>  
> -	if (!inode_owner_or_capable(inode))
> +	if (!inode_owner_or_capable(&init_user_ns, inode))
>  		return -EPERM;
>  
>  	ad.type = LSM_AUDIT_DATA_DENTRY;
> -- 
> 2.29.2

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2020-11-28 22:24 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-15 10:36 [PATCH v2 00/39] fs: idmapped mounts Christian Brauner
2020-11-15 10:36 ` [PATCH v2 01/39] namespace: take lock_mount_hash() directly when changing flags Christian Brauner
2020-11-15 10:36 ` [PATCH v2 02/39] mount: make {lock,unlock}_mount_hash() static Christian Brauner
2020-11-15 10:36 ` [PATCH v2 03/39] namespace: only take read lock in do_reconfigure_mnt() Christian Brauner
2020-11-15 10:36 ` [PATCH v2 04/39] fs: add mount_setattr() Christian Brauner
2020-11-15 10:36 ` [PATCH v2 05/39] tests: add mount_setattr() selftests Christian Brauner
2020-11-15 10:36 ` [PATCH v2 06/39] fs: add id translation helpers Christian Brauner
2020-11-15 10:36 ` [PATCH v2 07/39] mount: attach mappings to mounts Christian Brauner
2020-11-23 15:47   ` Tycho Andersen
2020-11-23 16:24     ` Tycho Andersen
2020-11-24 12:30       ` Christian Brauner
2020-11-24 13:37         ` Tycho Andersen
2020-11-24 13:40           ` Christian Brauner
2020-11-24 13:44             ` Tycho Andersen
2020-11-24 13:59               ` Christian Brauner
2020-11-15 10:36 ` [PATCH v2 08/39] capability: handle idmapped mounts Christian Brauner
2020-11-15 10:36 ` [PATCH v2 09/39] namei: add idmapped mount aware permission helpers Christian Brauner
2020-11-15 10:36 ` [PATCH v2 10/39] inode: add idmapped mount aware init and " Christian Brauner
2020-11-28 18:12   ` Serge E. Hallyn
2020-11-15 10:36 ` [PATCH v2 11/39] attr: handle idmapped mounts Christian Brauner
2020-11-15 10:36 ` [PATCH v2 12/39] acl: " Christian Brauner
2020-11-15 10:36 ` [PATCH v2 13/39] xattr: " Christian Brauner
2020-11-15 10:36 ` [PATCH v2 14/39] commoncap: " Christian Brauner
2020-11-22 21:18   ` Paul Moore
2020-11-23  7:45     ` Christian Brauner
2020-11-15 10:36 ` [PATCH v2 15/39] stat: " Christian Brauner
2020-11-15 10:36 ` [PATCH v2 16/39] namei: handle idmapped mounts in may_*() helpers Christian Brauner
2020-11-15 10:36 ` [PATCH v2 17/39] namei: introduce struct renamedata Christian Brauner
2020-11-15 10:36 ` [PATCH v2 18/39] namei: prepare for idmapped mounts Christian Brauner
2020-11-15 10:36 ` [PATCH v2 19/39] open: handle idmapped mounts in do_truncate() Christian Brauner
2020-11-15 10:36 ` [PATCH v2 20/39] open: handle idmapped mounts Christian Brauner
2020-11-15 10:37 ` [PATCH v2 21/39] af_unix: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 22/39] utimes: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 23/39] fcntl: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 24/39] notify: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 25/39] init: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 26/39] ioctl: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 27/39] would_dump: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 28/39] exec: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 29/39] fs: add helpers for idmap mounts Christian Brauner
2020-11-15 10:37 ` [PATCH v2 30/39] apparmor: handle idmapped mounts Christian Brauner
2020-11-15 10:37 ` [PATCH v2 31/39] audit: " Christian Brauner
2020-11-22 22:17   ` Paul Moore
2020-11-23  7:41     ` Christian Brauner
2020-11-23 22:06       ` Paul Moore
2020-11-15 10:37 ` [PATCH v2 32/39] ima: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 33/39] fat: " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 34/39] ext4: support " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 35/39] ecryptfs: do not mount on top of " Christian Brauner
2020-11-15 10:37 ` [PATCH v2 36/39] overlayfs: " Christian Brauner
2020-11-15 12:31   ` Amir Goldstein
2020-11-18 10:26     ` Christian Brauner
2020-11-15 10:37 ` [PATCH v2 37/39] fs: introduce MOUNT_ATTR_IDMAP Christian Brauner
2020-11-15 10:37 ` [PATCH v2 38/39] selftests: add idmapped mounts xattr selftest Christian Brauner
2020-11-15 10:37 ` [PATCH v2 39/39] tests: add vfs/idmapped mounts test suite Christian Brauner
2020-11-20 21:15   ` Kees Cook
2020-11-17 23:54 ` [PATCH v2 00/39] fs: idmapped mounts Jonathan Corbet
2020-11-18  9:45   ` Christian Brauner
2020-11-18  3:51 ` Stephen Barber
2020-11-20  2:33 ` Darrick J. Wong
2020-11-20  9:10   ` Christian Brauner
2020-11-20  9:12     ` Christoph Hellwig
2020-11-20 11:58       ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).