linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/24] VFS: Introduce filesystem context [ver #7]
@ 2018-04-19 13:31 David Howells
  2018-04-19 13:31 ` [PATCH 01/24] vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion " David Howells
                   ` (23 more replies)
  0 siblings, 24 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs


Here are a set of patches to create a filesystem context prior to setting
up a new mount, populating it with the parsed options/binary data, creating
the superblock and then effecting the mount.  This is also used for remount
since much of the parsing stuff is common in many filesystems.

This allows namespaces and other information to be conveyed through the
mount procedure.

This also allows Miklós Szeredi's idea of doing:

	fd = fsopen("nfs");
	write(fd, "option=val", ...);
	fsmount(fd, "/mnt");

that he presented at LSF-2017 to be implemented (see the relevant patches
in the series).

I didn't use netlink as that would make the core kernel depend on
CONFIG_NET and CONFIG_NETLINK and would introduce network namespacing
issues.

I've implemented filesystem context handling for procfs, nfs, mqueue,
cpuset, kernfs, sysfs, cgroup and afs filesystems.

Non-converted filesystems are handled by the legacy filesystem wrapper.

This post is mostly about the internal filesystem context and the special
kernel interface filesystems.  I've included the fsopen() and fsmount()
syscall implementations for reference, but I expect these to undergo some
reconsideration during LSF.  The last five patches relate to the AFS
conversion and are included as an example.

Significant changes:

 ver #7:

 (*) Undo an incorrect MS_* -> SB_* conversion.

 (*) Pass the mount data buffer size to all the mount-related functions that
     take the data pointer.  This fixes a problem where someone (say SELinux)
     tries to copy the mount data, assuming it to be a page in size, and
     overruns the buffer - thereby incurring an oops by hitting a guard page.

 (*) Made the AFS filesystem use them as an example.  This is a much easier to
     deal with than with NFS or Ext4 as there are very few mount options.

 ver #6:

 (*) Dropped the supplementary error string facility for the moment.

 (*) Dropped the NFS patches for the moment.

 (*) Dropped the reserved file descriptor argument from fsopen() and
     replaced it with three reserved pointers that must be NULL.

 ver #5:

 (*) Renamed sb_config -> fs_context and adjusted variable names.

 (*) Differentiated the flags in sb->s_flags (now named SB_*) from those
     passed to mount(2) (named MS_*).

 (*) Renamed __vfs_new_fs_context() to vfs_new_fs_context() and made the
     caller always provide a struct file_system_type pointer and the
     parameters required.

 (*) Got rid of vfs_submount_fc() in favour of passing
     FS_CONTEXT_FOR_SUBMOUNT to vfs_new_fs_context().  The purpose is now
     used more.

 (*) Call ->validate() on the remount path.

 (*) Got rid of the inode locking in sys_fsmount().

 (*) Call security_sb_mountpoint() in the mount(2) path.

 ver #4:

 (*) Split the sb_config patch up somewhat.

 (*) Made the supplementary error string facility something attached to the
     task_struct rather than the sb_config so that error messages can be
     obtained from NFS doing a mount-root-and-pathwalk inside the
     nfs_get_tree() operation.

     Further, made this managed and read by prctl rather than through the
     mount fd so that it's more generally available.

 ver #3:

 (*) Rebased on 4.12-rc1.

 (*) Split the NFS patch up somewhat.

 ver #2:

 (*) Removed the ->fill_super() from sb_config_operations and passed it in
     directly to functions that want to call it.  NFS now calls
     nfs_fill_super() directly rather than jumping through a pointer to it
     since there's only the one option at the moment.

 (*) Removed ->mnt_ns and ->sb from sb_config and moved ->pid_ns into
     proc_sb_config.

 (*) Renamed create_super -> get_tree.

 (*) Renamed struct mount_context to struct sb_config and amended various
     variable names.

 (*) sys_fsmount() acquired AT_* flags and MS_* flags (for MNT_* flags)
     arguments.

 ver #1:

 (*) Split the sb_config stuff out into its own header.

 (*) Support non-context aware filesystems through a special set of
     sb_config operations.

 (*) Stored the created superblock and root dentry into the sb_config after
     creation rather than directly into a vfsmount.  This allows some
     arguments to be removed to various NFS functions.

 (*) Added an explicit superblock-creation step.  This allows a created
     superblock to then be mounted multiple times.

 (*) Added a flag to say that the sb_config is degraded and cannot have
     another go at having a superblock creation whilst getting rid of the
     one that says it's already mounted.

Possible further developments:

 (*) Implement sb reconfiguration (for now it returns ENOANO).

 (*) Implement mount context support in more filesystems, ext4 being next
     on my list.

 (*) Move the walk-from-root stuff that nfs has to generic code so that you
     can do something akin to:

	mount /dev/sda1:/foo/bar /mnt

     See nfs_follow_remote_path() and mount_subtree().  This is slightly
     tricky in NFS as we have to prevent referral loops.

 (*) Work out how to get at the error message incurred by submounts
     encountered during nfs_follow_remote_path().

     Should the error message be moved to task_struct and made more
     general, perhaps retrieved with a prctl() function?

 (*) Clean up/consolidate the security functions.  Possibly add a
     validation hook to be called at the same time as the mount context
     validate op.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=mount-context

David
---
David Howells (24):
      vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion
      VFS: Suppress MS_* flag defs within the kernel unless explicitly enabled
      VFS: Introduce the structs and doc for a filesystem context
      VFS: Add LSM hooks for filesystem context
      apparmor: Implement security hooks for the new mount API
      tomoyo: Implement security hooks for the new mount API
      smack: Implement filesystem context security hooks
      VFS: Require specification of size of mount data for internal mounts
      VFS: Implement a filesystem superblock creation/configuration context
      VFS: Remove unused code after filesystem context changes
      procfs: Move proc_fill_super() to fs/proc/root.c
      proc: Add fs_context support to procfs
      ipc: Convert mqueue fs to fs_context
      cpuset: Use fs_context
      kernfs, sysfs, cgroup, intel_rdt: Support fs_context
      hugetlbfs: Convert to fs_context
      VFS: Remove kern_mount_data()
      VFS: Implement fsopen() to prepare for a mount
      VFS: Implement fsmount() to effect a pre-configured mount
      afs: Fix server record deletion
      net: Export get_proc_net()
      afs: Add fs_context support
      afs: Implement namespacing
      afs: Use fs_context to pass parameters over automount


 Documentation/filesystems/mounting.txt             |  445 +++++++++++++++
 arch/arc/kernel/setup.c                            |    1 
 arch/arm/kernel/atags_parse.c                      |    1 
 arch/ia64/kernel/perfmon.c                         |    3 
 arch/powerpc/platforms/cell/spufs/inode.c          |    6 
 arch/s390/hypfs/inode.c                            |    7 
 arch/sh/kernel/setup.c                             |    1 
 arch/sparc/kernel/setup_32.c                       |    1 
 arch/sparc/kernel/setup_64.c                       |    1 
 arch/x86/entry/syscalls/syscall_32.tbl             |    2 
 arch/x86/entry/syscalls/syscall_64.tbl             |    2 
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c           |  125 ++--
 arch/x86/kernel/setup.c                            |    1 
 drivers/base/devtmpfs.c                            |    7 
 drivers/dax/super.c                                |    2 
 drivers/gpu/drm/drm_drv.c                          |    3 
 drivers/gpu/drm/i915/i915_gemfs.c                  |    2 
 drivers/infiniband/hw/qib/qib_fs.c                 |    7 
 drivers/misc/ibmasm/ibmasmfs.c                     |   11 
 drivers/mtd/mtdsuper.c                             |   26 +
 drivers/oprofile/oprofilefs.c                      |    8 
 .../staging/lustre/lustre/llite/llite_internal.h   |    2 
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    3 
 drivers/staging/lustre/lustre/obdclass/obd_mount.c |    7 
 drivers/staging/ncpfs/inode.c                      |   10 
 drivers/usb/gadget/function/f_fs.c                 |    7 
 drivers/usb/gadget/legacy/inode.c                  |    7 
 drivers/virtio/virtio_balloon.c                    |    2 
 drivers/xen/xenfs/super.c                          |    7 
 fs/9p/vfs_super.c                                  |    2 
 fs/Makefile                                        |    3 
 fs/adfs/super.c                                    |    9 
 fs/affs/super.c                                    |   13 
 fs/afs/cell.c                                      |    4 
 fs/afs/internal.h                                  |   46 +-
 fs/afs/main.c                                      |   33 +
 fs/afs/mntpt.c                                     |  151 +++--
 fs/afs/proc.c                                      |   89 ++-
 fs/afs/server.c                                    |    9 
 fs/afs/super.c                                     |  438 ++++++++-------
 fs/afs/volume.c                                    |    4 
 fs/aio.c                                           |    3 
 fs/anon_inodes.c                                   |    3 
 fs/autofs4/autofs_i.h                              |    2 
 fs/autofs4/init.c                                  |    4 
 fs/autofs4/inode.c                                 |    3 
 fs/befs/linuxvfs.c                                 |   11 
 fs/bfs/inode.c                                     |    8 
 fs/binfmt_misc.c                                   |    7 
 fs/block_dev.c                                     |    2 
 fs/btrfs/super.c                                   |   30 +
 fs/btrfs/tests/btrfs-tests.c                       |    2 
 fs/ceph/super.c                                    |    3 
 fs/cifs/cifs_dfs_ref.c                             |    3 
 fs/cifs/cifsfs.c                                   |    5 
 fs/coda/inode.c                                    |   11 
 fs/configfs/mount.c                                |    7 
 fs/cramfs/inode.c                                  |   17 -
 fs/debugfs/inode.c                                 |   14 
 fs/devpts/inode.c                                  |   10 
 fs/ecryptfs/main.c                                 |    2 
 fs/efivarfs/super.c                                |    9 
 fs/efs/super.c                                     |   14 
 fs/exofs/super.c                                   |    7 
 fs/ext2/super.c                                    |   14 
 fs/ext4/super.c                                    |   16 -
 fs/f2fs/super.c                                    |   13 
 fs/fat/inode.c                                     |    3 
 fs/fat/namei_msdos.c                               |    8 
 fs/fat/namei_vfat.c                                |    8 
 fs/freevxfs/vxfs_super.c                           |   12 
 fs/fs_context.c                                    |  593 ++++++++++++++++++++
 fs/fsopen.c                                        |  304 ++++++++++
 fs/fuse/control.c                                  |    9 
 fs/fuse/inode.c                                    |   16 -
 fs/gfs2/ops_fstype.c                               |    6 
 fs/gfs2/super.c                                    |    4 
 fs/hfs/super.c                                     |   12 
 fs/hfsplus/super.c                                 |   12 
 fs/hostfs/hostfs_kern.c                            |    7 
 fs/hpfs/super.c                                    |   11 
 fs/hugetlbfs/inode.c                               |  327 ++++++-----
 fs/internal.h                                      |    5 
 fs/isofs/inode.c                                   |   11 
 fs/jffs2/super.c                                   |   10 
 fs/jfs/super.c                                     |   11 
 fs/kernfs/mount.c                                  |   90 ++-
 fs/libfs.c                                         |   17 +
 fs/minix/inode.c                                   |   14 
 fs/namespace.c                                     |  422 ++++++++++----
 fs/nfs/internal.h                                  |    4 
 fs/nfs/namespace.c                                 |    3 
 fs/nfs/nfs4namespace.c                             |    3 
 fs/nfs/nfs4super.c                                 |   27 +
 fs/nfs/super.c                                     |   22 -
 fs/nfsd/nfsctl.c                                   |    8 
 fs/nilfs2/super.c                                  |   10 
 fs/nsfs.c                                          |    3 
 fs/ntfs/super.c                                    |   13 
 fs/ocfs2/dlmfs/dlmfs.c                             |    5 
 fs/ocfs2/super.c                                   |   14 
 fs/omfs/inode.c                                    |    9 
 fs/openpromfs/inode.c                              |   11 
 fs/orangefs/orangefs-kernel.h                      |    2 
 fs/orangefs/super.c                                |    5 
 fs/overlayfs/super.c                               |   11 
 fs/pipe.c                                          |    3 
 fs/pnode.c                                         |    1 
 fs/proc/inode.c                                    |   50 --
 fs/proc/internal.h                                 |    6 
 fs/proc/proc_net.c                                 |    3 
 fs/proc/root.c                                     |  202 +++++--
 fs/pstore/inode.c                                  |   10 
 fs/qnx4/inode.c                                    |   14 
 fs/qnx6/inode.c                                    |   14 
 fs/ramfs/inode.c                                   |    6 
 fs/reiserfs/super.c                                |   14 
 fs/romfs/super.c                                   |   13 
 fs/squashfs/super.c                                |   12 
 fs/super.c                                         |  389 ++++++++++---
 fs/sysfs/mount.c                                   |   59 +-
 fs/sysv/inode.c                                    |    3 
 fs/sysv/super.c                                    |   16 -
 fs/tracefs/inode.c                                 |   10 
 fs/ubifs/super.c                                   |    5 
 fs/udf/super.c                                     |   16 -
 fs/ufs/super.c                                     |   11 
 fs/xfs/xfs_super.c                                 |   10 
 include/linux/cgroup.h                             |    3 
 include/linux/debugfs.h                            |    8 
 include/linux/fs.h                                 |   40 +
 include/linux/fs_context.h                         |  106 ++++
 include/linux/kernfs.h                             |   37 +
 include/linux/lsm_hooks.h                          |   74 ++
 include/linux/mount.h                              |    7 
 include/linux/mtd/super.h                          |    4 
 include/linux/proc_fs.h                            |    2 
 include/linux/ramfs.h                              |    4 
 include/linux/security.h                           |   62 ++
 include/linux/shmem_fs.h                           |    3 
 include/linux/syscalls.h                           |    4 
 include/uapi/linux/fs.h                            |   56 --
 include/uapi/linux/magic.h                         |    1 
 include/uapi/linux/mount.h                         |   58 ++
 init/do_mounts.c                                   |    5 
 init/do_mounts_initrd.c                            |    1 
 ipc/mqueue.c                                       |  115 +++-
 kernel/bpf/inode.c                                 |    7 
 kernel/cgroup/cgroup-internal.h                    |   42 +
 kernel/cgroup/cgroup-v1.c                          |  295 +++++-----
 kernel/cgroup/cgroup.c                             |  219 ++++---
 kernel/cgroup/cpuset.c                             |   65 ++
 kernel/sys_ni.c                                    |    4 
 kernel/trace/trace.c                               |    7 
 mm/shmem.c                                         |   10 
 mm/zsmalloc.c                                      |    3 
 net/socket.c                                       |    3 
 net/sunrpc/rpc_pipe.c                              |    7 
 security/apparmor/apparmorfs.c                     |    8 
 security/apparmor/include/mount.h                  |   11 
 security/apparmor/lsm.c                            |   84 +++
 security/apparmor/mount.c                          |   47 ++
 security/inode.c                                   |    7 
 security/security.c                                |   60 ++
 security/selinux/hooks.c                           |  292 +++++++++-
 security/selinux/selinuxfs.c                       |    8 
 security/smack/smack_lsm.c                         |  344 ++++++++++--
 security/smack/smackfs.c                           |    9 
 security/tomoyo/common.h                           |    3 
 security/tomoyo/mount.c                            |   43 +
 security/tomoyo/tomoyo.c                           |   19 +
 171 files changed, 5105 insertions(+), 1739 deletions(-)
 create mode 100644 Documentation/filesystems/mounting.txt
 create mode 100644 fs/fs_context.c
 create mode 100644 fs/fsopen.c
 create mode 100644 include/linux/fs_context.h
 create mode 100644 include/uapi/linux/mount.h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 01/24] vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-04-19 13:31 ` [PATCH 02/24] VFS: Suppress MS_* flag defs within the kernel unless explicitly enabled " David Howells
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

In do_mount() when the MS_* flags are being converted to MNT_* flags,
MS_RDONLY got accidentally convered to SB_RDONLY.

Undo this change.

Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e398f32d7541..6f720ebca133 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2814,7 +2814,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 		mnt_flags |= MNT_NODIRATIME;
 	if (flags & MS_STRICTATIME)
 		mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
-	if (flags & SB_RDONLY)
+	if (flags & MS_RDONLY)
 		mnt_flags |= MNT_READONLY;
 
 	/* The default atime for remount is preservation */

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 02/24] VFS: Suppress MS_* flag defs within the kernel unless explicitly enabled [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
  2018-04-19 13:31 ` [PATCH 01/24] vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion " David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-04-19 13:31 ` [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context " David Howells
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Only the mount namespace code that implements mount(2) should be using the
MS_* flags.  Suppress them inside the kernel unless uapi/linux/mount.h is
included.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/arc/kernel/setup.c       |    1 +
 arch/arm/kernel/atags_parse.c |    1 +
 arch/sh/kernel/setup.c        |    1 +
 arch/sparc/kernel/setup_32.c  |    1 +
 arch/sparc/kernel/setup_64.c  |    1 +
 arch/x86/kernel/setup.c       |    1 +
 drivers/base/devtmpfs.c       |    1 +
 fs/f2fs/super.c               |    2 +
 fs/namespace.c                |    1 +
 fs/pnode.c                    |    1 +
 fs/super.c                    |    1 +
 include/uapi/linux/fs.h       |   56 ++++------------------------------------
 include/uapi/linux/mount.h    |   58 +++++++++++++++++++++++++++++++++++++++++
 init/do_mounts.c              |    1 +
 init/do_mounts_initrd.c       |    1 +
 security/apparmor/lsm.c       |    1 +
 security/apparmor/mount.c     |    1 +
 security/selinux/hooks.c      |    1 +
 security/tomoyo/mount.c       |    1 +
 19 files changed, 80 insertions(+), 52 deletions(-)
 create mode 100644 include/uapi/linux/mount.h

diff --git a/arch/arc/kernel/setup.c b/arch/arc/kernel/setup.c
index b2cae79a25d7..714dc5c2baf1 100644
--- a/arch/arc/kernel/setup.c
+++ b/arch/arc/kernel/setup.c
@@ -19,6 +19,7 @@
 #include <linux/of_fdt.h>
 #include <linux/of.h>
 #include <linux/cache.h>
+#include <uapi/linux/mount.h>
 #include <asm/sections.h>
 #include <asm/arcregs.h>
 #include <asm/tlb.h>
diff --git a/arch/arm/kernel/atags_parse.c b/arch/arm/kernel/atags_parse.c
index c10a3e8ee998..a8a4333929f5 100644
--- a/arch/arm/kernel/atags_parse.c
+++ b/arch/arm/kernel/atags_parse.c
@@ -24,6 +24,7 @@
 #include <linux/root_dev.h>
 #include <linux/screen_info.h>
 #include <linux/memblock.h>
+#include <uapi/linux/mount.h>
 
 #include <asm/setup.h>
 #include <asm/system_info.h>
diff --git a/arch/sh/kernel/setup.c b/arch/sh/kernel/setup.c
index d34e998b809f..d60c7f794d7a 100644
--- a/arch/sh/kernel/setup.c
+++ b/arch/sh/kernel/setup.c
@@ -33,6 +33,7 @@
 #include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/uaccess.h>
+#include <uapi/linux/mount.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/elf.h>
diff --git a/arch/sparc/kernel/setup_32.c b/arch/sparc/kernel/setup_32.c
index 13664c377196..7df3d704284c 100644
--- a/arch/sparc/kernel/setup_32.c
+++ b/arch/sparc/kernel/setup_32.c
@@ -34,6 +34,7 @@
 #include <linux/kdebug.h>
 #include <linux/export.h>
 #include <linux/start_kernel.h>
+#include <uapi/linux/mount.h>
 
 #include <asm/io.h>
 #include <asm/processor.h>
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 7944b3ca216a..206bf81eedaf 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -33,6 +33,7 @@
 #include <linux/module.h>
 #include <linux/start_kernel.h>
 #include <linux/bootmem.h>
+#include <uapi/linux/mount.h>
 
 #include <asm/io.h>
 #include <asm/processor.h>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6285697b6e56..29b43f69ae55 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -50,6 +50,7 @@
 #include <linux/init_ohci1394_dma.h>
 #include <linux/kvm_para.h>
 #include <linux/dma-contiguous.h>
+#include <uapi/linux/mount.h>
 
 #include <linux/errno.h>
 #include <linux/kernel.h>
diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index f7768077e817..79a235184fb5 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -25,6 +25,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <uapi/linux/mount.h>
 #include "base.h"
 
 static struct task_struct *thread;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 42d564c5ccd0..a31cc49b7295 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1450,7 +1450,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
 		err = dquot_suspend(sb, -1);
 		if (err < 0)
 			goto restore_opts;
-	} else if (f2fs_readonly(sb) && !(*flags & MS_RDONLY)) {
+	} else if (f2fs_readonly(sb) && !(*flags & SB_RDONLY)) {
 		/* dquot_resume needs RW */
 		sb->s_flags &= ~SB_RDONLY;
 		if (sb_any_quota_suspended(sb)) {
diff --git a/fs/namespace.c b/fs/namespace.c
index 6f720ebca133..3f98e1a36b84 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -26,6 +26,7 @@
 #include <linux/bootmem.h>
 #include <linux/task_work.h>
 #include <linux/sched/task.h>
+#include <uapi/linux/mount.h>
 
 #include "pnode.h"
 #include "internal.h"
diff --git a/fs/pnode.c b/fs/pnode.c
index 53d411a371ce..1100e810d855 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -10,6 +10,7 @@
 #include <linux/mount.h>
 #include <linux/fs.h>
 #include <linux/nsproxy.h>
+#include <uapi/linux/mount.h>
 #include "internal.h"
 #include "pnode.h"
 
diff --git a/fs/super.c b/fs/super.c
index 5fa9a8d8d865..f7c5629bbbda 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -35,6 +35,7 @@
 #include <linux/fsnotify.h>
 #include <linux/lockdep.h>
 #include <linux/user_namespace.h>
+#include <uapi/linux/mount.h>
 #include "internal.h"
 
 static int thaw_super_locked(struct super_block *sb);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index d2a8313fabd7..5da6c2d96af5 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -14,6 +14,11 @@
 #include <linux/ioctl.h>
 #include <linux/types.h>
 
+/* Use of MS_* flags within the kernel is restricted to core mount(2) code. */
+#if !defined(__KERNEL__)
+#include <linux/mount.h>
+#endif
+
 /*
  * It's silly to have NR_OPEN bigger than NR_FILE, but you can change
  * the file limit at runtime and only root can increase the per-process
@@ -101,57 +106,6 @@ struct inodes_stat_t {
 
 #define NR_FILE  8192	/* this can well be larger on a larger system */
 
-
-/*
- * These are the fs-independent mount-flags: up to 32 flags are supported
- */
-#define MS_RDONLY	 1	/* Mount read-only */
-#define MS_NOSUID	 2	/* Ignore suid and sgid bits */
-#define MS_NODEV	 4	/* Disallow access to device special files */
-#define MS_NOEXEC	 8	/* Disallow program execution */
-#define MS_SYNCHRONOUS	16	/* Writes are synced at once */
-#define MS_REMOUNT	32	/* Alter flags of a mounted FS */
-#define MS_MANDLOCK	64	/* Allow mandatory locks on an FS */
-#define MS_DIRSYNC	128	/* Directory modifications are synchronous */
-#define MS_NOATIME	1024	/* Do not update access times. */
-#define MS_NODIRATIME	2048	/* Do not update directory access times */
-#define MS_BIND		4096
-#define MS_MOVE		8192
-#define MS_REC		16384
-#define MS_VERBOSE	32768	/* War is peace. Verbosity is silence.
-				   MS_VERBOSE is deprecated. */
-#define MS_SILENT	32768
-#define MS_POSIXACL	(1<<16)	/* VFS does not apply the umask */
-#define MS_UNBINDABLE	(1<<17)	/* change to unbindable */
-#define MS_PRIVATE	(1<<18)	/* change to private */
-#define MS_SLAVE	(1<<19)	/* change to slave */
-#define MS_SHARED	(1<<20)	/* change to shared */
-#define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
-#define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
-#define MS_I_VERSION	(1<<23) /* Update inode I_version field */
-#define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
-#define MS_LAZYTIME	(1<<25) /* Update the on-disk [acm]times lazily */
-
-/* These sb flags are internal to the kernel */
-#define MS_SUBMOUNT     (1<<26)
-#define MS_NOREMOTELOCK	(1<<27)
-#define MS_NOSEC	(1<<28)
-#define MS_BORN		(1<<29)
-#define MS_ACTIVE	(1<<30)
-#define MS_NOUSER	(1<<31)
-
-/*
- * Superblock flags that can be altered by MS_REMOUNT
- */
-#define MS_RMT_MASK	(MS_RDONLY|MS_SYNCHRONOUS|MS_MANDLOCK|MS_I_VERSION|\
-			 MS_LAZYTIME)
-
-/*
- * Old magic mount flag and mask
- */
-#define MS_MGC_VAL 0xC0ED0000
-#define MS_MGC_MSK 0xffff0000
-
 /*
  * Structure for FS_IOC_FSGETXATTR[A] and FS_IOC_FSSETXATTR.
  */
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
new file mode 100644
index 000000000000..3f9ec42510b0
--- /dev/null
+++ b/include/uapi/linux/mount.h
@@ -0,0 +1,58 @@
+#ifndef _UAPI_LINUX_MOUNT_H
+#define _UAPI_LINUX_MOUNT_H
+
+/*
+ * These are the fs-independent mount-flags: up to 32 flags are supported
+ *
+ * Usage of these is restricted within the kernel to core mount(2) code and
+ * callers of sys_mount() only.  Filesystems should be using the SB_*
+ * equivalent instead.
+ */
+#define MS_RDONLY	 1	/* Mount read-only */
+#define MS_NOSUID	 2	/* Ignore suid and sgid bits */
+#define MS_NODEV	 4	/* Disallow access to device special files */
+#define MS_NOEXEC	 8	/* Disallow program execution */
+#define MS_SYNCHRONOUS	16	/* Writes are synced at once */
+#define MS_REMOUNT	32	/* Alter flags of a mounted FS */
+#define MS_MANDLOCK	64	/* Allow mandatory locks on an FS */
+#define MS_DIRSYNC	128	/* Directory modifications are synchronous */
+#define MS_NOATIME	1024	/* Do not update access times. */
+#define MS_NODIRATIME	2048	/* Do not update directory access times */
+#define MS_BIND		4096
+#define MS_MOVE		8192
+#define MS_REC		16384
+#define MS_VERBOSE	32768	/* War is peace. Verbosity is silence.
+				   MS_VERBOSE is deprecated. */
+#define MS_SILENT	32768
+#define MS_POSIXACL	(1<<16)	/* VFS does not apply the umask */
+#define MS_UNBINDABLE	(1<<17)	/* change to unbindable */
+#define MS_PRIVATE	(1<<18)	/* change to private */
+#define MS_SLAVE	(1<<19)	/* change to slave */
+#define MS_SHARED	(1<<20)	/* change to shared */
+#define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
+#define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
+#define MS_I_VERSION	(1<<23) /* Update inode I_version field */
+#define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_LAZYTIME	(1<<25) /* Update the on-disk [acm]times lazily */
+
+/* These sb flags are internal to the kernel */
+#define MS_SUBMOUNT     (1<<26)
+#define MS_NOREMOTELOCK	(1<<27)
+#define MS_NOSEC	(1<<28)
+#define MS_BORN		(1<<29)
+#define MS_ACTIVE	(1<<30)
+#define MS_NOUSER	(1<<31)
+
+/*
+ * Superblock flags that can be altered by MS_REMOUNT
+ */
+#define MS_RMT_MASK	(MS_RDONLY|MS_SYNCHRONOUS|MS_MANDLOCK|MS_I_VERSION|\
+			 MS_LAZYTIME)
+
+/*
+ * Old magic mount flag and mask
+ */
+#define MS_MGC_VAL 0xC0ED0000
+#define MS_MGC_MSK 0xffff0000
+
+#endif /* _UAPI_LINUX_MOUNT_H */
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 2c71dabe5626..ea6f21bb9440 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -32,6 +32,7 @@
 #include <linux/nfs_fs.h>
 #include <linux/nfs_fs_sb.h>
 #include <linux/nfs_mount.h>
+#include <uapi/linux/mount.h>
 
 #include "do_mounts.h"
 
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 5a91aefa7305..65de0412f80f 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -18,6 +18,7 @@
 #include <linux/sched.h>
 #include <linux/freezer.h>
 #include <linux/kmod.h>
+#include <uapi/linux/mount.h>
 
 #include "do_mounts.h"
 
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index ce2b89e9ad94..9ebc9e9c3854 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -24,6 +24,7 @@
 #include <linux/audit.h>
 #include <linux/user_namespace.h>
 #include <net/sock.h>
+#include <uapi/linux/mount.h>
 
 #include "include/apparmor.h"
 #include "include/apparmorfs.h"
diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
index 6e8c7ac0b33d..45bb769d6cd7 100644
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -15,6 +15,7 @@
 #include <linux/fs.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
+#include <uapi/linux/mount.h>
 
 #include "include/apparmor.h"
 #include "include/audit.h"
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4cafe6a19167..1f0316bf7e29 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -88,6 +88,7 @@
 #include <linux/msg.h>
 #include <linux/shm.h>
 #include <linux/bpf.h>
+#include <uapi/linux/mount.h>
 
 #include "avc.h"
 #include "objsec.h"
diff --git a/security/tomoyo/mount.c b/security/tomoyo/mount.c
index 807fd91dbb54..7dc7f59b7dde 100644
--- a/security/tomoyo/mount.c
+++ b/security/tomoyo/mount.c
@@ -6,6 +6,7 @@
  */
 
 #include <linux/slab.h>
+#include <uapi/linux/mount.h>
 #include "common.h"
 
 /* String table for special mount operations. */

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
  2018-04-19 13:31 ` [PATCH 01/24] vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion " David Howells
  2018-04-19 13:31 ` [PATCH 02/24] VFS: Suppress MS_* flag defs within the kernel unless explicitly enabled " David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-04-23  3:36   ` Randy Dunlap
  2018-05-01 14:29   ` David Howells
  2018-04-19 13:31 ` [PATCH 04/24] VFS: Add LSM hooks for " David Howells
                   ` (20 subsequent siblings)
  23 siblings, 2 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Introduce a filesystem context concept to be used during superblock
creation for mount and superblock reconfiguration for remount.  This is
allocated at the beginning of the mount procedure and into it is placed:

 (1) Filesystem type.

 (2) Namespaces.

 (3) Device name.

 (4) Superblock flags (MS_*).

 (5) Security details.

 (6) Filesystem-specific data, as set by the mount options.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 Documentation/filesystems/mounting.txt |  445 ++++++++++++++++++++++++++++++++
 include/linux/fs_context.h             |   76 +++++
 2 files changed, 521 insertions(+)
 create mode 100644 Documentation/filesystems/mounting.txt
 create mode 100644 include/linux/fs_context.h

diff --git a/Documentation/filesystems/mounting.txt b/Documentation/filesystems/mounting.txt
new file mode 100644
index 000000000000..805135a66b64
--- /dev/null
+++ b/Documentation/filesystems/mounting.txt
@@ -0,0 +1,445 @@
+			      ===================
+			      FILESYSTEM MOUNTING
+			      ===================
+
+CONTENTS
+
+ (1) Overview.
+
+ (2) The filesystem context.
+
+ (3) The filesystem context operations.
+
+ (4) Filesystem context security.
+
+ (5) VFS filesystem context operations.
+
+
+========
+OVERVIEW
+========
+
+The creation of new mounts is now to be done in a multistep process:
+
+ (1) Create a filesystem context.
+
+ (2) Parse the options and attach them to the context.  Options may be passed
+     individually from userspace.
+
+ (3) Validate and pre-process the context.
+
+ (4) Get or create a superblock and mountable root.
+
+ (5) Perform the mount.
+
+ (6) Return an error message attached to the context.
+
+ (7) Destroy the context.
+
+To support this, the file_system_type struct gains two new fields:
+
+	unsigned short fs_context_size;
+
+which indicates the total amount of space that should be allocated for context
+data (see the Filesystem Context section), and:
+
+	int (*init_fs_context)(struct fs_context *fc, struct super_block *src_sb);
+
+which is invoked to set up the filesystem-specific parts of a filesystem
+context, including the additional space.  The src_sb parameter is used to
+convey the superblock from which the filesystem may draw extra information
+(such as namespaces) for submount (FS_CONTEXT_FOR_SUBMOUNT) or reconfiguration
+(FS_CONTEXT_FOR_RECONFIGURE) purposes - otherwise it will be NULL.
+
+Note that security initialisation is done *after* the filesystem is called so
+that the namespaces may be adjusted first.
+
+And the super_operations struct gains one field:
+
+	int (*reconfigure) (struct super_block *, struct fs_context *);
+
+This shadows the ->reconfigure() operation and takes a prepared filesystem
+context instead of the mount flags and data page.  It may modify the sb_flags
+in the context for the caller to pick up.
+
+[NOTE] reconfigure is intended as a replacement for remount_fs.
+
+
+======================
+THE FILESYSTEM CONTEXT
+======================
+
+The creation and reconfiguration of a superblock is governed by a filesystem
+context.  This is represented by the fs_context structure:
+
+	struct fs_context {
+		const struct fs_context_operations *ops;
+		struct file_system_type *fs;
+		struct dentry		*root;
+		struct user_namespace	*user_ns;
+		struct net		*net_ns;
+		const struct cred	*cred;
+		char			*device;
+		char			*subtype;
+		void			*security;
+		void			*s_fs_info;
+		unsigned int		sb_flags;
+		bool			sloppy;
+		bool			silent;
+		bool			degraded;
+		bool			drop_sb;
+		enum fs_context_purpose	purpose : 8;
+	};
+
+When the VFS creates this, it allocates ->fs_context_size bytes (as specified
+by the file_system_type object) to hold both the fs_context struct and any
+extra data required by the filesystem.  The fs_context struct is placed at the
+beginning of this space.  Any extra space beyond that is for use by the
+filesystem.  The filesystem should wrap the struct in its own, e.g.:
+
+	struct nfs_fs_context {
+		struct fs_context fc;
+		...
+	};
+
+placing the fs_context struct first.  container_of() can then be used.  The
+file_system_type would be initialised thus:
+
+	struct file_system_type nfs = {
+		...
+		.fs_context_size	= sizeof(struct nfs_fs_context),
+		.init_fs_context	= nfs_init_fs_context,
+		...
+	};
+
+The fs_context fields are as follows:
+
+ (*) const struct fs_context_operations *ops
+
+     These are operations that can be done on a filesystem context (see
+     below).  This must be set by the ->init_fs_context() file_system_type
+     operation.
+
+ (*) struct file_system_type *fs
+
+     A pointer to the file_system_type of the filesystem that is being
+     constructed or reconfigured.  This retains a reference on the type owner.
+
+ (*) struct dentry *root
+
+     A pointer to the root of the mountable tree (and indirectly, the
+     superblock thereof).  This is filled in by the ->get_tree() op.
+
+ (*) struct user_namespace *user_ns
+ (*) struct net *net_ns
+
+     There are a subset of the namespaces in use by the invoking process.  They
+     retain references on each namespace.  The subscribed namespaces may be
+     replaced by the filesystem to reflect other sources, such as the parent
+     mount superblock on an automount.
+
+ (*) struct cred *cred
+
+     The mounter's credentials.  This retains a reference on the credentials.
+
+ (*) char *device
+
+     This is the device to be mounted.  It may be a block device
+     (e.g. /dev/sda1) or something more exotic, such as the "host:/path" that
+     NFS desires.
+
+ (*) char *subtype
+
+     This is a string to be added to the type displayed in /proc/mounts to
+     qualify it (used by FUSE).  This is available for the filesystem to set if
+     desired.
+
+ (*) void *security
+
+     A place for the LSMs to hang their security data for the superblock.  The
+     relevant security operations are described below.
+
+ (*) void *s_fs_info
+
+     The proposed s_fs_info for a new superblock, set in the superblock by
+     sget_fc().  This can be used to distinguish superblocks.
+
+ (*) unsigned int sb_flags
+
+     This holds the SB_* flags to be set in super_block::s_flags.
+
+ (*) bool sloppy
+ (*) bool silent
+
+     These are set if the sloppy or silent mount options are given.
+
+     [NOTE] sloppy is probably unnecessary when userspace passes over one
+     option at a time since the error can just be ignored if userspace deems it
+     to be unimportant.
+
+     [NOTE] silent is probably redundant with sb_flags & SB_SILENT.
+
+ (*) bool degraded
+
+     This is set if any preallocated resources in the context have been used
+     up, thereby rendering it unreusable for the ->get_tree() op.
+
+ (*) bool drop_sb
+
+     This is set if a superblock reference needs to be deactivated when the
+     context is put.
+
+ (*) enum fs_context_purpose
+
+     This indicates the purpose for which the context is intended.  The
+     available values are:
+
+	FS_CONTEXT_FOR_USER_MOUNT,	-- New superblock for user-specified mount
+	FS_CONTEXT_FOR_KERNEL_MOUNT,	-- New superblock for kernel-internal mount
+	FS_CONTEXT_FOR_SUBMOUNT		-- New automatic submount of extant mount
+	FS_CONTEXT_FOR_RECONFIGURE	-- Change an existing mount
+
+The mount context is created by calling vfs_new_fs_context(), vfs_sb_reconfig()
+or vfs_dup_fs_context() and is destroyed with put_fs_context().  Note that the
+structure is not refcounted.
+
+VFS, security and filesystem mount options are set individually with
+vfs_parse_mount_option().  Options provided by the old mount(2) system call as
+a page of data can be parsed with generic_parse_monolithic().
+
+When mounting, the filesystem is allowed to take data from any of the pointers
+and attach it to the superblock (or whatever), provided it clears the pointer
+in the mount context.
+
+The filesystem is also allowed to allocate resources and pin them with the
+mount context.  For instance, NFS might pin the appropriate protocol version
+module.
+
+
+=================================
+THE FILESYSTEM CONTEXT OPERATIONS
+=================================
+
+The filesystem context points to a table of operations:
+
+	struct fs_context_operations {
+		void (*free)(struct fs_context *fc);
+		int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
+		int (*parse_source)(struct fs_context *fc);
+		int (*parse_option)(struct fs_context *fc, char *opt);
+		int (*parse_monolithic)(struct fs_context *fc, void *data);
+		int (*validate)(struct fs_context *fc);
+		int (*get_tree)(struct fs_context *fc);
+	};
+
+These operations are invoked by the various stages of the mount procedure to
+manage the filesystem context.  They are as follows:
+
+ (*) void (*free)(struct fs_context *fc);
+
+     Called to clean up the filesystem-specific part of the filesystem context
+     when the context is destroyed.  It should be aware that parts of the
+     context may have been removed and NULL'd out by ->get_tree().
+
+ (*) int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
+
+     Called when a filesystem context has been duplicated to get any refs or
+     copy any non-referenced resources held in the filesystem-specific part of
+     the filesystem context.  An error may be returned to indicate failure to
+     do this.
+
+     [!] Note that even if this fails, put_fs_context() will be called
+	 immediately thereafter, so ->dup() *must* make the
+	 filesystem-specific part safe for ->free().
+
+ (*) int (*parse_source)(struct fs_context *fc);
+
+     Called when the source or device is specified for a filesystem context.
+     The string will have been stored in fc->source prior to calling.  If
+     successful, 0 should be returned and a negative error code otherwise.
+
+ (*) int (*parse_option)(struct fs_context *fc, char *p);
+
+     Called when an option is to be added to the filesystem context.  p points
+     to the option string, likely in "key[=val]" format.  VFS-specific options
+     will have been weeded out and fc->sb_flags updated in the context.
+     Security options will also have been weeded out and fc->security updated.
+
+     If successful, 0 should be returned and a negative error code otherwise.
+
+ (*) int (*parse_monolithic)(struct fs_context *fc, void *data);
+
+     Called when the mount(2) system call is invoked to pass the entire data
+     page in one go.  If this is expected to be just a list of "key[=val]"
+     items separated by commas, then this may be set to NULL.
+
+     The return value is as for ->parse_option().
+
+     If the filesystem (eg. NFS) needs to examine the data first and then finds
+     it's the standard key-val list then it may pass it off to
+     generic_parse_monolithic().
+
+ (*) int (*validate)(struct fs_context *fc);
+
+     Called when all the options have been applied and the mount is about to
+     take place.  It is should check for inconsistencies from mount options and
+     it is also allowed to do preliminary resource acquisition.  For instance,
+     the core NFS module could load the NFS protocol module here.
+
+     Note that if fc->purpose == FS_CONTEXT_FOR_RECONFIGURE, some of the
+     options necessary for a new mount may not be set.
+
+     The return value is as for ->parse_option().
+
+ (*) int (*get_tree)(struct fs_context *fc);
+
+     Called to get or create the mountable root and superblock, using the
+     information stored in the filesystem context (reconfiguration goes via a
+     different vector).  It may detach any resources it desires from the
+     filesystem context and transfer them to the superblock it creates.
+
+     On success it should set fc->root to the mountable root and return 0.  In
+     the case of an error, it should return a negative error code.
+
+
+===========================
+FILESYSTEM CONTEXT SECURITY
+===========================
+
+The filesystem context contains a security pointer that the LSMs can use for
+building up a security context for the superblock to be mounted.  There are a
+number of operations used by the new mount code for this purpose:
+
+ (*) int security_fs_context_alloc(struct fs_context *fc,
+				   struct super_block *src_sb);
+
+     Called to initialise fc->security (which is preset to NULL) and allocate
+     any resources needed.  It should return 0 on success and a negative error
+     code on failure.
+
+     src_sb is non-NULL in the case of reconfiguration
+     (FS_CONTEXT_FOR_RECONFIGURE) in which case it indicates the superblock to
+     be reconfigured or in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in
+     which case it indicates the parent superblock.
+
+ (*) int security_fs_context_dup(struct fs_context *fc,
+				 struct fs_context *src_fc);
+
+     Called to initialise fc->security (which is preset to NULL) and allocate
+     any resources needed.  The original filesystem context is pointed to by
+     src_fc and may be used for reference.  It should return 0 on success and a
+     negative error code on failure.
+
+ (*) void security_fs_context_free(struct fs_context *fc);
+
+     Called to clean up anything attached to fc->security.  Note that the
+     contents may have been transferred to a superblock and the pointer NULL'd
+     out during mount.
+
+ (*) int security_fs_context_parse_option(struct fs_context *fc, char *opt);
+
+     Called for each mount option.  The arguments are as for the
+     ->parse_option() method.  An active LSM may reject one with an error, pass
+     one over and return 0 or consume one and return 1.  If consumed, the
+     option isn't passed on to the filesystem.
+
+ (*) int security_sb_get_tree(struct fs_context *fc);
+
+     Called during the mount procedure to verify that the specified superblock
+     is allowed to be mounted and to transfer the security data there.  It
+     should return 0 or a negative error code.
+
+     [NOTE] Should I add a security_fs_context_validate() operation so that the
+     LSM has the opportunity to allocate stuff and check the options as a
+     whole?
+
+ (*) int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint)
+
+     Called during the mount procedure to verify that the root dentry attached
+     to the context is permitted to be attached to the specified mountpoint.
+     It should return 0 on success and a negative error code on failure.
+
+
+=================================
+VFS FILESYSTEM CONTEXT OPERATIONS
+=================================
+
+There are four operations for creating a filesystem context and
+one for destroying a context:
+
+ (*) struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
+					   struct super_block *src_sb;
+					   unsigned int sb_flags);
+
+     Create a filesystem context for a given filesystem type.  This allocates
+     the filesystem context, sets the flags, initialises the security and calls
+     fs_type->init_fs_context() to initialise the filesystem context.
+
+     src_sb can be NULL or it may indicate a superblock that is going to be
+     reconfigured (FS_CONTEXT_FOR_RECONFIGURE) or a superblock that is the
+     parent of a submount (FS_CONTEXT_FOR_SUBMOUNT).  This superblock is
+     provided as a source of namespace information.
+
+ (*) struct fs_context *vfs_sb_reconfigure(struct vfsmount *mnt,
+					   unsigned int sb_flags);
+
+     Create a filesystem context from the same filesystem as an extant mount
+     and initialise the mount parameters from the superblock underlying that
+     mount.  This is for use by superblock parameter reconfiguration.
+
+ (*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
+
+     Duplicate a filesystem context, copying any options noted and duplicating
+     or additionally referencing any resources held therein.  This is available
+     for use where a filesystem has to get a mount within a mount, such as NFS4
+     does by internally mounting the root of the target server and then doing a
+     private pathwalk to the target directory.
+
+ (*) void put_fs_context(struct fs_context *fc);
+
+     Destroy a filesystem context, releasing any resources it holds.  This
+     calls the ->free() operation.  This is intended to be called by anyone who
+     created a filesystem context.
+
+     [!] filesystem contexts are not refcounted, so this causes unconditional
+	 destruction.
+
+In all the above operations, apart from the put op, the return is a mount
+context pointer or a negative error code.
+
+For the remaining operations, if an error occurs, a negative error code will be
+returned.
+
+ (*) int vfs_get_tree(struct fs_context *fc);
+
+     Get or create the mountable root and superblock, using the parameters in
+     the filesystem context to select/configure the superblock.  This invokes
+     the ->validate() op and then the ->get_tree() op.
+
+     [NOTE] ->validate() could perhaps be rolled into ->get_tree() and
+     ->reconfigure().
+
+ (*) struct vfsmount *vfs_create_mount(struct fs_context *fc);
+
+     Create a mount given the parameters in the specified filesystem context.
+     Note that this does not attach the mount to anything.
+
+ (*) int vfs_set_fs_source(struct fs_context *fc, char *source);
+
+     Supply the source name or device name for the mount.  This may cause the
+     filesystem to access the device.
+
+ (*) int vfs_parse_fs_option(struct fs_context *fc, char *data);
+
+     Supply a single mount option to the filesystem context.  The mount option
+     should likely be in a "key[=val]" string form.  The option is first
+     checked to see if it corresponds to a standard mount flag (in which case
+     it is used to set an SB_xxx flag and consumed) or a security option (in
+     which case the LSM consumes it) before it is passed on to the filesystem.
+
+ (*) int generic_parse_monolithic(struct fs_context *fc, void *data);
+
+     Parse a sys_mount() data page, assuming the form to be a text list
+     consisting of key[=val] options separated by commas.  Each item in the
+     list is passed to vfs_mount_option().  This is the default when the
+     ->parse_monolithic() operation is NULL.
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
new file mode 100644
index 000000000000..732a11898242
--- /dev/null
+++ b/include/linux/fs_context.h
@@ -0,0 +1,76 @@
+/* Filesystem superblock creation and reconfiguration context.
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _LINUX_FS_CONTEXT_H
+#define _LINUX_FS_CONTEXT_H
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+
+struct cred;
+struct dentry;
+struct file_operations;
+struct file_system_type;
+struct mnt_namespace;
+struct net;
+struct pid_namespace;
+struct super_block;
+struct user_namespace;
+struct vfsmount;
+
+enum fs_context_purpose {
+	FS_CONTEXT_FOR_USER_MOUNT,	/* New superblock for user-specified mount */
+	FS_CONTEXT_FOR_KERNEL_MOUNT,	/* New superblock for kernel-internal mount */
+	FS_CONTEXT_FOR_SUBMOUNT,	/* New superblock for automatic submount */
+	FS_CONTEXT_FOR_RECONFIGURE,	/* Superblock reconfiguration (remount) */
+};
+
+/*
+ * Filesystem context as allocated and constructed by the ->init_fs_context()
+ * file_system_type operation.  The size of the object allocated is specified
+ * in struct file_system_type::fs_context_size and this must include sufficient
+ * space for the fs_context struct.
+ *
+ * Superblock creation fills in ->root whereas reconfiguration begins with this
+ * already set.
+ *
+ * See Documentation/filesystems/mounting.txt
+ */
+struct fs_context {
+	const struct fs_context_operations *ops;
+	struct file_system_type	*fs_type;
+	struct dentry		*root;		/* The root and superblock */
+	struct user_namespace	*user_ns;	/* The user namespace for this mount */
+	struct net		*net_ns;	/* The network namespace for this mount */
+	const struct cred	*cred;		/* The mounter's credentials */
+	char			*source;	/* The source name (eg. device) */
+	char			*subtype;	/* The subtype to set on the superblock */
+	void			*security;	/* The LSM context */
+	void			*s_fs_info;	/* Proposed s_fs_info */
+	unsigned int		sb_flags;	/* Proposed superblock flags (SB_*) */
+	bool			sloppy;		/* Unrecognised options are okay */
+	bool			silent;
+	bool			degraded;	/* True if the context can't be reused */
+	bool			drop_sb;	/* T if need to drop an SB reference */
+	enum fs_context_purpose	purpose : 8;
+};
+
+struct fs_context_operations {
+	void (*free)(struct fs_context *fc);
+	int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
+	int (*parse_source)(struct fs_context *fc);
+	int (*parse_option)(struct fs_context *fc, char *opt, size_t len);
+	int (*parse_monolithic)(struct fs_context *fc, void *data);
+	int (*validate)(struct fs_context *fc);
+	int (*get_tree)(struct fs_context *fc);
+};
+
+#endif /* _LINUX_FS_CONTEXT_H */

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 04/24] VFS: Add LSM hooks for filesystem context [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (2 preceding siblings ...)
  2018-04-19 13:31 ` [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context " David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-04-19 20:32   ` Paul Moore
  2018-04-20 15:35   ` David Howells
  2018-04-19 13:31 ` [PATCH 05/24] apparmor: Implement security hooks for the new mount API " David Howells
                   ` (19 subsequent siblings)
  23 siblings, 2 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Add LSM hooks for use by the filesystem context code.  This includes:

 (1) Hooks to handle allocation, duplication and freeing of the security
     record attached to a filesystem context.

 (2) A hook to snoop a mount options in key[=val] form.  If the LSM decides
     it wants to handle it, it can suppress the option being passed to the
     filesystem.  Note that 'val' may include commas and binary data with
     the fsopen patch.

 (3) A hook to transfer the security from the context to a newly created
     superblock.

 (4) A hook to rule on whether a path point can be used as a mountpoint.

These are intended to replace:

	security_sb_copy_data
	security_sb_kern_mount
	security_sb_mount
	security_sb_set_mnt_opts
	security_sb_clone_mnt_opts
	security_sb_parse_opts_str

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-security-module@vger.kernel.org
---

 include/linux/lsm_hooks.h |   62 +++++++++++
 include/linux/security.h  |   44 ++++++++
 security/security.c       |   41 +++++++
 security/selinux/hooks.c  |  262 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 409 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9d0b286f3dba..da20f90d40bb 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -76,6 +76,50 @@
  *	changes on the process such as clearing out non-inheritable signal
  *	state.  This is called immediately after commit_creds().
  *
+ * Security hooks for mount using fd context.
+ *
+ * @fs_context_alloc:
+ *	Allocate and attach a security structure to sc->security.  This pointer
+ *	is initialised to NULL by the caller.
+ *	@fc indicates the new filesystem context.
+ *	@src_sb indicates the source superblock of a submount.
+ * @fs_context_dup:
+ *	Allocate and attach a security structure to sc->security.  This pointer
+ *	is initialised to NULL by the caller.
+ *	@fc indicates the new filesystem context.
+ *	@src_fc indicates the original filesystem context.
+ * @fs_context_free:
+ *	Clean up a filesystem context.
+ *	@fc indicates the filesystem context.
+ * @fs_context_parse_option:
+ *	Userspace provided an option to configure a superblock.  The LSM may
+ *	reject it with an error and may use it for itself, in which case it
+ *	should return 1; otherwise it should return 0 to pass it on to the
+ *	filesystem.
+ *	@fc indicates the filesystem context.
+ *	@opt indicates the option in "key[=val]" form.  It is NUL-terminated,
+ *	but val may be binary data.
+ *	@len indicates the size of the option.
+ * @fs_context_validate:
+ *	Validate the filesystem context preparatory to applying it.  This is
+ *	done after all the options have been parsed.
+ *	@fc indicates the filesystem context.
+ * @sb_get_tree:
+ *	Assign the security to a newly created superblock.
+ *	@fc indicates the filesystem context.
+ *	@fc->root indicates the root that will be mounted.
+ *	@fc->root->d_sb points to the superblock.
+ * @sb_reconfigure:
+ *	Apply reconfiguration to the security on a superblock.
+ *	@fc indicates the filesystem context.
+ *	@fc->root indicates a dentry in the mount.
+ *	@fc->root->d_sb points to the superblock.
+ * @sb_mountpoint:
+ *	Equivalent of sb_mount, but with an fs_context.
+ *	@fc indicates the filesystem context.
+ *	@mountpoint indicates the path on which the mount will take place.
+ *	@mnt_flags indicates the MNT_* flags specified.
+ *
  * Security hooks for filesystem operations.
  *
  * @sb_alloc_security:
@@ -1450,6 +1494,16 @@ union security_list_options {
 	void (*bprm_committing_creds)(struct linux_binprm *bprm);
 	void (*bprm_committed_creds)(struct linux_binprm *bprm);
 
+	int (*fs_context_alloc)(struct fs_context *fc, struct super_block *src_sb);
+	int (*fs_context_dup)(struct fs_context *fc, struct fs_context *src_sc);
+	void (*fs_context_free)(struct fs_context *fc);
+	int (*fs_context_parse_option)(struct fs_context *fc, char *opt, size_t len);
+	int (*fs_context_validate)(struct fs_context *fc);
+	int (*sb_get_tree)(struct fs_context *fc);
+	void (*sb_reconfigure)(struct fs_context *fc);
+	int (*sb_mountpoint)(struct fs_context *fc, struct path *mountpoint,
+			     unsigned int mnt_flags);
+
 	int (*sb_alloc_security)(struct super_block *sb);
 	void (*sb_free_security)(struct super_block *sb);
 	int (*sb_copy_data)(char *orig, char *copy);
@@ -1787,6 +1841,14 @@ struct security_hook_heads {
 	struct hlist_head bprm_check_security;
 	struct hlist_head bprm_committing_creds;
 	struct hlist_head bprm_committed_creds;
+	struct hlist_head fs_context_alloc;
+	struct hlist_head fs_context_dup;
+	struct hlist_head fs_context_free;
+	struct hlist_head fs_context_parse_option;
+	struct hlist_head fs_context_validate;
+	struct hlist_head sb_get_tree;
+	struct hlist_head sb_reconfigure;
+	struct hlist_head sb_mountpoint;
 	struct hlist_head sb_alloc_security;
 	struct hlist_head sb_free_security;
 	struct hlist_head sb_copy_data;
diff --git a/include/linux/security.h b/include/linux/security.h
index 200920f521a1..60a85bd9dfef 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -53,6 +53,7 @@ struct msg_msg;
 struct xattr;
 struct xfrm_sec_ctx;
 struct mm_struct;
+struct fs_context;
 
 /* If capable should audit the security request */
 #define SECURITY_CAP_NOAUDIT 0
@@ -231,6 +232,15 @@ int security_bprm_set_creds(struct linux_binprm *bprm);
 int security_bprm_check(struct linux_binprm *bprm);
 void security_bprm_committing_creds(struct linux_binprm *bprm);
 void security_bprm_committed_creds(struct linux_binprm *bprm);
+int security_fs_context_alloc(struct fs_context *fc, struct super_block *sb);
+int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc);
+void security_fs_context_free(struct fs_context *fc);
+int security_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len);
+int security_fs_context_validate(struct fs_context *fc);
+int security_sb_get_tree(struct fs_context *fc);
+void security_sb_reconfigure(struct fs_context *fc);
+int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
+			   unsigned int mnt_flags);
 int security_sb_alloc(struct super_block *sb);
 void security_sb_free(struct super_block *sb);
 int security_sb_copy_data(char *orig, char *copy);
@@ -539,6 +549,40 @@ static inline void security_bprm_committed_creds(struct linux_binprm *bprm)
 {
 }
 
+static inline int security_fs_context_alloc(struct fs_context *fc,
+					    struct super_block *src_sb)
+{
+	return 0;
+}
+static inline int security_fs_context_dup(struct fs_context *fc,
+					  struct fs_context *src_fc)
+{
+	return 0;
+}
+static inline void security_fs_context_free(struct fs_context *fc)
+{
+}
+static inline int security_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
+{
+	return 0;
+}
+static inline int security_fs_context_validate(struct fs_context *fc)
+{
+	return 0;
+}
+static inline int security_sb_get_tree(struct fs_context *fc)
+{
+	return 0;
+}
+static inline void security_sb_reconfigure(struct fs_context *fc)
+{
+}
+static inline int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
+					 unsigned int mnt_flags)
+{
+	return 0;
+}
+
 static inline int security_sb_alloc(struct super_block *sb)
 {
 	return 0;
diff --git a/security/security.c b/security/security.c
index 7bc2fde023a7..42e4ea19b61c 100644
--- a/security/security.c
+++ b/security/security.c
@@ -358,6 +358,47 @@ void security_bprm_committed_creds(struct linux_binprm *bprm)
 	call_void_hook(bprm_committed_creds, bprm);
 }
 
+int security_fs_context_alloc(struct fs_context *fc, struct super_block *src_sb)
+{
+	return call_int_hook(fs_context_alloc, 0, fc, src_sb);
+}
+
+int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
+{
+	return call_int_hook(fs_context_dup, 0, fc, src_fc);
+}
+
+void security_fs_context_free(struct fs_context *fc)
+{
+	call_void_hook(fs_context_free, fc);
+}
+
+int security_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
+{
+	return call_int_hook(fs_context_parse_option, 0, fc, opt, len);
+}
+
+int security_fs_context_validate(struct fs_context *fc)
+{
+	return call_int_hook(fs_context_validate, 0, fc);
+}
+
+int security_sb_get_tree(struct fs_context *fc)
+{
+	return call_int_hook(sb_get_tree, 0, fc);
+}
+
+void security_sb_reconfigure(struct fs_context *fc)
+{
+	call_void_hook(sb_reconfigure, fc);
+}
+
+int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
+			   unsigned int mnt_flags)
+{
+	return call_int_hook(sb_mountpoint, 0, fc, mountpoint, mnt_flags);
+}
+
 int security_sb_alloc(struct super_block *sb)
 {
 	return call_int_hook(sb_alloc_security, 0, sb);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 1f0316bf7e29..969a2a0dc582 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -48,6 +48,7 @@
 #include <linux/fdtable.h>
 #include <linux/namei.h>
 #include <linux/mount.h>
+#include <linux/fs_context.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_ipv6.h>
 #include <linux/tty.h>
@@ -2960,6 +2961,259 @@ static int selinux_umount(struct vfsmount *mnt, int flags)
 				   FILESYSTEM__UNMOUNT, NULL);
 }
 
+/* fsopen mount context operations */
+
+static int selinux_fs_context_alloc(struct fs_context *fc,
+				    struct super_block *src_sb)
+{
+	struct security_mnt_opts *opts;
+
+	opts = kzalloc(sizeof(*opts), GFP_KERNEL);
+	if (!opts)
+		return -ENOMEM;
+
+	fc->security = opts;
+	return 0;
+}
+
+static int selinux_fs_context_dup(struct fs_context *fc,
+				  struct fs_context *src_fc)
+{
+	const struct security_mnt_opts *src = src_fc->security;
+	struct security_mnt_opts *opts;
+	int i, n;
+
+	opts = kzalloc(sizeof(*opts), GFP_KERNEL);
+	if (!opts)
+		return -ENOMEM;
+	fc->security = opts;
+
+	if (!src || !src->num_mnt_opts)
+		return 0;
+	n = opts->num_mnt_opts = src->num_mnt_opts;
+
+	if (src->mnt_opts) {
+		opts->mnt_opts = kcalloc(n, sizeof(char *), GFP_KERNEL);
+		if (!opts->mnt_opts)
+			return -ENOMEM;
+
+		for (i = 0; i < n; i++) {
+			if (src->mnt_opts[i]) {
+				opts->mnt_opts[i] = kstrdup(src->mnt_opts[i],
+							    GFP_KERNEL);
+				if (!opts->mnt_opts[i])
+					return -ENOMEM;
+			}
+		}
+	}
+
+	if (src->mnt_opts_flags) {
+		opts->mnt_opts_flags = kmemdup(src->mnt_opts_flags,
+					       n * sizeof(int), GFP_KERNEL);
+		if (!opts->mnt_opts_flags)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void selinux_fs_context_free(struct fs_context *fc)
+{
+	struct security_mnt_opts *opts = fc->security;
+
+	security_free_mnt_opts(opts);
+	fc->security = NULL;
+}
+
+static int selinux_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
+{
+	struct security_mnt_opts *opts = fc->security;
+	substring_t args[MAX_OPT_ARGS];
+	unsigned int have;
+	char *c, **oo;
+	int token, ctx, i, *of;
+
+	token = match_token(opt, tokens, args);
+	if (token == Opt_error)
+		return 0; /* Doesn't belong to us. */
+
+	have = 0;
+	for (i = 0; i < opts->num_mnt_opts; i++)
+		have |= 1 << opts->mnt_opts_flags[i];
+	if (have & (1 << token))
+		return -EINVAL;
+
+	switch (token) {
+	case Opt_context:
+		if (have & (1 << Opt_defcontext))
+			goto incompatible;
+		ctx = CONTEXT_MNT;
+		goto copy_context_string;
+
+	case Opt_fscontext:
+		ctx = FSCONTEXT_MNT;
+		goto copy_context_string;
+
+	case Opt_rootcontext:
+		ctx = ROOTCONTEXT_MNT;
+		goto copy_context_string;
+
+	case Opt_defcontext:
+		if (have & (1 << Opt_context))
+			goto incompatible;
+		ctx = DEFCONTEXT_MNT;
+		goto copy_context_string;
+
+	case Opt_labelsupport:
+		return 1;
+
+	default:
+		return -EINVAL;
+	}
+
+copy_context_string:
+	if (opts->num_mnt_opts > 3)
+		return -EINVAL;
+
+	of = krealloc(opts->mnt_opts_flags,
+		      (opts->num_mnt_opts + 1) * sizeof(int), GFP_KERNEL);
+	if (!of)
+		return -ENOMEM;
+	of[opts->num_mnt_opts] = 0;
+	opts->mnt_opts_flags = of;
+
+	oo = krealloc(opts->mnt_opts,
+		      (opts->num_mnt_opts + 1) * sizeof(char *), GFP_KERNEL);
+	if (!oo)
+		return -ENOMEM;
+	oo[opts->num_mnt_opts] = NULL;
+	opts->mnt_opts = oo;
+
+	c = match_strdup(&args[0]);
+	if (!c)
+		return -ENOMEM;
+	opts->mnt_opts[opts->num_mnt_opts] = c;
+	opts->mnt_opts_flags[opts->num_mnt_opts] = ctx;
+	opts->num_mnt_opts++;
+	return 1;
+
+incompatible:
+	return -EINVAL;
+}
+
+/*
+ * Validate the security parameters supplied for a reconfiguration/remount
+ * event.
+ */
+static int selinux_validate_for_sb_reconfigure(struct fs_context *fc)
+{
+	struct super_block *sb = fc->root->d_sb;
+	struct superblock_security_struct *sbsec = sb->s_security;
+	struct security_mnt_opts *opts = fc->security;
+	int rc, i, *flags;
+	char **mount_options;
+
+	if (!(sbsec->flags & SE_SBINITIALIZED))
+		return 0;
+
+	mount_options = opts->mnt_opts;
+	flags = opts->mnt_opts_flags;
+
+	for (i = 0; i < opts->num_mnt_opts; i++) {
+		u32 sid;
+
+		if (flags[i] == SBLABEL_MNT)
+			continue;
+
+		rc = security_context_str_to_sid(&selinux_state, mount_options[i],
+						 &sid, GFP_KERNEL);
+		if (rc) {
+			pr_warn("SELinux: security_context_str_to_sid"
+				"(%s) failed for (dev %s, type %s) errno=%d\n",
+				mount_options[i], sb->s_id, sb->s_type->name, rc);
+			goto inval;
+		}
+
+		switch (flags[i]) {
+		case FSCONTEXT_MNT:
+			if (bad_option(sbsec, FSCONTEXT_MNT, sbsec->sid, sid))
+				goto bad_option;
+			break;
+		case CONTEXT_MNT:
+			if (bad_option(sbsec, CONTEXT_MNT, sbsec->mntpoint_sid, sid))
+				goto bad_option;
+			break;
+		case ROOTCONTEXT_MNT: {
+			struct inode_security_struct *root_isec;
+			root_isec = backing_inode_security(sb->s_root);
+
+			if (bad_option(sbsec, ROOTCONTEXT_MNT, root_isec->sid, sid))
+				goto bad_option;
+			break;
+		}
+		case DEFCONTEXT_MNT:
+			if (bad_option(sbsec, DEFCONTEXT_MNT, sbsec->def_sid, sid))
+				goto bad_option;
+			break;
+		default:
+			goto inval;
+		}
+	}
+
+	rc = 0;
+out:
+	return rc;
+
+bad_option:
+	pr_warn("SELinux: unable to change security options "
+		"during remount (dev %s, type=%s)\n",
+		sb->s_id, sb->s_type->name);
+inval:
+	rc = -EINVAL;
+	goto out;
+}
+
+/*
+ * Validate the security context assembled from the option data supplied to
+ * mount.
+ */
+static int selinux_fs_context_validate(struct fs_context *fc)
+{
+	if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE)
+		return selinux_validate_for_sb_reconfigure(fc);
+	return 0;
+}
+
+/*
+ * Set the security context on a superblock.
+ */
+static int selinux_sb_get_tree(struct fs_context *fc)
+{
+	const struct cred *cred = current_cred();
+	struct common_audit_data ad;
+	int rc;
+
+	rc = selinux_set_mnt_opts(fc->root->d_sb, fc->security, 0, NULL);
+	if (rc)
+		return rc;
+
+	/* Allow all mounts performed by the kernel */
+	if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
+		return 0;
+
+	ad.type = LSM_AUDIT_DATA_DENTRY;
+	ad.u.dentry = fc->root;
+	return superblock_has_perm(cred, fc->root->d_sb, FILESYSTEM__MOUNT, &ad);
+}
+
+static int selinux_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
+				 unsigned int mnt_flags)
+{
+	const struct cred *cred = current_cred();
+
+	return path_has_perm(cred, mountpoint, FILE__MOUNTON);
+}
+
 /* inode security operations */
 
 static int selinux_inode_alloc_security(struct inode *inode)
@@ -6871,6 +7125,14 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds),
 	LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds),
 
+	LSM_HOOK_INIT(fs_context_alloc, selinux_fs_context_alloc),
+	LSM_HOOK_INIT(fs_context_dup, selinux_fs_context_dup),
+	LSM_HOOK_INIT(fs_context_free, selinux_fs_context_free),
+	LSM_HOOK_INIT(fs_context_parse_option, selinux_fs_context_parse_option),
+	LSM_HOOK_INIT(fs_context_validate, selinux_fs_context_validate),
+	LSM_HOOK_INIT(sb_get_tree, selinux_sb_get_tree),
+	LSM_HOOK_INIT(sb_mountpoint, selinux_sb_mountpoint),
+
 	LSM_HOOK_INIT(sb_alloc_security, selinux_sb_alloc_security),
 	LSM_HOOK_INIT(sb_free_security, selinux_sb_free_security),
 	LSM_HOOK_INIT(sb_copy_data, selinux_sb_copy_data),

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 05/24] apparmor: Implement security hooks for the new mount API [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (3 preceding siblings ...)
  2018-04-19 13:31 ` [PATCH 04/24] VFS: Add LSM hooks for " David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-05-04  0:10   ` John Johansen
  2018-05-11 12:20   ` David Howells
  2018-04-19 13:31 ` [PATCH 06/24] tomoyo: " David Howells
                   ` (18 subsequent siblings)
  23 siblings, 2 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, John Johansen, apparmor, linux-kernel, dhowells,
	linux-security-module, linux-fsdevel, linux-afs

Implement hooks to check the creation of new mountpoints for AppArmor.

Unfortunately, the DFA evaluation puts the option data in last, after the
details of the mountpoint, so we have to cache the mount options in the
fs_context using those hooks till we get to the new mountpoint hook.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: John Johansen <john.johansen@canonical.com>
cc: apparmor@lists.ubuntu.com
cc: linux-security-module@vger.kernel.org
---

 security/apparmor/include/mount.h |   11 +++++
 security/apparmor/lsm.c           |   80 +++++++++++++++++++++++++++++++++++++
 security/apparmor/mount.c         |   46 +++++++++++++++++++++
 3 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/security/apparmor/include/mount.h b/security/apparmor/include/mount.h
index 25d6067fa6ef..0441bfae30fa 100644
--- a/security/apparmor/include/mount.h
+++ b/security/apparmor/include/mount.h
@@ -16,6 +16,7 @@
 
 #include <linux/fs.h>
 #include <linux/path.h>
+#include <linux/fs_context.h>
 
 #include "domain.h"
 #include "policy.h"
@@ -27,7 +28,13 @@
 #define AA_AUDIT_DATA		0x40
 #define AA_MNT_CONT_MATCH	0x40
 
-#define AA_MS_IGNORE_MASK (MS_KERNMOUNT | MS_NOSEC | MS_ACTIVE | MS_BORN)
+#define AA_SB_IGNORE_MASK (SB_KERNMOUNT | SB_NOSEC | SB_ACTIVE | SB_BORN)
+
+struct apparmor_fs_context {
+	struct fs_context	fc;
+	char			*saved_options;
+	size_t			saved_size;
+};
 
 int aa_remount(struct aa_label *label, const struct path *path,
 	       unsigned long flags, void *data);
@@ -45,6 +52,8 @@ int aa_move_mount(struct aa_label *label, const struct path *path,
 int aa_new_mount(struct aa_label *label, const char *dev_name,
 		 const struct path *path, const char *type, unsigned long flags,
 		 void *data);
+int aa_new_mount_fc(struct aa_label *label, struct fs_context *fc,
+		    const struct path *mountpoint);
 
 int aa_umount(struct aa_label *label, struct vfsmount *mnt, int flags);
 
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 9ebc9e9c3854..14398dec2e38 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -518,6 +518,78 @@ static int apparmor_file_mprotect(struct vm_area_struct *vma,
 			   !(vma->vm_flags & VM_SHARED) ? MAP_PRIVATE : 0);
 }
 
+static int apparmor_fs_context_alloc(struct fs_context *fc, struct super_block *src_sb)
+{
+	struct apparmor_fs_context *afc;
+
+	afc = kzalloc(sizeof(*afc), GFP_KERNEL);
+	if (!afc)
+		return -ENOMEM;
+
+	fc->security = afc;
+	return 0;
+}
+
+static int apparmor_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
+{
+	fc->security = NULL;
+	return 0;
+}
+
+static void apparmor_fs_context_free(struct fs_context *fc)
+{
+	struct apparmor_fs_context *afc = fc->security;
+
+	if (afc) {
+		kfree(afc->saved_options);
+		kfree(afc);
+	}
+}
+
+/*
+ * As a temporary hack, we buffer all the options.  The problem is that we need
+ * to pass them to the DFA evaluator *after* mount point parameters, which
+ * means deferring the entire check to the sb_mountpoint hook.
+ */
+static int apparmor_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
+{
+	struct apparmor_fs_context *afc = fc->security;
+	size_t space = 0;
+	char *p, *q;
+
+	if (afc->saved_size > 0)
+		space = 1;
+	
+	p = krealloc(afc->saved_options, afc->saved_size + space + len + 1, GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	q = p + afc->saved_size;
+	if (q != p)
+		*q++ = ' ';
+	memcpy(q, opt, len);
+	q += len;
+	*q = 0;
+
+	afc->saved_options = p;
+	afc->saved_size += 1 + len;
+	return 0;
+}
+
+static int apparmor_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
+				  unsigned int mnt_flags)
+{
+	struct aa_label *label;
+	int error = 0;
+
+	label = __begin_current_label_crit_section();
+	if (!unconfined(label))
+		error = aa_new_mount_fc(label, fc, mountpoint);
+	__end_current_label_crit_section(label);
+
+	return error;
+}
+
 static int apparmor_sb_mount(const char *dev_name, const struct path *path,
 			     const char *type, unsigned long flags, void *data)
 {
@@ -528,7 +600,7 @@ static int apparmor_sb_mount(const char *dev_name, const struct path *path,
 	if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
 		flags &= ~MS_MGC_MSK;
 
-	flags &= ~AA_MS_IGNORE_MASK;
+	flags &= ~AA_SB_IGNORE_MASK;
 
 	label = __begin_current_label_crit_section();
 	if (!unconfined(label)) {
@@ -1124,6 +1196,12 @@ static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(capget, apparmor_capget),
 	LSM_HOOK_INIT(capable, apparmor_capable),
 
+	LSM_HOOK_INIT(fs_context_alloc, apparmor_fs_context_alloc),
+	LSM_HOOK_INIT(fs_context_dup, apparmor_fs_context_dup),
+	LSM_HOOK_INIT(fs_context_free, apparmor_fs_context_free),
+	LSM_HOOK_INIT(fs_context_parse_option, apparmor_fs_context_parse_option),
+	LSM_HOOK_INIT(sb_mountpoint, apparmor_sb_mountpoint),
+
 	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
 	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
 	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
index 45bb769d6cd7..3d477d288627 100644
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -554,6 +554,52 @@ int aa_new_mount(struct aa_label *label, const char *dev_name,
 	return error;
 }
 
+int aa_new_mount_fc(struct aa_label *label, struct fs_context *fc,
+		    const struct path *mountpoint)
+{
+	struct apparmor_fs_context *afc = fc->security;
+	struct aa_profile *profile;
+	char *buffer = NULL, *dev_buffer = NULL;
+	bool binary;
+	int error;
+	struct path tmp_path, *dev_path = NULL;
+
+	AA_BUG(!label);
+	AA_BUG(!mountpoint);
+
+	binary = fc->fs_type->fs_flags & FS_BINARY_MOUNTDATA;
+
+	if (fc->fs_type->fs_flags & FS_REQUIRES_DEV) {
+		if (!fc->source)
+			return -ENOENT;
+
+		error = kern_path(fc->source, LOOKUP_FOLLOW, &tmp_path);
+		if (error)
+			return error;
+		dev_path = &tmp_path;
+	}
+
+	get_buffers(buffer, dev_buffer);
+	if (dev_path) {
+		error = fn_for_each_confined(label, profile,
+			match_mnt(profile, mountpoint, buffer, dev_path, dev_buffer,
+				  fc->fs_type->name,
+				  fc->sb_flags & ~AA_SB_IGNORE_MASK,
+				  afc->saved_options, binary));
+	} else {
+		error = fn_for_each_confined(label, profile,
+			match_mnt_path_str(profile, mountpoint, buffer, fc->source,
+					   fc->fs_type->name,
+					   fc->sb_flags & ~AA_SB_IGNORE_MASK,
+					   afc->saved_options, binary, NULL));
+	}
+	put_buffers(buffer, dev_buffer);
+	if (dev_path)
+		path_put(dev_path);
+
+	return error;
+}
+
 static int profile_umount(struct aa_profile *profile, struct path *path,
 			  char *buffer)
 {

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 06/24] tomoyo: Implement security hooks for the new mount API [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (4 preceding siblings ...)
  2018-04-19 13:31 ` [PATCH 05/24] apparmor: Implement security hooks for the new mount API " David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-04-19 13:31 ` [PATCH 07/24] smack: Implement filesystem context security hooks " David Howells
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, Tetsuo Handa, linux-kernel, dhowells, linux-fsdevel,
	linux-security-module, tomoyo-dev-en, linux-afs

Implement the security hook to check the creation of a new mountpoint for
Tomoyo.

As far as I can tell, Tomoyo doesn't make use of the mount data or parse
any mount options, so I haven't implemented any of the fs_context hooks for
it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
cc: tomoyo-dev-en@lists.sourceforge.jp
cc: linux-security-module@vger.kernel.org
---

 security/tomoyo/common.h |    3 +++
 security/tomoyo/mount.c  |   42 ++++++++++++++++++++++++++++++++++++++++++
 security/tomoyo/tomoyo.c |   15 +++++++++++++++
 3 files changed, 60 insertions(+)

diff --git a/security/tomoyo/common.h b/security/tomoyo/common.h
index 539bcdd30bb8..e637ce73f7f9 100644
--- a/security/tomoyo/common.h
+++ b/security/tomoyo/common.h
@@ -971,6 +971,9 @@ int tomoyo_init_request_info(struct tomoyo_request_info *r,
 			     const u8 index);
 int tomoyo_mkdev_perm(const u8 operation, const struct path *path,
 		      const unsigned int mode, unsigned int dev);
+int tomoyo_mount_permission_fc(struct fs_context *fc,
+			       const struct path *mountpoint,
+			       unsigned int mnt_flags);
 int tomoyo_mount_permission(const char *dev_name, const struct path *path,
 			    const char *type, unsigned long flags,
 			    void *data_page);
diff --git a/security/tomoyo/mount.c b/security/tomoyo/mount.c
index 7dc7f59b7dde..e2e1cb203775 100644
--- a/security/tomoyo/mount.c
+++ b/security/tomoyo/mount.c
@@ -6,6 +6,7 @@
  */
 
 #include <linux/slab.h>
+#include <linux/fs_context.h>
 #include <uapi/linux/mount.h>
 #include "common.h"
 
@@ -236,3 +237,44 @@ int tomoyo_mount_permission(const char *dev_name, const struct path *path,
 	tomoyo_read_unlock(idx);
 	return error;
 }
+
+/**
+ * tomoyo_mount_permission_fc - Check permission to create a new mount.
+ * @fc:		Context describing the object to be mounted.
+ * @mountpoint:	The target object to mount on.
+ * @mnt:	The MNT_* flags to be set on the mountpoint.
+ *
+ * Check the permission to create a mount of the object described in @fc.  Note
+ * that the source object may be a newly created superblock or may be an
+ * existing one picked from the filesystem (bind mount).
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+int tomoyo_mount_permission_fc(struct fs_context *fc,
+			       const struct path *mountpoint,
+			       unsigned int mnt_flags)
+{
+	struct tomoyo_request_info r;
+	unsigned int ms_flags = 0;
+	int error;
+	int idx;
+
+	if (tomoyo_init_request_info(&r, NULL, TOMOYO_MAC_FILE_MOUNT) ==
+	    TOMOYO_CONFIG_DISABLED)
+		return 0;
+
+	/* Convert MNT_* flags to MS_* equivalents. */
+	if (mnt_flags & MNT_NOSUID)	ms_flags |= MS_NOSUID;
+	if (mnt_flags & MNT_NODEV)	ms_flags |= MS_NODEV;
+	if (mnt_flags & MNT_NOEXEC)	ms_flags |= MS_NOEXEC;
+	if (mnt_flags & MNT_NOATIME)	ms_flags |= MS_NOATIME;
+	if (mnt_flags & MNT_NODIRATIME)	ms_flags |= MS_NODIRATIME;
+	if (mnt_flags & MNT_RELATIME)	ms_flags |= MS_RELATIME;
+	if (mnt_flags & MNT_READONLY)	ms_flags |= MS_RDONLY;
+
+	idx = tomoyo_read_lock();
+	error = tomoyo_mount_acl(&r, fc->source, mountpoint, fc->fs_type->name,
+				 ms_flags);
+	tomoyo_read_unlock(idx);
+	return error;
+}
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 213b8c593668..31fd6bd4f657 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -391,6 +391,20 @@ static int tomoyo_path_chroot(const struct path *path)
 	return tomoyo_path_perm(TOMOYO_TYPE_CHROOT, path, NULL);
 }
 
+/**
+ * tomoyo_sb_mount - Target for security_sb_mountpoint().
+ * @fc:		Context describing the object to be mounted.
+ * @mountpoint:	The target object to mount on.
+ * @mnt_flags:	Mountpoint specific options (as MNT_* flags).
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
+				unsigned int mnt_flags)
+{
+	return tomoyo_mount_permission_fc(fc, mountpoint, mnt_flags);
+}
+
 /**
  * tomoyo_sb_mount - Target for security_sb_mount().
  *
@@ -519,6 +533,7 @@ static struct security_hook_list tomoyo_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(path_chmod, tomoyo_path_chmod),
 	LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
 	LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
+	LSM_HOOK_INIT(sb_mountpoint, tomoyo_sb_mountpoint),
 	LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
 	LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
 	LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 07/24] smack: Implement filesystem context security hooks [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (5 preceding siblings ...)
  2018-04-19 13:31 ` [PATCH 06/24] tomoyo: " David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-04-19 13:31 ` [PATCH 08/24] VFS: Require specification of size of mount data for internal mounts " David Howells
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-fsdevel,
	linux-security-module, Casey Schaufler, linux-afs

Implement filesystem context security hooks for the smack LSM.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: linux-security-module@vger.kernel.org
---

 security/smack/smack_lsm.c |  309 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 309 insertions(+)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 0b414836bebd..549aaa46353b 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -42,6 +42,7 @@
 #include <linux/shm.h>
 #include <linux/binfmts.h>
 #include <linux/parser.h>
+#include <linux/fs_context.h>
 #include "smack.h"
 
 #define TRANS_TRUE	"TRUE"
@@ -521,6 +522,307 @@ static int smack_syslog(int typefrom_file)
 	return rc;
 }
 
+/*
+ * Mount context operations
+ */
+
+struct smack_fs_context {
+	union {
+		struct {
+			char		*fsdefault;
+			char		*fsfloor;
+			char		*fshat;
+			char		*fsroot;
+			char		*fstransmute;
+		};
+		char			*ptrs[5];
+
+	};
+	struct superblock_smack		*sbsp;
+	struct inode_smack		*isp;
+	bool				transmute;
+};
+
+/**
+ * smack_fs_context_free - Free the security data from a filesystem context
+ * @fc: The filesystem context to be cleaned up.
+ */
+static void smack_fs_context_free(struct fs_context *fc)
+{
+	struct smack_fs_context *ctx = fc->security;
+	int i;
+
+	if (ctx) {
+		for (i = 0; i < ARRAY_SIZE(ctx->ptrs); i++)
+			kfree(ctx->ptrs[i]);
+		kfree(ctx->isp);
+		kfree(ctx->sbsp);
+		kfree(ctx);
+		fc->security = NULL;
+	}
+}
+
+/**
+ * smack_fs_context_alloc - Allocate security data for a filesystem context
+ * @fc: The filesystem context.
+ * @src_sb: Reference superblock (automount/reconfigure) or NULL
+ *
+ * Returns 0 on success or -ENOMEM on error.
+ */
+static int smack_fs_context_alloc(struct fs_context *fc,
+				  struct super_block *src_sb)
+{
+	struct smack_fs_context *ctx;
+	struct superblock_smack *sbsp;
+	struct inode_smack *isp;
+	struct smack_known *skp;
+
+	ctx = kzalloc(sizeof(struct smack_fs_context), GFP_KERNEL);
+	if (!ctx)
+		goto nomem;
+	fc->security = ctx;
+
+	sbsp = kzalloc(sizeof(struct superblock_smack), GFP_KERNEL);
+	if (!sbsp)
+		goto nomem_free;
+	ctx->sbsp = sbsp;
+
+	isp = new_inode_smack(NULL);
+	if (!isp)
+		goto nomem_free;
+	ctx->isp = isp;
+
+	if (src_sb) {
+		if (src_sb->s_security)
+			memcpy(sbsp, src_sb->s_security, sizeof(*sbsp));
+	} else if (!smack_privileged(CAP_MAC_ADMIN)) {
+		/* Unprivileged mounts get root and default from the caller. */
+		skp = smk_of_current();
+		sbsp->smk_root = skp;
+		sbsp->smk_default = skp;
+	} else {
+		sbsp->smk_root = &smack_known_floor;
+		sbsp->smk_default = &smack_known_floor;
+		sbsp->smk_floor = &smack_known_floor;
+		sbsp->smk_hat = &smack_known_hat;
+		/* SMK_SB_INITIALIZED will be zero from kzalloc. */
+	}
+
+	return 0;
+
+nomem_free:
+	smack_fs_context_free(fc);
+nomem:
+	return -ENOMEM;
+}
+
+/**
+ * smack_fs_context_dup - Duplicate the security data on fs_context duplication
+ * @fc: The new filesystem context.
+ * @src_fc: The source filesystem context being duplicated.
+ *
+ * Returns 0 on success or -ENOMEM on error.
+ */
+static int smack_fs_context_dup(struct fs_context *fc,
+				struct fs_context *src_fc)
+{
+	struct smack_fs_context *dst, *src = src_fc->security;
+	int i;
+
+	dst = kzalloc(sizeof(struct smack_fs_context), GFP_KERNEL);
+	if (!dst)
+		goto nomem;
+	fc->security = dst;
+
+	dst->sbsp = kmemdup(src->sbsp, sizeof(struct superblock_smack),
+			    GFP_KERNEL);
+	if (!dst->sbsp)
+		goto nomem_free;
+
+	for (i = 0; i < ARRAY_SIZE(dst->ptrs); i++) {
+		if (src->ptrs[i]) {
+			dst->ptrs[i] = kstrdup(src->ptrs[i], GFP_KERNEL);
+			if (!dst->ptrs[i])
+				goto nomem_free;
+		}
+	}
+
+	return 0;
+
+nomem_free:
+	smack_fs_context_free(fc);
+nomem:
+	return -ENOMEM;
+}
+
+/**
+ * smack_fs_context_parse_option - Parse a single mount option
+ * @fc: The new filesystem context being constructed.
+ * @opt: The option text buffer.
+ * @len: The length of the text.
+ *
+ * Returns 0 on success or -ENOMEM on error.
+ */
+static int smack_fs_context_parse_option(struct fs_context *fc, char *p, size_t len)
+{
+	struct smack_fs_context *ctx = fc->security;
+	substring_t args[MAX_OPT_ARGS];
+	int rc = -ENOMEM;
+	int token;
+
+	/* Unprivileged mounts don't get to specify Smack values. */
+	if (!smack_privileged(CAP_MAC_ADMIN))
+		return -EPERM;
+
+	token = match_token(p, smk_mount_tokens, args);
+	switch (token) {
+	case Opt_fsdefault:
+		if (ctx->fsdefault)
+			goto error_dup;
+		ctx->fsdefault = match_strdup(&args[0]);
+		if (!ctx->fsdefault)
+			goto error;
+		break;
+	case Opt_fsfloor:
+		if (ctx->fsfloor)
+			goto error_dup;
+		ctx->fsfloor = match_strdup(&args[0]);
+		if (!ctx->fsfloor)
+			goto error;
+		break;
+	case Opt_fshat:
+		if (ctx->fshat)
+			goto error_dup;
+		ctx->fshat = match_strdup(&args[0]);
+		if (!ctx->fshat)
+			goto error;
+		break;
+	case Opt_fsroot:
+		if (ctx->fsroot)
+			goto error_dup;
+		ctx->fsroot = match_strdup(&args[0]);
+		if (!ctx->fsroot)
+			goto error;
+		break;
+	case Opt_fstransmute:
+		if (ctx->fstransmute)
+			goto error_dup;
+		ctx->fstransmute = match_strdup(&args[0]);
+		if (!ctx->fstransmute)
+			goto error;
+		break;
+	default:
+		pr_warn("Smack:  unknown mount option\n");
+		goto error_inval;
+	}
+
+	return 0;
+
+error_dup:
+	pr_warn("Smack: duplicate mount option\n");
+error_inval:
+	rc = -EINVAL;
+error:
+	return rc;
+}
+
+/**
+ * smack_fs_context_validate - Validate the filesystem context security data
+ * @fc: The filesystem context.
+ *
+ * Returns 0 on success or -ENOMEM on error.
+ */
+static int smack_fs_context_validate(struct fs_context *fc)
+{
+	struct smack_fs_context *ctx = fc->security;
+	struct superblock_smack *sbsp = ctx->sbsp;
+	struct inode_smack *isp = ctx->isp;
+	struct smack_known *skp;
+
+	if (ctx->fsdefault) {
+		skp = smk_import_entry(ctx->fsdefault, 0);
+		if (IS_ERR(skp))
+			return PTR_ERR(skp);
+		sbsp->smk_default = skp;
+	}
+
+	if (ctx->fsfloor) {
+		skp = smk_import_entry(ctx->fsfloor, 0);
+		if (IS_ERR(skp))
+			return PTR_ERR(skp);
+		sbsp->smk_floor = skp;
+	}
+
+	if (ctx->fshat) {
+		skp = smk_import_entry(ctx->fshat, 0);
+		if (IS_ERR(skp))
+			return PTR_ERR(skp);
+		sbsp->smk_hat = skp;
+	}
+
+	if (ctx->fsroot || ctx->fstransmute) {
+		skp = smk_import_entry(ctx->fstransmute ?: ctx->fsroot, 0);
+		if (IS_ERR(skp))
+			return PTR_ERR(skp);
+		sbsp->smk_root = skp;
+		ctx->transmute = !!ctx->fstransmute;
+	}
+
+	isp->smk_inode = sbsp->smk_root;
+	return 0;
+}
+
+/**
+ * smack_sb_get_tree - Assign the context to a newly created superblock
+ * @fc: The new filesystem context.
+ *
+ * Returns 0 on success or -ENOMEM on error.
+ */
+static int smack_sb_get_tree(struct fs_context *fc)
+{
+	struct smack_fs_context *ctx = fc->security;
+	struct superblock_smack *sbsp = ctx->sbsp;
+	struct dentry *root = fc->root;
+	struct inode *inode = d_backing_inode(root);
+	struct super_block *sb = root->d_sb;
+	struct inode_smack *isp;
+	bool transmute = ctx->transmute;
+
+	if (sb->s_security)
+		return 0;
+
+	if (!smack_privileged(CAP_MAC_ADMIN)) {
+		/*
+		 * For a handful of fs types with no user-controlled
+		 * backing store it's okay to trust security labels
+		 * in the filesystem. The rest are untrusted.
+		 */
+		if (fc->user_ns != &init_user_ns &&
+		    sb->s_magic != SYSFS_MAGIC && sb->s_magic != TMPFS_MAGIC &&
+		    sb->s_magic != RAMFS_MAGIC) {
+			transmute = true;
+			sbsp->smk_flags |= SMK_SB_UNTRUSTED;
+		}
+	}
+
+	sbsp->smk_flags |= SMK_SB_INITIALIZED;
+	sb->s_security = sbsp;
+	ctx->sbsp = NULL;
+
+	/* Initialize the root inode. */
+	isp = inode->i_security;
+	if (isp == NULL) {
+		isp = ctx->isp;
+		ctx->isp = NULL;
+		inode->i_security = isp;
+	} else
+		isp->smk_inode = sbsp->smk_root;
+
+	if (transmute)
+		isp->smk_flags |= SMK_INODE_TRANSMUTE;
+
+	return 0;
+}
 
 /*
  * Superblock Hooks.
@@ -4628,6 +4930,13 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(ptrace_traceme, smack_ptrace_traceme),
 	LSM_HOOK_INIT(syslog, smack_syslog),
 
+	LSM_HOOK_INIT(fs_context_alloc, smack_fs_context_alloc),
+	LSM_HOOK_INIT(fs_context_dup, smack_fs_context_dup),
+	LSM_HOOK_INIT(fs_context_free, smack_fs_context_free),
+	LSM_HOOK_INIT(fs_context_parse_option, smack_fs_context_parse_option),
+	LSM_HOOK_INIT(fs_context_validate, smack_fs_context_validate),
+	LSM_HOOK_INIT(sb_get_tree, smack_sb_get_tree),
+
 	LSM_HOOK_INIT(sb_alloc_security, smack_sb_alloc_security),
 	LSM_HOOK_INIT(sb_free_security, smack_sb_free_security),
 	LSM_HOOK_INIT(sb_copy_data, smack_sb_copy_data),

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 08/24] VFS: Require specification of size of mount data for internal mounts [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (6 preceding siblings ...)
  2018-04-19 13:31 ` [PATCH 07/24] smack: Implement filesystem context security hooks " David Howells
@ 2018-04-19 13:31 ` David Howells
  2018-04-19 13:32 ` [PATCH 09/24] VFS: Implement a filesystem superblock creation/configuration context " David Howells
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:31 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Require specification of the size of the mount data passed to the VFS
mounting functions by internal mounts.  The problem is that the legacy
handling for the upcoming mount-context patches has to copy an entire page
as that's how big the buffer is defined as being, but some of the internal
calls pass in a short bit of stack space, with the result that the stack
guard page may get hit.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/ia64/kernel/perfmon.c                         |    3 +
 arch/powerpc/platforms/cell/spufs/inode.c          |    6 +--
 arch/s390/hypfs/inode.c                            |    7 ++-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c           |    2 -
 drivers/base/devtmpfs.c                            |    6 +--
 drivers/dax/super.c                                |    2 -
 drivers/gpu/drm/drm_drv.c                          |    3 +
 drivers/gpu/drm/i915/i915_gemfs.c                  |    2 -
 drivers/infiniband/hw/qib/qib_fs.c                 |    7 ++-
 drivers/misc/ibmasm/ibmasmfs.c                     |   11 +++--
 drivers/mtd/mtdsuper.c                             |   26 +++++++-----
 drivers/oprofile/oprofilefs.c                      |    8 ++--
 .../staging/lustre/lustre/llite/llite_internal.h   |    2 -
 drivers/staging/lustre/lustre/llite/llite_lib.c    |    3 +
 drivers/staging/lustre/lustre/obdclass/obd_mount.c |    7 ++-
 drivers/staging/ncpfs/inode.c                      |   10 +++--
 drivers/usb/gadget/function/f_fs.c                 |    7 ++-
 drivers/usb/gadget/legacy/inode.c                  |    7 ++-
 drivers/virtio/virtio_balloon.c                    |    2 -
 drivers/xen/xenfs/super.c                          |    7 ++-
 fs/9p/vfs_super.c                                  |    2 -
 fs/adfs/super.c                                    |    9 ++--
 fs/affs/super.c                                    |   13 ++++--
 fs/afs/mntpt.c                                     |    3 +
 fs/afs/super.c                                     |    6 ++-
 fs/aio.c                                           |    3 +
 fs/anon_inodes.c                                   |    3 +
 fs/autofs4/autofs_i.h                              |    2 -
 fs/autofs4/init.c                                  |    4 +-
 fs/autofs4/inode.c                                 |    3 +
 fs/befs/linuxvfs.c                                 |   11 +++--
 fs/bfs/inode.c                                     |    8 ++--
 fs/binfmt_misc.c                                   |    7 ++-
 fs/block_dev.c                                     |    2 -
 fs/btrfs/super.c                                   |   30 ++++++++------
 fs/btrfs/tests/btrfs-tests.c                       |    2 -
 fs/ceph/super.c                                    |    3 +
 fs/cifs/cifs_dfs_ref.c                             |    3 +
 fs/cifs/cifsfs.c                                   |    5 +-
 fs/coda/inode.c                                    |   11 +++--
 fs/configfs/mount.c                                |    7 ++-
 fs/cramfs/inode.c                                  |   17 +++++---
 fs/debugfs/inode.c                                 |   14 ++++--
 fs/devpts/inode.c                                  |   10 +++--
 fs/ecryptfs/main.c                                 |    2 -
 fs/efivarfs/super.c                                |    9 +++-
 fs/efs/super.c                                     |   14 ++++--
 fs/exofs/super.c                                   |    7 ++-
 fs/ext2/super.c                                    |   14 ++++--
 fs/ext4/super.c                                    |   16 +++++--
 fs/f2fs/super.c                                    |   11 +++--
 fs/fat/inode.c                                     |    3 +
 fs/fat/namei_msdos.c                               |    8 ++--
 fs/fat/namei_vfat.c                                |    8 ++--
 fs/freevxfs/vxfs_super.c                           |   12 ++++-
 fs/fuse/control.c                                  |    9 +++-
 fs/fuse/inode.c                                    |   16 +++++--
 fs/gfs2/ops_fstype.c                               |    6 ++-
 fs/gfs2/super.c                                    |    4 +-
 fs/hfs/super.c                                     |   12 ++++-
 fs/hfsplus/super.c                                 |   12 ++++-
 fs/hostfs/hostfs_kern.c                            |    7 ++-
 fs/hpfs/super.c                                    |   11 +++--
 fs/hugetlbfs/inode.c                               |   13 ++++--
 fs/internal.h                                      |    4 +-
 fs/isofs/inode.c                                   |   11 +++--
 fs/jffs2/super.c                                   |   10 +++--
 fs/jfs/super.c                                     |   11 +++--
 fs/kernfs/mount.c                                  |    3 +
 fs/libfs.c                                         |    2 -
 fs/minix/inode.c                                   |   14 ++++--
 fs/namespace.c                                     |   38 ++++++++++-------
 fs/nfs/internal.h                                  |    4 +-
 fs/nfs/namespace.c                                 |    3 +
 fs/nfs/nfs4namespace.c                             |    3 +
 fs/nfs/nfs4super.c                                 |   27 +++++++-----
 fs/nfs/super.c                                     |   22 +++++-----
 fs/nfsd/nfsctl.c                                   |    8 ++--
 fs/nilfs2/super.c                                  |   10 +++--
 fs/nsfs.c                                          |    3 +
 fs/ntfs/super.c                                    |   13 ++++--
 fs/ocfs2/dlmfs/dlmfs.c                             |    5 +-
 fs/ocfs2/super.c                                   |   14 ++++--
 fs/omfs/inode.c                                    |    9 +++-
 fs/openpromfs/inode.c                              |   11 +++--
 fs/orangefs/orangefs-kernel.h                      |    2 -
 fs/orangefs/super.c                                |    5 +-
 fs/overlayfs/super.c                               |   11 +++--
 fs/pipe.c                                          |    3 +
 fs/proc/inode.c                                    |    3 +
 fs/proc/internal.h                                 |    4 +-
 fs/proc/root.c                                     |   11 +++--
 fs/pstore/inode.c                                  |   10 +++--
 fs/qnx4/inode.c                                    |   14 ++++--
 fs/qnx6/inode.c                                    |   14 ++++--
 fs/ramfs/inode.c                                   |    6 +--
 fs/reiserfs/super.c                                |   14 ++++--
 fs/romfs/super.c                                   |   13 ++++--
 fs/squashfs/super.c                                |   12 ++++-
 fs/super.c                                         |   44 +++++++++++---------
 fs/sysfs/mount.c                                   |    2 -
 fs/sysv/inode.c                                    |    3 +
 fs/sysv/super.c                                    |   16 +++++--
 fs/tracefs/inode.c                                 |   10 +++--
 fs/ubifs/super.c                                   |    5 +-
 fs/udf/super.c                                     |   16 +++++--
 fs/ufs/super.c                                     |   11 +++--
 fs/xfs/xfs_super.c                                 |   10 +++--
 include/linux/debugfs.h                            |    8 ++--
 include/linux/fs.h                                 |   29 +++++++------
 include/linux/lsm_hooks.h                          |   13 ++++--
 include/linux/mount.h                              |    5 +-
 include/linux/mtd/super.h                          |    4 +-
 include/linux/ramfs.h                              |    4 +-
 include/linux/security.h                           |   17 ++++----
 include/linux/shmem_fs.h                           |    3 +
 init/do_mounts.c                                   |    4 +-
 ipc/mqueue.c                                       |    9 ++--
 kernel/bpf/inode.c                                 |    7 ++-
 kernel/cgroup/cgroup.c                             |    2 -
 kernel/cgroup/cpuset.c                             |    7 ++-
 kernel/trace/trace.c                               |    7 ++-
 mm/shmem.c                                         |   10 +++--
 mm/zsmalloc.c                                      |    3 +
 net/socket.c                                       |    3 +
 net/sunrpc/rpc_pipe.c                              |    7 ++-
 security/apparmor/apparmorfs.c                     |    8 ++--
 security/apparmor/lsm.c                            |    3 +
 security/inode.c                                   |    7 ++-
 security/security.c                                |   18 +++++---
 security/selinux/hooks.c                           |   11 +++--
 security/selinux/selinuxfs.c                       |    8 ++--
 security/smack/smack_lsm.c                         |    6 ++-
 security/smack/smackfs.c                           |    9 +++-
 security/tomoyo/tomoyo.c                           |    4 +-
 135 files changed, 710 insertions(+), 470 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 8fb280e33114..cefb4dd72b53 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -611,7 +611,8 @@ pfm_unprotect_ctx_ctxsw(pfm_context_t *x, unsigned long f)
 static const struct dentry_operations pfmfs_dentry_operations;
 
 static struct dentry *
-pfmfs_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data)
+pfmfs_mount(struct file_system_type *fs_type, int flags, const char *dev_name,
+	    void *data, size_t data_size)
 {
 	return mount_pseudo(fs_type, "pfm:", NULL, &pfmfs_dentry_operations,
 			PFMFS_MAGIC);
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index db329d4bf1c3..90d55b47c471 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -734,7 +734,7 @@ spufs_create_root(struct super_block *sb, void *data)
 }
 
 static int
-spufs_fill_super(struct super_block *sb, void *data, int silent)
+spufs_fill_super(struct super_block *sb, void *data, size_t data_size, int silent)
 {
 	struct spufs_sb_info *info;
 	static const struct super_operations s_ops = {
@@ -761,9 +761,9 @@ spufs_fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *
 spufs_mount(struct file_system_type *fstype, int flags,
-		const char *name, void *data)
+		const char *name, void *data, size_t data_size)
 {
-	return mount_single(fstype, flags, data, spufs_fill_super);
+	return mount_single(fstype, flags, data, data_size, spufs_fill_super);
 }
 
 static struct file_system_type spufs_type = {
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 43bbe63e2992..791dcbffd65d 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -266,7 +266,8 @@ static int hypfs_show_options(struct seq_file *s, struct dentry *root)
 	return 0;
 }
 
-static int hypfs_fill_super(struct super_block *sb, void *data, int silent)
+static int hypfs_fill_super(struct super_block *sb,
+			    void *data, size_t data_size, int silent)
 {
 	struct inode *root_inode;
 	struct dentry *root_dentry;
@@ -309,9 +310,9 @@ static int hypfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *hypfs_mount(struct file_system_type *fst, int flags,
-			const char *devname, void *data)
+			const char *devname, void *data, size_t data_size)
 {
-	return mount_single(fst, flags, data, hypfs_fill_super);
+	return mount_single(fst, flags, data, data_size, hypfs_fill_super);
 }
 
 static void hypfs_kill_super(struct super_block *sb)
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index fca759d272a1..3584ef8de1fd 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1207,7 +1207,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 
 static struct dentry *rdt_mount(struct file_system_type *fs_type,
 				int flags, const char *unused_dev_name,
-				void *data)
+				void *data, size_t data_size)
 {
 	struct rdt_domain *dom;
 	struct rdt_resource *r;
diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 79a235184fb5..1b87a1e03b45 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -57,12 +57,12 @@ static int __init mount_param(char *str)
 __setup("devtmpfs.mount=", mount_param);
 
 static struct dentry *dev_mount(struct file_system_type *fs_type, int flags,
-		      const char *dev_name, void *data)
+		      const char *dev_name, void *data, size_t data_size)
 {
 #ifdef CONFIG_TMPFS
-	return mount_single(fs_type, flags, data, shmem_fill_super);
+	return mount_single(fs_type, flags, data, data_size, shmem_fill_super);
 #else
-	return mount_single(fs_type, flags, data, ramfs_fill_super);
+	return mount_single(fs_type, flags, data, data_size, ramfs_fill_super);
 #endif
 }
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 2b2332b605e4..cda4ab7b1dd4 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -396,7 +396,7 @@ static const struct super_operations dax_sops = {
 };
 
 static struct dentry *dax_mount(struct file_system_type *fs_type,
-		int flags, const char *dev_name, void *data)
+		int flags, const char *dev_name, void *data, size_t data_size)
 {
 	return mount_pseudo(fs_type, "dax:", &dax_sops, NULL, DAXFS_MAGIC);
 }
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index a1b9338736e3..fc652bb90b78 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -375,7 +375,8 @@ static const struct super_operations drm_fs_sops = {
 };
 
 static struct dentry *drm_fs_mount(struct file_system_type *fs_type, int flags,
-				   const char *dev_name, void *data)
+				   const char *dev_name,
+				   void *data, size_t data_size)
 {
 	return mount_pseudo(fs_type,
 			    "drm:",
diff --git a/drivers/gpu/drm/i915/i915_gemfs.c b/drivers/gpu/drm/i915/i915_gemfs.c
index 888b7d3f04c3..bf0a355e8f46 100644
--- a/drivers/gpu/drm/i915/i915_gemfs.c
+++ b/drivers/gpu/drm/i915/i915_gemfs.c
@@ -57,7 +57,7 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 		int flags = 0;
 		int err;
 
-		err = sb->s_op->remount_fs(sb, &flags, options);
+		err = sb->s_op->remount_fs(sb, &flags, options, sizeof(options));
 		if (err) {
 			kern_unmount(gemfs);
 			return err;
diff --git a/drivers/infiniband/hw/qib/qib_fs.c b/drivers/infiniband/hw/qib/qib_fs.c
index 1d940a2885c9..28648ef1f4cc 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -506,7 +506,8 @@ static int remove_device_files(struct super_block *sb,
  * after device init.  The direct add_cntr_files() call handles adding
  * them from the init code, when the fs is already mounted.
  */
-static int qibfs_fill_super(struct super_block *sb, void *data, int silent)
+static int qibfs_fill_super(struct super_block *sb,
+			    void *data, size_t data_size, int silent)
 {
 	struct qib_devdata *dd, *tmp;
 	unsigned long flags;
@@ -541,11 +542,11 @@ static int qibfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *qibfs_mount(struct file_system_type *fs_type, int flags,
-			const char *dev_name, void *data)
+			const char *dev_name, void *data, size_t data_size)
 {
 	struct dentry *ret;
 
-	ret = mount_single(fs_type, flags, data, qibfs_fill_super);
+	ret = mount_single(fs_type, flags, data, data_size, qibfs_fill_super);
 	if (!IS_ERR(ret))
 		qib_super = ret->d_sb;
 	return ret;
diff --git a/drivers/misc/ibmasm/ibmasmfs.c b/drivers/misc/ibmasm/ibmasmfs.c
index e05c3245930a..d0378eec6bca 100644
--- a/drivers/misc/ibmasm/ibmasmfs.c
+++ b/drivers/misc/ibmasm/ibmasmfs.c
@@ -88,13 +88,15 @@ static LIST_HEAD(service_processors);
 
 static struct inode *ibmasmfs_make_inode(struct super_block *sb, int mode);
 static void ibmasmfs_create_files (struct super_block *sb);
-static int ibmasmfs_fill_super (struct super_block *sb, void *data, int silent);
+static int ibmasmfs_fill_super (struct super_block *sb, void *data, size_t data_size,
+				int silent);
 
 
 static struct dentry *ibmasmfs_mount(struct file_system_type *fst,
-			int flags, const char *name, void *data)
+				     int flags, const char *name,
+				     void *data, size_t data_size)
 {
-	return mount_single(fst, flags, data, ibmasmfs_fill_super);
+	return mount_single(fst, flags, data, data_size, ibmasmfs_fill_super);
 }
 
 static const struct super_operations ibmasmfs_s_ops = {
@@ -112,7 +114,8 @@ static struct file_system_type ibmasmfs_type = {
 };
 MODULE_ALIAS_FS("ibmasmfs");
 
-static int ibmasmfs_fill_super (struct super_block *sb, void *data, int silent)
+static int ibmasmfs_fill_super (struct super_block *sb,
+				void *data, size_t data_size, int silent)
 {
 	struct inode *root;
 
diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index d58a61c09304..13706ea5cf50 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -61,9 +61,9 @@ static int get_sb_mtd_set(struct super_block *sb, void *_mtd)
  * get a superblock on an MTD-backed filesystem
  */
 static struct dentry *mount_mtd_aux(struct file_system_type *fs_type, int flags,
-			  const char *dev_name, void *data,
+			  const char *dev_name, void *data, size_t data_size,
 			  struct mtd_info *mtd,
-			  int (*fill_super)(struct super_block *, void *, int))
+			  int (*fill_super)(struct super_block *, void *, size_t, int))
 {
 	struct super_block *sb;
 	int ret;
@@ -79,7 +79,7 @@ static struct dentry *mount_mtd_aux(struct file_system_type *fs_type, int flags,
 	pr_debug("MTDSB: New superblock for device %d (\"%s\")\n",
 	      mtd->index, mtd->name);
 
-	ret = fill_super(sb, data, flags & SB_SILENT ? 1 : 0);
+	ret = fill_super(sb, data, data_size, flags & SB_SILENT ? 1 : 0);
 	if (ret < 0) {
 		deactivate_locked_super(sb);
 		return ERR_PTR(ret);
@@ -105,8 +105,10 @@ static struct dentry *mount_mtd_aux(struct file_system_type *fs_type, int flags,
  * get a superblock on an MTD-backed filesystem by MTD device number
  */
 static struct dentry *mount_mtd_nr(struct file_system_type *fs_type, int flags,
-			 const char *dev_name, void *data, int mtdnr,
-			 int (*fill_super)(struct super_block *, void *, int))
+				   const char *dev_name,
+				   void *data, size_t data_size, int mtdnr,
+				   int (*fill_super)(struct super_block *, void *,
+						     size_t, int))
 {
 	struct mtd_info *mtd;
 
@@ -116,15 +118,16 @@ static struct dentry *mount_mtd_nr(struct file_system_type *fs_type, int flags,
 		return ERR_CAST(mtd);
 	}
 
-	return mount_mtd_aux(fs_type, flags, dev_name, data, mtd, fill_super);
+	return mount_mtd_aux(fs_type, flags, dev_name, data, data_size, mtd,
+			     fill_super);
 }
 
 /*
  * set up an MTD-based superblock
  */
 struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
-	       const char *dev_name, void *data,
-	       int (*fill_super)(struct super_block *, void *, int))
+			 const char *dev_name, void *data, size_t data_size,
+			 int (*fill_super)(struct super_block *, void *, size_t, int))
 {
 #ifdef CONFIG_BLOCK
 	struct block_device *bdev;
@@ -153,7 +156,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 			if (!IS_ERR(mtd))
 				return mount_mtd_aux(
 					fs_type, flags,
-					dev_name, data, mtd,
+					dev_name, data, data_size, mtd,
 					fill_super);
 
 			printk(KERN_NOTICE "MTD:"
@@ -170,7 +173,7 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 				pr_debug("MTDSB: mtd%%d, mtdnr %d\n",
 				      mtdnr);
 				return mount_mtd_nr(fs_type, flags,
-						     dev_name, data,
+						    dev_name, data, data_size,
 						     mtdnr, fill_super);
 			}
 		}
@@ -197,7 +200,8 @@ struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
 	if (major != MTD_BLOCK_MAJOR)
 		goto not_an_MTD_device;
 
-	return mount_mtd_nr(fs_type, flags, dev_name, data, mtdnr, fill_super);
+	return mount_mtd_nr(fs_type, flags, dev_name, data, data_size, mtdnr,
+			    fill_super);
 
 not_an_MTD_device:
 #endif /* CONFIG_BLOCK */
diff --git a/drivers/oprofile/oprofilefs.c b/drivers/oprofile/oprofilefs.c
index 4ea08979312c..c721d7fd7c7e 100644
--- a/drivers/oprofile/oprofilefs.c
+++ b/drivers/oprofile/oprofilefs.c
@@ -238,7 +238,8 @@ struct dentry *oprofilefs_mkdir(struct dentry *parent, char const *name)
 }
 
 
-static int oprofilefs_fill_super(struct super_block *sb, void *data, int silent)
+static int oprofilefs_fill_super(struct super_block *sb,
+				 void *data, size_t data_size, int silent)
 {
 	struct inode *root_inode;
 
@@ -265,9 +266,10 @@ static int oprofilefs_fill_super(struct super_block *sb, void *data, int silent)
 
 
 static struct dentry *oprofilefs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, oprofilefs_fill_super);
+	return mount_single(fs_type, flags, data, data_size,
+			    oprofilefs_fill_super);
 }
 
 
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index d46bcf71b273..48b218ecdbd6 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -810,7 +810,7 @@ int ll_iocontrol(struct inode *inode, struct file *file,
 		 unsigned int cmd, unsigned long arg);
 int ll_flush_ctx(struct inode *inode);
 void ll_umount_begin(struct super_block *sb);
-int ll_remount_fs(struct super_block *sb, int *flags, char *data);
+int ll_remount_fs(struct super_block *sb, int *flags, char *data, size_t data_size);
 int ll_show_options(struct seq_file *seq, struct dentry *dentry);
 void ll_dirty_page_discard_warn(struct page *page, int ioret);
 int ll_prep_inode(struct inode **inode, struct ptlrpc_request *req,
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index e7500c53fafc..d8bb57ff3797 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -2039,7 +2039,8 @@ void ll_umount_begin(struct super_block *sb)
 	schedule();
 }
 
-int ll_remount_fs(struct super_block *sb, int *flags, char *data)
+int ll_remount_fs(struct super_block *sb, int *flags,
+		  char *data, size_t data_size)
 {
 	struct ll_sb_info *sbi = ll_s2sbi(sb);
 	char *profilenm = get_profile_name(sb);
diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
index f5e8214ac37b..0fc2a5604a10 100644
--- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
+++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
@@ -1112,7 +1112,8 @@ static int lmd_parse(char *options, struct lustre_mount_data *lmd)
  * and this is where we start setting things up.
  * @param data Mount options (e.g. -o flock,abort_recov)
  */
-static int lustre_fill_super(struct super_block *sb, void *lmd2_data, int silent)
+static int lustre_fill_super(struct super_block *sb,
+			     void *lmd2_data, size_t data_size, int silent)
 {
 	struct lustre_mount_data *lmd;
 	struct lustre_sb_info *lsi;
@@ -1207,9 +1208,9 @@ EXPORT_SYMBOL(lustre_register_super_ops);
 
 /***************** FS registration ******************/
 static struct dentry *lustre_mount(struct file_system_type *fs_type, int flags,
-				   const char *devname, void *data)
+				   const char *devname, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, lustre_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, lustre_fill_super);
 }
 
 static void lustre_kill_super(struct super_block *sb)
diff --git a/drivers/staging/ncpfs/inode.c b/drivers/staging/ncpfs/inode.c
index bb411610a071..c26606ed0a0c 100644
--- a/drivers/staging/ncpfs/inode.c
+++ b/drivers/staging/ncpfs/inode.c
@@ -101,7 +101,8 @@ static void destroy_inodecache(void)
 	kmem_cache_destroy(ncp_inode_cachep);
 }
 
-static int ncp_remount(struct super_block *sb, int *flags, char* data)
+static int ncp_remount(struct super_block *sb, int *flags,
+		       char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_NODIRATIME;
@@ -466,7 +467,8 @@ static int ncp_parse_options(struct ncp_mount_data_kernel *data, char *options)
 	return ret;
 }
 
-static int ncp_fill_super(struct super_block *sb, void *raw_data, int silent)
+static int ncp_fill_super(struct super_block *sb,
+			  void *raw_data, size_t data_size, int silent)
 {
 	struct ncp_mount_data_kernel data;
 	struct ncp_server *server;
@@ -1023,9 +1025,9 @@ int ncp_notify_change(struct dentry *dentry, struct iattr *attr)
 }
 
 static struct dentry *ncp_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, ncp_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, ncp_fill_super);
 }
 
 static struct file_system_type ncp_fs_type = {
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 0294e4f18873..694d59e613f4 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1355,7 +1355,8 @@ struct ffs_sb_fill_data {
 	struct ffs_data *ffs_data;
 };
 
-static int ffs_sb_fill(struct super_block *sb, void *_data, int silent)
+static int ffs_sb_fill(struct super_block *sb, void *_data, size_t data_size,
+		       int silent)
 {
 	struct ffs_sb_fill_data *data = _data;
 	struct inode	*inode;
@@ -1483,7 +1484,7 @@ static int ffs_fs_parse_opts(struct ffs_sb_fill_data *data, char *opts)
 
 static struct dentry *
 ffs_fs_mount(struct file_system_type *t, int flags,
-	      const char *dev_name, void *opts)
+	     const char *dev_name, void *opts, size_t data_size)
 {
 	struct ffs_sb_fill_data data = {
 		.perms = {
@@ -1525,7 +1526,7 @@ ffs_fs_mount(struct file_system_type *t, int flags,
 	ffs->private_data = ffs_dev;
 	data.ffs_data = ffs;
 
-	rv = mount_nodev(t, flags, &data, ffs_sb_fill);
+	rv = mount_nodev(t, flags, &data, sizeof(data), ffs_sb_fill);
 	if (IS_ERR(rv) && data.ffs_data) {
 		ffs_release_dev(data.ffs_data);
 		ffs_data_put(data.ffs_data);
diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c
index 37ca0e669bd8..286a982b43a3 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -1990,7 +1990,8 @@ static const struct super_operations gadget_fs_operations = {
 };
 
 static int
-gadgetfs_fill_super (struct super_block *sb, void *opts, int silent)
+gadgetfs_fill_super (struct super_block *sb, void *opts, size_t data_size,
+		     int silent)
 {
 	struct inode	*inode;
 	struct dev_data	*dev;
@@ -2046,9 +2047,9 @@ gadgetfs_fill_super (struct super_block *sb, void *opts, int silent)
 /* "mount -t gadgetfs path /dev/gadget" ends up here */
 static struct dentry *
 gadgetfs_mount (struct file_system_type *t, int flags,
-		const char *path, void *opts)
+		const char *path, void *opts, size_t data_size)
 {
-	return mount_single (t, flags, opts, gadgetfs_fill_super);
+	return mount_single (t, flags, opts, data_size, gadgetfs_fill_super);
 }
 
 static void
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6b237e3f4983..49f4a03ec162 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -526,7 +526,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 }
 
 static struct dentry *balloon_mount(struct file_system_type *fs_type,
-		int flags, const char *dev_name, void *data)
+		int flags, const char *dev_name, void *data, size_t data_size)
 {
 	static const struct dentry_operations ops = {
 		.d_dname = simple_dname,
diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 71ddfb4cf61c..fc4e6e43b66f 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -42,7 +42,8 @@ static const struct file_operations capabilities_file_ops = {
 	.llseek = default_llseek,
 };
 
-static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
+static int xenfs_fill_super(struct super_block *sb,
+			    void *data, size_t data_size, int silent)
 {
 	static const struct tree_descr xenfs_files[] = {
 		[2] = { "xenbus", &xen_xenbus_fops, S_IRUSR|S_IWUSR },
@@ -69,9 +70,9 @@ static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *xenfs_mount(struct file_system_type *fs_type,
 				  int flags, const char *dev_name,
-				  void *data)
+				  void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, xenfs_fill_super);
+	return mount_single(fs_type, flags, data, data_size, xenfs_fill_super);
 }
 
 static struct file_system_type xenfs_type = {
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 48ce50484e80..7def28abd3a5 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -116,7 +116,7 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
  */
 
 static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
-		       const char *dev_name, void *data)
+		       const char *dev_name, void *data, size_t data_size)
 {
 	struct super_block *sb = NULL;
 	struct inode *inode = NULL;
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index cfda2c7caedc..8a7b2d263afd 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -210,7 +210,7 @@ static int parse_options(struct super_block *sb, char *options)
 	return 0;
 }
 
-static int adfs_remount(struct super_block *sb, int *flags, char *data)
+static int adfs_remount(struct super_block *sb, int *flags, char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_NODIRATIME;
@@ -362,7 +362,8 @@ static inline unsigned long adfs_discsize(struct adfs_discrecord *dr, int block_
 	return discsize;
 }
 
-static int adfs_fill_super(struct super_block *sb, void *data, int silent)
+static int adfs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct adfs_discrecord *dr;
 	struct buffer_head *bh;
@@ -522,9 +523,9 @@ static int adfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *adfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, adfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size, adfs_fill_super);
 }
 
 static struct file_system_type adfs_fs_type = {
diff --git a/fs/affs/super.c b/fs/affs/super.c
index e602619aed9d..b406ffca2066 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -26,7 +26,8 @@
 
 static int affs_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int affs_show_options(struct seq_file *m, struct dentry *root);
-static int affs_remount (struct super_block *sb, int *flags, char *data);
+static int affs_remount (struct super_block *sb, int *flags,
+			 char *data, size_t data_size);
 
 static void
 affs_commit_super(struct super_block *sb, int wait)
@@ -334,7 +335,8 @@ static int affs_show_options(struct seq_file *m, struct dentry *root)
  * hopefully have the guts to do so. Until then: sorry for the mess.
  */
 
-static int affs_fill_super(struct super_block *sb, void *data, int silent)
+static int affs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct affs_sb_info	*sbi;
 	struct buffer_head	*root_bh = NULL;
@@ -549,7 +551,7 @@ static int affs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static int
-affs_remount(struct super_block *sb, int *flags, char *data)
+affs_remount(struct super_block *sb, int *flags, char *data, size_t data_size)
 {
 	struct affs_sb_info	*sbi = AFFS_SB(sb);
 	int			 blocksize;
@@ -632,9 +634,10 @@ affs_statfs(struct dentry *dentry, struct kstatfs *buf)
 }
 
 static struct dentry *affs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, affs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  affs_fill_super);
 }
 
 static void affs_kill_sb(struct super_block *sb)
diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
index 99fd13500a97..c45aa1776591 100644
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -152,7 +152,8 @@ static struct vfsmount *afs_mntpt_do_automount(struct dentry *mntpt)
 
 	/* try and do the mount */
 	_debug("--- attempting mount %s -o %s ---", devname, options);
-	mnt = vfs_submount(mntpt, &afs_fs_type, devname, options);
+	mnt = vfs_submount(mntpt, &afs_fs_type, devname,
+			   options, strlen(options) + 1);
 	_debug("--- mount result %p ---", mnt);
 
 	free_page((unsigned long) devname);
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 65081ec3c36e..7d17d01ca0cd 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -31,7 +31,8 @@
 
 static void afs_i_init_once(void *foo);
 static struct dentry *afs_mount(struct file_system_type *fs_type,
-		      int flags, const char *dev_name, void *data);
+				int flags, const char *dev_name,
+				void *data, size_t data_size);
 static void afs_kill_super(struct super_block *sb);
 static struct inode *afs_alloc_inode(struct super_block *sb);
 static void afs_destroy_inode(struct inode *inode);
@@ -460,7 +461,8 @@ static void afs_destroy_sbi(struct afs_super_info *as)
  * get an AFS superblock
  */
 static struct dentry *afs_mount(struct file_system_type *fs_type,
-				int flags, const char *dev_name, void *options)
+				int flags, const char *dev_name,
+				void *options, size_t data_size)
 {
 	struct afs_mount_params params;
 	struct super_block *sb;
diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..09ac6fd6dba9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -234,7 +234,8 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
 }
 
 static struct dentry *aio_mount(struct file_system_type *fs_type,
-				int flags, const char *dev_name, void *data)
+				int flags, const char *dev_name,
+				void *data, size_t data_size)
 {
 	static const struct dentry_operations ops = {
 		.d_dname	= simple_dname,
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 3168ee4e77f4..13c06a7e0b85 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -39,7 +39,8 @@ static const struct dentry_operations anon_inodefs_dentry_operations = {
 };
 
 static struct dentry *anon_inodefs_mount(struct file_system_type *fs_type,
-				int flags, const char *dev_name, void *data)
+					 int flags, const char *dev_name,
+					 void *data, size_t data_size)
 {
 	return mount_pseudo(fs_type, "anon_inode:", NULL,
 			&anon_inodefs_dentry_operations, ANON_INODE_FS_MAGIC);
diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 4737615f0eaa..06a975ee5724 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -201,7 +201,7 @@ static inline void managed_dentry_clear_managed(struct dentry *dentry)
 
 /* Initializing function */
 
-int autofs4_fill_super(struct super_block *, void *, int);
+int autofs4_fill_super(struct super_block *, void *, size_t, int);
 struct autofs_info *autofs4_new_ino(struct autofs_sb_info *);
 void autofs4_clean_ino(struct autofs_info *);
 
diff --git a/fs/autofs4/init.c b/fs/autofs4/init.c
index 8cf0e63389ae..3335cfdb9403 100644
--- a/fs/autofs4/init.c
+++ b/fs/autofs4/init.c
@@ -11,9 +11,9 @@
 #include "autofs_i.h"
 
 static struct dentry *autofs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, autofs4_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, autofs4_fill_super);
 }
 
 static struct file_system_type autofs_fs_type = {
diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index 09e7d68dff02..49389477ba36 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -206,7 +206,8 @@ static int parse_options(char *options, int *pipefd, kuid_t *uid, kgid_t *gid,
 	return (*pipefd < 0);
 }
 
-int autofs4_fill_super(struct super_block *s, void *data, int silent)
+int autofs4_fill_super(struct super_block *s, void *data, size_t data_size,
+		       int silent)
 {
 	struct inode *root_inode;
 	struct dentry *root;
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index af2832aaeec5..70f958115745 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -52,7 +52,7 @@ static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
 			char **out, int *out_len);
 static void befs_put_super(struct super_block *);
-static int befs_remount(struct super_block *, int *, char *);
+static int befs_remount(struct super_block *, int *, char *, size_t);
 static int befs_statfs(struct dentry *, struct kstatfs *);
 static int befs_show_options(struct seq_file *, struct dentry *);
 static int parse_options(char *, struct befs_mount_options *);
@@ -817,7 +817,7 @@ befs_put_super(struct super_block *sb)
  * Load a set of NLS translations if needed.
  */
 static int
-befs_fill_super(struct super_block *sb, void *data, int silent)
+befs_fill_super(struct super_block *sb, void *data, size_t data_size, int silent)
 {
 	struct buffer_head *bh;
 	struct befs_sb_info *befs_sb;
@@ -949,7 +949,7 @@ befs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static int
-befs_remount(struct super_block *sb, int *flags, char *data)
+befs_remount(struct super_block *sb, int *flags, char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	if (!(*flags & SB_RDONLY))
@@ -983,9 +983,10 @@ befs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 static struct dentry *
 befs_mount(struct file_system_type *fs_type, int flags, const char *dev_name,
-	    void *data)
+	   void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, befs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  befs_fill_super);
 }
 
 static struct file_system_type befs_fs_type = {
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 9a69392f1fb3..6e76e4e762e8 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -317,7 +317,8 @@ void bfs_dump_imap(const char *prefix, struct super_block *s)
 #endif
 }
 
-static int bfs_fill_super(struct super_block *s, void *data, int silent)
+static int bfs_fill_super(struct super_block *s, void *data, size_t data_size,
+			  int silent)
 {
 	struct buffer_head *bh, *sbh;
 	struct bfs_super_block *bfs_sb;
@@ -460,9 +461,10 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent)
 }
 
 static struct dentry *bfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, bfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  bfs_fill_super);
 }
 
 static struct file_system_type bfs_fs_type = {
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index a41b48f82a70..274de8bfc004 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -814,7 +814,8 @@ static const struct super_operations s_ops = {
 	.evict_inode	= bm_evict_inode,
 };
 
-static int bm_fill_super(struct super_block *sb, void *data, int silent)
+static int bm_fill_super(struct super_block *sb, void *data, size_t data_size,
+			 int silent)
 {
 	int err;
 	static const struct tree_descr bm_files[] = {
@@ -830,9 +831,9 @@ static int bm_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *bm_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, bm_fill_super);
+	return mount_single(fs_type, flags, data, data_size, bm_fill_super);
 }
 
 static struct linux_binfmt misc_format = {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7ec920e27065..313e57b06425 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -786,7 +786,7 @@ static const struct super_operations bdev_sops = {
 };
 
 static struct dentry *bd_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
 	struct dentry *dent;
 	dent = mount_pseudo(fs_type, "bdev:", &bdev_sops, NULL, BDEVFS_MAGIC);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0628092b0b1b..8d8fcd2f0403 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -64,7 +64,8 @@ static const struct super_operations btrfs_super_ops;
 static struct file_system_type btrfs_fs_type;
 static struct file_system_type btrfs_root_fs_type;
 
-static int btrfs_remount(struct super_block *sb, int *flags, char *data);
+static int btrfs_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size);
 
 const char *btrfs_decode_error(int errno)
 {
@@ -1453,7 +1454,7 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
 	return root;
 }
 
-static int parse_security_options(char *orig_opts,
+static int parse_security_options(char *orig_opts, size_t data_size,
 				  struct security_mnt_opts *sec_opts)
 {
 	char *secdata = NULL;
@@ -1462,7 +1463,7 @@ static int parse_security_options(char *orig_opts,
 	secdata = alloc_secdata();
 	if (!secdata)
 		return -ENOMEM;
-	ret = security_sb_copy_data(orig_opts, secdata);
+	ret = security_sb_copy_data(orig_opts, data_size, secdata);
 	if (ret) {
 		free_secdata(secdata);
 		return ret;
@@ -1510,7 +1511,8 @@ static int setup_security_options(struct btrfs_fs_info *fs_info,
  *       for multiple device setup.  Make sure to keep it in sync.
  */
 static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
-		int flags, const char *device_name, void *data)
+				       int flags, const char *device_name,
+				       void *data, size_t data_size)
 {
 	struct block_device *bdev = NULL;
 	struct super_block *s;
@@ -1531,7 +1533,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 
 	security_init_mnt_opts(&new_sec_opts);
 	if (data) {
-		error = parse_security_options(data, &new_sec_opts);
+		error = parse_security_options(data, data_size, &new_sec_opts);
 		if (error)
 			return ERR_PTR(error);
 	}
@@ -1635,7 +1637,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
  *      "btrfs subvolume set-default", mount_subvol() is called always.
  */
 static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
-		const char *device_name, void *data)
+		const char *device_name, void *data, size_t data_size)
 {
 	struct vfsmount *mnt_root;
 	struct dentry *root;
@@ -1655,21 +1657,24 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 	}
 
 	/* mount device's root (/) */
-	mnt_root = vfs_kern_mount(&btrfs_root_fs_type, flags, device_name, data);
+	mnt_root = vfs_kern_mount(&btrfs_root_fs_type, flags, device_name,
+				  data, data_size);
 	if (PTR_ERR_OR_ZERO(mnt_root) == -EBUSY) {
 		if (flags & SB_RDONLY) {
 			mnt_root = vfs_kern_mount(&btrfs_root_fs_type,
-				flags & ~SB_RDONLY, device_name, data);
+				flags & ~SB_RDONLY, device_name,
+				data, data_size);
 		} else {
 			mnt_root = vfs_kern_mount(&btrfs_root_fs_type,
-				flags | SB_RDONLY, device_name, data);
+				flags | SB_RDONLY, device_name,
+				data, data_size);
 			if (IS_ERR(mnt_root)) {
 				root = ERR_CAST(mnt_root);
 				goto out;
 			}
 
 			down_write(&mnt_root->mnt_sb->s_umount);
-			error = btrfs_remount(mnt_root->mnt_sb, &flags, NULL);
+			error = btrfs_remount(mnt_root->mnt_sb, &flags, NULL, 0);
 			up_write(&mnt_root->mnt_sb->s_umount);
 			if (error < 0) {
 				root = ERR_PTR(error);
@@ -1751,7 +1756,8 @@ static inline void btrfs_remount_cleanup(struct btrfs_fs_info *fs_info,
 	clear_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state);
 }
 
-static int btrfs_remount(struct super_block *sb, int *flags, char *data)
+static int btrfs_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_root *root = fs_info->tree_root;
@@ -1770,7 +1776,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		struct security_mnt_opts new_sec_opts;
 
 		security_init_mnt_opts(&new_sec_opts);
-		ret = parse_security_options(data, &new_sec_opts);
+		ret = parse_security_options(data, data_size, &new_sec_opts);
 		if (ret)
 			goto restore;
 		ret = setup_security_options(fs_info, sb,
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index 30ed438da2a9..d646cf7b04e5 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -24,7 +24,7 @@ static const struct super_operations btrfs_test_super_ops = {
 
 static struct dentry *btrfs_test_mount(struct file_system_type *fs_type,
 				       int flags, const char *dev_name,
-				       void *data)
+				       void *data, size_t data_size)
 {
 	return mount_pseudo(fs_type, "btrfs_test:", &btrfs_test_super_ops,
 			    NULL, BTRFS_TEST_MAGIC);
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index b33082e6878f..81d035a7a204 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1002,7 +1002,8 @@ static int ceph_setup_bdi(struct super_block *sb, struct ceph_fs_client *fsc)
 }
 
 static struct dentry *ceph_mount(struct file_system_type *fs_type,
-		       int flags, const char *dev_name, void *data)
+				 int flags, const char *dev_name,
+				 void *data, size_t data_size)
 {
 	struct super_block *sb;
 	struct ceph_fs_client *fsc;
diff --git a/fs/cifs/cifs_dfs_ref.c b/fs/cifs/cifs_dfs_ref.c
index 6b61df117fd4..461d052a5d73 100644
--- a/fs/cifs/cifs_dfs_ref.c
+++ b/fs/cifs/cifs_dfs_ref.c
@@ -260,7 +260,8 @@ static struct vfsmount *cifs_dfs_do_refmount(struct dentry *mntpt,
 	if (IS_ERR(mountdata))
 		return (struct vfsmount *)mountdata;
 
-	mnt = vfs_submount(mntpt, &cifs_fs_type, devname, mountdata);
+	mnt = vfs_submount(mntpt, &cifs_fs_type, devname,
+			   mountdata, strlen(mountdata) + 1);
 	kfree(mountdata);
 	kfree(devname);
 	return mnt;
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index f715609b13f3..b4004c3b3a2a 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -572,7 +572,8 @@ static int cifs_show_stats(struct seq_file *s, struct dentry *root)
 }
 #endif
 
-static int cifs_remount(struct super_block *sb, int *flags, char *data)
+static int cifs_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_NODIRATIME;
@@ -675,7 +676,7 @@ static int cifs_set_super(struct super_block *sb, void *data)
 
 static struct dentry *
 cifs_do_mount(struct file_system_type *fs_type,
-	      int flags, const char *dev_name, void *data)
+	      int flags, const char *dev_name, void *data, size_t data_size)
 {
 	int rc;
 	struct super_block *sb;
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 97424cf206c0..dd819c150f70 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -93,7 +93,8 @@ void coda_destroy_inodecache(void)
 	kmem_cache_destroy(coda_inode_cachep);
 }
 
-static int coda_remount(struct super_block *sb, int *flags, char *data)
+static int coda_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_NOATIME;
@@ -150,7 +151,8 @@ static int get_device_index(struct coda_mount_data *data)
 	return -1;
 }
 
-static int coda_fill_super(struct super_block *sb, void *data, int silent)
+static int coda_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct inode *root = NULL;
 	struct venus_comm *vc;
@@ -316,9 +318,10 @@ static int coda_statfs(struct dentry *dentry, struct kstatfs *buf)
 /* init_coda: used by filesystems.c to register coda */
 
 static struct dentry *coda_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+				 int flags, const char *dev_name,
+				 void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, coda_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, coda_fill_super);
 }
 
 struct file_system_type coda_fs_type = {
diff --git a/fs/configfs/mount.c b/fs/configfs/mount.c
index cfd91320e869..c9c7c14eb9db 100644
--- a/fs/configfs/mount.c
+++ b/fs/configfs/mount.c
@@ -66,7 +66,8 @@ static struct configfs_dirent configfs_root = {
 	.s_iattr	= NULL,
 };
 
-static int configfs_fill_super(struct super_block *sb, void *data, int silent)
+static int configfs_fill_super(struct super_block *sb,
+			       void *data, size_t data_size, int silent)
 {
 	struct inode *inode;
 	struct dentry *root;
@@ -103,9 +104,9 @@ static int configfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *configfs_do_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, configfs_fill_super);
+	return mount_single(fs_type, flags, data, data_size, configfs_fill_super);
 }
 
 static struct file_system_type configfs_fs_type = {
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 017b0ab19bc4..79527b703256 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -502,7 +502,8 @@ static void cramfs_kill_sb(struct super_block *sb)
 	kfree(sbi);
 }
 
-static int cramfs_remount(struct super_block *sb, int *flags, char *data)
+static int cramfs_remount(struct super_block *sb, int *flags,
+			  char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_RDONLY;
@@ -603,7 +604,8 @@ static int cramfs_finalize_super(struct super_block *sb,
 	return 0;
 }
 
-static int cramfs_blkdev_fill_super(struct super_block *sb, void *data,
+static int cramfs_blkdev_fill_super(struct super_block *sb,
+				    void *data, size_t data_size,
 				    int silent)
 {
 	struct cramfs_sb_info *sbi;
@@ -625,8 +627,8 @@ static int cramfs_blkdev_fill_super(struct super_block *sb, void *data,
 	return cramfs_finalize_super(sb, &super.root);
 }
 
-static int cramfs_mtd_fill_super(struct super_block *sb, void *data,
-				 int silent)
+static int cramfs_mtd_fill_super(struct super_block *sb,
+				 void *data, size_t data_size, int silent)
 {
 	struct cramfs_sb_info *sbi;
 	struct cramfs_super super;
@@ -951,18 +953,19 @@ static const struct super_operations cramfs_ops = {
 };
 
 static struct dentry *cramfs_mount(struct file_system_type *fs_type, int flags,
-				   const char *dev_name, void *data)
+				   const char *dev_name,
+				   void *data, size_t data_size)
 {
 	struct dentry *ret = ERR_PTR(-ENOPROTOOPT);
 
 	if (IS_ENABLED(CONFIG_CRAMFS_MTD)) {
-		ret = mount_mtd(fs_type, flags, dev_name, data,
+		ret = mount_mtd(fs_type, flags, dev_name, data, data_size,
 				cramfs_mtd_fill_super);
 		if (!IS_ERR(ret))
 			return ret;
 	}
 	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV)) {
-		ret = mount_bdev(fs_type, flags, dev_name, data,
+		ret = mount_bdev(fs_type, flags, dev_name, data, data_size,
 				 cramfs_blkdev_fill_super);
 	}
 	return ret;
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 13b01351dd1c..57ba6d891c85 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -130,7 +130,8 @@ static int debugfs_apply_options(struct super_block *sb)
 	return 0;
 }
 
-static int debugfs_remount(struct super_block *sb, int *flags, char *data)
+static int debugfs_remount(struct super_block *sb, int *flags,
+			   char *data, size_t data_size)
 {
 	int err;
 	struct debugfs_fs_info *fsi = sb->s_fs_info;
@@ -190,7 +191,7 @@ static struct vfsmount *debugfs_automount(struct path *path)
 {
 	debugfs_automount_t f;
 	f = (debugfs_automount_t)path->dentry->d_fsdata;
-	return f(path->dentry, d_inode(path->dentry)->i_private);
+	return f(path->dentry, d_inode(path->dentry)->i_private, 0);
 }
 
 static const struct dentry_operations debugfs_dops = {
@@ -199,7 +200,8 @@ static const struct dentry_operations debugfs_dops = {
 	.d_automount = debugfs_automount,
 };
 
-static int debug_fill_super(struct super_block *sb, void *data, int silent)
+static int debug_fill_super(struct super_block *sb,
+			    void *data, size_t data_size, int silent)
 {
 	static const struct tree_descr debug_files[] = {{""}};
 	struct debugfs_fs_info *fsi;
@@ -235,9 +237,9 @@ static int debug_fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *debug_mount(struct file_system_type *fs_type,
 			int flags, const char *dev_name,
-			void *data)
+			void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, debug_fill_super);
+	return mount_single(fs_type, flags, data, data_size, debug_fill_super);
 }
 
 static struct file_system_type debug_fs_type = {
@@ -539,7 +541,7 @@ EXPORT_SYMBOL_GPL(debugfs_create_dir);
 struct dentry *debugfs_create_automount(const char *name,
 					struct dentry *parent,
 					debugfs_automount_t f,
-					void *data)
+					void *data, size_t data_size)
 {
 	struct dentry *dentry = start_creating(name, parent);
 	struct inode *inode;
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index e072e955ce33..2dee3d0c8554 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -386,7 +386,8 @@ static void update_ptmx_mode(struct pts_fs_info *fsi)
 	}
 }
 
-static int devpts_remount(struct super_block *sb, int *flags, char *data)
+static int devpts_remount(struct super_block *sb, int *flags,
+			  char *data, size_t data_size)
 {
 	int err;
 	struct pts_fs_info *fsi = DEVPTS_SB(sb);
@@ -447,7 +448,8 @@ static void *new_pts_fs_info(struct super_block *sb)
 }
 
 static int
-devpts_fill_super(struct super_block *s, void *data, int silent)
+devpts_fill_super(struct super_block *s, void *data, size_t data_size,
+		  int silent)
 {
 	struct inode *inode;
 	int error;
@@ -504,9 +506,9 @@ devpts_fill_super(struct super_block *s, void *data, int silent)
  *     instance are independent of the PTYs in other devpts instances.
  */
 static struct dentry *devpts_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, devpts_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, devpts_fill_super);
 }
 
 static void devpts_kill_sb(struct super_block *sb)
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index 025d66a705db..5d029b7e069a 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -488,7 +488,7 @@ static struct file_system_type ecryptfs_fs_type;
  * @raw_data: The options passed into the kernel
  */
 static struct dentry *ecryptfs_mount(struct file_system_type *fs_type, int flags,
-			const char *dev_name, void *raw_data)
+			const char *dev_name, void *raw_data, size_t data_size)
 {
 	struct super_block *s;
 	struct ecryptfs_sb_info *sbi;
diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 5b68e4294faa..db0e417f1c7e 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -191,7 +191,8 @@ static int efivarfs_destroy(struct efivar_entry *entry, void *data)
 	return 0;
 }
 
-static int efivarfs_fill_super(struct super_block *sb, void *data, int silent)
+static int efivarfs_fill_super(struct super_block *sb,
+			       void *data, size_t data_size, int silent)
 {
 	struct inode *inode = NULL;
 	struct dentry *root;
@@ -227,9 +228,11 @@ static int efivarfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *efivarfs_mount(struct file_system_type *fs_type,
-				    int flags, const char *dev_name, void *data)
+				     int flags, const char *dev_name,
+				     void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, efivarfs_fill_super);
+	return mount_single(fs_type, flags, data, data_size,
+			    efivarfs_fill_super);
 }
 
 static void efivarfs_kill_sb(struct super_block *sb)
diff --git a/fs/efs/super.c b/fs/efs/super.c
index 6ffb7ba1547a..ce85f22651f3 100644
--- a/fs/efs/super.c
+++ b/fs/efs/super.c
@@ -19,12 +19,14 @@
 #include <linux/efs_fs_sb.h>
 
 static int efs_statfs(struct dentry *dentry, struct kstatfs *buf);
-static int efs_fill_super(struct super_block *s, void *d, int silent);
+static int efs_fill_super(struct super_block *s, void *d, size_t data_size,
+			  int silent);
 
 static struct dentry *efs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, efs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  efs_fill_super);
 }
 
 static void efs_kill_sb(struct super_block *s)
@@ -113,7 +115,8 @@ static void destroy_inodecache(void)
 	kmem_cache_destroy(efs_inode_cachep);
 }
 
-static int efs_remount(struct super_block *sb, int *flags, char *data)
+static int efs_remount(struct super_block *sb, int *flags,
+		       char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_RDONLY;
@@ -253,7 +256,8 @@ static int efs_validate_super(struct efs_sb_info *sb, struct efs_super *super) {
 	return 0;    
 }
 
-static int efs_fill_super(struct super_block *s, void *d, int silent)
+static int efs_fill_super(struct super_block *s, void *d, size_t data_size,
+			  int silent)
 {
 	struct efs_sb_info *sb;
 	struct buffer_head *bh;
diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 179cd5c2f52a..21886644b6f3 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -705,7 +705,8 @@ static int exofs_read_lookup_dev_table(struct exofs_sb_info *sbi,
 /*
  * Read the superblock from the OSD and fill in the fields
  */
-static int exofs_fill_super(struct super_block *sb, void *data, int silent)
+static int exofs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			    int silent)
 {
 	struct inode *root;
 	struct exofs_mountopt *opts = data;
@@ -861,7 +862,7 @@ static int exofs_fill_super(struct super_block *sb, void *data, int silent)
  */
 static struct dentry *exofs_mount(struct file_system_type *type,
 			  int flags, const char *dev_name,
-			  void *data)
+			  void *data, size_t data_size)
 {
 	struct exofs_mountopt opts;
 	int ret;
@@ -872,7 +873,7 @@ static struct dentry *exofs_mount(struct file_system_type *type,
 
 	if (!opts.dev_name)
 		opts.dev_name = dev_name;
-	return mount_nodev(type, flags, &opts, exofs_fill_super);
+	return mount_nodev(type, flags, &opts, 0, exofs_fill_super);
 }
 
 /*
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index de1694512f1f..ca2cd53959b3 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -39,7 +39,8 @@
 #include "acl.h"
 
 static void ext2_write_super(struct super_block *sb);
-static int ext2_remount (struct super_block * sb, int * flags, char * data);
+static int ext2_remount (struct super_block * sb, int * flags,
+			 char * data, size_t data_size);
 static int ext2_statfs (struct dentry * dentry, struct kstatfs * buf);
 static int ext2_sync_fs(struct super_block *sb, int wait);
 static int ext2_freeze(struct super_block *sb);
@@ -815,7 +816,8 @@ static unsigned long descriptor_loc(struct super_block *sb,
 	return ext2_group_first_block_no(sb, bg) + has_super;
 }
 
-static int ext2_fill_super(struct super_block *sb, void *data, int silent)
+static int ext2_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
 	struct buffer_head * bh;
@@ -1319,7 +1321,8 @@ static void ext2_write_super(struct super_block *sb)
 		ext2_sync_fs(sb, 1);
 }
 
-static int ext2_remount (struct super_block * sb, int * flags, char * data)
+static int ext2_remount (struct super_block * sb, int * flags,
+			 char *data, size_t data_size)
 {
 	struct ext2_sb_info * sbi = EXT2_SB(sb);
 	struct ext2_super_block * es;
@@ -1473,9 +1476,10 @@ static int ext2_statfs (struct dentry * dentry, struct kstatfs * buf)
 }
 
 static struct dentry *ext2_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, ext2_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  ext2_fill_super);
 }
 
 #ifdef CONFIG_QUOTA
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 185f7e61f4cf..966c79a9c93c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -70,12 +70,13 @@ static void ext4_mark_recovery_complete(struct super_block *sb,
 static void ext4_clear_journal_err(struct super_block *sb,
 				   struct ext4_super_block *es);
 static int ext4_sync_fs(struct super_block *sb, int wait);
-static int ext4_remount(struct super_block *sb, int *flags, char *data);
+static int ext4_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size);
 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int ext4_unfreeze(struct super_block *sb);
 static int ext4_freeze(struct super_block *sb);
 static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
-		       const char *dev_name, void *data);
+		       const char *dev_name, void *data, size_t data_size);
 static inline int ext2_feature_set_ok(struct super_block *sb);
 static inline int ext3_feature_set_ok(struct super_block *sb);
 static int ext4_feature_set_ok(struct super_block *sb, int readonly);
@@ -3405,7 +3406,8 @@ static void ext4_set_resv_clusters(struct super_block *sb)
 	atomic64_set(&sbi->s_resv_clusters, resv_clusters);
 }
 
-static int ext4_fill_super(struct super_block *sb, void *data, int silent)
+static int ext4_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
 	char *orig_data = kstrdup(data, GFP_KERNEL);
@@ -4976,7 +4978,8 @@ struct ext4_mount_options {
 #endif
 };
 
-static int ext4_remount(struct super_block *sb, int *flags, char *data)
+static int ext4_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	struct ext4_super_block *es;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -5735,9 +5738,10 @@ static int ext4_get_next_id(struct super_block *sb, struct kqid *qid)
 #endif
 
 static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
-		       const char *dev_name, void *data)
+		       const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, ext4_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  ext4_fill_super);
 }
 
 #if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a31cc49b7295..96db72d025d8 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1384,7 +1384,8 @@ static void default_options(struct f2fs_sb_info *sbi)
 #ifdef CONFIG_QUOTA
 static int f2fs_enable_quotas(struct super_block *sb);
 #endif
-static int f2fs_remount(struct super_block *sb, int *flags, char *data)
+static int f2fs_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	struct f2fs_sb_info *sbi = F2FS_SB(sb);
 	struct f2fs_mount_info org_mount_opt;
@@ -2589,7 +2590,8 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
 	}
 }
 
-static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
+static int f2fs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct f2fs_sb_info *sbi;
 	struct f2fs_super_block *raw_super;
@@ -3015,9 +3017,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *f2fs_mount(struct file_system_type *fs_type, int flags,
-			const char *dev_name, void *data)
+			const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, f2fs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  f2fs_fill_super);
 }
 
 static void kill_f2fs_super(struct super_block *sb)
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index ffbbf0520d9e..53765ffd96c0 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -778,7 +778,8 @@ static void __exit fat_destroy_inodecache(void)
 	kmem_cache_destroy(fat_inode_cachep);
 }
 
-static int fat_remount(struct super_block *sb, int *flags, char *data)
+static int fat_remount(struct super_block *sb, int *flags,
+		       char *data, size_t data_size)
 {
 	bool new_rdonly;
 	struct msdos_sb_info *sbi = MSDOS_SB(sb);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 582ca731a6c9..8df24e45fea6 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -650,16 +650,18 @@ static void setup(struct super_block *sb)
 	sb->s_flags |= SB_NOATIME;
 }
 
-static int msdos_fill_super(struct super_block *sb, void *data, int silent)
+static int msdos_fill_super(struct super_block *sb, void *data, size_t data_size,
+			    int silent)
 {
 	return fat_fill_super(sb, data, silent, 0, setup);
 }
 
 static struct dentry *msdos_mount(struct file_system_type *fs_type,
 			int flags, const char *dev_name,
-			void *data)
+			void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, msdos_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  msdos_fill_super);
 }
 
 static struct file_system_type msdos_fs_type = {
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 2649759c478a..b3c2c0ccde15 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1054,16 +1054,18 @@ static void setup(struct super_block *sb)
 		sb->s_d_op = &vfat_dentry_ops;
 }
 
-static int vfat_fill_super(struct super_block *sb, void *data, int silent)
+static int vfat_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	return fat_fill_super(sb, data, silent, 1, setup);
 }
 
 static struct dentry *vfat_mount(struct file_system_type *fs_type,
 		       int flags, const char *dev_name,
-		       void *data)
+		       void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, vfat_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  vfat_fill_super);
 }
 
 static struct file_system_type vfat_fs_type = {
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 48b24bb50d02..1c6cf91f6de9 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -113,7 +113,8 @@ vxfs_statfs(struct dentry *dentry, struct kstatfs *bufp)
 	return 0;
 }
 
-static int vxfs_remount(struct super_block *sb, int *flags, char *data)
+static int vxfs_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_RDONLY;
@@ -199,6 +200,7 @@ static int vxfs_try_sb_magic(struct super_block *sbp, int silent,
  * vxfs_read_super - read superblock into memory and initialize filesystem
  * @sbp:		VFS superblock (to fill)
  * @dp:			fs private mount data
+ * @data_size:		size of mount data
  * @silent:		do not complain loudly when sth is wrong
  *
  * Description:
@@ -211,7 +213,8 @@ static int vxfs_try_sb_magic(struct super_block *sbp, int silent,
  * Locking:
  *   We are under @sbp->s_lock.
  */
-static int vxfs_fill_super(struct super_block *sbp, void *dp, int silent)
+static int vxfs_fill_super(struct super_block *sbp, void *dp, size_t data_size,
+			   int silent)
 {
 	struct vxfs_sb_info	*infp;
 	struct vxfs_sb		*rsbp;
@@ -312,9 +315,10 @@ static int vxfs_fill_super(struct super_block *sbp, void *dp, int silent)
  * The usual module blurb.
  */
 static struct dentry *vxfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, vxfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  vxfs_fill_super);
 }
 
 static struct file_system_type vxfs_fs_type = {
diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index b9ea99c5b5b3..5d0abd02aa83 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -290,7 +290,8 @@ void fuse_ctl_remove_conn(struct fuse_conn *fc)
 	drop_nlink(d_inode(fuse_control_sb->s_root));
 }
 
-static int fuse_ctl_fill_super(struct super_block *sb, void *data, int silent)
+static int fuse_ctl_fill_super(struct super_block *sb,
+			       void *data, size_t data_size, int silent)
 {
 	static const struct tree_descr empty_descr = {""};
 	struct fuse_conn *fc;
@@ -317,9 +318,11 @@ static int fuse_ctl_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *fuse_ctl_mount(struct file_system_type *fs_type,
-			int flags, const char *dev_name, void *raw_data)
+				     int flags, const char *dev_name,
+				     void *raw_data, size_t data_size)
 {
-	return mount_single(fs_type, flags, raw_data, fuse_ctl_fill_super);
+	return mount_single(fs_type, flags, raw_data, data_size,
+			    fuse_ctl_fill_super);
 }
 
 static void fuse_ctl_kill_sb(struct super_block *sb)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ef309958e060..e150d078419f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -138,7 +138,8 @@ static void fuse_evict_inode(struct inode *inode)
 	}
 }
 
-static int fuse_remount_fs(struct super_block *sb, int *flags, char *data)
+static int fuse_remount_fs(struct super_block *sb, int *flags,
+			   char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	if (*flags & SB_MANDLOCK)
@@ -1043,7 +1044,8 @@ void fuse_dev_free(struct fuse_dev *fud)
 }
 EXPORT_SYMBOL_GPL(fuse_dev_free);
 
-static int fuse_fill_super(struct super_block *sb, void *data, int silent)
+static int fuse_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct fuse_dev *fud;
 	struct fuse_conn *fc;
@@ -1187,9 +1189,10 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *fuse_mount(struct file_system_type *fs_type,
 		       int flags, const char *dev_name,
-		       void *raw_data)
+		       void *raw_data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, raw_data, fuse_fill_super);
+	return mount_nodev(fs_type, flags, raw_data, data_size,
+			   fuse_fill_super);
 }
 
 static void fuse_kill_sb_anon(struct super_block *sb)
@@ -1217,9 +1220,10 @@ MODULE_ALIAS_FS("fuse");
 #ifdef CONFIG_BLOCK
 static struct dentry *fuse_mount_blk(struct file_system_type *fs_type,
 			   int flags, const char *dev_name,
-			   void *raw_data)
+			   void *raw_data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, raw_data, fuse_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, raw_data, data_size,
+			  fuse_fill_super);
 }
 
 static void fuse_kill_sb_blk(struct super_block *sb)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 3ba3f167641c..a8a664eed01e 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1239,6 +1239,7 @@ static int test_gfs2_super(struct super_block *s, void *ptr)
  * @flags: Mount flags
  * @dev_name: The name of the device
  * @data: The mount arguments
+ * @data_size: The size of the mount arguments
  *
  * Q. Why not use get_sb_bdev() ?
  * A. We need to select one of two root directories to mount, independent
@@ -1248,7 +1249,7 @@ static int test_gfs2_super(struct super_block *s, void *ptr)
  */
 
 static struct dentry *gfs2_mount(struct file_system_type *fs_type, int flags,
-		       const char *dev_name, void *data)
+		       const char *dev_name, void *data, size_t data_size)
 {
 	struct block_device *bdev;
 	struct super_block *s;
@@ -1345,7 +1346,8 @@ static int set_meta_super(struct super_block *s, void *ptr)
 }
 
 static struct dentry *gfs2_mount_meta(struct file_system_type *fs_type,
-			int flags, const char *dev_name, void *data)
+				      int flags, const char *dev_name,
+				      void *data, size_t data_size)
 {
 	struct super_block *s;
 	struct gfs2_sbd *sdp;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index cf5c7f3080d2..a2add54e63e3 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1228,11 +1228,13 @@ static int gfs2_statfs(struct dentry *dentry, struct kstatfs *buf)
  * @sb:  the filesystem
  * @flags:  the remount flags
  * @data:  extra data passed in (not used right now)
+ * @data_size: size of the extra data
  *
  * Returns: errno
  */
 
-static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
+static int gfs2_remount_fs(struct super_block *sb, int *flags,
+			   char *data, size_t data_size)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct gfs2_args args = sdp->sd_args; /* Default to current settings */
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 173876782f73..e739b381b041 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -111,7 +111,8 @@ static int hfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return 0;
 }
 
-static int hfs_remount(struct super_block *sb, int *flags, char *data)
+static int hfs_remount(struct super_block *sb, int *flags,
+		       char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_NODIRATIME;
@@ -382,7 +383,8 @@ static int parse_options(char *options, struct hfs_sb_info *hsb)
  * hfs_btree_init() to get the necessary data about the extents and
  * catalog B-trees and, finally, reading the root inode into memory.
  */
-static int hfs_fill_super(struct super_block *sb, void *data, int silent)
+static int hfs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			  int silent)
 {
 	struct hfs_sb_info *sbi;
 	struct hfs_find_data fd;
@@ -458,9 +460,11 @@ static int hfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *hfs_mount(struct file_system_type *fs_type,
-		      int flags, const char *dev_name, void *data)
+				int flags, const char *dev_name,
+				void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, hfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  hfs_fill_super);
 }
 
 static struct file_system_type hfs_fs_type = {
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 513c357c734b..758e2315be60 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -326,7 +326,8 @@ static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
 	return 0;
 }
 
-static int hfsplus_remount(struct super_block *sb, int *flags, char *data)
+static int hfsplus_remount(struct super_block *sb, int *flags,
+			   char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
@@ -371,7 +372,8 @@ static const struct super_operations hfsplus_sops = {
 	.show_options	= hfsplus_show_options,
 };
 
-static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
+static int hfsplus_fill_super(struct super_block *sb,
+			      void *data, size_t data_size, int silent)
 {
 	struct hfsplus_vh *vhdr;
 	struct hfsplus_sb_info *sbi;
@@ -640,9 +642,11 @@ static void hfsplus_destroy_inode(struct inode *inode)
 #define HFSPLUS_INODE_SIZE	sizeof(struct hfsplus_inode_info)
 
 static struct dentry *hfsplus_mount(struct file_system_type *fs_type,
-			  int flags, const char *dev_name, void *data)
+				    int flags, const char *dev_name,
+				    void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, hfsplus_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  hfsplus_fill_super);
 }
 
 static struct file_system_type hfsplus_fs_type = {
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 3cd85eb5bbb1..09047b506764 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -923,7 +923,8 @@ static const struct inode_operations hostfs_link_iops = {
 	.get_link	= hostfs_get_link,
 };
 
-static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
+static int hostfs_fill_sb_common(struct super_block *sb,
+				 void *d, size_t data_size, int silent)
 {
 	struct inode *root_inode;
 	char *host_root_path, *req_root = d;
@@ -983,9 +984,9 @@ static int hostfs_fill_sb_common(struct super_block *sb, void *d, int silent)
 
 static struct dentry *hostfs_read_sb(struct file_system_type *type,
 			  int flags, const char *dev_name,
-			  void *data)
+			  void *data, size_t data_size)
 {
-	return mount_nodev(type, flags, data, hostfs_fill_sb_common);
+	return mount_nodev(type, flags, data, data_size, hostfs_fill_sb_common);
 }
 
 static void hostfs_kill_sb(struct super_block *s)
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index f2c3ebcd309c..53e585b27c05 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -445,7 +445,8 @@ HPFS filesystem options:\n\
 \n");
 }
 
-static int hpfs_remount_fs(struct super_block *s, int *flags, char *data)
+static int hpfs_remount_fs(struct super_block *s, int *flags,
+			   char *data, size_t data_size)
 {
 	kuid_t uid;
 	kgid_t gid;
@@ -540,7 +541,8 @@ static const struct super_operations hpfs_sops =
 	.show_options	= hpfs_show_options,
 };
 
-static int hpfs_fill_super(struct super_block *s, void *options, int silent)
+static int hpfs_fill_super(struct super_block *s,
+			   void *options, size_t data_size, int silent)
 {
 	struct buffer_head *bh0, *bh1, *bh2;
 	struct hpfs_boot_block *bootblock;
@@ -757,9 +759,10 @@ bail2:	brelse(bh0);
 }
 
 static struct dentry *hpfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, hpfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  hpfs_fill_super);
 }
 
 static struct file_system_type hpfs_fs_type = {
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index d508c7844681..76fb8eb2bea8 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1220,7 +1220,8 @@ hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
 }
 
 static int
-hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
+hugetlbfs_fill_super(struct super_block *sb, void *data, size_t data_size,
+		     int silent)
 {
 	int ret;
 	struct hugetlbfs_config config;
@@ -1279,9 +1280,10 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *hugetlbfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, hugetlbfs_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size,
+			   hugetlbfs_fill_super);
 }
 
 static struct file_system_type hugetlbfs_fs_type = {
@@ -1420,10 +1422,11 @@ static int __init init_hugetlbfs_fs(void)
 	for_each_hstate(h) {
 		char buf[50];
 		unsigned ps_kb = 1U << (h->order + PAGE_SHIFT - 10);
+		int n;
 
-		snprintf(buf, sizeof(buf), "pagesize=%uK", ps_kb);
+		n = snprintf(buf, sizeof(buf), "pagesize=%uK", ps_kb);
 		hugetlbfs_vfsmount[i] = kern_mount_data(&hugetlbfs_fs_type,
-							buf);
+							buf, n + 1);
 
 		if (IS_ERR(hugetlbfs_vfsmount[i])) {
 			pr_err("Cannot mount internal hugetlbfs for "
diff --git a/fs/internal.h b/fs/internal.h
index e08972db0303..1afa522c5f30 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -98,10 +98,10 @@ extern struct file *get_empty_filp(void);
 /*
  * super.c
  */
-extern int do_remount_sb(struct super_block *, int, void *, int);
+extern int do_remount_sb(struct super_block *, int, void *, size_t, int);
 extern bool trylock_super(struct super_block *sb);
 extern struct dentry *mount_fs(struct file_system_type *,
-			       int, const char *, void *);
+			       int, const char *, void *, size_t);
 extern struct super_block *user_get_super(dev_t);
 
 /*
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index bc258a4402f6..4f24361db2d6 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -111,7 +111,8 @@ static void destroy_inodecache(void)
 	kmem_cache_destroy(isofs_inode_cachep);
 }
 
-static int isofs_remount(struct super_block *sb, int *flags, char *data)
+static int isofs_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	if (!(*flags & SB_RDONLY))
@@ -616,7 +617,8 @@ static bool rootdir_empty(struct super_block *sb, unsigned long block)
  * Note: a check_disk_change() has been done immediately prior
  * to this call, so we don't need to check again.
  */
-static int isofs_fill_super(struct super_block *s, void *data, int silent)
+static int isofs_fill_super(struct super_block *s, void *data, size_t data_size,
+			    int silent)
 {
 	struct buffer_head *bh = NULL, *pri_bh = NULL;
 	struct hs_primary_descriptor *h_pri = NULL;
@@ -1555,9 +1557,10 @@ struct inode *__isofs_iget(struct super_block *sb,
 }
 
 static struct dentry *isofs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, isofs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  isofs_fill_super);
 }
 
 static struct file_system_type iso9660_fs_type = {
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index f60dee7faf03..51715d90473a 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -238,7 +238,8 @@ static int jffs2_parse_options(struct jffs2_sb_info *c, char *data)
 	return 0;
 }
 
-static int jffs2_remount_fs(struct super_block *sb, int *flags, char *data)
+static int jffs2_remount_fs(struct super_block *sb, int *flags,
+			    char *data, size_t data_size)
 {
 	struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
 	int err;
@@ -267,7 +268,8 @@ static const struct super_operations jffs2_super_operations =
 /*
  * fill in the superblock
  */
-static int jffs2_fill_super(struct super_block *sb, void *data, int silent)
+static int jffs2_fill_super(struct super_block *sb,
+			    void *data, size_t data_size, int silent)
 {
 	struct jffs2_sb_info *c;
 	int ret;
@@ -312,9 +314,9 @@ static int jffs2_fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *jffs2_mount(struct file_system_type *fs_type,
 			int flags, const char *dev_name,
-			void *data)
+			void *data, size_t data_size)
 {
-	return mount_mtd(fs_type, flags, dev_name, data, jffs2_fill_super);
+	return mount_mtd(fs_type, flags, dev_name, data, data_size, jffs2_fill_super);
 }
 
 static void jffs2_put_super (struct super_block *sb)
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 1b9264fd54b6..88f30ff12564 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -456,7 +456,8 @@ static int parse_options(char *options, struct super_block *sb, s64 *newLVSize,
 	return 0;
 }
 
-static int jfs_remount(struct super_block *sb, int *flags, char *data)
+static int jfs_remount(struct super_block *sb, int *flags,
+		       char *data, size_t data_size)
 {
 	s64 newLVSize = 0;
 	int rc = 0;
@@ -516,7 +517,8 @@ static int jfs_remount(struct super_block *sb, int *flags, char *data)
 	return 0;
 }
 
-static int jfs_fill_super(struct super_block *sb, void *data, int silent)
+static int jfs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			  int silent)
 {
 	struct jfs_sb_info *sbi;
 	struct inode *inode;
@@ -698,9 +700,10 @@ static int jfs_unfreeze(struct super_block *sb)
 }
 
 static struct dentry *jfs_do_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, jfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  jfs_fill_super);
 }
 
 static int jfs_sync_fs(struct super_block *sb, int wait)
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 26dd9a50f383..c05efbd1822e 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -22,7 +22,8 @@
 
 struct kmem_cache *kernfs_node_cache;
 
-static int kernfs_sop_remount_fs(struct super_block *sb, int *flags, char *data)
+static int kernfs_sop_remount_fs(struct super_block *sb, int *flags,
+				 char *data, size_t data_size)
 {
 	struct kernfs_root *root = kernfs_info(sb)->root;
 	struct kernfs_syscall_ops *scops = root->syscall_ops;
diff --git a/fs/libfs.c b/fs/libfs.c
index 0fb590d79f30..9f1f4884b7cc 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -578,7 +578,7 @@ int simple_pin_fs(struct file_system_type *type, struct vfsmount **mount, int *c
 	spin_lock(&pin_fs_lock);
 	if (unlikely(!*mount)) {
 		spin_unlock(&pin_fs_lock);
-		mnt = vfs_kern_mount(type, SB_KERNMOUNT, type->name, NULL);
+		mnt = vfs_kern_mount(type, SB_KERNMOUNT, type->name, NULL, 0);
 		if (IS_ERR(mnt))
 			return PTR_ERR(mnt);
 		spin_lock(&pin_fs_lock);
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 72e308c3e66b..3d91d9096b24 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -22,7 +22,8 @@
 static int minix_write_inode(struct inode *inode,
 		struct writeback_control *wbc);
 static int minix_statfs(struct dentry *dentry, struct kstatfs *buf);
-static int minix_remount (struct super_block * sb, int * flags, char * data);
+static int minix_remount (struct super_block * sb, int * flags,
+			  char * data, size_t data_size);
 
 static void minix_evict_inode(struct inode *inode)
 {
@@ -118,7 +119,8 @@ static const struct super_operations minix_sops = {
 	.remount_fs	= minix_remount,
 };
 
-static int minix_remount (struct super_block * sb, int * flags, char * data)
+static int minix_remount (struct super_block * sb, int * flags,
+			  char * data, size_t data_size)
 {
 	struct minix_sb_info * sbi = minix_sb(sb);
 	struct minix_super_block * ms;
@@ -155,7 +157,8 @@ static int minix_remount (struct super_block * sb, int * flags, char * data)
 	return 0;
 }
 
-static int minix_fill_super(struct super_block *s, void *data, int silent)
+static int minix_fill_super(struct super_block *s, void *data, size_t data_size,
+			    int silent)
 {
 	struct buffer_head *bh;
 	struct buffer_head **map;
@@ -651,9 +654,10 @@ void minix_truncate(struct inode * inode)
 }
 
 static struct dentry *minix_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, minix_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  minix_fill_super);
 }
 
 static struct file_system_type minix_fs_type = {
diff --git a/fs/namespace.c b/fs/namespace.c
index 3f98e1a36b84..8fc4f3459b80 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1020,7 +1020,8 @@ static struct mount *skip_mnt_tree(struct mount *p)
 }
 
 struct vfsmount *
-vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data)
+vfs_kern_mount(struct file_system_type *type, int flags, const char *name,
+	       void *data, size_t data_size)
 {
 	struct mount *mnt;
 	struct dentry *root;
@@ -1035,7 +1036,7 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (flags & SB_KERNMOUNT)
 		mnt->mnt.mnt_flags = MNT_INTERNAL;
 
-	root = mount_fs(type, flags, name, data);
+	root = mount_fs(type, flags, name, data, data_size);
 	if (IS_ERR(root)) {
 		mnt_free_id(mnt);
 		free_vfsmnt(mnt);
@@ -1055,7 +1056,7 @@ EXPORT_SYMBOL_GPL(vfs_kern_mount);
 
 struct vfsmount *
 vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
-	     const char *name, void *data)
+	     const char *name, void *data, size_t data_size)
 {
 	/* Until it is worked out how to pass the user namespace
 	 * through from the parent mount to the submount don't support
@@ -1064,7 +1065,7 @@ vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
 	if (mountpoint->d_sb->s_user_ns != &init_user_ns)
 		return ERR_PTR(-EPERM);
 
-	return vfs_kern_mount(type, SB_SUBMOUNT, name, data);
+	return vfs_kern_mount(type, SB_SUBMOUNT, name, data, data_size);
 }
 EXPORT_SYMBOL_GPL(vfs_submount);
 
@@ -1594,7 +1595,7 @@ static int do_umount(struct mount *mnt, int flags)
 			return -EPERM;
 		down_write(&sb->s_umount);
 		if (!sb_rdonly(sb))
-			retval = do_remount_sb(sb, SB_RDONLY, NULL, 0);
+			retval = do_remount_sb(sb, SB_RDONLY, NULL, 0, 0);
 		up_write(&sb->s_umount);
 		return retval;
 	}
@@ -2287,7 +2288,7 @@ static int change_mount_flags(struct vfsmount *mnt, int ms_flags)
  * on it - tough luck.
  */
 static int do_remount(struct path *path, int ms_flags, int sb_flags,
-		      int mnt_flags, void *data)
+		      int mnt_flags, void *data, size_t data_size)
 {
 	int err;
 	struct super_block *sb = path->mnt->mnt_sb;
@@ -2326,7 +2327,7 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
 		return -EPERM;
 	}
 
-	err = security_sb_remount(sb, data);
+	err = security_sb_remount(sb, data, data_size);
 	if (err)
 		return err;
 
@@ -2336,7 +2337,7 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
 	else if (!capable(CAP_SYS_ADMIN))
 		err = -EPERM;
 	else
-		err = do_remount_sb(sb, sb_flags, data, 0);
+		err = do_remount_sb(sb, sb_flags, data, data_size, 0);
 	if (!err) {
 		lock_mount_hash();
 		mnt_flags |= mnt->mnt.mnt_flags & ~MNT_USER_SETTABLE_MASK;
@@ -2502,7 +2503,8 @@ static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags);
  * namespace's tree
  */
 static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
-			int mnt_flags, const char *name, void *data)
+			int mnt_flags, const char *name,
+			void *data, size_t data_size)
 {
 	struct file_system_type *type;
 	struct vfsmount *mnt;
@@ -2515,7 +2517,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 	if (!type)
 		return -ENODEV;
 
-	mnt = vfs_kern_mount(type, sb_flags, name, data);
+	mnt = vfs_kern_mount(type, sb_flags, name, data, data_size);
 	if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
 	    !mnt->mnt_sb->s_subtype)
 		mnt = fs_set_subtype(mnt, fstype);
@@ -2771,6 +2773,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 {
 	struct path path;
 	unsigned int mnt_flags = 0, sb_flags;
+	size_t data_size = data_page ? PAGE_SIZE : 0;
 	int retval = 0;
 
 	/* Discard magic */
@@ -2789,8 +2792,8 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 	if (retval)
 		return retval;
 
-	retval = security_sb_mount(dev_name, &path,
-				   type_page, flags, data_page);
+	retval = security_sb_mount(dev_name, &path, type_page, flags,
+				   data_page, data_size);
 	if (!retval && !may_mount())
 		retval = -EPERM;
 	if (!retval && (flags & SB_MANDLOCK) && !may_mandlock())
@@ -2837,7 +2840,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 
 	if (flags & MS_REMOUNT)
 		retval = do_remount(&path, flags, sb_flags, mnt_flags,
-				    data_page);
+				    data_page, data_size);
 	else if (flags & MS_BIND)
 		retval = do_loopback(&path, dev_name, flags & MS_REC);
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
@@ -2846,7 +2849,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 		retval = do_move_mount(&path, dev_name);
 	else
 		retval = do_new_mount(&path, type_page, sb_flags, mnt_flags,
-				      dev_name, data_page);
+				      dev_name, data_page, data_size);
 dput_out:
 	path_put(&path);
 	return retval;
@@ -3236,7 +3239,7 @@ static void __init init_mount_tree(void)
 	type = get_fs_type("rootfs");
 	if (!type)
 		panic("Can't find rootfs type");
-	mnt = vfs_kern_mount(type, 0, "rootfs", NULL);
+	mnt = vfs_kern_mount(type, 0, "rootfs", NULL, 0);
 	put_filesystem(type);
 	if (IS_ERR(mnt))
 		panic("Can't create rootfs");
@@ -3298,10 +3301,11 @@ void put_mnt_ns(struct mnt_namespace *ns)
 	free_mnt_ns(ns);
 }
 
-struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
+struct vfsmount *kern_mount_data(struct file_system_type *type,
+				 void *data, size_t data_size)
 {
 	struct vfsmount *mnt;
-	mnt = vfs_kern_mount(type, SB_KERNMOUNT, type->name, data);
+	mnt = vfs_kern_mount(type, SB_KERNMOUNT, type->name, data, data_size);
 	if (!IS_ERR(mnt)) {
 		/*
 		 * it is a longterm mount, don't release mnt until
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 8357ff69962f..db0f3ca3a35c 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -405,7 +405,7 @@ int nfs_set_sb_security(struct super_block *, struct dentry *, struct nfs_mount_
 int nfs_clone_sb_security(struct super_block *, struct dentry *, struct nfs_mount_info *);
 struct dentry *nfs_fs_mount_common(struct nfs_server *, int, const char *,
 				   struct nfs_mount_info *, struct nfs_subversion *);
-struct dentry *nfs_fs_mount(struct file_system_type *, int, const char *, void *);
+struct dentry *nfs_fs_mount(struct file_system_type *, int, const char *, void *, size_t);
 struct dentry * nfs_xdev_mount_common(struct file_system_type *, int,
 		const char *, struct nfs_mount_info *);
 void nfs_kill_super(struct super_block *);
@@ -466,7 +466,7 @@ int  nfs_show_options(struct seq_file *, struct dentry *);
 int  nfs_show_devname(struct seq_file *, struct dentry *);
 int  nfs_show_path(struct seq_file *, struct dentry *);
 int  nfs_show_stats(struct seq_file *, struct dentry *);
-int nfs_remount(struct super_block *sb, int *flags, char *raw_data);
+int nfs_remount(struct super_block *sb, int *flags, char *raw_data, size_t data_size);
 
 /* write.c */
 extern void nfs_pageio_init_write(struct nfs_pageio_descriptor *pgio,
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index e5686be67be8..df9e87331558 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -216,7 +216,8 @@ static struct vfsmount *nfs_do_clone_mount(struct nfs_server *server,
 					   const char *devname,
 					   struct nfs_clone_mount *mountdata)
 {
-	return vfs_submount(mountdata->dentry, &nfs_xdev_fs_type, devname, mountdata);
+	return vfs_submount(mountdata->dentry, &nfs_xdev_fs_type, devname,
+			    mountdata, 0);
 }
 
 /**
diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c
index 24f06dcc2b08..191cb4202056 100644
--- a/fs/nfs/nfs4namespace.c
+++ b/fs/nfs/nfs4namespace.c
@@ -278,7 +278,8 @@ static struct vfsmount *try_location(struct nfs_clone_mount *mountdata,
 				mountdata->hostname,
 				mountdata->mnt_path);
 
-		mnt = vfs_submount(mountdata->dentry, &nfs4_referral_fs_type, page, mountdata);
+		mnt = vfs_submount(mountdata->dentry, &nfs4_referral_fs_type, page,
+				   mountdata, 0);
 		if (!IS_ERR(mnt))
 			break;
 	}
diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index 6fb7cb6b3f4b..e72e5dbdfcd0 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -18,11 +18,11 @@
 static int nfs4_write_inode(struct inode *inode, struct writeback_control *wbc);
 static void nfs4_evict_inode(struct inode *inode);
 static struct dentry *nfs4_remote_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *raw_data);
+	int flags, const char *dev_name, void *raw_data, size_t data_size);
 static struct dentry *nfs4_referral_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *raw_data);
+	int flags, const char *dev_name, void *raw_data, size_t data_size);
 static struct dentry *nfs4_remote_referral_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *raw_data);
+	int flags, const char *dev_name, void *raw_data, size_t data_size);
 
 static struct file_system_type nfs4_remote_fs_type = {
 	.owner		= THIS_MODULE,
@@ -105,7 +105,7 @@ static void nfs4_evict_inode(struct inode *inode)
  */
 static struct dentry *
 nfs4_remote_mount(struct file_system_type *fs_type, int flags,
-		  const char *dev_name, void *info)
+		  const char *dev_name, void *info, size_t data_size)
 {
 	struct nfs_mount_info *mount_info = info;
 	struct nfs_server *server;
@@ -127,7 +127,7 @@ nfs4_remote_mount(struct file_system_type *fs_type, int flags,
 }
 
 static struct vfsmount *nfs_do_root_mount(struct file_system_type *fs_type,
-		int flags, void *data, const char *hostname)
+		int flags, void *data, size_t data_size, const char *hostname)
 {
 	struct vfsmount *root_mnt;
 	char *root_devname;
@@ -142,7 +142,8 @@ static struct vfsmount *nfs_do_root_mount(struct file_system_type *fs_type,
 		snprintf(root_devname, len, "[%s]:/", hostname);
 	else
 		snprintf(root_devname, len, "%s:/", hostname);
-	root_mnt = vfs_kern_mount(fs_type, flags, root_devname, data);
+	root_mnt = vfs_kern_mount(fs_type, flags, root_devname,
+				  data, data_size);
 	kfree(root_devname);
 	return root_mnt;
 }
@@ -247,8 +248,8 @@ struct dentry *nfs4_try_mount(int flags, const char *dev_name,
 
 	export_path = data->nfs_server.export_path;
 	data->nfs_server.export_path = "/";
-	root_mnt = nfs_do_root_mount(&nfs4_remote_fs_type, flags, mount_info,
-			data->nfs_server.hostname);
+	root_mnt = nfs_do_root_mount(&nfs4_remote_fs_type, flags, mount_info, 0,
+				     data->nfs_server.hostname);
 	data->nfs_server.export_path = export_path;
 
 	res = nfs_follow_remote_path(root_mnt, export_path);
@@ -261,7 +262,8 @@ struct dentry *nfs4_try_mount(int flags, const char *dev_name,
 
 static struct dentry *
 nfs4_remote_referral_mount(struct file_system_type *fs_type, int flags,
-			   const char *dev_name, void *raw_data)
+			   const char *dev_name,
+			   void *raw_data, size_t data_size)
 {
 	struct nfs_mount_info mount_info = {
 		.fill_super = nfs_fill_super,
@@ -294,7 +296,8 @@ nfs4_remote_referral_mount(struct file_system_type *fs_type, int flags,
  * Create an NFS4 server record on referral traversal
  */
 static struct dentry *nfs4_referral_mount(struct file_system_type *fs_type,
-		int flags, const char *dev_name, void *raw_data)
+					  int flags, const char *dev_name,
+					  void *raw_data, size_t data_size)
 {
 	struct nfs_clone_mount *data = raw_data;
 	char *export_path;
@@ -306,8 +309,8 @@ static struct dentry *nfs4_referral_mount(struct file_system_type *fs_type,
 	export_path = data->mnt_path;
 	data->mnt_path = "/";
 
-	root_mnt = nfs_do_root_mount(&nfs4_remote_referral_fs_type,
-			flags, data, data->hostname);
+	root_mnt = nfs_do_root_mount(&nfs4_remote_referral_fs_type, flags,
+				     data, 0, data->hostname);
 	data->mnt_path = export_path;
 
 	res = nfs_follow_remote_path(root_mnt, export_path);
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 5e470e233c83..b5f27d6999e5 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -287,7 +287,8 @@ static match_table_t nfs_vers_tokens = {
 };
 
 static struct dentry *nfs_xdev_mount(struct file_system_type *fs_type,
-		int flags, const char *dev_name, void *raw_data);
+				     int flags, const char *dev_name,
+				     void *raw_data, size_t data_size);
 
 struct file_system_type nfs_fs_type = {
 	.owner		= THIS_MODULE,
@@ -1203,7 +1204,7 @@ static int nfs_get_option_ul_bound(substring_t args[], unsigned long *option,
  * skipped as they are encountered.  If there were no errors, return 1;
  * otherwise return 0 (zero).
  */
-static int nfs_parse_mount_options(char *raw,
+static int nfs_parse_mount_options(char *raw, size_t raw_size,
 				   struct nfs_parsed_mount_data *mnt)
 {
 	char *p, *string, *secdata;
@@ -1221,7 +1222,7 @@ static int nfs_parse_mount_options(char *raw,
 	if (!secdata)
 		goto out_nomem;
 
-	rc = security_sb_copy_data(raw, secdata);
+	rc = security_sb_copy_data(raw, raw_size, secdata);
 	if (rc)
 		goto out_security_failure;
 
@@ -2151,7 +2152,7 @@ static int nfs_validate_mount_data(struct file_system_type *fs_type,
 }
 #endif
 
-static int nfs_validate_text_mount_data(void *options,
+static int nfs_validate_text_mount_data(void *options, size_t data_size,
 					struct nfs_parsed_mount_data *args,
 					const char *dev_name)
 {
@@ -2160,7 +2161,7 @@ static int nfs_validate_text_mount_data(void *options,
 	int max_pathlen = NFS_MAXPATHLEN;
 	struct sockaddr *sap = (struct sockaddr *)&args->nfs_server.address;
 
-	if (nfs_parse_mount_options((char *)options, args) == 0)
+	if (nfs_parse_mount_options((char *)options, data_size, args) == 0)
 		return -EINVAL;
 
 	if (!nfs_verify_server_address(sap))
@@ -2243,7 +2244,7 @@ nfs_compare_remount_data(struct nfs_server *nfss,
 }
 
 int
-nfs_remount(struct super_block *sb, int *flags, char *raw_data)
+nfs_remount(struct super_block *sb, int *flags, char *raw_data, size_t data_size)
 {
 	int error;
 	struct nfs_server *nfss = sb->s_fs_info;
@@ -2290,7 +2291,7 @@ nfs_remount(struct super_block *sb, int *flags, char *raw_data)
 
 	/* overwrite those values with any that were specified */
 	error = -EINVAL;
-	if (!nfs_parse_mount_options((char *)options, data))
+	if (!nfs_parse_mount_options((char *)options, data_size, data))
 		goto out;
 
 	/*
@@ -2662,7 +2663,7 @@ struct dentry *nfs_fs_mount_common(struct nfs_server *server,
 EXPORT_SYMBOL_GPL(nfs_fs_mount_common);
 
 struct dentry *nfs_fs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *raw_data)
+	int flags, const char *dev_name, void *raw_data, size_t data_size)
 {
 	struct nfs_mount_info mount_info = {
 		.fill_super = nfs_fill_super,
@@ -2680,7 +2681,8 @@ struct dentry *nfs_fs_mount(struct file_system_type *fs_type,
 	/* Validate the mount data */
 	error = nfs_validate_mount_data(fs_type, raw_data, mount_info.parsed, mount_info.mntfh, dev_name);
 	if (error == NFS_TEXT_DATA)
-		error = nfs_validate_text_mount_data(raw_data, mount_info.parsed, dev_name);
+		error = nfs_validate_text_mount_data(raw_data, data_size,
+						     mount_info.parsed, dev_name);
 	if (error < 0) {
 		mntroot = ERR_PTR(error);
 		goto out;
@@ -2724,7 +2726,7 @@ EXPORT_SYMBOL_GPL(nfs_kill_super);
  */
 static struct dentry *
 nfs_xdev_mount(struct file_system_type *fs_type, int flags,
-		const char *dev_name, void *raw_data)
+		const char *dev_name, void *raw_data, size_t data_size)
 {
 	struct nfs_clone_mount *data = raw_data;
 	struct nfs_mount_info mount_info = {
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d107b4426f7e..661296305123 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1144,7 +1144,8 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
  *	populating the filesystem.
  */
 
-static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
+static int nfsd_fill_super(struct super_block * sb,
+			   void * data, size_t data_size, int silent)
 {
 	static const struct tree_descr nfsd_files[] = {
 		[NFSD_List] = {"exports", &exports_nfsd_operations, S_IRUGO},
@@ -1179,10 +1180,11 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 }
 
 static struct dentry *nfsd_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
 	struct net *net = current->nsproxy->net_ns;
-	return mount_ns(fs_type, flags, data, net, net->user_ns, nfsd_fill_super);
+	return mount_ns(fs_type, flags, data, data_size,
+			net, net->user_ns, nfsd_fill_super);
 }
 
 static void nfsd_umount(struct super_block *sb)
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6ffeca84d7c3..3a21a1ab141f 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -69,7 +69,8 @@ struct kmem_cache *nilfs_segbuf_cachep;
 struct kmem_cache *nilfs_btree_path_cache;
 
 static int nilfs_setup_super(struct super_block *sb, int is_mount);
-static int nilfs_remount(struct super_block *sb, int *flags, char *data);
+static int nilfs_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size);
 
 void __nilfs_msg(struct super_block *sb, const char *level, const char *fmt,
 		 ...)
@@ -1118,7 +1119,8 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent)
 	return err;
 }
 
-static int nilfs_remount(struct super_block *sb, int *flags, char *data)
+static int nilfs_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size)
 {
 	struct the_nilfs *nilfs = sb->s_fs_info;
 	unsigned long old_sb_flags;
@@ -1278,7 +1280,7 @@ static int nilfs_test_bdev_super(struct super_block *s, void *data)
 
 static struct dentry *
 nilfs_mount(struct file_system_type *fs_type, int flags,
-	     const char *dev_name, void *data)
+	    const char *dev_name, void *data, size_t data_size)
 {
 	struct nilfs_super_data sd;
 	struct super_block *s;
@@ -1346,7 +1348,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 			 * Try remount to setup mount states if the current
 			 * tree is not mounted and only snapshots use this sb.
 			 */
-			err = nilfs_remount(s, &flags, data);
+			err = nilfs_remount(s, &flags, data, data_size);
 			if (err)
 				goto failed_super;
 		}
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 60702d677bd4..f069eb6495b0 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -263,7 +263,8 @@ static const struct super_operations nsfs_ops = {
 	.show_path = nsfs_show_path,
 };
 static struct dentry *nsfs_mount(struct file_system_type *fs_type,
-			int flags, const char *dev_name, void *data)
+				 int flags, const char *dev_name,
+				 void *data, size_t data_size)
 {
 	return mount_pseudo(fs_type, "nsfs:", &nsfs_ops,
 			&ns_dentry_operations, NSFS_MAGIC);
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index bb7159f697f2..8501bbcceb5a 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -456,6 +456,7 @@ static inline int ntfs_clear_volume_flags(ntfs_volume *vol, VOLUME_FLAGS flags)
  * @sb:		superblock of mounted ntfs filesystem
  * @flags:	remount flags
  * @opt:	remount options string
+ * @data_size:	size of the options string
  *
  * Change the mount options of an already mounted ntfs filesystem.
  *
@@ -463,7 +464,8 @@ static inline int ntfs_clear_volume_flags(ntfs_volume *vol, VOLUME_FLAGS flags)
  * ntfs_remount() returns successfully (i.e. returns 0).  Otherwise,
  * @sb->s_flags are not changed.
  */
-static int ntfs_remount(struct super_block *sb, int *flags, char *opt)
+static int ntfs_remount(struct super_block *sb, int *flags,
+			char *opt, size_t data_size)
 {
 	ntfs_volume *vol = NTFS_SB(sb);
 
@@ -2694,6 +2696,7 @@ static const struct super_operations ntfs_sops = {
  * ntfs_fill_super - mount an ntfs filesystem
  * @sb:		super block of ntfs filesystem to mount
  * @opt:	string containing the mount options
+ * @data_size:	size of the mount options string
  * @silent:	silence error output
  *
  * ntfs_fill_super() is called by the VFS to mount the device described by @sb
@@ -2708,7 +2711,8 @@ static const struct super_operations ntfs_sops = {
  *
  * NOTE: @sb->s_flags contains the mount options flags.
  */
-static int ntfs_fill_super(struct super_block *sb, void *opt, const int silent)
+static int ntfs_fill_super(struct super_block *sb, void *opt, size_t data_size,
+			   const int silent)
 {
 	ntfs_volume *vol;
 	struct buffer_head *bh;
@@ -3060,9 +3064,10 @@ struct kmem_cache *ntfs_index_ctx_cache;
 DEFINE_MUTEX(ntfs_lock);
 
 static struct dentry *ntfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, ntfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  ntfs_fill_super);
 }
 
 static struct file_system_type ntfs_fs_type = {
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index 602c71f32740..642e471a6472 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -568,6 +568,7 @@ static int dlmfs_unlink(struct inode *dir,
 
 static int dlmfs_fill_super(struct super_block * sb,
 			    void * data,
+			    size_t data_size,
 			    int silent)
 {
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -617,9 +618,9 @@ static const struct inode_operations dlmfs_file_inode_operations = {
 };
 
 static struct dentry *dlmfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, dlmfs_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, dlmfs_fill_super);
 }
 
 static struct file_system_type dlmfs_fs_type = {
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 3415e0b09398..62237837a098 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -107,7 +107,8 @@ static int ocfs2_check_set_options(struct super_block *sb,
 static int ocfs2_show_options(struct seq_file *s, struct dentry *root);
 static void ocfs2_put_super(struct super_block *sb);
 static int ocfs2_mount_volume(struct super_block *sb);
-static int ocfs2_remount(struct super_block *sb, int *flags, char *data);
+static int ocfs2_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size);
 static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err);
 static int ocfs2_initialize_mem_caches(void);
 static void ocfs2_free_mem_caches(void);
@@ -633,7 +634,8 @@ static unsigned long long ocfs2_max_file_offset(unsigned int bbits,
 	return (((unsigned long long)bytes) << bitshift) - trim;
 }
 
-static int ocfs2_remount(struct super_block *sb, int *flags, char *data)
+static int ocfs2_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size)
 {
 	int incompat_features;
 	int ret = 0;
@@ -999,7 +1001,8 @@ static void ocfs2_disable_quotas(struct ocfs2_super *osb)
 	}
 }
 
-static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
+static int ocfs2_fill_super(struct super_block *sb, void *data, size_t data_size,
+			    int silent)
 {
 	struct dentry *root;
 	int status, sector_size;
@@ -1236,9 +1239,10 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 static struct dentry *ocfs2_mount(struct file_system_type *fs_type,
 			int flags,
 			const char *dev_name,
-			void *data)
+			void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, ocfs2_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  ocfs2_fill_super);
 }
 
 static struct file_system_type ocfs2_fs_type = {
diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c
index ee14af9e26f2..e5258fefcd2b 100644
--- a/fs/omfs/inode.c
+++ b/fs/omfs/inode.c
@@ -454,7 +454,8 @@ static int parse_options(char *options, struct omfs_sb_info *sbi)
 	return 1;
 }
 
-static int omfs_fill_super(struct super_block *sb, void *data, int silent)
+static int omfs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct buffer_head *bh, *bh2;
 	struct omfs_super_block *omfs_sb;
@@ -596,9 +597,11 @@ static int omfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *omfs_mount(struct file_system_type *fs_type,
-			int flags, const char *dev_name, void *data)
+				 int flags, const char *dev_name,
+				 void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, omfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  omfs_fill_super);
 }
 
 static struct file_system_type omfs_fs_type = {
diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c
index 2200662a9bf1..6171138c76c7 100644
--- a/fs/openpromfs/inode.c
+++ b/fs/openpromfs/inode.c
@@ -366,7 +366,8 @@ static struct inode *openprom_iget(struct super_block *sb, ino_t ino)
 	return inode;
 }
 
-static int openprom_remount(struct super_block *sb, int *flags, char *data)
+static int openprom_remount(struct super_block *sb, int *flags,
+			    char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_NOATIME;
@@ -380,7 +381,8 @@ static const struct super_operations openprom_sops = {
 	.remount_fs	= openprom_remount,
 };
 
-static int openprom_fill_super(struct super_block *s, void *data, int silent)
+static int openprom_fill_super(struct super_block *s,
+			       void *data, size_t data_size, int silent)
 {
 	struct inode *root_inode;
 	struct op_inode_info *oi;
@@ -415,9 +417,10 @@ static int openprom_fill_super(struct super_block *s, void *data, int silent)
 }
 
 static struct dentry *openprom_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, openprom_fill_super);
+	return mount_single(fs_type, flags, data, data_size,
+			    openprom_fill_super);
 }
 
 static struct file_system_type openprom_fs_type = {
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index c29bb0ebc6bb..411725292d37 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -319,7 +319,7 @@ extern uint64_t orangefs_features;
 struct dentry *orangefs_mount(struct file_system_type *fst,
 			   int flags,
 			   const char *devname,
-			   void *data);
+			   void *data, size_t data_size);
 
 void orangefs_kill_sb(struct super_block *sb);
 int orangefs_remount(struct orangefs_sb_info_s *);
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 3ae5fdba0225..3f082ad5f355 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -206,7 +206,8 @@ static int orangefs_statfs(struct dentry *dentry, struct kstatfs *buf)
  * Remount as initiated by VFS layer.  We just need to reparse the mount
  * options, no need to signal pvfs2-client-core about it.
  */
-static int orangefs_remount_fs(struct super_block *sb, int *flags, char *data)
+static int orangefs_remount_fs(struct super_block *sb, int *flags,
+			       char *data, size_t data_size)
 {
 	gossip_debug(GOSSIP_SUPER_DEBUG, "orangefs_remount_fs: called\n");
 	return parse_mount_options(sb, data, 1);
@@ -456,7 +457,7 @@ static int orangefs_fill_sb(struct super_block *sb,
 struct dentry *orangefs_mount(struct file_system_type *fst,
 			   int flags,
 			   const char *devname,
-			   void *data)
+			   void *data, size_t data_size)
 {
 	int ret = -EINVAL;
 	struct super_block *sb = ERR_PTR(-EINVAL);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index e8551c97de51..b5548c5a19be 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -379,7 +379,8 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
 	return 0;
 }
 
-static int ovl_remount(struct super_block *sb, int *flags, char *data)
+static int ovl_remount(struct super_block *sb, int *flags,
+		       char *data, size_t data_size)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
 
@@ -1355,7 +1356,8 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
 	goto out;
 }
 
-static int ovl_fill_super(struct super_block *sb, void *data, int silent)
+static int ovl_fill_super(struct super_block *sb, void *data, size_t data_size,
+			  int silent)
 {
 	struct path upperpath = { };
 	struct dentry *root_dentry;
@@ -1493,9 +1495,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *ovl_mount(struct file_system_type *fs_type, int flags,
-				const char *dev_name, void *raw_data)
+				const char *dev_name,
+				void *raw_data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, raw_data, ovl_fill_super);
+	return mount_nodev(fs_type, flags, raw_data, data_size, ovl_fill_super);
 }
 
 static struct file_system_type ovl_fs_type = {
diff --git a/fs/pipe.c b/fs/pipe.c
index 39d6f431da83..915032a76bfd 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1179,7 +1179,8 @@ static const struct super_operations pipefs_ops = {
  * d_name - pipe: will go nicely and kill the special-casing in procfs.
  */
 static struct dentry *pipefs_mount(struct file_system_type *fs_type,
-			 int flags, const char *dev_name, void *data)
+				   int flags, const char *dev_name,
+				   void *data, size_t data_size)
 {
 	return mount_pseudo(fs_type, "pipe:", &pipefs_ops,
 			&pipefs_dentry_operations, PIPEFS_MAGIC);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2cf3b74391ca..df65431c00be 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -490,7 +490,8 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 	return inode;
 }
 
-int proc_fill_super(struct super_block *s, void *data, int silent)
+int proc_fill_super(struct super_block *s, void *data, size_t data_size,
+		    int silent)
 {
 	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
 	struct inode *root_inode;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 0f1692e63cb6..3362732fffa3 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -195,7 +195,7 @@ extern const struct inode_operations proc_pid_link_inode_operations;
 void proc_init_kmemcache(void);
 void set_proc_pid_nlink(void);
 extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
-extern int proc_fill_super(struct super_block *, void *data, int flags);
+extern int proc_fill_super(struct super_block *, void *, size_t, int);
 extern void proc_entry_rundown(struct proc_dir_entry *);
 
 /*
@@ -256,7 +256,7 @@ extern struct proc_dir_entry proc_root;
 extern int proc_parse_options(char *options, struct pid_namespace *pid);
 
 extern void proc_self_init(void);
-extern int proc_remount(struct super_block *, int *, char *);
+extern int proc_remount(struct super_block *, int *, char *, size_t);
 
 /*
  * task_[no]mmu.c
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 61b7340b357a..99ce06c4e1a2 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -78,7 +78,8 @@ int proc_parse_options(char *options, struct pid_namespace *pid)
 	return 1;
 }
 
-int proc_remount(struct super_block *sb, int *flags, char *data)
+int proc_remount(struct super_block *sb, int *flags,
+		 char *data, size_t data_size)
 {
 	struct pid_namespace *pid = sb->s_fs_info;
 
@@ -87,7 +88,8 @@ int proc_remount(struct super_block *sb, int *flags, char *data)
 }
 
 static struct dentry *proc_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+				 int flags, const char *dev_name,
+				 void *data, size_t data_size)
 {
 	struct pid_namespace *ns;
 
@@ -98,7 +100,8 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 		ns = task_active_pid_ns(current);
 	}
 
-	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
+	return mount_ns(fs_type, flags, data, data_size, ns, ns->user_ns,
+			proc_fill_super);
 }
 
 static void proc_kill_sb(struct super_block *sb)
@@ -212,7 +215,7 @@ int pid_ns_prepare_proc(struct pid_namespace *ns)
 {
 	struct vfsmount *mnt;
 
-	mnt = kern_mount_data(&proc_fs_type, ns);
+	mnt = kern_mount_data(&proc_fs_type, ns, 0);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 5fcb845b9fec..793258231096 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -271,7 +271,8 @@ static int pstore_show_options(struct seq_file *m, struct dentry *root)
 	return 0;
 }
 
-static int pstore_remount(struct super_block *sb, int *flags, char *data)
+static int pstore_remount(struct super_block *sb, int *flags,
+			  char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	parse_options(data);
@@ -432,7 +433,8 @@ void pstore_get_records(int quiet)
 	inode_unlock(d_inode(root));
 }
 
-static int pstore_fill_super(struct super_block *sb, void *data, int silent)
+static int pstore_fill_super(struct super_block *sb,
+			     void *data, size_t data_size, int silent)
 {
 	struct inode *inode;
 
@@ -464,9 +466,9 @@ static int pstore_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *pstore_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, pstore_fill_super);
+	return mount_single(fs_type, flags, data, data_size, pstore_fill_super);
 }
 
 static void pstore_kill_sb(struct super_block *sb)
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 3d46fe302fcb..be35529c8052 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -29,7 +29,8 @@ static const struct super_operations qnx4_sops;
 
 static struct inode *qnx4_alloc_inode(struct super_block *sb);
 static void qnx4_destroy_inode(struct inode *inode);
-static int qnx4_remount(struct super_block *sb, int *flags, char *data);
+static int qnx4_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size);
 static int qnx4_statfs(struct dentry *, struct kstatfs *);
 
 static const struct super_operations qnx4_sops =
@@ -40,7 +41,8 @@ static const struct super_operations qnx4_sops =
 	.remount_fs	= qnx4_remount,
 };
 
-static int qnx4_remount(struct super_block *sb, int *flags, char *data)
+static int qnx4_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	struct qnx4_sb_info *qs;
 
@@ -183,7 +185,8 @@ static const char *qnx4_checkroot(struct super_block *sb,
 	return "bitmap file not found.";
 }
 
-static int qnx4_fill_super(struct super_block *s, void *data, int silent)
+static int qnx4_fill_super(struct super_block *s, void *data, size_t data_size,
+			   int silent)
 {
 	struct buffer_head *bh;
 	struct inode *root;
@@ -383,9 +386,10 @@ static void destroy_inodecache(void)
 }
 
 static struct dentry *qnx4_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, qnx4_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  qnx4_fill_super);
 }
 
 static struct file_system_type qnx4_fs_type = {
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
index 4aeb26bcb4d0..a415c1b5f936 100644
--- a/fs/qnx6/inode.c
+++ b/fs/qnx6/inode.c
@@ -30,7 +30,8 @@ static const struct super_operations qnx6_sops;
 static void qnx6_put_super(struct super_block *sb);
 static struct inode *qnx6_alloc_inode(struct super_block *sb);
 static void qnx6_destroy_inode(struct inode *inode);
-static int qnx6_remount(struct super_block *sb, int *flags, char *data);
+static int qnx6_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size);
 static int qnx6_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int qnx6_show_options(struct seq_file *seq, struct dentry *root);
 
@@ -53,7 +54,8 @@ static int qnx6_show_options(struct seq_file *seq, struct dentry *root)
 	return 0;
 }
 
-static int qnx6_remount(struct super_block *sb, int *flags, char *data)
+static int qnx6_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_RDONLY;
@@ -294,7 +296,8 @@ static struct buffer_head *qnx6_check_first_superblock(struct super_block *s,
 static struct inode *qnx6_private_inode(struct super_block *s,
 					struct qnx6_root_node *p);
 
-static int qnx6_fill_super(struct super_block *s, void *data, int silent)
+static int qnx6_fill_super(struct super_block *s, void *data, size_t data_size,
+			   int silent)
 {
 	struct buffer_head *bh1 = NULL, *bh2 = NULL;
 	struct qnx6_super_block *sb1 = NULL, *sb2 = NULL;
@@ -643,9 +646,10 @@ static void destroy_inodecache(void)
 }
 
 static struct dentry *qnx6_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, qnx6_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  qnx6_fill_super);
 }
 
 static struct file_system_type qnx6_fs_type = {
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index 11201b2d06b9..2e9b23b4a98b 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -217,7 +217,7 @@ static int ramfs_parse_options(char *data, struct ramfs_mount_opts *opts)
 	return 0;
 }
 
-int ramfs_fill_super(struct super_block *sb, void *data, int silent)
+int ramfs_fill_super(struct super_block *sb, void *data, size_t data_size, int silent)
 {
 	struct ramfs_fs_info *fsi;
 	struct inode *inode;
@@ -248,9 +248,9 @@ int ramfs_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 struct dentry *ramfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, ramfs_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, ramfs_fill_super);
 }
 
 static void ramfs_kill_sb(struct super_block *sb)
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 1fc934d24459..d8631cb38485 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -61,7 +61,8 @@ static int is_any_reiserfs_magic_string(struct reiserfs_super_block *rs)
 		is_reiserfs_jr(rs));
 }
 
-static int reiserfs_remount(struct super_block *s, int *flags, char *data);
+static int reiserfs_remount(struct super_block *s, int *flags,
+			    char *data, size_t data_size);
 static int reiserfs_statfs(struct dentry *dentry, struct kstatfs *buf);
 
 static int reiserfs_sync_fs(struct super_block *s, int wait)
@@ -1433,7 +1434,8 @@ static void handle_quota_files(struct super_block *s, char **qf_names,
 }
 #endif
 
-static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg)
+static int reiserfs_remount(struct super_block *s, int *mount_flags,
+			    char *arg, size_t data_size)
 {
 	struct reiserfs_super_block *rs;
 	struct reiserfs_transaction_handle th;
@@ -1898,7 +1900,8 @@ static int function2code(hashf_t func)
 	if (!(silent))				\
 		reiserfs_warning(s, id, __VA_ARGS__)
 
-static int reiserfs_fill_super(struct super_block *s, void *data, int silent)
+static int reiserfs_fill_super(struct super_block *s, void *data, size_t data_size,
+			       int silent)
 {
 	struct inode *root_inode;
 	struct reiserfs_transaction_handle th;
@@ -2600,9 +2603,10 @@ static ssize_t reiserfs_quota_write(struct super_block *sb, int type,
 
 static struct dentry *get_super_block(struct file_system_type *fs_type,
 			   int flags, const char *dev_name,
-			   void *data)
+			   void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, reiserfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  reiserfs_fill_super);
 }
 
 static int __init init_reiserfs_fs(void)
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index 8f06fd1f3d69..1c5b16ba3da7 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -448,7 +448,8 @@ static int romfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 /*
  * remounting must involve read-only
  */
-static int romfs_remount(struct super_block *sb, int *flags, char *data)
+static int romfs_remount(struct super_block *sb, int *flags,
+			 char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_RDONLY;
@@ -482,7 +483,8 @@ static __u32 romfs_checksum(const void *data, int size)
 /*
  * fill in the superblock
  */
-static int romfs_fill_super(struct super_block *sb, void *data, int silent)
+static int romfs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			    int silent)
 {
 	struct romfs_super_block *rsb;
 	struct inode *root;
@@ -575,16 +577,17 @@ static int romfs_fill_super(struct super_block *sb, void *data, int silent)
  */
 static struct dentry *romfs_mount(struct file_system_type *fs_type,
 			int flags, const char *dev_name,
-			void *data)
+			void *data, size_t data_size)
 {
 	struct dentry *ret = ERR_PTR(-EINVAL);
 
 #ifdef CONFIG_ROMFS_ON_MTD
-	ret = mount_mtd(fs_type, flags, dev_name, data, romfs_fill_super);
+	ret = mount_mtd(fs_type, flags, dev_name, data, data_size,
+			romfs_fill_super);
 #endif
 #ifdef CONFIG_ROMFS_ON_BLOCK
 	if (ret == ERR_PTR(-EINVAL))
-		ret = mount_bdev(fs_type, flags, dev_name, data,
+		ret = mount_bdev(fs_type, flags, dev_name, data, data_size,
 				  romfs_fill_super);
 #endif
 	return ret;
diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 8a73b97217c8..ed6881d97b3c 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -76,7 +76,8 @@ static const struct squashfs_decompressor *supported_squashfs_filesystem(short
 }
 
 
-static int squashfs_fill_super(struct super_block *sb, void *data, int silent)
+static int squashfs_fill_super(struct super_block *sb,
+			       void *data, size_t data_size, int silent)
 {
 	struct squashfs_sb_info *msblk;
 	struct squashfs_super_block *sblk = NULL;
@@ -370,7 +371,8 @@ static int squashfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 }
 
 
-static int squashfs_remount(struct super_block *sb, int *flags, char *data)
+static int squashfs_remount(struct super_block *sb, int *flags,
+			    char *data, size_t data_size)
 {
 	sync_filesystem(sb);
 	*flags |= SB_RDONLY;
@@ -398,9 +400,11 @@ static void squashfs_put_super(struct super_block *sb)
 
 
 static struct dentry *squashfs_mount(struct file_system_type *fs_type,
-				int flags, const char *dev_name, void *data)
+				     int flags, const char *dev_name,
+				     void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, squashfs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  squashfs_fill_super);
 }
 
 
diff --git a/fs/super.c b/fs/super.c
index f7c5629bbbda..9117e3447837 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -827,11 +827,13 @@ struct super_block *user_get_super(dev_t dev)
  *	@sb:	superblock in question
  *	@sb_flags: revised superblock flags
  *	@data:	the rest of options
+ *	@data_size: The size of the data
  *      @force: whether or not to force the change
  *
  *	Alters the mount options of a mounted file system.
  */
-int do_remount_sb(struct super_block *sb, int sb_flags, void *data, int force)
+int do_remount_sb(struct super_block *sb, int sb_flags, void *data,
+		  size_t data_size, int force)
 {
 	int retval;
 	int remount_ro;
@@ -874,7 +876,7 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data, int force)
 	}
 
 	if (sb->s_op->remount_fs) {
-		retval = sb->s_op->remount_fs(sb, &sb_flags, data);
+		retval = sb->s_op->remount_fs(sb, &sb_flags, data, data_size);
 		if (retval) {
 			if (!force)
 				goto cancel_readonly;
@@ -913,7 +915,7 @@ static void do_emergency_remount_callback(struct super_block *sb)
 		/*
 		 * What lock protects sb->s_flags??
 		 */
-		do_remount_sb(sb, SB_RDONLY, NULL, 1);
+		do_remount_sb(sb, SB_RDONLY, NULL, 0, 1);
 	}
 	up_write(&sb->s_umount);
 }
@@ -1062,8 +1064,9 @@ static int ns_set_super(struct super_block *sb, void *data)
 }
 
 struct dentry *mount_ns(struct file_system_type *fs_type,
-	int flags, void *data, void *ns, struct user_namespace *user_ns,
-	int (*fill_super)(struct super_block *, void *, int))
+	int flags, void *data, size_t data_size,
+	void *ns, struct user_namespace *user_ns,
+	int (*fill_super)(struct super_block *, void *, size_t, int))
 {
 	struct super_block *sb;
 
@@ -1080,7 +1083,7 @@ struct dentry *mount_ns(struct file_system_type *fs_type,
 
 	if (!sb->s_root) {
 		int err;
-		err = fill_super(sb, data, flags & SB_SILENT ? 1 : 0);
+		err = fill_super(sb, data, data_size, flags & SB_SILENT ? 1 : 0);
 		if (err) {
 			deactivate_locked_super(sb);
 			return ERR_PTR(err);
@@ -1110,8 +1113,8 @@ static int test_bdev_super(struct super_block *s, void *data)
 }
 
 struct dentry *mount_bdev(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data,
-	int (*fill_super)(struct super_block *, void *, int))
+	int flags, const char *dev_name, void *data, size_t data_size,
+	int (*fill_super)(struct super_block *, void *, size_t, int))
 {
 	struct block_device *bdev;
 	struct super_block *s;
@@ -1163,7 +1166,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 		s->s_mode = mode;
 		snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
 		sb_set_blocksize(s, block_size(bdev));
-		error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
+		error = fill_super(s, data, data_size, flags & SB_SILENT ? 1 : 0);
 		if (error) {
 			deactivate_locked_super(s);
 			goto error;
@@ -1200,8 +1203,8 @@ EXPORT_SYMBOL(kill_block_super);
 #endif
 
 struct dentry *mount_nodev(struct file_system_type *fs_type,
-	int flags, void *data,
-	int (*fill_super)(struct super_block *, void *, int))
+	int flags, void *data, size_t data_size,
+	int (*fill_super)(struct super_block *, void *, size_t, int))
 {
 	int error;
 	struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL);
@@ -1209,7 +1212,7 @@ struct dentry *mount_nodev(struct file_system_type *fs_type,
 	if (IS_ERR(s))
 		return ERR_CAST(s);
 
-	error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
+	error = fill_super(s, data, data_size, flags & SB_SILENT ? 1 : 0);
 	if (error) {
 		deactivate_locked_super(s);
 		return ERR_PTR(error);
@@ -1225,8 +1228,8 @@ static int compare_single(struct super_block *s, void *p)
 }
 
 struct dentry *mount_single(struct file_system_type *fs_type,
-	int flags, void *data,
-	int (*fill_super)(struct super_block *, void *, int))
+	int flags, void *data, size_t data_size,
+	int (*fill_super)(struct super_block *, void *, size_t, int))
 {
 	struct super_block *s;
 	int error;
@@ -1235,21 +1238,22 @@ struct dentry *mount_single(struct file_system_type *fs_type,
 	if (IS_ERR(s))
 		return ERR_CAST(s);
 	if (!s->s_root) {
-		error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
+		error = fill_super(s, data, data_size, flags & SB_SILENT ? 1 : 0);
 		if (error) {
 			deactivate_locked_super(s);
 			return ERR_PTR(error);
 		}
 		s->s_flags |= SB_ACTIVE;
 	} else {
-		do_remount_sb(s, flags, data, 0);
+		do_remount_sb(s, flags, data, data_size, 0);
 	}
 	return dget(s->s_root);
 }
 EXPORT_SYMBOL(mount_single);
 
 struct dentry *
-mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
+mount_fs(struct file_system_type *type, int flags, const char *name,
+	 void *data, size_t data_size)
 {
 	struct dentry *root;
 	struct super_block *sb;
@@ -1261,12 +1265,12 @@ mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
 		if (!secdata)
 			goto out;
 
-		error = security_sb_copy_data(data, secdata);
+		error = security_sb_copy_data(data, data_size, secdata);
 		if (error)
 			goto out_free_secdata;
 	}
 
-	root = type->mount(type, flags, name, data);
+	root = type->mount(type, flags, name, data, data_size);
 	if (IS_ERR(root)) {
 		error = PTR_ERR(root);
 		goto out_free_secdata;
@@ -1276,7 +1280,7 @@ mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
 	WARN_ON(!sb->s_bdi);
 	sb->s_flags |= SB_BORN;
 
-	error = security_sb_kern_mount(sb, flags, secdata);
+	error = security_sb_kern_mount(sb, flags, secdata, data_size);
 	if (error)
 		goto out_sb;
 
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index b428d317ae92..b2b7d9ae4aba 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -21,7 +21,7 @@ static struct kernfs_root *sysfs_root;
 struct kernfs_node *sysfs_root_kn;
 
 static struct dentry *sysfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
 	struct dentry *root;
 	void *ns;
diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index bec9f79adb25..47f66bbc4578 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -57,7 +57,8 @@ static int sysv_sync_fs(struct super_block *sb, int wait)
 	return 0;
 }
 
-static int sysv_remount(struct super_block *sb, int *flags, char *data)
+static int sysv_remount(struct super_block *sb, int *flags,
+			char *data, size_t data_size)
 {
 	struct sysv_sb_info *sbi = SYSV_SB(sb);
 
diff --git a/fs/sysv/super.c b/fs/sysv/super.c
index 89765ddfb738..275c7038eecd 100644
--- a/fs/sysv/super.c
+++ b/fs/sysv/super.c
@@ -349,7 +349,8 @@ static int complete_read_super(struct super_block *sb, int silent, int size)
 	return 1;
 }
 
-static int sysv_fill_super(struct super_block *sb, void *data, int silent)
+static int sysv_fill_super(struct super_block *sb, void *data, size_t data_size,
+			   int silent)
 {
 	struct buffer_head *bh1, *bh = NULL;
 	struct sysv_sb_info *sbi;
@@ -470,7 +471,8 @@ static int v7_sanity_check(struct super_block *sb, struct buffer_head *bh)
 	return 1;
 }
 
-static int v7_fill_super(struct super_block *sb, void *data, int silent)
+static int v7_fill_super(struct super_block *sb, void *data, size_t data_size,
+			 int silent)
 {
 	struct sysv_sb_info *sbi;
 	struct buffer_head *bh;
@@ -528,15 +530,17 @@ static int v7_fill_super(struct super_block *sb, void *data, int silent)
 /* Every kernel module contains stuff like this. */
 
 static struct dentry *sysv_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, sysv_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  sysv_fill_super);
 }
 
 static struct dentry *v7_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, v7_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  v7_fill_super);
 }
 
 static struct file_system_type sysv_fs_type = {
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index bea8ad876bf9..85b3f230e202 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -225,7 +225,8 @@ static int tracefs_apply_options(struct super_block *sb)
 	return 0;
 }
 
-static int tracefs_remount(struct super_block *sb, int *flags, char *data)
+static int tracefs_remount(struct super_block *sb, int *flags,
+			   char *data, size_t data_size)
 {
 	int err;
 	struct tracefs_fs_info *fsi = sb->s_fs_info;
@@ -264,7 +265,8 @@ static const struct super_operations tracefs_super_operations = {
 	.show_options	= tracefs_show_options,
 };
 
-static int trace_fill_super(struct super_block *sb, void *data, int silent)
+static int trace_fill_super(struct super_block *sb,
+			    void *data, size_t data_size, int silent)
 {
 	static const struct tree_descr trace_files[] = {{""}};
 	struct tracefs_fs_info *fsi;
@@ -299,9 +301,9 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *trace_mount(struct file_system_type *fs_type,
 			int flags, const char *dev_name,
-			void *data)
+			void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, trace_fill_super);
+	return mount_single(fs_type, flags, data, data_size, trace_fill_super);
 }
 
 static struct file_system_type trace_fs_type = {
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 6c397a389105..90144fecbf27 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1842,7 +1842,8 @@ static void ubifs_put_super(struct super_block *sb)
 	mutex_unlock(&c->umount_mutex);
 }
 
-static int ubifs_remount_fs(struct super_block *sb, int *flags, char *data)
+static int ubifs_remount_fs(struct super_block *sb, int *flags,
+			    char *data, size_t data_size)
 {
 	int err;
 	struct ubifs_info *c = sb->s_fs_info;
@@ -2105,7 +2106,7 @@ static int sb_set(struct super_block *sb, void *data)
 }
 
 static struct dentry *ubifs_mount(struct file_system_type *fs_type, int flags,
-			const char *name, void *data)
+			const char *name, void *data, size_t data_size)
 {
 	struct ubi_volume_desc *ubi;
 	struct ubifs_info *c;
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 7949c338efa5..91212c33c8d7 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -87,10 +87,10 @@ enum {
 enum { UDF_MAX_LINKS = 0xffff };
 
 /* These are the "meat" - everything else is stuffing */
-static int udf_fill_super(struct super_block *, void *, int);
+static int udf_fill_super(struct super_block *, void *, size_t, int);
 static void udf_put_super(struct super_block *);
 static int udf_sync_fs(struct super_block *, int);
-static int udf_remount_fs(struct super_block *, int *, char *);
+static int udf_remount_fs(struct super_block *, int *, char *, size_t);
 static void udf_load_logicalvolint(struct super_block *, struct kernel_extent_ad);
 static int udf_find_fileset(struct super_block *, struct kernel_lb_addr *,
 			    struct kernel_lb_addr *);
@@ -126,9 +126,11 @@ struct logicalVolIntegrityDescImpUse *udf_sb_lvidiu(struct super_block *sb)
 
 /* UDF filesystem type */
 static struct dentry *udf_mount(struct file_system_type *fs_type,
-		      int flags, const char *dev_name, void *data)
+				int flags, const char *dev_name,
+				void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, udf_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  udf_fill_super);
 }
 
 static struct file_system_type udf_fstype = {
@@ -610,7 +612,8 @@ static int udf_parse_options(char *options, struct udf_options *uopt,
 	return 1;
 }
 
-static int udf_remount_fs(struct super_block *sb, int *flags, char *options)
+static int udf_remount_fs(struct super_block *sb, int *flags,
+			  char *options, size_t data_size)
 {
 	struct udf_options uopt;
 	struct udf_sb_info *sbi = UDF_SB(sb);
@@ -2083,7 +2086,8 @@ u64 lvid_get_unique_id(struct super_block *sb)
 	return ret;
 }
 
-static int udf_fill_super(struct super_block *sb, void *options, int silent)
+static int udf_fill_super(struct super_block *sb,
+			  void *options, size_t data_size, int silent)
 {
 	int ret = -EINVAL;
 	struct inode *inode = NULL;
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 8254b8b3690f..b52917639e30 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -772,7 +772,8 @@ static u64 ufs_max_bytes(struct super_block *sb)
 	return res << uspi->s_bshift;
 }
 
-static int ufs_fill_super(struct super_block *sb, void *data, int silent)
+static int ufs_fill_super(struct super_block *sb, void *data, size_t data_size,
+			  int silent)
 {
 	struct ufs_sb_info * sbi;
 	struct ufs_sb_private_info * uspi;
@@ -1295,7 +1296,8 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 	return -ENOMEM;
 }
 
-static int ufs_remount (struct super_block *sb, int *mount_flags, char *data)
+static int ufs_remount (struct super_block *sb, int *mount_flags,
+			char *data, size_t data_size)
 {
 	struct ufs_sb_private_info * uspi;
 	struct ufs_super_block_first * usb1;
@@ -1503,9 +1505,10 @@ static const struct super_operations ufs_super_ops = {
 };
 
 static struct dentry *ufs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, ufs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  ufs_fill_super);
 }
 
 static struct file_system_type ufs_fs_type = {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d71424052917..0d91c924b8f5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1268,7 +1268,8 @@ STATIC int
 xfs_fs_remount(
 	struct super_block	*sb,
 	int			*flags,
-	char			*options)
+	char			*options,
+	size_t			data_size)
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 	xfs_sb_t		*sbp = &mp->m_sb;
@@ -1607,6 +1608,7 @@ STATIC int
 xfs_fs_fill_super(
 	struct super_block	*sb,
 	void			*data,
+	size_t			data_size,
 	int			silent)
 {
 	struct inode		*root;
@@ -1796,9 +1798,11 @@ xfs_fs_mount(
 	struct file_system_type	*fs_type,
 	int			flags,
 	const char		*dev_name,
-	void			*data)
+	void			*data,
+	size_t			data_size)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, xfs_fs_fill_super);
+	return mount_bdev(fs_type, flags, dev_name, data, data_size,
+			  xfs_fs_fill_super);
 }
 
 static long
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 3b0ba54cc4d5..a02de1b397ca 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -75,11 +75,11 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
 struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
 				      const char *dest);
 
-typedef struct vfsmount *(*debugfs_automount_t)(struct dentry *, void *);
+typedef struct vfsmount *(*debugfs_automount_t)(struct dentry *, void *, size_t);
 struct dentry *debugfs_create_automount(const char *name,
 					struct dentry *parent,
 					debugfs_automount_t f,
-					void *data);
+					void *data, size_t data_size);
 
 void debugfs_remove(struct dentry *dentry);
 void debugfs_remove_recursive(struct dentry *dentry);
@@ -204,8 +204,8 @@ static inline struct dentry *debugfs_create_symlink(const char *name,
 
 static inline struct dentry *debugfs_create_automount(const char *name,
 					struct dentry *parent,
-					struct vfsmount *(*f)(void *),
-					void *data)
+					struct vfsmount *(*f)(void *, size_t),
+					void *data, size_t data_size)
 {
 	return ERR_PTR(-ENODEV);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..9703931cf095 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1827,7 +1827,7 @@ struct super_operations {
 	int (*thaw_super) (struct super_block *);
 	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
-	int (*remount_fs) (struct super_block *, int *, char *);
+	int (*remount_fs) (struct super_block *, int *, char *, size_t);
 	void (*umount_begin) (struct super_block *);
 
 	int (*show_options)(struct seq_file *, struct dentry *);
@@ -2075,7 +2075,7 @@ struct file_system_type {
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
-		       const char *, void *);
+				 const char *, void *, size_t);
 	void (*kill_sb) (struct super_block *);
 	struct module *owner;
 	struct file_system_type * next;
@@ -2094,26 +2094,27 @@ struct file_system_type {
 #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME)
 
 extern struct dentry *mount_ns(struct file_system_type *fs_type,
-	int flags, void *data, void *ns, struct user_namespace *user_ns,
-	int (*fill_super)(struct super_block *, void *, int));
+	int flags, void *data, size_t data_size,
+	void *ns, struct user_namespace *user_ns,
+	int (*fill_super)(struct super_block *, void *, size_t, int));
 #ifdef CONFIG_BLOCK
 extern struct dentry *mount_bdev(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data,
-	int (*fill_super)(struct super_block *, void *, int));
+	int flags, const char *dev_name, void *data, size_t data_size,
+	int (*fill_super)(struct super_block *, void *, size_t, int));
 #else
 static inline struct dentry *mount_bdev(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data,
-	int (*fill_super)(struct super_block *, void *, int))
+	int flags, const char *dev_name, void *data, size_t data_size,
+	int (*fill_super)(struct super_block *, void *, size_t, int))
 {
 	return ERR_PTR(-ENODEV);
 }
 #endif
 extern struct dentry *mount_single(struct file_system_type *fs_type,
-	int flags, void *data,
-	int (*fill_super)(struct super_block *, void *, int));
+	int flags, void *data, size_t data_size,
+	int (*fill_super)(struct super_block *, void *, size_t, int));
 extern struct dentry *mount_nodev(struct file_system_type *fs_type,
-	int flags, void *data,
-	int (*fill_super)(struct super_block *, void *, int));
+	int flags, void *data, size_t data_size,
+	int (*fill_super)(struct super_block *, void *, size_t, int));
 extern struct dentry *mount_subtree(struct vfsmount *mnt, const char *path);
 void generic_shutdown_super(struct super_block *sb);
 #ifdef CONFIG_BLOCK
@@ -2173,8 +2174,8 @@ mount_pseudo(struct file_system_type *fs_type, char *name,
 
 extern int register_filesystem(struct file_system_type *);
 extern int unregister_filesystem(struct file_system_type *);
-extern struct vfsmount *kern_mount_data(struct file_system_type *, void *data);
-#define kern_mount(type) kern_mount_data(type, NULL)
+extern struct vfsmount *kern_mount_data(struct file_system_type *, void *, size_t);
+#define kern_mount(type) kern_mount_data(type, NULL, 0)
 extern void kern_unmount(struct vfsmount *mnt);
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index da20f90d40bb..d533ca038604 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -148,6 +148,7 @@
  *	@type contains the filesystem type.
  *	@flags contains the mount flags.
  *	@data contains the filesystem-specific data.
+ *	@data_size contains the size of the data.
  *	Return 0 if permission is granted.
  * @sb_copy_data:
  *	Allow mount option data to be copied prior to parsing by the filesystem,
@@ -157,6 +158,7 @@
  *	specific options to avoid having to make filesystems aware of them.
  *	@type the type of filesystem being mounted.
  *	@orig the original mount data copied from userspace.
+ *	@orig_data is the size of the original data
  *	@copy copied data which will be passed to the security module.
  *	Returns 0 if the copy was successful.
  * @sb_remount:
@@ -164,6 +166,7 @@
  *	are being made to those options.
  *	@sb superblock being remounted
  *	@data contains the filesystem-specific data.
+ *	@data_size contains the size of the data.
  *	Return 0 if permission is granted.
  * @sb_umount:
  *	Check permission before the @mnt file system is unmounted.
@@ -1506,13 +1509,15 @@ union security_list_options {
 
 	int (*sb_alloc_security)(struct super_block *sb);
 	void (*sb_free_security)(struct super_block *sb);
-	int (*sb_copy_data)(char *orig, char *copy);
-	int (*sb_remount)(struct super_block *sb, void *data);
-	int (*sb_kern_mount)(struct super_block *sb, int flags, void *data);
+	int (*sb_copy_data)(char *orig, size_t orig_size, char *copy);
+	int (*sb_remount)(struct super_block *sb, void *data, size_t data_size);
+	int (*sb_kern_mount)(struct super_block *sb, int flags,
+			     void *data, size_t data_size);
 	int (*sb_show_options)(struct seq_file *m, struct super_block *sb);
 	int (*sb_statfs)(struct dentry *dentry);
 	int (*sb_mount)(const char *dev_name, const struct path *path,
-			const char *type, unsigned long flags, void *data);
+			const char *type, unsigned long flags,
+			void *data, size_t data_size);
 	int (*sb_umount)(struct vfsmount *mnt, int flags);
 	int (*sb_pivotroot)(const struct path *old_path, const struct path *new_path);
 	int (*sb_set_mnt_opts)(struct super_block *sb,
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 45b1f56c6c2f..8a1031a511c9 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -90,10 +90,11 @@ extern struct vfsmount *clone_private_mount(const struct path *path);
 struct file_system_type;
 extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 				      int flags, const char *name,
-				      void *data);
+				      void *data, size_t data_size);
 extern struct vfsmount *vfs_submount(const struct dentry *mountpoint,
 				     struct file_system_type *type,
-				     const char *name, void *data);
+				     const char *name,
+				     void *data, size_t data_size);
 
 extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list);
 extern void mark_mounts_for_expiry(struct list_head *mounts);
diff --git a/include/linux/mtd/super.h b/include/linux/mtd/super.h
index f456230f9330..3f37c7cd711c 100644
--- a/include/linux/mtd/super.h
+++ b/include/linux/mtd/super.h
@@ -19,8 +19,8 @@
 #include <linux/mount.h>
 
 extern struct dentry *mount_mtd(struct file_system_type *fs_type, int flags,
-		      const char *dev_name, void *data,
-		      int (*fill_super)(struct super_block *, void *, int));
+		      const char *dev_name, void *data, size_t data_size,
+		      int (*fill_super)(struct super_block *, void *, size_t, int));
 extern void kill_mtd_super(struct super_block *sb);
 
 
diff --git a/include/linux/ramfs.h b/include/linux/ramfs.h
index 5ef7d54caac2..6d64e6be9928 100644
--- a/include/linux/ramfs.h
+++ b/include/linux/ramfs.h
@@ -5,7 +5,7 @@
 struct inode *ramfs_get_inode(struct super_block *sb, const struct inode *dir,
 	 umode_t mode, dev_t dev);
 extern struct dentry *ramfs_mount(struct file_system_type *fs_type,
-	 int flags, const char *dev_name, void *data);
+	 int flags, const char *dev_name, void *data, size_t data_size);
 
 #ifdef CONFIG_MMU
 static inline int
@@ -21,6 +21,6 @@ extern const struct file_operations ramfs_file_operations;
 extern const struct vm_operations_struct generic_file_vm_ops;
 extern int __init init_ramfs_fs(void);
 
-int ramfs_fill_super(struct super_block *sb, void *data, int silent);
+int ramfs_fill_super(struct super_block *sb, void *data, size_t data_size, int silent);
 
 #endif
diff --git a/include/linux/security.h b/include/linux/security.h
index 60a85bd9dfef..22b83dc28bd3 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -243,13 +243,13 @@ int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags);
 int security_sb_alloc(struct super_block *sb);
 void security_sb_free(struct super_block *sb);
-int security_sb_copy_data(char *orig, char *copy);
-int security_sb_remount(struct super_block *sb, void *data);
-int security_sb_kern_mount(struct super_block *sb, int flags, void *data);
+int security_sb_copy_data(char *orig, size_t orig_size, char *copy);
+int security_sb_remount(struct super_block *sb, void *data, size_t data_size);
+int security_sb_kern_mount(struct super_block *sb, int flags, void *data, size_t data_size);
 int security_sb_show_options(struct seq_file *m, struct super_block *sb);
 int security_sb_statfs(struct dentry *dentry);
 int security_sb_mount(const char *dev_name, const struct path *path,
-		      const char *type, unsigned long flags, void *data);
+		      const char *type, unsigned long flags, void *data, size_t data_size);
 int security_sb_umount(struct vfsmount *mnt, int flags);
 int security_sb_pivotroot(const struct path *old_path, const struct path *new_path);
 int security_sb_set_mnt_opts(struct super_block *sb,
@@ -591,17 +591,18 @@ static inline int security_sb_alloc(struct super_block *sb)
 static inline void security_sb_free(struct super_block *sb)
 { }
 
-static inline int security_sb_copy_data(char *orig, char *copy)
+static inline int security_sb_copy_data(char *orig, size_t orig_size, char *copy)
 {
 	return 0;
 }
 
-static inline int security_sb_remount(struct super_block *sb, void *data)
+static inline int security_sb_remount(struct super_block *sb, void *data, size_t data_size)
 {
 	return 0;
 }
 
-static inline int security_sb_kern_mount(struct super_block *sb, int flags, void *data)
+static inline int security_sb_kern_mount(struct super_block *sb, int flags,
+					 void *data, size_t data_size)
 {
 	return 0;
 }
@@ -619,7 +620,7 @@ static inline int security_sb_statfs(struct dentry *dentry)
 
 static inline int security_sb_mount(const char *dev_name, const struct path *path,
 				    const char *type, unsigned long flags,
-				    void *data)
+				    void *data, size_t data_size)
 {
 	return 0;
 }
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 73b5e655a76e..f170fc673047 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -49,7 +49,8 @@ static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
  * Functions in mm/shmem.c called directly from elsewhere:
  */
 extern int shmem_init(void);
-extern int shmem_fill_super(struct super_block *sb, void *data, int silent);
+extern int shmem_fill_super(struct super_block *sb, void *data, size_t data_size,
+			    int silent);
 extern struct file *shmem_file_setup(const char *name,
 					loff_t size, unsigned long flags);
 extern struct file *shmem_kernel_file_setup(const char *name, loff_t size,
diff --git a/init/do_mounts.c b/init/do_mounts.c
index ea6f21bb9440..d4fc2a5afdb6 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -606,7 +606,7 @@ void __init prepare_namespace(void)
 
 static bool is_tmpfs;
 static struct dentry *rootfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
 	static unsigned long once;
 	void *fill = ramfs_fill_super;
@@ -617,7 +617,7 @@ static struct dentry *rootfs_mount(struct file_system_type *fs_type,
 	if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
 		fill = shmem_fill_super;
 
-	return mount_nodev(fs_type, flags, data, fill);
+	return mount_nodev(fs_type, flags, data, data_size, fill);
 }
 
 static struct file_system_type rootfs_fs_type = {
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index a808f29d4c5a..910c3c7532e6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -322,7 +322,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 	return ERR_PTR(ret);
 }
 
-static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
+static int mqueue_fill_super(struct super_block *sb, void *data, size_t data_size, int silent)
 {
 	struct inode *inode;
 	struct ipc_namespace *ns = sb->s_fs_info;
@@ -345,7 +345,7 @@ static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 			 int flags, const char *dev_name,
-			 void *data)
+			 void *data, size_t data_size)
 {
 	struct ipc_namespace *ns;
 	if (flags & SB_KERNMOUNT) {
@@ -354,7 +354,8 @@ static struct dentry *mqueue_mount(struct file_system_type *fs_type,
 	} else {
 		ns = current->nsproxy->ipc_ns;
 	}
-	return mount_ns(fs_type, flags, data, ns, ns->user_ns, mqueue_fill_super);
+	return mount_ns(fs_type, flags, data, data_size, ns, ns->user_ns,
+			mqueue_fill_super);
 }
 
 static void init_once(void *foo)
@@ -1536,7 +1537,7 @@ int mq_init_ns(struct ipc_namespace *ns)
 	ns->mq_msg_default   = DFLT_MSG;
 	ns->mq_msgsize_default  = DFLT_MSGSIZE;
 
-	ns->mq_mnt = kern_mount_data(&mqueue_fs_type, ns);
+	ns->mq_mnt = kern_mount_data(&mqueue_fs_type, ns, 0);
 	if (IS_ERR(ns->mq_mnt)) {
 		int err = PTR_ERR(ns->mq_mnt);
 		ns->mq_mnt = NULL;
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index bf6da59ae0d0..d663a1efcfcc 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -480,7 +480,7 @@ static int bpf_parse_options(char *data, struct bpf_mount_opts *opts)
 	return 0;
 }
 
-static int bpf_fill_super(struct super_block *sb, void *data, int silent)
+static int bpf_fill_super(struct super_block *sb, void *data, size_t data_size, int silent)
 {
 	static const struct tree_descr bpf_rfiles[] = { { "" } };
 	struct bpf_mount_opts opts;
@@ -506,9 +506,10 @@ static int bpf_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *bpf_mount(struct file_system_type *type, int flags,
-				const char *dev_name, void *data)
+				const char *dev_name, void *data,
+				size_t data_size)
 {
-	return mount_nodev(type, flags, data, bpf_fill_super);
+	return mount_nodev(type, flags, data, data_size, bpf_fill_super);
 }
 
 static struct file_system_type bpf_fs_type = {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a662bfcbea0e..c2678e4800fc 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2010,7 +2010,7 @@ struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
 
 static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 			 int flags, const char *unused_dev_name,
-			 void *data)
+			 void *data, size_t data_size)
 {
 	struct cgroup_namespace *ns = current->nsproxy->cgroup_ns;
 	struct dentry *dentry;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b42037e6e81d..3c8ef37879f0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -316,7 +316,8 @@ static inline bool is_in_v2_mode(void)
  * silently switch it to mount "cgroup" instead
  */
 static struct dentry *cpuset_mount(struct file_system_type *fs_type,
-			 int flags, const char *unused_dev_name, void *data)
+				   int flags, const char *unused_dev_name,
+				   void *data, size_t data_size)
 {
 	struct file_system_type *cgroup_fs = get_fs_type("cgroup");
 	struct dentry *ret = ERR_PTR(-ENODEV);
@@ -324,8 +325,8 @@ static struct dentry *cpuset_mount(struct file_system_type *fs_type,
 		char mountopts[] =
 			"cpuset,noprefix,"
 			"release_agent=/sbin/cpuset_release_agent";
-		ret = cgroup_fs->mount(cgroup_fs, flags,
-					   unused_dev_name, mountopts);
+		ret = cgroup_fs->mount(cgroup_fs, flags, unused_dev_name,
+				       mountopts, data_size);
 		put_filesystem(cgroup_fs);
 	}
 	return ret;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dfbcf9ee1447..ed301a4235e6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7961,7 +7961,8 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	ftrace_init_tracefs(tr, d_tracer);
 }
 
-static struct vfsmount *trace_automount(struct dentry *mntpt, void *ingore)
+static struct vfsmount *trace_automount(struct dentry *mntpt,
+					void *data, size_t data_size)
 {
 	struct vfsmount *mnt;
 	struct file_system_type *type;
@@ -7974,7 +7975,7 @@ static struct vfsmount *trace_automount(struct dentry *mntpt, void *ingore)
 	type = get_fs_type("tracefs");
 	if (!type)
 		return NULL;
-	mnt = vfs_submount(mntpt, type, "tracefs", NULL);
+	mnt = vfs_submount(mntpt, type, "tracefs", NULL, 0);
 	put_filesystem(type);
 	if (IS_ERR(mnt))
 		return NULL;
@@ -8010,7 +8011,7 @@ struct dentry *tracing_init_dentry(void)
 	 * work with the newer kerenl.
 	 */
 	tr->dir = debugfs_create_automount("tracing", NULL,
-					   trace_automount, NULL);
+					   trace_automount, NULL, 0);
 	if (!tr->dir) {
 		pr_warn_once("Could not create debugfs directory 'tracing'\n");
 		return ERR_PTR(-ENOMEM);
diff --git a/mm/shmem.c b/mm/shmem.c
index 9d6c7e595415..76838f26822f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3602,7 +3602,8 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 
 }
 
-static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
+static int shmem_remount_fs(struct super_block *sb, int *flags,
+			    char *data, size_t data_size)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 	struct shmem_sb_info config = *sbinfo;
@@ -3772,7 +3773,8 @@ static void shmem_put_super(struct super_block *sb)
 	sb->s_fs_info = NULL;
 }
 
-int shmem_fill_super(struct super_block *sb, void *data, int silent)
+int shmem_fill_super(struct super_block *sb, void *data, size_t data_size,
+		     int silent)
 {
 	struct inode *inode;
 	struct shmem_sb_info *sbinfo;
@@ -3986,9 +3988,9 @@ static const struct vm_operations_struct shmem_vm_ops = {
 };
 
 static struct dentry *shmem_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+	int flags, const char *dev_name, void *data, size_t data_size)
 {
-	return mount_nodev(fs_type, flags, data, shmem_fill_super);
+	return mount_nodev(fs_type, flags, data, data_size, shmem_fill_super);
 }
 
 static struct file_system_type shmem_fs_type = {
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 61cb05dc950c..dc60f6d89f31 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1814,7 +1814,8 @@ static enum fullness_group putback_zspage(struct size_class *class,
 
 #ifdef CONFIG_COMPACTION
 static struct dentry *zs_mount(struct file_system_type *fs_type,
-				int flags, const char *dev_name, void *data)
+			       int flags, const char *dev_name,
+			       void *data, size_t data_size)
 {
 	static const struct dentry_operations ops = {
 		.d_dname = simple_dname,
diff --git a/net/socket.c b/net/socket.c
index f10f1d947c78..34d3dd0f8ba3 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -353,7 +353,8 @@ static const struct xattr_handler *sockfs_xattr_handlers[] = {
 };
 
 static struct dentry *sockfs_mount(struct file_system_type *fs_type,
-			 int flags, const char *dev_name, void *data)
+				   int flags, const char *dev_name,
+				   void *data, size_t data_size)
 {
 	return mount_pseudo_xattr(fs_type, "socket:", &sockfs_ops,
 				  sockfs_xattr_handlers,
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 0f08934b2cea..7ae4237e3ddd 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -1382,7 +1382,7 @@ rpc_gssd_dummy_depopulate(struct dentry *pipe_dentry)
 }
 
 static int
-rpc_fill_super(struct super_block *sb, void *data, int silent)
+rpc_fill_super(struct super_block *sb, void *data, size_t data_size, int silent)
 {
 	struct inode *inode;
 	struct dentry *root, *gssd_dentry;
@@ -1445,10 +1445,11 @@ EXPORT_SYMBOL_GPL(gssd_running);
 
 static struct dentry *
 rpc_mount(struct file_system_type *fs_type,
-		int flags, const char *dev_name, void *data)
+	  int flags, const char *dev_name, void *data, size_t data_size)
 {
 	struct net *net = current->nsproxy->net_ns;
-	return mount_ns(fs_type, flags, data, net, net->user_ns, rpc_fill_super);
+	return mount_ns(fs_type, flags, data, data_size,
+			net, net->user_ns, rpc_fill_super);
 }
 
 static void rpc_kill_sb(struct super_block *sb)
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 949dd8a48164..04548c8102f3 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -137,7 +137,8 @@ static const struct super_operations aafs_super_ops = {
 	.show_path = aafs_show_path,
 };
 
-static int fill_super(struct super_block *sb, void *data, int silent)
+static int fill_super(struct super_block *sb, void *data, size_t data_size,
+		      int silent)
 {
 	static struct tree_descr files[] = { {""} };
 	int error;
@@ -151,9 +152,10 @@ static int fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *aafs_mount(struct file_system_type *fs_type,
-				 int flags, const char *dev_name, void *data)
+				 int flags, const char *dev_name, void *data,
+				 size_t data_size)
 {
-	return mount_single(fs_type, flags, data, fill_super);
+	return mount_single(fs_type, flags, data, data_size, fill_super);
 }
 
 static struct file_system_type aafs_ops = {
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 14398dec2e38..d60c3b9c6a07 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -591,7 +591,8 @@ static int apparmor_sb_mountpoint(struct fs_context *fc, struct path *mountpoint
 }
 
 static int apparmor_sb_mount(const char *dev_name, const struct path *path,
-			     const char *type, unsigned long flags, void *data)
+			     const char *type, unsigned long flags,
+			     void *data, size_t data_size)
 {
 	struct aa_label *label;
 	int error = 0;
diff --git a/security/inode.c b/security/inode.c
index 8dd9ca8848e4..a89a00714f33 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -39,7 +39,8 @@ static const struct super_operations securityfs_super_operations = {
 	.evict_inode	= securityfs_evict_inode,
 };
 
-static int fill_super(struct super_block *sb, void *data, int silent)
+static int fill_super(struct super_block *sb, void *data, size_t data_size,
+		      int silent)
 {
 	static const struct tree_descr files[] = {{""}};
 	int error;
@@ -55,9 +56,9 @@ static int fill_super(struct super_block *sb, void *data, int silent)
 
 static struct dentry *get_sb(struct file_system_type *fs_type,
 		  int flags, const char *dev_name,
-		  void *data)
+		  void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, fill_super);
+	return mount_single(fs_type, flags, data, data_size, fill_super);
 }
 
 static struct file_system_type fs_type = {
diff --git a/security/security.c b/security/security.c
index 42e4ea19b61c..35abdc964724 100644
--- a/security/security.c
+++ b/security/security.c
@@ -409,20 +409,20 @@ void security_sb_free(struct super_block *sb)
 	call_void_hook(sb_free_security, sb);
 }
 
-int security_sb_copy_data(char *orig, char *copy)
+int security_sb_copy_data(char *orig, size_t data_size, char *copy)
 {
-	return call_int_hook(sb_copy_data, 0, orig, copy);
+	return call_int_hook(sb_copy_data, 0, orig, data_size, copy);
 }
 EXPORT_SYMBOL(security_sb_copy_data);
 
-int security_sb_remount(struct super_block *sb, void *data)
+int security_sb_remount(struct super_block *sb, void *data, size_t data_size)
 {
-	return call_int_hook(sb_remount, 0, sb, data);
+	return call_int_hook(sb_remount, 0, sb, data, data_size);
 }
 
-int security_sb_kern_mount(struct super_block *sb, int flags, void *data)
+int security_sb_kern_mount(struct super_block *sb, int flags, void *data, size_t data_size)
 {
-	return call_int_hook(sb_kern_mount, 0, sb, flags, data);
+	return call_int_hook(sb_kern_mount, 0, sb, flags, data, data_size);
 }
 
 int security_sb_show_options(struct seq_file *m, struct super_block *sb)
@@ -436,9 +436,11 @@ int security_sb_statfs(struct dentry *dentry)
 }
 
 int security_sb_mount(const char *dev_name, const struct path *path,
-                       const char *type, unsigned long flags, void *data)
+		      const char *type, unsigned long flags,
+		      void *data, size_t data_size)
 {
-	return call_int_hook(sb_mount, 0, dev_name, path, type, flags, data);
+	return call_int_hook(sb_mount, 0, dev_name, path, type, flags,
+			     data, data_size);
 }
 
 int security_sb_umount(struct vfsmount *mnt, int flags)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 969a2a0dc582..46d31858eaba 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2782,7 +2782,7 @@ static inline void take_selinux_option(char **to, char *from, int *first,
 	}
 }
 
-static int selinux_sb_copy_data(char *orig, char *copy)
+static int selinux_sb_copy_data(char *orig, size_t data_size, char *copy)
 {
 	int fnosec, fsec, rc = 0;
 	char *in_save, *in_curr, *in_end;
@@ -2824,7 +2824,7 @@ static int selinux_sb_copy_data(char *orig, char *copy)
 	return rc;
 }
 
-static int selinux_sb_remount(struct super_block *sb, void *data)
+static int selinux_sb_remount(struct super_block *sb, void *data, size_t data_size)
 {
 	int rc, i, *flags;
 	struct security_mnt_opts opts;
@@ -2844,7 +2844,7 @@ static int selinux_sb_remount(struct super_block *sb, void *data)
 	secdata = alloc_secdata();
 	if (!secdata)
 		return -ENOMEM;
-	rc = selinux_sb_copy_data(data, secdata);
+	rc = selinux_sb_copy_data(data, data_size, secdata);
 	if (rc)
 		goto out_free_secdata;
 
@@ -2909,7 +2909,7 @@ static int selinux_sb_remount(struct super_block *sb, void *data)
 	goto out_free_opts;
 }
 
-static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data)
+static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data, size_t data_size)
 {
 	const struct cred *cred = current_cred();
 	struct common_audit_data ad;
@@ -2942,7 +2942,8 @@ static int selinux_mount(const char *dev_name,
 			 const struct path *path,
 			 const char *type,
 			 unsigned long flags,
-			 void *data)
+			 void *data,
+			 size_t data_size)
 {
 	const struct cred *cred = current_cred();
 
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 245160373dab..87c07ff2ae7e 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1884,7 +1884,8 @@ static struct dentry *sel_make_dir(struct dentry *dir, const char *name,
 
 #define NULL_FILE_NAME "null"
 
-static int sel_fill_super(struct super_block *sb, void *data, int silent)
+static int sel_fill_super(struct super_block *sb, void *data, size_t data_size,
+			  int silent)
 {
 	struct selinux_fs_info *fsi;
 	int ret;
@@ -1999,9 +2000,10 @@ static int sel_fill_super(struct super_block *sb, void *data, int silent)
 }
 
 static struct dentry *sel_mount(struct file_system_type *fs_type,
-		      int flags, const char *dev_name, void *data)
+				int flags, const char *dev_name,
+				void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, sel_fill_super);
+	return mount_single(fs_type, flags, data, data_size, sel_fill_super);
 }
 
 static void sel_kill_sb(struct super_block *sb)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 549aaa46353b..5dc31a940961 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -869,6 +869,7 @@ static void smack_sb_free_security(struct super_block *sb)
 /**
  * smack_sb_copy_data - copy mount options data for processing
  * @orig: where to start
+ * @orig_size: Size of orig buffer
  * @smackopts: mount options string
  *
  * Returns 0 on success or -ENOMEM on error.
@@ -876,7 +877,7 @@ static void smack_sb_free_security(struct super_block *sb)
  * Copy the Smack specific mount options out of the mount
  * options list.
  */
-static int smack_sb_copy_data(char *orig, char *smackopts)
+static int smack_sb_copy_data(char *orig, size_t orig_size, char *smackopts)
 {
 	char *cp, *commap, *otheropts, *dp;
 
@@ -1157,7 +1158,8 @@ static int smack_set_mnt_opts(struct super_block *sb,
  *
  * Returns 0 on success, an error code on failure
  */
-static int smack_sb_kern_mount(struct super_block *sb, int flags, void *data)
+static int smack_sb_kern_mount(struct super_block *sb, int flags,
+			       void *data, size_t data_size)
 {
 	int rc = 0;
 	char *options = data;
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index f6482e53d55a..f4e91c5d6c2c 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -2844,13 +2844,15 @@ static const struct file_operations smk_ptrace_ops = {
  * smk_fill_super - fill the smackfs superblock
  * @sb: the empty superblock
  * @data: unused
+ * @data_size: size of data buffer
  * @silent: unused
  *
  * Fill in the well known entries for the smack filesystem
  *
  * Returns 0 on success, an error code on failure
  */
-static int smk_fill_super(struct super_block *sb, void *data, int silent)
+static int smk_fill_super(struct super_block *sb, void *data, size_t data_size,
+			  int silent)
 {
 	int rc;
 	struct inode *root_inode;
@@ -2934,9 +2936,10 @@ static int smk_fill_super(struct super_block *sb, void *data, int silent)
  * Returns what the lower level code does.
  */
 static struct dentry *smk_mount(struct file_system_type *fs_type,
-		      int flags, const char *dev_name, void *data)
+				int flags, const char *dev_name,
+				void *data, size_t data_size)
 {
-	return mount_single(fs_type, flags, data, smk_fill_super);
+	return mount_single(fs_type, flags, data, data_size, smk_fill_super);
 }
 
 static struct file_system_type smk_fs_type = {
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 31fd6bd4f657..c3a0ae4fa7ce 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -413,11 +413,13 @@ static int tomoyo_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
  * @type:     Name of filesystem type. Maybe NULL.
  * @flags:    Mount options.
  * @data:     Optional data. Maybe NULL.
+ * @data_size: Size of data.
  *
  * Returns 0 on success, negative value otherwise.
  */
 static int tomoyo_sb_mount(const char *dev_name, const struct path *path,
-			   const char *type, unsigned long flags, void *data)
+			   const char *type, unsigned long flags,
+			   void *data, size_t data_size)
 {
 	return tomoyo_mount_permission(dev_name, path, type, flags, data);
 }

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 09/24] VFS: Implement a filesystem superblock creation/configuration context [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (7 preceding siblings ...)
  2018-04-19 13:31 ` [PATCH 08/24] VFS: Require specification of size of mount data for internal mounts " David Howells
@ 2018-04-19 13:32 ` David Howells
  2018-04-19 13:32 ` [PATCH 10/24] VFS: Remove unused code after filesystem context changes " David Howells
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:32 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Implement a filesystem context concept to be used during superblock
creation for mount and superblock reconfiguration for remount.

The mounting procedure then becomes:

 (1) Allocate new fs_context context.

 (2) Configure the context.

 (3) Create superblock.

 (4) Mount the superblock any number of times.

 (5) Destroy the context.

Rather than calling fs_type->mount(), an fs_context struct is created and
fs_type->init_fs_context() is called to set it up.
fs_type->fs_context_size says how much space should be allocated for the
config context.  The fs_context struct is placed at the beginning and any
extra space is for the filesystem's use.

A set of operations has to be set by ->init_fs_context() to provide
freeing, duplication, option parsing, binary data parsing, validation,
mounting and superblock filling.

Legacy filesystems are supported by the provision of a set of legacy
fs_context operations that build up a list of mount options and then invoke
fs_type->mount() from within the fs_context ->get_tree() operation.  This
allows all filesystems to be accessed using fs_context.

It should be noted that, whilst this patch adds a lot of lines of code,
there is quite a bit of duplication with existing code that can be
eliminated should all filesystems be converted over.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/Makefile                |    3 
 fs/fs_context.c            |  593 ++++++++++++++++++++++++++++++++++++++++++++
 fs/internal.h              |    3 
 fs/libfs.c                 |   17 +
 fs/namespace.c             |  332 ++++++++++++++++---------
 fs/super.c                 |  309 ++++++++++++++++++++++-
 include/linux/fs.h         |   14 +
 include/linux/fs_context.h |   31 ++
 include/linux/mount.h      |    2 
 9 files changed, 1167 insertions(+), 137 deletions(-)
 create mode 100644 fs/fs_context.c

diff --git a/fs/Makefile b/fs/Makefile
index c9375fd2c8c4..6f2dae3c32da 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -12,7 +12,8 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		attr.o bad_inode.o file.o filesystems.o namespace.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o splice.o sync.o utimes.o d_path.o \
-		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o
+		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
+		fs_context.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
diff --git a/fs/fs_context.c b/fs/fs_context.c
new file mode 100644
index 000000000000..0e3f561a8219
--- /dev/null
+++ b/fs/fs_context.c
@@ -0,0 +1,593 @@
+/* Provide a way to create a superblock configuration context within the kernel
+ * that allows a superblock to be set up prior to mounting.
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/fs_context.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/nsproxy.h>
+#include <linux/slab.h>
+#include <linux/magic.h>
+#include <linux/security.h>
+#include <linux/parser.h>
+#include <linux/mnt_namespace.h>
+#include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
+#include <net/net_namespace.h>
+#include "mount.h"
+
+enum legacy_fs_param {
+	LEGACY_FS_UNSET_PARAMS,
+	LEGACY_FS_NO_PARAMS,
+	LEGACY_FS_MONOLITHIC_PARAMS,
+	LEGACY_FS_INDIVIDUAL_PARAMS,
+	LEGACY_FS_MAGIC_PARAMS,
+};
+
+struct legacy_fs_context {
+	struct fs_context	fc;
+	char			*legacy_data;	/* Data page for legacy filesystems */
+	char			*secdata;
+	size_t			data_size;
+	enum legacy_fs_param	param_type;
+};
+
+static const struct fs_context_operations legacy_fs_context_ops;
+
+static const match_table_t common_set_sb_flag = {
+	{ SB_DIRSYNC,		"dirsync" },
+	{ SB_LAZYTIME,		"lazytime" },
+	{ SB_MANDLOCK,		"mand" },
+	{ SB_POSIXACL,		"posixacl" },
+	{ SB_RDONLY,		"ro" },
+	{ SB_SYNCHRONOUS,	"sync" },
+	{ },
+};
+
+static const match_table_t common_clear_sb_flag = {
+	{ SB_LAZYTIME,		"nolazytime" },
+	{ SB_MANDLOCK,		"nomand" },
+	{ SB_RDONLY,		"rw" },
+	{ SB_SILENT,		"silent" },
+	{ SB_SYNCHRONOUS,	"async" },
+	{ },
+};
+
+static const match_table_t forbidden_sb_flag = {
+	{ 0,	"bind" },
+	{ 0,	"move" },
+	{ 0,	"private" },
+	{ 0,	"remount" },
+	{ 0,	"shared" },
+	{ 0,	"slave" },
+	{ 0,	"unbindable" },
+	{ 0,	"rec" },
+	{ 0,	"noatime" },
+	{ 0,	"relatime" },
+	{ 0,	"norelatime" },
+	{ 0,	"strictatime" },
+	{ 0,	"nostrictatime" },
+	{ 0,	"nodiratime" },
+	{ 0,	"dev" },
+	{ 0,	"nodev" },
+	{ 0,	"exec" },
+	{ 0,	"noexec" },
+	{ 0,	"suid" },
+	{ 0,	"nosuid" },
+	{ },
+};
+
+/*
+ * Check for a common mount option that manipulates s_flags.
+ */
+static int vfs_parse_sb_flag_option(struct fs_context *fc, char *data)
+{
+	substring_t args[MAX_OPT_ARGS];
+	unsigned int token;
+
+	token = match_token(data, common_set_sb_flag, args);
+	if (token) {
+		fc->sb_flags |= token;
+		return 1;
+	}
+
+	token = match_token(data, common_clear_sb_flag, args);
+	if (token) {
+		fc->sb_flags &= ~token;
+		return 1;
+	}
+
+	token = match_token(data, forbidden_sb_flag, args);
+	if (token)
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * vfs_parse_fs_option - Add a single mount option to a superblock config
+ * @fc: The filesystem context to modify
+ * @opt: The option to apply.
+ * @len: The length of the option.
+ *
+ * A single mount option in string form is applied to the filesystem context
+ * being set up.  Certain standard options (for example "ro") are translated
+ * into flag bits without going to the filesystem.  The active security module
+ * is allowed to observe and poach options.  Any other options are passed over
+ * to the filesystem to parse.
+ *
+ * This may be called multiple times for a context.
+ *
+ * Returns 0 on success and a negative error code on failure.  In the event of
+ * failure, supplementary error information may have been set.
+ */
+int vfs_parse_fs_option(struct fs_context *fc, char *opt, size_t len)
+{
+	int ret;
+
+	ret = vfs_parse_sb_flag_option(fc, opt);
+	if (ret < 0)
+		return ret;
+	if (ret == 1)
+		return 0;
+
+	ret = security_fs_context_parse_option(fc, opt, len);
+	if (ret < 0)
+		return ret;
+	if (ret == 1)
+		return 0;
+
+	if (fc->ops->parse_option)
+		return fc->ops->parse_option(fc, opt, len);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vfs_parse_fs_option);
+
+/**
+ * vfs_set_fs_source - Set the source/device name in a filesystem context
+ * @fc: The filesystem context to alter
+ * @source: The name of the source
+ * @slen: Length of @source string
+ */
+int vfs_set_fs_source(struct fs_context *fc, const char *source, size_t slen)
+{
+	if (fc->source)
+		return -EINVAL;
+	if (source) {
+		fc->source = kmemdup_nul(source, slen, GFP_KERNEL);
+		if (!fc->source)
+			return -ENOMEM;
+	}
+
+	if (fc->ops->parse_source)
+		return fc->ops->parse_source(fc);
+	return 0;
+}
+EXPORT_SYMBOL(vfs_set_fs_source);
+
+/**
+ * generic_parse_monolithic - Parse key[=val][,key[=val]]* mount data
+ * @ctx: The superblock configuration to fill in.
+ * @data: The data to parse
+ * @data_size: The amount of data
+ *
+ * Parse a blob of data that's in key[=val][,key[=val]]* form.  This can be
+ * called from the ->monolithic_mount_data() fs_context operation.
+ *
+ * Returns 0 on success or the error returned by the ->parse_option() fs_context
+ * operation on failure.
+ */
+int generic_parse_monolithic(struct fs_context *ctx, void *data, size_t data_size)
+{
+	char *options = data, *opt;
+	int ret;
+
+	if (!options)
+		return 0;
+
+	while ((opt = strsep(&options, ",")) != NULL) {
+		if (*opt) {
+			ret = vfs_parse_fs_option(ctx, opt, strlen(opt));
+			if (ret < 0)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(generic_parse_monolithic);
+
+/**
+ * vfs_new_fs_context - Create a filesystem context.
+ * @fs_type: The filesystem type.
+ * @src_sb: A superblock from which this one derives (or NULL)
+ * @sb_flags: Filesystem/superblock flags (SB_*)
+ * @purpose: The purpose that this configuration shall be used for.
+ *
+ * Open a filesystem and create a mount context.  The mount context is
+ * initialised with the supplied flags and, if a submount/automount from
+ * another superblock (@src_sb) is supplied, may have parameters such as
+ * namespaces copied across from that superblock.
+ */
+struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
+				      struct super_block *src_sb,
+				      unsigned int sb_flags,
+				      enum fs_context_purpose purpose)
+{
+	struct fs_context *fc;
+	size_t fc_size = fs_type->fs_context_size;
+	int ret;
+
+	BUG_ON(fs_type->init_fs_context && fc_size < sizeof(*fc));
+
+	if (!fs_type->init_fs_context)
+		fc_size = sizeof(struct legacy_fs_context);
+
+	fc = kzalloc(fc_size, GFP_KERNEL);
+	if (!fc)
+		return ERR_PTR(-ENOMEM);
+
+	fc->purpose	= purpose;
+	fc->sb_flags	= sb_flags;
+	fc->fs_type	= get_filesystem(fs_type);
+	fc->cred	= get_current_cred();
+
+	switch (purpose) {
+	case FS_CONTEXT_FOR_KERNEL_MOUNT:
+		fc->sb_flags |= SB_KERNMOUNT;
+		/* Fallthrough */
+	case FS_CONTEXT_FOR_USER_MOUNT:
+		fc->user_ns = get_user_ns(fc->cred->user_ns);
+		fc->net_ns = get_net(current->nsproxy->net_ns);
+		break;
+	case FS_CONTEXT_FOR_SUBMOUNT:
+		fc->user_ns = get_user_ns(src_sb->s_user_ns);
+		fc->net_ns = get_net(current->nsproxy->net_ns);
+		break;
+	case FS_CONTEXT_FOR_RECONFIGURE:
+		/* We don't pin any namespaces as the superblock's
+		 * subscriptions cannot be changed at this point.
+		 */
+		break;
+	}
+
+
+	/* TODO: Make all filesystems support this unconditionally */
+	if (fc->fs_type->init_fs_context) {
+		ret = fc->fs_type->init_fs_context(fc, src_sb);
+		if (ret < 0)
+			goto err_fc;
+	} else {
+		fc->ops = &legacy_fs_context_ops;
+	}
+
+	/* Do the security check last because ->init_fs_context may change the
+	 * namespace subscriptions.
+	 */
+	ret = security_fs_context_alloc(fc, src_sb);
+	if (ret < 0)
+		goto err_fc;
+
+	return fc;
+
+err_fc:
+	put_fs_context(fc);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(vfs_new_fs_context);
+
+/**
+ * vfs_sb_reconfig - Create a filesystem context for remount/reconfiguration
+ * @mountpoint: The mountpoint to open
+ * @sb_flags: Filesystem/superblock flags (SB_*)
+ *
+ * Open a mounted filesystem and create a filesystem context such that a
+ * remount can be effected.
+ */
+struct fs_context *vfs_sb_reconfig(struct path *mountpoint,
+				   unsigned int sb_flags)
+{
+	struct fs_context *fc;
+
+	fc = vfs_new_fs_context(mountpoint->mnt->mnt_sb->s_type,
+				mountpoint->mnt->mnt_sb,
+				sb_flags, FS_CONTEXT_FOR_RECONFIGURE);
+	if (IS_ERR(fc))
+		return fc;
+
+	fc->root = dget(mountpoint->dentry);
+	return fc;
+}
+
+/**
+ * vfs_dup_fc_config: Duplicate a filesytem context.
+ * @src_fc: The context to copy.
+ */
+struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc)
+{
+	struct fs_context *fc;
+	size_t fc_size;
+	int ret;
+
+	if (!src_fc->ops->dup)
+		return ERR_PTR(-ENOTSUPP);
+
+	fc_size = src_fc->fs_type->fs_context_size;
+	if (!src_fc->fs_type->init_fs_context)
+		fc_size = sizeof(struct legacy_fs_context);
+
+	fc = kmemdup(src_fc, src_fc->fs_type->fs_context_size, GFP_KERNEL);
+	if (!fc)
+		return ERR_PTR(-ENOMEM);
+
+	fc->source	= NULL;
+	fc->security	= NULL;
+	get_filesystem(fc->fs_type);
+	get_net(fc->net_ns);
+	get_user_ns(fc->user_ns);
+	get_cred(fc->cred);
+
+	/* Can't call put until we've called ->dup */
+	ret = fc->ops->dup(fc, src_fc);
+	if (ret < 0)
+		goto err_fc;
+
+	ret = security_fs_context_dup(fc, src_fc);
+	if (ret < 0)
+		goto err_fc;
+	return fc;
+
+err_fc:
+	put_fs_context(fc);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(vfs_dup_fs_context);
+
+/**
+ * put_fs_context - Dispose of a superblock configuration context.
+ * @fc: The context to dispose of.
+ */
+void put_fs_context(struct fs_context *fc)
+{
+	struct super_block *sb;
+
+	if (fc->root) {
+		sb = fc->root->d_sb;
+		dput(fc->root);
+		fc->root = NULL;
+		if (fc->drop_sb)
+			deactivate_super(sb);
+	}
+
+	if (fc->ops && fc->ops->free)
+		fc->ops->free(fc);
+
+	security_fs_context_free(fc);
+	if (fc->net_ns)
+		put_net(fc->net_ns);
+	put_user_ns(fc->user_ns);
+	if (fc->cred)
+		put_cred(fc->cred);
+	kfree(fc->subtype);
+	put_filesystem(fc->fs_type);
+	kfree(fc->source);
+	kfree(fc);
+}
+EXPORT_SYMBOL(put_fs_context);
+
+/*
+ * Free the config for a filesystem that doesn't support fs_context.
+ */
+static void legacy_fs_context_free(struct fs_context *fc)
+{
+	struct legacy_fs_context *ctx = container_of(fc, struct legacy_fs_context, fc);
+
+	free_secdata(ctx->secdata);
+	switch (ctx->param_type) {
+	case LEGACY_FS_UNSET_PARAMS:
+	case LEGACY_FS_NO_PARAMS:
+		break;
+	case LEGACY_FS_MAGIC_PARAMS:
+		break; /* ctx->data is a weird pointer */
+	default:
+		kfree(ctx->legacy_data);
+		break;
+	}
+}
+
+/*
+ * Duplicate a legacy config.
+ */
+static int legacy_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
+{
+	struct legacy_fs_context *ctx = container_of(fc, struct legacy_fs_context, fc);
+	struct legacy_fs_context *src_ctx = container_of(src_fc, struct legacy_fs_context, fc);
+
+	switch (ctx->param_type) {
+	case LEGACY_FS_MONOLITHIC_PARAMS:
+	case LEGACY_FS_INDIVIDUAL_PARAMS:
+		ctx->legacy_data = kmemdup(src_ctx->legacy_data,
+					   src_ctx->data_size, GFP_KERNEL);
+		if (!ctx->legacy_data)
+			return -ENOMEM;
+		/* Fall through */
+	default:
+		break;
+	}
+	return 0;
+}
+
+/*
+ * Add an option to a legacy config.  We build up a comma-separated list of
+ * options.
+ */
+static int legacy_parse_option(struct fs_context *fc, char *opt, size_t len)
+{
+	struct legacy_fs_context *ctx = container_of(fc, struct legacy_fs_context, fc);
+	unsigned int size = ctx->data_size;
+
+	if (ctx->param_type != LEGACY_FS_UNSET_PARAMS &&
+	    ctx->param_type != LEGACY_FS_INDIVIDUAL_PARAMS) {
+		pr_warn("VFS: Can't mix monolithic and individual options\n");
+		return -EINVAL;
+	}
+
+	if (len > PAGE_SIZE - 2 - size)
+		return -EINVAL;
+	if (memchr(opt, ',', len) != NULL)
+		return -EINVAL;
+	if (!ctx->legacy_data) {
+		ctx->legacy_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
+		if (!ctx->legacy_data)
+			return -ENOMEM;
+	}
+
+	ctx->legacy_data[size++] = ',';
+	memcpy(ctx->legacy_data + size, opt, len);
+	size += len;
+	ctx->legacy_data[size] = '\0';
+	ctx->data_size = size;
+	ctx->param_type = LEGACY_FS_INDIVIDUAL_PARAMS;
+	return 0;
+}
+
+/*
+ * Add monolithic mount data.
+ */
+static int legacy_parse_monolithic(struct fs_context *fc, void *data, size_t data_size)
+{
+	struct legacy_fs_context *ctx = container_of(fc, struct legacy_fs_context, fc);
+
+	if (ctx->param_type != LEGACY_FS_UNSET_PARAMS) {
+		pr_warn("VFS: Can't mix monolithic and individual options\n");
+		return -EINVAL;
+	}
+
+	if (!data) {
+		ctx->param_type = LEGACY_FS_NO_PARAMS;
+		return 0;
+	}
+
+	ctx->data_size = data_size;
+	if (data_size > 0) {
+		ctx->legacy_data = kmemdup(data, data_size, GFP_KERNEL);
+		if (!ctx->legacy_data)
+			return -ENOMEM;
+		ctx->param_type = LEGACY_FS_MONOLITHIC_PARAMS;
+	} else {
+		/* Some filesystems pass weird pointers through that we don't
+		 * want to copy.  They can indicate this by setting data_size
+		 * to 0.
+		 */
+		ctx->legacy_data = data;
+		ctx->param_type = LEGACY_FS_MAGIC_PARAMS;
+	}
+
+	return 0;
+}
+
+/*
+ * Use the legacy mount validation step to strip out and process security
+ * config options.
+ */
+static int legacy_validate(struct fs_context *fc)
+{
+	struct legacy_fs_context *ctx = container_of(fc, struct legacy_fs_context, fc);
+
+	switch (ctx->param_type) {
+	case LEGACY_FS_UNSET_PARAMS:
+		ctx->param_type = LEGACY_FS_NO_PARAMS;
+		/* Fall through */
+	case LEGACY_FS_NO_PARAMS:
+	case LEGACY_FS_MAGIC_PARAMS:
+		return 0;
+	default:
+		break;
+	}
+
+	if (ctx->fc.fs_type->fs_flags & FS_BINARY_MOUNTDATA)
+		return 0;
+
+	ctx->secdata = alloc_secdata();
+	if (!ctx->secdata)
+		return -ENOMEM;
+
+	return security_sb_copy_data(ctx->legacy_data, ctx->data_size,
+				     ctx->secdata);
+}
+
+/*
+ * Determine the superblock subtype.
+ */
+static int legacy_set_subtype(struct fs_context *fc)
+{
+	const char *subtype = strchr(fc->fs_type->name, '.');
+
+	if (subtype) {
+		subtype++;
+		if (!subtype[0])
+			return -EINVAL;
+	} else {
+		subtype = "";
+	}
+
+	fc->subtype = kstrdup(subtype, GFP_KERNEL);
+	if (!fc->subtype)
+		return -ENOMEM;
+	return 0;
+}
+
+/*
+ * Get a mountable root with the legacy mount command.
+ */
+static int legacy_get_tree(struct fs_context *fc)
+{
+	struct legacy_fs_context *ctx = container_of(fc, struct legacy_fs_context, fc);
+	struct super_block *sb;
+	struct dentry *root;
+	int ret;
+
+	root = ctx->fc.fs_type->mount(ctx->fc.fs_type, ctx->fc.sb_flags,
+				      ctx->fc.source, ctx->legacy_data,
+				      ctx->data_size);
+	if (IS_ERR(root))
+		return PTR_ERR(root);
+
+	sb = root->d_sb;
+	BUG_ON(!sb);
+
+	if ((ctx->fc.fs_type->fs_flags & FS_HAS_SUBTYPE) &&
+	    !fc->subtype) {
+		ret = legacy_set_subtype(fc);
+		if (ret < 0)
+			goto err_sb;
+	}
+
+	ctx->fc.root = root;
+	ctx->fc.drop_sb = true;
+	return 0;
+
+err_sb:
+	dput(root);
+	deactivate_locked_super(sb);
+	return ret;
+}
+
+static const struct fs_context_operations legacy_fs_context_ops = {
+	.free			= legacy_fs_context_free,
+	.dup			= legacy_fs_context_dup,
+	.parse_option		= legacy_parse_option,
+	.parse_monolithic	= legacy_parse_monolithic,
+	.validate		= legacy_validate,
+	.get_tree		= legacy_get_tree,
+};
diff --git a/fs/internal.h b/fs/internal.h
index 1afa522c5f30..91a990234488 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -98,7 +98,8 @@ extern struct file *get_empty_filp(void);
 /*
  * super.c
  */
-extern int do_remount_sb(struct super_block *, int, void *, size_t, int);
+extern int do_remount_sb(struct super_block *, int, void *, size_t, int,
+			 struct fs_context *);
 extern bool trylock_super(struct super_block *sb);
 extern struct dentry *mount_fs(struct file_system_type *,
 			       int, const char *, void *, size_t);
diff --git a/fs/libfs.c b/fs/libfs.c
index 9f1f4884b7cc..0bbe1ff1d09e 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 #include <linux/cred.h>
 #include <linux/mount.h>
+#include <linux/fs_context.h>
 #include <linux/vfs.h>
 #include <linux/quotaops.h>
 #include <linux/mutex.h>
@@ -574,13 +575,27 @@ static DEFINE_SPINLOCK(pin_fs_lock);
 
 int simple_pin_fs(struct file_system_type *type, struct vfsmount **mount, int *count)
 {
+	struct fs_context *fc;
 	struct vfsmount *mnt = NULL;
+	int ret;
+
 	spin_lock(&pin_fs_lock);
 	if (unlikely(!*mount)) {
 		spin_unlock(&pin_fs_lock);
-		mnt = vfs_kern_mount(type, SB_KERNMOUNT, type->name, NULL, 0);
+
+		fc = vfs_new_fs_context(type, NULL, 0, FS_CONTEXT_FOR_KERNEL_MOUNT);
+		if (IS_ERR(fc))
+			return PTR_ERR(fc);
+
+		ret = vfs_get_tree(fc);
+		if (ret < 0)
+			return ret;
+
+		mnt = vfs_create_mount(fc);
+		put_fs_context(fc);
 		if (IS_ERR(mnt))
 			return PTR_ERR(mnt);
+
 		spin_lock(&pin_fs_lock);
 		if (!*mount)
 			*mount = mnt;
diff --git a/fs/namespace.c b/fs/namespace.c
index 8fc4f3459b80..c61ff2ab090a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -25,8 +25,10 @@
 #include <linux/magic.h>
 #include <linux/bootmem.h>
 #include <linux/task_work.h>
+#include <linux/file.h>
 #include <linux/sched/task.h>
 #include <uapi/linux/mount.h>
+#include <linux/fs_context.h>
 
 #include "pnode.h"
 #include "internal.h"
@@ -1019,56 +1021,6 @@ static struct mount *skip_mnt_tree(struct mount *p)
 	return p;
 }
 
-struct vfsmount *
-vfs_kern_mount(struct file_system_type *type, int flags, const char *name,
-	       void *data, size_t data_size)
-{
-	struct mount *mnt;
-	struct dentry *root;
-
-	if (!type)
-		return ERR_PTR(-ENODEV);
-
-	mnt = alloc_vfsmnt(name);
-	if (!mnt)
-		return ERR_PTR(-ENOMEM);
-
-	if (flags & SB_KERNMOUNT)
-		mnt->mnt.mnt_flags = MNT_INTERNAL;
-
-	root = mount_fs(type, flags, name, data, data_size);
-	if (IS_ERR(root)) {
-		mnt_free_id(mnt);
-		free_vfsmnt(mnt);
-		return ERR_CAST(root);
-	}
-
-	mnt->mnt.mnt_root = root;
-	mnt->mnt.mnt_sb = root->d_sb;
-	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
-	mnt->mnt_parent = mnt;
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
-	unlock_mount_hash();
-	return &mnt->mnt;
-}
-EXPORT_SYMBOL_GPL(vfs_kern_mount);
-
-struct vfsmount *
-vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
-	     const char *name, void *data, size_t data_size)
-{
-	/* Until it is worked out how to pass the user namespace
-	 * through from the parent mount to the submount don't support
-	 * unprivileged mounts with submounts.
-	 */
-	if (mountpoint->d_sb->s_user_ns != &init_user_ns)
-		return ERR_PTR(-EPERM);
-
-	return vfs_kern_mount(type, SB_SUBMOUNT, name, data, data_size);
-}
-EXPORT_SYMBOL_GPL(vfs_submount);
-
 static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 					int flag)
 {
@@ -1595,7 +1547,7 @@ static int do_umount(struct mount *mnt, int flags)
 			return -EPERM;
 		down_write(&sb->s_umount);
 		if (!sb_rdonly(sb))
-			retval = do_remount_sb(sb, SB_RDONLY, NULL, 0, 0);
+			retval = do_remount_sb(sb, SB_RDONLY, NULL, 0, 0, NULL);
 		up_write(&sb->s_umount);
 		return retval;
 	}
@@ -2282,6 +2234,20 @@ static int change_mount_flags(struct vfsmount *mnt, int ms_flags)
 	return error;
 }
 
+/*
+ * Parse the monolithic page of mount data given to sys_mount().
+ */
+static int parse_monolithic_mount_data(struct fs_context *fc, void *data, size_t data_size)
+{
+	int (*monolithic_mount_data)(struct fs_context *, void *, size_t);
+
+	monolithic_mount_data = fc->ops->parse_monolithic;
+	if (!monolithic_mount_data)
+		monolithic_mount_data = generic_parse_monolithic;
+
+	return monolithic_mount_data(fc, data, data_size);
+}
+
 /*
  * change filesystem flags. dir should be a physical root of filesystem.
  * If you've mounted a non-root directory somewhere and want to do remount
@@ -2290,9 +2256,11 @@ static int change_mount_flags(struct vfsmount *mnt, int ms_flags)
 static int do_remount(struct path *path, int ms_flags, int sb_flags,
 		      int mnt_flags, void *data, size_t data_size)
 {
+	struct fs_context *fc = NULL;
 	int err;
 	struct super_block *sb = path->mnt->mnt_sb;
 	struct mount *mnt = real_mount(path->mnt);
+	struct file_system_type *type = sb->s_type;
 
 	if (!check_mnt(mnt))
 		return -EINVAL;
@@ -2327,9 +2295,29 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
 		return -EPERM;
 	}
 
-	err = security_sb_remount(sb, data, data_size);
-	if (err)
-		return err;
+	if (type->init_fs_context) {
+		fc = vfs_sb_reconfig(path, sb_flags);
+		if (IS_ERR(fc))
+			return PTR_ERR(fc);
+
+		err = parse_monolithic_mount_data(fc, data, data_size);
+		if (err < 0)
+			goto err_fc;
+
+		if (fc->ops->validate) {
+			err = fc->ops->validate(fc);
+			if (err < 0)
+				goto err_fc;
+		}
+
+		err = security_fs_context_validate(fc);
+		if (err)
+			return err;
+	} else {
+		err = security_sb_remount(sb, data, data_size);
+		if (err)
+			return err;
+	}
 
 	down_write(&sb->s_umount);
 	if (ms_flags & MS_BIND)
@@ -2337,7 +2325,7 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
 	else if (!capable(CAP_SYS_ADMIN))
 		err = -EPERM;
 	else
-		err = do_remount_sb(sb, sb_flags, data, data_size, 0);
+		err = do_remount_sb(sb, sb_flags, data, data_size, 0, fc);
 	if (!err) {
 		lock_mount_hash();
 		mnt_flags |= mnt->mnt.mnt_flags & ~MNT_USER_SETTABLE_MASK;
@@ -2346,6 +2334,9 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
 		unlock_mount_hash();
 	}
 	up_write(&sb->s_umount);
+err_fc:
+	if (fc)
+		put_fs_context(fc);
 	return err;
 }
 
@@ -2429,29 +2420,6 @@ static int do_move_mount(struct path *path, const char *old_name)
 	return err;
 }
 
-static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
-{
-	int err;
-	const char *subtype = strchr(fstype, '.');
-	if (subtype) {
-		subtype++;
-		err = -EINVAL;
-		if (!subtype[0])
-			goto err;
-	} else
-		subtype = "";
-
-	mnt->mnt_sb->s_subtype = kstrdup(subtype, GFP_KERNEL);
-	err = -ENOMEM;
-	if (!mnt->mnt_sb->s_subtype)
-		goto err;
-	return mnt;
-
- err:
-	mntput(mnt);
-	return ERR_PTR(err);
-}
-
 /*
  * add a mount into a namespace's mount tree
  */
@@ -2498,42 +2466,85 @@ static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
 
 static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags);
 
+/*
+ * Create a new mount using a superblock configuration and request it
+ * be added to the namespace tree.
+ */
+static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
+			   unsigned int mnt_flags)
+{
+	struct vfsmount *mnt;
+	int ret;
+
+	ret = security_sb_mountpoint(fc, mountpoint,
+				     mnt_flags & ~MNT_INTERNAL_FLAGS);
+	if (ret < 0)
+		return ret;
+
+	mnt = vfs_create_mount(fc);
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
+
+	ret = -EPERM;
+	if (mount_too_revealing(mnt, &mnt_flags)) {
+		pr_warn("VFS: Mount too revealing\n");
+		goto err_mnt;
+	}
+
+	ret = do_add_mount(real_mount(mnt), mountpoint, mnt_flags);
+	if (ret < 0)
+		goto err_mnt;
+	return ret;
+
+err_mnt:
+	mntput(mnt);
+	return ret;
+}
+
 /*
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  */
-static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
-			int mnt_flags, const char *name,
+static int do_new_mount(struct path *mountpoint, const char *fstype,
+			int sb_flags, int mnt_flags, const char *name,
 			void *data, size_t data_size)
 {
-	struct file_system_type *type;
-	struct vfsmount *mnt;
+	struct file_system_type *fs_type;
+	struct fs_context *fc;
 	int err;
 
 	if (!fstype)
 		return -EINVAL;
 
-	type = get_fs_type(fstype);
-	if (!type)
-		return -ENODEV;
+	err = -ENODEV;
+	fs_type = get_fs_type(fstype);
+	if (!fs_type)
+		goto out;
 
-	mnt = vfs_kern_mount(type, sb_flags, name, data, data_size);
-	if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
-	    !mnt->mnt_sb->s_subtype)
-		mnt = fs_set_subtype(mnt, fstype);
+	fc = vfs_new_fs_context(fs_type, NULL, sb_flags,
+				FS_CONTEXT_FOR_USER_MOUNT);
+	put_filesystem(fs_type);
+	if (IS_ERR(fc)) {
+		err = PTR_ERR(fc);
+		goto out;
+	}
 
-	put_filesystem(type);
-	if (IS_ERR(mnt))
-		return PTR_ERR(mnt);
+	err = vfs_set_fs_source(fc, name, name ? strlen(name) : 0);
+	if (err < 0)
+		goto out_fc;
 
-	if (mount_too_revealing(mnt, &mnt_flags)) {
-		mntput(mnt);
-		return -EPERM;
-	}
+	err = parse_monolithic_mount_data(fc, data, data_size);
+	if (err < 0)
+		goto out_fc;
 
-	err = do_add_mount(real_mount(mnt), path, mnt_flags);
-	if (err)
-		mntput(mnt);
+	err = vfs_get_tree(fc);
+	if (err < 0)
+		goto out_fc;
+
+	err = do_new_mount_fc(fc, mountpoint, mnt_flags);
+out_fc:
+	put_fs_context(fc);
+out:
 	return err;
 }
 
@@ -3081,6 +3092,113 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
 	return ksys_mount(dev_name, dir_name, type, flags, data);
 }
 
+/**
+ * vfs_create_mount - Create a mount for a configured superblock
+ * @fc: The configuration context with the superblock attached
+ *
+ * Create a mount to an already configured superblock.  If necessary, the
+ * caller should invoke vfs_get_tree() before calling this.
+ *
+ * Note that this does not attach the mount to anything.
+ */
+struct vfsmount *vfs_create_mount(struct fs_context *fc)
+{
+	struct mount *mnt;
+
+	if (!fc->root)
+		return ERR_PTR(-EINVAL);
+
+	mnt = alloc_vfsmnt(fc->source ?: "none");
+	if (!mnt)
+		return ERR_PTR(-ENOMEM);
+
+	if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
+		/* It's a longterm mount, don't release mnt until we unmount
+		 * before file sys is unregistered
+		 */
+		mnt->mnt.mnt_flags = MNT_INTERNAL;
+
+	atomic_inc(&fc->root->d_sb->s_active);
+	mnt->mnt.mnt_sb		= fc->root->d_sb;
+	mnt->mnt.mnt_root	= dget(fc->root);
+	mnt->mnt_mountpoint	= mnt->mnt.mnt_root;
+	mnt->mnt_parent		= mnt;
+
+	lock_mount_hash();
+	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
+	unlock_mount_hash();
+	return &mnt->mnt;
+}
+EXPORT_SYMBOL(vfs_create_mount);
+
+struct vfsmount *vfs_kern_mount(struct file_system_type *type,
+				int sb_flags, const char *devname,
+				void *data, size_t data_size)
+{
+	struct fs_context *fc;
+	struct vfsmount *mnt;
+	int ret;
+
+	if (!type)
+		return ERR_PTR(-EINVAL);
+
+	fc = vfs_new_fs_context(type, NULL, sb_flags,
+				sb_flags & SB_KERNMOUNT ?
+				FS_CONTEXT_FOR_KERNEL_MOUNT :
+				FS_CONTEXT_FOR_USER_MOUNT);
+	if (IS_ERR(fc))
+		return ERR_CAST(fc);
+
+	ret = vfs_set_fs_source(fc, devname, devname ? strlen(devname) : 0);
+	if (ret < 0)
+		goto err_fc;
+
+	ret = parse_monolithic_mount_data(fc, data, data_size);
+	if (ret < 0)
+		goto err_fc;
+
+	ret = vfs_get_tree(fc);
+	if (ret < 0)
+		goto err_fc;
+
+	mnt = vfs_create_mount(fc);
+out:
+	put_fs_context(fc);
+	return mnt;
+err_fc:
+	mnt = ERR_PTR(ret);
+	goto out;
+}
+EXPORT_SYMBOL_GPL(vfs_kern_mount);
+
+struct vfsmount *
+vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
+	     const char *name, void *data, size_t data_size)
+{
+	/* Until it is worked out how to pass the user namespace
+	 * through from the parent mount to the submount don't support
+	 * unprivileged mounts with submounts.
+	 */
+	if (mountpoint->d_sb->s_user_ns != &init_user_ns)
+		return ERR_PTR(-EPERM);
+
+	return vfs_kern_mount(type, MS_SUBMOUNT, name, data, data_size);
+}
+EXPORT_SYMBOL_GPL(vfs_submount);
+
+struct vfsmount *kern_mount(struct file_system_type *type)
+{
+	return vfs_kern_mount(type, SB_KERNMOUNT, type->name, NULL, 0);
+}
+EXPORT_SYMBOL_GPL(kern_mount);
+
+struct vfsmount *kern_mount_data(struct file_system_type *type,
+				 void *data, size_t data_size)
+{
+	return vfs_kern_mount(type, SB_KERNMOUNT, type->name, data, data_size);
+}
+EXPORT_SYMBOL_GPL(kern_mount_data);
+
 /*
  * Return true if path is reachable from root
  *
@@ -3301,22 +3419,6 @@ void put_mnt_ns(struct mnt_namespace *ns)
 	free_mnt_ns(ns);
 }
 
-struct vfsmount *kern_mount_data(struct file_system_type *type,
-				 void *data, size_t data_size)
-{
-	struct vfsmount *mnt;
-	mnt = vfs_kern_mount(type, SB_KERNMOUNT, type->name, data, data_size);
-	if (!IS_ERR(mnt)) {
-		/*
-		 * it is a longterm mount, don't release mnt until
-		 * we unmount before file sys is unregistered
-		*/
-		real_mount(mnt)->mnt_ns = MNT_NS_INTERNAL;
-	}
-	return mnt;
-}
-EXPORT_SYMBOL_GPL(kern_mount_data);
-
 void kern_unmount(struct vfsmount *mnt)
 {
 	/* release long term mount so mount point can be released */
diff --git a/fs/super.c b/fs/super.c
index 9117e3447837..5d65a45ca6db 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -36,6 +36,7 @@
 #include <linux/lockdep.h>
 #include <linux/user_namespace.h>
 #include <uapi/linux/mount.h>
+#include <linux/fs_context.h>
 #include "internal.h"
 
 static int thaw_super_locked(struct super_block *sb);
@@ -173,16 +174,13 @@ static void destroy_unused_super(struct super_block *s)
 }
 
 /**
- *	alloc_super	-	create new superblock
- *	@type:	filesystem type superblock should belong to
- *	@flags: the mount flags
- *	@user_ns: User namespace for the super_block
+ *	alloc_super - Create new superblock
+ *	@fc: The filesystem configuration context
  *
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(struct file_system_type *type, int flags,
-				       struct user_namespace *user_ns)
+static struct super_block *alloc_super(struct fs_context *fc)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static const struct super_operations default_op;
@@ -192,9 +190,9 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 		return NULL;
 
 	INIT_LIST_HEAD(&s->s_mounts);
-	s->s_user_ns = get_user_ns(user_ns);
+	s->s_user_ns = get_user_ns(fc->user_ns);
 	init_rwsem(&s->s_umount);
-	lockdep_set_class(&s->s_umount, &type->s_umount_key);
+	lockdep_set_class(&s->s_umount, &fc->fs_type->s_umount_key);
 	/*
 	 * sget() can have s_umount recursion.
 	 *
@@ -218,12 +216,12 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	for (i = 0; i < SB_FREEZE_LEVELS; i++) {
 		if (__percpu_init_rwsem(&s->s_writers.rw_sem[i],
 					sb_writers_name[i],
-					&type->s_writers_key[i]))
+					&fc->fs_type->s_writers_key[i]))
 			goto fail;
 	}
 	init_waitqueue_head(&s->s_writers.wait_unfrozen);
 	s->s_bdi = &noop_backing_dev_info;
-	s->s_flags = flags;
+	s->s_flags = fc->sb_flags;
 	if (s->s_user_ns != &init_user_ns)
 		s->s_iflags |= SB_I_NODEV;
 	INIT_HLIST_NODE(&s->s_instances);
@@ -241,7 +239,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	s->s_count = 1;
 	atomic_set(&s->s_active, 1);
 	mutex_init(&s->s_vfs_rename_mutex);
-	lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
+	lockdep_set_class(&s->s_vfs_rename_mutex, &fc->fs_type->s_vfs_rename_key);
 	init_rwsem(&s->s_dquot.dqio_sem);
 	s->s_maxbytes = MAX_NON_LFS;
 	s->s_op = &default_op;
@@ -459,6 +457,96 @@ void generic_shutdown_super(struct super_block *sb)
 
 EXPORT_SYMBOL(generic_shutdown_super);
 
+/**
+ * sget_fc - Find or create a superblock
+ * @fc:	Filesystem context.
+ * @test: Comparison callback
+ * @set: Setup callback
+ *
+ * Find or create a superblock using the parameters stored in the filesystem
+ * context and the two callback functions.
+ *
+ * If an extant superblock is matched, then that will be returned with an
+ * elevated reference count that the caller must transfer or discard.
+ *
+ * If no match is made, a new superblock will be allocated and basic
+ * initialisation will be performed (s_type, s_fs_info and s_id will be set and
+ * the set() callback will be invoked), the superblock will be published and it
+ * will be returned in a partially constructed state with SB_BORN and SB_ACTIVE
+ * as yet unset.
+ */
+struct super_block *sget_fc(struct fs_context *fc,
+			    int (*test)(struct super_block *, struct fs_context *),
+			    int (*set)(struct super_block *, struct fs_context *))
+{
+	struct super_block *s = NULL;
+	struct super_block *old;
+	int err;
+
+	if (!(fc->sb_flags & SB_KERNMOUNT) &&
+	    fc->purpose != FS_CONTEXT_FOR_SUBMOUNT) {
+		/* Don't allow mounting unless the caller has CAP_SYS_ADMIN
+		 * over the namespace.
+		 */
+		if (!(fc->fs_type->fs_flags & FS_USERNS_MOUNT) &&
+		    !capable(CAP_SYS_ADMIN))
+			return ERR_PTR(-EPERM);
+		else if (!ns_capable(fc->user_ns, CAP_SYS_ADMIN))
+			return ERR_PTR(-EPERM);
+	}
+
+retry:
+	spin_lock(&sb_lock);
+	if (test) {
+		hlist_for_each_entry(old, &fc->fs_type->fs_supers, s_instances) {
+			if (!test(old, fc))
+				continue;
+			if (fc->user_ns != old->s_user_ns) {
+				spin_unlock(&sb_lock);
+				if (s) {
+					up_write(&s->s_umount);
+					destroy_unused_super(s);
+				}
+				return ERR_PTR(-EBUSY);
+			}
+			if (!grab_super(old))
+				goto retry;
+			if (s) {
+				up_write(&s->s_umount);
+				destroy_unused_super(s);
+				s = NULL;
+			}
+			return old;
+		}
+	}
+	if (!s) {
+		spin_unlock(&sb_lock);
+		s = alloc_super(fc);
+		if (!s)
+			return ERR_PTR(-ENOMEM);
+		goto retry;
+	}
+
+	s->s_fs_info = fc->s_fs_info;
+	err = set(s, fc);
+	if (err) {
+		s->s_fs_info = NULL;
+		spin_unlock(&sb_lock);
+		up_write(&s->s_umount);
+		destroy_unused_super(s);
+		return ERR_PTR(err);
+	}
+	s->s_type = fc->fs_type;
+	strlcpy(s->s_id, s->s_type->name, sizeof(s->s_id));
+	list_add_tail(&s->s_list, &super_blocks);
+	hlist_add_head(&s->s_instances, &s->s_type->fs_supers);
+	spin_unlock(&sb_lock);
+	get_filesystem(s->s_type);
+	register_shrinker(&s->s_shrink);
+	return s;
+}
+EXPORT_SYMBOL(sget_fc);
+
 /**
  *	sget_userns -	find or create a superblock
  *	@type:	filesystem type superblock should belong to
@@ -501,7 +589,14 @@ struct super_block *sget_userns(struct file_system_type *type,
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super(type, (flags & ~SB_SUBMOUNT), user_ns);
+		{
+			struct fs_context fc = {
+				.fs_type	= type,
+				.sb_flags	= flags & ~SB_SUBMOUNT,
+				.user_ns	= user_ns,
+			};
+			s = alloc_super(&fc);
+		}
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -829,11 +924,13 @@ struct super_block *user_get_super(dev_t dev)
  *	@data:	the rest of options
  *	@data_size: The size of the data
  *      @force: whether or not to force the change
+ *	@fc:	the superblock config for filesystems that support it
+ *		(NULL if called from emergency or umount)
  *
  *	Alters the mount options of a mounted file system.
  */
 int do_remount_sb(struct super_block *sb, int sb_flags, void *data,
-		  size_t data_size, int force)
+		  size_t data_size, int force, struct fs_context *fc)
 {
 	int retval;
 	int remount_ro;
@@ -875,8 +972,17 @@ int do_remount_sb(struct super_block *sb, int sb_flags, void *data,
 		}
 	}
 
-	if (sb->s_op->remount_fs) {
-		retval = sb->s_op->remount_fs(sb, &sb_flags, data, data_size);
+	if (sb->s_op->reconfigure ||
+	    sb->s_op->remount_fs) {
+		if (sb->s_op->reconfigure) {
+			retval = sb->s_op->reconfigure(sb, fc);
+			sb_flags = fc->sb_flags;
+			if (retval == 0)
+				security_sb_reconfigure(fc);
+		} else {
+			retval = sb->s_op->remount_fs(sb, &sb_flags,
+						      data, data_size);
+		}
 		if (retval) {
 			if (!force)
 				goto cancel_readonly;
@@ -915,7 +1021,7 @@ static void do_emergency_remount_callback(struct super_block *sb)
 		/*
 		 * What lock protects sb->s_flags??
 		 */
-		do_remount_sb(sb, SB_RDONLY, NULL, 0, 1);
+		do_remount_sb(sb, SB_RDONLY, NULL, 0, 1, NULL);
 	}
 	up_write(&sb->s_umount);
 }
@@ -1097,6 +1203,90 @@ struct dentry *mount_ns(struct file_system_type *fs_type,
 
 EXPORT_SYMBOL(mount_ns);
 
+static int set_anon_super_fc(struct super_block *sb, struct fs_context *fc)
+{
+	return set_anon_super(sb, NULL);
+}
+
+static int test_keyed_super(struct super_block *sb, struct fs_context *fc)
+{
+	return sb->s_fs_info == fc->s_fs_info;
+}
+
+static int test_single_super(struct super_block *s, struct fs_context *fc)
+{
+	return 1;
+}
+
+/**
+ * vfs_get_super - Get a superblock with a search key set in s_fs_info.
+ * @fc: The filesystem context holding the parameters
+ * @keying: How to distinguish superblocks
+ * @fill_super: Helper to initialise a new superblock
+ *
+ * Search for a superblock and create a new one if not found.  The search
+ * criterion is controlled by @keying.  If the search fails, a new superblock
+ * is created and @fill_super() is called to initialise it.
+ *
+ * @keying can take one of a number of values:
+ *
+ * (1) vfs_get_single_super - Only one superblock of this type may exist on the
+ *     system.  This is typically used for special system filesystems.
+ *
+ * (2) vfs_get_keyed_super - Multiple superblocks may exist, but they must have
+ *     distinct keys (where the key is in s_fs_info).  Searching for the same
+ *     key again will turn up the superblock for that key.
+ *
+ * (3) vfs_get_independent_super - Multiple superblocks may exist and are
+ *     unkeyed.  Each call will get a new superblock.
+ *
+ * A permissions check is made by sget_fc() unless we're getting a superblock
+ * for a kernel-internal mount or a submount.
+ */
+int vfs_get_super(struct fs_context *fc,
+		  enum vfs_get_super_keying keying,
+		  int (*fill_super)(struct super_block *sb,
+				    struct fs_context *fc))
+{
+	int (*test)(struct super_block *, struct fs_context *);
+	struct super_block *sb;
+
+	switch (keying) {
+	case vfs_get_single_super:
+		test = test_single_super;
+		break;
+	case vfs_get_keyed_super:
+		test = test_keyed_super;
+		break;
+	case vfs_get_independent_super:
+		test = NULL;
+		break;
+	default:
+		BUG();
+	}
+
+	sb = sget_fc(fc, test, set_anon_super_fc);
+	if (IS_ERR(sb))
+		return PTR_ERR(sb);
+
+	if (!sb->s_root) {
+		int err;
+		err = fill_super(sb, fc);
+		if (err) {
+			deactivate_locked_super(sb);
+			return err;
+		}
+
+		sb->s_flags |= SB_ACTIVE;
+	}
+
+	BUG_ON(fc->root);
+	fc->root = dget(sb->s_root);
+	fc->drop_sb = true;
+	return 0;
+}
+EXPORT_SYMBOL(vfs_get_super);
+
 #ifdef CONFIG_BLOCK
 static int set_bdev_super(struct super_block *s, void *data)
 {
@@ -1245,7 +1435,7 @@ struct dentry *mount_single(struct file_system_type *fs_type,
 		}
 		s->s_flags |= SB_ACTIVE;
 	} else {
-		do_remount_sb(s, flags, data, data_size, 0);
+		do_remount_sb(s, flags, data, data_size, 0, NULL);
 	}
 	return dget(s->s_root);
 }
@@ -1584,3 +1774,88 @@ int thaw_super(struct super_block *sb)
 	return thaw_super_locked(sb);
 }
 EXPORT_SYMBOL(thaw_super);
+
+/**
+ * vfs_get_tree - Get the mountable root
+ * @fc: The superblock configuration context.
+ *
+ * The filesystem is invoked to get or create a superblock which can then later
+ * be used for mounting.  The filesystem places a pointer to the root to be
+ * used for mounting in @fc->root.
+ */
+int vfs_get_tree(struct fs_context *fc)
+{
+	struct super_block *sb;
+	int ret;
+
+	if (fc->root)
+		return -EBUSY;
+
+	if (fc->ops->validate) {
+		ret = fc->ops->validate(fc);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = security_fs_context_validate(fc);
+	if (ret < 0)
+		return ret;
+
+	/* The filesystem may transfer preallocated resources from the
+	 * configuration context to the superblock, thereby rendering the
+	 * config unusable for another attempt at creation if this one fails.
+	 */
+	if (fc->degraded)
+		return -EBUSY;
+
+	/* Get the mountable root in fc->root, with a ref on the root and a ref
+	 * on the superblock.
+	 */
+	ret = fc->ops->get_tree(fc);
+	if (ret < 0)
+		return ret;
+
+	if (!fc->root) {
+		pr_err("Filesystem %s get_tree() didn't set fc->root\n",
+		       fc->fs_type->name);
+		/* We don't know what the locking state of the superblock is -
+		 * if there is a superblock.
+		 */
+		BUG();
+	}
+
+	sb = fc->root->d_sb;
+	WARN_ON(!sb->s_bdi);
+
+	ret = security_sb_get_tree(fc);
+	if (ret < 0)
+		goto err_sb;
+
+	ret = -ENOMEM;
+	if (fc->subtype && !sb->s_subtype) {
+		sb->s_subtype = kstrdup(fc->subtype, GFP_KERNEL);
+		if (!sb->s_subtype)
+			goto err_sb;
+	}
+
+	sb->s_flags |= SB_BORN;
+
+	/* Filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE
+	 * but s_maxbytes was an unsigned long long for many releases.  Throw
+	 * this warning for a little while to try and catch filesystems that
+	 * violate this rule.
+	 */
+	WARN(sb->s_maxbytes < 0,
+	     "%s set sb->s_maxbytes to negative value (%lld)\n",
+	     fc->fs_type->name, sb->s_maxbytes);
+
+	up_write(&sb->s_umount);
+	return 0;
+
+err_sb:
+	dput(fc->root);
+	fc->root = NULL;
+	deactivate_locked_super(sb);
+	return ret;
+}
+EXPORT_SYMBOL(vfs_get_tree);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9703931cf095..c1f1428f6c67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -60,6 +60,7 @@ struct workqueue_struct;
 struct iov_iter;
 struct fscrypt_info;
 struct fscrypt_operations;
+struct fs_context;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -718,6 +719,11 @@ static inline void inode_unlock(struct inode *inode)
 	up_write(&inode->i_rwsem);
 }
 
+static inline int inode_lock_killable(struct inode *inode)
+{
+	return down_write_killable(&inode->i_rwsem);
+}
+
 static inline void inode_lock_shared(struct inode *inode)
 {
 	down_read(&inode->i_rwsem);
@@ -1828,6 +1834,7 @@ struct super_operations {
 	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
 	int (*remount_fs) (struct super_block *, int *, char *, size_t);
+	int (*reconfigure) (struct super_block *, struct fs_context *);
 	void (*umount_begin) (struct super_block *);
 
 	int (*show_options)(struct seq_file *, struct dentry *);
@@ -2074,8 +2081,10 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
+	unsigned short fs_context_size;	/* Size of superblock config context to allocate */
 	struct dentry *(*mount) (struct file_system_type *, int,
 				 const char *, void *, size_t);
+	int (*init_fs_context)(struct fs_context *, struct super_block *);
 	void (*kill_sb) (struct super_block *);
 	struct module *owner;
 	struct file_system_type * next;
@@ -2132,6 +2141,9 @@ void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+struct super_block *sget_fc(struct fs_context *fc,
+			    int (*test)(struct super_block *, struct fs_context *),
+			    int (*set)(struct super_block *, struct fs_context *));
 struct super_block *sget_userns(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
@@ -2174,8 +2186,8 @@ mount_pseudo(struct file_system_type *fs_type, char *name,
 
 extern int register_filesystem(struct file_system_type *);
 extern int unregister_filesystem(struct file_system_type *);
+extern struct vfsmount *kern_mount(struct file_system_type *);
 extern struct vfsmount *kern_mount_data(struct file_system_type *, void *, size_t);
-#define kern_mount(type) kern_mount_data(type, NULL, 0)
 extern void kern_unmount(struct vfsmount *mnt);
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index 732a11898242..1914eef0a88f 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -25,6 +25,7 @@ struct pid_namespace;
 struct super_block;
 struct user_namespace;
 struct vfsmount;
+struct path;
 
 enum fs_context_purpose {
 	FS_CONTEXT_FOR_USER_MOUNT,	/* New superblock for user-specified mount */
@@ -68,9 +69,37 @@ struct fs_context_operations {
 	int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
 	int (*parse_source)(struct fs_context *fc);
 	int (*parse_option)(struct fs_context *fc, char *opt, size_t len);
-	int (*parse_monolithic)(struct fs_context *fc, void *data);
+	int (*parse_monolithic)(struct fs_context *fc, void *data, size_t data_size);
 	int (*validate)(struct fs_context *fc);
 	int (*get_tree)(struct fs_context *fc);
 };
 
+/*
+ * fs_context manipulation functions.
+ */
+extern struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
+					     struct super_block *src_sb,
+					     unsigned int ms_flags,
+					     enum fs_context_purpose purpose);
+extern struct fs_context *vfs_sb_reconfig(struct path *path, unsigned int ms_flags);
+extern struct fs_context *vfs_dup_fs_context(struct fs_context *src);
+extern int vfs_set_fs_source(struct fs_context *fc, const char *source, size_t slen);
+extern int vfs_parse_fs_option(struct fs_context *fc, char *data, size_t opt);
+extern int generic_parse_monolithic(struct fs_context *fc, void *data, size_t data_size);
+extern int vfs_get_tree(struct fs_context *fc);
+extern void put_fs_context(struct fs_context *fc);
+
+/*
+ * sget() wrapper to be called from the ->get_tree() op.
+ */
+enum vfs_get_super_keying {
+	vfs_get_single_super,	/* Only one such superblock may exist */
+	vfs_get_keyed_super,	/* Superblocks with different s_fs_info keys may exist */
+	vfs_get_independent_super, /* Multiple independent superblocks may exist */
+};
+extern int vfs_get_super(struct fs_context *fc,
+			 enum vfs_get_super_keying keying,
+			 int (*fill_super)(struct super_block *sb,
+					   struct fs_context *fc));
+
 #endif /* _LINUX_FS_CONTEXT_H */
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 8a1031a511c9..5f7e994614b1 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -21,6 +21,7 @@ struct super_block;
 struct vfsmount;
 struct dentry;
 struct mnt_namespace;
+struct fs_context;
 
 #define MNT_NOSUID	0x01
 #define MNT_NODEV	0x02
@@ -88,6 +89,7 @@ struct path;
 extern struct vfsmount *clone_private_mount(const struct path *path);
 
 struct file_system_type;
+extern struct vfsmount *vfs_create_mount(struct fs_context *fc);
 extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 				      int flags, const char *name,
 				      void *data, size_t data_size);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 10/24] VFS: Remove unused code after filesystem context changes [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (8 preceding siblings ...)
  2018-04-19 13:32 ` [PATCH 09/24] VFS: Implement a filesystem superblock creation/configuration context " David Howells
@ 2018-04-19 13:32 ` David Howells
  2018-04-19 13:32 ` [PATCH 11/24] procfs: Move proc_fill_super() to fs/proc/root.c " David Howells
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:32 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Remove code that is now unused after the filesystem context changes.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/internal.h              |    2 --
 fs/super.c                 |   54 --------------------------------------------
 include/linux/lsm_hooks.h  |    3 --
 include/linux/security.h   |    7 ------
 security/security.c        |    5 ----
 security/selinux/hooks.c   |   20 ----------------
 security/smack/smack_lsm.c |   33 ---------------------------
 7 files changed, 124 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 91a990234488..f47ede6ace5a 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -101,8 +101,6 @@ extern struct file *get_empty_filp(void);
 extern int do_remount_sb(struct super_block *, int, void *, size_t, int,
 			 struct fs_context *);
 extern bool trylock_super(struct super_block *sb);
-extern struct dentry *mount_fs(struct file_system_type *,
-			       int, const char *, void *, size_t);
 extern struct super_block *user_get_super(dev_t);
 
 /*
diff --git a/fs/super.c b/fs/super.c
index 5d65a45ca6db..a27487e34ea4 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1441,60 +1441,6 @@ struct dentry *mount_single(struct file_system_type *fs_type,
 }
 EXPORT_SYMBOL(mount_single);
 
-struct dentry *
-mount_fs(struct file_system_type *type, int flags, const char *name,
-	 void *data, size_t data_size)
-{
-	struct dentry *root;
-	struct super_block *sb;
-	char *secdata = NULL;
-	int error = -ENOMEM;
-
-	if (data && !(type->fs_flags & FS_BINARY_MOUNTDATA)) {
-		secdata = alloc_secdata();
-		if (!secdata)
-			goto out;
-
-		error = security_sb_copy_data(data, data_size, secdata);
-		if (error)
-			goto out_free_secdata;
-	}
-
-	root = type->mount(type, flags, name, data, data_size);
-	if (IS_ERR(root)) {
-		error = PTR_ERR(root);
-		goto out_free_secdata;
-	}
-	sb = root->d_sb;
-	BUG_ON(!sb);
-	WARN_ON(!sb->s_bdi);
-	sb->s_flags |= SB_BORN;
-
-	error = security_sb_kern_mount(sb, flags, secdata, data_size);
-	if (error)
-		goto out_sb;
-
-	/*
-	 * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE
-	 * but s_maxbytes was an unsigned long long for many releases. Throw
-	 * this warning for a little while to try and catch filesystems that
-	 * violate this rule.
-	 */
-	WARN((sb->s_maxbytes < 0), "%s set sb->s_maxbytes to "
-		"negative value (%lld)\n", type->name, sb->s_maxbytes);
-
-	up_write(&sb->s_umount);
-	free_secdata(secdata);
-	return root;
-out_sb:
-	dput(root);
-	deactivate_locked_super(sb);
-out_free_secdata:
-	free_secdata(secdata);
-out:
-	return ERR_PTR(error);
-}
-
 /*
  * Setup private BDI for given superblock. It gets automatically cleaned up
  * in generic_shutdown_super().
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d533ca038604..1d6481c4f965 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1511,8 +1511,6 @@ union security_list_options {
 	void (*sb_free_security)(struct super_block *sb);
 	int (*sb_copy_data)(char *orig, size_t orig_size, char *copy);
 	int (*sb_remount)(struct super_block *sb, void *data, size_t data_size);
-	int (*sb_kern_mount)(struct super_block *sb, int flags,
-			     void *data, size_t data_size);
 	int (*sb_show_options)(struct seq_file *m, struct super_block *sb);
 	int (*sb_statfs)(struct dentry *dentry);
 	int (*sb_mount)(const char *dev_name, const struct path *path,
@@ -1858,7 +1856,6 @@ struct security_hook_heads {
 	struct hlist_head sb_free_security;
 	struct hlist_head sb_copy_data;
 	struct hlist_head sb_remount;
-	struct hlist_head sb_kern_mount;
 	struct hlist_head sb_show_options;
 	struct hlist_head sb_statfs;
 	struct hlist_head sb_mount;
diff --git a/include/linux/security.h b/include/linux/security.h
index 22b83dc28bd3..5b9d9ffa6abd 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -245,7 +245,6 @@ int security_sb_alloc(struct super_block *sb);
 void security_sb_free(struct super_block *sb);
 int security_sb_copy_data(char *orig, size_t orig_size, char *copy);
 int security_sb_remount(struct super_block *sb, void *data, size_t data_size);
-int security_sb_kern_mount(struct super_block *sb, int flags, void *data, size_t data_size);
 int security_sb_show_options(struct seq_file *m, struct super_block *sb);
 int security_sb_statfs(struct dentry *dentry);
 int security_sb_mount(const char *dev_name, const struct path *path,
@@ -601,12 +600,6 @@ static inline int security_sb_remount(struct super_block *sb, void *data, size_t
 	return 0;
 }
 
-static inline int security_sb_kern_mount(struct super_block *sb, int flags,
-					 void *data, size_t data_size)
-{
-	return 0;
-}
-
 static inline int security_sb_show_options(struct seq_file *m,
 					   struct super_block *sb)
 {
diff --git a/security/security.c b/security/security.c
index 35abdc964724..9693a175587d 100644
--- a/security/security.c
+++ b/security/security.c
@@ -420,11 +420,6 @@ int security_sb_remount(struct super_block *sb, void *data, size_t data_size)
 	return call_int_hook(sb_remount, 0, sb, data, data_size);
 }
 
-int security_sb_kern_mount(struct super_block *sb, int flags, void *data, size_t data_size)
-{
-	return call_int_hook(sb_kern_mount, 0, sb, flags, data, data_size);
-}
-
 int security_sb_show_options(struct seq_file *m, struct super_block *sb)
 {
 	return call_int_hook(sb_show_options, 0, m, sb);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 46d31858eaba..cb76a2428926 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2909,25 +2909,6 @@ static int selinux_sb_remount(struct super_block *sb, void *data, size_t data_si
 	goto out_free_opts;
 }
 
-static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data, size_t data_size)
-{
-	const struct cred *cred = current_cred();
-	struct common_audit_data ad;
-	int rc;
-
-	rc = superblock_doinit(sb, data);
-	if (rc)
-		return rc;
-
-	/* Allow all mounts performed by the kernel */
-	if (flags & MS_KERNMOUNT)
-		return 0;
-
-	ad.type = LSM_AUDIT_DATA_DENTRY;
-	ad.u.dentry = sb->s_root;
-	return superblock_has_perm(cred, sb, FILESYSTEM__MOUNT, &ad);
-}
-
 static int selinux_sb_statfs(struct dentry *dentry)
 {
 	const struct cred *cred = current_cred();
@@ -7138,7 +7119,6 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(sb_free_security, selinux_sb_free_security),
 	LSM_HOOK_INIT(sb_copy_data, selinux_sb_copy_data),
 	LSM_HOOK_INIT(sb_remount, selinux_sb_remount),
-	LSM_HOOK_INIT(sb_kern_mount, selinux_sb_kern_mount),
 	LSM_HOOK_INIT(sb_show_options, selinux_sb_show_options),
 	LSM_HOOK_INIT(sb_statfs, selinux_sb_statfs),
 	LSM_HOOK_INIT(sb_mount, selinux_mount),
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 5dc31a940961..5fe9e7948fde 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1150,38 +1150,6 @@ static int smack_set_mnt_opts(struct super_block *sb,
 	return 0;
 }
 
-/**
- * smack_sb_kern_mount - Smack specific mount processing
- * @sb: the file system superblock
- * @flags: the mount flags
- * @data: the smack mount options
- *
- * Returns 0 on success, an error code on failure
- */
-static int smack_sb_kern_mount(struct super_block *sb, int flags,
-			       void *data, size_t data_size)
-{
-	int rc = 0;
-	char *options = data;
-	struct security_mnt_opts opts;
-
-	security_init_mnt_opts(&opts);
-
-	if (!options)
-		goto out;
-
-	rc = smack_parse_opts_str(options, &opts);
-	if (rc)
-		goto out_err;
-
-out:
-	rc = smack_set_mnt_opts(sb, &opts, 0, NULL);
-
-out_err:
-	security_free_mnt_opts(&opts);
-	return rc;
-}
-
 /**
  * smack_sb_statfs - Smack check on statfs
  * @dentry: identifies the file system in question
@@ -4942,7 +4910,6 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(sb_alloc_security, smack_sb_alloc_security),
 	LSM_HOOK_INIT(sb_free_security, smack_sb_free_security),
 	LSM_HOOK_INIT(sb_copy_data, smack_sb_copy_data),
-	LSM_HOOK_INIT(sb_kern_mount, smack_sb_kern_mount),
 	LSM_HOOK_INIT(sb_statfs, smack_sb_statfs),
 	LSM_HOOK_INIT(sb_set_mnt_opts, smack_set_mnt_opts),
 	LSM_HOOK_INIT(sb_parse_opts_str, smack_parse_opts_str),

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 11/24] procfs: Move proc_fill_super() to fs/proc/root.c [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (9 preceding siblings ...)
  2018-04-19 13:32 ` [PATCH 10/24] VFS: Remove unused code after filesystem context changes " David Howells
@ 2018-04-19 13:32 ` David Howells
  2018-04-19 13:32 ` [PATCH 12/24] proc: Add fs_context support to procfs " David Howells
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:32 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Move proc_fill_super() to fs/proc/root.c as that's where the other
superblock stuff is.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/proc/inode.c    |   49 +------------------------------------------------
 fs/proc/internal.h |    4 +---
 fs/proc/root.c     |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 49 insertions(+), 52 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index df65431c00be..0b13cf6eb6d7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -24,7 +24,6 @@
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/mount.h>
-#include <linux/magic.h>
 
 #include <linux/uaccess.h>
 
@@ -123,7 +122,7 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 	return 0;
 }
 
-static const struct super_operations proc_sops = {
+const struct super_operations proc_sops = {
 	.alloc_inode	= proc_alloc_inode,
 	.destroy_inode	= proc_destroy_inode,
 	.drop_inode	= generic_delete_inode,
@@ -489,49 +488,3 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de)
 	       pde_put(de);
 	return inode;
 }
-
-int proc_fill_super(struct super_block *s, void *data, size_t data_size,
-		    int silent)
-{
-	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
-	struct inode *root_inode;
-	int ret;
-
-	if (!proc_parse_options(data, ns))
-		return -EINVAL;
-
-	/* User space would break if executables or devices appear on proc */
-	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
-	s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
-	s->s_blocksize = 1024;
-	s->s_blocksize_bits = 10;
-	s->s_magic = PROC_SUPER_MAGIC;
-	s->s_op = &proc_sops;
-	s->s_time_gran = 1;
-
-	/*
-	 * procfs isn't actually a stacking filesystem; however, there is
-	 * too much magic going on inside it to permit stacking things on
-	 * top of it
-	 */
-	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
-	
-	pde_get(&proc_root);
-	root_inode = proc_get_inode(s, &proc_root);
-	if (!root_inode) {
-		pr_err("proc_fill_super: get root inode failed\n");
-		return -ENOMEM;
-	}
-
-	s->s_root = d_make_root(root_inode);
-	if (!s->s_root) {
-		pr_err("proc_fill_super: allocate dentry failed\n");
-		return -ENOMEM;
-	}
-
-	ret = proc_setup_self(s);
-	if (ret) {
-		return ret;
-	}
-	return proc_setup_thread_self(s);
-}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3362732fffa3..3182e1b636d3 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -189,13 +189,12 @@ struct pde_opener {
 	struct completion *c;
 } __randomize_layout;
 extern const struct inode_operations proc_link_inode_operations;
-
 extern const struct inode_operations proc_pid_link_inode_operations;
+extern const struct super_operations proc_sops;
 
 void proc_init_kmemcache(void);
 void set_proc_pid_nlink(void);
 extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
-extern int proc_fill_super(struct super_block *, void *, size_t, int);
 extern void proc_entry_rundown(struct proc_dir_entry *);
 
 /*
@@ -253,7 +252,6 @@ static inline void proc_tty_init(void) {}
  * root.c
  */
 extern struct proc_dir_entry proc_root;
-extern int proc_parse_options(char *options, struct pid_namespace *pid);
 
 extern void proc_self_init(void);
 extern int proc_remount(struct super_block *, int *, char *, size_t);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 99ce06c4e1a2..2fbc177f37a8 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -23,6 +23,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/parser.h>
 #include <linux/cred.h>
+#include <linux/magic.h>
 
 #include "internal.h"
 
@@ -36,7 +37,7 @@ static const match_table_t tokens = {
 	{Opt_err, NULL},
 };
 
-int proc_parse_options(char *options, struct pid_namespace *pid)
+static int proc_parse_options(char *options, struct pid_namespace *pid)
 {
 	char *p;
 	substring_t args[MAX_OPT_ARGS];
@@ -78,6 +79,51 @@ int proc_parse_options(char *options, struct pid_namespace *pid)
 	return 1;
 }
 
+static int proc_fill_super(struct super_block *s, void *data, size_t data_size, int silent)
+{
+	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
+	struct inode *root_inode;
+	int ret;
+
+	if (!proc_parse_options(data, ns))
+		return -EINVAL;
+
+	/* User space would break if executables or devices appear on proc */
+	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
+	s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
+	s->s_blocksize = 1024;
+	s->s_blocksize_bits = 10;
+	s->s_magic = PROC_SUPER_MAGIC;
+	s->s_op = &proc_sops;
+	s->s_time_gran = 1;
+
+	/*
+	 * procfs isn't actually a stacking filesystem; however, there is
+	 * too much magic going on inside it to permit stacking things on
+	 * top of it
+	 */
+	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
+	
+	pde_get(&proc_root);
+	root_inode = proc_get_inode(s, &proc_root);
+	if (!root_inode) {
+		pr_err("proc_fill_super: get root inode failed\n");
+		return -ENOMEM;
+	}
+
+	s->s_root = d_make_root(root_inode);
+	if (!s->s_root) {
+		pr_err("proc_fill_super: allocate dentry failed\n");
+		return -ENOMEM;
+	}
+
+	ret = proc_setup_self(s);
+	if (ret) {
+		return ret;
+	}
+	return proc_setup_thread_self(s);
+}
+
 int proc_remount(struct super_block *sb, int *flags,
 		 char *data, size_t data_size)
 {

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 12/24] proc: Add fs_context support to procfs [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (10 preceding siblings ...)
  2018-04-19 13:32 ` [PATCH 11/24] procfs: Move proc_fill_super() to fs/proc/root.c " David Howells
@ 2018-04-19 13:32 ` David Howells
  2018-06-19  3:34   ` [12/24] " Andrei Vagin
  2018-04-19 13:32 ` [PATCH 13/24] ipc: Convert mqueue fs to fs_context " David Howells
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: David Howells @ 2018-04-19 13:32 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Add fs_context support to procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/proc/inode.c    |    2 -
 fs/proc/internal.h |    2 -
 fs/proc/root.c     |  169 ++++++++++++++++++++++++++++++++++------------------
 3 files changed, 113 insertions(+), 60 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 0b13cf6eb6d7..7aa86dd65ba8 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -128,7 +128,7 @@ const struct super_operations proc_sops = {
 	.drop_inode	= generic_delete_inode,
 	.evict_inode	= proc_evict_inode,
 	.statfs		= simple_statfs,
-	.remount_fs	= proc_remount,
+	.reconfigure	= proc_reconfigure,
 	.show_options	= proc_show_options,
 };
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3182e1b636d3..a5ab9504768a 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -254,7 +254,7 @@ static inline void proc_tty_init(void) {}
 extern struct proc_dir_entry proc_root;
 
 extern void proc_self_init(void);
-extern int proc_remount(struct super_block *, int *, char *, size_t);
+extern int proc_reconfigure(struct super_block *, struct fs_context *);
 
 /*
  * task_[no]mmu.c
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 2fbc177f37a8..e6bd31fbc714 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -19,14 +19,24 @@
 #include <linux/module.h>
 #include <linux/bitops.h>
 #include <linux/user_namespace.h>
+#include <linux/fs_context.h>
 #include <linux/mount.h>
 #include <linux/pid_namespace.h>
 #include <linux/parser.h>
 #include <linux/cred.h>
 #include <linux/magic.h>
+#include <linux/slab.h>
 
 #include "internal.h"
 
+struct proc_fs_context {
+	struct fs_context	fc;
+	struct pid_namespace	*pid_ns;
+	unsigned long		mask;
+	int			hidepid;
+	int			gid;
+};
+
 enum {
 	Opt_gid, Opt_hidepid, Opt_err,
 };
@@ -37,56 +47,60 @@ static const match_table_t tokens = {
 	{Opt_err, NULL},
 };
 
-static int proc_parse_options(char *options, struct pid_namespace *pid)
+static int proc_parse_option(struct fs_context *fc, char *opt, size_t len)
 {
-	char *p;
+	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
 	substring_t args[MAX_OPT_ARGS];
-	int option;
-
-	if (!options)
-		return 1;
-
-	while ((p = strsep(&options, ",")) != NULL) {
-		int token;
-		if (!*p)
-			continue;
-
-		args[0].to = args[0].from = NULL;
-		token = match_token(p, tokens, args);
-		switch (token) {
-		case Opt_gid:
-			if (match_int(&args[0], &option))
-				return 0;
-			pid->pid_gid = make_kgid(current_user_ns(), option);
-			break;
-		case Opt_hidepid:
-			if (match_int(&args[0], &option))
-				return 0;
-			if (option < HIDEPID_OFF ||
-			    option > HIDEPID_INVISIBLE) {
-				pr_err("proc: hidepid value must be between 0 and 2.\n");
-				return 0;
-			}
-			pid->hide_pid = option;
-			break;
-		default:
-			pr_err("proc: unrecognized mount option \"%s\" "
-			       "or missing value\n", p);
-			return 0;
+	int token;
+	
+	args[0].to = args[0].from = NULL;
+	token = match_token(opt, tokens, args);
+	switch (token) {
+	case Opt_gid:
+		if (match_int(&args[0], &ctx->gid))
+			return -EINVAL;
+		break;
+
+	case Opt_hidepid:
+		if (match_int(&args[0], &ctx->hidepid))
+			return -EINVAL;
+		if (ctx->hidepid < HIDEPID_OFF ||
+		    ctx->hidepid > HIDEPID_INVISIBLE) {
+			pr_err("proc: hidepid value must be between 0 and 2.\n");
+			return -EINVAL;
 		}
+		break;
+
+	default:
+		pr_err("proc: unrecognized mount option \"%s\" or missing value\n",
+		       opt);
+		return -EINVAL;
 	}
 
-	return 1;
+	ctx->mask |= 1 << token;
+	return 0;
+}
+
+static void proc_set_options(struct super_block *s,
+			     struct fs_context *fc,
+			     struct pid_namespace *pid_ns,
+			     struct user_namespace *user_ns)
+{
+	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
+
+	if (ctx->mask & (1 << Opt_gid))
+		pid_ns->pid_gid = make_kgid(user_ns, ctx->gid);
+	if (ctx->mask & (1 << Opt_hidepid))
+		pid_ns->hide_pid = ctx->hidepid;
 }
 
-static int proc_fill_super(struct super_block *s, void *data, size_t data_size, int silent)
+static int proc_fill_super(struct super_block *s, struct fs_context *fc)
 {
-	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
+	struct pid_namespace *pid_ns = get_pid_ns(s->s_fs_info);
 	struct inode *root_inode;
 	int ret;
 
-	if (!proc_parse_options(data, ns))
-		return -EINVAL;
+	proc_set_options(s, fc, pid_ns, current_user_ns());
 
 	/* User space would break if executables or devices appear on proc */
 	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
@@ -103,7 +117,7 @@ static int proc_fill_super(struct super_block *s, void *data, size_t data_size,
 	 * top of it
 	 */
 	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
-	
+
 	pde_get(&proc_root);
 	root_inode = proc_get_inode(s, &proc_root);
 	if (!root_inode) {
@@ -124,30 +138,46 @@ static int proc_fill_super(struct super_block *s, void *data, size_t data_size,
 	return proc_setup_thread_self(s);
 }
 
-int proc_remount(struct super_block *sb, int *flags,
-		 char *data, size_t data_size)
+int proc_reconfigure(struct super_block *sb, struct fs_context *fc)
 {
 	struct pid_namespace *pid = sb->s_fs_info;
 
 	sync_filesystem(sb);
-	return !proc_parse_options(data, pid);
+
+	if (fc)
+		proc_set_options(sb, fc, pid, current_user_ns());
+	return 0;
 }
 
-static struct dentry *proc_mount(struct file_system_type *fs_type,
-				 int flags, const char *dev_name,
-				 void *data, size_t data_size)
+static int proc_get_tree(struct fs_context *fc)
 {
-	struct pid_namespace *ns;
+	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
 
-	if (flags & SB_KERNMOUNT) {
-		ns = data;
-		data = NULL;
-	} else {
-		ns = task_active_pid_ns(current);
-	}
+	ctx->fc.s_fs_info = ctx->pid_ns;
+	return vfs_get_super(fc, vfs_get_keyed_super, proc_fill_super);
+}
 
-	return mount_ns(fs_type, flags, data, data_size, ns, ns->user_ns,
-			proc_fill_super);
+static void proc_fs_context_free(struct fs_context *fc)
+{
+	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
+
+	if (ctx->pid_ns)
+		put_pid_ns(ctx->pid_ns);
+}
+
+static const struct fs_context_operations proc_fs_context_ops = {
+	.free		= proc_fs_context_free,
+	.parse_option	= proc_parse_option,
+	.get_tree	= proc_get_tree,
+};
+
+static int proc_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
+
+	ctx->pid_ns = get_pid_ns(task_active_pid_ns(current));
+	ctx->fc.ops = &proc_fs_context_ops;
+	return 0;
 }
 
 static void proc_kill_sb(struct super_block *sb)
@@ -165,7 +195,8 @@ static void proc_kill_sb(struct super_block *sb)
 
 static struct file_system_type proc_fs_type = {
 	.name		= "proc",
-	.mount		= proc_mount,
+	.fs_context_size	= sizeof(struct proc_fs_context),
+	.init_fs_context	= proc_init_fs_context,
 	.kill_sb	= proc_kill_sb,
 	.fs_flags	= FS_USERNS_MOUNT,
 };
@@ -205,7 +236,7 @@ static struct dentry *proc_root_lookup(struct inode * dir, struct dentry * dentr
 {
 	if (!proc_pid_lookup(dir, dentry, flags))
 		return NULL;
-	
+
 	return proc_lookup(dir, dentry, flags);
 }
 
@@ -259,9 +290,31 @@ struct proc_dir_entry proc_root = {
 
 int pid_ns_prepare_proc(struct pid_namespace *ns)
 {
+	struct proc_fs_context *ctx;
+	struct fs_context *fc;
 	struct vfsmount *mnt;
+	int ret;
+
+	fc = vfs_new_fs_context(&proc_fs_type, NULL, 0,
+				FS_CONTEXT_FOR_KERNEL_MOUNT);
+	if (IS_ERR(fc))
+		return PTR_ERR(fc);
+
+	ctx = container_of(fc, struct proc_fs_context, fc);
+	if (ctx->pid_ns != ns) {
+		put_pid_ns(ctx->pid_ns);
+		get_pid_ns(ns);
+		ctx->pid_ns = ns;
+	}
+
+	ret = vfs_get_tree(fc);
+	if (ret < 0) {
+		put_fs_context(fc);
+		return ret;
+	}
 
-	mnt = kern_mount_data(&proc_fs_type, ns, 0);
+	mnt = vfs_create_mount(fc);
+	put_fs_context(fc);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 13/24] ipc: Convert mqueue fs to fs_context [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (11 preceding siblings ...)
  2018-04-19 13:32 ` [PATCH 12/24] proc: Add fs_context support to procfs " David Howells
@ 2018-04-19 13:32 ` David Howells
  2018-04-19 13:32 ` [PATCH 14/24] cpuset: Use " David Howells
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:32 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Convert the mqueue filesystem to use the filesystem context stuff.

Notes:

 (1) The relevant ipc namespace is selected in when the context is
     initialised (and it defaults to the current task's ipc namespace).
     The caller can override this before calling vfs_get_tree().

 (2) Rather than simply calling kern_mount_data(), mq_init_ns() and
     mq_internal_mount() create a context, adjust it and then do the rest
     of the mount procedure.

 (3) The lazy mqueue mounting on creation of a new namespace is retained
     from a previous patch, but the avoidance of sget() if no superblock
     yet exists is reverted and the superblock is again keyed on the
     namespace pointer.

     Yes, there was a performance gain in not searching the superblock
     hash, but it's only paid once per ipc namespace - and only if someone
     uses mqueue within that namespace, so I'm not sure it's worth it,
     especially as calling sget() allows avoidance of recursion.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 ipc/mqueue.c |  116 +++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 94 insertions(+), 22 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 910c3c7532e6..2f2e7d73b13d 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -18,6 +18,7 @@
 #include <linux/pagemap.h>
 #include <linux/file.h>
 #include <linux/mount.h>
+#include <linux/fs_context.h>
 #include <linux/namei.h>
 #include <linux/sysctl.h>
 #include <linux/poll.h>
@@ -42,6 +43,11 @@
 #include <net/sock.h>
 #include "util.h"
 
+struct mqueue_fs_context {
+	struct fs_context	fc;
+	struct ipc_namespace	*ipc_ns;
+};
+
 #define MQUEUE_MAGIC	0x19800202
 #define DIRENT_SIZE	20
 #define FILENT_SIZE	80
@@ -87,9 +93,11 @@ struct mqueue_inode_info {
 	unsigned long qsize; /* size of queue in memory (sum of all msgs) */
 };
 
+static struct file_system_type mqueue_fs_type;
 static const struct inode_operations mqueue_dir_inode_operations;
 static const struct file_operations mqueue_file_operations;
 static const struct super_operations mqueue_super_ops;
+static const struct fs_context_operations mqueue_fs_context_ops;
 static void remove_notification(struct mqueue_inode_info *info);
 
 static struct kmem_cache *mqueue_inode_cachep;
@@ -322,7 +330,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 	return ERR_PTR(ret);
 }
 
-static int mqueue_fill_super(struct super_block *sb, void *data, size_t data_size, int silent)
+static int mqueue_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	struct inode *inode;
 	struct ipc_namespace *ns = sb->s_fs_info;
@@ -343,19 +351,77 @@ static int mqueue_fill_super(struct super_block *sb, void *data, size_t data_siz
 	return 0;
 }
 
-static struct dentry *mqueue_mount(struct file_system_type *fs_type,
-			 int flags, const char *dev_name,
-			 void *data, size_t data_size)
+static int mqueue_get_tree(struct fs_context *fc)
 {
-	struct ipc_namespace *ns;
-	if (flags & SB_KERNMOUNT) {
-		ns = data;
-		data = NULL;
-	} else {
-		ns = current->nsproxy->ipc_ns;
+	struct mqueue_fs_context *ctx = container_of(fc, struct mqueue_fs_context, fc);
+
+	/* As a shortcut, if the namespace already has a superblock created,
+	 * use the root from that directly rather than invoking sget() again.
+	 */
+	spin_lock(&mq_lock);
+	if (ctx->ipc_ns->mq_mnt) {
+		fc->root = dget(ctx->ipc_ns->mq_mnt->mnt_sb->s_root);
+		atomic_inc(&fc->root->d_sb->s_active);
+	}
+	spin_unlock(&mq_lock);
+	if (fc->root) {
+		down_write(&fc->root->d_sb->s_umount);
+		return 0;
 	}
-	return mount_ns(fs_type, flags, data, data_size, ns, ns->user_ns,
-			mqueue_fill_super);
+
+	ctx->fc.s_fs_info = ctx->ipc_ns;
+	return vfs_get_super(fc, vfs_get_keyed_super, mqueue_fill_super);
+}
+
+static void mqueue_fs_context_free(struct fs_context *fc)
+{
+	struct mqueue_fs_context *ctx = container_of(fc, struct mqueue_fs_context, fc);
+
+	if (ctx->ipc_ns)
+		put_ipc_ns(ctx->ipc_ns);
+}
+
+static int mqueue_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	struct mqueue_fs_context *ctx = container_of(fc, struct mqueue_fs_context, fc);
+
+	ctx->ipc_ns = get_ipc_ns(current->nsproxy->ipc_ns);
+	ctx->fc.ops = &mqueue_fs_context_ops;
+	return 0;
+}
+
+static struct vfsmount *mq_create_mount(struct ipc_namespace *ns)
+{
+	struct mqueue_fs_context *ctx;
+	struct fs_context *fc;
+	struct vfsmount *mnt;
+	int ret;
+
+	fc = vfs_new_fs_context(&mqueue_fs_type, NULL, 0,
+				FS_CONTEXT_FOR_KERNEL_MOUNT);
+	if (IS_ERR(fc))
+		return ERR_CAST(fc);
+
+	ctx = container_of(fc, struct mqueue_fs_context, fc);
+	put_ipc_ns(ctx->ipc_ns);
+	ctx->ipc_ns = get_ipc_ns(ns);
+
+	ret = vfs_get_tree(fc);
+	if (ret < 0)
+		goto err_fc;
+
+	mnt = vfs_create_mount(fc);
+	if (IS_ERR(mnt)) {
+		ret = PTR_ERR(mnt);
+		goto err_fc;
+	}
+
+	put_fs_context(fc);
+	return mnt;
+
+err_fc:
+	put_fs_context(fc);
+	return ERR_PTR(ret);
 }
 
 static void init_once(void *foo)
@@ -1521,15 +1587,23 @@ static const struct super_operations mqueue_super_ops = {
 	.statfs = simple_statfs,
 };
 
+static const struct fs_context_operations mqueue_fs_context_ops = {
+	.free		= mqueue_fs_context_free,
+	.get_tree	= mqueue_get_tree,
+};
+
 static struct file_system_type mqueue_fs_type = {
-	.name = "mqueue",
-	.mount = mqueue_mount,
-	.kill_sb = kill_litter_super,
-	.fs_flags = FS_USERNS_MOUNT,
+	.name			= "mqueue",
+	.fs_context_size	= sizeof(struct mqueue_fs_context),
+	.init_fs_context	= mqueue_init_fs_context,
+	.kill_sb		= kill_litter_super,
+	.fs_flags		= FS_USERNS_MOUNT,
 };
 
 int mq_init_ns(struct ipc_namespace *ns)
 {
+	struct vfsmount *m;
+
 	ns->mq_queues_count  = 0;
 	ns->mq_queues_max    = DFLT_QUEUESMAX;
 	ns->mq_msg_max       = DFLT_MSGMAX;
@@ -1537,12 +1611,10 @@ int mq_init_ns(struct ipc_namespace *ns)
 	ns->mq_msg_default   = DFLT_MSG;
 	ns->mq_msgsize_default  = DFLT_MSGSIZE;
 
-	ns->mq_mnt = kern_mount_data(&mqueue_fs_type, ns, 0);
-	if (IS_ERR(ns->mq_mnt)) {
-		int err = PTR_ERR(ns->mq_mnt);
-		ns->mq_mnt = NULL;
-		return err;
-	}
+	m = mq_create_mount(&init_ipc_ns);
+	if (IS_ERR(m))
+		return PTR_ERR(ns->mq_mnt);
+	ns->mq_mnt = m;
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 14/24] cpuset: Use fs_context [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (12 preceding siblings ...)
  2018-04-19 13:32 ` [PATCH 13/24] ipc: Convert mqueue fs to fs_context " David Howells
@ 2018-04-19 13:32 ` David Howells
  2018-04-19 13:32 ` [PATCH 15/24] kernfs, sysfs, cgroup, intel_rdt: Support " David Howells
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:32 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-fsdevel,
	linux-security-module, Tejun Heo, linux-afs

Make the cpuset filesystem use the filesystem context.  This is potentially
tricky as the cpuset fs is almost an alias for the cgroup filesystem, but
with some special parameters.

This can, however, be handled by setting up an appropriate cgroup
filesystem and returning the root directory of that as the root dir of this
one.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tejun Heo <tj@kernel.org>
---

 kernel/cgroup/cpuset.c |   66 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 3c8ef37879f0..fce0584f161e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -38,7 +38,7 @@
 #include <linux/mm.h>
 #include <linux/memory.h>
 #include <linux/export.h>
-#include <linux/mount.h>
+#include <linux/fs_context.h>
 #include <linux/namei.h>
 #include <linux/pagemap.h>
 #include <linux/proc_fs.h>
@@ -315,26 +315,64 @@ static inline bool is_in_v2_mode(void)
  * users. If someone tries to mount the "cpuset" filesystem, we
  * silently switch it to mount "cgroup" instead
  */
-static struct dentry *cpuset_mount(struct file_system_type *fs_type,
-				   int flags, const char *unused_dev_name,
-				   void *data, size_t data_size)
+static int cpuset_get_tree(struct fs_context *fc)
 {
-	struct file_system_type *cgroup_fs = get_fs_type("cgroup");
-	struct dentry *ret = ERR_PTR(-ENODEV);
+	static const char opts[] = "cpuset,noprefix,release_agent=/sbin/cpuset_release_agent";
+	struct file_system_type *cgroup_fs;
+	struct fs_context *cg_fc;
+	char *p;
+	int ret = -ENODEV;
+
+	cgroup_fs = get_fs_type("cgroup");
 	if (cgroup_fs) {
-		char mountopts[] =
-			"cpuset,noprefix,"
-			"release_agent=/sbin/cpuset_release_agent";
-		ret = cgroup_fs->mount(cgroup_fs, flags, unused_dev_name,
-				       mountopts, data_size);
-		put_filesystem(cgroup_fs);
+		ret = PTR_ERR(cgroup_fs);
+		goto out;
+	}
+
+	cg_fc = vfs_new_fs_context(cgroup_fs, NULL, fc->sb_flags, fc->purpose);
+	put_filesystem(cgroup_fs);
+	if (IS_ERR(cg_fc)) {
+		ret = PTR_ERR(cg_fc);
+		goto out;
 	}
+
+	ret = -ENOMEM;
+	p = kstrdup(opts, GFP_KERNEL);
+	if (!p)
+		goto out_fc;
+
+	ret = generic_parse_monolithic(fc, p, sizeof(opts) - 1);
+	kfree(p);
+	if (ret < 0)
+		goto out_fc;
+
+	ret = vfs_get_tree(cg_fc);
+	if (ret < 0)
+		goto out_fc;
+
+	fc->root = dget(cg_fc->root);
+	ret = 0;
+
+out_fc:
+	put_fs_context(cg_fc);
+out:
 	return ret;
 }
 
+static const struct fs_context_operations cpuset_fs_context_ops = {
+	.get_tree	= cpuset_get_tree,
+};
+
+static int cpuset_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	fc->ops = &cpuset_fs_context_ops;
+	return 0;
+}
+
 static struct file_system_type cpuset_fs_type = {
-	.name = "cpuset",
-	.mount = cpuset_mount,
+	.name			= "cpuset",
+	.fs_context_size	= sizeof(struct fs_context),
+	.init_fs_context	= cpuset_init_fs_context,
 };
 
 /*

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 15/24] kernfs, sysfs, cgroup, intel_rdt: Support fs_context [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (13 preceding siblings ...)
  2018-04-19 13:32 ` [PATCH 14/24] cpuset: Use " David Howells
@ 2018-04-19 13:32 ` David Howells
  2018-04-19 13:33 ` [PATCH 16/24] hugetlbfs: Convert to " David Howells
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:32 UTC (permalink / raw)
  To: viro
  Cc: fenghua.yu, linux-nfs, Greg Kroah-Hartman, linux-kernel,
	dhowells, linux-fsdevel, linux-security-module, Li Zefan,
	Johannes Weiner, Tejun Heo, cgroups, linux-afs

Make kernfs support superblock creation/mount/remount with fs_context.

This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
be made to support fs_context also.

Notes:

 (1) A kernfs_fs_context struct is created to wrap fs_context and the
     kernfs mount parameters are moved in here (or are in fs_context).

 (2) kernfs_mount{,_ns}() are made into kernfs_get_tree().  The extra
     namespace tag parameter is passed in the context if desired

 (3) kernfs_free_fs_context() is provided as a destructor for the
     kernfs_fs_context struct, but for the moment it does nothing except
     get called in the right places.

 (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
     pass, but possibly this should be done anyway in case someone wants to
     add a parameter in future.

 (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
     the cgroup v1 and v2 mount parameters are all moved there.

 (6) cgroup1 parameter parsing error messages are now handled by invalf(),
     which allows userspace to collect them directly.

 (7) cgroup1 parameter cleanup is now done in the context destructor rather
     than in the mount/get_tree and remount functions.

Weirdies:

 (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
     but then uses the resulting pointer after dropping the locks.  I'm
     told this is okay and needs commenting.

 (*) The cgroup refcount web.  This really needs documenting.

 (*) cgroup2 only has one root?

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: cgroups@vger.kernel.org
cc: fenghua.yu@intel.com
---

 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  125 +++++++------
 fs/kernfs/mount.c                        |   91 +++++----
 fs/sysfs/mount.c                         |   59 ++++--
 include/linux/cgroup.h                   |    3 
 include/linux/kernfs.h                   |   37 ++--
 kernel/cgroup/cgroup-internal.h          |   42 +++-
 kernel/cgroup/cgroup-v1.c                |  295 ++++++++++++++----------------
 kernel/cgroup/cgroup.c                   |  219 +++++++++++++---------
 8 files changed, 469 insertions(+), 402 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 3584ef8de1fd..121584d9d544 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -36,6 +36,12 @@
 #include <asm/intel_rdt_sched.h>
 #include "intel_rdt.h"
 
+struct rdt_fs_context {
+	struct kernfs_fs_context kfc;
+	bool	enable_cdpl2;
+	bool	enable_cdpl3;
+};
+
 DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
 DEFINE_STATIC_KEY_FALSE(rdt_mon_enable_key);
 DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
@@ -1104,39 +1110,6 @@ static void cdp_disable_all(void)
 		cdpl2_disable();
 }
 
-static int parse_rdtgroupfs_options(char *data)
-{
-	char *token, *o = data;
-	int ret = 0;
-
-	while ((token = strsep(&o, ",")) != NULL) {
-		if (!*token) {
-			ret = -EINVAL;
-			goto out;
-		}
-
-		if (!strcmp(token, "cdp")) {
-			ret = cdpl3_enable();
-			if (ret)
-				goto out;
-		} else if (!strcmp(token, "cdpl2")) {
-			ret = cdpl2_enable();
-			if (ret)
-				goto out;
-		} else {
-			ret = -EINVAL;
-			goto out;
-		}
-	}
-
-	return 0;
-
-out:
-	pr_err("Invalid mount option \"%s\"\n", token);
-
-	return ret;
-}
-
 /*
  * We don't allow rdtgroup directories to be created anywhere
  * except the root directory. Thus when looking for the rdtgroup
@@ -1205,13 +1178,11 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 			     struct rdtgroup *prgrp,
 			     struct kernfs_node **mon_data_kn);
 
-static struct dentry *rdt_mount(struct file_system_type *fs_type,
-				int flags, const char *unused_dev_name,
-				void *data, size_t data_size)
+static int rdt_get_tree(struct fs_context *fc)
 {
+	struct rdt_fs_context *ctx = container_of(fc, struct rdt_fs_context, kfc.fc);
 	struct rdt_domain *dom;
 	struct rdt_resource *r;
-	struct dentry *dentry;
 	int ret;
 
 	cpus_read_lock();
@@ -1220,47 +1191,46 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 	 * resctrl file system can only be mounted once.
 	 */
 	if (static_branch_unlikely(&rdt_enable_key)) {
-		dentry = ERR_PTR(-EBUSY);
+		ret = -EBUSY;
 		goto out;
 	}
 
-	ret = parse_rdtgroupfs_options(data);
-	if (ret) {
-		dentry = ERR_PTR(ret);
-		goto out_cdp;
+	if (ctx->enable_cdpl2) {
+		ret = cdpl2_enable();
+		if (ret < 0)
+			goto out_cdp;
+	}
+
+	if (ctx->enable_cdpl3) {
+		ret = cdpl3_enable();
+		if (ret < 0)
+			goto out_cdp;
 	}
 
 	closid_init();
 
 	ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
-	if (ret) {
-		dentry = ERR_PTR(ret);
+	if (ret < 0)
 		goto out_cdp;
-	}
 
 	if (rdt_mon_capable) {
 		ret = mongroup_create_dir(rdtgroup_default.kn,
 					  NULL, "mon_groups",
 					  &kn_mongrp);
-		if (ret) {
-			dentry = ERR_PTR(ret);
+		if (ret < 0)
 			goto out_info;
-		}
 		kernfs_get(kn_mongrp);
 
 		ret = mkdir_mondata_all(rdtgroup_default.kn,
 					&rdtgroup_default, &kn_mondata);
-		if (ret) {
-			dentry = ERR_PTR(ret);
+		if (ret < 0)
 			goto out_mongrp;
-		}
 		kernfs_get(kn_mondata);
 		rdtgroup_default.mon.mon_data_kn = kn_mondata;
 	}
 
-	dentry = kernfs_mount(fs_type, flags, rdt_root,
-			      RDTGROUP_SUPER_MAGIC, NULL);
-	if (IS_ERR(dentry))
+	ret = kernfs_get_tree(&ctx->kfc);
+	if (ret < 0)
 		goto out_mondata;
 
 	if (rdt_alloc_capable)
@@ -1293,8 +1263,46 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 	rdt_last_cmd_clear();
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
+	return ret;
+}
+
+static int rdt_parse_option(struct fs_context *fc, char *opt, size_t len)
+{
+	struct rdt_fs_context *ctx = container_of(fc, struct rdt_fs_context, kfc.fc);
+
+	if (strcmp(opt, "cdp") == 0) {
+		ctx->enable_cdpl3 = true;
+		return 0;
+	}
+	if (strcmp(opt, "cdpl2") == 0) {
+		ctx->enable_cdpl2 = true;
+		return 0;
+	}
 
-	return dentry;
+	return -EINVAL;
+}
+
+static void rdt_fs_context_free(struct fs_context *fc)
+{
+	struct rdt_fs_context *ctx = container_of(fc, struct rdt_fs_context, kfc.fc);
+
+	kernfs_free_fs_context(&ctx->kfc);
+}
+
+static const struct fs_context_operations rdt_fs_context_ops = {
+	.free		= rdt_fs_context_free,
+	.parse_option	= rdt_parse_option,
+	.get_tree	= rdt_get_tree,
+};
+
+static int rdt_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	struct rdt_fs_context *ctx = container_of(fc, struct rdt_fs_context, kfc.fc);
+
+	ctx->kfc.root = rdt_root;
+	ctx->kfc.magic = RDTGROUP_SUPER_MAGIC;
+	ctx->kfc.fc.ops = &rdt_fs_context_ops;
+	return 0;
 }
 
 static int reset_all_ctrls(struct rdt_resource *r)
@@ -1459,9 +1467,10 @@ static void rdt_kill_sb(struct super_block *sb)
 }
 
 static struct file_system_type rdt_fs_type = {
-	.name    = "resctrl",
-	.mount   = rdt_mount,
-	.kill_sb = rdt_kill_sb,
+	.name			= "resctrl",
+	.fs_context_size	= sizeof(struct rdt_fs_context),
+	.init_fs_context	= rdt_init_fs_context,
+	.kill_sb		= rdt_kill_sb,
 };
 
 static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index c05efbd1822e..725874dd6c5b 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -22,14 +22,14 @@
 
 struct kmem_cache *kernfs_node_cache;
 
-static int kernfs_sop_remount_fs(struct super_block *sb, int *flags,
-				 char *data, size_t data_size)
+static int kernfs_sop_reconfigure(struct super_block *sb, struct fs_context *fc)
 {
+	struct kernfs_fs_context *kfc = container_of(fc, struct kernfs_fs_context, fc);
 	struct kernfs_root *root = kernfs_info(sb)->root;
 	struct kernfs_syscall_ops *scops = root->syscall_ops;
 
-	if (scops && scops->remount_fs)
-		return scops->remount_fs(root, flags, data);
+	if (scops && scops->reconfigure)
+		return scops->reconfigure(root, kfc);
 	return 0;
 }
 
@@ -61,7 +61,7 @@ const struct super_operations kernfs_sops = {
 	.drop_inode	= generic_delete_inode,
 	.evict_inode	= kernfs_evict_inode,
 
-	.remount_fs	= kernfs_sop_remount_fs,
+	.reconfigure	= kernfs_sop_reconfigure,
 	.show_options	= kernfs_sop_show_options,
 	.show_path	= kernfs_sop_show_path,
 };
@@ -219,7 +219,7 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
 	} while (true);
 }
 
-static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
+static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *kfc)
 {
 	struct kernfs_super_info *info = kernfs_info(sb);
 	struct inode *inode;
@@ -230,7 +230,7 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	sb->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	sb->s_blocksize = PAGE_SIZE;
 	sb->s_blocksize_bits = PAGE_SHIFT;
-	sb->s_magic = magic;
+	sb->s_magic = kfc->magic;
 	sb->s_op = &kernfs_sops;
 	sb->s_xattr = kernfs_xattr_handlers;
 	if (info->root->flags & KERNFS_ROOT_SUPPORT_EXPORTOP)
@@ -257,20 +257,25 @@ static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 	return 0;
 }
 
-static int kernfs_test_super(struct super_block *sb, void *data)
+static int kernfs_test_super(struct super_block *sb, struct fs_context *fc)
 {
+	struct kernfs_fs_context *kfc = container_of(fc, struct kernfs_fs_context, fc);
 	struct kernfs_super_info *sb_info = kernfs_info(sb);
-	struct kernfs_super_info *info = data;
+	struct kernfs_super_info *info = kfc->info;
 
 	return sb_info->root == info->root && sb_info->ns == info->ns;
 }
 
-static int kernfs_set_super(struct super_block *sb, void *data)
+static int kernfs_set_super(struct super_block *sb, struct fs_context *fc)
 {
+	struct kernfs_fs_context *kfc = container_of(fc, struct kernfs_fs_context, fc);
 	int error;
-	error = set_anon_super(sb, data);
-	if (!error)
-		sb->s_fs_info = data;
+
+	error = set_anon_super(sb, kfc->info);
+	if (!error) {
+		sb->s_fs_info = kfc->info;
+		kfc->info = NULL;
+	}
 	return error;
 }
 
@@ -288,24 +293,15 @@ const void *kernfs_super_ns(struct super_block *sb)
 }
 
 /**
- * kernfs_mount_ns - kernfs mount helper
- * @fs_type: file_system_type of the fs being mounted
- * @flags: mount flags specified for the mount
- * @root: kernfs_root of the hierarchy being mounted
- * @magic: file system specific magic number
- * @new_sb_created: tell the caller if we allocated a new superblock
- * @ns: optional namespace tag of the mount
- *
- * This is to be called from each kernfs user's file_system_type->mount()
- * implementation, which should pass through the specified @fs_type and
- * @flags, and specify the hierarchy and namespace tag to mount via @root
- * and @ns, respectively.
+ * kernfs_get_tree - kernfs filesystem access/retrieval helper
+ * @kfc: The filesystem context.
  *
- * The return value can be passed to the vfs layer verbatim.
+ * This is to be called from each kernfs user's fs_context->ops->get_tree()
+ * implementation, which should set the specified ->@fs_type and ->@flags, and
+ * specify the hierarchy and namespace tag to mount via ->@root and ->@ns,
+ * respectively.
  */
-struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
-				struct kernfs_root *root, unsigned long magic,
-				bool *new_sb_created, const void *ns)
+int kernfs_get_tree(struct kernfs_fs_context *kfc)
 {
 	struct super_block *sb;
 	struct kernfs_super_info *info;
@@ -313,37 +309,42 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
 
 	info = kzalloc(sizeof(*info), GFP_KERNEL);
 	if (!info)
-		return ERR_PTR(-ENOMEM);
-
-	info->root = root;
-	info->ns = ns;
+		return -ENOMEM;
 
-	sb = sget_userns(fs_type, kernfs_test_super, kernfs_set_super, flags,
-			 &init_user_ns, info);
-	if (IS_ERR(sb) || sb->s_fs_info != info)
-		kfree(info);
+	info->root = kfc->root;
+	info->ns = kfc->ns_tag;
+
+	kfc->info = info;
+	sb = sget_fc(&kfc->fc, kernfs_test_super, kernfs_set_super);
+	if (kfc->info) {
+		kfree(kfc->info);
+		kfc->info = NULL;
+	} else {
+		kfc->ns_tag = NULL;
+		kfc->fc.degraded = true;
+	}
 	if (IS_ERR(sb))
-		return ERR_CAST(sb);
-
-	if (new_sb_created)
-		*new_sb_created = !sb->s_root;
+		return PTR_ERR(sb);
 
 	if (!sb->s_root) {
 		struct kernfs_super_info *info = kernfs_info(sb);
 
-		error = kernfs_fill_super(sb, magic);
+		kfc->new_sb_created = true;
+
+		error = kernfs_fill_super(sb, kfc);
 		if (error) {
 			deactivate_locked_super(sb);
-			return ERR_PTR(error);
+			return error;
 		}
 		sb->s_flags |= SB_ACTIVE;
 
 		mutex_lock(&kernfs_mutex);
-		list_add(&info->node, &root->supers);
+		list_add(&info->node, &info->root->supers);
 		mutex_unlock(&kernfs_mutex);
 	}
 
-	return dget(sb->s_root);
+	kfc->fc.root = dget(sb->s_root);
+	return 0;
 }
 
 /**
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index b2b7d9ae4aba..d81f117562f5 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -20,27 +20,45 @@
 static struct kernfs_root *sysfs_root;
 struct kernfs_node *sysfs_root_kn;
 
-static struct dentry *sysfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data, size_t data_size)
+static int sysfs_get_tree(struct fs_context *fc)
 {
-	struct dentry *root;
-	void *ns;
-	bool new_sb;
+	struct kernfs_fs_context *kfc = container_of(fc, struct kernfs_fs_context, fc);
+	int ret;
 
-	if (!(flags & SB_KERNMOUNT)) {
+	ret = kernfs_get_tree(kfc);
+	if (kfc->new_sb_created)
+		fc->root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
+	return 0;
+}
+
+static void sysfs_fs_context_free(struct fs_context *fc)
+{
+	struct kernfs_fs_context *kfc = container_of(fc, struct kernfs_fs_context, fc);
+
+	if (kfc->ns_tag)
+		kobj_ns_drop(KOBJ_NS_TYPE_NET, kfc->ns_tag);
+	kernfs_free_fs_context(kfc);
+}
+
+static const struct fs_context_operations sysfs_fs_context_ops = {
+	.free		= sysfs_fs_context_free,
+	.get_tree	= sysfs_get_tree,
+};
+
+static int sysfs_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	struct kernfs_fs_context *kfc = container_of(fc, struct kernfs_fs_context, fc);
+
+	if (!(fc->sb_flags & SB_KERNMOUNT)) {
 		if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
-			return ERR_PTR(-EPERM);
+			return -EPERM;
 	}
 
-	ns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
-	root = kernfs_mount_ns(fs_type, flags, sysfs_root,
-				SYSFS_MAGIC, &new_sb, ns);
-	if (IS_ERR(root) || !new_sb)
-		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
-	else if (new_sb)
-		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
-
-	return root;
+	kfc->ns_tag = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
+	kfc->root = sysfs_root;
+	kfc->magic = SYSFS_MAGIC;
+	kfc->fc.ops = &sysfs_fs_context_ops;
+	return 0;
 }
 
 static void sysfs_kill_sb(struct super_block *sb)
@@ -52,10 +70,11 @@ static void sysfs_kill_sb(struct super_block *sb)
 }
 
 static struct file_system_type sysfs_fs_type = {
-	.name		= "sysfs",
-	.mount		= sysfs_mount,
-	.kill_sb	= sysfs_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.name			= "sysfs",
+	.fs_context_size	= sizeof(struct kernfs_fs_context),
+	.init_fs_context	= sysfs_init_fs_context,
+	.kill_sb		= sysfs_kill_sb,
+	.fs_flags		= FS_USERNS_MOUNT,
 };
 
 int __init sysfs_init(void)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 473e0c0abb86..50771a7d0be9 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -821,10 +821,11 @@ copy_cgroup_ns(unsigned long flags, struct user_namespace *user_ns,
 
 #endif /* !CONFIG_CGROUPS */
 
-static inline void get_cgroup_ns(struct cgroup_namespace *ns)
+static inline struct cgroup_namespace *get_cgroup_ns(struct cgroup_namespace *ns)
 {
 	if (ns)
 		refcount_inc(&ns->count);
+	return ns;
 }
 
 static inline void put_cgroup_ns(struct cgroup_namespace *ns)
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index ab25c8b6d9e3..29f6e15254bc 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -16,6 +16,7 @@
 #include <linux/rbtree.h>
 #include <linux/atomic.h>
 #include <linux/wait.h>
+#include <linux/fs_context.h>
 
 struct file;
 struct dentry;
@@ -25,6 +26,7 @@ struct vm_area_struct;
 struct super_block;
 struct file_system_type;
 
+struct kernfs_fs_context;
 struct kernfs_open_node;
 struct kernfs_iattrs;
 
@@ -166,7 +168,7 @@ struct kernfs_node {
  * kernfs_node parameter.
  */
 struct kernfs_syscall_ops {
-	int (*remount_fs)(struct kernfs_root *root, int *flags, char *data);
+	int (*reconfigure)(struct kernfs_root *root, struct kernfs_fs_context *kfc);
 	int (*show_options)(struct seq_file *sf, struct kernfs_root *root);
 
 	int (*mkdir)(struct kernfs_node *parent, const char *name,
@@ -267,6 +269,20 @@ struct kernfs_ops {
 #endif
 };
 
+/*
+ * The kernfs superblock creation/mount parameter context.
+ */
+struct kernfs_fs_context {
+	struct fs_context	fc;
+	struct kernfs_root	*root;		/* Root of the hierarchy being mounted */
+	void			*ns_tag;	/* Namespace tag of the mount (or NULL) */
+	unsigned long		magic;		/* File system specific magic number */
+
+	/* The following are set/used by kernfs_mount() */
+	struct kernfs_super_info *info;		/* The new superblock info */
+	bool			new_sb_created;	/* Set to T if we allocated a new sb */
+};
+
 #ifdef CONFIG_KERNFS
 
 static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn)
@@ -350,9 +366,7 @@ int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr);
 void kernfs_notify(struct kernfs_node *kn);
 
 const void *kernfs_super_ns(struct super_block *sb);
-struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
-			       struct kernfs_root *root, unsigned long magic,
-			       bool *new_sb_created, const void *ns);
+int kernfs_get_tree(struct kernfs_fs_context *fc);
 void kernfs_kill_sb(struct super_block *sb);
 struct super_block *kernfs_pin_sb(struct kernfs_root *root, const void *ns);
 
@@ -454,11 +468,8 @@ static inline void kernfs_notify(struct kernfs_node *kn) { }
 static inline const void *kernfs_super_ns(struct super_block *sb)
 { return NULL; }
 
-static inline struct dentry *
-kernfs_mount_ns(struct file_system_type *fs_type, int flags,
-		struct kernfs_root *root, unsigned long magic,
-		bool *new_sb_created, const void *ns)
-{ return ERR_PTR(-ENOSYS); }
+static inline int kernfs_get_tree(struct kernfs_fs_context *fc)
+{ return -ENOSYS; }
 
 static inline void kernfs_kill_sb(struct super_block *sb) { }
 
@@ -535,13 +546,9 @@ static inline int kernfs_rename(struct kernfs_node *kn,
 	return kernfs_rename_ns(kn, new_parent, new_name, NULL);
 }
 
-static inline struct dentry *
-kernfs_mount(struct file_system_type *fs_type, int flags,
-		struct kernfs_root *root, unsigned long magic,
-		bool *new_sb_created)
+static inline void kernfs_free_fs_context(struct kernfs_fs_context *kfc)
 {
-	return kernfs_mount_ns(fs_type, flags, root,
-				magic, new_sb_created, NULL);
+	/* Note that we don't deal with kfc->ns_tag here. */
 }
 
 #endif	/* __LINUX_KERNFS_H */
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index b928b27050c6..1aa3176779c9 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -8,6 +8,26 @@
 #include <linux/list.h>
 #include <linux/refcount.h>
 
+/*
+ * The cgroup filesystem superblock creation/mount context.
+ */
+struct cgroup_fs_context {
+	struct kernfs_fs_context kfc;
+	struct cgroup_root	*root;
+	struct cgroup_namespace	*ns;
+	u8		version;		/* cgroups version */
+	unsigned int	flags;			/* CGRP_ROOT_* flags */
+
+	/* cgroup1 bits */
+	bool		cpuset_clone_children;
+	bool		none;			/* User explicitly requested empty subsystem */
+	bool		all_ss;			/* Seen 'all' option */
+	bool		one_ss;			/* Seen 'none' option */
+	u16		subsys_mask;		/* Selected subsystems */
+	char		*name;			/* Hierarchy name */
+	char		*release_agent;		/* Path for release notifications */
+};
+
 /*
  * A cgroup can be associated with multiple css_sets as different tasks may
  * belong to different cgroups on different hierarchies.  In the other
@@ -89,16 +109,6 @@ struct cgroup_mgctx {
 #define DEFINE_CGROUP_MGCTX(name)						\
 	struct cgroup_mgctx name = CGROUP_MGCTX_INIT(name)
 
-struct cgroup_sb_opts {
-	u16 subsys_mask;
-	unsigned int flags;
-	char *release_agent;
-	bool cpuset_clone_children;
-	char *name;
-	/* User explicitly requested empty subsystem */
-	bool none;
-};
-
 extern struct mutex cgroup_mutex;
 extern spinlock_t css_set_lock;
 extern struct cgroup_subsys *cgroup_subsys[];
@@ -169,12 +179,10 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
 			  struct cgroup_namespace *ns);
 
 void cgroup_free_root(struct cgroup_root *root);
-void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts);
+void init_cgroup_root(struct cgroup_fs_context *ctx);
 int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
 int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
-struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
-			       struct cgroup_root *root, unsigned long magic,
-			       struct cgroup_namespace *ns);
+int cgroup_do_get_tree(struct cgroup_fs_context *ctx);
 
 int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp);
 void cgroup_migrate_finish(struct cgroup_mgctx *mgctx);
@@ -225,8 +233,8 @@ bool cgroup1_ssid_disabled(int ssid);
 void cgroup1_pidlist_destroy_all(struct cgroup *cgrp);
 void cgroup1_release_agent(struct work_struct *work);
 void cgroup1_check_for_release(struct cgroup *cgrp);
-struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
-			     void *data, unsigned long magic,
-			     struct cgroup_namespace *ns);
+int cgroup1_parse_option(struct cgroup_fs_context *ctx, char *p);
+int cgroup1_validate(struct cgroup_fs_context *ctx);
+int cgroup1_get_tree(struct cgroup_fs_context *ctx);
 
 #endif /* __CGROUP_INTERNAL_H */
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index a2c05d2476ac..e149ff63b35a 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -16,6 +16,8 @@
 
 #include <trace/events/cgroup.h>
 
+#define cg_invalf(fmt, ...) ({ pr_err(fmt, ## __VA_ARGS__); })
+
 /*
  * pidlists linger the following amount before being destroyed.  The goal
  * is avoiding frequent destruction in the middle of consecutive read calls
@@ -915,168 +917,166 @@ static int cgroup1_show_options(struct seq_file *seq, struct kernfs_root *kf_roo
 	return 0;
 }
 
-static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
+int cgroup1_parse_option(struct cgroup_fs_context *ctx, char *token)
 {
-	char *token, *o = data;
-	bool all_ss = false, one_ss = false;
-	u16 mask = U16_MAX;
 	struct cgroup_subsys *ss;
-	int nr_opts = 0;
 	int i;
 
-#ifdef CONFIG_CPUSETS
-	mask = ~((u16)1 << cpuset_cgrp_id);
-#endif
-
-	memset(opts, 0, sizeof(*opts));
-
-	while ((token = strsep(&o, ",")) != NULL) {
-		nr_opts++;
+	if (!strcmp(token, "none")) {
+		/* Explicitly have no subsystems */
+		ctx->none = true;
+		return 0;
+	}
+	if (!strcmp(token, "all")) {
+		/* Mutually exclusive option 'all' + subsystem name */
+		if (ctx->one_ss)
+			return cg_invalf("cgroup1: all conflicts with subsys name");
+		ctx->all_ss = true;
+		return 0;
+	}
+	if (!strcmp(token, "noprefix")) {
+		ctx->flags |= CGRP_ROOT_NOPREFIX;
+		return 0;
+	}
+	if (!strcmp(token, "clone_children")) {
+		ctx->cpuset_clone_children = true;
+		return 0;
+	}
+	if (!strcmp(token, "xattr")) {
+		ctx->flags |= CGRP_ROOT_XATTR;
+		return 0;
+	}
+	if (!strncmp(token, "release_agent=", 14)) {
+		/* Specifying two release agents is forbidden */
+		if (ctx->release_agent)
+			return cg_invalf("cgroup1: release_agent respecified");
+		ctx->release_agent =
+			kstrndup(token + 14, PATH_MAX - 1, GFP_KERNEL);
+		if (!ctx->release_agent)
+			return -ENOMEM;
+		return 0;
+	}
 
-		if (!*token)
-			return -EINVAL;
-		if (!strcmp(token, "none")) {
-			/* Explicitly have no subsystems */
-			opts->none = true;
-			continue;
-		}
-		if (!strcmp(token, "all")) {
-			/* Mutually exclusive option 'all' + subsystem name */
-			if (one_ss)
-				return -EINVAL;
-			all_ss = true;
-			continue;
-		}
-		if (!strcmp(token, "noprefix")) {
-			opts->flags |= CGRP_ROOT_NOPREFIX;
-			continue;
+	if (!strncmp(token, "name=", 5)) {
+		const char *name = token + 5;
+		/* Can't specify an empty name */
+		if (!strlen(name))
+			return cg_invalf("cgroup1: Empty name");
+		/* Must match [\w.-]+ */
+		for (i = 0; i < strlen(name); i++) {
+			char c = name[i];
+			if (isalnum(c))
+				continue;
+			if ((c == '.') || (c == '-') || (c == '_'))
+				continue;
+			return cg_invalf("cgroup1: Invalid name");
 		}
-		if (!strcmp(token, "clone_children")) {
-			opts->cpuset_clone_children = true;
+		/* Specifying two names is forbidden */
+		if (ctx->name)
+			return cg_invalf("cgroup1: name respecified");
+		ctx->name = kstrndup(name,
+				     MAX_CGROUP_ROOT_NAMELEN - 1,
+				     GFP_KERNEL);
+		if (!ctx->name)
+			return -ENOMEM;
+
+		return 0;
+	}
+
+	for_each_subsys(ss, i) {
+		if (strcmp(token, ss->legacy_name))
 			continue;
-		}
 		if (!strcmp(token, "cpuset_v2_mode")) {
-			opts->flags |= CGRP_ROOT_CPUSET_V2_MODE;
+			ctx->flags |= CGRP_ROOT_CPUSET_V2_MODE;
 			continue;
 		}
 		if (!strcmp(token, "xattr")) {
-			opts->flags |= CGRP_ROOT_XATTR;
+			ctx->flags |= CGRP_ROOT_XATTR;
 			continue;
 		}
-		if (!strncmp(token, "release_agent=", 14)) {
-			/* Specifying two release agents is forbidden */
-			if (opts->release_agent)
-				return -EINVAL;
-			opts->release_agent =
-				kstrndup(token + 14, PATH_MAX - 1, GFP_KERNEL);
-			if (!opts->release_agent)
-				return -ENOMEM;
+		if (cgroup1_ssid_disabled(i))
 			continue;
-		}
-		if (!strncmp(token, "name=", 5)) {
-			const char *name = token + 5;
-			/* Can't specify an empty name */
-			if (!strlen(name))
-				return -EINVAL;
-			/* Must match [\w.-]+ */
-			for (i = 0; i < strlen(name); i++) {
-				char c = name[i];
-				if (isalnum(c))
-					continue;
-				if ((c == '.') || (c == '-') || (c == '_'))
-					continue;
-				return -EINVAL;
-			}
-			/* Specifying two names is forbidden */
-			if (opts->name)
-				return -EINVAL;
-			opts->name = kstrndup(name,
-					      MAX_CGROUP_ROOT_NAMELEN - 1,
-					      GFP_KERNEL);
-			if (!opts->name)
-				return -ENOMEM;
 
-			continue;
-		}
+		/* Mutually exclusive option 'all' + subsystem name */
+		if (ctx->all_ss)
+			return cg_invalf("cgroup1: subsys name conflicts with all");
+		ctx->subsys_mask |= (1 << i);
+		ctx->one_ss = true;
+		return 0;
+	}
 
-		for_each_subsys(ss, i) {
-			if (strcmp(token, ss->legacy_name))
-				continue;
-			if (!cgroup_ssid_enabled(i))
-				continue;
-			if (cgroup1_ssid_disabled(i))
-				continue;
+	if (i == CGROUP_SUBSYS_COUNT)
+		return -ENOENT;
+
+	return 0;
+}
 
-			/* Mutually exclusive option 'all' + subsystem name */
-			if (all_ss)
-				return -EINVAL;
-			opts->subsys_mask |= (1 << i);
-			one_ss = true;
+/*
+ * Validate the options that have been parsed.
+ */
+int cgroup1_validate(struct cgroup_fs_context *ctx)
+{
+	struct cgroup_subsys *ss;
+	u16 mask = U16_MAX;
+	int i;
 
-			break;
-		}
-		if (i == CGROUP_SUBSYS_COUNT)
-			return -ENOENT;
-	}
+#ifdef CONFIG_CPUSETS
+	mask = ~((u16)1 << cpuset_cgrp_id);
+#endif
 
 	/*
 	 * If the 'all' option was specified select all the subsystems,
 	 * otherwise if 'none', 'name=' and a subsystem name options were
 	 * not specified, let's default to 'all'
 	 */
-	if (all_ss || (!one_ss && !opts->none && !opts->name))
+	if (ctx->all_ss || (!ctx->one_ss && !ctx->none && !ctx->name))
 		for_each_subsys(ss, i)
 			if (cgroup_ssid_enabled(i) && !cgroup1_ssid_disabled(i))
-				opts->subsys_mask |= (1 << i);
+				ctx->subsys_mask |= (1 << i);
 
 	/*
 	 * We either have to specify by name or by subsystems. (So all
 	 * empty hierarchies must have a name).
 	 */
-	if (!opts->subsys_mask && !opts->name)
-		return -EINVAL;
+	if (!ctx->subsys_mask && !ctx->name)
+		return cg_invalf("cgroup1: Need name or subsystem set");
 
 	/*
 	 * Option noprefix was introduced just for backward compatibility
 	 * with the old cpuset, so we allow noprefix only if mounting just
 	 * the cpuset subsystem.
 	 */
-	if ((opts->flags & CGRP_ROOT_NOPREFIX) && (opts->subsys_mask & mask))
-		return -EINVAL;
+	if ((ctx->flags & CGRP_ROOT_NOPREFIX) && (ctx->subsys_mask & mask))
+		return cg_invalf("cgroup1: noprefix used incorrectly");
 
 	/* Can't specify "none" and some subsystems */
-	if (opts->subsys_mask && opts->none)
-		return -EINVAL;
+	if (ctx->subsys_mask && ctx->none)
+		return cg_invalf("cgroup1: none used incorrectly");
 
 	return 0;
 }
 
-static int cgroup1_remount(struct kernfs_root *kf_root, int *flags, char *data)
+static int cgroup1_reconfigure(struct kernfs_root *kf_root, struct kernfs_fs_context *kfc)
 {
-	int ret = 0;
+	struct cgroup_fs_context *ctx = container_of(kfc, struct cgroup_fs_context, kfc);
 	struct cgroup_root *root = cgroup_root_from_kf(kf_root);
-	struct cgroup_sb_opts opts;
 	u16 added_mask, removed_mask;
+	int ret = 0;
 
 	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
-	/* See what subsystems are wanted */
-	ret = parse_cgroupfs_options(data, &opts);
-	if (ret)
-		goto out_unlock;
-
-	if (opts.subsys_mask != root->subsys_mask || opts.release_agent)
+	if (ctx->subsys_mask != root->subsys_mask || ctx->release_agent)
 		pr_warn("option changes via remount are deprecated (pid=%d comm=%s)\n",
 			task_tgid_nr(current), current->comm);
 
-	added_mask = opts.subsys_mask & ~root->subsys_mask;
-	removed_mask = root->subsys_mask & ~opts.subsys_mask;
+	added_mask = ctx->subsys_mask & ~root->subsys_mask;
+	removed_mask = root->subsys_mask & ~ctx->subsys_mask;
 
 	/* Don't allow flags or name to change at remount */
-	if ((opts.flags ^ root->flags) ||
-	    (opts.name && strcmp(opts.name, root->name))) {
-		pr_err("option or name mismatch, new: 0x%x \"%s\", old: 0x%x \"%s\"\n",
-		       opts.flags, opts.name ?: "", root->flags, root->name);
+	if ((ctx->flags ^ root->flags) ||
+	    (ctx->name && strcmp(ctx->name, root->name))) {
+		cg_invalf("option or name mismatch, new: 0x%x \"%s\", old: 0x%x \"%s\"",
+		       ctx->flags, ctx->name ?: "", root->flags, root->name);
 		ret = -EINVAL;
 		goto out_unlock;
 	}
@@ -1093,17 +1093,15 @@ static int cgroup1_remount(struct kernfs_root *kf_root, int *flags, char *data)
 
 	WARN_ON(rebind_subsystems(&cgrp_dfl_root, removed_mask));
 
-	if (opts.release_agent) {
+	if (ctx->release_agent) {
 		spin_lock(&release_agent_path_lock);
-		strcpy(root->release_agent_path, opts.release_agent);
+		strcpy(root->release_agent_path, ctx->release_agent);
 		spin_unlock(&release_agent_path_lock);
 	}
 
 	trace_cgroup_remount(root);
 
  out_unlock:
-	kfree(opts.release_agent);
-	kfree(opts.name);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
@@ -1111,31 +1109,25 @@ static int cgroup1_remount(struct kernfs_root *kf_root, int *flags, char *data)
 struct kernfs_syscall_ops cgroup1_kf_syscall_ops = {
 	.rename			= cgroup1_rename,
 	.show_options		= cgroup1_show_options,
-	.remount_fs		= cgroup1_remount,
+	.reconfigure		= cgroup1_reconfigure,
 	.mkdir			= cgroup_mkdir,
 	.rmdir			= cgroup_rmdir,
 	.show_path		= cgroup_show_path,
 };
 
-struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
-			     void *data, unsigned long magic,
-			     struct cgroup_namespace *ns)
+/*
+ * Find or create a v1 cgroups superblock.
+ */
+int cgroup1_get_tree(struct cgroup_fs_context *ctx)
 {
 	struct super_block *pinned_sb = NULL;
-	struct cgroup_sb_opts opts;
 	struct cgroup_root *root;
 	struct cgroup_subsys *ss;
-	struct dentry *dentry;
 	int i, ret;
 	bool new_root = false;
 
 	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
-	/* First find the desired set of subsystems */
-	ret = parse_cgroupfs_options(data, &opts);
-	if (ret)
-		goto out_unlock;
-
 	/*
 	 * Destruction of cgroup root is asynchronous, so subsystems may
 	 * still be dying after the previous unmount.  Let's drain the
@@ -1144,15 +1136,13 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 	 * starting.  Testing ref liveliness is good enough.
 	 */
 	for_each_subsys(ss, i) {
-		if (!(opts.subsys_mask & (1 << i)) ||
+		if (!(ctx->subsys_mask & (1 << i)) ||
 		    ss->root == &cgrp_dfl_root)
 			continue;
 
 		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
 			mutex_unlock(&cgroup_mutex);
-			msleep(10);
-			ret = restart_syscall();
-			goto out_free;
+			goto err_restart;
 		}
 		cgroup_put(&ss->root->cgrp);
 	}
@@ -1168,8 +1158,8 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 		 * name matches but sybsys_mask doesn't, we should fail.
 		 * Remember whether name matched.
 		 */
-		if (opts.name) {
-			if (strcmp(opts.name, root->name))
+		if (ctx->name) {
+			if (strcmp(ctx->name, root->name))
 				continue;
 			name_match = true;
 		}
@@ -1178,15 +1168,15 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 		 * If we asked for subsystems (or explicitly for no
 		 * subsystems) then they must match.
 		 */
-		if ((opts.subsys_mask || opts.none) &&
-		    (opts.subsys_mask != root->subsys_mask)) {
+		if ((ctx->subsys_mask || ctx->none) &&
+		    (ctx->subsys_mask != root->subsys_mask)) {
 			if (!name_match)
 				continue;
 			ret = -EBUSY;
-			goto out_unlock;
+			goto err_unlock;
 		}
 
-		if (root->flags ^ opts.flags)
+		if (root->flags ^ ctx->flags)
 			pr_warn("new mount options do not match the existing superblock, will be ignored\n");
 
 		/*
@@ -1207,9 +1197,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 			mutex_unlock(&cgroup_mutex);
 			if (!IS_ERR_OR_NULL(pinned_sb))
 				deactivate_super(pinned_sb);
-			msleep(10);
-			ret = restart_syscall();
-			goto out_free;
+			goto err_restart;
 		}
 
 		ret = 0;
@@ -1221,41 +1209,35 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 	 * specification is allowed for already existing hierarchies but we
 	 * can't create new one without subsys specification.
 	 */
-	if (!opts.subsys_mask && !opts.none) {
-		ret = -EINVAL;
-		goto out_unlock;
+	if (!ctx->subsys_mask && !ctx->none) {
+		ret = cg_invalf("cgroup1: No subsys list or none specified");
+		goto err_unlock;
 	}
 
 	/* Hierarchies may only be created in the initial cgroup namespace. */
-	if (ns != &init_cgroup_ns) {
+	if (ctx->ns != &init_cgroup_ns) {
 		ret = -EPERM;
-		goto out_unlock;
+		goto err_unlock;
 	}
 
 	root = kzalloc(sizeof(*root), GFP_KERNEL);
 	if (!root) {
 		ret = -ENOMEM;
-		goto out_unlock;
+		goto err_unlock;
 	}
 	new_root = true;
+	ctx->root = root;
 
-	init_cgroup_root(root, &opts);
+	init_cgroup_root(ctx);
 
-	ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD);
+	ret = cgroup_setup_root(root, ctx->subsys_mask, PERCPU_REF_INIT_DEAD);
 	if (ret)
 		cgroup_free_root(root);
 
 out_unlock:
 	mutex_unlock(&cgroup_mutex);
-out_free:
-	kfree(opts.release_agent);
-	kfree(opts.name);
-
-	if (ret)
-		return ERR_PTR(ret);
 
-	dentry = cgroup_do_mount(&cgroup_fs_type, flags, root,
-				 CGROUP_SUPER_MAGIC, ns);
+	ret = cgroup_do_get_tree(ctx);
 
 	/*
 	 * There's a race window after we release cgroup_mutex and before
@@ -1276,7 +1258,14 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 	if (pinned_sb)
 		deactivate_super(pinned_sb);
 
-	return dentry;
+	return ret;
+
+err_restart:
+	msleep(10);
+	return restart_syscall();
+err_unlock:
+	mutex_unlock(&cgroup_mutex);
+	return ret;
 }
 
 static int __init cgroup1_wq_init(void)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c2678e4800fc..93d544e3286c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1718,25 +1718,21 @@ int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node,
 	return len;
 }
 
-static int parse_cgroup_root_flags(char *data, unsigned int *root_flags)
+static int cgroup2_parse_option(struct cgroup_fs_context *ctx, char *token)
 {
-	char *token;
-
-	*root_flags = 0;
-
-	if (!data)
+	if (!strcmp(token, "nsdelegate")) {
+		ctx->flags |= CGRP_ROOT_NS_DELEGATE;
 		return 0;
-
-	while ((token = strsep(&data, ",")) != NULL) {
-		if (!strcmp(token, "nsdelegate")) {
-			*root_flags |= CGRP_ROOT_NS_DELEGATE;
-			continue;
-		}
-
-		pr_err("cgroup2: unknown option \"%s\"\n", token);
-		return -EINVAL;
 	}
 
+	return -EINVAL;
+}
+
+static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root)
+{
+	if (current->nsproxy->cgroup_ns == &init_cgroup_ns &&
+	    cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
+		seq_puts(seq, ",nsdelegate");
 	return 0;
 }
 
@@ -1750,23 +1746,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
 	}
 }
 
-static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root)
+static int cgroup_reconfigure(struct kernfs_root *kf_root, struct kernfs_fs_context *kfc)
 {
-	if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
-		seq_puts(seq, ",nsdelegate");
-	return 0;
-}
+	struct cgroup_fs_context *ctx = container_of(kfc, struct cgroup_fs_context, kfc);
 
-static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data)
-{
-	unsigned int root_flags;
-	int ret;
-
-	ret = parse_cgroup_root_flags(data, &root_flags);
-	if (ret)
-		return ret;
-
-	apply_cgroup_root_flags(root_flags);
+	apply_cgroup_root_flags(ctx->flags);
 	return 0;
 }
 
@@ -1852,8 +1836,9 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_WORK(&cgrp->release_agent_work, cgroup1_release_agent);
 }
 
-void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts)
+void init_cgroup_root(struct cgroup_fs_context *ctx)
 {
+	struct cgroup_root *root = ctx->root;
 	struct cgroup *cgrp = &root->cgrp;
 
 	INIT_LIST_HEAD(&root->root_list);
@@ -1862,12 +1847,12 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts)
 	init_cgroup_housekeeping(cgrp);
 	idr_init(&root->cgroup_idr);
 
-	root->flags = opts->flags;
-	if (opts->release_agent)
-		strscpy(root->release_agent_path, opts->release_agent, PATH_MAX);
-	if (opts->name)
-		strscpy(root->name, opts->name, MAX_CGROUP_ROOT_NAMELEN);
-	if (opts->cpuset_clone_children)
+	root->flags = ctx->flags;
+	if (ctx->release_agent)
+		strscpy(root->release_agent_path, ctx->release_agent, PATH_MAX);
+	if (ctx->name)
+		strscpy(root->name, ctx->name, MAX_CGROUP_ROOT_NAMELEN);
+	if (ctx->cpuset_clone_children)
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
 }
 
@@ -1972,57 +1957,50 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
 	return ret;
 }
 
-struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
-			       struct cgroup_root *root, unsigned long magic,
-			       struct cgroup_namespace *ns)
+int cgroup_do_get_tree(struct cgroup_fs_context *ctx)
 {
-	struct dentry *dentry;
-	bool new_sb;
+	int ret;
+
+	ctx->kfc.root = ctx->root->kf_root;
 
-	dentry = kernfs_mount(fs_type, flags, root->kf_root, magic, &new_sb);
+	ret = kernfs_get_tree(&ctx->kfc);
+	if (ret < 0)
+		goto out_cgrp;
 
 	/*
 	 * In non-init cgroup namespace, instead of root cgroup's dentry,
 	 * we return the dentry corresponding to the cgroupns->root_cgrp.
 	 */
-	if (!IS_ERR(dentry) && ns != &init_cgroup_ns) {
+	if (ctx->ns != &init_cgroup_ns) {
 		struct dentry *nsdentry;
 		struct cgroup *cgrp;
 
 		mutex_lock(&cgroup_mutex);
 		spin_lock_irq(&css_set_lock);
 
-		cgrp = cset_cgroup_from_root(ns->root_cset, root);
+		cgrp = cset_cgroup_from_root(ctx->ns->root_cset, ctx->root);
 
 		spin_unlock_irq(&css_set_lock);
 		mutex_unlock(&cgroup_mutex);
 
-		nsdentry = kernfs_node_dentry(cgrp->kn, dentry->d_sb);
-		dput(dentry);
-		dentry = nsdentry;
+		nsdentry = kernfs_node_dentry(cgrp->kn, ctx->kfc.fc.root->d_sb);
+		dput(ctx->kfc.fc.root);
+		ctx->kfc.fc.root = nsdentry;
 	}
 
-	if (IS_ERR(dentry) || !new_sb)
-		cgroup_put(&root->cgrp);
+	ret = 0;
+	if (ctx->kfc.new_sb_created)
+		goto out_cgrp;
+	apply_cgroup_root_flags(ctx->flags);
+	return 0;
 
-	return dentry;
+out_cgrp:
+	return ret;
 }
 
-static struct dentry *cgroup_mount(struct file_system_type *fs_type,
-			 int flags, const char *unused_dev_name,
-			 void *data, size_t data_size)
+static int cgroup_get_tree(struct fs_context *fc)
 {
-	struct cgroup_namespace *ns = current->nsproxy->cgroup_ns;
-	struct dentry *dentry;
-	int ret;
-
-	get_cgroup_ns(ns);
-
-	/* Check if the caller has permission to mount. */
-	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) {
-		put_cgroup_ns(ns);
-		return ERR_PTR(-EPERM);
-	}
+	struct cgroup_fs_context *ctx = container_of(fc, struct cgroup_fs_context, kfc.fc);
 
 	/*
 	 * The first time anyone tries to mount a cgroup, enable the list
@@ -2031,29 +2009,81 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	if (!use_task_css_set_links)
 		cgroup_enable_task_cg_lists();
 
-	if (fs_type == &cgroup2_fs_type) {
-		unsigned int root_flags;
-
-		ret = parse_cgroup_root_flags(data, &root_flags);
-		if (ret) {
-			put_cgroup_ns(ns);
-			return ERR_PTR(ret);
-		}
+	switch (ctx->version) {
+	case 1:
+		return cgroup1_get_tree(ctx);
 
+	case 2:
 		cgrp_dfl_visible = true;
 		cgroup_get_live(&cgrp_dfl_root.cgrp);
 
-		dentry = cgroup_do_mount(&cgroup2_fs_type, flags, &cgrp_dfl_root,
-					 CGROUP2_SUPER_MAGIC, ns);
-		if (!IS_ERR(dentry))
-			apply_cgroup_root_flags(root_flags);
-	} else {
-		dentry = cgroup1_mount(&cgroup_fs_type, flags, data,
-				       CGROUP_SUPER_MAGIC, ns);
+		ctx->root = &cgrp_dfl_root;
+		return cgroup_do_get_tree(ctx);
+
+	default:
+		BUG();
 	}
+}
+
+static int cgroup_parse_option(struct fs_context *fc, char *opt, size_t len)
+{
+	struct cgroup_fs_context *ctx = container_of(fc, struct cgroup_fs_context, kfc.fc);
+
+	if (ctx->version == 1)
+		return cgroup1_parse_option(ctx, opt);
+
+	return cgroup2_parse_option(ctx, opt);
+}
+
+static int cgroup_validate(struct fs_context *fc)
+{
+	struct cgroup_fs_context *ctx = container_of(fc, struct cgroup_fs_context, kfc.fc);
+
+	if (ctx->version == 1)
+		return cgroup1_validate(ctx);
+	return 0;
+}
+
+/*
+ * Destroy a cgroup filesystem context.
+ */
+static void cgroup_fs_context_free(struct fs_context *fc)
+{
+	struct cgroup_fs_context *ctx = container_of(fc, struct cgroup_fs_context, kfc.fc);
 
-	put_cgroup_ns(ns);
-	return dentry;
+	kfree(ctx->name);
+	kfree(ctx->release_agent);
+	if (ctx->root)
+		cgroup_put(&ctx->root->cgrp);
+	put_cgroup_ns(ctx->ns);
+	kernfs_free_fs_context(&ctx->kfc);
+}
+
+static const struct fs_context_operations cgroup_fs_context_ops = {
+	.free		= cgroup_fs_context_free,
+	.parse_option	= cgroup_parse_option,
+	.validate	= cgroup_validate,
+	.get_tree	= cgroup_get_tree,
+};
+
+/*
+ * Initialise the cgroup filesystem creation/reconfiguration context.  Notably,
+ * we select the namespace we're going to use.
+ */
+static int cgroup_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	struct cgroup_fs_context *ctx = container_of(fc, struct cgroup_fs_context, kfc.fc);
+	struct cgroup_namespace *ns = current->nsproxy->cgroup_ns;
+
+	/* Check if the caller has permission to mount. */
+	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	ctx->ns = get_cgroup_ns(ns);
+	ctx->version = (fc->fs_type == &cgroup2_fs_type) ? 2 : 1;
+	ctx->kfc.magic = (ctx->version == 2) ? CGROUP2_SUPER_MAGIC : CGROUP_SUPER_MAGIC;
+	ctx->kfc.fc.ops = &cgroup_fs_context_ops;
+	return 0;
 }
 
 static void cgroup_kill_sb(struct super_block *sb)
@@ -2078,17 +2108,19 @@ static void cgroup_kill_sb(struct super_block *sb)
 }
 
 struct file_system_type cgroup_fs_type = {
-	.name = "cgroup",
-	.mount = cgroup_mount,
-	.kill_sb = cgroup_kill_sb,
-	.fs_flags = FS_USERNS_MOUNT,
+	.name			= "cgroup",
+	.fs_context_size	= sizeof(struct cgroup_fs_context),
+	.init_fs_context	= cgroup_init_fs_context,
+	.kill_sb		= cgroup_kill_sb,
+	.fs_flags		= FS_USERNS_MOUNT,
 };
 
 static struct file_system_type cgroup2_fs_type = {
-	.name = "cgroup2",
-	.mount = cgroup_mount,
-	.kill_sb = cgroup_kill_sb,
-	.fs_flags = FS_USERNS_MOUNT,
+	.name			= "cgroup2",
+	.fs_context_size	= sizeof(struct cgroup_fs_context),
+	.init_fs_context	= cgroup_init_fs_context,
+	.kill_sb		= cgroup_kill_sb,
+	.fs_flags		= FS_USERNS_MOUNT,
 };
 
 int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
@@ -5132,7 +5164,7 @@ int cgroup_rmdir(struct kernfs_node *kn)
 
 static struct kernfs_syscall_ops cgroup_kf_syscall_ops = {
 	.show_options		= cgroup_show_options,
-	.remount_fs		= cgroup_remount,
+	.reconfigure		= cgroup_reconfigure,
 	.mkdir			= cgroup_mkdir,
 	.rmdir			= cgroup_rmdir,
 	.show_path		= cgroup_show_path,
@@ -5199,11 +5231,12 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
  */
 int __init cgroup_init_early(void)
 {
-	static struct cgroup_sb_opts __initdata opts;
+	static struct cgroup_fs_context __initdata ctx;
 	struct cgroup_subsys *ss;
 	int i;
 
-	init_cgroup_root(&cgrp_dfl_root, &opts);
+	ctx.root = &cgrp_dfl_root;
+	init_cgroup_root(&ctx);
 	cgrp_dfl_root.cgrp.self.flags |= CSS_NO_REF;
 
 	RCU_INIT_POINTER(init_task.cgroups, &init_css_set);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 16/24] hugetlbfs: Convert to fs_context [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (14 preceding siblings ...)
  2018-04-19 13:32 ` [PATCH 15/24] kernfs, sysfs, cgroup, intel_rdt: Support " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 17/24] VFS: Remove kern_mount_data() " David Howells
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Convert the hugetlbfs to use the fs_context during mount.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/hugetlbfs/inode.c |  330 ++++++++++++++++++++++++++++----------------------
 1 file changed, 184 insertions(+), 146 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 76fb8eb2bea8..11056af43e66 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -45,11 +45,18 @@ const struct file_operations hugetlbfs_file_operations;
 static const struct inode_operations hugetlbfs_dir_inode_operations;
 static const struct inode_operations hugetlbfs_inode_operations;
 
-struct hugetlbfs_config {
+enum hugetlbfs_size_type { NO_SIZE, SIZE_STD, SIZE_PERCENT };
+
+struct hugetlbfs_fs_context {
+	struct fs_context	fc;
 	struct hstate		*hstate;
+	unsigned long long	max_size_opt;
+	unsigned long long	min_size_opt;
 	long			max_hpages;
 	long			nr_inodes;
 	long			min_hpages;
+	enum hugetlbfs_size_type max_val_type;
+	enum hugetlbfs_size_type min_val_type;
 	kuid_t			uid;
 	kgid_t			gid;
 	umode_t			mode;
@@ -708,16 +715,16 @@ static int hugetlbfs_setattr(struct dentry *dentry, struct iattr *attr)
 }
 
 static struct inode *hugetlbfs_get_root(struct super_block *sb,
-					struct hugetlbfs_config *config)
+					struct hugetlbfs_fs_context *ctx)
 {
 	struct inode *inode;
 
 	inode = new_inode(sb);
 	if (inode) {
 		inode->i_ino = get_next_ino();
-		inode->i_mode = S_IFDIR | config->mode;
-		inode->i_uid = config->uid;
-		inode->i_gid = config->gid;
+		inode->i_mode = S_IFDIR | ctx->mode;
+		inode->i_uid = ctx->uid;
+		inode->i_gid = ctx->gid;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 		inode->i_op = &hugetlbfs_dir_inode_operations;
 		inode->i_fop = &simple_dir_operations;
@@ -1081,8 +1088,6 @@ static const struct super_operations hugetlbfs_ops = {
 	.show_options	= hugetlbfs_show_options,
 };
 
-enum hugetlbfs_size_type { NO_SIZE, SIZE_STD, SIZE_PERCENT };
-
 /*
  * Convert size option passed from command line to number of huge pages
  * in the pool specified by hstate.  Size option could be in bytes
@@ -1105,171 +1110,156 @@ hugetlbfs_size_to_hpages(struct hstate *h, unsigned long long size_opt,
 	return size_opt;
 }
 
-static int
-hugetlbfs_parse_options(char *options, struct hugetlbfs_config *pconfig)
+/*
+ * Parse one mount option.
+ */
+static int hugetlbfs_parse_option(struct fs_context *fc, char *opt, size_t len)
 {
-	char *p, *rest;
+	struct hugetlbfs_fs_context *ctx = container_of(fc, struct hugetlbfs_fs_context, fc);
+	char *rest;
+	unsigned long ps;
 	substring_t args[MAX_OPT_ARGS];
-	int option;
-	unsigned long long max_size_opt = 0, min_size_opt = 0;
-	enum hugetlbfs_size_type max_val_type = NO_SIZE, min_val_type = NO_SIZE;
-
-	if (!options)
+	int token, option;
+
+	token = match_token(opt, tokens, args);
+	switch (token) {
+	case Opt_uid:
+		if (match_int(&args[0], &option))
+			goto bad_val;
+		ctx->uid = make_kuid(current_user_ns(), option);
+		if (!uid_valid(ctx->uid))
+			goto bad_val;
 		return 0;
 
-	while ((p = strsep(&options, ",")) != NULL) {
-		int token;
-		if (!*p)
-			continue;
+	case Opt_gid:
+		if (match_int(&args[0], &option))
+			goto bad_val;
+		ctx->gid = make_kgid(current_user_ns(), option);
+		if (!gid_valid(ctx->gid))
+			goto bad_val;
+		return 0;
 
-		token = match_token(p, tokens, args);
-		switch (token) {
-		case Opt_uid:
-			if (match_int(&args[0], &option))
- 				goto bad_val;
-			pconfig->uid = make_kuid(current_user_ns(), option);
-			if (!uid_valid(pconfig->uid))
-				goto bad_val;
-			break;
+	case Opt_mode:
+		if (match_octal(&args[0], &option))
+			goto bad_val;
+		ctx->mode = option & 01777U;
+		return 0;
 
-		case Opt_gid:
-			if (match_int(&args[0], &option))
- 				goto bad_val;
-			pconfig->gid = make_kgid(current_user_ns(), option);
-			if (!gid_valid(pconfig->gid))
-				goto bad_val;
-			break;
+	case Opt_size:
+		/* memparse() will accept a K/M/G without a digit */
+		if (!isdigit(*args[0].from))
+			goto bad_val;
+		ctx->max_size_opt = memparse(args[0].from, &rest);
+		ctx->max_val_type = SIZE_STD;
+		if (*rest == '%')
+			ctx->max_val_type = SIZE_PERCENT;
+		return 0;
 
-		case Opt_mode:
-			if (match_octal(&args[0], &option))
- 				goto bad_val;
-			pconfig->mode = option & 01777U;
-			break;
+	case Opt_nr_inodes:
+		/* memparse() will accept a K/M/G without a digit */
+		if (!isdigit(*args[0].from))
+			goto bad_val;
+		ctx->nr_inodes = memparse(args[0].from, &rest);
+		return 0;
 
-		case Opt_size: {
-			/* memparse() will accept a K/M/G without a digit */
-			if (!isdigit(*args[0].from))
-				goto bad_val;
-			max_size_opt = memparse(args[0].from, &rest);
-			max_val_type = SIZE_STD;
-			if (*rest == '%')
-				max_val_type = SIZE_PERCENT;
-			break;
+	case Opt_pagesize:
+		ps = memparse(args[0].from, &rest);
+		ctx->hstate = size_to_hstate(ps);
+		if (!ctx->hstate) {
+			pr_err("Unsupported page size %lu MB\n", ps >> 20);
+			return -EINVAL;
 		}
+		return 0;
 
-		case Opt_nr_inodes:
-			/* memparse() will accept a K/M/G without a digit */
-			if (!isdigit(*args[0].from))
-				goto bad_val;
-			pconfig->nr_inodes = memparse(args[0].from, &rest);
-			break;
+	case Opt_min_size:
+		/* memparse() will accept a K/M/G without a digit */
+		if (!isdigit(*args[0].from))
+			goto bad_val;
+		ctx->min_size_opt = memparse(args[0].from, &rest);
+		ctx->min_val_type = SIZE_STD;
+		if (*rest == '%')
+			ctx->min_val_type = SIZE_PERCENT;
+		return 0;
 
-		case Opt_pagesize: {
-			unsigned long ps;
-			ps = memparse(args[0].from, &rest);
-			pconfig->hstate = size_to_hstate(ps);
-			if (!pconfig->hstate) {
-				pr_err("Unsupported page size %lu MB\n",
-					ps >> 20);
-				return -EINVAL;
-			}
-			break;
-		}
+	default:
+		pr_err("Bad mount option: \"%s\"\n", opt);
+		return -EINVAL;
+	}
 
-		case Opt_min_size: {
-			/* memparse() will accept a K/M/G without a digit */
-			if (!isdigit(*args[0].from))
-				goto bad_val;
-			min_size_opt = memparse(args[0].from, &rest);
-			min_val_type = SIZE_STD;
-			if (*rest == '%')
-				min_val_type = SIZE_PERCENT;
-			break;
-		}
+bad_val:
+	pr_err("Bad value '%s' for mount option '%s'\n", args[0].from, opt);
+	return -EINVAL;
+}
 
-		default:
-			pr_err("Bad mount option: \"%s\"\n", p);
-			return -EINVAL;
-			break;
-		}
-	}
+/*
+ * Validate the parsed options.
+ */
+static int hugetlbfs_validate(struct fs_context *fc)
+{
+	struct hugetlbfs_fs_context *ctx = container_of(fc, struct hugetlbfs_fs_context, fc);
 
 	/*
 	 * Use huge page pool size (in hstate) to convert the size
 	 * options to number of huge pages.  If NO_SIZE, -1 is returned.
 	 */
-	pconfig->max_hpages = hugetlbfs_size_to_hpages(pconfig->hstate,
-						max_size_opt, max_val_type);
-	pconfig->min_hpages = hugetlbfs_size_to_hpages(pconfig->hstate,
-						min_size_opt, min_val_type);
+	ctx->max_hpages = hugetlbfs_size_to_hpages(ctx->hstate,
+						   ctx->max_size_opt,
+						   ctx->max_val_type);
+	ctx->min_hpages = hugetlbfs_size_to_hpages(ctx->hstate,
+						   ctx->min_size_opt,
+						   ctx->min_val_type);
 
 	/*
 	 * If max_size was specified, then min_size must be smaller
 	 */
-	if (max_val_type > NO_SIZE &&
-	    pconfig->min_hpages > pconfig->max_hpages) {
-		pr_err("minimum size can not be greater than maximum size\n");
+	if (ctx->max_val_type > NO_SIZE &&
+	    ctx->min_hpages > ctx->max_hpages) {
+		pr_err("Minimum size can not be greater than maximum size\n");
 		return -EINVAL;
 	}
 
 	return 0;
-
-bad_val:
-	pr_err("Bad value '%s' for mount option '%s'\n", args[0].from, p);
- 	return -EINVAL;
 }
 
 static int
-hugetlbfs_fill_super(struct super_block *sb, void *data, size_t data_size,
-		     int silent)
+hugetlbfs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
-	int ret;
-	struct hugetlbfs_config config;
+	struct hugetlbfs_fs_context *ctx =
+		container_of(fc, struct hugetlbfs_fs_context, fc);
 	struct hugetlbfs_sb_info *sbinfo;
 
-	config.max_hpages = -1; /* No limit on size by default */
-	config.nr_inodes = -1; /* No limit on number of inodes by default */
-	config.uid = current_fsuid();
-	config.gid = current_fsgid();
-	config.mode = 0755;
-	config.hstate = &default_hstate;
-	config.min_hpages = -1; /* No default minimum size */
-	ret = hugetlbfs_parse_options(data, &config);
-	if (ret)
-		return ret;
-
 	sbinfo = kmalloc(sizeof(struct hugetlbfs_sb_info), GFP_KERNEL);
 	if (!sbinfo)
 		return -ENOMEM;
 	sb->s_fs_info = sbinfo;
-	sbinfo->hstate = config.hstate;
 	spin_lock_init(&sbinfo->stat_lock);
-	sbinfo->max_inodes = config.nr_inodes;
-	sbinfo->free_inodes = config.nr_inodes;
-	sbinfo->spool = NULL;
-	sbinfo->uid = config.uid;
-	sbinfo->gid = config.gid;
-	sbinfo->mode = config.mode;
+	sbinfo->hstate		= ctx->hstate;
+	sbinfo->max_inodes	= ctx->nr_inodes;
+	sbinfo->free_inodes	= ctx->nr_inodes;
+	sbinfo->spool		= NULL;
+	sbinfo->uid		= ctx->uid;
+	sbinfo->gid		= ctx->gid;
+	sbinfo->mode		= ctx->mode;
 
 	/*
 	 * Allocate and initialize subpool if maximum or minimum size is
 	 * specified.  Any needed reservations (for minimim size) are taken
 	 * taken when the subpool is created.
 	 */
-	if (config.max_hpages != -1 || config.min_hpages != -1) {
-		sbinfo->spool = hugepage_new_subpool(config.hstate,
-							config.max_hpages,
-							config.min_hpages);
+	if (ctx->max_hpages != -1 || ctx->min_hpages != -1) {
+		sbinfo->spool = hugepage_new_subpool(ctx->hstate,
+						     ctx->max_hpages,
+						     ctx->min_hpages);
 		if (!sbinfo->spool)
 			goto out_free;
 	}
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
-	sb->s_blocksize = huge_page_size(config.hstate);
-	sb->s_blocksize_bits = huge_page_shift(config.hstate);
+	sb->s_blocksize = huge_page_size(ctx->hstate);
+	sb->s_blocksize_bits = huge_page_shift(ctx->hstate);
 	sb->s_magic = HUGETLBFS_MAGIC;
 	sb->s_op = &hugetlbfs_ops;
 	sb->s_time_gran = 1;
-	sb->s_root = d_make_root(hugetlbfs_get_root(sb, &config));
+	sb->s_root = d_make_root(hugetlbfs_get_root(sb, ctx));
 	if (!sb->s_root)
 		goto out_free;
 	return 0;
@@ -1279,17 +1269,39 @@ hugetlbfs_fill_super(struct super_block *sb, void *data, size_t data_size,
 	return -ENOMEM;
 }
 
-static struct dentry *hugetlbfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data, size_t data_size)
+static int hugetlbfs_get_tree(struct fs_context *fc)
+{
+	return vfs_get_super(fc, vfs_get_independent_super, hugetlbfs_fill_super);
+}
+
+static const struct fs_context_operations hugetlbfs_fs_context_ops = {
+	.parse_option	= hugetlbfs_parse_option,
+	.validate	= hugetlbfs_validate,
+	.get_tree	= hugetlbfs_get_tree,
+};
+
+static int hugetlbfs_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
 {
-	return mount_nodev(fs_type, flags, data, data_size,
-			   hugetlbfs_fill_super);
+	struct hugetlbfs_fs_context *ctx = container_of(fc, struct hugetlbfs_fs_context, fc);
+
+	ctx->max_hpages	= -1; /* No limit on size by default */
+	ctx->nr_inodes	= -1; /* No limit on number of inodes by default */
+	ctx->uid	= current_fsuid();
+	ctx->gid	= current_fsgid();
+	ctx->mode	= 0755;
+	ctx->hstate	= &default_hstate;
+	ctx->min_hpages	= -1; /* No default minimum size */
+	ctx->max_val_type = NO_SIZE;
+	ctx->min_val_type = NO_SIZE;
+	ctx->fc.ops	= &hugetlbfs_fs_context_ops;
+	return 0;
 }
 
 static struct file_system_type hugetlbfs_fs_type = {
-	.name		= "hugetlbfs",
-	.mount		= hugetlbfs_mount,
-	.kill_sb	= kill_litter_super,
+	.name			= "hugetlbfs",
+	.fs_context_size	= sizeof(struct hugetlbfs_fs_context),
+	.init_fs_context	= hugetlbfs_init_fs_context,
+	.kill_sb		= kill_litter_super,
 };
 
 static struct vfsmount *hugetlbfs_vfsmount[HUGE_MAX_HSTATE];
@@ -1396,8 +1408,47 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
 	return file;
 }
 
+static struct vfsmount *__init mount_one_hugetlbfs(struct hstate *h)
+{
+	struct hugetlbfs_fs_context *ctx;
+	struct fs_context *fc;
+	struct vfsmount *mnt;
+	int ret;
+
+	fc = vfs_new_fs_context(&hugetlbfs_fs_type, NULL, 0,
+				FS_CONTEXT_FOR_KERNEL_MOUNT);
+	if (IS_ERR(fc)) {
+		ret = PTR_ERR(fc);
+		goto err;
+	}
+
+	ctx = container_of(fc, struct hugetlbfs_fs_context, fc);
+	ctx->hstate = h;
+
+	ret = vfs_get_tree(fc);
+	if (ret < 0)
+		goto err_fc;
+
+	mnt = vfs_create_mount(fc);
+	if (IS_ERR(mnt)) {
+		ret = PTR_ERR(mnt);
+		goto err_fc;
+	}
+
+	put_fs_context(fc);
+	return mnt;
+
+err_fc:
+	put_fs_context(fc);
+err:
+	pr_err("Cannot mount internal hugetlbfs for page size %uK",
+	       1U << (h->order + PAGE_SHIFT - 10));
+	return ERR_PTR(ret);
+}
+
 static int __init init_hugetlbfs_fs(void)
 {
+	struct vfsmount *mnt;
 	struct hstate *h;
 	int error;
 	int i;
@@ -1420,25 +1471,12 @@ static int __init init_hugetlbfs_fs(void)
 
 	i = 0;
 	for_each_hstate(h) {
-		char buf[50];
-		unsigned ps_kb = 1U << (h->order + PAGE_SHIFT - 10);
-		int n;
-
-		n = snprintf(buf, sizeof(buf), "pagesize=%uK", ps_kb);
-		hugetlbfs_vfsmount[i] = kern_mount_data(&hugetlbfs_fs_type,
-							buf, n + 1);
-
-		if (IS_ERR(hugetlbfs_vfsmount[i])) {
-			pr_err("Cannot mount internal hugetlbfs for "
-				"page size %uK", ps_kb);
-			error = PTR_ERR(hugetlbfs_vfsmount[i]);
-			hugetlbfs_vfsmount[i] = NULL;
-		}
+		mnt = mount_one_hugetlbfs(h);
+		if (IS_ERR(mnt) && i == 0)
+			goto out;
+		hugetlbfs_vfsmount[i] = mnt;
 		i++;
 	}
-	/* Non default hstates are optional */
-	if (!IS_ERR_OR_NULL(hugetlbfs_vfsmount[default_hstate_idx]))
-		return 0;
 
  out:
 	kmem_cache_destroy(hugetlbfs_inode_cachep);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 17/24] VFS: Remove kern_mount_data() [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (15 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 16/24] hugetlbfs: Convert to " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 18/24] VFS: Implement fsopen() to prepare for a mount " David Howells
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

The kern_mount_data() isn't used any more so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/namespace.c     |    7 -------
 include/linux/fs.h |    1 -
 2 files changed, 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c61ff2ab090a..dff482ad87b4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3192,13 +3192,6 @@ struct vfsmount *kern_mount(struct file_system_type *type)
 }
 EXPORT_SYMBOL_GPL(kern_mount);
 
-struct vfsmount *kern_mount_data(struct file_system_type *type,
-				 void *data, size_t data_size)
-{
-	return vfs_kern_mount(type, SB_KERNMOUNT, type->name, data, data_size);
-}
-EXPORT_SYMBOL_GPL(kern_mount_data);
-
 /*
  * Return true if path is reachable from root
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c1f1428f6c67..9bc78e2c3ce5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2187,7 +2187,6 @@ mount_pseudo(struct file_system_type *fs_type, char *name,
 extern int register_filesystem(struct file_system_type *);
 extern int unregister_filesystem(struct file_system_type *);
 extern struct vfsmount *kern_mount(struct file_system_type *);
-extern struct vfsmount *kern_mount_data(struct file_system_type *, void *, size_t);
 extern void kern_unmount(struct vfsmount *mnt);
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 18/24] VFS: Implement fsopen() to prepare for a mount [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (16 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 17/24] VFS: Remove kern_mount_data() " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 19/24] VFS: Implement fsmount() to effect a pre-configured " David Howells
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Provide an fsopen() system call that starts the process of preparing to
mount, using an fd as a context handle.  fsopen() is given the name of the
filesystem that will be used:

	int mfd = fsopen(const char *fsname, int open_flags,
			 void *reserved3, void *reserved4,
			 void *reserved5);

where open_flags can be 0 or O_CLOEXEC and reserved* should all be NULL for
the moment.

For example:

	mfd = fsopen("ext4", O_CLOEXEC, NULL, NULL, NULL);
	write(mfd, "s /dev/sdb1"); // note I'm ignoring write's length arg
	write(mfd, "o noatime");
	write(mfd, "o acl");
	write(mfd, "o user_attr");
	write(mfd, "o iversion");
	write(mfd, "o ");
	write(mfd, "r /my/container"); // root inside the fs
	write(mfd, "x create"); // create the superblock
	fsmount(mfd, container_fd, "/mnt", AT_NO_FOLLOW);

	mfd = fsopen("afs", -1);
	write(mfd, "s %grand.central.org:root.cell");
	write(mfd, "o cell=grand.central.org");
	write(mfd, "r /");
	write(mfd, "x create");
	fsmount(mfd, AT_FDCWD, "/mnt", 0);

If an error is reported at any step, an error message may be available to be
read() back (ENODATA will be reported if there isn't an error available) in
the form:

	"e <subsys>:<problem>"
	"e SELinux:Mount on mountpoint not permitted"

Once fsmount() has been called, further write() calls will incur EBUSY,
even if the fsmount() fails.  read() is still possible to retrieve error
information.

The fsopen() syscall creates a mount context and hangs it of the fd that it
returns.

Netlink is not used because it is optional.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/Makefile                            |    2 
 fs/fsopen.c                            |  304 ++++++++++++++++++++++++++++++++
 fs/super.c                             |    3 
 include/linux/fs_context.h             |    1 
 include/linux/syscalls.h               |    2 
 include/uapi/linux/magic.h             |    1 
 kernel/sys_ni.c                        |    3 
 9 files changed, 315 insertions(+), 3 deletions(-)
 create mode 100644 fs/fsopen.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d6b27dab1b30..d02346692c3f 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -396,3 +396,4 @@
 382	i386	pkey_free		sys_pkey_free			__ia32_sys_pkey_free
 383	i386	statx			sys_statx			__ia32_sys_statx
 384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
+385	i386	fsopen			sys_fsopen			__ia32_sys_fsopen
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 4dfe42666d0c..6708847571e2 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -341,6 +341,7 @@
 330	common	pkey_alloc		__x64_sys_pkey_alloc
 331	common	pkey_free		__x64_sys_pkey_free
 332	common	statx			__x64_sys_statx
+333	common	fsopen			__x64_sys_fsopen
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/Makefile b/fs/Makefile
index 6f2dae3c32da..ee3c8b31cc58 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -13,7 +13,7 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o splice.o sync.o utimes.o d_path.o \
 		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
-		fs_context.o
+		fs_context.o fsopen.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
diff --git a/fs/fsopen.c b/fs/fsopen.c
new file mode 100644
index 000000000000..2d115bad13bb
--- /dev/null
+++ b/fs/fsopen.c
@@ -0,0 +1,304 @@
+/* Filesystem access-by-fd.
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/fs_context.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/file.h>
+#include <linux/magic.h>
+#include <linux/syscalls.h>
+
+static struct vfsmount *fscontext_fs_mnt __read_mostly;
+
+static int fscontext_fs_release(struct inode *inode, struct file *file)
+{
+	struct fs_context *fc = file->private_data;
+
+	file->private_data = NULL;
+
+	put_fs_context(fc);
+	return 0;
+}
+
+/*
+ * Userspace writes configuration data and commands to the fd and we parse it
+ * here.  For the moment, we assume a single option or command per write.  Each
+ * line written is of the form
+ *
+ *	<option_type><space><stuff...>
+ *
+ *	d /dev/sda1				-- Device name
+ *	o noatime				-- Option without value
+ *	o cell=grand.central.org		-- Option with value
+ *	r /					-- Dir within device to mount
+ *	x create				-- Create a superblock
+ */
+static ssize_t fscontext_fs_write(struct file *file,
+			   const char __user *_buf, size_t len, loff_t *pos)
+{
+	struct fs_context *fc = file->private_data;
+	struct inode *inode = file_inode(file);
+	char opt[2], *data;
+	ssize_t ret;
+
+	if (len < 3 || len > 4095)
+		return -EINVAL;
+
+	if (copy_from_user(opt, _buf, 2) != 0)
+		return -EFAULT;
+	switch (opt[0]) {
+	case 's':
+	case 'o':
+	case 'x':
+		break;
+	default:
+		goto err_bad_cmd;
+	}
+	if (opt[1] != ' ')
+		goto err_bad_cmd;
+
+	data = memdup_user_nul(_buf + 2, len - 2);
+	if (IS_ERR(data))
+		return PTR_ERR(data);
+
+	/* From this point onwards we need to lock the fd against someone
+	 * trying to mount it.
+	 */
+	ret = inode_lock_killable(inode);
+	if (ret < 0)
+		goto err_free;
+
+	ret = -EINVAL;
+	switch (opt[0]) {
+	case 's':
+		ret = vfs_set_fs_source(fc, data, len - 2);
+		if (ret < 0)
+			goto err_unlock;
+		data = NULL;
+		break;
+
+	case 'o':
+		ret = vfs_parse_fs_option(fc, data, len - 2);
+		if (ret < 0)
+			goto err_unlock;
+		break;
+
+	case 'x':
+		if (strcmp(data, "create") == 0) {
+			ret = vfs_get_tree(fc);
+		} else {
+			ret = -EOPNOTSUPP;
+		}
+		if (ret < 0)
+			goto err_unlock;
+		break;
+
+	default:
+		goto err_unlock;
+	}
+
+	ret = len;
+err_unlock:
+	inode_unlock(inode);
+err_free:
+	kfree(data);
+	return ret;
+err_bad_cmd:
+	return -EINVAL;
+}
+
+const struct file_operations fscontext_fs_fops = {
+	.write		= fscontext_fs_write,
+	.release	= fscontext_fs_release,
+	.llseek		= no_llseek,
+};
+
+/*
+ * Indicate the name we want to display the filesystem file as.
+ */
+static char *fscontext_fs_dname(struct dentry *dentry, char *buffer, int buflen)
+{
+	return dynamic_dname(dentry, buffer, buflen, "fs:[%lu]",
+			     d_inode(dentry)->i_ino);
+}
+
+static const struct dentry_operations fscontext_fs_dentry_operations = {
+	.d_dname	= fscontext_fs_dname,
+};
+
+/*
+ * Create a file that can be used to configure a new mount.
+ */
+static struct file *create_fscontext_file(struct fs_context *fc)
+{
+	struct inode *inode;
+	struct file *f;
+	struct path path;
+	int ret;
+
+	inode = alloc_anon_inode(fscontext_fs_mnt->mnt_sb);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+	inode->i_fop = &fscontext_fs_fops;
+
+	ret = -ENOMEM;
+	path.dentry = d_alloc_pseudo(fscontext_fs_mnt->mnt_sb, &empty_name);
+	if (!path.dentry)
+		goto err_inode;
+	path.mnt = mntget(fscontext_fs_mnt);
+
+	d_instantiate(path.dentry, inode);
+
+	f = alloc_file(&path, FMODE_READ | FMODE_WRITE, &fscontext_fs_fops);
+	if (IS_ERR(f)) {
+		ret = PTR_ERR(f);
+		goto err_file;
+	}
+
+	f->private_data = fc;
+	return f;
+
+err_file:
+	path_put(&path);
+	return ERR_PTR(ret);
+
+err_inode:
+	iput(inode);
+	return ERR_PTR(ret);
+}
+
+static const struct super_operations fscontext_fs_super_ops = {
+	.drop_inode	= generic_delete_inode,
+	.destroy_inode	= free_inode_nonrcu,
+	.statfs		= simple_statfs,
+};
+
+/*
+ * Finish filling in the superblock and allocate the root dentry.
+ */
+static int fscontext_fs_fill_super(struct super_block *sb,
+				   struct fs_context *fc)
+{
+	struct dentry *root;
+	struct inode *inode;
+
+	sb->s_op = &fscontext_fs_super_ops;
+	inode = alloc_anon_inode(sb);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+	inode->i_fop = &fscontext_fs_fops;
+
+	root = d_make_root(inode);
+	if (!root)
+		return -ENOMEM; /* inode is put by d_make_root() */
+	sb->s_root = root;
+	return 0;
+}
+
+static int fscontext_fs_get_tree(struct fs_context *fc)
+{
+	return vfs_get_super(fc, vfs_get_single_super, fscontext_fs_fill_super);
+}
+
+static const struct fs_context_operations fscontext_fs_context_ops = {
+	.get_tree	= fscontext_fs_get_tree,
+};
+
+static int fs_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	fc->ops = &fscontext_fs_context_ops;
+	return 0;
+}
+
+static struct file_system_type fscontext_fs_type = {
+	.name			= "fscontext",
+	.fs_context_size	= sizeof(struct fs_context),
+	.init_fs_context	= fs_init_fs_context,
+	.kill_sb		= kill_anon_super,
+};
+
+static int __init init_fscontext_fs(void)
+{
+	int ret;
+
+	ret = register_filesystem(&fscontext_fs_type);
+	if (ret < 0)
+		panic("Cannot register fscontext_fs\n");
+
+	fscontext_fs_mnt = kern_mount(&fscontext_fs_type);
+	if (IS_ERR(fscontext_fs_mnt))
+		panic("Cannot mount fscontext_fs: %ld\n",
+		      PTR_ERR(fscontext_fs_mnt));
+	return 0;
+}
+
+fs_initcall(init_fscontext_fs);
+
+/*
+ * Open a filesystem by name so that it can be configured for mounting.
+ *
+ * We are allowed to specify a container in which the filesystem will be
+ * opened, thereby indicating which namespaces will be used (notably, which
+ * network namespace will be used for network filesystems).
+ */
+SYSCALL_DEFINE5(fsopen, const char __user *, _fs_name, unsigned int, flags,
+		void *, reserved3, void *, reserved4, void *, reserved5)
+{
+	struct file_system_type *fs_type;
+	struct fs_context *fc;
+	struct file *file;
+	const char *fs_name;
+	int fd, ret;
+
+	if (flags & ~O_CLOEXEC || reserved3 || reserved4 || reserved5)
+		return -EINVAL;
+
+	fs_name = strndup_user(_fs_name, PAGE_SIZE);
+	if (IS_ERR(fs_name))
+		return PTR_ERR(fs_name);
+
+	fs_type = get_fs_type(fs_name);
+	kfree(fs_name);
+	if (!fs_type)
+		return -ENODEV;
+
+	fc = vfs_new_fs_context(fs_type, NULL, 0, FS_CONTEXT_FOR_USER_MOUNT);
+	put_filesystem(fs_type);
+	if (IS_ERR(fc))
+		return PTR_ERR(fc);
+
+	ret = -ENOTSUPP;
+	if (!fc->ops)
+		goto err_fc;
+
+	file = create_fscontext_file(fc);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto err_fc;
+	}
+
+	ret = get_unused_fd_flags(flags & O_CLOEXEC);
+	if (ret < 0)
+		goto err_file;
+
+	fd = ret;
+	fd_install(fd, file);
+	return fd;
+
+err_file:
+	fput(file);
+	return ret;
+
+err_fc:
+	put_fs_context(fc);
+	return ret;
+}
diff --git a/fs/super.c b/fs/super.c
index a27487e34ea4..1e2942f81bc9 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1270,8 +1270,7 @@ int vfs_get_super(struct fs_context *fc,
 		return PTR_ERR(sb);
 
 	if (!sb->s_root) {
-		int err;
-		err = fill_super(sb, fc);
+		int err = fill_super(sb, fc);
 		if (err) {
 			deactivate_locked_super(sb);
 			return err;
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index 1914eef0a88f..536ae7d60f1f 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -102,4 +102,5 @@ extern int vfs_get_super(struct fs_context *fc,
 			 int (*fill_super)(struct super_block *sb,
 					   struct fs_context *fc));
 
+extern const struct file_operations fs_fs_fops;
 #endif /* _LINUX_FS_CONTEXT_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 70fcda1a9049..3c9b10e92015 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -890,6 +890,8 @@ asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
 asmlinkage long sys_pkey_free(int pkey);
 asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
+asmlinkage long sys_fsopen(const char *fs_name, unsigned int flags,
+			   void *reserved3, void *reserved4, void *reserved5);
 
 
 /*
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 1a6fee974116..2fe02277fb32 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -89,5 +89,6 @@
 #define UDF_SUPER_MAGIC		0x15013346
 #define BALLOON_KVM_MAGIC	0x13661366
 #define ZSMALLOC_MAGIC		0x58295829
+#define FSCONTEXT_FS_MAGIC	0x66736673
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 9791364925dc..c113fc9d5e77 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -430,3 +430,6 @@ COND_SYSCALL(setresgid16);
 COND_SYSCALL(setresuid16);
 COND_SYSCALL(setreuid16);
 COND_SYSCALL(setuid16);
+
+/* fd-based mount */
+COND_SYSCALL(sys_fsopen);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 19/24] VFS: Implement fsmount() to effect a pre-configured mount [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (17 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 18/24] VFS: Implement fsopen() to prepare for a mount " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 20/24] afs: Fix server record deletion " David Howells
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Provide a system call by which a filesystem opened with fsopen() and
configured by a series of writes can be mounted:

	int ret = fsmount(int fsfd, int dfd, const char *path,
			  unsigned int at_flags, unsigned int flags);

where fsfd is the fd returned by fsopen(), dfd, path and at_flags locate
the mountpoint and flags are the applicable MS_* flags.  dfd can be
AT_FDCWD or an fd open to a directory.

In the event that fsmount() fails, it may be possible to get an error
message by calling read().  If no message is available, ENODATA will be
reported.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/namespace.c                         |   82 ++++++++++++++++++++++++++++++++
 include/linux/fs_context.h             |    2 -
 include/linux/syscalls.h               |    2 +
 kernel/sys_ni.c                        |    1 
 6 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d02346692c3f..5efec45b5ecb 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -397,3 +397,4 @@
 383	i386	statx			sys_statx			__ia32_sys_statx
 384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
 385	i386	fsopen			sys_fsopen			__ia32_sys_fsopen
+386	i386	fsmount			sys_fsmount			__ia32_sys_fsmount
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 6708847571e2..f602389e1406 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -342,6 +342,7 @@
 331	common	pkey_free		__x64_sys_pkey_free
 332	common	statx			__x64_sys_statx
 333	common	fsopen			__x64_sys_fsopen
+334	common	fsmount			__x64_sys_fsmount
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/namespace.c b/fs/namespace.c
index dff482ad87b4..14d110901bbe 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3192,6 +3192,88 @@ struct vfsmount *kern_mount(struct file_system_type *type)
 }
 EXPORT_SYMBOL_GPL(kern_mount);
 
+/*
+ * Mount a new, prepared superblock (specified by fs_fd) on the location
+ * specified by dfd and dir_name.  dfd can be AT_FDCWD, a dir fd or a container
+ * fd.  This cannot be used for binding, moving or remounting mounts.
+ */
+SYSCALL_DEFINE5(fsmount, int, fs_fd, int, dfd, const char __user *, dir_name,
+		unsigned int, at_flags, unsigned int, flags)
+{
+	struct fs_context *fc;
+	struct path mountpoint;
+	struct fd f;
+	unsigned int lookup_flags, mnt_flags = 0;
+	long ret;
+
+	if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
+			  AT_EMPTY_PATH)) != 0)
+		return -EINVAL;
+
+	if (flags & ~(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC |
+		      MS_NOATIME | MS_NODIRATIME | MS_RELATIME | MS_STRICTATIME))
+		return -EINVAL;
+
+	if (flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
+	if (flags & MS_NOSUID)
+		mnt_flags |= MNT_NOSUID;
+	if (flags & MS_NODEV)
+		mnt_flags |= MNT_NODEV;
+	if (flags & MS_NOEXEC)
+		mnt_flags |= MNT_NOEXEC;
+	if (flags & MS_NODIRATIME)
+		mnt_flags |= MNT_NODIRATIME;
+
+	if (flags & MS_STRICTATIME) {
+		if (flags & MS_NOATIME)
+			return -EINVAL;
+	} else if (flags & MS_NOATIME) {
+		mnt_flags |= MNT_NOATIME;
+	} else {
+		mnt_flags |= MNT_RELATIME;
+	}
+
+	f = fdget(fs_fd);
+	if (!f.file)
+		return -EBADF;
+
+	ret = -EINVAL;
+	if (f.file->f_op != &fscontext_fs_fops)
+		goto err_fsfd;
+
+	fc = f.file->private_data;
+
+	ret = -EPERM;
+	if (!may_mount() ||
+	    ((fc->sb_flags & MS_MANDLOCK) && !may_mandlock()))
+		goto err_fsfd;
+
+	/* There must be a valid superblock or we can't mount it */
+	ret = -EINVAL;
+	if (!fc->root)
+		goto err_fsfd;
+
+	/* Find the mountpoint.  A container can be specified in dfd. */
+	lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+	if (at_flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (at_flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (at_flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+	ret = user_path_at(dfd, dir_name, lookup_flags, &mountpoint);
+	if (ret < 0)
+		goto err_fsfd;
+
+	ret = do_new_mount_fc(fc, &mountpoint, mnt_flags);
+
+	path_put(&mountpoint);
+err_fsfd:
+	fdput(f);
+	return ret;
+}
+
 /*
  * Return true if path is reachable from root
  *
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index 536ae7d60f1f..dd79acddabec 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -102,5 +102,5 @@ extern int vfs_get_super(struct fs_context *fc,
 			 int (*fill_super)(struct super_block *sb,
 					   struct fs_context *fc));
 
-extern const struct file_operations fs_fs_fops;
+extern const struct file_operations fscontext_fs_fops;
 #endif /* _LINUX_FS_CONTEXT_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3c9b10e92015..e5f68788a096 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -892,6 +892,8 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_fsopen(const char *fs_name, unsigned int flags,
 			   void *reserved3, void *reserved4, void *reserved5);
+asmlinkage long sys_fsmount(int fsfd, int dfd, const char *path, unsigned int at_flags,
+			    unsigned int flags);
 
 
 /*
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c113fc9d5e77..2c236aad9b80 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -433,3 +433,4 @@ COND_SYSCALL(setuid16);
 
 /* fd-based mount */
 COND_SYSCALL(sys_fsopen);
+COND_SYSCALL(sys_fsmount);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 20/24] afs: Fix server record deletion [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (18 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 19/24] VFS: Implement fsmount() to effect a pre-configured " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 21/24] net: Export get_proc_net() " David Howells
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

AFS server records get removed from the net->fs_servers tree when they're
deleted, but not from the net->fs_addresses{4,6} lists, which can lead to
an oops in afs_find_server() when a server record has been removed, for
instance during rmmod.

Fix this by deleting the record from the by-address lists before posting it
for RCU destruction.

The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record alive,
so the oops would only happen when a fileserver eventually gets bored and
stops pinging or if the module gets rmmod'd and a call comes in from the
fileserver during the window between the server records being destroyed and
the socket being closed.

The oops looks something like:

BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
...
Workqueue: kafsd afs_process_async_call [kafs]
RIP: 0010:afs_find_server+0x271/0x36f [kafs]
...
Call Trace:
 ? worker_thread+0x230/0x2ac
 ? worker_thread+0x230/0x2ac
 afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
 afs_deliver_to_call+0x1ee/0x5e8 [kafs]
 ? worker_thread+0x230/0x2ac
 afs_process_async_call+0x5b/0xd0 [kafs]
 process_one_work+0x2c2/0x504
 ? worker_thread+0x230/0x2ac
 worker_thread+0x1d4/0x2ac
 ? rescuer_thread+0x29b/0x29b
 kthread+0x11f/0x127
 ? kthread_create_on_node+0x3f/0x3f
 ret_from_fork+0x24/0x30

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/server.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/afs/server.c b/fs/afs/server.c
index e23be63998a8..629c74986cff 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -428,8 +428,15 @@ static void afs_gc_servers(struct afs_net *net, struct afs_server *gc_list)
 		}
 		write_sequnlock(&net->fs_lock);
 
-		if (deleted)
+		if (deleted) {
+			write_seqlock(&net->fs_addr_lock);
+			if (!hlist_unhashed(&server->addr4_link))
+				hlist_del_rcu(&server->addr4_link);
+			if (!hlist_unhashed(&server->addr6_link))
+				hlist_del_rcu(&server->addr6_link);
+			write_sequnlock(&net->fs_addr_lock);
 			afs_destroy_server(net, server);
+		}
 	}
 }
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 21/24] net: Export get_proc_net() [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (19 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 20/24] afs: Fix server record deletion " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 22/24] afs: Add fs_context support " David Howells
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Export get_proc_net() so that write() routines attached to net-namespaced
proc files can find the network namespace that they're in.  Currently, this
is only accessible via seqfile routines.

This will permit AFS to have a separate cell database per-network namespace
and to manage each one independently of the others.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/proc/proc_net.c      |    3 ++-
 include/linux/proc_fs.h |    2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 1763f370489d..23d56146e38f 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -33,10 +33,11 @@ static inline struct net *PDE_NET(struct proc_dir_entry *pde)
 	return pde->parent->data;
 }
 
-static struct net *get_proc_net(const struct inode *inode)
+struct net *get_proc_net(const struct inode *inode)
 {
 	return maybe_get_net(PDE_NET(PDE(inode)));
 }
+EXPORT_SYMBOL_GPL(get_proc_net);
 
 int seq_open_net(struct inode *ino, struct file *f,
 		 const struct seq_operations *ops, int size)
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 928ef9e4d912..47a87121d30c 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -79,6 +79,8 @@ static inline struct proc_dir_entry *proc_net_mkdir(
 	return proc_mkdir_data(name, 0, parent, net);
 }
 
+extern struct net *get_proc_net(const struct inode *inode);
+
 struct ns_common;
 int open_related_ns(struct ns_common *ns,
 		   struct ns_common *(*get_ns)(struct ns_common *ns));

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 22/24] afs: Add fs_context support [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (20 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 21/24] net: Export get_proc_net() " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 23/24] afs: Implement namespacing " David Howells
  2018-04-19 13:33 ` [PATCH 24/24] afs: Use fs_context to pass parameters over automount " David Howells
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Add fs_context support to the AFS filesystem, converting the parameter
parsing to store options there.

This will form the basis for namespace propagation over mountpoints within
the AFS model, thereby allowing AFS to be used in containers more easily.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/internal.h |    9 +
 fs/afs/super.c    |  426 +++++++++++++++++++++++++++++------------------------
 fs/afs/volume.c   |    4 
 3 files changed, 242 insertions(+), 197 deletions(-)

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f8086ec95e24..0266730b3ad7 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -32,13 +32,16 @@
 struct pagevec;
 struct afs_call;
 
-struct afs_mount_params {
+struct afs_fs_context {
+	struct fs_context	fc;
+	struct afs_super_info	*as;
 	bool			rwpath;		/* T if the parent should be considered R/W */
 	bool			force;		/* T to force cell type */
 	bool			autocell;	/* T if set auto mount operation */
 	bool			dyn_root;	/* T if dynamic root */
+	bool			no_cell;	/* T if the source is "none" (for dynroot) */
 	afs_voltype_t		type;		/* type of volume requested */
-	int			volnamesz;	/* size of volume name */
+	unsigned int		volnamesz;	/* size of volume name */
 	const char		*volname;	/* name of volume to mount */
 	struct afs_net		*net;		/* Network namespace in effect */
 	struct afs_cell		*cell;		/* cell in which to find volume */
@@ -1007,7 +1010,7 @@ static inline struct afs_volume *__afs_get_volume(struct afs_volume *volume)
 	return volume;
 }
 
-extern struct afs_volume *afs_create_volume(struct afs_mount_params *);
+extern struct afs_volume *afs_create_volume(struct afs_fs_context *);
 extern void afs_activate_volume(struct afs_volume *);
 extern void afs_deactivate_volume(struct afs_volume *);
 extern void afs_put_volume(struct afs_cell *, struct afs_volume *);
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 7d17d01ca0cd..6ab0b79e061e 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -30,22 +30,21 @@
 #include "internal.h"
 
 static void afs_i_init_once(void *foo);
-static struct dentry *afs_mount(struct file_system_type *fs_type,
-				int flags, const char *dev_name,
-				void *data, size_t data_size);
 static void afs_kill_super(struct super_block *sb);
 static struct inode *afs_alloc_inode(struct super_block *sb);
 static void afs_destroy_inode(struct inode *inode);
 static int afs_statfs(struct dentry *dentry, struct kstatfs *buf);
 static int afs_show_devname(struct seq_file *m, struct dentry *root);
 static int afs_show_options(struct seq_file *m, struct dentry *root);
+static int afs_init_fs_context(struct fs_context *fc, struct super_block *src_sb);
 
 struct file_system_type afs_fs_type = {
-	.owner		= THIS_MODULE,
-	.name		= "afs",
-	.mount		= afs_mount,
-	.kill_sb	= afs_kill_super,
-	.fs_flags	= 0,
+	.owner			= THIS_MODULE,
+	.name			= "afs",
+	.fs_context_size	= sizeof(struct afs_fs_context),
+	.init_fs_context	= afs_init_fs_context,
+	.kill_sb		= afs_kill_super,
+	.fs_flags		= 0,
 };
 MODULE_ALIAS_FS("afs");
 
@@ -189,61 +188,53 @@ static int afs_show_options(struct seq_file *m, struct dentry *root)
 }
 
 /*
- * parse the mount options
- * - this function has been shamelessly adapted from the ext3 fs which
- *   shamelessly adapted it from the msdos fs
+ * Parse an single mount option.
  */
-static int afs_parse_options(struct afs_mount_params *params,
-			     char *options, const char **devname)
+static int afs_parse_option(struct fs_context *fc, char *opt, size_t len)
 {
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 	struct afs_cell *cell;
 	substring_t args[MAX_OPT_ARGS];
-	char *p;
-	int token;
-
-	_enter("%s", options);
-
-	options[PAGE_SIZE - 1] = 0;
-
-	while ((p = strsep(&options, ","))) {
-		if (!*p)
-			continue;
-
-		token = match_token(p, afs_options_list, args);
-		switch (token) {
-		case afs_opt_cell:
-			rcu_read_lock();
-			cell = afs_lookup_cell_rcu(params->net,
-						   args[0].from,
-						   args[0].to - args[0].from);
-			rcu_read_unlock();
-			if (IS_ERR(cell))
-				return PTR_ERR(cell);
-			afs_put_cell(params->net, params->cell);
-			params->cell = cell;
-			break;
-
-		case afs_opt_rwpath:
-			params->rwpath = true;
-			break;
-
-		case afs_opt_vol:
-			*devname = args[0].from;
-			break;
-
-		case afs_opt_autocell:
-			params->autocell = true;
-			break;
-
-		case afs_opt_dyn:
-			params->dyn_root = true;
-			break;
-
-		default:
-			printk(KERN_ERR "kAFS:"
-			       " Unknown or invalid mount option: '%s'\n", p);
+	int token, size;
+
+	_enter("%s", opt);
+
+	token = match_token(opt, afs_options_list, args);
+	switch (token) {
+	case afs_opt_cell:
+		size = args[0].to - args[0].from;
+		if (size <= 0)
 			return -EINVAL;
-		}
+		if (size > AFS_MAXCELLNAME)
+			return -ENAMETOOLONG;
+
+		rcu_read_lock();
+		cell = afs_lookup_cell_rcu(ctx->net, args[0].from, size);
+		rcu_read_unlock();
+		if (IS_ERR(cell))
+			return PTR_ERR(cell);
+		afs_put_cell(ctx->net, ctx->cell);
+		ctx->cell = cell;
+		break;
+
+	case afs_opt_rwpath:
+		ctx->rwpath = true;
+		break;
+
+	case afs_opt_vol:
+		return -EINVAL; /* Not required for automount */
+
+	case afs_opt_autocell:
+		ctx->autocell = true;
+		break;
+
+	case afs_opt_dyn:
+		ctx->dyn_root = true;
+		break;
+
+	default:
+		printk(KERN_ERR "kAFS: Unknown or invalid mount option: '%s'\n", opt);
+		return -EINVAL;
 	}
 
 	_leave(" = 0");
@@ -251,9 +242,10 @@ static int afs_parse_options(struct afs_mount_params *params,
 }
 
 /*
- * parse a device name to get cell name, volume name, volume type and R/W
- * selector
- * - this can be one of the following:
+ * Parse the source name to get cell name, volume name, volume type and R/W
+ * selector.
+ *
+ * This can be one of the following:
  *	"%[cell:]volume[.]"		R/W volume
  *	"#[cell:]volume[.]"		R/O or R/W volume (rwpath=0),
  *					 or R/W (rwpath=1) volume
@@ -262,11 +254,11 @@ static int afs_parse_options(struct afs_mount_params *params,
  *	"%[cell:]volume.backup"		Backup volume
  *	"#[cell:]volume.backup"		Backup volume
  */
-static int afs_parse_device_name(struct afs_mount_params *params,
-				 const char *name)
+static int afs_parse_source(struct fs_context *fc)
 {
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 	struct afs_cell *cell;
-	const char *cellname, *suffix;
+	const char *cellname, *suffix, *name = fc->source;
 	int cellnamesz;
 
 	_enter(",%s", name);
@@ -277,69 +269,116 @@ static int afs_parse_device_name(struct afs_mount_params *params,
 	}
 
 	if ((name[0] != '%' && name[0] != '#') || !name[1]) {
+		/* To use dynroot, we don't want to have to provide a source */
+		if (strcmp(name, "none") == 0) {
+			ctx->no_cell = true;
+			return 0;
+		}
 		printk(KERN_ERR "kAFS: unparsable volume name\n");
 		return -EINVAL;
 	}
 
 	/* determine the type of volume we're looking for */
-	params->type = AFSVL_ROVOL;
-	params->force = false;
-	if (params->rwpath || name[0] == '%') {
-		params->type = AFSVL_RWVOL;
-		params->force = true;
+	ctx->type = AFSVL_ROVOL;
+	ctx->force = false;
+	if (ctx->rwpath || name[0] == '%') {
+		ctx->type = AFSVL_RWVOL;
+		ctx->force = true;
 	}
 	name++;
 
 	/* split the cell name out if there is one */
-	params->volname = strchr(name, ':');
-	if (params->volname) {
+	ctx->volname = strchr(name, ':');
+	if (ctx->volname) {
 		cellname = name;
-		cellnamesz = params->volname - name;
-		params->volname++;
+		cellnamesz = ctx->volname - name;
+		ctx->volname++;
 	} else {
-		params->volname = name;
+		ctx->volname = name;
 		cellname = NULL;
 		cellnamesz = 0;
 	}
 
 	/* the volume type is further affected by a possible suffix */
-	suffix = strrchr(params->volname, '.');
+	suffix = strrchr(ctx->volname, '.');
 	if (suffix) {
 		if (strcmp(suffix, ".readonly") == 0) {
-			params->type = AFSVL_ROVOL;
-			params->force = true;
+			ctx->type = AFSVL_ROVOL;
+			ctx->force = true;
 		} else if (strcmp(suffix, ".backup") == 0) {
-			params->type = AFSVL_BACKVOL;
-			params->force = true;
+			ctx->type = AFSVL_BACKVOL;
+			ctx->force = true;
 		} else if (suffix[1] == 0) {
 		} else {
 			suffix = NULL;
 		}
 	}
 
-	params->volnamesz = suffix ?
-		suffix - params->volname : strlen(params->volname);
+	ctx->volnamesz = suffix ?
+		suffix - ctx->volname : strlen(ctx->volname);
 
 	_debug("cell %*.*s [%p]",
-	       cellnamesz, cellnamesz, cellname ?: "", params->cell);
+	       cellnamesz, cellnamesz, cellname ?: "", ctx->cell);
 
 	/* lookup the cell record */
-	if (cellname || !params->cell) {
-		cell = afs_lookup_cell(params->net, cellname, cellnamesz,
+	if (cellname) {
+		cell = afs_lookup_cell(ctx->net, cellname, cellnamesz,
 				       NULL, false);
 		if (IS_ERR(cell)) {
-			printk(KERN_ERR "kAFS: unable to lookup cell '%*.*s'\n",
+			pr_err("kAFS: unable to lookup cell '%*.*s'\n",
 			       cellnamesz, cellnamesz, cellname ?: "");
 			return PTR_ERR(cell);
 		}
-		afs_put_cell(params->net, params->cell);
-		params->cell = cell;
+		afs_put_cell(ctx->net, ctx->cell);
+		ctx->cell = cell;
 	}
 
 	_debug("CELL:%s [%p] VOLUME:%*.*s SUFFIX:%s TYPE:%d%s",
-	       params->cell->name, params->cell,
-	       params->volnamesz, params->volnamesz, params->volname,
-	       suffix ?: "-", params->type, params->force ? " FORCE" : "");
+	       ctx->cell->name, ctx->cell,
+	       ctx->volnamesz, ctx->volnamesz, ctx->volname,
+	       suffix ?: "-", ctx->type, ctx->force ? " FORCE" : "");
+
+	return 0;
+}
+
+/*
+ * Validate the options, get the cell key and look up the volume.
+ */
+static int afs_validate_fc(struct fs_context *fc)
+{
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
+	struct afs_volume *volume;
+	struct key *key;
+
+	if (!ctx->dyn_root) {
+		if (ctx->no_cell) {
+			pr_warn("kAFS: Can only specify source 'none' with -o dyn\n");
+			return -EINVAL;
+		}
+
+		if (!ctx->cell) {
+			pr_warn("kAFS: No cell specified\n");
+			return -EDESTADDRREQ;
+		}
+
+		/* We try to do the mount securely. */
+		key = afs_request_key(ctx->cell);
+		if (IS_ERR(key))
+			return PTR_ERR(key);
+
+		ctx->key = key;
+
+		if (ctx->volume) {
+			afs_put_volume(ctx->cell, ctx->volume);
+			ctx->volume = NULL;
+		}
+
+		volume = afs_create_volume(ctx);
+		if (IS_ERR(volume))
+			return PTR_ERR(volume);
+
+		ctx->volume = volume;
+	}
 
 	return 0;
 }
@@ -347,34 +386,34 @@ static int afs_parse_device_name(struct afs_mount_params *params,
 /*
  * check a superblock to see if it's the one we're looking for
  */
-static int afs_test_super(struct super_block *sb, void *data)
+static int afs_test_super(struct super_block *sb, struct fs_context *fc)
 {
-	struct afs_super_info *as1 = data;
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 	struct afs_super_info *as = AFS_FS_S(sb);
 
-	return (as->net == as1->net &&
+	return (as->net == ctx->net &&
 		as->volume &&
-		as->volume->vid == as1->volume->vid);
+		as->volume->vid == ctx->volume->vid);
 }
 
-static int afs_dynroot_test_super(struct super_block *sb, void *data)
+static int afs_dynroot_test_super(struct super_block *sb, struct fs_context *fc)
 {
 	return false;
 }
 
-static int afs_set_super(struct super_block *sb, void *data)
+static int afs_set_super(struct super_block *sb, struct fs_context *fc)
 {
-	struct afs_super_info *as = data;
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 
-	sb->s_fs_info = as;
+	sb->s_fs_info = ctx->as;
+	ctx->as = NULL;
 	return set_anon_super(sb, NULL);
 }
 
 /*
  * fill in the superblock
  */
-static int afs_fill_super(struct super_block *sb,
-			  struct afs_mount_params *params)
+static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
 {
 	struct afs_super_info *as = AFS_FS_S(sb);
 	struct afs_fid fid;
@@ -405,13 +444,13 @@ static int afs_fill_super(struct super_block *sb,
 		fid.vid		= as->volume->vid;
 		fid.vnode	= 1;
 		fid.unique	= 1;
-		inode = afs_iget(sb, params->key, &fid, NULL, NULL, NULL);
+		inode = afs_iget(sb, ctx->key, &fid, NULL, NULL, NULL);
 	}
 
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
-	if (params->autocell || params->dyn_root)
+	if (ctx->autocell || as->dyn_root)
 		set_bit(AFS_VNODE_AUTOCELL, &AFS_FS_I(inode)->flags);
 
 	ret = -ENOMEM;
@@ -419,7 +458,7 @@ static int afs_fill_super(struct super_block *sb,
 	if (!sb->s_root)
 		goto error;
 
-	if (params->dyn_root)
+	if (as->dyn_root)
 		sb->s_d_op = &afs_dynroot_dentry_operations;
 	else
 		sb->s_d_op = &afs_fs_dentry_operations;
@@ -432,17 +471,19 @@ static int afs_fill_super(struct super_block *sb,
 	return ret;
 }
 
-static struct afs_super_info *afs_alloc_sbi(struct afs_mount_params *params)
+static struct afs_super_info *afs_alloc_sbi(struct afs_fs_context *ctx)
 {
 	struct afs_super_info *as;
 
 	as = kzalloc(sizeof(struct afs_super_info), GFP_KERNEL);
 	if (as) {
-		as->net = afs_get_net(params->net);
-		if (params->dyn_root)
+		as->net = afs_get_net(ctx->net);
+		if (ctx->dyn_root) {
 			as->dyn_root = true;
-		else
-			as->cell = afs_get_cell(params->cell);
+		} else {
+			as->cell = afs_get_cell(ctx->cell);
+			as->volume = __afs_get_volume(ctx->volume);
+		}
 	}
 	return as;
 }
@@ -457,127 +498,128 @@ static void afs_destroy_sbi(struct afs_super_info *as)
 	}
 }
 
+static void afs_kill_super(struct super_block *sb)
+{
+	struct afs_super_info *as = AFS_FS_S(sb);
+
+	/* Clear the callback interests (which will do ilookup5) before
+	 * deactivating the superblock.
+	 */
+	if (as->volume)
+		afs_clear_callback_interests(as->net, as->volume->servers);
+	kill_anon_super(sb);
+	if (as->volume)
+		afs_deactivate_volume(as->volume);
+	afs_destroy_sbi(as);
+}
+
 /*
- * get an AFS superblock
+ * Get an AFS superblock and root directory.
  */
-static struct dentry *afs_mount(struct file_system_type *fs_type,
-				int flags, const char *dev_name,
-				void *options, size_t data_size)
+static int afs_get_tree(struct fs_context *fc)
 {
-	struct afs_mount_params params;
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 	struct super_block *sb;
-	struct afs_volume *candidate;
-	struct key *key;
 	struct afs_super_info *as;
 	int ret;
 
-	_enter(",,%s,%p", dev_name, options);
-
-	memset(&params, 0, sizeof(params));
-	params.net = &__afs_net;
-
-	ret = -EINVAL;
-	if (current->nsproxy->net_ns != &init_net)
-		goto error;
-
-	/* parse the options and device name */
-	if (options) {
-		ret = afs_parse_options(&params, options, &dev_name);
-		if (ret < 0)
-			goto error;
-	}
-
-	if (!params.dyn_root) {
-		ret = afs_parse_device_name(&params, dev_name);
-		if (ret < 0)
-			goto error;
-
-		/* try and do the mount securely */
-		key = afs_request_key(params.cell);
-		if (IS_ERR(key)) {
-			_leave(" = %ld [key]", PTR_ERR(key));
-			ret = PTR_ERR(key);
-			goto error;
-		}
-		params.key = key;
-	}
+	_enter("%s", fc->source);
 
 	/* allocate a superblock info record */
-	ret = -ENOMEM;
-	as = afs_alloc_sbi(&params);
-	if (!as)
-		goto error_key;
-
-	if (!params.dyn_root) {
-		/* Assume we're going to need a volume record; at the very
-		 * least we can use it to update the volume record if we have
-		 * one already.  This checks that the volume exists within the
-		 * cell.
-		 */
-		candidate = afs_create_volume(&params);
-		if (IS_ERR(candidate)) {
-			ret = PTR_ERR(candidate);
-			goto error_as;
-		}
-
-		as->volume = candidate;
+	as = ctx->as;
+	if (!as) {
+		ret = -ENOMEM;
+		as = afs_alloc_sbi(ctx);
+		if (!as)
+			goto error;
+		ctx->as = as;
 	}
 
 	/* allocate a deviceless superblock */
-	sb = sget(fs_type,
-		  as->dyn_root ? afs_dynroot_test_super : afs_test_super,
-		  afs_set_super, flags, as);
+	sb = sget_fc(fc,
+		     as->dyn_root ? afs_dynroot_test_super : afs_test_super,
+		     afs_set_super);
 	if (IS_ERR(sb)) {
 		ret = PTR_ERR(sb);
-		goto error_as;
+		goto error;
 	}
 
 	if (!sb->s_root) {
 		/* initial superblock/root creation */
 		_debug("create");
-		ret = afs_fill_super(sb, &params);
+		ret = afs_fill_super(sb, ctx);
 		if (ret < 0)
 			goto error_sb;
-		as = NULL;
 		sb->s_flags |= SB_ACTIVE;
 	} else {
 		_debug("reuse");
 		ASSERTCMP(sb->s_flags, &, SB_ACTIVE);
-		afs_destroy_sbi(as);
-		as = NULL;
 	}
 
-	afs_put_cell(params.net, params.cell);
-	key_put(params.key);
+	ctx->fc.root = dget(sb->s_root);
+	ctx->fc.drop_sb = true;
 	_leave(" = 0 [%p]", sb);
-	return dget(sb->s_root);
+	return 0;
 
 error_sb:
 	deactivate_locked_super(sb);
-	goto error_key;
-error_as:
-	afs_destroy_sbi(as);
-error_key:
-	key_put(params.key);
 error:
-	afs_put_cell(params.net, params.cell);
 	_leave(" = %d", ret);
-	return ERR_PTR(ret);
+	return ret;
 }
 
-static void afs_kill_super(struct super_block *sb)
+static void afs_free_fc(struct fs_context *fc)
 {
-	struct afs_super_info *as = AFS_FS_S(sb);
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 
-	/* Clear the callback interests (which will do ilookup5) before
-	 * deactivating the superblock.
-	 */
-	if (as->volume)
-		afs_clear_callback_interests(as->net, as->volume->servers);
-	kill_anon_super(sb);
-	if (as->volume)
-		afs_deactivate_volume(as->volume);
-	afs_destroy_sbi(as);
+	afs_put_volume(ctx->cell, ctx->volume);
+	afs_put_cell(ctx->net, ctx->cell);
+	afs_put_net(ctx->net);
+	afs_destroy_sbi(ctx->as);
+	key_put(ctx->key);
+}
+
+static const struct fs_context_operations afs_context_ops = {
+	.free		= afs_free_fc,
+	.parse_source	= afs_parse_source,
+	.parse_option	= afs_parse_option,
+	.validate	= afs_validate_fc,
+	.get_tree	= afs_get_tree,
+};
+
+/*
+ * Set up the filesystem mount context.
+ */
+static int afs_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
+{
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
+	struct afs_super_info *src_as;
+	struct afs_cell *cell;
+
+	if (current->nsproxy->net_ns != &init_net)
+		return -EINVAL;
+
+	if (src_sb) {
+		src_as = AFS_FS_S(src_sb);
+		if (src_as) {
+			ctx->net    = afs_get_net(src_as->net);
+			ctx->cell   = afs_get_cell(src_as->cell);
+			ctx->volume = __afs_get_volume(src_as->volume);
+		}
+	} else {
+		ctx->net = afs_get_net(&__afs_net);
+
+		/* Default to the workstation cell. */
+		rcu_read_lock();
+		cell = afs_lookup_cell_rcu(ctx->net, NULL, 0);
+		rcu_read_unlock();
+		if (IS_ERR(cell))
+			cell = NULL;
+		ctx->cell = cell;
+	}
+
+	ctx->fc.ops = &afs_context_ops;
+	return 0;
 }
 
 /*
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 3037bd01f617..7adcddf02e66 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -21,7 +21,7 @@ static const char *const afs_voltypes[] = { "R/W", "R/O", "BAK" };
 /*
  * Allocate a volume record and load it up from a vldb record.
  */
-static struct afs_volume *afs_alloc_volume(struct afs_mount_params *params,
+static struct afs_volume *afs_alloc_volume(struct afs_fs_context *params,
 					   struct afs_vldb_entry *vldb,
 					   unsigned long type_mask)
 {
@@ -149,7 +149,7 @@ static struct afs_vldb_entry *afs_vl_lookup_vldb(struct afs_cell *cell,
  * - Rule 3: If parent volume is R/W, then only mount R/W volume unless
  *           explicitly told otherwise
  */
-struct afs_volume *afs_create_volume(struct afs_mount_params *params)
+struct afs_volume *afs_create_volume(struct afs_fs_context *params)
 {
 	struct afs_vldb_entry *vldb;
 	struct afs_volume *volume;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 23/24] afs: Implement namespacing [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (21 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 22/24] afs: Add fs_context support " David Howells
@ 2018-04-19 13:33 ` David Howells
  2018-04-19 13:33 ` [PATCH 24/24] afs: Use fs_context to pass parameters over automount " David Howells
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, linux-security-module,
	linux-fsdevel, linux-afs

Implement namespacing within AFS, but don't yet let mounts occur outside
the init namespace.  An additional patch will be required propagate the
network namespace across automounts.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/cell.c     |    4 +-
 fs/afs/internal.h |   36 ++++++++++++---------
 fs/afs/main.c     |   33 ++++++++++++++++----
 fs/afs/proc.c     |   89 +++++++++++++++++++++++++++++++++++------------------
 fs/afs/super.c    |   58 +++++++++++++++++++++++++----------
 5 files changed, 149 insertions(+), 71 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index fdf4c36cff79..a98a8a3d5544 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -528,7 +528,7 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
 					     NULL, 0,
 					     cell, 0, true);
 #endif
-	ret = afs_proc_cell_setup(net, cell);
+	ret = afs_proc_cell_setup(cell);
 	if (ret < 0)
 		return ret;
 	spin_lock(&net->proc_cells_lock);
@@ -544,7 +544,7 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
 {
 	_enter("%s", cell->name);
 
-	afs_proc_cell_remove(net, cell);
+	afs_proc_cell_remove(cell);
 
 	spin_lock(&net->proc_cells_lock);
 	list_del_init(&cell->proc_link);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 0266730b3ad7..a5161c0ae3ab 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -22,6 +22,8 @@
 #include <linux/backing-dev.h>
 #include <linux/uuid.h>
 #include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <net/sock.h>
 #include <net/af_rxrpc.h>
 
 #include "afs.h"
@@ -192,7 +194,7 @@ struct afs_read {
  * - there's one superblock per volume
  */
 struct afs_super_info {
-	struct afs_net		*net;		/* Network namespace */
+	struct net		*net_ns;	/* Network namespace */
 	struct afs_cell		*cell;		/* The cell in which the volume resides */
 	struct afs_volume	*volume;	/* volume record */
 	bool			dyn_root;	/* True if dynamic root */
@@ -221,6 +223,7 @@ struct afs_sysnames {
  * AFS network namespace record.
  */
 struct afs_net {
+	struct net		*net;		/* Backpointer to the owning net namespace */
 	struct afs_uuid		uuid;
 	bool			live;		/* F if this namespace is being removed */
 
@@ -283,7 +286,6 @@ struct afs_net {
 };
 
 extern const char afs_init_sysname[];
-extern struct afs_net __afs_net;// Dummy AFS network namespace; TODO: replace with real netns
 
 enum afs_cell_state {
 	AFS_CELL_UNSET,
@@ -790,34 +792,36 @@ extern int afs_drop_inode(struct inode *);
  * main.c
  */
 extern struct workqueue_struct *afs_wq;
+extern int afs_net_id;
 
-static inline struct afs_net *afs_d2net(struct dentry *dentry)
+static inline struct afs_net *afs_net(struct net *net)
 {
-	return &__afs_net;
+	return net_generic(net, afs_net_id);
 }
 
-static inline struct afs_net *afs_i2net(struct inode *inode)
+static inline struct afs_net *afs_sb2net(struct super_block *sb)
 {
-	return &__afs_net;
+	return afs_net(AFS_FS_S(sb)->net_ns);
 }
 
-static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
+static inline struct afs_net *afs_d2net(struct dentry *dentry)
 {
-	return &__afs_net;
+	return afs_sb2net(dentry->d_sb);
 }
 
-static inline struct afs_net *afs_sock2net(struct sock *sk)
+static inline struct afs_net *afs_i2net(struct inode *inode)
 {
-	return &__afs_net;
+	return afs_sb2net(inode->i_sb);
 }
 
-static inline struct afs_net *afs_get_net(struct afs_net *net)
+static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
 {
-	return net;
+	return afs_i2net(&vnode->vfs_inode);
 }
 
-static inline void afs_put_net(struct afs_net *net)
+static inline struct afs_net *afs_sock2net(struct sock *sk)
 {
+	return net_generic(sock_net(sk), afs_net_id);
 }
 
 static inline void __afs_stat(atomic_t *s)
@@ -852,8 +856,8 @@ extern int afs_get_ipv4_interfaces(struct afs_interface *, size_t, bool);
  */
 extern int __net_init afs_proc_init(struct afs_net *);
 extern void __net_exit afs_proc_cleanup(struct afs_net *);
-extern int afs_proc_cell_setup(struct afs_net *, struct afs_cell *);
-extern void afs_proc_cell_remove(struct afs_net *, struct afs_cell *);
+extern int afs_proc_cell_setup(struct afs_cell *);
+extern void afs_proc_cell_remove(struct afs_cell *);
 extern void afs_put_sysnames(struct afs_sysnames *);
 
 /*
@@ -986,7 +990,7 @@ extern bool afs_annotate_server_list(struct afs_server_list *, struct afs_server
  * super.c
  */
 extern int __init afs_fs_init(void);
-extern void __exit afs_fs_exit(void);
+extern void afs_fs_exit(void);
 
 /*
  * vlclient.c
diff --git a/fs/afs/main.c b/fs/afs/main.c
index d7560168b3bf..7d2c1354e2ca 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -15,6 +15,7 @@
 #include <linux/completion.h>
 #include <linux/sched.h>
 #include <linux/random.h>
+#include <linux/proc_fs.h>
 #define CREATE_TRACE_POINTS
 #include "internal.h"
 
@@ -32,7 +33,7 @@ module_param(rootcell, charp, 0);
 MODULE_PARM_DESC(rootcell, "root AFS cell name and VL server IP addr list");
 
 struct workqueue_struct *afs_wq;
-struct afs_net __afs_net;
+static struct proc_dir_entry *afs_proc_symlink;
 
 #if defined(CONFIG_ALPHA)
 const char afs_init_sysname[] = "alpha_linux26";
@@ -67,11 +68,13 @@ const char afs_init_sysname[] = "unknown_linux26";
 /*
  * Initialise an AFS network namespace record.
  */
-static int __net_init afs_net_init(struct afs_net *net)
+static int __net_init afs_net_init(struct net *net_ns)
 {
 	struct afs_sysnames *sysnames;
+	struct afs_net *net = afs_net(net_ns);
 	int ret;
 
+	net->net = net_ns;
 	net->live = true;
 	generate_random_uuid((unsigned char *)&net->uuid);
 
@@ -142,8 +145,10 @@ static int __net_init afs_net_init(struct afs_net *net)
 /*
  * Clean up and destroy an AFS network namespace record.
  */
-static void __net_exit afs_net_exit(struct afs_net *net)
+static void __net_exit afs_net_exit(struct net *net_ns)
 {
+	struct afs_net *net = afs_net(net_ns);
+
 	net->live = false;
 	afs_cell_purge(net);
 	afs_purge_servers(net);
@@ -152,6 +157,13 @@ static void __net_exit afs_net_exit(struct afs_net *net)
 	afs_put_sysnames(net->sysnames);
 }
 
+static struct pernet_operations afs_net_ops = {
+	.init	= afs_net_init,
+	.exit	= afs_net_exit,
+	.id	= &afs_net_id,
+	.size	= sizeof(struct afs_net),
+};
+
 /*
  * initialise the AFS client FS module
  */
@@ -178,7 +190,7 @@ static int __init afs_init(void)
 		goto error_cache;
 #endif
 
-	ret = afs_net_init(&__afs_net);
+	ret = register_pernet_subsys(&afs_net_ops);
 	if (ret < 0)
 		goto error_net;
 
@@ -187,10 +199,18 @@ static int __init afs_init(void)
 	if (ret < 0)
 		goto error_fs;
 
+	afs_proc_symlink = proc_symlink("fs/afs", NULL, "../self/net/afs");
+	if (IS_ERR(afs_proc_symlink)) {
+		ret = PTR_ERR(afs_proc_symlink);
+		goto error_proc;
+	}
+
 	return ret;
 
+error_proc:
+	afs_fs_exit();
 error_fs:
-	afs_net_exit(&__afs_net);
+	unregister_pernet_subsys(&afs_net_ops);
 error_net:
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_unregister_netfs(&afs_cache_netfs);
@@ -219,8 +239,9 @@ static void __exit afs_exit(void)
 {
 	printk(KERN_INFO "kAFS: Red Hat AFS client v0.1 unregistering.\n");
 
+	proc_remove(afs_proc_symlink);
 	afs_fs_exit();
-	afs_net_exit(&__afs_net);
+	unregister_pernet_subsys(&afs_net_ops);
 #ifdef CONFIG_AFS_FSCACHE
 	fscache_unregister_netfs(&afs_cache_netfs);
 #endif
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 839a22280606..cc7c48a5b743 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -17,14 +17,16 @@
 #include <linux/uaccess.h>
 #include "internal.h"
 
-static inline struct afs_net *afs_proc2net(struct file *f)
+static inline struct afs_net *afs_proc2net_get(struct file *f)
 {
-	return &__afs_net;
+	struct net *net_ns = get_proc_net(file_inode(f));
+
+	return net_ns ? afs_net(net_ns) : NULL;
 }
 
 static inline struct afs_net *afs_seq2net(struct seq_file *m)
 {
-	return &__afs_net; // TODO: use seq_file_net(m)
+	return afs_net(seq_file_net(m));
 }
 
 static int afs_proc_cells_open(struct inode *inode, struct file *file);
@@ -161,7 +163,7 @@ int afs_proc_init(struct afs_net *net)
 {
 	_enter("");
 
-	net->proc_afs = proc_mkdir("fs/afs", NULL);
+	net->proc_afs = proc_net_mkdir(net->net, "afs", net->net->proc_net);
 	if (!net->proc_afs)
 		goto error_dir;
 
@@ -196,16 +198,8 @@ void afs_proc_cleanup(struct afs_net *net)
  */
 static int afs_proc_cells_open(struct inode *inode, struct file *file)
 {
-	struct seq_file *m;
-	int ret;
-
-	ret = seq_open(file, &afs_proc_cells_ops);
-	if (ret < 0)
-		return ret;
-
-	m = file->private_data;
-	m->private = PDE_DATA(inode);
-	return 0;
+	return seq_open_net(inode, file, &afs_proc_cells_ops,
+			    sizeof(struct seq_net_private));
 }
 
 /*
@@ -266,7 +260,8 @@ static int afs_proc_cells_show(struct seq_file *m, void *v)
 static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 				    size_t size, loff_t *_pos)
 {
-	struct afs_net *net = afs_proc2net(file);
+	struct afs_net *net;
+	struct net *net_ns = NULL;
 	char *kbuf, *name, *args;
 	int ret;
 
@@ -305,6 +300,12 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 	/* determine command to perform */
 	_debug("cmd=%s name=%s args=%s", kbuf, name, args);
 
+	ret = -ESTALE;
+	net_ns = get_proc_net(file_inode(file));
+	if (!net_ns)
+		goto done;
+	net = afs_net(net_ns);
+
 	if (strcmp(kbuf, "add") == 0) {
 		struct afs_cell *cell;
 
@@ -324,6 +325,7 @@ static ssize_t afs_proc_cells_write(struct file *file, const char __user *buf,
 	ret = size;
 
 done:
+	put_net(net_ns);
 	kfree(kbuf);
 	_leave(" = %d", ret);
 	return ret;
@@ -338,15 +340,24 @@ static ssize_t afs_proc_rootcell_read(struct file *file, char __user *buf,
 				      size_t size, loff_t *_pos)
 {
 	struct afs_cell *cell;
-	struct afs_net *net = afs_proc2net(file);
+	struct afs_net *net;
+	struct net *net_ns = NULL;
 	unsigned int seq = 0;
 	char name[AFS_MAXCELLNAME + 1];
 	int len;
 
 	if (*_pos > 0)
 		return 0;
-	if (!net->ws_cell)
-		return 0;
+
+	net_ns = get_proc_net(file_inode(file));
+	if (!net_ns)
+		return -ESTALE;
+	net = afs_net(net_ns);
+
+	if (!net->ws_cell) {
+		len = 0;
+		goto out;
+	}
 
 	rcu_read_lock();
 	do {
@@ -362,14 +373,18 @@ static ssize_t afs_proc_rootcell_read(struct file *file, char __user *buf,
 	rcu_read_unlock();
 
 	if (!len)
-		return 0;
+		goto out;
 
 	name[len++] = '\n';
 	if (len > size)
 		len = size;
-	if (copy_to_user(buf, name, len) != 0)
-		return -EFAULT;
+	if (copy_to_user(buf, name, len) != 0) {
+		len = -EFAULT;
+		goto out;
+	}
 	*_pos = 1;
+out:
+	put_net(net_ns);
 	return len;
 }
 
@@ -381,7 +396,8 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 				       const char __user *buf,
 				       size_t size, loff_t *_pos)
 {
-	struct afs_net *net = afs_proc2net(file);
+	struct afs_net *net;
+	struct net *net_ns = NULL;
 	char *kbuf, *s;
 	int ret;
 
@@ -407,6 +423,12 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 	/* determine command to perform */
 	_debug("rootcell=%s", kbuf);
 
+	ret = -ESTALE;
+	net_ns = get_proc_net(file_inode(file));
+	if (!net_ns)
+		goto out;
+	net = afs_net(net_ns);
+
 	ret = afs_cell_init(net, kbuf);
 	if (ret >= 0)
 		ret = size;	/* consume everything, always */
@@ -420,13 +442,14 @@ static ssize_t afs_proc_rootcell_write(struct file *file,
 /*
  * initialise /proc/fs/afs/<cell>/
  */
-int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell)
+int afs_proc_cell_setup(struct afs_cell *cell)
 {
 	struct proc_dir_entry *dir;
+	struct afs_net *net = cell->net;
 
 	_enter("%p{%s},%p", cell, cell->name, net->proc_afs);
 
-	dir = proc_mkdir(cell->name, net->proc_afs);
+	dir = proc_net_mkdir(net->net, cell->name, net->proc_afs);
 	if (!dir)
 		goto error_dir;
 
@@ -449,12 +472,12 @@ int afs_proc_cell_setup(struct afs_net *net, struct afs_cell *cell)
 /*
  * remove /proc/fs/afs/<cell>/
  */
-void afs_proc_cell_remove(struct afs_net *net, struct afs_cell *cell)
+void afs_proc_cell_remove(struct afs_cell *cell)
 {
-	_enter("");
+	struct afs_net *net = cell->net;
 
+	_enter("");
 	remove_proc_subtree(cell->name, net->proc_afs);
-
 	_leave("");
 }
 
@@ -471,7 +494,8 @@ static int afs_proc_cell_volumes_open(struct inode *inode, struct file *file)
 	if (!cell)
 		return -ENOENT;
 
-	ret = seq_open(file, &afs_proc_cell_volumes_ops);
+	ret = seq_open_net(inode, file, &afs_proc_cell_volumes_ops,
+			   sizeof(struct seq_net_private));
 	if (ret < 0)
 		return ret;
 
@@ -560,7 +584,8 @@ static int afs_proc_cell_vlservers_open(struct inode *inode, struct file *file)
 	if (!cell)
 		return -ENOENT;
 
-	ret = seq_open(file, &afs_proc_cell_vlservers_ops);
+	ret = seq_open_net(inode, file, &afs_proc_cell_vlservers_ops,
+			   sizeof(struct seq_net_private));
 	if (ret<0)
 		return ret;
 
@@ -649,7 +674,8 @@ static int afs_proc_cell_vlservers_show(struct seq_file *m, void *v)
  */
 static int afs_proc_servers_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &afs_proc_servers_ops);
+	return seq_open_net(inode, file, &afs_proc_servers_ops,
+			    sizeof(struct seq_net_private));
 }
 
 /*
@@ -729,7 +755,8 @@ static int afs_proc_sysname_open(struct inode *inode, struct file *file)
 	struct seq_file *m;
 	int ret;
 
-	ret = seq_open(file, &afs_proc_sysname_ops);
+	ret = seq_open_net(inode, file, &afs_proc_sysname_ops,
+			   sizeof(struct seq_net_private));
 	if (ret < 0)
 		return ret;
 
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 6ab0b79e061e..f56070a9c606 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -48,6 +48,8 @@ struct file_system_type afs_fs_type = {
 };
 MODULE_ALIAS_FS("afs");
 
+int afs_net_id;
+
 static const struct super_operations afs_super_ops = {
 	.statfs		= afs_statfs,
 	.alloc_inode	= afs_alloc_inode,
@@ -117,7 +119,7 @@ int __init afs_fs_init(void)
 /*
  * clean up the filesystem
  */
-void __exit afs_fs_exit(void)
+void afs_fs_exit(void)
 {
 	_enter("");
 
@@ -391,7 +393,7 @@ static int afs_test_super(struct super_block *sb, struct fs_context *fc)
 	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 	struct afs_super_info *as = AFS_FS_S(sb);
 
-	return (as->net == ctx->net &&
+	return (as->net_ns == ctx->fc.net_ns &&
 		as->volume &&
 		as->volume->vid == ctx->volume->vid);
 }
@@ -477,7 +479,7 @@ static struct afs_super_info *afs_alloc_sbi(struct afs_fs_context *ctx)
 
 	as = kzalloc(sizeof(struct afs_super_info), GFP_KERNEL);
 	if (as) {
-		as->net = afs_get_net(ctx->net);
+		as->net_ns = get_net(ctx->fc.net_ns);
 		if (ctx->dyn_root) {
 			as->dyn_root = true;
 		} else {
@@ -492,8 +494,8 @@ static void afs_destroy_sbi(struct afs_super_info *as)
 {
 	if (as) {
 		afs_put_volume(as->cell, as->volume);
-		afs_put_cell(as->net, as->cell);
-		afs_put_net(as->net);
+		afs_put_cell(afs_net(as->net_ns), as->cell);
+		put_net(as->net_ns);
 		kfree(as);
 	}
 }
@@ -506,7 +508,8 @@ static void afs_kill_super(struct super_block *sb)
 	 * deactivating the superblock.
 	 */
 	if (as->volume)
-		afs_clear_callback_interests(as->net, as->volume->servers);
+		afs_clear_callback_interests(afs_net(as->net_ns),
+					     as->volume->servers);
 	kill_anon_super(sb);
 	if (as->volume)
 		afs_deactivate_volume(as->volume);
@@ -574,7 +577,6 @@ static void afs_free_fc(struct fs_context *fc)
 
 	afs_put_volume(ctx->cell, ctx->volume);
 	afs_put_cell(ctx->net, ctx->cell);
-	afs_put_net(ctx->net);
 	afs_destroy_sbi(ctx->as);
 	key_put(ctx->key);
 }
@@ -595,19 +597,19 @@ static int afs_init_fs_context(struct fs_context *fc, struct super_block *src_sb
 	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
 	struct afs_super_info *src_as;
 	struct afs_cell *cell;
+	struct net *net_ns;
 
 	if (current->nsproxy->net_ns != &init_net)
 		return -EINVAL;
+	ctx->type = AFSVL_ROVOL;
 
-	if (src_sb) {
-		src_as = AFS_FS_S(src_sb);
-		if (src_as) {
-			ctx->net    = afs_get_net(src_as->net);
-			ctx->cell   = afs_get_cell(src_as->cell);
-			ctx->volume = __afs_get_volume(src_as->volume);
-		}
-	} else {
-		ctx->net = afs_get_net(&__afs_net);
+	switch (ctx->fc.purpose) {
+	case FS_CONTEXT_FOR_USER_MOUNT:
+	case FS_CONTEXT_FOR_KERNEL_MOUNT:
+		ctx->fc.net_ns = maybe_get_net(current->nsproxy->net_ns);
+		if (!ctx->fc.net_ns)
+			return -ESTALE;
+		ctx->net = afs_net(ctx->fc.net_ns);
 
 		/* Default to the workstation cell. */
 		rcu_read_lock();
@@ -616,6 +618,30 @@ static int afs_init_fs_context(struct fs_context *fc, struct super_block *src_sb
 		if (IS_ERR(cell))
 			cell = NULL;
 		ctx->cell = cell;
+		break;
+
+	case FS_CONTEXT_FOR_SUBMOUNT:
+		if (!src_sb)
+			return -EINVAL;
+
+		src_as = AFS_FS_S(src_sb);
+		ASSERT(src_as);
+
+		net_ns = maybe_get_net(src_as->net_ns);
+		if (!net_ns)
+			return -ESTALE;
+		ctx->fc.net_ns = net_ns;
+		ctx->net = afs_net(net_ns);
+		if (src_as->cell)
+			ctx->cell = afs_get_cell(src_as->cell);
+		if (src_as->volume && src_as->volume->type == AFSVL_RWVOL) {
+			ctx->type = AFSVL_RWVOL;
+			ctx->force = true;
+		}
+		break;
+
+	case FS_CONTEXT_FOR_RECONFIGURE:
+		break;
 	}
 
 	ctx->fc.ops = &afs_context_ops;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 24/24] afs: Use fs_context to pass parameters over automount [ver #7]
  2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
                   ` (22 preceding siblings ...)
  2018-04-19 13:33 ` [PATCH 23/24] afs: Implement namespacing " David Howells
@ 2018-04-19 13:33 ` David Howells
  23 siblings, 0 replies; 40+ messages in thread
From: David Howells @ 2018-04-19 13:33 UTC (permalink / raw)
  To: viro
  Cc: linux-nfs, linux-kernel, dhowells, Eric W. Biederman,
	linux-security-module, linux-fsdevel, linux-afs

Alter the AFS automounting code to create and modify an fs_context struct
when parameterising a new mount triggered by an AFS mountpoint rather than
constructing device name and option strings.

Also remove the cell=, vol= and rwpath options as they are then redundant.
The reason they existed is because the 'device name' may be derived
literally from a mountpoint object in the filesystem, so default cell and
parent-type information needed to be passed in by some other method from
the automount routines.  The vol= option didn't end up being used.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric W. Biederman <ebiederm@redhat.com>
---

 fs/afs/internal.h |    1 
 fs/afs/mntpt.c    |  152 ++++++++++++++++++++++++++++-------------------------
 fs/afs/super.c    |   42 +--------------
 3 files changed, 83 insertions(+), 112 deletions(-)

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index a5161c0ae3ab..589e5356c560 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -37,7 +37,6 @@ struct afs_call;
 struct afs_fs_context {
 	struct fs_context	fc;
 	struct afs_super_info	*as;
-	bool			rwpath;		/* T if the parent should be considered R/W */
 	bool			force;		/* T to force cell type */
 	bool			autocell;	/* T if set auto mount operation */
 	bool			dyn_root;	/* T if dynamic root */
diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
index c45aa1776591..9c4ad0565154 100644
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -47,6 +47,8 @@ static DECLARE_DELAYED_WORK(afs_mntpt_expiry_timer, afs_mntpt_expiry_timed_out);
 
 static unsigned long afs_mntpt_expiry_timeout = 10 * 60;
 
+static const char afs_root_volume[] = "root.cell";
+
 /*
  * no valid lookup procedure on this sort of dir
  */
@@ -68,107 +70,111 @@ static int afs_mntpt_open(struct inode *inode, struct file *file)
 }
 
 /*
- * create a vfsmount to be automounted
+ * Set the parameters for the proposed superblock.
  */
-static struct vfsmount *afs_mntpt_do_automount(struct dentry *mntpt)
+static int afs_mntpt_set_params(struct fs_context *fc, struct dentry *mntpt)
 {
-	struct afs_super_info *as;
-	struct vfsmount *mnt;
-	struct afs_vnode *vnode;
-	struct page *page;
-	char *devname, *options;
-	bool rwpath = false;
+	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
+	struct afs_vnode *vnode = AFS_FS_I(d_inode(mntpt));
+	struct afs_cell *cell;
+	const char *p;
 	int ret;
 
-	_enter("{%pd}", mntpt);
-
-	BUG_ON(!d_inode(mntpt));
-
-	ret = -ENOMEM;
-	devname = (char *) get_zeroed_page(GFP_KERNEL);
-	if (!devname)
-		goto error_no_devname;
-
-	options = (char *) get_zeroed_page(GFP_KERNEL);
-	if (!options)
-		goto error_no_options;
-
-	vnode = AFS_FS_I(d_inode(mntpt));
 	if (test_bit(AFS_VNODE_PSEUDODIR, &vnode->flags)) {
 		/* if the directory is a pseudo directory, use the d_name */
-		static const char afs_root_cell[] = ":root.cell.";
 		unsigned size = mntpt->d_name.len;
 
-		ret = -ENOENT;
-		if (size < 2 || size > AFS_MAXCELLNAME)
-			goto error_no_page;
+		if (size < 2)
+			return -ENOENT;
 
+		p = mntpt->d_name.name;
 		if (mntpt->d_name.name[0] == '.') {
-			devname[0] = '%';
-			memcpy(devname + 1, mntpt->d_name.name + 1, size - 1);
-			memcpy(devname + size, afs_root_cell,
-			       sizeof(afs_root_cell));
-			rwpath = true;
-		} else {
-			devname[0] = '#';
-			memcpy(devname + 1, mntpt->d_name.name, size);
-			memcpy(devname + size + 1, afs_root_cell,
-			       sizeof(afs_root_cell));
+			size--;
+			p++;
+			ctx->type = AFSVL_RWVOL;
+			ctx->force = true;
+		}
+		if (size > AFS_MAXCELLNAME)
+			return -ENAMETOOLONG;
+
+		cell = afs_lookup_cell(ctx->net, p, size, NULL, false);
+		if (IS_ERR(cell)) {
+			pr_err("kAFS: unable to lookup cell '%pd'\n", mntpt);
+			return PTR_ERR(cell);
 		}
+		afs_put_cell(ctx->net, ctx->cell);
+		ctx->cell = cell;
+
+		ctx->volname = afs_root_volume;
+		ctx->volnamesz = sizeof(afs_root_volume) - 1;
 	} else {
 		/* read the contents of the AFS special symlink */
+		struct page *page;
 		loff_t size = i_size_read(d_inode(mntpt));
 		char *buf;
 
-		ret = -EINVAL;
 		if (size > PAGE_SIZE - 1)
-			goto error_no_page;
+			return -EINVAL;
 
 		page = read_mapping_page(d_inode(mntpt)->i_mapping, 0, NULL);
-		if (IS_ERR(page)) {
-			ret = PTR_ERR(page);
-			goto error_no_page;
-		}
+		if (IS_ERR(page))
+			return PTR_ERR(page);
 
-		ret = -EIO;
-		if (PageError(page))
-			goto error;
+		if (PageError(page)) {
+			put_page(page);
+			return -EIO;
+		}
 
-		buf = kmap_atomic(page);
-		memcpy(devname, buf, size);
-		kunmap_atomic(buf);
+		buf = kmap(page);
+		ctx->fc.source = kmemdup_nul(buf, size, GFP_KERNEL);
+		kunmap(page);
 		put_page(page);
-		page = NULL;
-	}
+		if (!ctx->fc.source)
+			return -ENOMEM;
 
-	/* work out what options we want */
-	as = AFS_FS_S(mntpt->d_sb);
-	if (as->cell) {
-		memcpy(options, "cell=", 5);
-		strcpy(options + 5, as->cell->name);
-		if ((as->volume && as->volume->type == AFSVL_RWVOL) || rwpath)
-			strcat(options, ",rwpath");
+		ret = ctx->fc.ops->parse_source(fc);
+		if (ret < 0)
+			return ret;
 	}
 
-	/* try and do the mount */
-	_debug("--- attempting mount %s -o %s ---", devname, options);
-	mnt = vfs_submount(mntpt, &afs_fs_type, devname,
-			   options, strlen(options) + 1);
-	_debug("--- mount result %p ---", mnt);
+	return 0;
+}
+
+/*
+ * create a vfsmount to be automounted
+ */
+static struct vfsmount *afs_mntpt_do_automount(struct dentry *mntpt)
+{
+	struct fs_context *fc;
+	struct vfsmount *mnt;
+	int ret;
+
+	BUG_ON(!d_inode(mntpt));
+
+	fc = vfs_new_fs_context(&afs_fs_type, mntpt->d_sb, 0,
+				FS_CONTEXT_FOR_SUBMOUNT);
+	if (IS_ERR(fc))
+		return ERR_CAST(fc);
+
+	ret = afs_mntpt_set_params(fc, mntpt);
+	if (ret < 0)
+		goto error_fc;
+
+	ret = vfs_get_tree(fc);
+	if (ret < 0)
+		goto error_fc;
+
+	mnt = vfs_create_mount(fc);
+	if (IS_ERR(mnt)) {
+		ret = PTR_ERR(mnt);
+		goto error_fc;
+	}
 
-	free_page((unsigned long) devname);
-	free_page((unsigned long) options);
-	_leave(" = %p", mnt);
+	put_fs_context(fc);
 	return mnt;
 
-error:
-	put_page(page);
-error_no_page:
-	free_page((unsigned long) options);
-error_no_options:
-	free_page((unsigned long) devname);
-error_no_devname:
-	_leave(" = %d", ret);
+error_fc:
+	put_fs_context(fc);
 	return ERR_PTR(ret);
 }
 
diff --git a/fs/afs/super.c b/fs/afs/super.c
index f56070a9c606..5f9d225e32d9 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -65,18 +65,12 @@ static atomic_t afs_count_active_inodes;
 
 enum {
 	afs_no_opt,
-	afs_opt_cell,
 	afs_opt_dyn,
-	afs_opt_rwpath,
-	afs_opt_vol,
 	afs_opt_autocell,
 };
 
 static const match_table_t afs_options_list = {
-	{ afs_opt_cell,		"cell=%s"	},
 	{ afs_opt_dyn,		"dyn"		},
-	{ afs_opt_rwpath,	"rwpath"	},
-	{ afs_opt_vol,		"vol=%s"	},
 	{ afs_opt_autocell,	"autocell"	},
 	{ afs_no_opt,		NULL		},
 };
@@ -195,37 +189,13 @@ static int afs_show_options(struct seq_file *m, struct dentry *root)
 static int afs_parse_option(struct fs_context *fc, char *opt, size_t len)
 {
 	struct afs_fs_context *ctx = container_of(fc, struct afs_fs_context, fc);
-	struct afs_cell *cell;
 	substring_t args[MAX_OPT_ARGS];
-	int token, size;
+	int token;
 
 	_enter("%s", opt);
 
 	token = match_token(opt, afs_options_list, args);
 	switch (token) {
-	case afs_opt_cell:
-		size = args[0].to - args[0].from;
-		if (size <= 0)
-			return -EINVAL;
-		if (size > AFS_MAXCELLNAME)
-			return -ENAMETOOLONG;
-
-		rcu_read_lock();
-		cell = afs_lookup_cell_rcu(ctx->net, args[0].from, size);
-		rcu_read_unlock();
-		if (IS_ERR(cell))
-			return PTR_ERR(cell);
-		afs_put_cell(ctx->net, ctx->cell);
-		ctx->cell = cell;
-		break;
-
-	case afs_opt_rwpath:
-		ctx->rwpath = true;
-		break;
-
-	case afs_opt_vol:
-		return -EINVAL; /* Not required for automount */
-
 	case afs_opt_autocell:
 		ctx->autocell = true;
 		break;
@@ -249,8 +219,8 @@ static int afs_parse_option(struct fs_context *fc, char *opt, size_t len)
  *
  * This can be one of the following:
  *	"%[cell:]volume[.]"		R/W volume
- *	"#[cell:]volume[.]"		R/O or R/W volume (rwpath=0),
- *					 or R/W (rwpath=1) volume
+ *	"#[cell:]volume[.]"		R/O or R/W volume (R/O parent),
+ *					 or R/W (R/W parent) volume
  *	"%[cell:]volume.readonly"	R/O volume
  *	"#[cell:]volume.readonly"	R/O volume
  *	"%[cell:]volume.backup"		Backup volume
@@ -281,9 +251,7 @@ static int afs_parse_source(struct fs_context *fc)
 	}
 
 	/* determine the type of volume we're looking for */
-	ctx->type = AFSVL_ROVOL;
-	ctx->force = false;
-	if (ctx->rwpath || name[0] == '%') {
+	if (name[0] == '%') {
 		ctx->type = AFSVL_RWVOL;
 		ctx->force = true;
 	}
@@ -599,8 +567,6 @@ static int afs_init_fs_context(struct fs_context *fc, struct super_block *src_sb
 	struct afs_cell *cell;
 	struct net *net_ns;
 
-	if (current->nsproxy->net_ns != &init_net)
-		return -EINVAL;
 	ctx->type = AFSVL_ROVOL;
 
 	switch (ctx->fc.purpose) {

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/24] VFS: Add LSM hooks for filesystem context [ver #7]
  2018-04-19 13:31 ` [PATCH 04/24] VFS: Add LSM hooks for " David Howells
@ 2018-04-19 20:32   ` Paul Moore
  2018-04-20 15:35   ` David Howells
  1 sibling, 0 replies; 40+ messages in thread
From: Paul Moore @ 2018-04-19 20:32 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs, selinux

On Thu, Apr 19, 2018 at 9:31 AM, David Howells <dhowells@redhat.com> wrote:
> Add LSM hooks for use by the filesystem context code.  This includes:
>
>  (1) Hooks to handle allocation, duplication and freeing of the security
>      record attached to a filesystem context.
>
>  (2) A hook to snoop a mount options in key[=val] form.  If the LSM decides
>      it wants to handle it, it can suppress the option being passed to the
>      filesystem.  Note that 'val' may include commas and binary data with
>      the fsopen patch.
>
>  (3) A hook to transfer the security from the context to a newly created
>      superblock.
>
>  (4) A hook to rule on whether a path point can be used as a mountpoint.
>
> These are intended to replace:
>
>         security_sb_copy_data
>         security_sb_kern_mount
>         security_sb_mount
>         security_sb_set_mnt_opts
>         security_sb_clone_mnt_opts
>         security_sb_parse_opts_str
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: linux-security-module@vger.kernel.org
> ---
>
>  include/linux/lsm_hooks.h |   62 +++++++++++
>  include/linux/security.h  |   44 ++++++++
>  security/security.c       |   41 +++++++
>  security/selinux/hooks.c  |  262 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 409 insertions(+)

Adding the SELinux mailing list to the CC line; in the future please
include the SELinux mailing list on patches like this.  It would also
be very helpful to include "selinux" somewhere in the subject line
when the patch is predominately SELinux related (much like you did for
the other LSMs in this patchset).

I can't say I've digested all of this yet, but what SELinux testing
have you done with this patchset?

> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 9d0b286f3dba..da20f90d40bb 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -76,6 +76,50 @@
>   *     changes on the process such as clearing out non-inheritable signal
>   *     state.  This is called immediately after commit_creds().
>   *
> + * Security hooks for mount using fd context.
> + *
> + * @fs_context_alloc:
> + *     Allocate and attach a security structure to sc->security.  This pointer
> + *     is initialised to NULL by the caller.
> + *     @fc indicates the new filesystem context.
> + *     @src_sb indicates the source superblock of a submount.
> + * @fs_context_dup:
> + *     Allocate and attach a security structure to sc->security.  This pointer
> + *     is initialised to NULL by the caller.
> + *     @fc indicates the new filesystem context.
> + *     @src_fc indicates the original filesystem context.
> + * @fs_context_free:
> + *     Clean up a filesystem context.
> + *     @fc indicates the filesystem context.
> + * @fs_context_parse_option:
> + *     Userspace provided an option to configure a superblock.  The LSM may
> + *     reject it with an error and may use it for itself, in which case it
> + *     should return 1; otherwise it should return 0 to pass it on to the
> + *     filesystem.
> + *     @fc indicates the filesystem context.
> + *     @opt indicates the option in "key[=val]" form.  It is NUL-terminated,
> + *     but val may be binary data.
> + *     @len indicates the size of the option.
> + * @fs_context_validate:
> + *     Validate the filesystem context preparatory to applying it.  This is
> + *     done after all the options have been parsed.
> + *     @fc indicates the filesystem context.
> + * @sb_get_tree:
> + *     Assign the security to a newly created superblock.
> + *     @fc indicates the filesystem context.
> + *     @fc->root indicates the root that will be mounted.
> + *     @fc->root->d_sb points to the superblock.
> + * @sb_reconfigure:
> + *     Apply reconfiguration to the security on a superblock.
> + *     @fc indicates the filesystem context.
> + *     @fc->root indicates a dentry in the mount.
> + *     @fc->root->d_sb points to the superblock.
> + * @sb_mountpoint:
> + *     Equivalent of sb_mount, but with an fs_context.
> + *     @fc indicates the filesystem context.
> + *     @mountpoint indicates the path on which the mount will take place.
> + *     @mnt_flags indicates the MNT_* flags specified.
> + *
>   * Security hooks for filesystem operations.
>   *
>   * @sb_alloc_security:
> @@ -1450,6 +1494,16 @@ union security_list_options {
>         void (*bprm_committing_creds)(struct linux_binprm *bprm);
>         void (*bprm_committed_creds)(struct linux_binprm *bprm);
>
> +       int (*fs_context_alloc)(struct fs_context *fc, struct super_block *src_sb);
> +       int (*fs_context_dup)(struct fs_context *fc, struct fs_context *src_sc);
> +       void (*fs_context_free)(struct fs_context *fc);
> +       int (*fs_context_parse_option)(struct fs_context *fc, char *opt, size_t len);
> +       int (*fs_context_validate)(struct fs_context *fc);
> +       int (*sb_get_tree)(struct fs_context *fc);
> +       void (*sb_reconfigure)(struct fs_context *fc);
> +       int (*sb_mountpoint)(struct fs_context *fc, struct path *mountpoint,
> +                            unsigned int mnt_flags);
> +
>         int (*sb_alloc_security)(struct super_block *sb);
>         void (*sb_free_security)(struct super_block *sb);
>         int (*sb_copy_data)(char *orig, char *copy);
> @@ -1787,6 +1841,14 @@ struct security_hook_heads {
>         struct hlist_head bprm_check_security;
>         struct hlist_head bprm_committing_creds;
>         struct hlist_head bprm_committed_creds;
> +       struct hlist_head fs_context_alloc;
> +       struct hlist_head fs_context_dup;
> +       struct hlist_head fs_context_free;
> +       struct hlist_head fs_context_parse_option;
> +       struct hlist_head fs_context_validate;
> +       struct hlist_head sb_get_tree;
> +       struct hlist_head sb_reconfigure;
> +       struct hlist_head sb_mountpoint;
>         struct hlist_head sb_alloc_security;
>         struct hlist_head sb_free_security;
>         struct hlist_head sb_copy_data;
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 200920f521a1..60a85bd9dfef 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -53,6 +53,7 @@ struct msg_msg;
>  struct xattr;
>  struct xfrm_sec_ctx;
>  struct mm_struct;
> +struct fs_context;
>
>  /* If capable should audit the security request */
>  #define SECURITY_CAP_NOAUDIT 0
> @@ -231,6 +232,15 @@ int security_bprm_set_creds(struct linux_binprm *bprm);
>  int security_bprm_check(struct linux_binprm *bprm);
>  void security_bprm_committing_creds(struct linux_binprm *bprm);
>  void security_bprm_committed_creds(struct linux_binprm *bprm);
> +int security_fs_context_alloc(struct fs_context *fc, struct super_block *sb);
> +int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc);
> +void security_fs_context_free(struct fs_context *fc);
> +int security_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len);
> +int security_fs_context_validate(struct fs_context *fc);
> +int security_sb_get_tree(struct fs_context *fc);
> +void security_sb_reconfigure(struct fs_context *fc);
> +int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
> +                          unsigned int mnt_flags);
>  int security_sb_alloc(struct super_block *sb);
>  void security_sb_free(struct super_block *sb);
>  int security_sb_copy_data(char *orig, char *copy);
> @@ -539,6 +549,40 @@ static inline void security_bprm_committed_creds(struct linux_binprm *bprm)
>  {
>  }
>
> +static inline int security_fs_context_alloc(struct fs_context *fc,
> +                                           struct super_block *src_sb)
> +{
> +       return 0;
> +}
> +static inline int security_fs_context_dup(struct fs_context *fc,
> +                                         struct fs_context *src_fc)
> +{
> +       return 0;
> +}
> +static inline void security_fs_context_free(struct fs_context *fc)
> +{
> +}
> +static inline int security_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
> +{
> +       return 0;
> +}
> +static inline int security_fs_context_validate(struct fs_context *fc)
> +{
> +       return 0;
> +}
> +static inline int security_sb_get_tree(struct fs_context *fc)
> +{
> +       return 0;
> +}
> +static inline void security_sb_reconfigure(struct fs_context *fc)
> +{
> +}
> +static inline int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
> +                                        unsigned int mnt_flags)
> +{
> +       return 0;
> +}
> +
>  static inline int security_sb_alloc(struct super_block *sb)
>  {
>         return 0;
> diff --git a/security/security.c b/security/security.c
> index 7bc2fde023a7..42e4ea19b61c 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -358,6 +358,47 @@ void security_bprm_committed_creds(struct linux_binprm *bprm)
>         call_void_hook(bprm_committed_creds, bprm);
>  }
>
> +int security_fs_context_alloc(struct fs_context *fc, struct super_block *src_sb)
> +{
> +       return call_int_hook(fs_context_alloc, 0, fc, src_sb);
> +}
> +
> +int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
> +{
> +       return call_int_hook(fs_context_dup, 0, fc, src_fc);
> +}
> +
> +void security_fs_context_free(struct fs_context *fc)
> +{
> +       call_void_hook(fs_context_free, fc);
> +}
> +
> +int security_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
> +{
> +       return call_int_hook(fs_context_parse_option, 0, fc, opt, len);
> +}
> +
> +int security_fs_context_validate(struct fs_context *fc)
> +{
> +       return call_int_hook(fs_context_validate, 0, fc);
> +}
> +
> +int security_sb_get_tree(struct fs_context *fc)
> +{
> +       return call_int_hook(sb_get_tree, 0, fc);
> +}
> +
> +void security_sb_reconfigure(struct fs_context *fc)
> +{
> +       call_void_hook(sb_reconfigure, fc);
> +}
> +
> +int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
> +                          unsigned int mnt_flags)
> +{
> +       return call_int_hook(sb_mountpoint, 0, fc, mountpoint, mnt_flags);
> +}
> +
>  int security_sb_alloc(struct super_block *sb)
>  {
>         return call_int_hook(sb_alloc_security, 0, sb);
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 1f0316bf7e29..969a2a0dc582 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -48,6 +48,7 @@
>  #include <linux/fdtable.h>
>  #include <linux/namei.h>
>  #include <linux/mount.h>
> +#include <linux/fs_context.h>
>  #include <linux/netfilter_ipv4.h>
>  #include <linux/netfilter_ipv6.h>
>  #include <linux/tty.h>
> @@ -2960,6 +2961,259 @@ static int selinux_umount(struct vfsmount *mnt, int flags)
>                                    FILESYSTEM__UNMOUNT, NULL);
>  }
>
> +/* fsopen mount context operations */
> +
> +static int selinux_fs_context_alloc(struct fs_context *fc,
> +                                   struct super_block *src_sb)
> +{
> +       struct security_mnt_opts *opts;
> +
> +       opts = kzalloc(sizeof(*opts), GFP_KERNEL);
> +       if (!opts)
> +               return -ENOMEM;
> +
> +       fc->security = opts;
> +       return 0;
> +}
> +
> +static int selinux_fs_context_dup(struct fs_context *fc,
> +                                 struct fs_context *src_fc)
> +{
> +       const struct security_mnt_opts *src = src_fc->security;
> +       struct security_mnt_opts *opts;
> +       int i, n;
> +
> +       opts = kzalloc(sizeof(*opts), GFP_KERNEL);
> +       if (!opts)
> +               return -ENOMEM;
> +       fc->security = opts;
> +
> +       if (!src || !src->num_mnt_opts)
> +               return 0;
> +       n = opts->num_mnt_opts = src->num_mnt_opts;
> +
> +       if (src->mnt_opts) {
> +               opts->mnt_opts = kcalloc(n, sizeof(char *), GFP_KERNEL);
> +               if (!opts->mnt_opts)
> +                       return -ENOMEM;
> +
> +               for (i = 0; i < n; i++) {
> +                       if (src->mnt_opts[i]) {
> +                               opts->mnt_opts[i] = kstrdup(src->mnt_opts[i],
> +                                                           GFP_KERNEL);
> +                               if (!opts->mnt_opts[i])
> +                                       return -ENOMEM;
> +                       }
> +               }
> +       }
> +
> +       if (src->mnt_opts_flags) {
> +               opts->mnt_opts_flags = kmemdup(src->mnt_opts_flags,
> +                                              n * sizeof(int), GFP_KERNEL);
> +               if (!opts->mnt_opts_flags)
> +                       return -ENOMEM;
> +       }
> +
> +       return 0;
> +}
> +
> +static void selinux_fs_context_free(struct fs_context *fc)
> +{
> +       struct security_mnt_opts *opts = fc->security;
> +
> +       security_free_mnt_opts(opts);
> +       fc->security = NULL;
> +}
> +
> +static int selinux_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
> +{
> +       struct security_mnt_opts *opts = fc->security;
> +       substring_t args[MAX_OPT_ARGS];
> +       unsigned int have;
> +       char *c, **oo;
> +       int token, ctx, i, *of;
> +
> +       token = match_token(opt, tokens, args);
> +       if (token == Opt_error)
> +               return 0; /* Doesn't belong to us. */
> +
> +       have = 0;
> +       for (i = 0; i < opts->num_mnt_opts; i++)
> +               have |= 1 << opts->mnt_opts_flags[i];
> +       if (have & (1 << token))
> +               return -EINVAL;
> +
> +       switch (token) {
> +       case Opt_context:
> +               if (have & (1 << Opt_defcontext))
> +                       goto incompatible;
> +               ctx = CONTEXT_MNT;
> +               goto copy_context_string;
> +
> +       case Opt_fscontext:
> +               ctx = FSCONTEXT_MNT;
> +               goto copy_context_string;
> +
> +       case Opt_rootcontext:
> +               ctx = ROOTCONTEXT_MNT;
> +               goto copy_context_string;
> +
> +       case Opt_defcontext:
> +               if (have & (1 << Opt_context))
> +                       goto incompatible;
> +               ctx = DEFCONTEXT_MNT;
> +               goto copy_context_string;
> +
> +       case Opt_labelsupport:
> +               return 1;
> +
> +       default:
> +               return -EINVAL;
> +       }
> +
> +copy_context_string:
> +       if (opts->num_mnt_opts > 3)
> +               return -EINVAL;
> +
> +       of = krealloc(opts->mnt_opts_flags,
> +                     (opts->num_mnt_opts + 1) * sizeof(int), GFP_KERNEL);
> +       if (!of)
> +               return -ENOMEM;
> +       of[opts->num_mnt_opts] = 0;
> +       opts->mnt_opts_flags = of;
> +
> +       oo = krealloc(opts->mnt_opts,
> +                     (opts->num_mnt_opts + 1) * sizeof(char *), GFP_KERNEL);
> +       if (!oo)
> +               return -ENOMEM;
> +       oo[opts->num_mnt_opts] = NULL;
> +       opts->mnt_opts = oo;
> +
> +       c = match_strdup(&args[0]);
> +       if (!c)
> +               return -ENOMEM;
> +       opts->mnt_opts[opts->num_mnt_opts] = c;
> +       opts->mnt_opts_flags[opts->num_mnt_opts] = ctx;
> +       opts->num_mnt_opts++;
> +       return 1;
> +
> +incompatible:
> +       return -EINVAL;
> +}
> +
> +/*
> + * Validate the security parameters supplied for a reconfiguration/remount
> + * event.
> + */
> +static int selinux_validate_for_sb_reconfigure(struct fs_context *fc)
> +{
> +       struct super_block *sb = fc->root->d_sb;
> +       struct superblock_security_struct *sbsec = sb->s_security;
> +       struct security_mnt_opts *opts = fc->security;
> +       int rc, i, *flags;
> +       char **mount_options;
> +
> +       if (!(sbsec->flags & SE_SBINITIALIZED))
> +               return 0;
> +
> +       mount_options = opts->mnt_opts;
> +       flags = opts->mnt_opts_flags;
> +
> +       for (i = 0; i < opts->num_mnt_opts; i++) {
> +               u32 sid;
> +
> +               if (flags[i] == SBLABEL_MNT)
> +                       continue;
> +
> +               rc = security_context_str_to_sid(&selinux_state, mount_options[i],
> +                                                &sid, GFP_KERNEL);
> +               if (rc) {
> +                       pr_warn("SELinux: security_context_str_to_sid"
> +                               "(%s) failed for (dev %s, type %s) errno=%d\n",
> +                               mount_options[i], sb->s_id, sb->s_type->name, rc);
> +                       goto inval;
> +               }
> +
> +               switch (flags[i]) {
> +               case FSCONTEXT_MNT:
> +                       if (bad_option(sbsec, FSCONTEXT_MNT, sbsec->sid, sid))
> +                               goto bad_option;
> +                       break;
> +               case CONTEXT_MNT:
> +                       if (bad_option(sbsec, CONTEXT_MNT, sbsec->mntpoint_sid, sid))
> +                               goto bad_option;
> +                       break;
> +               case ROOTCONTEXT_MNT: {
> +                       struct inode_security_struct *root_isec;
> +                       root_isec = backing_inode_security(sb->s_root);
> +
> +                       if (bad_option(sbsec, ROOTCONTEXT_MNT, root_isec->sid, sid))
> +                               goto bad_option;
> +                       break;
> +               }
> +               case DEFCONTEXT_MNT:
> +                       if (bad_option(sbsec, DEFCONTEXT_MNT, sbsec->def_sid, sid))
> +                               goto bad_option;
> +                       break;
> +               default:
> +                       goto inval;
> +               }
> +       }
> +
> +       rc = 0;
> +out:
> +       return rc;
> +
> +bad_option:
> +       pr_warn("SELinux: unable to change security options "
> +               "during remount (dev %s, type=%s)\n",
> +               sb->s_id, sb->s_type->name);
> +inval:
> +       rc = -EINVAL;
> +       goto out;
> +}
> +
> +/*
> + * Validate the security context assembled from the option data supplied to
> + * mount.
> + */
> +static int selinux_fs_context_validate(struct fs_context *fc)
> +{
> +       if (fc->purpose == FS_CONTEXT_FOR_RECONFIGURE)
> +               return selinux_validate_for_sb_reconfigure(fc);
> +       return 0;
> +}
> +
> +/*
> + * Set the security context on a superblock.
> + */
> +static int selinux_sb_get_tree(struct fs_context *fc)
> +{
> +       const struct cred *cred = current_cred();
> +       struct common_audit_data ad;
> +       int rc;
> +
> +       rc = selinux_set_mnt_opts(fc->root->d_sb, fc->security, 0, NULL);
> +       if (rc)
> +               return rc;
> +
> +       /* Allow all mounts performed by the kernel */
> +       if (fc->purpose == FS_CONTEXT_FOR_KERNEL_MOUNT)
> +               return 0;
> +
> +       ad.type = LSM_AUDIT_DATA_DENTRY;
> +       ad.u.dentry = fc->root;
> +       return superblock_has_perm(cred, fc->root->d_sb, FILESYSTEM__MOUNT, &ad);
> +}
> +
> +static int selinux_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
> +                                unsigned int mnt_flags)
> +{
> +       const struct cred *cred = current_cred();
> +
> +       return path_has_perm(cred, mountpoint, FILE__MOUNTON);
> +}
> +
>  /* inode security operations */
>
>  static int selinux_inode_alloc_security(struct inode *inode)
> @@ -6871,6 +7125,14 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
>         LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds),
>         LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds),
>
> +       LSM_HOOK_INIT(fs_context_alloc, selinux_fs_context_alloc),
> +       LSM_HOOK_INIT(fs_context_dup, selinux_fs_context_dup),
> +       LSM_HOOK_INIT(fs_context_free, selinux_fs_context_free),
> +       LSM_HOOK_INIT(fs_context_parse_option, selinux_fs_context_parse_option),
> +       LSM_HOOK_INIT(fs_context_validate, selinux_fs_context_validate),
> +       LSM_HOOK_INIT(sb_get_tree, selinux_sb_get_tree),
> +       LSM_HOOK_INIT(sb_mountpoint, selinux_sb_mountpoint),
> +
>         LSM_HOOK_INIT(sb_alloc_security, selinux_sb_alloc_security),
>         LSM_HOOK_INIT(sb_free_security, selinux_sb_free_security),
>         LSM_HOOK_INIT(sb_copy_data, selinux_sb_copy_data),
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/24] VFS: Add LSM hooks for filesystem context [ver #7]
  2018-04-19 13:31 ` [PATCH 04/24] VFS: Add LSM hooks for " David Howells
  2018-04-19 20:32   ` Paul Moore
@ 2018-04-20 15:35   ` David Howells
  2018-04-23 13:25     ` Stephen Smalley
  2018-04-24 15:22     ` David Howells
  1 sibling, 2 replies; 40+ messages in thread
From: David Howells @ 2018-04-20 15:35 UTC (permalink / raw)
  To: Paul Moore
  Cc: dhowells, viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs, selinux

Paul Moore <paul@paul-moore.com> wrote:

> Adding the SELinux mailing list to the CC line; in the future please
> include the SELinux mailing list on patches like this.  It would also
> be very helpful to include "selinux" somewhere in the subject line
> when the patch is predominately SELinux related (much like you did for
> the other LSMs in this patchset).

I should probably evict the SELinux bits into their own patch since the point
of this patch is the LSM hooks, not specifically SELinux's implementation
thereof.

> I can't say I've digested all of this yet, but what SELinux testing
> have you done with this patchset?

Using the fsopen()/fsmount() syscalls, these hooks will be made use of, say
for NFS (which I haven't included in this list).  Even sys_mount() will make
use of them a bit, so just booting the system does that.

Note that for SELinux these hooks don't change very much except how the
parameters are handled.  It doesn't actually change the checks that are made -
at least, not yet.  There are some additional syscalls under consideration
(such as the ability to pick a live mounted filesystem into a context) that
might require additional permits.

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context [ver #7]
  2018-04-19 13:31 ` [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context " David Howells
@ 2018-04-23  3:36   ` Randy Dunlap
  2018-05-01 14:29   ` David Howells
  1 sibling, 0 replies; 40+ messages in thread
From: Randy Dunlap @ 2018-04-23  3:36 UTC (permalink / raw)
  To: David Howells, viro
  Cc: linux-nfs, linux-kernel, linux-security-module, linux-fsdevel, linux-afs

Hi David,

On 04/19/18 06:31, David Howells wrote:
> Introduce a filesystem context concept to be used during superblock
> creation for mount and superblock reconfiguration for remount.  This is
> allocated at the beginning of the mount procedure and into it is placed:
> 
>  (1) Filesystem type.
> 
>  (2) Namespaces.
> 
>  (3) Device name.
> 
>  (4) Superblock flags (MS_*).
> 
>  (5) Security details.
> 
>  (6) Filesystem-specific data, as set by the mount options.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  Documentation/filesystems/mounting.txt |  445 ++++++++++++++++++++++++++++++++
>  include/linux/fs_context.h             |   76 +++++
>  2 files changed, 521 insertions(+)
>  create mode 100644 Documentation/filesystems/mounting.txt
>  create mode 100644 include/linux/fs_context.h

> diff --git a/Documentation/filesystems/mounting.txt b/Documentation/filesystems/mounting.txt
> new file mode 100644
> index 000000000000..805135a66b64
> --- /dev/null
> +++ b/Documentation/filesystems/mounting.txt
> @@ -0,0 +1,445 @@
> +			      ===================
> +			      FILESYSTEM MOUNTING
> +			      ===================
> +
> +CONTENTS
> +
> + (1) Overview.
> +
> + (2) The filesystem context.
> +
> + (3) The filesystem context operations.
> +
> + (4) Filesystem context security.
> +
> + (5) VFS filesystem context operations.
> +
> +
> +========
> +OVERVIEW
> +========
> +
> +The creation of new mounts is now to be done in a multistep process:
> +
> + (1) Create a filesystem context.
> +
> + (2) Parse the options and attach them to the context.  Options may be passed
> +     individually from userspace.

Does this say that step (2) can be multiple small steps? How does step (2) know
when userspace has completed sending individual options?


> +
> + (3) Validate and pre-process the context.
> +
> + (4) Get or create a superblock and mountable root.
> +
> + (5) Perform the mount.
> +
> + (6) Return an error message attached to the context.

where/how is this done?

> +
> + (7) Destroy the context.
> +
> +To support this, the file_system_type struct gains two new fields:
> +
> +	unsigned short fs_context_size;
> +
> +which indicates the total amount of space that should be allocated for context
> +data (see the Filesystem Context section), and:
> +
> +	int (*init_fs_context)(struct fs_context *fc, struct super_block *src_sb);
> +
> +which is invoked to set up the filesystem-specific parts of a filesystem
> +context, including the additional space.  The src_sb parameter is used to
> +convey the superblock from which the filesystem may draw extra information
> +(such as namespaces) for submount (FS_CONTEXT_FOR_SUBMOUNT) or reconfiguration
> +(FS_CONTEXT_FOR_RECONFIGURE) purposes - otherwise it will be NULL.
> +
> +Note that security initialisation is done *after* the filesystem is called so
> +that the namespaces may be adjusted first.
> +
> +And the super_operations struct gains one field:
> +
> +	int (*reconfigure) (struct super_block *, struct fs_context *);
> +
> +This shadows the ->reconfigure() operation and takes a prepared filesystem
> +context instead of the mount flags and data page.  It may modify the sb_flags
> +in the context for the caller to pick up.
> +
> +[NOTE] reconfigure is intended as a replacement for remount_fs.
> +
> +
> +======================
> +THE FILESYSTEM CONTEXT
> +======================
> +
> +The creation and reconfiguration of a superblock is governed by a filesystem
> +context.  This is represented by the fs_context structure:
> +
> +	struct fs_context {
> +		const struct fs_context_operations *ops;
> +		struct file_system_type *fs;
> +		struct dentry		*root;
> +		struct user_namespace	*user_ns;
> +		struct net		*net_ns;
> +		const struct cred	*cred;
> +		char			*device;
> +		char			*subtype;
> +		void			*security;
> +		void			*s_fs_info;
> +		unsigned int		sb_flags;
> +		bool			sloppy;
> +		bool			silent;
> +		bool			degraded;
> +		bool			drop_sb;
> +		enum fs_context_purpose	purpose : 8;
> +	};
> +
> +When the VFS creates this, it allocates ->fs_context_size bytes (as specified
> +by the file_system_type object) to hold both the fs_context struct and any
> +extra data required by the filesystem.  The fs_context struct is placed at the
> +beginning of this space.  Any extra space beyond that is for use by the
> +filesystem.  The filesystem should wrap the struct in its own, e.g.:

                                                      in its own struct, e.g.:

> +
> +	struct nfs_fs_context {
> +		struct fs_context fc;
> +		...
> +	};
> +
> +placing the fs_context struct first.  container_of() can then be used.  The
> +file_system_type would be initialised thus:
> +
> +	struct file_system_type nfs = {
> +		...
> +		.fs_context_size	= sizeof(struct nfs_fs_context),
> +		.init_fs_context	= nfs_init_fs_context,
> +		...
> +	};
> +
> +The fs_context fields are as follows:
> +
> + (*) const struct fs_context_operations *ops
> +
> +     These are operations that can be done on a filesystem context (see
> +     below).  This must be set by the ->init_fs_context() file_system_type
> +     operation.
> +
> + (*) struct file_system_type *fs
> +
> +     A pointer to the file_system_type of the filesystem that is being
> +     constructed or reconfigured.  This retains a reference on the type owner.
> +
> + (*) struct dentry *root
> +
> +     A pointer to the root of the mountable tree (and indirectly, the
> +     superblock thereof).  This is filled in by the ->get_tree() op.
> +
> + (*) struct user_namespace *user_ns
> + (*) struct net *net_ns
> +
> +     There are a subset of the namespaces in use by the invoking process.  They
> +     retain references on each namespace.  The subscribed namespaces may be
> +     replaced by the filesystem to reflect other sources, such as the parent
> +     mount superblock on an automount.
> +
> + (*) struct cred *cred
> +
> +     The mounter's credentials.  This retains a reference on the credentials.
> +
> + (*) char *device
> +
> +     This is the device to be mounted.  It may be a block device
> +     (e.g. /dev/sda1) or something more exotic, such as the "host:/path" that
> +     NFS desires.
> +
> + (*) char *subtype
> +
> +     This is a string to be added to the type displayed in /proc/mounts to
> +     qualify it (used by FUSE).  This is available for the filesystem to set if
> +     desired.
> +
> + (*) void *security
> +
> +     A place for the LSMs to hang their security data for the superblock.  The
> +     relevant security operations are described below.
> +
> + (*) void *s_fs_info
> +
> +     The proposed s_fs_info for a new superblock, set in the superblock by
> +     sget_fc().  This can be used to distinguish superblocks.
> +
> + (*) unsigned int sb_flags
> +
> +     This holds the SB_* flags to be set in super_block::s_flags.
> +
> + (*) bool sloppy
> + (*) bool silent
> +
> +     These are set if the sloppy or silent mount options are given.
> +
> +     [NOTE] sloppy is probably unnecessary when userspace passes over one
> +     option at a time since the error can just be ignored if userspace deems it
> +     to be unimportant.
> +
> +     [NOTE] silent is probably redundant with sb_flags & SB_SILENT.
> +
> + (*) bool degraded
> +
> +     This is set if any preallocated resources in the context have been used
> +     up, thereby rendering it unreusable for the ->get_tree() op.
> +
> + (*) bool drop_sb
> +
> +     This is set if a superblock reference needs to be deactivated when the
> +     context is put.
> +
> + (*) enum fs_context_purpose
> +
> +     This indicates the purpose for which the context is intended.  The
> +     available values are:
> +
> +	FS_CONTEXT_FOR_USER_MOUNT,	-- New superblock for user-specified mount
> +	FS_CONTEXT_FOR_KERNEL_MOUNT,	-- New superblock for kernel-internal mount
> +	FS_CONTEXT_FOR_SUBMOUNT		-- New automatic submount of extant mount
> +	FS_CONTEXT_FOR_RECONFIGURE	-- Change an existing mount
> +
> +The mount context is created by calling vfs_new_fs_context(), vfs_sb_reconfig()
> +or vfs_dup_fs_context() and is destroyed with put_fs_context().  Note that the
> +structure is not refcounted.
> +
> +VFS, security and filesystem mount options are set individually with
> +vfs_parse_mount_option().  Options provided by the old mount(2) system call as
> +a page of data can be parsed with generic_parse_monolithic().
> +
> +When mounting, the filesystem is allowed to take data from any of the pointers
> +and attach it to the superblock (or whatever), provided it clears the pointer
> +in the mount context.
> +
> +The filesystem is also allowed to allocate resources and pin them with the
> +mount context.  For instance, NFS might pin the appropriate protocol version
> +module.
> +
> +
> +=================================
> +THE FILESYSTEM CONTEXT OPERATIONS
> +=================================
> +
> +The filesystem context points to a table of operations:
> +
> +	struct fs_context_operations {
> +		void (*free)(struct fs_context *fc);
> +		int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
> +		int (*parse_source)(struct fs_context *fc);
> +		int (*parse_option)(struct fs_context *fc, char *opt);
> +		int (*parse_monolithic)(struct fs_context *fc, void *data);
> +		int (*validate)(struct fs_context *fc);
> +		int (*get_tree)(struct fs_context *fc);
> +	};
> +
> +These operations are invoked by the various stages of the mount procedure to
> +manage the filesystem context.  They are as follows:
> +
> + (*) void (*free)(struct fs_context *fc);
> +
> +     Called to clean up the filesystem-specific part of the filesystem context
> +     when the context is destroyed.  It should be aware that parts of the
> +     context may have been removed and NULL'd out by ->get_tree().
> +
> + (*) int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
> +
> +     Called when a filesystem context has been duplicated to get any refs or
> +     copy any non-referenced resources held in the filesystem-specific part of
> +     the filesystem context.  An error may be returned to indicate failure to
> +     do this.
> +
> +     [!] Note that even if this fails, put_fs_context() will be called
> +	 immediately thereafter, so ->dup() *must* make the
> +	 filesystem-specific part safe for ->free().
> +
> + (*) int (*parse_source)(struct fs_context *fc);
> +
> +     Called when the source or device is specified for a filesystem context.
> +     The string will have been stored in fc->source prior to calling.  If

"source" is called "device" above but "source" in the header file.
Please change one of them to be consistent.

> +     successful, 0 should be returned and a negative error code otherwise.

	                                 or a

> +
> + (*) int (*parse_option)(struct fs_context *fc, char *p);
> +
> +     Called when an option is to be added to the filesystem context.  p points
> +     to the option string, likely in "key[=val]" format.  VFS-specific options
> +     will have been weeded out and fc->sb_flags updated in the context.
> +     Security options will also have been weeded out and fc->security updated.
> +
> +     If successful, 0 should be returned and a negative error code otherwise.

	                                    or a

> +
> + (*) int (*parse_monolithic)(struct fs_context *fc, void *data);
> +
> +     Called when the mount(2) system call is invoked to pass the entire data
> +     page in one go.  If this is expected to be just a list of "key[=val]"
> +     items separated by commas, then this may be set to NULL.
> +
> +     The return value is as for ->parse_option().
> +
> +     If the filesystem (eg. NFS) needs to examine the data first and then finds

	                   e.g.

> +     it's the standard key-val list then it may pass it off to
> +     generic_parse_monolithic().
> +
> + (*) int (*validate)(struct fs_context *fc);
> +
> +     Called when all the options have been applied and the mount is about to
> +     take place.  It is should check for inconsistencies from mount options and
> +     it is also allowed to do preliminary resource acquisition.  For instance,
> +     the core NFS module could load the NFS protocol module here.
> +
> +     Note that if fc->purpose == FS_CONTEXT_FOR_RECONFIGURE, some of the
> +     options necessary for a new mount may not be set.
> +
> +     The return value is as for ->parse_option().
> +
> + (*) int (*get_tree)(struct fs_context *fc);
> +
> +     Called to get or create the mountable root and superblock, using the
> +     information stored in the filesystem context (reconfiguration goes via a
> +     different vector).  It may detach any resources it desires from the
> +     filesystem context and transfer them to the superblock it creates.
> +
> +     On success it should set fc->root to the mountable root and return 0.  In
> +     the case of an error, it should return a negative error code.
> +
> +
> +===========================
> +FILESYSTEM CONTEXT SECURITY
> +===========================
> +
> +The filesystem context contains a security pointer that the LSMs can use for
> +building up a security context for the superblock to be mounted.  There are a
> +number of operations used by the new mount code for this purpose:
> +
> + (*) int security_fs_context_alloc(struct fs_context *fc,
> +				   struct super_block *src_sb);
> +
> +     Called to initialise fc->security (which is preset to NULL) and allocate
> +     any resources needed.  It should return 0 on success and a negative error

	                                                     or a

> +     code on failure.
> +
> +     src_sb is non-NULL in the case of reconfiguration
> +     (FS_CONTEXT_FOR_RECONFIGURE) in which case it indicates the superblock to
> +     be reconfigured or in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in
> +     which case it indicates the parent superblock.

I seem to recall that you were going to rewrite that long sentence above.
-ETOOMANYCASES

> +
> + (*) int security_fs_context_dup(struct fs_context *fc,
> +				 struct fs_context *src_fc);
> +
> +     Called to initialise fc->security (which is preset to NULL) and allocate
> +     any resources needed.  The original filesystem context is pointed to by
> +     src_fc and may be used for reference.  It should return 0 on success and a

	                                                                     or a

> +     negative error code on failure.
> +
> + (*) void security_fs_context_free(struct fs_context *fc);
> +
> +     Called to clean up anything attached to fc->security.  Note that the
> +     contents may have been transferred to a superblock and the pointer NULL'd
> +     out during mount.

[Here we have evidence that in English any noun can be verbed.]   :)

> +
> + (*) int security_fs_context_parse_option(struct fs_context *fc, char *opt);	
> +
> +     Called for each mount option.  The arguments are as for the
> +     ->parse_option() method.  An active LSM may reject one with an error, pass
> +     one over and return 0 or consume one and return 1.  If consumed, the

What does "pass one over" mean?

> +     option isn't passed on to the filesystem.
> +
> + (*) int security_sb_get_tree(struct fs_context *fc);
> +
> +     Called during the mount procedure to verify that the specified superblock
> +     is allowed to be mounted and to transfer the security data there.  It
> +     should return 0 or a negative error code.
> +
> +     [NOTE] Should I add a security_fs_context_validate() operation so that the
> +     LSM has the opportunity to allocate stuff and check the options as a
> +     whole?
> +
> + (*) int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint)

	end line with ';' like the other prototypes.

> +
> +     Called during the mount procedure to verify that the root dentry attached
> +     to the context is permitted to be attached to the specified mountpoint.
> +     It should return 0 on success and a negative error code on failure.

	                              or a

> +
> +
> +=================================
> +VFS FILESYSTEM CONTEXT OPERATIONS
> +=================================
> +
> +There are four operations for creating a filesystem context and
> +one for destroying a context:
> +
> + (*) struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
> +					   struct super_block *src_sb;

				s/;/,/ above

> +					   unsigned int sb_flags);
> +
> +     Create a filesystem context for a given filesystem type.  This allocates
> +     the filesystem context, sets the flags, initialises the security and calls
> +     fs_type->init_fs_context() to initialise the filesystem context.
> +
> +     src_sb can be NULL or it may indicate a superblock that is going to be
> +     reconfigured (FS_CONTEXT_FOR_RECONFIGURE) or a superblock that is the
> +     parent of a submount (FS_CONTEXT_FOR_SUBMOUNT).  This superblock is
> +     provided as a source of namespace information.
> +
> + (*) struct fs_context *vfs_sb_reconfigure(struct vfsmount *mnt,
> +					   unsigned int sb_flags);
> +
> +     Create a filesystem context from the same filesystem as an extant mount
> +     and initialise the mount parameters from the superblock underlying that
> +     mount.  This is for use by superblock parameter reconfiguration.
> +
> + (*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
> +
> +     Duplicate a filesystem context, copying any options noted and duplicating
> +     or additionally referencing any resources held therein.  This is available
> +     for use where a filesystem has to get a mount within a mount, such as NFS4
> +     does by internally mounting the root of the target server and then doing a
> +     private pathwalk to the target directory.
> +
> + (*) void put_fs_context(struct fs_context *fc);
> +
> +     Destroy a filesystem context, releasing any resources it holds.  This
> +     calls the ->free() operation.  This is intended to be called by anyone who
> +     created a filesystem context.
> +
> +     [!] filesystem contexts are not refcounted, so this causes unconditional
> +	 destruction.
> +
> +In all the above operations, apart from the put op, the return is a mount
> +context pointer or a negative error code.
> +
> +For the remaining operations, if an error occurs, a negative error code will be
> +returned.
> +
> + (*) int vfs_get_tree(struct fs_context *fc);
> +
> +     Get or create the mountable root and superblock, using the parameters in
> +     the filesystem context to select/configure the superblock.  This invokes
> +     the ->validate() op and then the ->get_tree() op.
> +
> +     [NOTE] ->validate() could perhaps be rolled into ->get_tree() and
> +     ->reconfigure().
> +
> + (*) struct vfsmount *vfs_create_mount(struct fs_context *fc);
> +
> +     Create a mount given the parameters in the specified filesystem context.
> +     Note that this does not attach the mount to anything.
> +
> + (*) int vfs_set_fs_source(struct fs_context *fc, char *source);
> +
> +     Supply the source name or device name for the mount.  This may cause the
> +     filesystem to access the device.
> +
> + (*) int vfs_parse_fs_option(struct fs_context *fc, char *data);
> +
> +     Supply a single mount option to the filesystem context.  The mount option
> +     should likely be in a "key[=val]" string form.  The option is first
> +     checked to see if it corresponds to a standard mount flag (in which case
> +     it is used to set an SB_xxx flag and consumed) or a security option (in
> +     which case the LSM consumes it) before it is passed on to the filesystem.
> +
> + (*) int generic_parse_monolithic(struct fs_context *fc, void *data);
> +
> +     Parse a sys_mount() data page, assuming the form to be a text list
> +     consisting of key[=val] options separated by commas.  Each item in the
> +     list is passed to vfs_mount_option().  This is the default when the
> +     ->parse_monolithic() operation is NULL.

> diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
> new file mode 100644
> index 000000000000..732a11898242
> --- /dev/null
> +++ b/include/linux/fs_context.h
> @@ -0,0 +1,76 @@
> +/* Filesystem superblock creation and reconfiguration context.
> + *
> + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public Licence
> + * as published by the Free Software Foundation; either version
> + * 2 of the Licence, or (at your option) any later version.
> + */
> +
> +#ifndef _LINUX_FS_CONTEXT_H
> +#define _LINUX_FS_CONTEXT_H
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +
> +struct cred;
> +struct dentry;
> +struct file_operations;
> +struct file_system_type;
> +struct mnt_namespace;
> +struct net;
> +struct pid_namespace;
> +struct super_block;
> +struct user_namespace;
> +struct vfsmount;
> +
> +enum fs_context_purpose {
> +	FS_CONTEXT_FOR_USER_MOUNT,	/* New superblock for user-specified mount */
> +	FS_CONTEXT_FOR_KERNEL_MOUNT,	/* New superblock for kernel-internal mount */
> +	FS_CONTEXT_FOR_SUBMOUNT,	/* New superblock for automatic submount */
> +	FS_CONTEXT_FOR_RECONFIGURE,	/* Superblock reconfiguration (remount) */
> +};
> +
> +/*
> + * Filesystem context as allocated and constructed by the ->init_fs_context()
> + * file_system_type operation.  The size of the object allocated is specified
> + * in struct file_system_type::fs_context_size and this must include sufficient
> + * space for the fs_context struct.
> + *
> + * Superblock creation fills in ->root whereas reconfiguration begins with this
> + * already set.
> + *
> + * See Documentation/filesystems/mounting.txt
> + */
> +struct fs_context {
> +	const struct fs_context_operations *ops;
> +	struct file_system_type	*fs_type;
> +	struct dentry		*root;		/* The root and superblock */
> +	struct user_namespace	*user_ns;	/* The user namespace for this mount */
> +	struct net		*net_ns;	/* The network namespace for this mount */
> +	const struct cred	*cred;		/* The mounter's credentials */
> +	char			*source;	/* The source name (eg. device) */
> +	char			*subtype;	/* The subtype to set on the superblock */
> +	void			*security;	/* The LSM context */
> +	void			*s_fs_info;	/* Proposed s_fs_info */
> +	unsigned int		sb_flags;	/* Proposed superblock flags (SB_*) */
> +	bool			sloppy;		/* Unrecognised options are okay */
> +	bool			silent;
> +	bool			degraded;	/* True if the context can't be reused */
> +	bool			drop_sb;	/* T if need to drop an SB reference */

s/T /True /

> +	enum fs_context_purpose	purpose : 8;
> +};
> +
> +struct fs_context_operations {
> +	void (*free)(struct fs_context *fc);
> +	int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
> +	int (*parse_source)(struct fs_context *fc);
> +	int (*parse_option)(struct fs_context *fc, char *opt, size_t len);
> +	int (*parse_monolithic)(struct fs_context *fc, void *data);
> +	int (*validate)(struct fs_context *fc);
> +	int (*get_tree)(struct fs_context *fc);
> +};
> +
> +#endif /* _LINUX_FS_CONTEXT_H */
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/24] VFS: Add LSM hooks for filesystem context [ver #7]
  2018-04-20 15:35   ` David Howells
@ 2018-04-23 13:25     ` Stephen Smalley
  2018-04-24 15:22     ` David Howells
  1 sibling, 0 replies; 40+ messages in thread
From: Stephen Smalley @ 2018-04-23 13:25 UTC (permalink / raw)
  To: David Howells, Paul Moore
  Cc: viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs, selinux

On 04/20/2018 11:35 AM, David Howells wrote:
> Paul Moore <paul@paul-moore.com> wrote:
> 
>> Adding the SELinux mailing list to the CC line; in the future please
>> include the SELinux mailing list on patches like this.  It would also
>> be very helpful to include "selinux" somewhere in the subject line
>> when the patch is predominately SELinux related (much like you did for
>> the other LSMs in this patchset).
> 
> I should probably evict the SELinux bits into their own patch since the point
> of this patch is the LSM hooks, not specifically SELinux's implementation
> thereof.
> 
>> I can't say I've digested all of this yet, but what SELinux testing
>> have you done with this patchset?
> 
> Using the fsopen()/fsmount() syscalls, these hooks will be made use of, say
> for NFS (which I haven't included in this list).  Even sys_mount() will make
> use of them a bit, so just booting the system does that.
> 
> Note that for SELinux these hooks don't change very much except how the
> parameters are handled.  It doesn't actually change the checks that are made -
> at least, not yet.  There are some additional syscalls under consideration
> (such as the ability to pick a live mounted filesystem into a context) that
> might require additional permits.

Neither fsopen() nor fscontext_fs_write() appear to perform any kind of up-front
permission checking (DAC or MAC), although some security hooks may be ultimately called
to allocate structures, parse security options, etc.  Is there a reason not apply a may_mount()
or similar check up front?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/24] VFS: Add LSM hooks for filesystem context [ver #7]
  2018-04-20 15:35   ` David Howells
  2018-04-23 13:25     ` Stephen Smalley
@ 2018-04-24 15:22     ` David Howells
  2018-04-25 14:07       ` Stephen Smalley
  1 sibling, 1 reply; 40+ messages in thread
From: David Howells @ 2018-04-24 15:22 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: dhowells, Paul Moore, linux-nfs, linux-kernel,
	linux-security-module, viro, selinux, linux-fsdevel, linux-afs

Stephen Smalley <sds@tycho.nsa.gov> wrote:

> Neither fsopen() nor fscontext_fs_write() appear to perform any kind of
> up-front permission checking (DAC or MAC), although some security hooks may
> be ultimately called to allocate structures, parse security options, etc.
> Is there a reason not apply a may_mount() or similar check up front?

may_mount() is called by fsmount() at the moment.  It may make sense to move
this earlier to fsopen().  Note that there's also going to be something that
looks like:

	fd = fspick("/mnt");
	fsmount(fd, "/a", MNT_NOEXEC); // ie. bind mount

or:

	fd = fspick("/mnt");
	write(fd, "o intr");
	write(fd, "x reconfigure"); // ie. something like remount
	close(fd);

I guess we'd want to call may_mount() in fspick() too.  But there's also the
possibility of using this to create a query interfact too:

	fd = fspick("/mnt");
	write(fd, "q intr");
	read(fd, value_buffer);

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/24] VFS: Add LSM hooks for filesystem context [ver #7]
  2018-04-24 15:22     ` David Howells
@ 2018-04-25 14:07       ` Stephen Smalley
  0 siblings, 0 replies; 40+ messages in thread
From: Stephen Smalley @ 2018-04-25 14:07 UTC (permalink / raw)
  To: David Howells
  Cc: Paul Moore, linux-nfs, linux-kernel, linux-security-module, viro,
	selinux, linux-fsdevel, linux-afs

On 04/24/2018 11:22 AM, David Howells wrote:
> Stephen Smalley <sds@tycho.nsa.gov> wrote:
> 
>> Neither fsopen() nor fscontext_fs_write() appear to perform any kind of
>> up-front permission checking (DAC or MAC), although some security hooks may
>> be ultimately called to allocate structures, parse security options, etc.
>> Is there a reason not apply a may_mount() or similar check up front?
> 
> may_mount() is called by fsmount() at the moment.  It may make sense to move
> this earlier to fsopen().  Note that there's also going to be something that
> looks like:
> 
> 	fd = fspick("/mnt");
> 	fsmount(fd, "/a", MNT_NOEXEC); // ie. bind mount
> 
> or:
> 
> 	fd = fspick("/mnt");
> 	write(fd, "o intr");
> 	write(fd, "x reconfigure"); // ie. something like remount
> 	close(fd);
> 
> I guess we'd want to call may_mount() in fspick() too.  But there's also the
> possibility of using this to create a query interfact too:
> 
> 	fd = fspick("/mnt");
> 	write(fd, "q intr");
> 	read(fd, value_buffer);

My concern was that fsopen()/fscontext_fs_write() may expose attack surface (e.g. mount option parsing code) that might not be normally accessible to unprivileged userspace (i.e. gated by may_mount() and security_sb_mount()) prior to your changes.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context [ver #7]
  2018-04-19 13:31 ` [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context " David Howells
  2018-04-23  3:36   ` Randy Dunlap
@ 2018-05-01 14:29   ` David Howells
  2018-05-01 15:31     ` Randy Dunlap
  1 sibling, 1 reply; 40+ messages in thread
From: David Howells @ 2018-05-01 14:29 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: dhowells, viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

Randy Dunlap <rdunlap@infradead.org> wrote:

> > + (2) Parse the options and attach them to the context.  Options may be passed
> > +     individually from userspace.
> 
> Does this say that step (2) can be multiple small steps?

Perhaps "phase (2)" would be a better name than "step (2)".  During (2),
multiple argument-supplying calls may be made.

> How does step (2) know when userspace has completed sending individual
> options?

Moving to phase (3) terminates phase (2).  This is triggered by userspace
writing "x create" or "x reconfigure" to the fd as things stand.

> > + (6) Return an error message attached to the context.
> 
> where/how is this done?

That got taken out and made general - which Linus then objected to.  I need to
reinsert this and make it fscontext-specific as most people would really like
to have it, the mount process being able to produce so many weird and
wonderful errors.

> > +When the VFS creates this, it allocates ->fs_context_size bytes (as specified
> > +by the file_system_type object) to hold both the fs_context struct and any
> > +extra data required by the filesystem.  The fs_context struct is placed at the
> > +beginning of this space.  Any extra space beyond that is for use by the
> > +filesystem.  The filesystem should wrap the struct in its own, e.g.:
> 
>                                                       in its own struct, e.g.:

How about "... The filesystem should wrap the struct in one of its own, e.g."?

> > + (*) int security_fs_context_parse_option(struct fs_context *fc, char *opt);	
> > +
> > +     Called for each mount option.  The arguments are as for the
> > +     ->parse_option() method.  An active LSM may reject one with an error, pass
> > +     one over and return 0 or consume one and return 1.  If consumed, the
> 
> What does "pass one over" mean?

How about:

	An active LSM may return 0 to pass the option on to the filesystem, 1
	to cause the option to be discarded or an error to cause the option to
	be rejected.

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context [ver #7]
  2018-05-01 14:29   ` David Howells
@ 2018-05-01 15:31     ` Randy Dunlap
  0 siblings, 0 replies; 40+ messages in thread
From: Randy Dunlap @ 2018-05-01 15:31 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

On 05/01/2018 07:29 AM, David Howells wrote:
> Randy Dunlap <rdunlap@infradead.org> wrote:
> 
>>> + (2) Parse the options and attach them to the context.  Options may be passed
>>> +     individually from userspace.
>>
>> Does this say that step (2) can be multiple small steps?
> 
> Perhaps "phase (2)" would be a better name than "step (2)".  During (2),
> multiple argument-supplying calls may be made.

Ack.

>> How does step (2) know when userspace has completed sending individual
>> options?
> 
> Moving to phase (3) terminates phase (2).  This is triggered by userspace
> writing "x create" or "x reconfigure" to the fd as things stand.
> 
>>> + (6) Return an error message attached to the context.
>>
>> where/how is this done?
> 
> That got taken out and made general - which Linus then objected to.  I need to
> reinsert this and make it fscontext-specific as most people would really like
> to have it, the mount process being able to produce so many weird and
> wonderful errors.
> 
>>> +When the VFS creates this, it allocates ->fs_context_size bytes (as specified
>>> +by the file_system_type object) to hold both the fs_context struct and any
>>> +extra data required by the filesystem.  The fs_context struct is placed at the
>>> +beginning of this space.  Any extra space beyond that is for use by the
>>> +filesystem.  The filesystem should wrap the struct in its own, e.g.:
>>
>>                                                       in its own struct, e.g.:
> 
> How about "... The filesystem should wrap the struct in one of its own, e.g."?

OK.

>>> + (*) int security_fs_context_parse_option(struct fs_context *fc, char *opt);	
>>> +
>>> +     Called for each mount option.  The arguments are as for the
>>> +     ->parse_option() method.  An active LSM may reject one with an error, pass
>>> +     one over and return 0 or consume one and return 1.  If consumed, the
>>
>> What does "pass one over" mean?
> 
> How about:
> 
> 	An active LSM may return 0 to pass the option on to the filesystem, 1
> 	to cause the option to be discarded or an error to cause the option to
> 	be rejected.

Much better. Thanks.

-- 
~Randy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/24] apparmor: Implement security hooks for the new mount API [ver #7]
  2018-04-19 13:31 ` [PATCH 05/24] apparmor: Implement security hooks for the new mount API " David Howells
@ 2018-05-04  0:10   ` John Johansen
  2018-05-11 12:20   ` David Howells
  1 sibling, 0 replies; 40+ messages in thread
From: John Johansen @ 2018-05-04  0:10 UTC (permalink / raw)
  To: David Howells, viro
  Cc: linux-nfs, apparmor, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

On 04/19/2018 06:31 AM, David Howells wrote:
> Implement hooks to check the creation of new mountpoints for AppArmor.
> 
> Unfortunately, the DFA evaluation puts the option data in last, after the
> details of the mountpoint, so we have to cache the mount options in the
> fs_context using those hooks till we get to the new mountpoint hook.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>

thanks David,

this looks good, and has pasted the testing that I have done so far. I
have started on the work that will allow us to reorder the match but
its not ready yet and shouldn't hold this up.

Acked-by: John Johansen <john.johansen@canonical.com>


> cc: John Johansen <john.johansen@canonical.com>
> cc: apparmor@lists.ubuntu.com
> cc: linux-security-module@vger.kernel.org
> ---
> 
>  security/apparmor/include/mount.h |   11 +++++
>  security/apparmor/lsm.c           |   80 +++++++++++++++++++++++++++++++++++++
>  security/apparmor/mount.c         |   46 +++++++++++++++++++++
>  3 files changed, 135 insertions(+), 2 deletions(-)
> 
> diff --git a/security/apparmor/include/mount.h b/security/apparmor/include/mount.h
> index 25d6067fa6ef..0441bfae30fa 100644
> --- a/security/apparmor/include/mount.h
> +++ b/security/apparmor/include/mount.h
> @@ -16,6 +16,7 @@
>  
>  #include <linux/fs.h>
>  #include <linux/path.h>
> +#include <linux/fs_context.h>
>  
>  #include "domain.h"
>  #include "policy.h"
> @@ -27,7 +28,13 @@
>  #define AA_AUDIT_DATA		0x40
>  #define AA_MNT_CONT_MATCH	0x40
>  
> -#define AA_MS_IGNORE_MASK (MS_KERNMOUNT | MS_NOSEC | MS_ACTIVE | MS_BORN)
> +#define AA_SB_IGNORE_MASK (SB_KERNMOUNT | SB_NOSEC | SB_ACTIVE | SB_BORN)
> +
> +struct apparmor_fs_context {
> +	struct fs_context	fc;
> +	char			*saved_options;
> +	size_t			saved_size;
> +};
>  
>  int aa_remount(struct aa_label *label, const struct path *path,
>  	       unsigned long flags, void *data);
> @@ -45,6 +52,8 @@ int aa_move_mount(struct aa_label *label, const struct path *path,
>  int aa_new_mount(struct aa_label *label, const char *dev_name,
>  		 const struct path *path, const char *type, unsigned long flags,
>  		 void *data);
> +int aa_new_mount_fc(struct aa_label *label, struct fs_context *fc,
> +		    const struct path *mountpoint);
>  
>  int aa_umount(struct aa_label *label, struct vfsmount *mnt, int flags);
>  
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index 9ebc9e9c3854..14398dec2e38 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -518,6 +518,78 @@ static int apparmor_file_mprotect(struct vm_area_struct *vma,
>  			   !(vma->vm_flags & VM_SHARED) ? MAP_PRIVATE : 0);
>  }
>  
> +static int apparmor_fs_context_alloc(struct fs_context *fc, struct super_block *src_sb)
> +{
> +	struct apparmor_fs_context *afc;
> +
> +	afc = kzalloc(sizeof(*afc), GFP_KERNEL);
> +	if (!afc)
> +		return -ENOMEM;
> +
> +	fc->security = afc;
> +	return 0;
> +}
> +
> +static int apparmor_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
> +{
> +	fc->security = NULL;
> +	return 0;
> +}
> +
> +static void apparmor_fs_context_free(struct fs_context *fc)
> +{
> +	struct apparmor_fs_context *afc = fc->security;
> +
> +	if (afc) {
> +		kfree(afc->saved_options);
> +		kfree(afc);
> +	}
> +}
> +
> +/*
> + * As a temporary hack, we buffer all the options.  The problem is that we need
> + * to pass them to the DFA evaluator *after* mount point parameters, which
> + * means deferring the entire check to the sb_mountpoint hook.
> + */
> +static int apparmor_fs_context_parse_option(struct fs_context *fc, char *opt, size_t len)
> +{
> +	struct apparmor_fs_context *afc = fc->security;
> +	size_t space = 0;
> +	char *p, *q;
> +
> +	if (afc->saved_size > 0)
> +		space = 1;
> +	
> +	p = krealloc(afc->saved_options, afc->saved_size + space + len + 1, GFP_KERNEL);
> +	if (!p)
> +		return -ENOMEM;
> +
> +	q = p + afc->saved_size;
> +	if (q != p)
> +		*q++ = ' ';
> +	memcpy(q, opt, len);
> +	q += len;
> +	*q = 0;
> +
> +	afc->saved_options = p;
> +	afc->saved_size += 1 + len;
> +	return 0;
> +}
> +
> +static int apparmor_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
> +				  unsigned int mnt_flags)
> +{
> +	struct aa_label *label;
> +	int error = 0;
> +
> +	label = __begin_current_label_crit_section();
> +	if (!unconfined(label))
> +		error = aa_new_mount_fc(label, fc, mountpoint);
> +	__end_current_label_crit_section(label);
> +
> +	return error;
> +}
> +
>  static int apparmor_sb_mount(const char *dev_name, const struct path *path,
>  			     const char *type, unsigned long flags, void *data)
>  {
> @@ -528,7 +600,7 @@ static int apparmor_sb_mount(const char *dev_name, const struct path *path,
>  	if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
>  		flags &= ~MS_MGC_MSK;
>  
> -	flags &= ~AA_MS_IGNORE_MASK;
> +	flags &= ~AA_SB_IGNORE_MASK;
>  
>  	label = __begin_current_label_crit_section();
>  	if (!unconfined(label)) {
> @@ -1124,6 +1196,12 @@ static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(capget, apparmor_capget),
>  	LSM_HOOK_INIT(capable, apparmor_capable),
>  
> +	LSM_HOOK_INIT(fs_context_alloc, apparmor_fs_context_alloc),
> +	LSM_HOOK_INIT(fs_context_dup, apparmor_fs_context_dup),
> +	LSM_HOOK_INIT(fs_context_free, apparmor_fs_context_free),
> +	LSM_HOOK_INIT(fs_context_parse_option, apparmor_fs_context_parse_option),
> +	LSM_HOOK_INIT(sb_mountpoint, apparmor_sb_mountpoint),
> +
>  	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
>  	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
>  	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
> diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
> index 45bb769d6cd7..3d477d288627 100644
> --- a/security/apparmor/mount.c
> +++ b/security/apparmor/mount.c
> @@ -554,6 +554,52 @@ int aa_new_mount(struct aa_label *label, const char *dev_name,
>  	return error;
>  }
>  
> +int aa_new_mount_fc(struct aa_label *label, struct fs_context *fc,
> +		    const struct path *mountpoint)
> +{
> +	struct apparmor_fs_context *afc = fc->security;
> +	struct aa_profile *profile;
> +	char *buffer = NULL, *dev_buffer = NULL;
> +	bool binary;
> +	int error;
> +	struct path tmp_path, *dev_path = NULL;
> +
> +	AA_BUG(!label);
> +	AA_BUG(!mountpoint);
> +
> +	binary = fc->fs_type->fs_flags & FS_BINARY_MOUNTDATA;
> +
> +	if (fc->fs_type->fs_flags & FS_REQUIRES_DEV) {
> +		if (!fc->source)
> +			return -ENOENT;
> +
> +		error = kern_path(fc->source, LOOKUP_FOLLOW, &tmp_path);
> +		if (error)
> +			return error;
> +		dev_path = &tmp_path;
> +	}
> +
> +	get_buffers(buffer, dev_buffer);
> +	if (dev_path) {
> +		error = fn_for_each_confined(label, profile,
> +			match_mnt(profile, mountpoint, buffer, dev_path, dev_buffer,
> +				  fc->fs_type->name,
> +				  fc->sb_flags & ~AA_SB_IGNORE_MASK,
> +				  afc->saved_options, binary));
> +	} else {
> +		error = fn_for_each_confined(label, profile,
> +			match_mnt_path_str(profile, mountpoint, buffer, fc->source,
> +					   fc->fs_type->name,
> +					   fc->sb_flags & ~AA_SB_IGNORE_MASK,
> +					   afc->saved_options, binary, NULL));
> +	}
> +	put_buffers(buffer, dev_buffer);
> +	if (dev_path)
> +		path_put(dev_path);
> +
> +	return error;
> +}
> +
>  static int profile_umount(struct aa_profile *profile, struct path *path,
>  			  char *buffer)
>  {
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 05/24] apparmor: Implement security hooks for the new mount API [ver #7]
  2018-04-19 13:31 ` [PATCH 05/24] apparmor: Implement security hooks for the new mount API " David Howells
  2018-05-04  0:10   ` John Johansen
@ 2018-05-11 12:20   ` David Howells
  1 sibling, 0 replies; 40+ messages in thread
From: David Howells @ 2018-05-11 12:20 UTC (permalink / raw)
  To: John Johansen
  Cc: dhowells, kent.overstreet, viro, linux-nfs, apparmor,
	linux-kernel, linux-security-module, linux-fsdevel, linux-afs

John Johansen <john.johansen@canonical.com> wrote:

> this looks good, and has pasted the testing that I have done so far. I
> have started on the work that will allow us to reorder the match but
> its not ready yet and shouldn't hold this up.

Excellent, thanks!

One thing to consider: Kent Overstreet mentioned the possibility of adding
support for multiple sources - something that his bcachefs would require.

>From the userspace PoV this could look something like:

	fd = fsopen("bcachefs");
	write(fd, "d /dev/sda1");
	write(fd, "d /dev/sda2");
	...

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [12/24] proc: Add fs_context support to procfs [ver #7]
  2018-04-19 13:32 ` [PATCH 12/24] proc: Add fs_context support to procfs " David Howells
@ 2018-06-19  3:34   ` Andrei Vagin
  2018-06-26  6:13     ` Andrei Vagin
  0 siblings, 1 reply; 40+ messages in thread
From: Andrei Vagin @ 2018-06-19  3:34 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

[-- Attachment #1: Type: text/plain, Size: 10050 bytes --]

Hi David,

We run CRIU tests for vfs/for-next, and today a few of these test failed. I
found that the problem appears after this patch..

https://travis-ci.org/avagin/linux/jobs/393766778

The reproducer is attached. It creates a process in a new set of namespaces
(user, mount, etc) and then this process fails to mount procfs, the mount
syscall returns EBUSY.

666   pipe([3, 4])                      = 0
666   clone(child_stack=0x7ffc23a89400, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 667
666   openat(AT_FDCWD, "/proc/667/uid_map", O_WRONLY <unfinished ...>
667   close(4 <unfinished ...>
666   <... openat resumed> )            = 5
666   write(5, "0 100000 100000\n100000 200000 50"..., 36 <unfinished ...>
667   <... close resumed> )             = 0
666   <... write resumed> )             = 36
666   close(5 <unfinished ...>
667   read(3,  <unfinished ...>
666   <... close resumed> )             = 0
666   openat(AT_FDCWD, "/proc/667/gid_map", O_WRONLY) = 5
666   write(5, "0 400000 50000\n50000 500000 1000"..., 35) = 35
666   close(5)                          = 0
666   write(4, " \225\250#", 4)         = 4
667   <... read resumed> " \225\250#", 4) = 4
666   wait4(667,  <unfinished ...>
667   setsid()                          = 1
667   setuid(0)                         = 0
667   setgid(0)                         = 0
667   setgroups(0, NULL)                = 0
667   mount("proc", "/mnt", "proc", MS_MGC_VAL|MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EBUSY (Device or resource busy)

Thanks,
Andrei

On Thu, Apr 19, 2018 at 02:32:28PM +0100, David Howells wrote:
> Add fs_context support to procfs.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/proc/inode.c    |    2 -
>  fs/proc/internal.h |    2 -
>  fs/proc/root.c     |  169 ++++++++++++++++++++++++++++++++++------------------
>  3 files changed, 113 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 0b13cf6eb6d7..7aa86dd65ba8 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -128,7 +128,7 @@ const struct super_operations proc_sops = {
>  	.drop_inode	= generic_delete_inode,
>  	.evict_inode	= proc_evict_inode,
>  	.statfs		= simple_statfs,
> -	.remount_fs	= proc_remount,
> +	.reconfigure	= proc_reconfigure,
>  	.show_options	= proc_show_options,
>  };
>  
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 3182e1b636d3..a5ab9504768a 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -254,7 +254,7 @@ static inline void proc_tty_init(void) {}
>  extern struct proc_dir_entry proc_root;
>  
>  extern void proc_self_init(void);
> -extern int proc_remount(struct super_block *, int *, char *, size_t);
> +extern int proc_reconfigure(struct super_block *, struct fs_context *);
>  
>  /*
>   * task_[no]mmu.c
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 2fbc177f37a8..e6bd31fbc714 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -19,14 +19,24 @@
>  #include <linux/module.h>
>  #include <linux/bitops.h>
>  #include <linux/user_namespace.h>
> +#include <linux/fs_context.h>
>  #include <linux/mount.h>
>  #include <linux/pid_namespace.h>
>  #include <linux/parser.h>
>  #include <linux/cred.h>
>  #include <linux/magic.h>
> +#include <linux/slab.h>
>  
>  #include "internal.h"
>  
> +struct proc_fs_context {
> +	struct fs_context	fc;
> +	struct pid_namespace	*pid_ns;
> +	unsigned long		mask;
> +	int			hidepid;
> +	int			gid;
> +};
> +
>  enum {
>  	Opt_gid, Opt_hidepid, Opt_err,
>  };
> @@ -37,56 +47,60 @@ static const match_table_t tokens = {
>  	{Opt_err, NULL},
>  };
>  
> -static int proc_parse_options(char *options, struct pid_namespace *pid)
> +static int proc_parse_option(struct fs_context *fc, char *opt, size_t len)
>  {
> -	char *p;
> +	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
>  	substring_t args[MAX_OPT_ARGS];
> -	int option;
> -
> -	if (!options)
> -		return 1;
> -
> -	while ((p = strsep(&options, ",")) != NULL) {
> -		int token;
> -		if (!*p)
> -			continue;
> -
> -		args[0].to = args[0].from = NULL;
> -		token = match_token(p, tokens, args);
> -		switch (token) {
> -		case Opt_gid:
> -			if (match_int(&args[0], &option))
> -				return 0;
> -			pid->pid_gid = make_kgid(current_user_ns(), option);
> -			break;
> -		case Opt_hidepid:
> -			if (match_int(&args[0], &option))
> -				return 0;
> -			if (option < HIDEPID_OFF ||
> -			    option > HIDEPID_INVISIBLE) {
> -				pr_err("proc: hidepid value must be between 0 and 2.\n");
> -				return 0;
> -			}
> -			pid->hide_pid = option;
> -			break;
> -		default:
> -			pr_err("proc: unrecognized mount option \"%s\" "
> -			       "or missing value\n", p);
> -			return 0;
> +	int token;
> +	
> +	args[0].to = args[0].from = NULL;
> +	token = match_token(opt, tokens, args);
> +	switch (token) {
> +	case Opt_gid:
> +		if (match_int(&args[0], &ctx->gid))
> +			return -EINVAL;
> +		break;
> +
> +	case Opt_hidepid:
> +		if (match_int(&args[0], &ctx->hidepid))
> +			return -EINVAL;
> +		if (ctx->hidepid < HIDEPID_OFF ||
> +		    ctx->hidepid > HIDEPID_INVISIBLE) {
> +			pr_err("proc: hidepid value must be between 0 and 2.\n");
> +			return -EINVAL;
>  		}
> +		break;
> +
> +	default:
> +		pr_err("proc: unrecognized mount option \"%s\" or missing value\n",
> +		       opt);
> +		return -EINVAL;
>  	}
>  
> -	return 1;
> +	ctx->mask |= 1 << token;
> +	return 0;
> +}
> +
> +static void proc_set_options(struct super_block *s,
> +			     struct fs_context *fc,
> +			     struct pid_namespace *pid_ns,
> +			     struct user_namespace *user_ns)
> +{
> +	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
> +
> +	if (ctx->mask & (1 << Opt_gid))
> +		pid_ns->pid_gid = make_kgid(user_ns, ctx->gid);
> +	if (ctx->mask & (1 << Opt_hidepid))
> +		pid_ns->hide_pid = ctx->hidepid;
>  }
>  
> -static int proc_fill_super(struct super_block *s, void *data, size_t data_size, int silent)
> +static int proc_fill_super(struct super_block *s, struct fs_context *fc)
>  {
> -	struct pid_namespace *ns = get_pid_ns(s->s_fs_info);
> +	struct pid_namespace *pid_ns = get_pid_ns(s->s_fs_info);
>  	struct inode *root_inode;
>  	int ret;
>  
> -	if (!proc_parse_options(data, ns))
> -		return -EINVAL;
> +	proc_set_options(s, fc, pid_ns, current_user_ns());
>  
>  	/* User space would break if executables or devices appear on proc */
>  	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
> @@ -103,7 +117,7 @@ static int proc_fill_super(struct super_block *s, void *data, size_t data_size,
>  	 * top of it
>  	 */
>  	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
> -	
> +
>  	pde_get(&proc_root);
>  	root_inode = proc_get_inode(s, &proc_root);
>  	if (!root_inode) {
> @@ -124,30 +138,46 @@ static int proc_fill_super(struct super_block *s, void *data, size_t data_size,
>  	return proc_setup_thread_self(s);
>  }
>  
> -int proc_remount(struct super_block *sb, int *flags,
> -		 char *data, size_t data_size)
> +int proc_reconfigure(struct super_block *sb, struct fs_context *fc)
>  {
>  	struct pid_namespace *pid = sb->s_fs_info;
>  
>  	sync_filesystem(sb);
> -	return !proc_parse_options(data, pid);
> +
> +	if (fc)
> +		proc_set_options(sb, fc, pid, current_user_ns());
> +	return 0;
>  }
>  
> -static struct dentry *proc_mount(struct file_system_type *fs_type,
> -				 int flags, const char *dev_name,
> -				 void *data, size_t data_size)
> +static int proc_get_tree(struct fs_context *fc)
>  {
> -	struct pid_namespace *ns;
> +	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
>  
> -	if (flags & SB_KERNMOUNT) {
> -		ns = data;
> -		data = NULL;
> -	} else {
> -		ns = task_active_pid_ns(current);
> -	}
> +	ctx->fc.s_fs_info = ctx->pid_ns;
> +	return vfs_get_super(fc, vfs_get_keyed_super, proc_fill_super);
> +}
>  
> -	return mount_ns(fs_type, flags, data, data_size, ns, ns->user_ns,
> -			proc_fill_super);
> +static void proc_fs_context_free(struct fs_context *fc)
> +{
> +	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
> +
> +	if (ctx->pid_ns)
> +		put_pid_ns(ctx->pid_ns);
> +}
> +
> +static const struct fs_context_operations proc_fs_context_ops = {
> +	.free		= proc_fs_context_free,
> +	.parse_option	= proc_parse_option,
> +	.get_tree	= proc_get_tree,
> +};
> +
> +static int proc_init_fs_context(struct fs_context *fc, struct super_block *src_sb)
> +{
> +	struct proc_fs_context *ctx = container_of(fc, struct proc_fs_context, fc);
> +
> +	ctx->pid_ns = get_pid_ns(task_active_pid_ns(current));
> +	ctx->fc.ops = &proc_fs_context_ops;
> +	return 0;
>  }
>  
>  static void proc_kill_sb(struct super_block *sb)
> @@ -165,7 +195,8 @@ static void proc_kill_sb(struct super_block *sb)
>  
>  static struct file_system_type proc_fs_type = {
>  	.name		= "proc",
> -	.mount		= proc_mount,
> +	.fs_context_size	= sizeof(struct proc_fs_context),
> +	.init_fs_context	= proc_init_fs_context,
>  	.kill_sb	= proc_kill_sb,
>  	.fs_flags	= FS_USERNS_MOUNT,
>  };
> @@ -205,7 +236,7 @@ static struct dentry *proc_root_lookup(struct inode * dir, struct dentry * dentr
>  {
>  	if (!proc_pid_lookup(dir, dentry, flags))
>  		return NULL;
> -	
> +
>  	return proc_lookup(dir, dentry, flags);
>  }
>  
> @@ -259,9 +290,31 @@ struct proc_dir_entry proc_root = {
>  
>  int pid_ns_prepare_proc(struct pid_namespace *ns)
>  {
> +	struct proc_fs_context *ctx;
> +	struct fs_context *fc;
>  	struct vfsmount *mnt;
> +	int ret;
> +
> +	fc = vfs_new_fs_context(&proc_fs_type, NULL, 0,
> +				FS_CONTEXT_FOR_KERNEL_MOUNT);
> +	if (IS_ERR(fc))
> +		return PTR_ERR(fc);
> +
> +	ctx = container_of(fc, struct proc_fs_context, fc);
> +	if (ctx->pid_ns != ns) {
> +		put_pid_ns(ctx->pid_ns);
> +		get_pid_ns(ns);
> +		ctx->pid_ns = ns;
> +	}
> +
> +	ret = vfs_get_tree(fc);
> +	if (ret < 0) {
> +		put_fs_context(fc);
> +		return ret;
> +	}
>  
> -	mnt = kern_mount_data(&proc_fs_type, ns, 0);
> +	mnt = vfs_create_mount(fc);
> +	put_fs_context(fc);
>  	if (IS_ERR(mnt))
>  		return PTR_ERR(mnt);
>  

[-- Attachment #2: test.c --]
[-- Type: text/plain, Size: 2265 bytes --]

#define _GNU_SOURCE
#include <sys/types.h>
#include <sched.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <grp.h>
#include <linux/limits.h>


#define NS_STACK_SIZE	4096

#define __stack_aligned__	__attribute__((aligned(16)))

/* All arguments should be above stack, because it grows down */
struct ns_exec_args {
	char stack[NS_STACK_SIZE] __stack_aligned__;
	char stack_ptr[0];
	int pfd[2];
};

static int ns_exec(void *_arg)
{
	struct ns_exec_args *args = (struct ns_exec_args *) _arg;
	int ret;

	close(args->pfd[1]);
	if (read(args->pfd[0], &ret, sizeof(ret)) != sizeof(ret))
		return -1;

	setsid();

	if (setuid(0) || setgid(0) || setgroups(0, NULL)) {
		fprintf(stderr, "set*id failed: %m\n");
		return -1;
	}

	if (mount("proc", "/mnt", "proc", MS_MGC_VAL | MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL)) {
		fprintf(stderr, "mount(/proc) failed: %m\n");
		return -1;
	}

	return 0;
}

#define UID_MAP "0 100000 100000\n100000 200000 50000"
#define GID_MAP "0 400000 50000\n50000 500000 100000"
int main()
{
	pid_t pid;
	int ret, status;
	struct ns_exec_args args;
	int flags;
	char pname[PATH_MAX];
	int fd, pfd[2];

	if (pipe(pfd))
		return 1;

	args.pfd[0] = pfd[0];
	args.pfd[1] = pfd[1];

	flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS |
		CLONE_NEWNET | CLONE_NEWIPC | CLONE_NEWUSER | SIGCHLD;

	pid = clone(ns_exec, args.stack_ptr, flags, &args);
	if (pid < 0) {
		fprintf(stderr, "clone() failed: %m\n");
		exit(1);
	}


	snprintf(pname, sizeof(pname), "/proc/%d/uid_map", pid);
	fd = open(pname, O_WRONLY);
	if (fd < 0) {
		fprintf(stderr, "open(%s): %m\n", pname);
		exit(1);
	}
	if (write(fd, UID_MAP, sizeof(UID_MAP)) < 0) {
		fprintf(stderr, "write(" UID_MAP "): %m\n");
		exit(1);
	}
	close(fd);

	snprintf(pname, sizeof(pname), "/proc/%d/gid_map", pid);
	fd = open(pname, O_WRONLY);
	if (fd < 0) {
		fprintf(stderr, "open(%s): %m\n", pname);
		exit(1);
	}
	if (write(fd, GID_MAP, sizeof(GID_MAP)) < 0) {
		fprintf(stderr, "write(" GID_MAP "): %m\n");
		exit(1);
	}
	close(fd);

	if (write(pfd[1], &ret, sizeof(ret)) != sizeof(ret))
		return 1;

	if (waitpid(pid, &status, 0) != pid)
		return 1;
	if (status)
		return 1;

	return 0;
}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [12/24] proc: Add fs_context support to procfs [ver #7]
  2018-06-19  3:34   ` [12/24] " Andrei Vagin
@ 2018-06-26  6:13     ` Andrei Vagin
  2018-06-26  7:27       ` Andrei Vagin
  2018-06-26  8:57       ` David Howells
  0 siblings, 2 replies; 40+ messages in thread
From: Andrei Vagin @ 2018-06-26  6:13 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

On Mon, Jun 18, 2018 at 08:34:50PM -0700, Andrei Vagin wrote:
> Hi David,
> 
> We run CRIU tests for vfs/for-next, and today a few of these test failed. I
> found that the problem appears after this patch..
> 
> >  int pid_ns_prepare_proc(struct pid_namespace *ns)
> >  {
> > +	struct proc_fs_context *ctx;
> > +	struct fs_context *fc;
> >  	struct vfsmount *mnt;
> > +	int ret;
> > +
> > +	fc = vfs_new_fs_context(&proc_fs_type, NULL, 0,
> > +				FS_CONTEXT_FOR_KERNEL_MOUNT);
> > +	if (IS_ERR(fc))
> > +		return PTR_ERR(fc);
> > +
> > +	ctx = container_of(fc, struct proc_fs_context, fc);
> > +	if (ctx->pid_ns != ns) {
> > +		put_pid_ns(ctx->pid_ns);
> > +		get_pid_ns(ns);
> > +		ctx->pid_ns = ns;
> > +	}
> > +
> > +	ret = vfs_get_tree(fc);
> > +	if (ret < 0) {
> > +		put_fs_context(fc);
> > +		return ret;
> > +	}
> >  
> > -	mnt = kern_mount_data(&proc_fs_type, ns, 0);

Here ns->user_ns and get_current_cred()->user_ns are not always equal

> > +	mnt = vfs_create_mount(fc);
> > +	put_fs_context(fc);
> >  	if (IS_ERR(mnt))
> >  		return PTR_ERR(mnt);
> >  

> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sched.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <sys/mount.h>
> #include <sys/wait.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <stdlib.h>
> #include <grp.h>
> #include <linux/limits.h>
> 
> 
> #define NS_STACK_SIZE	4096
> 
> #define __stack_aligned__	__attribute__((aligned(16)))
> 
> /* All arguments should be above stack, because it grows down */
> struct ns_exec_args {
> 	char stack[NS_STACK_SIZE] __stack_aligned__;
> 	char stack_ptr[0];
> 	int pfd[2];
> };
> 
> static int ns_exec(void *_arg)
> {
> 	struct ns_exec_args *args = (struct ns_exec_args *) _arg;
> 	int ret;
> 
> 	close(args->pfd[1]);
> 	if (read(args->pfd[0], &ret, sizeof(ret)) != sizeof(ret))
> 		return -1;
> 
> 	setsid();
> 
> 	if (setuid(0) || setgid(0) || setgroups(0, NULL)) {
> 		fprintf(stderr, "set*id failed: %m\n");
> 		return -1;
> 	}
> 
> 	if (mount("proc", "/mnt", "proc", MS_MGC_VAL | MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL)) {
> 		fprintf(stderr, "mount(/proc) failed: %m\n");
> 		return -1;
> 	}
> 
> 	return 0;
> }
> 
> #define UID_MAP "0 100000 100000\n100000 200000 50000"
> #define GID_MAP "0 400000 50000\n50000 500000 100000"
> int main()
> {
> 	pid_t pid;
> 	int ret, status;
> 	struct ns_exec_args args;
> 	int flags;
> 	char pname[PATH_MAX];
> 	int fd, pfd[2];
> 
> 	if (pipe(pfd))
> 		return 1;
> 
> 	args.pfd[0] = pfd[0];
> 	args.pfd[1] = pfd[1];
> 
> 	flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS |
> 		CLONE_NEWNET | CLONE_NEWIPC | CLONE_NEWUSER | SIGCHLD;
> 
> 	pid = clone(ns_exec, args.stack_ptr, flags, &args);
> 	if (pid < 0) {
> 		fprintf(stderr, "clone() failed: %m\n");
> 		exit(1);
> 	}
> 
> 
> 	snprintf(pname, sizeof(pname), "/proc/%d/uid_map", pid);
> 	fd = open(pname, O_WRONLY);
> 	if (fd < 0) {
> 		fprintf(stderr, "open(%s): %m\n", pname);
> 		exit(1);
> 	}
> 	if (write(fd, UID_MAP, sizeof(UID_MAP)) < 0) {
> 		fprintf(stderr, "write(" UID_MAP "): %m\n");
> 		exit(1);
> 	}
> 	close(fd);
> 
> 	snprintf(pname, sizeof(pname), "/proc/%d/gid_map", pid);
> 	fd = open(pname, O_WRONLY);
> 	if (fd < 0) {
> 		fprintf(stderr, "open(%s): %m\n", pname);
> 		exit(1);
> 	}
> 	if (write(fd, GID_MAP, sizeof(GID_MAP)) < 0) {
> 		fprintf(stderr, "write(" GID_MAP "): %m\n");
> 		exit(1);
> 	}
> 	close(fd);
> 
> 	if (write(pfd[1], &ret, sizeof(ret)) != sizeof(ret))
> 		return 1;
> 
> 	if (waitpid(pid, &status, 0) != pid)
> 		return 1;
> 	if (status)
> 		return 1;
> 
> 	return 0;
> }


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [12/24] proc: Add fs_context support to procfs [ver #7]
  2018-06-26  6:13     ` Andrei Vagin
@ 2018-06-26  7:27       ` Andrei Vagin
  2018-06-26  8:57       ` David Howells
  1 sibling, 0 replies; 40+ messages in thread
From: Andrei Vagin @ 2018-06-26  7:27 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

[-- Attachment #1: Type: text/plain, Size: 3966 bytes --]

On Mon, Jun 25, 2018 at 11:13:20PM -0700, Andrei Vagin wrote:
> On Mon, Jun 18, 2018 at 08:34:50PM -0700, Andrei Vagin wrote:
> > Hi David,
> > 
> > We run CRIU tests for vfs/for-next, and today a few of these test failed. I
> > found that the problem appears after this patch..
> > 
> > >  int pid_ns_prepare_proc(struct pid_namespace *ns)
> > >  {
> > > +	struct proc_fs_context *ctx;
> > > +	struct fs_context *fc;
> > >  	struct vfsmount *mnt;
> > > +	int ret;
> > > +
> > > +	fc = vfs_new_fs_context(&proc_fs_type, NULL, 0,
> > > +				FS_CONTEXT_FOR_KERNEL_MOUNT);
> > > +	if (IS_ERR(fc))
> > > +		return PTR_ERR(fc);
> > > +
> > > +	ctx = container_of(fc, struct proc_fs_context, fc);
> > > +	if (ctx->pid_ns != ns) {
> > > +		put_pid_ns(ctx->pid_ns);
> > > +		get_pid_ns(ns);
> > > +		ctx->pid_ns = ns;
> > > +	}
> > > +
> > > +	ret = vfs_get_tree(fc);
> > > +	if (ret < 0) {
> > > +		put_fs_context(fc);
> > > +		return ret;
> > > +	}
> > >  
> > > -	mnt = kern_mount_data(&proc_fs_type, ns, 0);
> 
> Here ns->user_ns and get_current_cred()->user_ns are not always equal

What do you think about the attached patch?

> 
> > > +	mnt = vfs_create_mount(fc);
> > > +	put_fs_context(fc);
> > >  	if (IS_ERR(mnt))
> > >  		return PTR_ERR(mnt);
> > >  
> 
> > #define _GNU_SOURCE
> > #include <sys/types.h>
> > #include <sched.h>
> > #include <unistd.h>
> > #include <stdio.h>
> > #include <sys/mount.h>
> > #include <sys/wait.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <stdlib.h>
> > #include <grp.h>
> > #include <linux/limits.h>
> > 
> > 
> > #define NS_STACK_SIZE	4096
> > 
> > #define __stack_aligned__	__attribute__((aligned(16)))
> > 
> > /* All arguments should be above stack, because it grows down */
> > struct ns_exec_args {
> > 	char stack[NS_STACK_SIZE] __stack_aligned__;
> > 	char stack_ptr[0];
> > 	int pfd[2];
> > };
> > 
> > static int ns_exec(void *_arg)
> > {
> > 	struct ns_exec_args *args = (struct ns_exec_args *) _arg;
> > 	int ret;
> > 
> > 	close(args->pfd[1]);
> > 	if (read(args->pfd[0], &ret, sizeof(ret)) != sizeof(ret))
> > 		return -1;
> > 
> > 	setsid();
> > 
> > 	if (setuid(0) || setgid(0) || setgroups(0, NULL)) {
> > 		fprintf(stderr, "set*id failed: %m\n");
> > 		return -1;
> > 	}
> > 
> > 	if (mount("proc", "/mnt", "proc", MS_MGC_VAL | MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL)) {
> > 		fprintf(stderr, "mount(/proc) failed: %m\n");
> > 		return -1;
> > 	}
> > 
> > 	return 0;
> > }
> > 
> > #define UID_MAP "0 100000 100000\n100000 200000 50000"
> > #define GID_MAP "0 400000 50000\n50000 500000 100000"
> > int main()
> > {
> > 	pid_t pid;
> > 	int ret, status;
> > 	struct ns_exec_args args;
> > 	int flags;
> > 	char pname[PATH_MAX];
> > 	int fd, pfd[2];
> > 
> > 	if (pipe(pfd))
> > 		return 1;
> > 
> > 	args.pfd[0] = pfd[0];
> > 	args.pfd[1] = pfd[1];
> > 
> > 	flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS |
> > 		CLONE_NEWNET | CLONE_NEWIPC | CLONE_NEWUSER | SIGCHLD;
> > 
> > 	pid = clone(ns_exec, args.stack_ptr, flags, &args);
> > 	if (pid < 0) {
> > 		fprintf(stderr, "clone() failed: %m\n");
> > 		exit(1);
> > 	}
> > 
> > 
> > 	snprintf(pname, sizeof(pname), "/proc/%d/uid_map", pid);
> > 	fd = open(pname, O_WRONLY);
> > 	if (fd < 0) {
> > 		fprintf(stderr, "open(%s): %m\n", pname);
> > 		exit(1);
> > 	}
> > 	if (write(fd, UID_MAP, sizeof(UID_MAP)) < 0) {
> > 		fprintf(stderr, "write(" UID_MAP "): %m\n");
> > 		exit(1);
> > 	}
> > 	close(fd);
> > 
> > 	snprintf(pname, sizeof(pname), "/proc/%d/gid_map", pid);
> > 	fd = open(pname, O_WRONLY);
> > 	if (fd < 0) {
> > 		fprintf(stderr, "open(%s): %m\n", pname);
> > 		exit(1);
> > 	}
> > 	if (write(fd, GID_MAP, sizeof(GID_MAP)) < 0) {
> > 		fprintf(stderr, "write(" GID_MAP "): %m\n");
> > 		exit(1);
> > 	}
> > 	close(fd);
> > 
> > 	if (write(pfd[1], &ret, sizeof(ret)) != sizeof(ret))
> > 		return 1;
> > 
> > 	if (waitpid(pid, &status, 0) != pid)
> > 		return 1;
> > 	if (status)
> > 		return 1;
> > 
> > 	return 0;
> > }
> 

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 2765 bytes --]

diff --git a/fs/fs_context.c b/fs/fs_context.c
index 97e8c1dc4e3b..ad2db7504031 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -235,10 +235,11 @@ EXPORT_SYMBOL(generic_parse_monolithic);
  * another superblock (referred to by @reference) is supplied, may have
  * parameters such as namespaces copied across from that superblock.
  */
-struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
+struct fs_context *vfs_new_fs_context_userns(struct file_system_type *fs_type,
 				      struct dentry *reference,
 				      unsigned int sb_flags,
-				      enum fs_context_purpose purpose)
+				      enum fs_context_purpose purpose,
+				      struct user_namespace *user_ns)
 {
 	struct fs_context *fc;
 	int ret;
@@ -259,7 +260,7 @@ struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
 		fc->sb_flags |= SB_KERNMOUNT;
 		/* Fallthrough */
 	case FS_CONTEXT_FOR_USER_MOUNT:
-		fc->user_ns = get_user_ns(fc->cred->user_ns);
+		fc->user_ns = get_user_ns(user_ns ? : fc->cred->user_ns);
 		fc->net_ns = get_net(current->nsproxy->net_ns);
 		break;
 	case FS_CONTEXT_FOR_SUBMOUNT:
diff --git a/fs/proc/root.c b/fs/proc/root.c
index efbdc08a3c86..c832d67067d9 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -298,8 +298,8 @@ int pid_ns_prepare_proc(struct pid_namespace *ns)
 	struct vfsmount *mnt;
 	int ret;
 
-	fc = vfs_new_fs_context(&proc_fs_type, NULL, 0,
-				FS_CONTEXT_FOR_KERNEL_MOUNT);
+	fc = vfs_new_fs_context_userns(&proc_fs_type, NULL, 0,
+				FS_CONTEXT_FOR_KERNEL_MOUNT, ns->user_ns);
 	if (IS_ERR(fc))
 		return PTR_ERR(fc);
 
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index 04ea338ff490..283212cda1ff 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -92,10 +92,19 @@ struct fs_context_operations {
 /*
  * fs_context manipulation functions.
  */
-extern struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
+extern struct fs_context *vfs_new_fs_context_userns(struct file_system_type *fs_type,
 					     struct dentry *reference,
 					     unsigned int ms_flags,
-					     enum fs_context_purpose purpose);
+					     enum fs_context_purpose purpose,
+					     struct user_namespace *user_ns);
+static inline struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
+					     struct dentry *reference,
+					     unsigned int ms_flags,
+					     enum fs_context_purpose purpose)
+{
+	return vfs_new_fs_context_userns(fs_type, reference, ms_flags, purpose, NULL);
+}
+
 extern struct fs_context *vfs_sb_reconfig(struct path *path, unsigned int ms_flags);
 extern struct fs_context *vfs_dup_fs_context(struct fs_context *src);
 extern int vfs_set_fs_source(struct fs_context *fc, const char *source, size_t len);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [12/24] proc: Add fs_context support to procfs [ver #7]
  2018-06-26  6:13     ` Andrei Vagin
  2018-06-26  7:27       ` Andrei Vagin
@ 2018-06-26  8:57       ` David Howells
  2018-06-28  5:50         ` Andrei Vagin
  1 sibling, 1 reply; 40+ messages in thread
From: David Howells @ 2018-06-26  8:57 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: dhowells, viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

Andrei Vagin <avagin@virtuozzo.com> wrote:

> > > > -	mnt = kern_mount_data(&proc_fs_type, ns, 0);
> > 
> > Here ns->user_ns and get_current_cred()->user_ns are not always equal
> 
> What do you think about the attached patch?
> ...
> -	fc = vfs_new_fs_context(&proc_fs_type, NULL, 0,
> -				FS_CONTEXT_FOR_KERNEL_MOUNT);
> +	fc = vfs_new_fs_context_userns(&proc_fs_type, NULL, 0,
> +				FS_CONTEXT_FOR_KERNEL_MOUNT, ns->user_ns);

Or you could just change fc->user_ns immediately after calling
vfs_new_fs_context().  This is what network filesystems should do with
fc->net_ns, for example.

> -struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
> +struct fs_context *vfs_new_fs_context_userns(struct file_system_type *fs_type,
>  				      struct dentry *reference,
>  				      unsigned int sb_flags,
> -				      enum fs_context_purpose purpose)
> +				      enum fs_context_purpose purpose,
> +				      struct user_namespace *user_ns)


If you'd really rather add a new parameter, please don't rename the function
to vfs_new_fs_context_userns() - just add a new parameter.  There don't need
to be two versions of it.


This brings me to another thought:  I want to add the ability to let
namespaces be configured by userspace, for example:

	fd = fsopen("nfs");
	sprintf(buf, "ns user %d", my_user_ns_fd);
	write(fd, buf);
	sprintf(buf, "ns net %d", my_net_ns_fd);
	write(fd, buf);
	write(fd, "s fedoraproject.org:/pub");
	write(fd, "o intr");
	...

I think therefore, I might need to insert another phase between creating the
context and calling the filesystem initialiser:

	fc = vfs_new_fs_context(&afs_fs_type, mntpt, 0,
				FS_CONTEXT_FOR_SUBMOUNT);

followed by:

	vfs_sb_set_namespace(fc, THIS_IS_USER_NS, user_ns);
	vfs_sb_set_namespace(fc, THIS_IS_NET_NS, net_ns);

but then we'd need to do:

	vfs_begin_options(fc);

before continuing (unless we made this happen automatically on the receipt of
the first option):

	afs_mntpt_set_params(fc, mntpt);
	vfs_get_tree(fc);
	mnt = vfs_create_mount(fc, 0);

Alternatively, we could do the namespace setting after initialisation and let
the fs apply the changes itself.

David

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [12/24] proc: Add fs_context support to procfs [ver #7]
  2018-06-26  8:57       ` David Howells
@ 2018-06-28  5:50         ` Andrei Vagin
  0 siblings, 0 replies; 40+ messages in thread
From: Andrei Vagin @ 2018-06-28  5:50 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-nfs, linux-kernel, linux-security-module,
	linux-fsdevel, linux-afs

[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]

On Tue, Jun 26, 2018 at 09:57:07AM +0100, David Howells wrote:
> Andrei Vagin <avagin@virtuozzo.com> wrote:
> 
> > > > > -	mnt = kern_mount_data(&proc_fs_type, ns, 0);
> > > 
> > > Here ns->user_ns and get_current_cred()->user_ns are not always equal
> > 
> > What do you think about the attached patch?
> > ...
> > -	fc = vfs_new_fs_context(&proc_fs_type, NULL, 0,
> > -				FS_CONTEXT_FOR_KERNEL_MOUNT);
> > +	fc = vfs_new_fs_context_userns(&proc_fs_type, NULL, 0,
> > +				FS_CONTEXT_FOR_KERNEL_MOUNT, ns->user_ns);
> 
> Or you could just change fc->user_ns immediately after calling
> vfs_new_fs_context().  This is what network filesystems should do with
> fc->net_ns, for example.

Ok, it works for me. The patch is attached.

> 
> > -struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type,
> > +struct fs_context *vfs_new_fs_context_userns(struct file_system_type *fs_type,
> >  				      struct dentry *reference,
> >  				      unsigned int sb_flags,
> > -				      enum fs_context_purpose purpose)
> > +				      enum fs_context_purpose purpose,
> > +				      struct user_namespace *user_ns)
> 
> 
> If you'd really rather add a new parameter, please don't rename the function
> to vfs_new_fs_context_userns() - just add a new parameter.  There don't need
> to be two versions of it.
> 
> 
> This brings me to another thought:  I want to add the ability to let
> namespaces be configured by userspace, for example:

It may be a good feature, but I am not sure about procfs. A procfs
instance is created per pidns, so they should have the same owner
userns.

> 
> 	fd = fsopen("nfs");
> 	sprintf(buf, "ns user %d", my_user_ns_fd);
> 	write(fd, buf);
> 	sprintf(buf, "ns net %d", my_net_ns_fd);
> 	write(fd, buf);
> 	write(fd, "s fedoraproject.org:/pub");
> 	write(fd, "o intr");
> 	...
> 
> I think therefore, I might need to insert another phase between creating the
> context and calling the filesystem initialiser:
> 
> 	fc = vfs_new_fs_context(&afs_fs_type, mntpt, 0,
> 				FS_CONTEXT_FOR_SUBMOUNT);
> 
> followed by:
> 
> 	vfs_sb_set_namespace(fc, THIS_IS_USER_NS, user_ns);
> 	vfs_sb_set_namespace(fc, THIS_IS_NET_NS, net_ns);
> 
> but then we'd need to do:
> 
> 	vfs_begin_options(fc);
> 
> before continuing (unless we made this happen automatically on the receipt of
> the first option):
> 
> 	afs_mntpt_set_params(fc, mntpt);
> 	vfs_get_tree(fc);
> 	mnt = vfs_create_mount(fc, 0);
> 
> Alternatively, we could do the namespace setting after initialisation and let
> the fs apply the changes itself.
> 
> David

[-- Attachment #2: 0001-proc-set-a-proper-user-namespace-for-fs_context.patch --]
[-- Type: text/plain, Size: 851 bytes --]

From 2297ffb333a7bcee466a5273a3fc84202b9695a6 Mon Sep 17 00:00:00 2001
From: Andrei Vagin <avagin@openvz.org>
Date: Wed, 27 Jun 2018 22:45:43 -0700
Subject: [PATCH] proc: set a proper user namespace for fs_context

A user namespace should be taken from a pidns for which a procfs is created.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
---
 fs/proc/root.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/proc/root.c b/fs/proc/root.c
index efbdc08a3c86..59aaf06a40c7 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -303,6 +303,11 @@ int pid_ns_prepare_proc(struct pid_namespace *ns)
 	if (IS_ERR(fc))
 		return PTR_ERR(fc);
 
+	if (fc->user_ns != ns->user_ns) {
+		put_user_ns(fc->user_ns);
+		fc->user_ns = get_user_ns(ns->user_ns);
+	}
+
 	ctx = fc->fs_private;
 	if (ctx->pid_ns != ns) {
 		put_pid_ns(ctx->pid_ns);
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-06-28  5:50 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-19 13:31 [PATCH 00/24] VFS: Introduce filesystem context [ver #7] David Howells
2018-04-19 13:31 ` [PATCH 01/24] vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion " David Howells
2018-04-19 13:31 ` [PATCH 02/24] VFS: Suppress MS_* flag defs within the kernel unless explicitly enabled " David Howells
2018-04-19 13:31 ` [PATCH 03/24] VFS: Introduce the structs and doc for a filesystem context " David Howells
2018-04-23  3:36   ` Randy Dunlap
2018-05-01 14:29   ` David Howells
2018-05-01 15:31     ` Randy Dunlap
2018-04-19 13:31 ` [PATCH 04/24] VFS: Add LSM hooks for " David Howells
2018-04-19 20:32   ` Paul Moore
2018-04-20 15:35   ` David Howells
2018-04-23 13:25     ` Stephen Smalley
2018-04-24 15:22     ` David Howells
2018-04-25 14:07       ` Stephen Smalley
2018-04-19 13:31 ` [PATCH 05/24] apparmor: Implement security hooks for the new mount API " David Howells
2018-05-04  0:10   ` John Johansen
2018-05-11 12:20   ` David Howells
2018-04-19 13:31 ` [PATCH 06/24] tomoyo: " David Howells
2018-04-19 13:31 ` [PATCH 07/24] smack: Implement filesystem context security hooks " David Howells
2018-04-19 13:31 ` [PATCH 08/24] VFS: Require specification of size of mount data for internal mounts " David Howells
2018-04-19 13:32 ` [PATCH 09/24] VFS: Implement a filesystem superblock creation/configuration context " David Howells
2018-04-19 13:32 ` [PATCH 10/24] VFS: Remove unused code after filesystem context changes " David Howells
2018-04-19 13:32 ` [PATCH 11/24] procfs: Move proc_fill_super() to fs/proc/root.c " David Howells
2018-04-19 13:32 ` [PATCH 12/24] proc: Add fs_context support to procfs " David Howells
2018-06-19  3:34   ` [12/24] " Andrei Vagin
2018-06-26  6:13     ` Andrei Vagin
2018-06-26  7:27       ` Andrei Vagin
2018-06-26  8:57       ` David Howells
2018-06-28  5:50         ` Andrei Vagin
2018-04-19 13:32 ` [PATCH 13/24] ipc: Convert mqueue fs to fs_context " David Howells
2018-04-19 13:32 ` [PATCH 14/24] cpuset: Use " David Howells
2018-04-19 13:32 ` [PATCH 15/24] kernfs, sysfs, cgroup, intel_rdt: Support " David Howells
2018-04-19 13:33 ` [PATCH 16/24] hugetlbfs: Convert to " David Howells
2018-04-19 13:33 ` [PATCH 17/24] VFS: Remove kern_mount_data() " David Howells
2018-04-19 13:33 ` [PATCH 18/24] VFS: Implement fsopen() to prepare for a mount " David Howells
2018-04-19 13:33 ` [PATCH 19/24] VFS: Implement fsmount() to effect a pre-configured " David Howells
2018-04-19 13:33 ` [PATCH 20/24] afs: Fix server record deletion " David Howells
2018-04-19 13:33 ` [PATCH 21/24] net: Export get_proc_net() " David Howells
2018-04-19 13:33 ` [PATCH 22/24] afs: Add fs_context support " David Howells
2018-04-19 13:33 ` [PATCH 23/24] afs: Implement namespacing " David Howells
2018-04-19 13:33 ` [PATCH 24/24] afs: Use fs_context to pass parameters over automount " David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).