Linux-Security-Module Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/10] VFS: Provide new mount UAPI
@ 2019-02-19 17:08 David Howells
  2019-02-19 17:08 ` [PATCH 01/10] vfs: syscall: Add open_tree(2) to reference or clone a mount David Howells
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:08 UTC (permalink / raw)
  To: viro
  Cc: linux-api, linux-fsdevel, dhowells, torvalds, ebiederm,
	linux-security-module


Here's a set of patches that creates a number of new system calls to allow
the creation and parameterisation of a filesystem context and the
subsequent use of that context to create or look up a superblock:

	fd = fsopen("afs");
	fsconfig(fd, FSCONFIG_SET_STRING,
		 "source", "#grand.central.org:root.cell.", 0);
	fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

or reconfigure a superblock:

	fd = fspick(AT_FDCWD, "/nfs/foo", FSPICK_NO_AUTOMOUNT);
	fsconfig(fd, FSCONFIG_SET_FLAG, "noac", NULL, 0);
	fsconfig(fd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);

A mount object can then be created for the superblock which will be
attached to an O_PATH-equivalent file descriptor:

	mfd = fsmount(fd, MS_NODEV);

This can then be moved into place:

	move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

move_mount() can be used more generically too, e.g.:

	move_mount(AT_FDCWD, "/mnt/foo", AT_FDCWD, "/mnt/bar", 0);

to move mount subtrees around.

One more system call is available:

	mfd = open_tree(AT_FDCWD, "/mnt/foo", 0)
	mfd = open_tree(AT_FDCWD, "/mnt/foo", OPEN_TREE_CLONE)
	mfd = open_tree(AT_FDCWD, "/mnt/foo", OPEN_TREE_CLONE | AT_RECURSIVE)

This creates an O_PATH-equivalent file descriptor referring to a mount, a
copy of a mount or a copy of a mount subtree that move_mount() can then
move/paste into place.

The patches can be found here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git

at tag:

	vfs-mount-syscalls

David
---
Al Viro (1):
      vfs: syscall: Add open_tree(2) to reference or clone a mount

David Howells (9):
      vfs: syscall: Add move_mount(2) to move mounts around
      teach move_mount(2) to work with OPEN_TREE_CLONE
      Make anon_inodes unconditional
      vfs: syscall: Add fsopen() to prepare for superblock creation
      vfs: Implement logging through fs_context
      vfs: syscall: Add fsconfig() for configuring and managing a context
      vfs: syscall: Add fsmount() to create a mount for a superblock
      vfs: syscall: Add fspick() to select a superblock for reconfiguration
      vfs: Add a sample program for the new mount API


 arch/arm/kvm/Kconfig                   |    1 
 arch/arm64/kvm/Kconfig                 |    1 
 arch/mips/kvm/Kconfig                  |    1 
 arch/powerpc/kvm/Kconfig               |    1 
 arch/s390/kvm/Kconfig                  |    1 
 arch/x86/Kconfig                       |    1 
 arch/x86/entry/syscalls/syscall_32.tbl |    6 
 arch/x86/entry/syscalls/syscall_64.tbl |    6 
 arch/x86/kvm/Kconfig                   |    1 
 drivers/base/Kconfig                   |    1 
 drivers/char/tpm/Kconfig               |    1 
 drivers/dma-buf/Kconfig                |    1 
 drivers/gpio/Kconfig                   |    1 
 drivers/iio/Kconfig                    |    1 
 drivers/infiniband/Kconfig             |    1 
 drivers/vfio/Kconfig                   |    1 
 fs/Makefile                            |    4 
 fs/file_table.c                        |    9 -
 fs/fs_context.c                        |  160 ++++++++++-
 fs/fsopen.c                            |  477 ++++++++++++++++++++++++++++++++
 fs/internal.h                          |    4 
 fs/namespace.c                         |  477 ++++++++++++++++++++++++++++----
 fs/notify/fanotify/Kconfig             |    1 
 fs/notify/inotify/Kconfig              |    1 
 include/linux/fs.h                     |    7 
 include/linux/fs_context.h             |   38 ++-
 include/linux/lsm_hooks.h              |    6 
 include/linux/module.h                 |    6 
 include/linux/security.h               |    7 
 include/linux/syscalls.h               |    9 +
 include/uapi/linux/fcntl.h             |    2 
 include/uapi/linux/mount.h             |   62 ++++
 init/Kconfig                           |   10 -
 samples/Kconfig                        |    9 -
 samples/Makefile                       |    2 
 samples/statx/Makefile                 |    7 
 samples/statx/test-statx.c             |  258 -----------------
 samples/vfs/Makefile                   |   10 +
 samples/vfs/test-fsmount.c             |  133 +++++++++
 samples/vfs/test-statx.c               |  267 ++++++++++++++++++
 security/security.c                    |    5 
 41 files changed, 1617 insertions(+), 380 deletions(-)
 create mode 100644 fs/fsopen.c
 delete mode 100644 samples/statx/Makefile
 delete mode 100644 samples/statx/test-statx.c
 create mode 100644 samples/vfs/Makefile
 create mode 100644 samples/vfs/test-fsmount.c
 create mode 100644 samples/vfs/test-statx.c


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/10] vfs: syscall: Add open_tree(2) to reference or clone a mount
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
@ 2019-02-19 17:08 ` David Howells
  2019-02-19 17:08 ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around David Howells
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:08 UTC (permalink / raw)
  To: viro
  Cc: linux-api, linux-fsdevel, dhowells, torvalds, ebiederm,
	linux-security-module

From: Al Viro <viro@zeniv.linux.org.uk>

open_tree(dfd, pathname, flags)

Returns an O_PATH-opened file descriptor or an error.
dfd and pathname specify the location to open, in usual
fashion (see e.g. fstatat(2)).  flags should be an OR of
some of the following:
	* AT_PATH_EMPTY, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
same meanings as usual
	* OPEN_TREE_CLOEXEC - make the resulting descriptor
close-on-exec
	* OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
instead of opening the location in question, create a detached
mount tree matching the subtree rooted at location specified by
dfd/pathname.  With AT_RECURSIVE the entire subtree is cloned,
without it - only the part within in the mount containing the
location in question.  In other words, the same as mount --rbind
or mount --bind would've taken.  The detached tree will be
dissolved on the final close of obtained file.  Creation of such
detached trees requires the same capabilities as doing mount --bind.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/file_table.c                        |    9 +-
 fs/internal.h                          |    1 
 fs/namespace.c                         |  157 ++++++++++++++++++++++++++++----
 include/linux/fs.h                     |    7 +
 include/linux/syscalls.h               |    1 
 include/uapi/linux/fcntl.h             |    2 
 include/uapi/linux/mount.h             |    6 +
 9 files changed, 158 insertions(+), 27 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 3cf7b533b3d1..ea1b413afd47 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -398,3 +398,4 @@
 384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
 385	i386	io_pgetevents		sys_io_pgetevents		__ia32_compat_sys_io_pgetevents
 386	i386	rseq			sys_rseq			__ia32_sys_rseq
+387	i386	open_tree		sys_open_tree			__ia32_sys_open_tree
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f0b1709a5ffb..0545bed581dc 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -343,6 +343,7 @@
 332	common	statx			__x64_sys_statx
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
+335	common	open_tree		__x64_sys_open_tree
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/file_table.c b/fs/file_table.c
index 5679e7fcb6b0..10e0a3dcea4d 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -255,6 +255,7 @@ static void __fput(struct file *file)
 	struct dentry *dentry = file->f_path.dentry;
 	struct vfsmount *mnt = file->f_path.mnt;
 	struct inode *inode = file->f_inode;
+	fmode_t mode = file->f_mode;
 
 	if (unlikely(!(file->f_mode & FMODE_OPENED)))
 		goto out;
@@ -277,18 +278,20 @@ static void __fput(struct file *file)
 	if (file->f_op->release)
 		file->f_op->release(inode, file);
 	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
-		     !(file->f_mode & FMODE_PATH))) {
+		     !(mode & FMODE_PATH))) {
 		cdev_put(inode->i_cdev);
 	}
 	fops_put(file->f_op);
 	put_pid(file->f_owner.pid);
-	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
+	if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
 		i_readcount_dec(inode);
-	if (file->f_mode & FMODE_WRITER) {
+	if (mode & FMODE_WRITER) {
 		put_write_access(inode);
 		__mnt_drop_write(mnt);
 	}
 	dput(dentry);
+	if (unlikely(mode & FMODE_NEED_UNMOUNT))
+		dissolve_on_fput(mnt);
 	mntput(mnt);
 out:
 	file_free(file);
diff --git a/fs/internal.h b/fs/internal.h
index 6a8b71643af4..f3a027c44758 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -94,6 +94,7 @@ extern int __mnt_want_write_file(struct file *);
 extern void __mnt_drop_write(struct vfsmount *);
 extern void __mnt_drop_write_file(struct file *);
 
+extern void dissolve_on_fput(struct vfsmount *);
 /*
  * fs_struct.c
  */
diff --git a/fs/namespace.c b/fs/namespace.c
index bb9b7db1c66c..112d46f26fc3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>		/* init_rootfs */
 #include <linux/fs_struct.h>	/* get_fs_root et.al. */
 #include <linux/fsnotify.h>	/* fsnotify_vfsmount_delete */
+#include <linux/file.h>
 #include <linux/uaccess.h>
 #include <linux/proc_ns.h>
 #include <linux/magic.h>
@@ -1830,6 +1831,21 @@ struct vfsmount *collect_mounts(const struct path *path)
 	return &tree->mnt;
 }
 
+static void free_mnt_ns(struct mnt_namespace *);
+static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *, bool);
+
+void dissolve_on_fput(struct vfsmount *mnt)
+{
+	struct mnt_namespace *ns;
+	namespace_lock();
+	lock_mount_hash();
+	ns = real_mount(mnt)->mnt_ns;
+	umount_tree(real_mount(mnt), UMOUNT_CONNECTED);
+	unlock_mount_hash();
+	namespace_unlock();
+	free_mnt_ns(ns);
+}
+
 void drop_collected_mounts(struct vfsmount *mnt)
 {
 	namespace_lock();
@@ -2220,6 +2236,30 @@ static bool has_locked_children(struct mount *mnt, struct dentry *dentry)
 	return false;
 }
 
+static struct mount *__do_loopback(struct path *old_path, int recurse)
+{
+	struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt);
+
+	if (IS_MNT_UNBINDABLE(old))
+		return mnt;
+
+	if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations)
+		return mnt;
+
+	if (!recurse && has_locked_children(old, old_path->dentry))
+		return mnt;
+
+	if (recurse)
+		mnt = copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
+	else
+		mnt = clone_mnt(old, old_path->dentry, 0);
+
+	if (!IS_ERR(mnt))
+		mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+
+	return mnt;
+}
+
 /*
  * do loopback mount.
  */
@@ -2227,7 +2267,7 @@ static int do_loopback(struct path *path, const char *old_name,
 				int recurse)
 {
 	struct path old_path;
-	struct mount *mnt = NULL, *old, *parent;
+	struct mount *mnt = NULL, *parent;
 	struct mountpoint *mp;
 	int err;
 	if (!old_name || !*old_name)
@@ -2241,38 +2281,21 @@ static int do_loopback(struct path *path, const char *old_name,
 		goto out;
 
 	mp = lock_mount(path);
-	err = PTR_ERR(mp);
-	if (IS_ERR(mp))
+	if (IS_ERR(mp)) {
+		err = PTR_ERR(mp);
 		goto out;
+	}
 
-	old = real_mount(old_path.mnt);
 	parent = real_mount(path->mnt);
-
-	err = -EINVAL;
-	if (IS_MNT_UNBINDABLE(old))
-		goto out2;
-
 	if (!check_mnt(parent))
 		goto out2;
 
-	if (!check_mnt(old) && old_path.dentry->d_op != &ns_dentry_operations)
-		goto out2;
-
-	if (!recurse && has_locked_children(old, old_path.dentry))
-		goto out2;
-
-	if (recurse)
-		mnt = copy_tree(old, old_path.dentry, CL_COPY_MNT_NS_FILE);
-	else
-		mnt = clone_mnt(old, old_path.dentry, 0);
-
+	mnt = __do_loopback(&old_path, recurse);
 	if (IS_ERR(mnt)) {
 		err = PTR_ERR(mnt);
 		goto out2;
 	}
 
-	mnt->mnt.mnt_flags &= ~MNT_LOCKED;
-
 	err = graft_tree(mnt, parent, mp);
 	if (err) {
 		lock_mount_hash();
@@ -2286,6 +2309,96 @@ static int do_loopback(struct path *path, const char *old_name,
 	return err;
 }
 
+static struct file *open_detached_copy(struct path *path, bool recursive)
+{
+	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct mnt_namespace *ns = alloc_mnt_ns(user_ns, true);
+	struct mount *mnt, *p;
+	struct file *file;
+
+	if (IS_ERR(ns))
+		return ERR_CAST(ns);
+
+	namespace_lock();
+	mnt = __do_loopback(path, recursive);
+	if (IS_ERR(mnt)) {
+		namespace_unlock();
+		free_mnt_ns(ns);
+		return ERR_CAST(mnt);
+	}
+
+	lock_mount_hash();
+	for (p = mnt; p; p = next_mnt(p, mnt)) {
+		p->mnt_ns = ns;
+		ns->mounts++;
+	}
+	ns->root = mnt;
+	list_add_tail(&ns->list, &mnt->mnt_list);
+	mntget(&mnt->mnt);
+	unlock_mount_hash();
+	namespace_unlock();
+
+	mntput(path->mnt);
+	path->mnt = &mnt->mnt;
+	file = dentry_open(path, O_PATH, current_cred());
+	if (IS_ERR(file))
+		dissolve_on_fput(path->mnt);
+	else
+		file->f_mode |= FMODE_NEED_UNMOUNT;
+	return file;
+}
+
+SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags)
+{
+	struct file *file;
+	struct path path;
+	int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
+	bool detached = flags & OPEN_TREE_CLONE;
+	int error;
+	int fd;
+
+	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
+
+	if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
+		      AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
+		      OPEN_TREE_CLOEXEC))
+		return -EINVAL;
+
+	if ((flags & (AT_RECURSIVE | OPEN_TREE_CLONE)) == AT_RECURSIVE)
+		return -EINVAL;
+
+	if (flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+	if (detached && !may_mount())
+		return -EPERM;
+
+	fd = get_unused_fd_flags(flags & O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	error = user_path_at(dfd, filename, lookup_flags, &path);
+	if (unlikely(error)) {
+		file = ERR_PTR(error);
+	} else {
+		if (detached)
+			file = open_detached_copy(&path, flags & AT_RECURSIVE);
+		else
+			file = dentry_open(&path, O_PATH, current_cred());
+		path_put(&path);
+	}
+	if (IS_ERR(file)) {
+		put_unused_fd(fd);
+		return PTR_ERR(file);
+	}
+	fd_install(fd, file);
+	return fd;
+}
+
 /*
  * Don't allow locked mount flags to be cleared.
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3e85cb8e8c20..c4ef82689cab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -159,10 +159,13 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_NONOTIFY		((__force fmode_t)0x4000000)
 
 /* File is capable of returning -EAGAIN if I/O will block */
-#define FMODE_NOWAIT	((__force fmode_t)0x8000000)
+#define FMODE_NOWAIT		((__force fmode_t)0x8000000)
+
+/* File represents mount that needs unmounting */
+#define FMODE_NEED_UNMOUNT	((__force fmode_t)0x10000000)
 
 /* File does not contribute to nr_files count */
-#define FMODE_NOACCOUNT	((__force fmode_t)0x20000000)
+#define FMODE_NOACCOUNT		((__force fmode_t)0x20000000)
 
 /*
  * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 257cccba3062..07bca6796498 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -926,6 +926,7 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
+asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..594b85f7cb86 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -90,5 +90,7 @@
 #define AT_STATX_FORCE_SYNC	0x2000	/* - Force the attributes to be sync'd with the server */
 #define AT_STATX_DONT_SYNC	0x4000	/* - Don't sync attributes with the server */
 
+#define AT_RECURSIVE		0x8000	/* Apply to the entire subtree */
+
 
 #endif /* _UAPI_LINUX_FCNTL_H */
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 3f9ec42510b0..fd7ae2e7eccf 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -55,4 +55,10 @@
 #define MS_MGC_VAL 0xC0ED0000
 #define MS_MGC_MSK 0xffff0000
 
+/*
+ * open_tree() flags.
+ */
+#define OPEN_TREE_CLONE		1		/* Clone the target tree and attach the clone */
+#define OPEN_TREE_CLOEXEC	O_CLOEXEC	/* Close the file on execve() */
+
 #endif /* _UAPI_LINUX_MOUNT_H */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
  2019-02-19 17:08 ` [PATCH 01/10] vfs: syscall: Add open_tree(2) to reference or clone a mount David Howells
@ 2019-02-19 17:08 ` David Howells
  2019-02-20 12:32   ` Alan Jenkins
                     ` (2 more replies)
  2019-02-19 17:08 ` [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE David Howells
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:08 UTC (permalink / raw)
  To: viro
  Cc: linux-api, linux-fsdevel, dhowells, torvalds, ebiederm,
	linux-security-module

Add a move_mount() system call that will move a mount from one place to
another and, in the next commit, allow to attach an unattached mount tree.

The new system call looks like the following:

	int move_mount(int from_dfd, const char *from_path,
		       int to_dfd, const char *to_path,
		       unsigned int flags);

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/namespace.c                         |  126 ++++++++++++++++++++++++--------
 include/linux/lsm_hooks.h              |    6 ++
 include/linux/security.h               |    7 ++
 include/linux/syscalls.h               |    3 +
 include/uapi/linux/mount.h             |   11 +++
 security/security.c                    |    5 +
 8 files changed, 129 insertions(+), 31 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ea1b413afd47..76d092b7d1b0 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -399,3 +399,4 @@
 385	i386	io_pgetevents		sys_io_pgetevents		__ia32_compat_sys_io_pgetevents
 386	i386	rseq			sys_rseq			__ia32_sys_rseq
 387	i386	open_tree		sys_open_tree			__ia32_sys_open_tree
+388	i386	move_mount		sys_move_mount			__ia32_sys_move_mount
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 0545bed581dc..37ba4e65eee6 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -344,6 +344,7 @@
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
 335	common	open_tree		__x64_sys_open_tree
+336	common	move_mount		__x64_sys_move_mount
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/namespace.c b/fs/namespace.c
index 112d46f26fc3..f10122028a11 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2537,72 +2537,81 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 	return 0;
 }
 
-static int do_move_mount(struct path *path, const char *old_name)
+static int do_move_mount(struct path *old_path, struct path *new_path)
 {
-	struct path old_path, parent_path;
+	struct path parent_path = {.mnt = NULL, .dentry = NULL};
 	struct mount *p;
 	struct mount *old;
 	struct mountpoint *mp;
 	int err;
-	if (!old_name || !*old_name)
-		return -EINVAL;
-	err = kern_path(old_name, LOOKUP_FOLLOW, &old_path);
-	if (err)
-		return err;
 
-	mp = lock_mount(path);
-	err = PTR_ERR(mp);
+	mp = lock_mount(new_path);
 	if (IS_ERR(mp))
-		goto out;
+		return PTR_ERR(mp);
 
-	old = real_mount(old_path.mnt);
-	p = real_mount(path->mnt);
+	old = real_mount(old_path->mnt);
+	p = real_mount(new_path->mnt);
 
 	err = -EINVAL;
 	if (!check_mnt(p) || !check_mnt(old))
-		goto out1;
+		goto out;
 
-	if (old->mnt.mnt_flags & MNT_LOCKED)
-		goto out1;
+	if (!mnt_has_parent(old))
+		goto out;
 
-	err = -EINVAL;
-	if (old_path.dentry != old_path.mnt->mnt_root)
-		goto out1;
+	if (old->mnt.mnt_flags & MNT_LOCKED)
+		goto out;
 
-	if (!mnt_has_parent(old))
-		goto out1;
+	if (old_path->dentry != old_path->mnt->mnt_root)
+		goto out;
 
-	if (d_is_dir(path->dentry) !=
-	      d_is_dir(old_path.dentry))
-		goto out1;
+	if (d_is_dir(new_path->dentry) !=
+	    d_is_dir(old_path->dentry))
+		goto out;
 	/*
 	 * Don't move a mount residing in a shared parent.
 	 */
 	if (IS_MNT_SHARED(old->mnt_parent))
-		goto out1;
+		goto out;
 	/*
 	 * Don't move a mount tree containing unbindable mounts to a destination
 	 * mount which is shared.
 	 */
 	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
-		goto out1;
+		goto out;
 	err = -ELOOP;
 	for (; mnt_has_parent(p); p = p->mnt_parent)
 		if (p == old)
-			goto out1;
+			goto out;
 
-	err = attach_recursive_mnt(old, real_mount(path->mnt), mp, &parent_path);
+	err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp,
+				   &parent_path);
 	if (err)
-		goto out1;
+		goto out;
 
 	/* if the mount is moved, it should no longer be expire
 	 * automatically */
 	list_del_init(&old->mnt_expire);
-out1:
-	unlock_mount(mp);
 out:
+	unlock_mount(mp);
 	if (!err)
 		path_put(&parent_path);
+	return err;
+}
+
+static int do_move_mount_old(struct path *path, const char *old_name)
+{
+	struct path old_path;
+	int err;
+
+	if (!old_name || !*old_name)
+		return -EINVAL;
+
+	err = kern_path(old_name, LOOKUP_FOLLOW, &old_path);
+	if (err)
+		return err;
+
+	err = do_move_mount(&old_path, path);
 	path_put(&old_path);
 	return err;
 }
@@ -3050,7 +3059,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&path, flags);
 	else if (flags & MS_MOVE)
-		retval = do_move_mount(&path, dev_name);
+		retval = do_move_mount_old(&path, dev_name);
 	else
 		retval = do_new_mount(&path, type_page, sb_flags, mnt_flags,
 				      dev_name, data_page);
@@ -3278,6 +3287,61 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
 	return ksys_mount(dev_name, dir_name, type, flags, data);
 }
 
+/*
+ * Move a mount from one place to another.
+ *
+ * Note the flags value is a combination of MOVE_MOUNT_* flags.
+ */
+SYSCALL_DEFINE5(move_mount,
+		int, from_dfd, const char *, from_pathname,
+		int, to_dfd, const char *, to_pathname,
+		unsigned int, flags)
+{
+	struct path from_path, to_path;
+	unsigned int lflags;
+	int ret = 0;
+
+	if (!may_mount())
+		return -EPERM;
+
+	if (flags & ~MOVE_MOUNT__MASK)
+		return -EINVAL;
+
+	/* If someone gives a pathname, they aren't permitted to move
+	 * from an fd that requires unmount as we can't get at the flag
+	 * to clear it afterwards.
+	 */
+	lflags = 0;
+	if (flags & MOVE_MOUNT_F_SYMLINKS)	lflags |= LOOKUP_FOLLOW;
+	if (flags & MOVE_MOUNT_F_AUTOMOUNTS)	lflags |= LOOKUP_AUTOMOUNT;
+	if (flags & MOVE_MOUNT_F_EMPTY_PATH)	lflags |= LOOKUP_EMPTY;
+
+	ret = user_path_at(from_dfd, from_pathname, lflags, &from_path);
+	if (ret < 0)
+		return ret;
+
+	lflags = 0;
+	if (flags & MOVE_MOUNT_T_SYMLINKS)	lflags |= LOOKUP_FOLLOW;
+	if (flags & MOVE_MOUNT_T_AUTOMOUNTS)	lflags |= LOOKUP_AUTOMOUNT;
+	if (flags & MOVE_MOUNT_T_EMPTY_PATH)	lflags |= LOOKUP_EMPTY;
+
+	ret = user_path_at(to_dfd, to_pathname, lflags, &to_path);
+	if (ret < 0)
+		goto out_from;
+
+	ret = security_move_mount(&from_path, &to_path);
+	if (ret < 0)
+		goto out_to;
+
+	ret = do_move_mount(&from_path, &to_path);
+
+out_to:
+	path_put(&to_path);
+out_from:
+	path_put(&from_path);
+	return ret;
+}
+
 /*
  * Return true if path is reachable from root
  *
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 356e78fe90a8..89892a031ecf 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -160,6 +160,10 @@
  *	Parse a string of security data filling in the opts structure
  *	@options string containing all mount options known by the LSM
  *	@opts binary data structure usable by the LSM
+ * @move_mount:
+ *	Check permission before a mount is moved.
+ *	@from_path indicates the mount that is going to be moved.
+ *	@to_path indicates the mountpoint that will be mounted upon.
  * @dentry_init_security:
  *	Compute a context for a dentry as the inode is not yet available
  *	since NFSv4 has no label backed by an EA anyway.
@@ -1500,6 +1504,7 @@ union security_list_options {
 					unsigned long *set_kern_flags);
 	int (*sb_add_mnt_opt)(const char *option, const char *val, int len,
 			      void **mnt_opts);
+	int (*move_mount)(const struct path *from_path, const struct path *to_path);
 	int (*dentry_init_security)(struct dentry *dentry, int mode,
 					const struct qstr *name, void **ctx,
 					u32 *ctxlen);
@@ -1835,6 +1840,7 @@ struct security_hook_heads {
 	struct hlist_head sb_set_mnt_opts;
 	struct hlist_head sb_clone_mnt_opts;
 	struct hlist_head sb_add_mnt_opt;
+	struct hlist_head move_mount;
 	struct hlist_head dentry_init_security;
 	struct hlist_head dentry_create_files_as;
 #ifdef CONFIG_SECURITY_PATH
diff --git a/include/linux/security.h b/include/linux/security.h
index f28a1ebfd78e..edc48e61c8b4 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -247,6 +247,7 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
 				unsigned long *set_kern_flags);
 int security_add_mnt_opt(const char *option, const char *val,
 				int len, void **mnt_opts);
+int security_move_mount(const struct path *from_path, const struct path *to_path);
 int security_dentry_init_security(struct dentry *dentry, int mode,
 					const struct qstr *name, void **ctx,
 					u32 *ctxlen);
@@ -609,6 +610,12 @@ static inline int security_add_mnt_opt(const char *option, const char *val,
 	return 0;
 }
 
+static inline int security_move_mount(const struct path *from_path,
+				      const struct path *to_path)
+{
+	return 0;
+}
+
 static inline int security_inode_alloc(struct inode *inode)
 {
 	return 0;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 07bca6796498..13289cdf6b52 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -927,6 +927,9 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
 asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
+asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
+			       int to_dfd, const char __user *to_path,
+			       unsigned int ms_flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index fd7ae2e7eccf..3634e065836c 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -61,4 +61,15 @@
 #define OPEN_TREE_CLONE		1		/* Clone the target tree and attach the clone */
 #define OPEN_TREE_CLOEXEC	O_CLOEXEC	/* Close the file on execve() */
 
+/*
+ * move_mount() flags.
+ */
+#define MOVE_MOUNT_F_SYMLINKS		0x00000001 /* Follow symlinks on from path */
+#define MOVE_MOUNT_F_AUTOMOUNTS		0x00000002 /* Follow automounts on from path */
+#define MOVE_MOUNT_F_EMPTY_PATH		0x00000004 /* Empty from path permitted */
+#define MOVE_MOUNT_T_SYMLINKS		0x00000010 /* Follow symlinks on to path */
+#define MOVE_MOUNT_T_AUTOMOUNTS		0x00000020 /* Follow automounts on to path */
+#define MOVE_MOUNT_T_EMPTY_PATH		0x00000040 /* Empty to path permitted */
+#define MOVE_MOUNT__MASK		0x00000077
+
 #endif /* _UAPI_LINUX_MOUNT_H */
diff --git a/security/security.c b/security/security.c
index 5759339319dc..6fe31b31dc74 100644
--- a/security/security.c
+++ b/security/security.c
@@ -476,6 +476,11 @@ int security_add_mnt_opt(const char *option, const char *val, int len,
 }
 EXPORT_SYMBOL(security_add_mnt_opt);
 
+int security_move_mount(const struct path *from_path, const struct path *to_path)
+{
+	return call_int_hook(move_mount, 0, from_path, to_path);
+}
+
 int security_inode_alloc(struct inode *inode)
 {
 	inode->i_security = NULL;


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
  2019-02-19 17:08 ` [PATCH 01/10] vfs: syscall: Add open_tree(2) to reference or clone a mount David Howells
  2019-02-19 17:08 ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around David Howells
@ 2019-02-19 17:08 ` David Howells
  2019-02-20 18:59   ` Alan Jenkins
  2019-02-26 17:45   ` Alan Jenkins
  2019-02-19 17:08 ` [PATCH 04/10] Make anon_inodes unconditional David Howells
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:08 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, dhowells, torvalds, ebiederm, linux-security-module

Allow a detached tree created by open_tree(..., OPEN_TREE_CLONE) to be
attached by move_mount(2).

If by the time of final fput() of OPEN_TREE_CLONE-opened file its tree is
not detached anymore, it won't be dissolved.  move_mount(2) is adjusted
to handle detached source.

That gives us equivalents of mount --bind and mount --rbind.

Thanks also to Alan Jenkins <alan.christopher.jenkins@gmail.com> for
providing a whole bunch of ways to break things using this interface.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 fs/namespace.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f10122028a11..56423c60ac7e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1840,10 +1840,16 @@ void dissolve_on_fput(struct vfsmount *mnt)
 	namespace_lock();
 	lock_mount_hash();
 	ns = real_mount(mnt)->mnt_ns;
-	umount_tree(real_mount(mnt), UMOUNT_CONNECTED);
+	if (ns) {
+		if (is_anon_ns(ns))
+			umount_tree(real_mount(mnt), UMOUNT_CONNECTED);
+		else
+			ns = NULL;
+	}
 	unlock_mount_hash();
 	namespace_unlock();
-	free_mnt_ns(ns);
+	if (ns)
+		free_mnt_ns(ns);
 }
 
 void drop_collected_mounts(struct vfsmount *mnt)
@@ -2079,6 +2085,10 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 		attach_mnt(source_mnt, dest_mnt, dest_mp);
 		touch_mnt_namespace(source_mnt->mnt_ns);
 	} else {
+		if (source_mnt->mnt_ns) {
+			/* move from anon - the caller will destroy */
+			list_del_init(&source_mnt->mnt_ns->list);
+		}
 		mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
 		commit_tree(source_mnt);
 	}
@@ -2537,13 +2547,37 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 	return 0;
 }
 
+/*
+ * Check that there aren't references to earlier/same mount namespaces in the
+ * specified subtree.  Such references can act as pins for mount namespaces
+ * that aren't checked by the mount-cycle checking code, thereby allowing
+ * cycles to be made.
+ */
+static bool check_for_nsfs_mounts(struct mount *subtree)
+{
+	struct mount *p;
+	bool ret = false;
+
+	lock_mount_hash();
+	for (p = subtree; p; p = next_mnt(p, subtree))
+		if (mnt_ns_loop(p->mnt.mnt_root))
+			goto out;
+
+	ret = true;
+out:
+	unlock_mount_hash();
+	return ret;
+}
+
 static int do_move_mount(struct path *old_path, struct path *new_path)
 {
 	struct path parent_path = {.mnt = NULL, .dentry = NULL};
+	struct mnt_namespace *ns;
 	struct mount *p;
 	struct mount *old;
 	struct mountpoint *mp;
 	int err;
+	bool attached;
 
 	mp = lock_mount(new_path);
 	if (IS_ERR(mp))
@@ -2551,12 +2585,19 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 
 	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
+	attached = mnt_has_parent(old);
+	ns = old->mnt_ns;
 
 	err = -EINVAL;
-	if (!check_mnt(p) || !check_mnt(old))
+	/* The mountpoint must be in our namespace. */
+	if (!check_mnt(p))
 		goto out;
 
-	if (!mnt_has_parent(old))
+	/* The thing moved should be either ours or completely unattached. */
+	if (attached && !check_mnt(old))
+		goto out;
+
+	if (!attached && !is_anon_ns(ns))
 		goto out;
 
 	if (old->mnt.mnt_flags & MNT_LOCKED)
@@ -2571,7 +2612,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 	/*
 	 * Don't move a mount residing in a shared parent.
 	 */
-	if (IS_MNT_SHARED(old->mnt_parent))
+	if (attached && IS_MNT_SHARED(old->mnt_parent))
 		goto out;
 	/*
 	 * Don't move a mount tree containing unbindable mounts to a destination
@@ -2580,12 +2621,14 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
 		goto out;
 	err = -ELOOP;
+	if (!check_for_nsfs_mounts(old))
+		goto out;
 	for (; mnt_has_parent(p); p = p->mnt_parent)
 		if (p == old)
 			goto out;
 
 	err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp,
-				   &parent_path);
+				   attached ? &parent_path : NULL);
 	if (err)
 		goto out;
 
@@ -2594,8 +2637,11 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 	list_del_init(&old->mnt_expire);
 out:
 	unlock_mount(mp);
-	if (!err)
+	if (!err) {
 		path_put(&parent_path);
+		if (!attached)
+			free_mnt_ns(ns);
+	}
 	return err;
 }
 
@@ -3289,6 +3335,8 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
 
 /*
  * Move a mount from one place to another.
+ * In combination with open_tree(OPEN_TREE_CLONE [| AT_RECURSIVE]) it can be
+ * used to copy a mount subtree.
  *
  * Note the flags value is a combination of MOVE_MOUNT_* flags.
  */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 04/10] Make anon_inodes unconditional
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
                   ` (2 preceding siblings ...)
  2019-02-19 17:08 ` [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE David Howells
@ 2019-02-19 17:08 ` David Howells
  2019-02-19 17:09 ` [PATCH 05/10] vfs: syscall: Add fsopen() to prepare for superblock creation David Howells
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:08 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, dhowells, torvalds, ebiederm, linux-security-module

Make the anon_inodes facility unconditional so that it can be used by core
VFS code.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 arch/arm/kvm/Kconfig       |    1 -
 arch/arm64/kvm/Kconfig     |    1 -
 arch/mips/kvm/Kconfig      |    1 -
 arch/powerpc/kvm/Kconfig   |    1 -
 arch/s390/kvm/Kconfig      |    1 -
 arch/x86/Kconfig           |    1 -
 arch/x86/kvm/Kconfig       |    1 -
 drivers/base/Kconfig       |    1 -
 drivers/char/tpm/Kconfig   |    1 -
 drivers/dma-buf/Kconfig    |    1 -
 drivers/gpio/Kconfig       |    1 -
 drivers/iio/Kconfig        |    1 -
 drivers/infiniband/Kconfig |    1 -
 drivers/vfio/Kconfig       |    1 -
 fs/Makefile                |    2 +-
 fs/notify/fanotify/Kconfig |    1 -
 fs/notify/inotify/Kconfig  |    1 -
 init/Kconfig               |   10 ----------
 18 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 3f5320f46de2..f591026347a5 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -22,7 +22,6 @@ config KVM
 	bool "Kernel-based Virtual Machine (KVM) support"
 	depends on MMU && OF
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select ARM_GIC
 	select ARM_GIC_V3
 	select ARM_GIC_V3_ITS
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a3f85624313e..a67121d419a2 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -23,7 +23,6 @@ config KVM
 	depends on OF
 	select MMU_NOTIFIER
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
 	select KVM_MMIO
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index 4528bc9c3cb1..eac25aef21e0 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -21,7 +21,6 @@ config KVM
 	depends on MIPS_FP_SUPPORT
 	select EXPORT_UASM
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
 	select KVM_MMIO
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index bfdde04e4905..f53997a8ca62 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,7 +20,6 @@ if VIRTUALIZATION
 config KVM
 	bool
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
 	select SRCU
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 767453faacfc..1816ee48eadd 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -21,7 +21,6 @@ config KVM
 	prompt "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM
 	select PREEMPT_NOTIFIERS
-	select ANON_INODES
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select HAVE_KVM_VCPU_ASYNC_IOCTL
 	select HAVE_KVM_EVENTFD
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 15af091611e2..7dbea4535d9b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -46,7 +46,6 @@ config X86
 	#
 	select ACPI_LEGACY_TABLES_LOOKUP	if ACPI
 	select ACPI_SYSTEM_POWER_STATES_SUPPORT	if ACPI
-	select ANON_INODES
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_CLOCKSOURCE_INIT
 	select ARCH_DISCARD_MEMBLOCK
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 72fa955f4a15..fc042419e670 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -27,7 +27,6 @@ config KVM
 	depends on X86_LOCAL_APIC
 	select PREEMPT_NOTIFIERS
 	select MMU_NOTIFIER
-	select ANON_INODES
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQFD
 	select IRQ_BYPASS_MANAGER
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 3e63a900b330..ae213ed2a7c8 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,7 +174,6 @@ source "drivers/base/regmap/Kconfig"
 config DMA_SHARED_BUFFER
 	bool
 	default n
-	select ANON_INODES
 	select IRQ_WORK
 	help
 	  This option enables the framework for buffer-sharing between
diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index 536e55d3919f..f3e4bc490cf0 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -157,7 +157,6 @@ config TCG_CRB
 config TCG_VTPM_PROXY
 	tristate "VTPM Proxy Interface"
 	depends on TCG_TPM
-	select ANON_INODES
 	---help---
 	  This driver proxies for an emulated TPM (vTPM) running in userspace.
 	  A device /dev/vtpmx is provided that creates a device pair
diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index 2e5a0faa2cb1..3fc9c2efc583 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -3,7 +3,6 @@ menu "DMABUF options"
 config SYNC_FILE
 	bool "Explicit Synchronization Framework"
 	default n
-	select ANON_INODES
 	select DMA_SHARED_BUFFER
 	---help---
 	  The Sync File Framework adds explicit syncronization via
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index b5a2845347ec..07b9b838e929 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -12,7 +12,6 @@ config ARCH_HAVE_CUSTOM_GPIO_H
 
 menuconfig GPIOLIB
 	bool "GPIO Support"
-	select ANON_INODES
 	help
 	  This enables GPIO support through the generic GPIO library.
 	  You only need to enable this, if you also want to enable
diff --git a/drivers/iio/Kconfig b/drivers/iio/Kconfig
index d08aeb41cd07..1dec0fecb6ef 100644
--- a/drivers/iio/Kconfig
+++ b/drivers/iio/Kconfig
@@ -4,7 +4,6 @@
 
 menuconfig IIO
 	tristate "Industrial I/O support"
-	select ANON_INODES
 	help
 	  The industrial I/O subsystem provides a unified framework for
 	  drivers for many different types of embedded sensors using a
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 0a3ec7c726ec..2793ea38649b 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -25,7 +25,6 @@ config INFINIBAND_USER_MAD
 
 config INFINIBAND_USER_ACCESS
 	tristate "InfiniBand userspace access (verbs and CM)"
-	select ANON_INODES
 	depends on MMU
 	---help---
 	  Userspace InfiniBand access support.  This enables the
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 9de5ed38da83..3798d77d131c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,7 +22,6 @@ menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
 	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
-	select ANON_INODES
 	help
 	  VFIO provides a framework for secure userspace device drivers.
 	  See Documentation/vfio.txt for more details.
diff --git a/fs/Makefile b/fs/Makefile
index 9a0b8003f069..ae681523b4b1 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -25,7 +25,7 @@ obj-$(CONFIG_PROC_FS) += proc_namespace.o
 
 obj-y				+= notify/
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
-obj-$(CONFIG_ANON_INODES)	+= anon_inodes.o
+obj-y				+= anon_inodes.o
 obj-$(CONFIG_SIGNALFD)		+= signalfd.o
 obj-$(CONFIG_TIMERFD)		+= timerfd.o
 obj-$(CONFIG_EVENTFD)		+= eventfd.o
diff --git a/fs/notify/fanotify/Kconfig b/fs/notify/fanotify/Kconfig
index 41355ce74ac0..f5b0b3af32dd 100644
--- a/fs/notify/fanotify/Kconfig
+++ b/fs/notify/fanotify/Kconfig
@@ -1,7 +1,6 @@
 config FANOTIFY
 	bool "Filesystem wide access notification"
 	select FSNOTIFY
-	select ANON_INODES
 	default n
 	---help---
 	   Say Y here to enable fanotify support.  fanotify is a file access
diff --git a/fs/notify/inotify/Kconfig b/fs/notify/inotify/Kconfig
index b981fc0c8379..0161c74e76e2 100644
--- a/fs/notify/inotify/Kconfig
+++ b/fs/notify/inotify/Kconfig
@@ -1,6 +1,5 @@
 config INOTIFY_USER
 	bool "Inotify support for userspace"
-	select ANON_INODES
 	select FSNOTIFY
 	default y
 	---help---
diff --git a/init/Kconfig b/init/Kconfig
index d47cb77a220e..fea23d3f44dd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1141,9 +1141,6 @@ config LD_DEAD_CODE_DATA_ELIMINATION
 config SYSCTL
 	bool
 
-config ANON_INODES
-	bool
-
 config HAVE_UID16
 	bool
 
@@ -1348,14 +1345,12 @@ config HAVE_FUTEX_CMPXCHG
 config EPOLL
 	bool "Enable eventpoll support" if EXPERT
 	default y
-	select ANON_INODES
 	help
 	  Disabling this option will cause the kernel to be built without
 	  support for epoll family of system calls.
 
 config SIGNALFD
 	bool "Enable signalfd() system call" if EXPERT
-	select ANON_INODES
 	default y
 	help
 	  Enable the signalfd() system call that allows to receive signals
@@ -1365,7 +1360,6 @@ config SIGNALFD
 
 config TIMERFD
 	bool "Enable timerfd() system call" if EXPERT
-	select ANON_INODES
 	default y
 	help
 	  Enable the timerfd() system call that allows to receive timer
@@ -1375,7 +1369,6 @@ config TIMERFD
 
 config EVENTFD
 	bool "Enable eventfd() system call" if EXPERT
-	select ANON_INODES
 	default y
 	help
 	  Enable the eventfd() system call that allows to receive both
@@ -1477,7 +1470,6 @@ config KALLSYMS_BASE_RELATIVE
 # syscall, maps, verifier
 config BPF_SYSCALL
 	bool "Enable bpf() system call"
-	select ANON_INODES
 	select BPF
 	select IRQ_WORK
 	default n
@@ -1494,7 +1486,6 @@ config BPF_JIT_ALWAYS_ON
 
 config USERFAULTFD
 	bool "Enable userfaultfd() system call"
-	select ANON_INODES
 	depends on MMU
 	help
 	  Enable the userfaultfd() system call that allows to intercept and
@@ -1561,7 +1552,6 @@ config PERF_EVENTS
 	bool "Kernel performance events and counters"
 	default y if PROFILING
 	depends on HAVE_PERF_EVENTS
-	select ANON_INODES
 	select IRQ_WORK
 	select SRCU
 	help


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 05/10] vfs: syscall: Add fsopen() to prepare for superblock creation
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
                   ` (3 preceding siblings ...)
  2019-02-19 17:08 ` [PATCH 04/10] Make anon_inodes unconditional David Howells
@ 2019-02-19 17:09 ` David Howells
  2019-02-19 17:09 ` [PATCH 06/10] vfs: Implement logging through fs_context David Howells
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:09 UTC (permalink / raw)
  To: viro
  Cc: linux-api, linux-fsdevel, dhowells, torvalds, ebiederm,
	linux-security-module

Provide an fsopen() system call that starts the process of preparing to
create a superblock that will then be mountable, using an fd as a context
handle.  fsopen() is given the name of the filesystem that will be used:

	int mfd = fsopen(const char *fsname, unsigned int flags);

where flags can be 0 or FSOPEN_CLOEXEC.

For example:

	sfd = fsopen("ext4", FSOPEN_CLOEXEC);
	fsconfig(sfd, FSCONFIG_SET_PATH, "source", "/dev/sda1", AT_FDCWD);
	fsconfig(sfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
	fsconfig(sfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
	fsconfig(sfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
	fsconfig(sfd, FSCONFIG_SET_STRING, "sb", "1", 0);
	fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
	fsinfo(sfd, NULL, ...); // query new superblock attributes
	mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MS_RELATIME);
	move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

	sfd = fsopen("afs", -1);
	fsconfig(fd, FSCONFIG_SET_STRING, "source",
		 "#grand.central.org:root.cell", 0);
	fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
	mfd = fsmount(sfd, 0, MS_NODEV);
	move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

If an error is reported at any step, an error message may be available to be
read() back (ENODATA will be reported if there isn't an error available) in
the form:

	"e <subsys>:<problem>"
	"e SELinux:Mount on mountpoint not permitted"

Once fsmount() has been called, further fsconfig() calls will incur EBUSY,
even if the fsmount() fails.  read() is still possible to retrieve error
information.

The fsopen() syscall creates a mount context and hangs it of the fd that it
returns.

Netlink is not used because it is optional and would make the core VFS
dependent on the networking layer and also potentially add network
namespace issues.

Note that, for the moment, the caller must have SYS_CAP_ADMIN to use
fsopen().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/Makefile                            |    2 -
 fs/fs_context.c                        |    4 +
 fs/fsopen.c                            |   88 ++++++++++++++++++++++++++++++++
 include/linux/fs_context.h             |   16 ++++++
 include/linux/syscalls.h               |    1 
 include/uapi/linux/mount.h             |    5 ++
 8 files changed, 117 insertions(+), 1 deletion(-)
 create mode 100644 fs/fsopen.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 76d092b7d1b0..1647fefd2969 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -400,3 +400,4 @@
 386	i386	rseq			sys_rseq			__ia32_sys_rseq
 387	i386	open_tree		sys_open_tree			__ia32_sys_open_tree
 388	i386	move_mount		sys_move_mount			__ia32_sys_move_mount
+389	i386	fsopen			sys_fsopen			__ia32_sys_fsopen
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 37ba4e65eee6..235d33dbccb2 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -345,6 +345,7 @@
 334	common	rseq			__x64_sys_rseq
 335	common	open_tree		__x64_sys_open_tree
 336	common	move_mount		__x64_sys_move_mount
+337	common	fsopen			__x64_sys_fsopen
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/Makefile b/fs/Makefile
index ae681523b4b1..e3ea8093b178 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -13,7 +13,7 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o splice.o sync.o utimes.o d_path.o \
 		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
-		fs_context.o fs_parser.o
+		fs_context.o fs_parser.o fsopen.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
diff --git a/fs/fs_context.c b/fs/fs_context.c
index 87e3546b9a52..eb806fae3117 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -271,6 +271,8 @@ static struct fs_context *alloc_fs_context(struct file_system_type *fs_type,
 	fc->cred	= get_current_cred();
 	fc->net_ns	= get_net(current->nsproxy->net_ns);
 
+	mutex_init(&fc->uapi_mutex);
+
 	switch (purpose) {
 	case FS_CONTEXT_FOR_MOUNT:
 		fc->user_ns = get_user_ns(fc->cred->user_ns);
@@ -353,6 +355,8 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc)
 	if (!fc)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_init(&fc->uapi_mutex);
+
 	fc->fs_private	= NULL;
 	fc->s_fs_info	= NULL;
 	fc->source	= NULL;
diff --git a/fs/fsopen.c b/fs/fsopen.c
new file mode 100644
index 000000000000..d256f1ac9ff1
--- /dev/null
+++ b/fs/fsopen.c
@@ -0,0 +1,88 @@
+/* Filesystem access-by-fd.
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/fs_context.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+#include <linux/security.h>
+#include <linux/anon_inodes.h>
+#include <linux/namei.h>
+#include <linux/file.h>
+#include <uapi/linux/mount.h>
+#include "mount.h"
+
+static int fscontext_release(struct inode *inode, struct file *file)
+{
+	struct fs_context *fc = file->private_data;
+
+	if (fc) {
+		file->private_data = NULL;
+		put_fs_context(fc);
+	}
+	return 0;
+}
+
+const struct file_operations fscontext_fops = {
+	.release	= fscontext_release,
+	.llseek		= no_llseek,
+};
+
+/*
+ * Attach a filesystem context to a file and an fd.
+ */
+static int fscontext_create_fd(struct fs_context *fc, unsigned int o_flags)
+{
+	int fd;
+
+	fd = anon_inode_getfd("fscontext", &fscontext_fops, fc,
+			      O_RDWR | o_flags);
+	if (fd < 0)
+		put_fs_context(fc);
+	return fd;
+}
+
+/*
+ * Open a filesystem by name so that it can be configured for mounting.
+ *
+ * We are allowed to specify a container in which the filesystem will be
+ * opened, thereby indicating which namespaces will be used (notably, which
+ * network namespace will be used for network filesystems).
+ */
+SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags)
+{
+	struct file_system_type *fs_type;
+	struct fs_context *fc;
+	const char *fs_name;
+
+	if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (flags & ~FSOPEN_CLOEXEC)
+		return -EINVAL;
+
+	fs_name = strndup_user(_fs_name, PAGE_SIZE);
+	if (IS_ERR(fs_name))
+		return PTR_ERR(fs_name);
+
+	fs_type = get_fs_type(fs_name);
+	kfree(fs_name);
+	if (!fs_type)
+		return -ENODEV;
+
+	fc = fs_context_for_mount(fs_type, 0);
+	put_filesystem(fs_type);
+	if (IS_ERR(fc))
+		return PTR_ERR(fc);
+
+	fc->phase = FS_CONTEXT_CREATE_PARAMS;
+	return fscontext_create_fd(fc, flags & FSOPEN_CLOEXEC ? O_CLOEXEC : 0);
+}
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index eaca452088fa..7ab8b44fab3e 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -15,6 +15,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/security.h>
+#include <linux/mutex.h>
 
 struct cred;
 struct dentry;
@@ -34,6 +35,19 @@ enum fs_context_purpose {
 	FS_CONTEXT_FOR_RECONFIGURE,	/* Superblock reconfiguration (remount) */
 };
 
+/*
+ * Userspace usage phase for fsopen/fspick.
+ */
+enum fs_context_phase {
+	FS_CONTEXT_CREATE_PARAMS,	/* Loading params for sb creation */
+	FS_CONTEXT_CREATING,		/* A superblock is being created */
+	FS_CONTEXT_AWAITING_MOUNT,	/* Superblock created, awaiting fsmount() */
+	FS_CONTEXT_AWAITING_RECONF,	/* Awaiting initialisation for reconfiguration */
+	FS_CONTEXT_RECONF_PARAMS,	/* Loading params for reconfiguration */
+	FS_CONTEXT_RECONFIGURING,	/* Reconfiguring the superblock */
+	FS_CONTEXT_FAILED,		/* Failed to correctly transition a context */
+};
+
 /*
  * Type of parameter value.
  */
@@ -74,6 +88,7 @@ struct fs_parameter {
  */
 struct fs_context {
 	const struct fs_context_operations *ops;
+	struct mutex		uapi_mutex;	/* Userspace access mutex */
 	struct file_system_type	*fs_type;
 	void			*fs_private;	/* The filesystem's context */
 	struct dentry		*root;		/* The root and superblock */
@@ -88,6 +103,7 @@ struct fs_context {
 	unsigned int		sb_flags_mask;	/* Superblock flags that were changed */
 	unsigned int		lsm_flags;	/* Information flags from the fs to the LSM */
 	enum fs_context_purpose	purpose:8;
+	enum fs_context_phase	phase:8;	/* The phase the context is in */
 	bool			need_free:1;	/* Need to call ops->free() */
 	bool			global:1;	/* Goes into &init_user_ns */
 };
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 13289cdf6b52..22ed8a11ef55 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -930,6 +930,7 @@ asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
 asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
 			       int to_dfd, const char __user *to_path,
 			       unsigned int ms_flags);
+asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 3634e065836c..7570df43d08f 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -72,4 +72,9 @@
 #define MOVE_MOUNT_T_EMPTY_PATH		0x00000040 /* Empty to path permitted */
 #define MOVE_MOUNT__MASK		0x00000077
 
+/*
+ * fsopen() flags.
+ */
+#define FSOPEN_CLOEXEC		0x00000001
+
 #endif /* _UAPI_LINUX_MOUNT_H */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 06/10] vfs: Implement logging through fs_context
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
                   ` (4 preceding siblings ...)
  2019-02-19 17:09 ` [PATCH 05/10] vfs: syscall: Add fsopen() to prepare for superblock creation David Howells
@ 2019-02-19 17:09 ` David Howells
  2019-02-19 17:09 ` [PATCH 07/10] vfs: syscall: Add fsconfig() for configuring and managing a context David Howells
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:09 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, dhowells, torvalds, ebiederm, linux-security-module

Implement the ability for filesystems to log error, warning and
informational messages through the fs_context.  These can be extracted by
userspace by reading from an fd created by fsopen().

Error messages are prefixed with "e ", warnings with "w " and informational
messages with "i ".

Inside the kernel, formatted messages are malloc'd but unformatted messages
are not copied if they're either in the core .rodata section or in the
.rodata section of the filesystem module pinned by fs_context::fs_type.
The messages are only good till the fs_type is released.

Note that the logging object is shared between duplicated fs_context
structures.  This is so that such as NFS which do a mount within a mount
can get at least some of the errors from the inner mount.

Five logging functions are provided for this:

 (1) void logfc(struct fs_context *fc, const char *fmt, ...);

     This logs a message into the context.  If the buffer is full, the
     earliest message is discarded.

 (2) void errorf(fc, fmt, ...);

     This wraps logfc() to log an error.

 (3) void invalf(fc, fmt, ...);

     This wraps errorf() and returns -EINVAL for convenience.

 (4) void warnf(fc, fmt, ...);

     This wraps logfc() to log a warning.

 (5) void infof(fc, fmt, ...);

     This wraps logfc() to log an informational message.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 fs/fs_context.c            |  105 ++++++++++++++++++++++++++++++++++++++------
 fs/fsopen.c                |   67 ++++++++++++++++++++++++++++
 include/linux/fs_context.h |   22 ++++++---
 include/linux/module.h     |    6 +++
 4 files changed, 179 insertions(+), 21 deletions(-)

diff --git a/fs/fs_context.c b/fs/fs_context.c
index eb806fae3117..dcf3786f90f9 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -11,6 +11,7 @@
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/module.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 #include <linux/fs.h>
@@ -23,6 +24,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/user_namespace.h>
 #include <net/net_namespace.h>
+#include <asm/sections.h>
 #include "mount.h"
 #include "internal.h"
 
@@ -365,6 +367,8 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc)
 	get_net(fc->net_ns);
 	get_user_ns(fc->user_ns);
 	get_cred(fc->cred);
+	if (fc->log)
+		refcount_inc(&fc->log->usage);
 
 	/* Can't call put until we've called ->dup */
 	ret = fc->ops->dup(fc, src_fc);
@@ -382,7 +386,6 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc)
 }
 EXPORT_SYMBOL(vfs_dup_fs_context);
 
-#ifdef CONFIG_PRINTK
 /**
  * logfc - Log a message to a filesystem context
  * @fc: The filesystem context to log to.
@@ -390,27 +393,100 @@ EXPORT_SYMBOL(vfs_dup_fs_context);
  */
 void logfc(struct fs_context *fc, const char *fmt, ...)
 {
+	static const char store_failure[] = "OOM: Can't store error string";
+	struct fc_log *log = fc ? fc->log : NULL;
+	const char *p;
 	va_list va;
+	char *q;
+	u8 freeable;
 
 	va_start(va, fmt);
-
-	switch (fmt[0]) {
-	case 'w':
-		vprintk_emit(0, LOGLEVEL_WARNING, NULL, 0, fmt, va);
-		break;
-	case 'e':
-		vprintk_emit(0, LOGLEVEL_ERR, NULL, 0, fmt, va);
-		break;
-	default:
-		vprintk_emit(0, LOGLEVEL_NOTICE, NULL, 0, fmt, va);
-		break;
+	if (!strchr(fmt, '%')) {
+		p = fmt;
+		goto unformatted_string;
 	}
+	if (strcmp(fmt, "%s") == 0) {
+		p = va_arg(va, const char *);
+		goto unformatted_string;
+	}
+
+	q = kvasprintf(GFP_KERNEL, fmt, va);
+copied_string:
+	if (!q)
+		goto store_failure;
+	freeable = 1;
+	goto store_string;
+
+unformatted_string:
+	if ((unsigned long)p >= (unsigned long)__start_rodata &&
+	    (unsigned long)p <  (unsigned long)__end_rodata)
+		goto const_string;
+	if (log && within_module_core((unsigned long)p, log->owner))
+		goto const_string;
+	q = kstrdup(p, GFP_KERNEL);
+	goto copied_string;
+
+store_failure:
+	p = store_failure;
+const_string:
+	q = (char *)p;
+	freeable = 0;
+store_string:
+	if (!log) {
+		switch (fmt[0]) {
+		case 'w':
+			printk(KERN_WARNING "%s\n", q + 2);
+			break;
+		case 'e':
+			printk(KERN_ERR "%s\n", q + 2);
+			break;
+		default:
+			printk(KERN_NOTICE "%s\n", q + 2);
+			break;
+		}
+		if (freeable)
+			kfree(q);
+	} else {
+		unsigned int logsize = ARRAY_SIZE(log->buffer);
+		u8 index;
+
+		index = log->head & (logsize - 1);
+		BUILD_BUG_ON(sizeof(log->head) != sizeof(u8) ||
+			     sizeof(log->tail) != sizeof(u8));
+		if ((u8)(log->head - log->tail) == logsize) {
+			/* The buffer is full, discard the oldest message */
+			if (log->need_free & (1 << index))
+				kfree(log->buffer[index]);
+			log->tail++;
+		}
 
-	pr_cont("\n");
+		log->buffer[index] = q;
+		log->need_free &= ~(1 << index);
+		log->need_free |= freeable << index;
+		log->head++;
+	}
 	va_end(va);
 }
 EXPORT_SYMBOL(logfc);
-#endif
+
+/*
+ * Free a logging structure.
+ */
+static void put_fc_log(struct fs_context *fc)
+{
+	struct fc_log *log = fc->log;
+	int i;
+
+	if (log) {
+		if (refcount_dec_and_test(&log->usage)) {
+			fc->log = NULL;
+			for (i = 0; i <= 7; i++)
+				if (log->need_free & (1 << i))
+					kfree(log->buffer[i]);
+			kfree(log);
+		}
+	}
+}
 
 /**
  * put_fs_context - Dispose of a superblock configuration context.
@@ -435,6 +511,7 @@ void put_fs_context(struct fs_context *fc)
 	put_user_ns(fc->user_ns);
 	put_cred(fc->cred);
 	kfree(fc->subtype);
+	put_fc_log(fc);
 	put_filesystem(fc->fs_type);
 	kfree(fc->source);
 	kfree(fc);
diff --git a/fs/fsopen.c b/fs/fsopen.c
index d256f1ac9ff1..5fce6347de7a 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -20,6 +20,52 @@
 #include <uapi/linux/mount.h>
 #include "mount.h"
 
+/*
+ * Allow the user to read back any error, warning or informational messages.
+ */
+static ssize_t fscontext_read(struct file *file,
+			      char __user *_buf, size_t len, loff_t *pos)
+{
+	struct fs_context *fc = file->private_data;
+	struct fc_log *log = fc->log;
+	unsigned int logsize = ARRAY_SIZE(log->buffer);
+	ssize_t ret;
+	char *p;
+	bool need_free;
+	int index, n;
+
+	ret = mutex_lock_interruptible(&fc->uapi_mutex);
+	if (ret < 0)
+		return ret;
+
+	if (log->head == log->tail) {
+		mutex_unlock(&fc->uapi_mutex);
+		return -ENODATA;
+	}
+
+	index = log->tail & (logsize - 1);
+	p = log->buffer[index];
+	need_free = log->need_free & (1 << index);
+	log->buffer[index] = NULL;
+	log->need_free &= ~(1 << index);
+	log->tail++;
+	mutex_unlock(&fc->uapi_mutex);
+
+	ret = -EMSGSIZE;
+	n = strlen(p);
+	if (n > len)
+		goto err_free;
+	ret = -EFAULT;
+	if (copy_to_user(_buf, p, n) != 0)
+		goto err_free;
+	ret = n;
+
+err_free:
+	if (need_free)
+		kfree(p);
+	return ret;
+}
+
 static int fscontext_release(struct inode *inode, struct file *file)
 {
 	struct fs_context *fc = file->private_data;
@@ -32,6 +78,7 @@ static int fscontext_release(struct inode *inode, struct file *file)
 }
 
 const struct file_operations fscontext_fops = {
+	.read		= fscontext_read,
 	.release	= fscontext_release,
 	.llseek		= no_llseek,
 };
@@ -50,6 +97,16 @@ static int fscontext_create_fd(struct fs_context *fc, unsigned int o_flags)
 	return fd;
 }
 
+static int fscontext_alloc_log(struct fs_context *fc)
+{
+	fc->log = kzalloc(sizeof(*fc->log), GFP_KERNEL);
+	if (!fc->log)
+		return -ENOMEM;
+	refcount_set(&fc->log->usage, 1);
+	fc->log->owner = fc->fs_type->owner;
+	return 0;
+}
+
 /*
  * Open a filesystem by name so that it can be configured for mounting.
  *
@@ -62,6 +119,7 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags)
 	struct file_system_type *fs_type;
 	struct fs_context *fc;
 	const char *fs_name;
+	int ret;
 
 	if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
@@ -84,5 +142,14 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags)
 		return PTR_ERR(fc);
 
 	fc->phase = FS_CONTEXT_CREATE_PARAMS;
+
+	ret = fscontext_alloc_log(fc);
+	if (ret < 0)
+		goto err_fc;
+
 	return fscontext_create_fd(fc, flags & FSOPEN_CLOEXEC ? O_CLOEXEC : 0);
+
+err_fc:
+	put_fs_context(fc);
+	return ret;
 }
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index 7ab8b44fab3e..1f966670c8dc 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -13,6 +13,7 @@
 #define _LINUX_FS_CONTEXT_H
 
 #include <linux/kernel.h>
+#include <linux/refcount.h>
 #include <linux/errno.h>
 #include <linux/security.h>
 #include <linux/mutex.h>
@@ -95,6 +96,7 @@ struct fs_context {
 	struct user_namespace	*user_ns;	/* The user namespace for this mount */
 	struct net		*net_ns;	/* The network namespace for this mount */
 	const struct cred	*cred;		/* The mounter's credentials */
+	struct fc_log		*log;		/* Logging buffer */
 	const char		*source;	/* The source name (eg. dev path) */
 	const char		*subtype;	/* The subtype to set on the superblock */
 	void			*security;	/* Linux S&M options */
@@ -151,15 +153,21 @@ extern int vfs_get_super(struct fs_context *fc,
 
 extern const struct file_operations fscontext_fops;
 
-#ifdef CONFIG_PRINTK
+/*
+ * Mount error, warning and informational message logging.  This structure is
+ * shareable between a mount and a subordinate mount.
+ */
+struct fc_log {
+	refcount_t	usage;
+	u8		head;		/* Insertion index in buffer[] */
+	u8		tail;		/* Removal index in buffer[] */
+	u8		need_free;	/* Mask of kfree'able items in buffer[] */
+	struct module	*owner;		/* Owner module for strings that don't then need freeing */
+	char		*buffer[8];
+};
+
 extern __attribute__((format(printf, 2, 3)))
 void logfc(struct fs_context *fc, const char *fmt, ...);
-#else
-static inline __attribute__((format(printf, 2, 3)))
-void logfc(struct fs_context *fc, const char *fmt, ...)
-{
-}
-#endif
 
 /**
  * infof - Store supplementary informational message
diff --git a/include/linux/module.h b/include/linux/module.h
index 8fa38d3e7538..902cbd0047d4 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -693,6 +693,12 @@ static inline bool is_module_text_address(unsigned long addr)
 	return false;
 }
 
+static inline bool within_module_core(unsigned long addr,
+				      const struct module *mod)
+{
+	return false;
+}
+
 /* Get/put a kernel symbol (calls should be symmetric) */
 #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })
 #define symbol_put(x) do { } while (0)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 07/10] vfs: syscall: Add fsconfig() for configuring and managing a context
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
                   ` (5 preceding siblings ...)
  2019-02-19 17:09 ` [PATCH 06/10] vfs: Implement logging through fs_context David Howells
@ 2019-02-19 17:09 ` David Howells
  2019-02-19 17:09 ` [PATCH 08/10] vfs: syscall: Add fsmount() to create a mount for a superblock David Howells
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:09 UTC (permalink / raw)
  To: viro
  Cc: linux-api, linux-fsdevel, dhowells, torvalds, ebiederm,
	linux-security-module

Add a syscall for configuring a filesystem creation context and triggering
actions upon it, to be used in conjunction with fsopen, fspick and fsmount.

    long fsconfig(int fs_fd, unsigned int cmd, const char *key,
		  const void *value, int aux);

Where fs_fd indicates the context, cmd indicates the action to take, key
indicates the parameter name for parameter-setting actions and, if needed,
value points to a buffer containing the value and aux can give more
information for the value.

The following command IDs are proposed:

 (*) FSCONFIG_SET_FLAG: No value is specified.  The parameter must be
     boolean in nature.  The key may be prefixed with "no" to invert the
     setting. value must be NULL and aux must be 0.

 (*) FSCONFIG_SET_STRING: A string value is specified.  The parameter can
     be expecting boolean, integer, string or take a path.  A conversion to
     an appropriate type will be attempted (which may include looking up as
     a path).  value points to a NUL-terminated string and aux must be 0.

 (*) FSCONFIG_SET_BINARY: A binary blob is specified.  value points to
     the blob and aux indicates its size.  The parameter must be expecting
     a blob.

 (*) FSCONFIG_SET_PATH: A non-empty path is specified.  The parameter must
     be expecting a path object.  value points to a NUL-terminated string
     that is the path and aux is a file descriptor at which to start a
     relative lookup or AT_FDCWD.

 (*) FSCONFIG_SET_PATH_EMPTY: As fsconfig_set_path, but with AT_EMPTY_PATH
     implied.

 (*) FSCONFIG_SET_FD: An open file descriptor is specified.  value must
     be NULL and aux indicates the file descriptor.

 (*) FSCONFIG_CMD_CREATE: Trigger superblock creation.

 (*) FSCONFIG_CMD_RECONFIGURE: Trigger superblock reconfiguration.

For the "set" command IDs, the idea is that the file_system_type will point
to a list of parameters and the types of value that those parameters expect
to take.  The core code can then do the parse and argument conversion and
then give the LSM and FS a cooked option or array of options to use.

Source specification is also done the same way same way, using special keys
"source", "source1", "source2", etc..

[!] Note that, for the moment, the key and value are just glued back
together and handed to the filesystem.  Every filesystem that uses options
uses match_token() and co. to do this, and this will need to be changed -
but not all at once.

Example usage:

    fd = fsopen("ext4", FSOPEN_CLOEXEC);
    fsconfig(fd, fsconfig_set_path, "source", "/dev/sda1", AT_FDCWD);
    fsconfig(fd, fsconfig_set_path_empty, "journal_path", "", journal_fd);
    fsconfig(fd, fsconfig_set_fd, "journal_fd", "", journal_fd);
    fsconfig(fd, fsconfig_set_flag, "user_xattr", NULL, 0);
    fsconfig(fd, fsconfig_set_flag, "noacl", NULL, 0);
    fsconfig(fd, fsconfig_set_string, "sb", "1", 0);
    fsconfig(fd, fsconfig_set_string, "errors", "continue", 0);
    fsconfig(fd, fsconfig_set_string, "data", "journal", 0);
    fsconfig(fd, fsconfig_set_string, "context", "unconfined_u:...", 0);
    fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

or:

    fd = fsopen("ext4", FSOPEN_CLOEXEC);
    fsconfig(fd, fsconfig_set_string, "source", "/dev/sda1", 0);
    fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

or:

    fd = fsopen("afs", FSOPEN_CLOEXEC);
    fsconfig(fd, fsconfig_set_string, "source", "#grand.central.org:root.cell", 0);
    fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

or:

    fd = fsopen("jffs2", FSOPEN_CLOEXEC);
    fsconfig(fd, fsconfig_set_string, "source", "mtd0", 0);
    fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0);
    mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/fs_context.c                        |   51 ++++++
 fs/fsopen.c                            |  265 ++++++++++++++++++++++++++++++++
 fs/internal.h                          |    3 
 include/linux/syscalls.h               |    2 
 include/uapi/linux/mount.h             |   14 ++
 7 files changed, 337 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 1647fefd2969..f9970310c126 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -401,3 +401,4 @@
 387	i386	open_tree		sys_open_tree			__ia32_sys_open_tree
 388	i386	move_mount		sys_move_mount			__ia32_sys_move_mount
 389	i386	fsopen			sys_fsopen			__ia32_sys_fsopen
+390	i386	fsconfig		sys_fsconfig			__ia32_sys_fsconfig
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 235d33dbccb2..4185d36e03bb 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -346,6 +346,7 @@
 335	common	open_tree		__x64_sys_open_tree
 336	common	move_mount		__x64_sys_move_mount
 337	common	fsopen			__x64_sys_fsopen
+338	common	fsconfig		__x64_sys_fsconfig
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/fs_context.c b/fs/fs_context.c
index dcf3786f90f9..a47ccd5a4a78 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -721,3 +721,54 @@ int parse_monolithic_mount_data(struct fs_context *fc, void *data)
 
 	return monolithic_mount_data(fc, data);
 }
+
+/*
+ * Clean up a context after performing an action on it and put it into a state
+ * from where it can be used to reconfigure a superblock.
+ *
+ * Note that here we do only the parts that can't fail; the rest is in
+ * finish_clean_context() below and in between those fs_context is marked
+ * FS_CONTEXT_AWAITING_RECONF.  The reason for splitup is that after
+ * successful mount or remount we need to report success to userland.
+ * Trying to do full reinit (for the sake of possible subsequent remount)
+ * and failing to allocate memory would've put us into a nasty situation.
+ * So here we only discard the old state and reinitialization is left
+ * until we actually try to reconfigure.
+ */
+void vfs_clean_context(struct fs_context *fc)
+{
+	if (fc->need_free && fc->ops && fc->ops->free)
+		fc->ops->free(fc);
+	fc->need_free = false;
+	fc->fs_private = NULL;
+	fc->s_fs_info = NULL;
+	fc->sb_flags = 0;
+	security_free_mnt_opts(&fc->security);
+	kfree(fc->subtype);
+	fc->subtype = NULL;
+	kfree(fc->source);
+	fc->source = NULL;
+
+	fc->purpose = FS_CONTEXT_FOR_RECONFIGURE;
+	fc->phase = FS_CONTEXT_AWAITING_RECONF;
+}
+
+int finish_clean_context(struct fs_context *fc)
+{
+	int error;
+
+	if (fc->phase != FS_CONTEXT_AWAITING_RECONF)
+		return 0;
+
+	if (fc->fs_type->init_fs_context)
+		error = fc->fs_type->init_fs_context(fc);
+	else
+		error = legacy_init_fs_context(fc);
+	if (unlikely(error)) {
+		fc->phase = FS_CONTEXT_FAILED;
+		return error;
+	}
+	fc->need_free = true;
+	fc->phase = FS_CONTEXT_RECONF_PARAMS;
+	return 0;
+}
diff --git a/fs/fsopen.c b/fs/fsopen.c
index 5fce6347de7a..65cc2f68f994 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -10,6 +10,7 @@
  */
 
 #include <linux/fs_context.h>
+#include <linux/fs_parser.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/syscalls.h>
@@ -18,6 +19,7 @@
 #include <linux/namei.h>
 #include <linux/file.h>
 #include <uapi/linux/mount.h>
+#include "internal.h"
 #include "mount.h"
 
 /*
@@ -153,3 +155,266 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags)
 	put_fs_context(fc);
 	return ret;
 }
+
+/*
+ * Check the state and apply the configuration.  Note that this function is
+ * allowed to 'steal' the value by setting param->xxx to NULL before returning.
+ */
+static int vfs_fsconfig_locked(struct fs_context *fc, int cmd,
+			       struct fs_parameter *param)
+{
+	struct super_block *sb;
+	int ret;
+
+	ret = finish_clean_context(fc);
+	if (ret)
+		return ret;
+	switch (cmd) {
+	case FSCONFIG_CMD_CREATE:
+		if (fc->phase != FS_CONTEXT_CREATE_PARAMS)
+			return -EBUSY;
+		fc->phase = FS_CONTEXT_CREATING;
+		ret = vfs_get_tree(fc);
+		if (ret)
+			break;
+		sb = fc->root->d_sb;
+		ret = security_sb_kern_mount(sb);
+		if (unlikely(ret)) {
+			fc_drop_locked(fc);
+			break;
+		}
+		up_write(&sb->s_umount);
+		fc->phase = FS_CONTEXT_AWAITING_MOUNT;
+		return 0;
+	case FSCONFIG_CMD_RECONFIGURE:
+		if (fc->phase != FS_CONTEXT_RECONF_PARAMS)
+			return -EBUSY;
+		fc->phase = FS_CONTEXT_RECONFIGURING;
+		sb = fc->root->d_sb;
+		if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN)) {
+			ret = -EPERM;
+			break;
+		}
+		down_write(&sb->s_umount);
+		ret = reconfigure_super(fc);
+		up_write(&sb->s_umount);
+		if (ret)
+			break;
+		vfs_clean_context(fc);
+		return 0;
+	default:
+		if (fc->phase != FS_CONTEXT_CREATE_PARAMS &&
+		    fc->phase != FS_CONTEXT_RECONF_PARAMS)
+			return -EBUSY;
+
+		return vfs_parse_fs_param(fc, param);
+	}
+	fc->phase = FS_CONTEXT_FAILED;
+	return ret;
+}
+
+/**
+ * sys_fsconfig - Set parameters and trigger actions on a context
+ * @fd: The filesystem context to act upon
+ * @cmd: The action to take
+ * @_key: Where appropriate, the parameter key to set
+ * @_value: Where appropriate, the parameter value to set
+ * @aux: Additional information for the value
+ *
+ * This system call is used to set parameters on a context, including
+ * superblock settings, data source and security labelling.
+ *
+ * Actions include triggering the creation of a superblock and the
+ * reconfiguration of the superblock attached to the specified context.
+ *
+ * When setting a parameter, @cmd indicates the type of value being proposed
+ * and @_key indicates the parameter to be altered.
+ *
+ * @_value and @aux are used to specify the value, should a value be required:
+ *
+ * (*) fsconfig_set_flag: No value is specified.  The parameter must be boolean
+ *     in nature.  The key may be prefixed with "no" to invert the
+ *     setting. @_value must be NULL and @aux must be 0.
+ *
+ * (*) fsconfig_set_string: A string value is specified.  The parameter can be
+ *     expecting boolean, integer, string or take a path.  A conversion to an
+ *     appropriate type will be attempted (which may include looking up as a
+ *     path).  @_value points to a NUL-terminated string and @aux must be 0.
+ *
+ * (*) fsconfig_set_binary: A binary blob is specified.  @_value points to the
+ *     blob and @aux indicates its size.  The parameter must be expecting a
+ *     blob.
+ *
+ * (*) fsconfig_set_path: A non-empty path is specified.  The parameter must be
+ *     expecting a path object.  @_value points to a NUL-terminated string that
+ *     is the path and @aux is a file descriptor at which to start a relative
+ *     lookup or AT_FDCWD.
+ *
+ * (*) fsconfig_set_path_empty: As fsconfig_set_path, but with AT_EMPTY_PATH
+ *     implied.
+ *
+ * (*) fsconfig_set_fd: An open file descriptor is specified.  @_value must be
+ *     NULL and @aux indicates the file descriptor.
+ */
+SYSCALL_DEFINE5(fsconfig,
+		int, fd,
+		unsigned int, cmd,
+		const char __user *, _key,
+		const void __user *, _value,
+		int, aux)
+{
+	struct fs_context *fc;
+	struct fd f;
+	int ret;
+
+	struct fs_parameter param = {
+		.type	= fs_value_is_undefined,
+	};
+
+	if (fd < 0)
+		return -EINVAL;
+
+	switch (cmd) {
+	case FSCONFIG_SET_FLAG:
+		if (!_key || _value || aux)
+			return -EINVAL;
+		break;
+	case FSCONFIG_SET_STRING:
+		if (!_key || !_value || aux)
+			return -EINVAL;
+		break;
+	case FSCONFIG_SET_BINARY:
+		if (!_key || !_value || aux <= 0 || aux > 1024 * 1024)
+			return -EINVAL;
+		break;
+	case FSCONFIG_SET_PATH:
+	case FSCONFIG_SET_PATH_EMPTY:
+		if (!_key || !_value || (aux != AT_FDCWD && aux < 0))
+			return -EINVAL;
+		break;
+	case FSCONFIG_SET_FD:
+		if (!_key || _value || aux < 0)
+			return -EINVAL;
+		break;
+	case FSCONFIG_CMD_CREATE:
+	case FSCONFIG_CMD_RECONFIGURE:
+		if (_key || _value || aux)
+			return -EINVAL;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	f = fdget(fd);
+	if (!f.file)
+		return -EBADF;
+	ret = -EINVAL;
+	if (f.file->f_op != &fscontext_fops)
+		goto out_f;
+
+	fc = f.file->private_data;
+	if (fc->ops == &legacy_fs_context_ops) {
+		switch (cmd) {
+		case FSCONFIG_SET_BINARY:
+		case FSCONFIG_SET_PATH:
+		case FSCONFIG_SET_PATH_EMPTY:
+		case FSCONFIG_SET_FD:
+			ret = -EOPNOTSUPP;
+			goto out_f;
+		}
+	}
+
+	if (_key) {
+		param.key = strndup_user(_key, 256);
+		if (IS_ERR(param.key)) {
+			ret = PTR_ERR(param.key);
+			goto out_f;
+		}
+	}
+
+	switch (cmd) {
+	case FSCONFIG_SET_FLAG:
+		param.type = fs_value_is_flag;
+		break;
+	case FSCONFIG_SET_STRING:
+		param.type = fs_value_is_string;
+		param.string = strndup_user(_value, 256);
+		if (IS_ERR(param.string)) {
+			ret = PTR_ERR(param.string);
+			goto out_key;
+		}
+		param.size = strlen(param.string);
+		break;
+	case FSCONFIG_SET_BINARY:
+		param.type = fs_value_is_blob;
+		param.size = aux;
+		param.blob = memdup_user_nul(_value, aux);
+		if (IS_ERR(param.blob)) {
+			ret = PTR_ERR(param.blob);
+			goto out_key;
+		}
+		break;
+	case FSCONFIG_SET_PATH:
+		param.type = fs_value_is_filename;
+		param.name = getname_flags(_value, 0, NULL);
+		if (IS_ERR(param.name)) {
+			ret = PTR_ERR(param.name);
+			goto out_key;
+		}
+		param.dirfd = aux;
+		param.size = strlen(param.name->name);
+		break;
+	case FSCONFIG_SET_PATH_EMPTY:
+		param.type = fs_value_is_filename_empty;
+		param.name = getname_flags(_value, LOOKUP_EMPTY, NULL);
+		if (IS_ERR(param.name)) {
+			ret = PTR_ERR(param.name);
+			goto out_key;
+		}
+		param.dirfd = aux;
+		param.size = strlen(param.name->name);
+		break;
+	case FSCONFIG_SET_FD:
+		param.type = fs_value_is_file;
+		ret = -EBADF;
+		param.file = fget(aux);
+		if (!param.file)
+			goto out_key;
+		break;
+	default:
+		break;
+	}
+
+	ret = mutex_lock_interruptible(&fc->uapi_mutex);
+	if (ret == 0) {
+		ret = vfs_fsconfig_locked(fc, cmd, &param);
+		mutex_unlock(&fc->uapi_mutex);
+	}
+
+	/* Clean up the our record of any value that we obtained from
+	 * userspace.  Note that the value may have been stolen by the LSM or
+	 * filesystem, in which case the value pointer will have been cleared.
+	 */
+	switch (cmd) {
+	case FSCONFIG_SET_STRING:
+	case FSCONFIG_SET_BINARY:
+		kfree(param.string);
+		break;
+	case FSCONFIG_SET_PATH:
+	case FSCONFIG_SET_PATH_EMPTY:
+		if (param.name)
+			putname(param.name);
+		break;
+	case FSCONFIG_SET_FD:
+		if (param.file)
+			fput(param.file);
+		break;
+	default:
+		break;
+	}
+out_key:
+	kfree(param.key);
+out_f:
+	fdput(f);
+	return ret;
+}
diff --git a/fs/internal.h b/fs/internal.h
index f3a027c44758..95cf7b0af21f 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -55,8 +55,11 @@ extern void __init chrdev_init(void);
 /*
  * fs_context.c
  */
+extern const struct fs_context_operations legacy_fs_context_ops;
 extern int parse_monolithic_mount_data(struct fs_context *, void *);
 extern void fc_drop_locked(struct fs_context *);
+extern void vfs_clean_context(struct fs_context *fc);
+extern int finish_clean_context(struct fs_context *fc);
 
 /*
  * namei.c
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 22ed8a11ef55..2586faf20078 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -931,6 +931,8 @@ asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
 			       int to_dfd, const char __user *to_path,
 			       unsigned int ms_flags);
 asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags);
+asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key,
+			     const void __user *value, int aux);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 7570df43d08f..4b90ba9d1770 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -77,4 +77,18 @@
  */
 #define FSOPEN_CLOEXEC		0x00000001
 
+/*
+ * The type of fsconfig() call made.
+ */
+enum fsconfig_command {
+	FSCONFIG_SET_FLAG	= 0,	/* Set parameter, supplying no value */
+	FSCONFIG_SET_STRING	= 1,	/* Set parameter, supplying a string value */
+	FSCONFIG_SET_BINARY	= 2,	/* Set parameter, supplying a binary blob value */
+	FSCONFIG_SET_PATH	= 3,	/* Set parameter, supplying an object by path */
+	FSCONFIG_SET_PATH_EMPTY	= 4,	/* Set parameter, supplying an object by (empty) path */
+	FSCONFIG_SET_FD		= 5,	/* Set parameter, supplying an object by fd */
+	FSCONFIG_CMD_CREATE	= 6,	/* Invoke superblock creation */
+	FSCONFIG_CMD_RECONFIGURE = 7,	/* Invoke superblock reconfiguration */
+};
+
 #endif /* _UAPI_LINUX_MOUNT_H */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 08/10] vfs: syscall: Add fsmount() to create a mount for a superblock
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
                   ` (6 preceding siblings ...)
  2019-02-19 17:09 ` [PATCH 07/10] vfs: syscall: Add fsconfig() for configuring and managing a context David Howells
@ 2019-02-19 17:09 ` David Howells
  2019-02-19 17:09 ` [PATCH 09/10] vfs: syscall: Add fspick() to select a superblock for reconfiguration David Howells
  2019-02-19 17:09 ` [PATCH 10/10] vfs: Add a sample program for the new mount API David Howells
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:09 UTC (permalink / raw)
  To: viro
  Cc: linux-api, linux-fsdevel, dhowells, torvalds, ebiederm,
	linux-security-module

Provide a system call by which a filesystem opened with fsopen() and
configured by a series of fsconfig() calls can have a detached mount object
created for it.  This mount object can then be attached to the VFS mount
hierarchy using move_mount() by passing the returned file descriptor as the
from directory fd.

The system call looks like:

	int mfd = fsmount(int fsfd, unsigned int flags,
			  unsigned int attr_flags);

where fsfd is the file descriptor returned by fsopen().  flags can be 0 or
FSMOUNT_CLOEXEC.  attr_flags is a bitwise-OR of the following flags:

	MOUNT_ATTR_RDONLY	Mount read-only
	MOUNT_ATTR_NOSUID	Ignore suid and sgid bits
	MOUNT_ATTR_NODEV	Disallow access to device special files
	MOUNT_ATTR_NOEXEC	Disallow program execution
	MOUNT_ATTR__ATIME	Setting on how atime should be updated
	MOUNT_ATTR_RELATIME	- Update atime relative to mtime/ctime
	MOUNT_ATTR_NOATIME	- Do not update access times
	MOUNT_ATTR_STRICTATIME	- Always perform atime updates
	MOUNT_ATTR_NODIRATIME	Do not update directory access times

In the event that fsmount() fails, it may be possible to get an error
message by calling read() on fsfd.  If no message is available, ENODATA
will be reported.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/namespace.c                         |  146 +++++++++++++++++++++++++++++++-
 include/linux/syscalls.h               |    1 
 include/uapi/linux/mount.h             |   18 ++++
 5 files changed, 164 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index f9970310c126..c78b68256f8a 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -402,3 +402,4 @@
 388	i386	move_mount		sys_move_mount			__ia32_sys_move_mount
 389	i386	fsopen			sys_fsopen			__ia32_sys_fsopen
 390	i386	fsconfig		sys_fsconfig			__ia32_sys_fsconfig
+391	i386	fsmount			sys_fsmount			__ia32_sys_fsmount
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 4185d36e03bb..d44ead5d4368 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -347,6 +347,7 @@
 336	common	move_mount		__x64_sys_move_mount
 337	common	fsopen			__x64_sys_fsopen
 338	common	fsconfig		__x64_sys_fsconfig
+339	common	fsmount			__x64_sys_fsmount
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/namespace.c b/fs/namespace.c
index 56423c60ac7e..43cbce0643ce 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3334,9 +3334,149 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
 }
 
 /*
- * Move a mount from one place to another.
- * In combination with open_tree(OPEN_TREE_CLONE [| AT_RECURSIVE]) it can be
- * used to copy a mount subtree.
+ * Create a kernel mount representation for a new, prepared superblock
+ * (specified by fs_fd) and attach to an open_tree-like file descriptor.
+ */
+SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
+		unsigned int, attr_flags)
+{
+	struct mnt_namespace *ns;
+	struct fs_context *fc;
+	struct file *file;
+	struct path newmount;
+	struct mount *mnt;
+	struct fd f;
+	unsigned int mnt_flags = 0;
+	long ret;
+
+	if (!may_mount())
+		return -EPERM;
+
+	if ((flags & ~(FSMOUNT_CLOEXEC)) != 0)
+		return -EINVAL;
+
+	if (attr_flags & ~(MOUNT_ATTR_RDONLY |
+			   MOUNT_ATTR_NOSUID |
+			   MOUNT_ATTR_NODEV |
+			   MOUNT_ATTR_NOEXEC |
+			   MOUNT_ATTR__ATIME |
+			   MOUNT_ATTR_NODIRATIME))
+		return -EINVAL;
+
+	if (attr_flags & MOUNT_ATTR_RDONLY)
+		mnt_flags |= MNT_READONLY;
+	if (attr_flags & MOUNT_ATTR_NOSUID)
+		mnt_flags |= MNT_NOSUID;
+	if (attr_flags & MOUNT_ATTR_NODEV)
+		mnt_flags |= MNT_NODEV;
+	if (attr_flags & MOUNT_ATTR_NOEXEC)
+		mnt_flags |= MNT_NOEXEC;
+	if (attr_flags & MOUNT_ATTR_NODIRATIME)
+		mnt_flags |= MNT_NODIRATIME;
+
+	switch (attr_flags & MOUNT_ATTR__ATIME) {
+	case MOUNT_ATTR_STRICTATIME:
+		break;
+	case MOUNT_ATTR_NOATIME:
+		mnt_flags |= MNT_NOATIME;
+		break;
+	case MOUNT_ATTR_RELATIME:
+		mnt_flags |= MNT_RELATIME;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	f = fdget(fs_fd);
+	if (!f.file)
+		return -EBADF;
+
+	ret = -EINVAL;
+	if (f.file->f_op != &fscontext_fops)
+		goto err_fsfd;
+
+	fc = f.file->private_data;
+
+	ret = mutex_lock_interruptible(&fc->uapi_mutex);
+	if (ret < 0)
+		goto err_fsfd;
+
+	/* There must be a valid superblock or we can't mount it */
+	ret = -EINVAL;
+	if (!fc->root)
+		goto err_unlock;
+
+	ret = -EPERM;
+	if (mount_too_revealing(fc->root->d_sb, &mnt_flags)) {
+		pr_warn("VFS: Mount too revealing\n");
+		goto err_unlock;
+	}
+
+	ret = -EBUSY;
+	if (fc->phase != FS_CONTEXT_AWAITING_MOUNT)
+		goto err_unlock;
+
+	ret = -EPERM;
+	if ((fc->sb_flags & SB_MANDLOCK) && !may_mandlock())
+		goto err_unlock;
+
+	newmount.mnt = vfs_create_mount(fc);
+	if (IS_ERR(newmount.mnt)) {
+		ret = PTR_ERR(newmount.mnt);
+		goto err_unlock;
+	}
+	newmount.dentry = dget(fc->root);
+	newmount.mnt->mnt_flags = mnt_flags;
+
+	/* We've done the mount bit - now move the file context into more or
+	 * less the same state as if we'd done an fspick().  We don't want to
+	 * do any memory allocation or anything like that at this point as we
+	 * don't want to have to handle any errors incurred.
+	 */
+	vfs_clean_context(fc);
+
+	ns = alloc_mnt_ns(current->nsproxy->mnt_ns->user_ns, true);
+	if (IS_ERR(ns)) {
+		ret = PTR_ERR(ns);
+		goto err_path;
+	}
+	mnt = real_mount(newmount.mnt);
+	mnt->mnt_ns = ns;
+	ns->root = mnt;
+	ns->mounts = 1;
+	list_add(&mnt->mnt_list, &ns->list);
+
+	/* Attach to an apparent O_PATH fd with a note that we need to unmount
+	 * it, not just simply put it.
+	 */
+	file = dentry_open(&newmount, O_PATH, fc->cred);
+	if (IS_ERR(file)) {
+		dissolve_on_fput(newmount.mnt);
+		ret = PTR_ERR(file);
+		goto err_path;
+	}
+	file->f_mode |= FMODE_NEED_UNMOUNT;
+
+	ret = get_unused_fd_flags((flags & FSMOUNT_CLOEXEC) ? O_CLOEXEC : 0);
+	if (ret >= 0)
+		fd_install(ret, file);
+	else
+		fput(file);
+
+err_path:
+	path_put(&newmount);
+err_unlock:
+	mutex_unlock(&fc->uapi_mutex);
+err_fsfd:
+	fdput(f);
+	return ret;
+}
+
+/*
+ * Move a mount from one place to another.  In combination with
+ * fsopen()/fsmount() this is used to install a new mount and in combination
+ * with open_tree(OPEN_TREE_CLONE [| AT_RECURSIVE]) it can be used to copy
+ * a mount subtree.
  *
  * Note the flags value is a combination of MOVE_MOUNT_* flags.
  */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2586faf20078..a332a731a0e4 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -933,6 +933,7 @@ asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
 asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags);
 asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key,
 			     const void __user *value, int aux);
+asmlinkage long sys_fsmount(int fs_fd, unsigned int flags, unsigned int ms_flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 4b90ba9d1770..3888d3b91dc5 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -91,4 +91,22 @@ enum fsconfig_command {
 	FSCONFIG_CMD_RECONFIGURE = 7,	/* Invoke superblock reconfiguration */
 };
 
+/*
+ * fsmount() flags.
+ */
+#define FSMOUNT_CLOEXEC		0x00000001
+
+/*
+ * Mount attributes.
+ */
+#define MOUNT_ATTR_RDONLY	0x00000001 /* Mount read-only */
+#define MOUNT_ATTR_NOSUID	0x00000002 /* Ignore suid and sgid bits */
+#define MOUNT_ATTR_NODEV	0x00000004 /* Disallow access to device special files */
+#define MOUNT_ATTR_NOEXEC	0x00000008 /* Disallow program execution */
+#define MOUNT_ATTR__ATIME	0x00000070 /* Setting on how atime should be updated */
+#define MOUNT_ATTR_RELATIME	0x00000000 /* - Update atime relative to mtime/ctime. */
+#define MOUNT_ATTR_NOATIME	0x00000010 /* - Do not update access times. */
+#define MOUNT_ATTR_STRICTATIME	0x00000020 /* - Always perform atime updates */
+#define MOUNT_ATTR_NODIRATIME	0x00000080 /* Do not update directory access times */
+
 #endif /* _UAPI_LINUX_MOUNT_H */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 09/10] vfs: syscall: Add fspick() to select a superblock for reconfiguration
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
                   ` (7 preceding siblings ...)
  2019-02-19 17:09 ` [PATCH 08/10] vfs: syscall: Add fsmount() to create a mount for a superblock David Howells
@ 2019-02-19 17:09 ` David Howells
  2019-02-19 17:09 ` [PATCH 10/10] vfs: Add a sample program for the new mount API David Howells
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:09 UTC (permalink / raw)
  To: viro
  Cc: linux-api, linux-fsdevel, dhowells, torvalds, ebiederm,
	linux-security-module

Provide an fspick() system call that can be used to pick an existing
mountpoint into an fs_context which can thereafter be used to reconfigure a
superblock (equivalent of the superblock side of -o remount).

This looks like:

	int fd = fspick(AT_FDCWD, "/mnt",
			FSPICK_CLOEXEC | FSPICK_NO_AUTOMOUNT);
	fsconfig(fd, FSCONFIG_SET_FLAG, "intr", NULL, 0);
	fsconfig(fd, FSCONFIG_SET_FLAG, "noac", NULL, 0);
	fsconfig(fd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);

At the point of fspick being called, the file descriptor referring to the
filesystem context is in exactly the same state as the one that was created
by fsopen() after fsmount() has been successfully called.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl |    1 +
 fs/fsopen.c                            |   57 ++++++++++++++++++++++++++++++++
 include/linux/syscalls.h               |    1 +
 include/uapi/linux/mount.h             |    8 ++++
 5 files changed, 68 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index c78b68256f8a..d1eb6c815790 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -403,3 +403,4 @@
 389	i386	fsopen			sys_fsopen			__ia32_sys_fsopen
 390	i386	fsconfig		sys_fsconfig			__ia32_sys_fsconfig
 391	i386	fsmount			sys_fsmount			__ia32_sys_fsmount
+392	i386	fspick			sys_fspick			__ia32_sys_fspick
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index d44ead5d4368..d3ab703c02bb 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -348,6 +348,7 @@
 337	common	fsopen			__x64_sys_fsopen
 338	common	fsconfig		__x64_sys_fsconfig
 339	common	fsmount			__x64_sys_fsmount
+340	common	fspick			__x64_sys_fspick
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/fsopen.c b/fs/fsopen.c
index 65cc2f68f994..3bb9c0c8cbcc 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -156,6 +156,63 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags)
 	return ret;
 }
 
+/*
+ * Pick a superblock into a context for reconfiguration.
+ */
+SYSCALL_DEFINE3(fspick, int, dfd, const char __user *, path, unsigned int, flags)
+{
+	struct fs_context *fc;
+	struct path target;
+	unsigned int lookup_flags;
+	int ret;
+
+	if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if ((flags & ~(FSPICK_CLOEXEC |
+		       FSPICK_SYMLINK_NOFOLLOW |
+		       FSPICK_NO_AUTOMOUNT |
+		       FSPICK_EMPTY_PATH)) != 0)
+		return -EINVAL;
+
+	lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+	if (flags & FSPICK_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & FSPICK_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (flags & FSPICK_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+	ret = user_path_at(dfd, path, lookup_flags, &target);
+	if (ret < 0)
+		goto err;
+
+	ret = -EINVAL;
+	if (target.mnt->mnt_root != target.dentry)
+		goto err_path;
+
+	fc = fs_context_for_reconfigure(target.dentry, 0, 0);
+	if (IS_ERR(fc)) {
+		ret = PTR_ERR(fc);
+		goto err_path;
+	}
+
+	fc->phase = FS_CONTEXT_RECONF_PARAMS;
+
+	ret = fscontext_alloc_log(fc);
+	if (ret < 0)
+		goto err_fc;
+
+	path_put(&target);
+	return fscontext_create_fd(fc, flags & FSPICK_CLOEXEC ? O_CLOEXEC : 0);
+
+err_fc:
+	put_fs_context(fc);
+err_path:
+	path_put(&target);
+err:
+	return ret;
+}
+
 /*
  * Check the state and apply the configuration.  Note that this function is
  * allowed to 'steal' the value by setting param->xxx to NULL before returning.
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a332a731a0e4..c3f9d865b91d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -934,6 +934,7 @@ asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags);
 asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key,
 			     const void __user *value, int aux);
 asmlinkage long sys_fsmount(int fs_fd, unsigned int flags, unsigned int ms_flags);
+asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 3888d3b91dc5..96a0240f23fe 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -77,6 +77,14 @@
  */
 #define FSOPEN_CLOEXEC		0x00000001
 
+/*
+ * fspick() flags.
+ */
+#define FSPICK_CLOEXEC		0x00000001
+#define FSPICK_SYMLINK_NOFOLLOW	0x00000002
+#define FSPICK_NO_AUTOMOUNT	0x00000004
+#define FSPICK_EMPTY_PATH	0x00000008
+
 /*
  * The type of fsconfig() call made.
  */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 10/10] vfs: Add a sample program for the new mount API
  2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
                   ` (8 preceding siblings ...)
  2019-02-19 17:09 ` [PATCH 09/10] vfs: syscall: Add fspick() to select a superblock for reconfiguration David Howells
@ 2019-02-19 17:09 ` David Howells
  9 siblings, 0 replies; 33+ messages in thread
From: David Howells @ 2019-02-19 17:09 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, dhowells, torvalds, ebiederm, linux-security-module

Add a sample program to demonstrate fsopen/fsmount/move_mount to mount
something.

To make it compile on all arches, irrespective of whether or not syscall
numbers are assigned, define the syscall number to -1 if it isn't to cause
the kernel to return -ENOSYS.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 samples/Kconfig            |    9 +
 samples/Makefile           |    2 
 samples/statx/Makefile     |    7 -
 samples/statx/test-statx.c |  258 -------------------------------------------
 samples/vfs/Makefile       |   10 ++
 samples/vfs/test-fsmount.c |  133 ++++++++++++++++++++++
 samples/vfs/test-statx.c   |  267 ++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 416 insertions(+), 270 deletions(-)
 delete mode 100644 samples/statx/Makefile
 delete mode 100644 samples/statx/test-statx.c
 create mode 100644 samples/vfs/Makefile
 create mode 100644 samples/vfs/test-fsmount.c
 create mode 100644 samples/vfs/test-statx.c

diff --git a/samples/Kconfig b/samples/Kconfig
index ad1ec7016d4c..dc4eb5355fad 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -147,10 +147,11 @@ config SAMPLE_VFIO_MDEV_MBOCHS
 	  Specifically it does *not* include any legacy vga stuff.
 	  Device looks a lot like "qemu -device secondary-vga".
 
-config SAMPLE_STATX
-	bool "Build example extended-stat using code"
-	depends on BROKEN
+config SAMPLE_VFS
+	bool "Build example programs that use new VFS system calls"
 	help
-	  Build example userspace program to use the new extended-stat syscall.
+	  Build example userspace programs that use new VFS system calls such
+	  as mount API and statx().  Note that this is restricted to the x86
+	  arch whilst it accesses system calls that aren't yet in all arches.
 
 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index bd601c038b86..c5a6175c2d3f 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -3,4 +3,4 @@
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
 			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
 			   configfs/ connector/ v4l/ trace_printk/ \
-			   vfio-mdev/ statx/ qmi/
+			   vfio-mdev/ vfs/ qmi/
diff --git a/samples/statx/Makefile b/samples/statx/Makefile
deleted file mode 100644
index 59df7c25a9d1..000000000000
--- a/samples/statx/Makefile
+++ /dev/null
@@ -1,7 +0,0 @@
-# List of programs to build
-hostprogs-$(CONFIG_SAMPLE_STATX) := test-statx
-
-# Tell kbuild to always build the programs
-always := $(hostprogs-y)
-
-HOSTCFLAGS_test-statx.o += -I$(objtree)/usr/include
diff --git a/samples/statx/test-statx.c b/samples/statx/test-statx.c
deleted file mode 100644
index d4d77b09412c..000000000000
--- a/samples/statx/test-statx.c
+++ /dev/null
@@ -1,258 +0,0 @@
-/* Test the statx() system call.
- *
- * Note that the output of this program is intended to look like the output of
- * /bin/stat where possible.
- *
- * Copyright (C) 2015 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- */
-
-#define _GNU_SOURCE
-#define _ATFILE_SOURCE
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <unistd.h>
-#include <ctype.h>
-#include <errno.h>
-#include <time.h>
-#include <sys/syscall.h>
-#include <sys/types.h>
-#include <linux/stat.h>
-#include <linux/fcntl.h>
-#include <sys/stat.h>
-
-#define AT_STATX_SYNC_TYPE	0x6000
-#define AT_STATX_SYNC_AS_STAT	0x0000
-#define AT_STATX_FORCE_SYNC	0x2000
-#define AT_STATX_DONT_SYNC	0x4000
-
-static __attribute__((unused))
-ssize_t statx(int dfd, const char *filename, unsigned flags,
-	      unsigned int mask, struct statx *buffer)
-{
-	return syscall(__NR_statx, dfd, filename, flags, mask, buffer);
-}
-
-static void print_time(const char *field, struct statx_timestamp *ts)
-{
-	struct tm tm;
-	time_t tim;
-	char buffer[100];
-	int len;
-
-	tim = ts->tv_sec;
-	if (!localtime_r(&tim, &tm)) {
-		perror("localtime_r");
-		exit(1);
-	}
-	len = strftime(buffer, 100, "%F %T", &tm);
-	if (len == 0) {
-		perror("strftime");
-		exit(1);
-	}
-	printf("%s", field);
-	fwrite(buffer, 1, len, stdout);
-	printf(".%09u", ts->tv_nsec);
-	len = strftime(buffer, 100, "%z", &tm);
-	if (len == 0) {
-		perror("strftime2");
-		exit(1);
-	}
-	fwrite(buffer, 1, len, stdout);
-	printf("\n");
-}
-
-static void dump_statx(struct statx *stx)
-{
-	char buffer[256], ft = '?';
-
-	printf("results=%x\n", stx->stx_mask);
-
-	printf(" ");
-	if (stx->stx_mask & STATX_SIZE)
-		printf(" Size: %-15llu", (unsigned long long)stx->stx_size);
-	if (stx->stx_mask & STATX_BLOCKS)
-		printf(" Blocks: %-10llu", (unsigned long long)stx->stx_blocks);
-	printf(" IO Block: %-6llu", (unsigned long long)stx->stx_blksize);
-	if (stx->stx_mask & STATX_TYPE) {
-		switch (stx->stx_mode & S_IFMT) {
-		case S_IFIFO:	printf("  FIFO\n");			ft = 'p'; break;
-		case S_IFCHR:	printf("  character special file\n");	ft = 'c'; break;
-		case S_IFDIR:	printf("  directory\n");		ft = 'd'; break;
-		case S_IFBLK:	printf("  block special file\n");	ft = 'b'; break;
-		case S_IFREG:	printf("  regular file\n");		ft = '-'; break;
-		case S_IFLNK:	printf("  symbolic link\n");		ft = 'l'; break;
-		case S_IFSOCK:	printf("  socket\n");			ft = 's'; break;
-		default:
-			printf(" unknown type (%o)\n", stx->stx_mode & S_IFMT);
-			break;
-		}
-	} else {
-		printf(" no type\n");
-	}
-
-	sprintf(buffer, "%02x:%02x", stx->stx_dev_major, stx->stx_dev_minor);
-	printf("Device: %-15s", buffer);
-	if (stx->stx_mask & STATX_INO)
-		printf(" Inode: %-11llu", (unsigned long long) stx->stx_ino);
-	if (stx->stx_mask & STATX_NLINK)
-		printf(" Links: %-5u", stx->stx_nlink);
-	if (stx->stx_mask & STATX_TYPE) {
-		switch (stx->stx_mode & S_IFMT) {
-		case S_IFBLK:
-		case S_IFCHR:
-			printf(" Device type: %u,%u",
-			       stx->stx_rdev_major, stx->stx_rdev_minor);
-			break;
-		}
-	}
-	printf("\n");
-
-	if (stx->stx_mask & STATX_MODE)
-		printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c)  ",
-		       stx->stx_mode & 07777,
-		       ft,
-		       stx->stx_mode & S_IRUSR ? 'r' : '-',
-		       stx->stx_mode & S_IWUSR ? 'w' : '-',
-		       stx->stx_mode & S_IXUSR ? 'x' : '-',
-		       stx->stx_mode & S_IRGRP ? 'r' : '-',
-		       stx->stx_mode & S_IWGRP ? 'w' : '-',
-		       stx->stx_mode & S_IXGRP ? 'x' : '-',
-		       stx->stx_mode & S_IROTH ? 'r' : '-',
-		       stx->stx_mode & S_IWOTH ? 'w' : '-',
-		       stx->stx_mode & S_IXOTH ? 'x' : '-');
-	if (stx->stx_mask & STATX_UID)
-		printf("Uid: %5d   ", stx->stx_uid);
-	if (stx->stx_mask & STATX_GID)
-		printf("Gid: %5d\n", stx->stx_gid);
-
-	if (stx->stx_mask & STATX_ATIME)
-		print_time("Access: ", &stx->stx_atime);
-	if (stx->stx_mask & STATX_MTIME)
-		print_time("Modify: ", &stx->stx_mtime);
-	if (stx->stx_mask & STATX_CTIME)
-		print_time("Change: ", &stx->stx_ctime);
-	if (stx->stx_mask & STATX_BTIME)
-		print_time(" Birth: ", &stx->stx_btime);
-
-	if (stx->stx_attributes_mask) {
-		unsigned char bits, mbits;
-		int loop, byte;
-
-		static char attr_representation[64 + 1] =
-			/* STATX_ATTR_ flags: */
-			"????????"	/* 63-56 */
-			"????????"	/* 55-48 */
-			"????????"	/* 47-40 */
-			"????????"	/* 39-32 */
-			"????????"	/* 31-24	0x00000000-ff000000 */
-			"????????"	/* 23-16	0x00000000-00ff0000 */
-			"???me???"	/* 15- 8	0x00000000-0000ff00 */
-			"?dai?c??"	/*  7- 0	0x00000000-000000ff */
-			;
-
-		printf("Attributes: %016llx (", stx->stx_attributes);
-		for (byte = 64 - 8; byte >= 0; byte -= 8) {
-			bits = stx->stx_attributes >> byte;
-			mbits = stx->stx_attributes_mask >> byte;
-			for (loop = 7; loop >= 0; loop--) {
-				int bit = byte + loop;
-
-				if (!(mbits & 0x80))
-					putchar('.');	/* Not supported */
-				else if (bits & 0x80)
-					putchar(attr_representation[63 - bit]);
-				else
-					putchar('-');	/* Not set */
-				bits <<= 1;
-				mbits <<= 1;
-			}
-			if (byte)
-				putchar(' ');
-		}
-		printf(")\n");
-	}
-}
-
-static void dump_hex(unsigned long long *data, int from, int to)
-{
-	unsigned offset, print_offset = 1, col = 0;
-
-	from /= 8;
-	to = (to + 7) / 8;
-
-	for (offset = from; offset < to; offset++) {
-		if (print_offset) {
-			printf("%04x: ", offset * 8);
-			print_offset = 0;
-		}
-		printf("%016llx", data[offset]);
-		col++;
-		if ((col & 3) == 0) {
-			printf("\n");
-			print_offset = 1;
-		} else {
-			printf(" ");
-		}
-	}
-
-	if (!print_offset)
-		printf("\n");
-}
-
-int main(int argc, char **argv)
-{
-	struct statx stx;
-	int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
-
-	unsigned int mask = STATX_ALL;
-
-	for (argv++; *argv; argv++) {
-		if (strcmp(*argv, "-F") == 0) {
-			atflag &= ~AT_STATX_SYNC_TYPE;
-			atflag |= AT_STATX_FORCE_SYNC;
-			continue;
-		}
-		if (strcmp(*argv, "-D") == 0) {
-			atflag &= ~AT_STATX_SYNC_TYPE;
-			atflag |= AT_STATX_DONT_SYNC;
-			continue;
-		}
-		if (strcmp(*argv, "-L") == 0) {
-			atflag &= ~AT_SYMLINK_NOFOLLOW;
-			continue;
-		}
-		if (strcmp(*argv, "-O") == 0) {
-			mask &= ~STATX_BASIC_STATS;
-			continue;
-		}
-		if (strcmp(*argv, "-A") == 0) {
-			atflag |= AT_NO_AUTOMOUNT;
-			continue;
-		}
-		if (strcmp(*argv, "-R") == 0) {
-			raw = 1;
-			continue;
-		}
-
-		memset(&stx, 0xbf, sizeof(stx));
-		ret = statx(AT_FDCWD, *argv, atflag, mask, &stx);
-		printf("statx(%s) = %d\n", *argv, ret);
-		if (ret < 0) {
-			perror(*argv);
-			exit(1);
-		}
-
-		if (raw)
-			dump_hex((unsigned long long *)&stx, 0, sizeof(stx));
-
-		dump_statx(&stx);
-	}
-	return 0;
-}
diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
new file mode 100644
index 000000000000..4ac9690fb3c4
--- /dev/null
+++ b/samples/vfs/Makefile
@@ -0,0 +1,10 @@
+# List of programs to build
+hostprogs-$(CONFIG_SAMPLE_VFS) := \
+	test-fsmount \
+	test-statx
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_test-fsmount.o += -I$(objtree)/usr/include
+HOSTCFLAGS_test-statx.o += -I$(objtree)/usr/include
diff --git a/samples/vfs/test-fsmount.c b/samples/vfs/test-fsmount.c
new file mode 100644
index 000000000000..266d72b3dce4
--- /dev/null
+++ b/samples/vfs/test-fsmount.c
@@ -0,0 +1,133 @@
+/* fd-based mount test.
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <linux/mount.h>
+#include <linux/unistd.h>
+
+#define E(x) do { if ((x) == -1) { perror(#x); exit(1); } } while(0)
+
+static void check_messages(int fd)
+{
+	char buf[4096];
+	int err, n;
+
+	err = errno;
+
+	for (;;) {
+		n = read(fd, buf, sizeof(buf));
+		if (n < 0)
+			break;
+		n -= 2;
+
+		switch (buf[0]) {
+		case 'e':
+			fprintf(stderr, "Error: %*.*s\n", n, n, buf + 2);
+			break;
+		case 'w':
+			fprintf(stderr, "Warning: %*.*s\n", n, n, buf + 2);
+			break;
+		case 'i':
+			fprintf(stderr, "Info: %*.*s\n", n, n, buf + 2);
+			break;
+		}
+	}
+
+	errno = err;
+}
+
+static __attribute__((noreturn))
+void mount_error(int fd, const char *s)
+{
+	check_messages(fd);
+	fprintf(stderr, "%s: %m\n", s);
+	exit(1);
+}
+
+/* Hope -1 isn't a syscall */
+#ifndef __NR_fsopen
+#define __NR_fsopen -1
+#endif
+#ifndef __NR_fsmount
+#define __NR_fsmount -1
+#endif
+#ifndef __NR_fsconfig
+#define __NR_fsconfig -1
+#endif
+#ifndef __NR_move_mount
+#define __NR_move_mount -1
+#endif
+
+
+static inline int fsopen(const char *fs_name, unsigned int flags)
+{
+	return syscall(__NR_fsopen, fs_name, flags);
+}
+
+static inline int fsmount(int fsfd, unsigned int flags, unsigned int ms_flags)
+{
+	return syscall(__NR_fsmount, fsfd, flags, ms_flags);
+}
+
+static inline int fsconfig(int fsfd, unsigned int cmd,
+			   const char *key, const void *val, int aux)
+{
+	return syscall(__NR_fsconfig, fsfd, cmd, key, val, aux);
+}
+
+static inline int move_mount(int from_dfd, const char *from_pathname,
+			     int to_dfd, const char *to_pathname,
+			     unsigned int flags)
+{
+	return syscall(__NR_move_mount,
+		       from_dfd, from_pathname,
+		       to_dfd, to_pathname, flags);
+}
+
+#define E_fsconfig(fd, cmd, key, val, aux)				\
+	do {								\
+		if (fsconfig(fd, cmd, key, val, aux) == -1)		\
+			mount_error(fd, key ?: "create");		\
+	} while (0)
+
+int main(int argc, char *argv[])
+{
+	int fsfd, mfd;
+
+	/* Mount a publically available AFS filesystem */
+	fsfd = fsopen("afs", 0);
+	if (fsfd == -1) {
+		perror("fsopen");
+		exit(1);
+	}
+
+	E_fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "#grand.central.org:root.cell.", 0);
+	E_fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+
+	mfd = fsmount(fsfd, 0, MOUNT_ATTR_RDONLY);
+	if (mfd < 0)
+		mount_error(fsfd, "fsmount");
+	E(close(fsfd));
+
+	if (move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH) < 0) {
+		perror("move_mount");
+		exit(1);
+	}
+
+	E(close(mfd));
+	exit(0);
+}
diff --git a/samples/vfs/test-statx.c b/samples/vfs/test-statx.c
new file mode 100644
index 000000000000..e91f918e84c4
--- /dev/null
+++ b/samples/vfs/test-statx.c
@@ -0,0 +1,267 @@
+/* Test the statx() system call.
+ *
+ * Note that the output of this program is intended to look like the output of
+ * /bin/stat where possible.
+ *
+ * Copyright (C) 2015 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#define _GNU_SOURCE
+#define _ATFILE_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <ctype.h>
+#include <errno.h>
+#include <time.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <linux/stat.h>
+#include <linux/fcntl.h>
+#define statx foo
+#define statx_timestamp foo_timestamp
+#include <sys/stat.h>
+#undef statx
+#undef statx_timestamp
+
+#define AT_STATX_SYNC_TYPE	0x6000
+#define AT_STATX_SYNC_AS_STAT	0x0000
+#define AT_STATX_FORCE_SYNC	0x2000
+#define AT_STATX_DONT_SYNC	0x4000
+
+#ifndef __NR_statx
+#define __NR_statx -1
+#endif
+
+static __attribute__((unused))
+ssize_t statx(int dfd, const char *filename, unsigned flags,
+	      unsigned int mask, struct statx *buffer)
+{
+	return syscall(__NR_statx, dfd, filename, flags, mask, buffer);
+}
+
+static void print_time(const char *field, struct statx_timestamp *ts)
+{
+	struct tm tm;
+	time_t tim;
+	char buffer[100];
+	int len;
+
+	tim = ts->tv_sec;
+	if (!localtime_r(&tim, &tm)) {
+		perror("localtime_r");
+		exit(1);
+	}
+	len = strftime(buffer, 100, "%F %T", &tm);
+	if (len == 0) {
+		perror("strftime");
+		exit(1);
+	}
+	printf("%s", field);
+	fwrite(buffer, 1, len, stdout);
+	printf(".%09u", ts->tv_nsec);
+	len = strftime(buffer, 100, "%z", &tm);
+	if (len == 0) {
+		perror("strftime2");
+		exit(1);
+	}
+	fwrite(buffer, 1, len, stdout);
+	printf("\n");
+}
+
+static void dump_statx(struct statx *stx)
+{
+	char buffer[256], ft = '?';
+
+	printf("results=%x\n", stx->stx_mask);
+
+	printf(" ");
+	if (stx->stx_mask & STATX_SIZE)
+		printf(" Size: %-15llu", (unsigned long long)stx->stx_size);
+	if (stx->stx_mask & STATX_BLOCKS)
+		printf(" Blocks: %-10llu", (unsigned long long)stx->stx_blocks);
+	printf(" IO Block: %-6llu", (unsigned long long)stx->stx_blksize);
+	if (stx->stx_mask & STATX_TYPE) {
+		switch (stx->stx_mode & S_IFMT) {
+		case S_IFIFO:	printf("  FIFO\n");			ft = 'p'; break;
+		case S_IFCHR:	printf("  character special file\n");	ft = 'c'; break;
+		case S_IFDIR:	printf("  directory\n");		ft = 'd'; break;
+		case S_IFBLK:	printf("  block special file\n");	ft = 'b'; break;
+		case S_IFREG:	printf("  regular file\n");		ft = '-'; break;
+		case S_IFLNK:	printf("  symbolic link\n");		ft = 'l'; break;
+		case S_IFSOCK:	printf("  socket\n");			ft = 's'; break;
+		default:
+			printf(" unknown type (%o)\n", stx->stx_mode & S_IFMT);
+			break;
+		}
+	} else {
+		printf(" no type\n");
+	}
+
+	sprintf(buffer, "%02x:%02x", stx->stx_dev_major, stx->stx_dev_minor);
+	printf("Device: %-15s", buffer);
+	if (stx->stx_mask & STATX_INO)
+		printf(" Inode: %-11llu", (unsigned long long) stx->stx_ino);
+	if (stx->stx_mask & STATX_NLINK)
+		printf(" Links: %-5u", stx->stx_nlink);
+	if (stx->stx_mask & STATX_TYPE) {
+		switch (stx->stx_mode & S_IFMT) {
+		case S_IFBLK:
+		case S_IFCHR:
+			printf(" Device type: %u,%u",
+			       stx->stx_rdev_major, stx->stx_rdev_minor);
+			break;
+		}
+	}
+	printf("\n");
+
+	if (stx->stx_mask & STATX_MODE)
+		printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c)  ",
+		       stx->stx_mode & 07777,
+		       ft,
+		       stx->stx_mode & S_IRUSR ? 'r' : '-',
+		       stx->stx_mode & S_IWUSR ? 'w' : '-',
+		       stx->stx_mode & S_IXUSR ? 'x' : '-',
+		       stx->stx_mode & S_IRGRP ? 'r' : '-',
+		       stx->stx_mode & S_IWGRP ? 'w' : '-',
+		       stx->stx_mode & S_IXGRP ? 'x' : '-',
+		       stx->stx_mode & S_IROTH ? 'r' : '-',
+		       stx->stx_mode & S_IWOTH ? 'w' : '-',
+		       stx->stx_mode & S_IXOTH ? 'x' : '-');
+	if (stx->stx_mask & STATX_UID)
+		printf("Uid: %5d   ", stx->stx_uid);
+	if (stx->stx_mask & STATX_GID)
+		printf("Gid: %5d\n", stx->stx_gid);
+
+	if (stx->stx_mask & STATX_ATIME)
+		print_time("Access: ", &stx->stx_atime);
+	if (stx->stx_mask & STATX_MTIME)
+		print_time("Modify: ", &stx->stx_mtime);
+	if (stx->stx_mask & STATX_CTIME)
+		print_time("Change: ", &stx->stx_ctime);
+	if (stx->stx_mask & STATX_BTIME)
+		print_time(" Birth: ", &stx->stx_btime);
+
+	if (stx->stx_attributes_mask) {
+		unsigned char bits, mbits;
+		int loop, byte;
+
+		static char attr_representation[64 + 1] =
+			/* STATX_ATTR_ flags: */
+			"????????"	/* 63-56 */
+			"????????"	/* 55-48 */
+			"????????"	/* 47-40 */
+			"????????"	/* 39-32 */
+			"????????"	/* 31-24	0x00000000-ff000000 */
+			"????????"	/* 23-16	0x00000000-00ff0000 */
+			"???me???"	/* 15- 8	0x00000000-0000ff00 */
+			"?dai?c??"	/*  7- 0	0x00000000-000000ff */
+			;
+
+		printf("Attributes: %016llx (",
+		       (unsigned long long)stx->stx_attributes);
+		for (byte = 64 - 8; byte >= 0; byte -= 8) {
+			bits = stx->stx_attributes >> byte;
+			mbits = stx->stx_attributes_mask >> byte;
+			for (loop = 7; loop >= 0; loop--) {
+				int bit = byte + loop;
+
+				if (!(mbits & 0x80))
+					putchar('.');	/* Not supported */
+				else if (bits & 0x80)
+					putchar(attr_representation[63 - bit]);
+				else
+					putchar('-');	/* Not set */
+				bits <<= 1;
+				mbits <<= 1;
+			}
+			if (byte)
+				putchar(' ');
+		}
+		printf(")\n");
+	}
+}
+
+static void dump_hex(unsigned long long *data, int from, int to)
+{
+	unsigned offset, print_offset = 1, col = 0;
+
+	from /= 8;
+	to = (to + 7) / 8;
+
+	for (offset = from; offset < to; offset++) {
+		if (print_offset) {
+			printf("%04x: ", offset * 8);
+			print_offset = 0;
+		}
+		printf("%016llx", data[offset]);
+		col++;
+		if ((col & 3) == 0) {
+			printf("\n");
+			print_offset = 1;
+		} else {
+			printf(" ");
+		}
+	}
+
+	if (!print_offset)
+		printf("\n");
+}
+
+int main(int argc, char **argv)
+{
+	struct statx stx;
+	int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
+
+	unsigned int mask = STATX_ALL;
+
+	for (argv++; *argv; argv++) {
+		if (strcmp(*argv, "-F") == 0) {
+			atflag &= ~AT_STATX_SYNC_TYPE;
+			atflag |= AT_STATX_FORCE_SYNC;
+			continue;
+		}
+		if (strcmp(*argv, "-D") == 0) {
+			atflag &= ~AT_STATX_SYNC_TYPE;
+			atflag |= AT_STATX_DONT_SYNC;
+			continue;
+		}
+		if (strcmp(*argv, "-L") == 0) {
+			atflag &= ~AT_SYMLINK_NOFOLLOW;
+			continue;
+		}
+		if (strcmp(*argv, "-O") == 0) {
+			mask &= ~STATX_BASIC_STATS;
+			continue;
+		}
+		if (strcmp(*argv, "-A") == 0) {
+			atflag |= AT_NO_AUTOMOUNT;
+			continue;
+		}
+		if (strcmp(*argv, "-R") == 0) {
+			raw = 1;
+			continue;
+		}
+
+		memset(&stx, 0xbf, sizeof(stx));
+		ret = statx(AT_FDCWD, *argv, atflag, mask, &stx);
+		printf("statx(%s) = %d\n", *argv, ret);
+		if (ret < 0) {
+			perror(*argv);
+			exit(1);
+		}
+
+		if (raw)
+			dump_hex((unsigned long long *)&stx, 0, sizeof(stx));
+
+		dump_statx(&stx);
+	}
+	return 0;
+}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-02-19 17:08 ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around David Howells
@ 2019-02-20 12:32   ` Alan Jenkins
  2019-02-20 12:41     ` Alan Jenkins
  2019-02-20 16:23   ` Jann Horn
  2019-07-08 12:02   ` Tetsuo Handa
  2 siblings, 1 reply; 33+ messages in thread
From: Alan Jenkins @ 2019-02-20 12:32 UTC (permalink / raw)
  To: David Howells, viro
  Cc: linux-api, linux-fsdevel, torvalds, ebiederm, linux-security-module

On 19/02/2019 17:08, David Howells wrote:
> Add a move_mount() system call that will move a mount from one place to
> another and, in the next commit, allow to attach an unattached mount tree.
>
> The new system call looks like the following:
>
> 	int move_mount(int from_dfd, const char *from_path,
> 		       int to_dfd, const char *to_path,
> 		       unsigned int flags);
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: linux-api@vger.kernel.org
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>
>   arch/x86/entry/syscalls/syscall_32.tbl |    1
>   arch/x86/entry/syscalls/syscall_64.tbl |    1
>   fs/namespace.c                         |  126 ++++++++++++++++++++++++--------
>   include/linux/lsm_hooks.h              |    6 ++
>   include/linux/security.h               |    7 ++
>   include/linux/syscalls.h               |    3 +
>   include/uapi/linux/mount.h             |   11 +++
>   security/security.c                    |    5 +
>   8 files changed, 129 insertions(+), 31 deletions(-)

> diff --git a/fs/namespace.c b/fs/namespace.c
> index 112d46f26fc3..f10122028a11 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2537,72 +2537,81 @@ static inline int tree_contains_unbindable(struct mount *mnt)
>   	return 0;
>   }
>   
> -static int do_move_mount(struct path *path, const char *old_name)
> +static int do_move_mount(struct path *old_path, struct path *new_path)
>   {
> -	struct path old_path, parent_path;
> +	struct path parent_path = {.mnt = NULL, .dentry = NULL};
>   	struct mount *p;
>   	struct mount *old;
>   	struct mountpoint *mp;
>   	int err;
> -	if (!old_name || !*old_name)
> -		return -EINVAL;
> -	err = kern_path(old_name, LOOKUP_FOLLOW, &old_path);
> -	if (err)
> -		return err;
>   
> -	mp = lock_mount(path);
> -	err = PTR_ERR(mp);
> +	mp = lock_mount(new_path);
>   	if (IS_ERR(mp))
> -		goto out;
> +		return PTR_ERR(mp);
>   
> -	old = real_mount(old_path.mnt);
> -	p = real_mount(path->mnt);
> +	old = real_mount(old_path->mnt);
> +	p = real_mount(new_path->mnt);
>   
>   	err = -EINVAL;
>   	if (!check_mnt(p) || !check_mnt(old))
> -		goto out1;
> +		goto out;
>   
> -	if (old->mnt.mnt_flags & MNT_LOCKED)
> -		goto out1;
> +	if (!mnt_has_parent(old))
> +		goto out;
>   
> -	err = -EINVAL;
> -	if (old_path.dentry != old_path.mnt->mnt_root)
> -		goto out1;
> +	if (old->mnt.mnt_flags & MNT_LOCKED)
> +		goto out;
>   
> -	if (!mnt_has_parent(old))
> -		goto out1;
> +	if (old_path->dentry != old_path->mnt->mnt_root)
> +		goto out;
>   
> -	if (d_is_dir(path->dentry) !=
> -	      d_is_dir(old_path.dentry))
> -		goto out1;
> +	if (d_is_dir(new_path->dentry) !=
> +	    d_is_dir(old_path->dentry))
> +		goto out;
>   	/*
>   	 * Don't move a mount residing in a shared parent.
>   	 */
>   	if (IS_MNT_SHARED(old->mnt_parent))
> -		goto out1;
> +		goto out;
>   	/*
>   	 * Don't move a mount tree containing unbindable mounts to a destination
>   	 * mount which is shared.
>   	 */
>   	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
> -		goto out1;
> +		goto out;
>   	err = -ELOOP;
>   	for (; mnt_has_parent(p); p = p->mnt_parent)
>   		if (p == old)
> -			goto out1;
> +			goto out;
>   
> -	err = attach_recursive_mnt(old, real_mount(path->mnt), mp, &parent_path);
> +	err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp,
> +				   &parent_path);
>   	if (err)
> -		goto out1;
> +		goto out;
>   
>   	/* if the mount is moved, it should no longer be expire
>   	 * automatically */
>   	list_del_init(&old->mnt_expire);
> -out1:
> -	unlock_mount(mp);
>   out:
> +	unlock_mount(mp);
>   	if (!err)
>   		path_put(&parent_path);
> +	return err;
> +}
> +
> +static int do_move_mount_old(struct path *path, const char *old_name)
> +{
> +	struct path old_path;
> +	int err;
> +
> +	if (!old_name || !*old_name)
> +		return -EINVAL;
> +
> +	err = kern_path(old_name, LOOKUP_FOLLOW, &old_path);
> +	if (err)
> +		return err;
> +
> +	err = do_move_mount(&old_path, path);
>   	path_put(&old_path);
>   	return err;
>   }
> @@ -3050,7 +3059,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,
>   	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
>   		retval = do_change_type(&path, flags);
>   	else if (flags & MS_MOVE)
> -		retval = do_move_mount(&path, dev_name);
> +		retval = do_move_mount_old(&path, dev_name);
>   	else
>   		retval = do_new_mount(&path, type_page, sb_flags, mnt_flags,
>   				      dev_name, data_page);
> @@ -3278,6 +3287,61 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
>   	return ksys_mount(dev_name, dir_name, type, flags, data);
>   }
>   
> +/*
> + * Move a mount from one place to another.
> + *
> + * Note the flags value is a combination of MOVE_MOUNT_* flags.
> + */
> +SYSCALL_DEFINE5(move_mount,
> +		int, from_dfd, const char *, from_pathname,
> +		int, to_dfd, const char *, to_pathname,
> +		unsigned int, flags)
> +{
> +	struct path from_path, to_path;
> +	unsigned int lflags;
> +	int ret = 0;
> +
> +	if (!may_mount())
> +		return -EPERM;
> +
> +	if (flags & ~MOVE_MOUNT__MASK)
> +		return -EINVAL;
> +
> +	/* If someone gives a pathname, they aren't permitted to move
> +	 * from an fd that requires unmount as we can't get at the flag
> +	 * to clear it afterwards.
> +	 */

Comment is incorrect.

* FMODE_NEED_UNMOUNT is never cleared.

* Technically I don't see anything preventing them giving a pathname, 
but it needs to be "." or equivalent.  Otherwise it will fail the 
"!attached" check in the next patch.

* The only argument I remember for preventing this, was that it might 
confuse users (not the kernel).  If you are allowed to move from a 
sub-mount, then in certain programming styles - like my shell script 
test cases - you might accidentally close the original file too early.  
Then you won't be able to do move_mount() from the tree, because the 
tree was unmounted ("dissolved") when you closed it.

I think the description in the previous patch, for open_tree(), makes it 
clear though. "The detached tree will be dissolved on the final close of 
obtained file".

If there is a good reason, I expect we can simply remove the "!attached" 
part of the check.  If the constraint is generating more confusion than 
the added flexibility, I think that would be a good reason :-).

> +	lflags = 0;
> +	if (flags & MOVE_MOUNT_F_SYMLINKS)	lflags |= LOOKUP_FOLLOW;
> +	if (flags & MOVE_MOUNT_F_AUTOMOUNTS)	lflags |= LOOKUP_AUTOMOUNT;
> +	if (flags & MOVE_MOUNT_F_EMPTY_PATH)	lflags |= LOOKUP_EMPTY;
> +
> +	ret = user_path_at(from_dfd, from_pathname, lflags, &from_path);
> +	if (ret < 0)
> +		return ret;
> +
> +	lflags = 0;
> +	if (flags & MOVE_MOUNT_T_SYMLINKS)	lflags |= LOOKUP_FOLLOW;
> +	if (flags & MOVE_MOUNT_T_AUTOMOUNTS)	lflags |= LOOKUP_AUTOMOUNT;
> +	if (flags & MOVE_MOUNT_T_EMPTY_PATH)	lflags |= LOOKUP_EMPTY;
> +
> +	ret = user_path_at(to_dfd, to_pathname, lflags, &to_path);
> +	if (ret < 0)
> +		goto out_from;
> +
> +	ret = security_move_mount(&from_path, &to_path);
> +	if (ret < 0)
> +		goto out_to;
> +
> +	ret = do_move_mount(&from_path, &to_path);
> +
> +out_to:
> +	path_put(&to_path);
> +out_from:
> +	path_put(&from_path);
> +	return ret;
> +}
> +
>   /*
>    * Return true if path is reachable from root
>    *
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-02-20 12:32   ` Alan Jenkins
@ 2019-02-20 12:41     ` Alan Jenkins
  0 siblings, 0 replies; 33+ messages in thread
From: Alan Jenkins @ 2019-02-20 12:41 UTC (permalink / raw)
  To: David Howells, viro
  Cc: linux-api, linux-fsdevel, torvalds, ebiederm, linux-security-module

On 20/02/2019 12:32, Alan Jenkins wrote:
> On 19/02/2019 17:08, David Howells wrote:
>> Add a move_mount() system call that will move a mount from one place to
>> another and, in the next commit, allow to attach an unattached mount 
>> tree.
>>
>> The new system call looks like the following:
>>
>>     int move_mount(int from_dfd, const char *from_path,
>>                int to_dfd, const char *to_path,
>>                unsigned int flags);
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: linux-api@vger.kernel.org
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>> ---
>>
>>   arch/x86/entry/syscalls/syscall_32.tbl |    1
>>   arch/x86/entry/syscalls/syscall_64.tbl |    1
>>   fs/namespace.c                         |  126 
>> ++++++++++++++++++++++++--------
>>   include/linux/lsm_hooks.h              |    6 ++
>>   include/linux/security.h               |    7 ++
>>   include/linux/syscalls.h               |    3 +
>>   include/uapi/linux/mount.h             |   11 +++
>>   security/security.c                    |    5 +
>>   8 files changed, 129 insertions(+), 31 deletions(-)
>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 112d46f26fc3..f10122028a11 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -2537,72 +2537,81 @@ static inline int 
>> tree_contains_unbindable(struct mount *mnt)
>>       return 0;
>>   }
>>   -static int do_move_mount(struct path *path, const char *old_name)
>> +static int do_move_mount(struct path *old_path, struct path *new_path)
>>   {
>> -    struct path old_path, parent_path;
>> +    struct path parent_path = {.mnt = NULL, .dentry = NULL};
>>       struct mount *p;
>>       struct mount *old;
>>       struct mountpoint *mp;
>>       int err;
>> -    if (!old_name || !*old_name)
>> -        return -EINVAL;
>> -    err = kern_path(old_name, LOOKUP_FOLLOW, &old_path);
>> -    if (err)
>> -        return err;
>>   -    mp = lock_mount(path);
>> -    err = PTR_ERR(mp);
>> +    mp = lock_mount(new_path);
>>       if (IS_ERR(mp))
>> -        goto out;
>> +        return PTR_ERR(mp);
>>   -    old = real_mount(old_path.mnt);
>> -    p = real_mount(path->mnt);
>> +    old = real_mount(old_path->mnt);
>> +    p = real_mount(new_path->mnt);
>>         err = -EINVAL;
>>       if (!check_mnt(p) || !check_mnt(old))
>> -        goto out1;
>> +        goto out;
>>   -    if (old->mnt.mnt_flags & MNT_LOCKED)
>> -        goto out1;
>> +    if (!mnt_has_parent(old))
>> +        goto out;
>>   -    err = -EINVAL;
>> -    if (old_path.dentry != old_path.mnt->mnt_root)
>> -        goto out1;
>> +    if (old->mnt.mnt_flags & MNT_LOCKED)
>> +        goto out;
>>   -    if (!mnt_has_parent(old))
>> -        goto out1;
>> +    if (old_path->dentry != old_path->mnt->mnt_root)
>> +        goto out;
>>   -    if (d_is_dir(path->dentry) !=
>> -          d_is_dir(old_path.dentry))
>> -        goto out1;
>> +    if (d_is_dir(new_path->dentry) !=
>> +        d_is_dir(old_path->dentry))
>> +        goto out;
>>       /*
>>        * Don't move a mount residing in a shared parent.
>>        */
>>       if (IS_MNT_SHARED(old->mnt_parent))
>> -        goto out1;
>> +        goto out;
>>       /*
>>        * Don't move a mount tree containing unbindable mounts to a 
>> destination
>>        * mount which is shared.
>>        */
>>       if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
>> -        goto out1;
>> +        goto out;
>>       err = -ELOOP;
>>       for (; mnt_has_parent(p); p = p->mnt_parent)
>>           if (p == old)
>> -            goto out1;
>> +            goto out;
>>   -    err = attach_recursive_mnt(old, real_mount(path->mnt), mp, 
>> &parent_path);
>> +    err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp,
>> +                   &parent_path);
>>       if (err)
>> -        goto out1;
>> +        goto out;
>>         /* if the mount is moved, it should no longer be expire
>>        * automatically */
>>       list_del_init(&old->mnt_expire);
>> -out1:
>> -    unlock_mount(mp);
>>   out:
>> +    unlock_mount(mp);
>>       if (!err)
>>           path_put(&parent_path);
>> +    return err;
>> +}
>> +
>> +static int do_move_mount_old(struct path *path, const char *old_name)
>> +{
>> +    struct path old_path;
>> +    int err;
>> +
>> +    if (!old_name || !*old_name)
>> +        return -EINVAL;
>> +
>> +    err = kern_path(old_name, LOOKUP_FOLLOW, &old_path);
>> +    if (err)
>> +        return err;
>> +
>> +    err = do_move_mount(&old_path, path);
>>       path_put(&old_path);
>>       return err;
>>   }
>> @@ -3050,7 +3059,7 @@ long do_mount(const char *dev_name, const char 
>> __user *dir_name,
>>       else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | 
>> MS_UNBINDABLE))
>>           retval = do_change_type(&path, flags);
>>       else if (flags & MS_MOVE)
>> -        retval = do_move_mount(&path, dev_name);
>> +        retval = do_move_mount_old(&path, dev_name);
>>       else
>>           retval = do_new_mount(&path, type_page, sb_flags, mnt_flags,
>>                         dev_name, data_page);
>> @@ -3278,6 +3287,61 @@ SYSCALL_DEFINE5(mount, char __user *, 
>> dev_name, char __user *, dir_name,
>>       return ksys_mount(dev_name, dir_name, type, flags, data);
>>   }
>>   +/*
>> + * Move a mount from one place to another.
>> + *
>> + * Note the flags value is a combination of MOVE_MOUNT_* flags.
>> + */
>> +SYSCALL_DEFINE5(move_mount,
>> +        int, from_dfd, const char *, from_pathname,
>> +        int, to_dfd, const char *, to_pathname,
>> +        unsigned int, flags)
>> +{
>> +    struct path from_path, to_path;
>> +    unsigned int lflags;
>> +    int ret = 0;
>> +
>> +    if (!may_mount())
>> +        return -EPERM;
>> +
>> +    if (flags & ~MOVE_MOUNT__MASK)
>> +        return -EINVAL;
>> +
>> +    /* If someone gives a pathname, they aren't permitted to move
>> +     * from an fd that requires unmount as we can't get at the flag
>> +     * to clear it afterwards.
>> +     */
>
> Comment is incorrect.
>
> * FMODE_NEED_UNMOUNT is never cleared.
>
> * Technically I don't see anything preventing them giving a pathname, 
> but it needs to be "." or equivalent.  Otherwise it will fail the 
> "!attached" check in the next patch.

>
> * The only argument I remember for preventing this, was that it might 
> confuse users (not the kernel).  If you are allowed to move from a 
> sub-mount, then in certain programming styles - like my shell script 
> test cases - you might accidentally close the original file too 
> early.  Then you won't be able to do move_mount() from the tree, 
> because the tree was unmounted ("dissolved") when you closed it.
>
> I think the description in the previous patch, for open_tree(), makes 
> it clear though. "The detached tree will be dissolved on the final 
> close of obtained file".
>
> If there is a good reason, I expect we can simply remove the 
> "!attached" part of the check.  If the constraint is generating more 
> confusion than the added flexibility, I think that would be a good 
> reason :-).

Sorry, I see it.  Although you are not clearing a flag, you have to free 
the old value of old->mnt_ns.  And that is not being reference-counted, 
it has a single owner, the file which has FMODE_NEED_UNMOUNT.  So it is 
not possible to simply remove the "!attached" check.

I still find the comment confusing, i.e. describing this as clearing a flag.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-02-19 17:08 ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around David Howells
  2019-02-20 12:32   ` Alan Jenkins
@ 2019-02-20 16:23   ` Jann Horn
  2019-07-08 12:02   ` Tetsuo Handa
  2 siblings, 0 replies; 33+ messages in thread
From: Jann Horn @ 2019-02-20 16:23 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Linux API, linux-fsdevel, Linus Torvalds,
	Eric W. Biederman, linux-security-module

On Tue, Feb 19, 2019 at 6:08 PM David Howells <dhowells@redhat.com> wrote:
> Add a move_mount() system call that will move a mount from one place to
> another and, in the next commit, allow to attach an unattached mount tree.
>
> The new system call looks like the following:
>
>         int move_mount(int from_dfd, const char *from_path,
>                        int to_dfd, const char *to_path,
>                        unsigned int flags);
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: linux-api@vger.kernel.org
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
[...]
> +/*
> + * Move a mount from one place to another.
> + *
> + * Note the flags value is a combination of MOVE_MOUNT_* flags.
> + */
> +SYSCALL_DEFINE5(move_mount,
> +               int, from_dfd, const char *, from_pathname,
> +               int, to_dfd, const char *, to_pathname,
> +               unsigned int, flags)
> +{
> +       struct path from_path, to_path;
> +       unsigned int lflags;
> +       int ret = 0;
> +
> +       if (!may_mount())
> +               return -EPERM;
> +
> +       if (flags & ~MOVE_MOUNT__MASK)
> +               return -EINVAL;
> +
> +       /* If someone gives a pathname, they aren't permitted to move
> +        * from an fd that requires unmount as we can't get at the flag
> +        * to clear it afterwards.
> +        */
> +       lflags = 0;
> +       if (flags & MOVE_MOUNT_F_SYMLINKS)      lflags |= LOOKUP_FOLLOW;
> +       if (flags & MOVE_MOUNT_F_AUTOMOUNTS)    lflags |= LOOKUP_AUTOMOUNT;
> +       if (flags & MOVE_MOUNT_F_EMPTY_PATH)    lflags |= LOOKUP_EMPTY;
> +
> +       ret = user_path_at(from_dfd, from_pathname, lflags, &from_path);
> +       if (ret < 0)
> +               return ret;
> +
> +       lflags = 0;
> +       if (flags & MOVE_MOUNT_T_SYMLINKS)      lflags |= LOOKUP_FOLLOW;
> +       if (flags & MOVE_MOUNT_T_AUTOMOUNTS)    lflags |= LOOKUP_AUTOMOUNT;
> +       if (flags & MOVE_MOUNT_T_EMPTY_PATH)    lflags |= LOOKUP_EMPTY;
> +
> +       ret = user_path_at(to_dfd, to_pathname, lflags, &to_path);
> +       if (ret < 0)
> +               goto out_from;
> +
> +       ret = security_move_mount(&from_path, &to_path);
> +       if (ret < 0)
> +               goto out_to;

Wouldn't you want to call this from do_move_mount() instead for
consistency? If MS_MOVE and this thing perform the same logical
operation, they should probably also call the same LSM hook.

> +       ret = do_move_mount(&from_path, &to_path);
> +
> +out_to:
> +       path_put(&to_path);
> +out_from:
> +       path_put(&from_path);
> +       return ret;
> +}
[...]
> diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
> index fd7ae2e7eccf..3634e065836c 100644
> --- a/include/uapi/linux/mount.h
> +++ b/include/uapi/linux/mount.h
> @@ -61,4 +61,15 @@
>  #define OPEN_TREE_CLONE                1               /* Clone the target tree and attach the clone */
>  #define OPEN_TREE_CLOEXEC      O_CLOEXEC       /* Close the file on execve() */
>
> +/*
> + * move_mount() flags.
> + */
> +#define MOVE_MOUNT_F_SYMLINKS          0x00000001 /* Follow symlinks on from path */

"Follow symlinks" sounds a bit misleading. LOOKUP_NOFOLLOW only
applies to the last element of the path; and there is a pending
patchset that's going to let userspace ask the kernel to actually not
follow *any* symlinks on the path with O_NOSYMLINKS, so this might
confuse people. Maybe change the comment to "don't follow trailing
symlink", or something like that?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE
  2019-02-19 17:08 ` [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE David Howells
@ 2019-02-20 18:59   ` Alan Jenkins
  2019-02-26 17:45   ` Alan Jenkins
  1 sibling, 0 replies; 33+ messages in thread
From: Alan Jenkins @ 2019-02-20 18:59 UTC (permalink / raw)
  To: David Howells, viro
  Cc: linux-fsdevel, torvalds, ebiederm, linux-security-module

On 19/02/2019 17:08, David Howells wrote:
> Allow a detached tree created by open_tree(..., OPEN_TREE_CLONE) to be
> attached by move_mount(2).
>
> If by the time of final fput() of OPEN_TREE_CLONE-opened file its tree is
> not detached anymore, it won't be dissolved.  move_mount(2) is adjusted
> to handle detached source.
>
> That gives us equivalents of mount --bind and mount --rbind.
> Thanks also to Alan Jenkins<alan.christopher.jenkins@gmail.com>  for
> providing a whole bunch of ways to break things using this interface.
>
> Signed-off-by: Al Viro<viro@zeniv.linux.org.uk>
> Signed-off-by: David Howells<dhowells@redhat.com>
> Signed-off-by: Al Viro<viro@zeniv.linux.org.uk>
> ---
>
>   fs/namespace.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 55 insertions(+), 7 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index f10122028a11..56423c60ac7e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1840,10 +1840,16 @@ void dissolve_on_fput(struct vfsmount *mnt)
>   	namespace_lock();
>   	lock_mount_hash();
>   	ns = real_mount(mnt)->mnt_ns;
> -	umount_tree(real_mount(mnt), UMOUNT_CONNECTED);
> +	if (ns) {
> +		if (is_anon_ns(ns))
> +			umount_tree(real_mount(mnt), UMOUNT_CONNECTED);
> +		else
> +			ns = NULL;
> +	}
>   	unlock_mount_hash();
>   	namespace_unlock();
> -	free_mnt_ns(ns);
> +	if (ns)
> +		free_mnt_ns(ns);
>   }
>   
>   void drop_collected_mounts(struct vfsmount *mnt)
> @@ -2079,6 +2085,10 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>   		attach_mnt(source_mnt, dest_mnt, dest_mp);
>   		touch_mnt_namespace(source_mnt->mnt_ns);
>   	} else {
> +		if (source_mnt->mnt_ns) {
> +			/* move from anon - the caller will destroy */
> +			list_del_init(&source_mnt->mnt_ns->list);
> +		}
>   		mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
>   		commit_tree(source_mnt);
>   	}
> @@ -2537,13 +2547,37 @@ static inline int tree_contains_unbindable(struct mount *mnt)
>   	return 0;
>   }
>   
> +/*
> + * Check that there aren't references to earlier/same mount namespaces in the
> + * specified subtree.  Such references can act as pins for mount namespaces
> + * that aren't checked by the mount-cycle checking code, thereby allowing
> + * cycles to be made.
> + */
> +static bool check_for_nsfs_mounts(struct mount *subtree)
> +{
> +	struct mount *p;
> +	bool ret = false;
> +
> +	lock_mount_hash();
> +	for (p = subtree; p; p = next_mnt(p, subtree))
> +		if (mnt_ns_loop(p->mnt.mnt_root))
> +			goto out;
> +
> +	ret = true;
> +out:
> +	unlock_mount_hash();
> +	return ret;
> +}
> +
>   static int do_move_mount(struct path *old_path, struct path *new_path)
>   {
>   	struct path parent_path = {.mnt = NULL, .dentry = NULL};
> +	struct mnt_namespace *ns;
>   	struct mount *p;
>   	struct mount *old;
>   	struct mountpoint *mp;
>   	int err;
> +	bool attached;
>   
>   	mp = lock_mount(new_path);
>   	if (IS_ERR(mp))
> @@ -2551,12 +2585,19 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
>   
>   	old = real_mount(old_path->mnt);
>   	p = real_mount(new_path->mnt);
> +	attached = mnt_has_parent(old);
> +	ns = old->mnt_ns;
>   
>   	err = -EINVAL;
> -	if (!check_mnt(p) || !check_mnt(old))
> +	/* The mountpoint must be in our namespace. */
> +	if (!check_mnt(p))
>   		goto out;
>   
> -	if (!mnt_has_parent(old))
> +	/* The thing moved should be either ours or completely unattached. */
> +	if (attached && !check_mnt(old))
> +		goto out;
> +
> +	if (!attached && !is_anon_ns(ns))

I think this is missing a check for ns != NULL, before passing it to 
is_anon_ns().

E.g. in case I called umount2(old_path, MNT_DETACH) beforehand.

>   		goto out;
>   
>   	if (old->mnt.mnt_flags & MNT_LOCKED)
> @@ -2571,7 +2612,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
>   	/*
>   	 * Don't move a mount residing in a shared parent.
>   	 */
> -	if (IS_MNT_SHARED(old->mnt_parent))
> +	if (attached && IS_MNT_SHARED(old->mnt_parent))
>   		goto out;
>   	/*
>   	 * Don't move a mount tree containing unbindable mounts to a destination
> @@ -2580,12 +2621,14 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
>   	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
>   		goto out;
>   	err = -ELOOP;
> +	if (!check_for_nsfs_mounts(old))
> +		goto out;
>   	for (; mnt_has_parent(p); p = p->mnt_parent)
>   		if (p == old)
>   			goto out;
>   
>   	err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp,
> -				   &parent_path);
> +				   attached ? &parent_path : NULL);
>   	if (err)
>   		goto out;
>   
> @@ -2594,8 +2637,11 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
>   	list_del_init(&old->mnt_expire);
>   out:
>   	unlock_mount(mp);
> -	if (!err)
> +	if (!err) {
>   		path_put(&parent_path);
> +		if (!attached)
> +			free_mnt_ns(ns);
> +	}
>   	return err;
>   }
>   
> @@ -3289,6 +3335,8 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
>   
>   /*
>    * Move a mount from one place to another.
> + * In combination with open_tree(OPEN_TREE_CLONE [| AT_RECURSIVE]) it can be
> + * used to copy a mount subtree.
>    *
>    * Note the flags value is a combination of MOVE_MOUNT_* flags.
>    */
>
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE
  2019-02-19 17:08 ` [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE David Howells
  2019-02-20 18:59   ` Alan Jenkins
@ 2019-02-26 17:45   ` Alan Jenkins
  1 sibling, 0 replies; 33+ messages in thread
From: Alan Jenkins @ 2019-02-26 17:45 UTC (permalink / raw)
  To: David Howells, viro
  Cc: linux-fsdevel, torvalds, ebiederm, linux-security-module

On 19/02/2019 17:08, David Howells wrote:
> Allow a detached tree created by open_tree(..., OPEN_TREE_CLONE) to be
> attached by move_mount(2).
>
> If by the time of final fput() of OPEN_TREE_CLONE-opened file its tree is
> not detached anymore, it won't be dissolved.  move_mount(2) is adjusted
> to handle detached source.
>
> That gives us equivalents of mount --bind and mount --rbind.

This is a bit ambiguous.  The two cases can be understood by analogy to 
bind / rbind.  But it is also seems natural, to think it could be used 
to implement the exact same thing as current `mount --bind` / 
`--rbind`.  I think it *does* now provide a full equivalence, right?

I was thinking about the case where mount propagation is enabled on the 
source tree, i.e. it is not a private mount.  Suppose a new mount is 
added inside the source tree, between open_tree() and move_mount().

In the previous version of the patch series, Eric suggested there was a 
NULL dereference in this scenario.[1]  This version should be safe.  I 
think the new mount will be propagated to the cloned tree. Furthermore - 
due to the way this version uses a temporary mount namespace - the 
propagated version of the mount will not be locked by 
attach_recursive_mnt().

[1] https://lore.kernel.org/lkml/87bm7n5k1r.fsf@xmission.com/

It looks very neat now, with the use of the temporary namespaces. 
Congratulations :-).  I have finished looking through these patches 1-3 now.

> Thanks also to Alan Jenkins<alan.christopher.jenkins@gmail.com>  for
> providing a whole bunch of ways to break things using this interface.
>
> Signed-off-by: Al Viro<viro@zeniv.linux.org.uk>
> Signed-off-by: David Howells<dhowells@redhat.com>
> Signed-off-by: Al Viro<viro@zeniv.linux.org.uk>
>
P.S. I guess Al does not need two Signed-off-by lines here.

Thanks
Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-02-19 17:08 ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around David Howells
  2019-02-20 12:32   ` Alan Jenkins
  2019-02-20 16:23   ` Jann Horn
@ 2019-07-08 12:02   ` Tetsuo Handa
  2019-07-08 13:18     ` Al Viro
  2 siblings, 1 reply; 33+ messages in thread
From: Tetsuo Handa @ 2019-07-08 12:02 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-api, linux-fsdevel, torvalds, ebiederm,
	linux-security-module

Hello, David Howells.

I realized via https://lwn.net/Articles/792622/ that a new set of
system calls for filesystem mounting has been added to Linux 5.2. But
I feel that LSM modules are not ready to support these system calls.

An example is move_mount() added by this patch. This patch added
security_move_mount() LSM hook but none of in-tree LSM modules are
providing "LSM_HOOK_INIT(move_mount, ...)" entry. Therefore, currently
security_move_mount() is a no-op. At least for TOMOYO, I want to check
mount manipulations caused by system calls because allowing mounts on
arbitrary location is not acceptable for pathname based access control.
What happened? I want TOMOYO to perform similar checks like mount() does.

On 2019/02/20 2:08, David Howells wrote:
> Add a move_mount() system call that will move a mount from one place to
> another and, in the next commit, allow to attach an unattached mount tree.
> 
> The new system call looks like the following:
> 
> 	int move_mount(int from_dfd, const char *from_path,
> 		       int to_dfd, const char *to_path,
> 		       unsigned int flags);
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: linux-api@vger.kernel.org
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-08 12:02   ` Tetsuo Handa
@ 2019-07-08 13:18     ` Al Viro
  2019-07-08 17:12       ` ebiederm
  0 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2019-07-08 13:18 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: David Howells, linux-api, linux-fsdevel, torvalds, ebiederm,
	linux-security-module

On Mon, Jul 08, 2019 at 09:02:10PM +0900, Tetsuo Handa wrote:
> Hello, David Howells.
> 
> I realized via https://lwn.net/Articles/792622/ that a new set of
> system calls for filesystem mounting has been added to Linux 5.2. But
> I feel that LSM modules are not ready to support these system calls.
> 
> An example is move_mount() added by this patch. This patch added
> security_move_mount() LSM hook but none of in-tree LSM modules are
> providing "LSM_HOOK_INIT(move_mount, ...)" entry. Therefore, currently
> security_move_mount() is a no-op. At least for TOMOYO, I want to check
> mount manipulations caused by system calls because allowing mounts on
> arbitrary location is not acceptable for pathname based access control.
> What happened? I want TOMOYO to perform similar checks like mount() does.

That would be like tomoyo_check_mount_acl(), right?
        if (need_dev) {
                /* Get mount point or device file. */
                if (!dev_name || kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
                        error = -ENOENT;
                        goto out;
                }
                obj.path1 = path;
                requested_dev_name = tomoyo_realpath_from_path(&path);
                if (!requested_dev_name) {
                        error = -ENOENT;
                        goto out;
                }
        } else {
is an obvious crap for *ALL* cases.  You are doing pathname resolution,
followed by normalization and checks.  Then the result of said pathname
resolution is thrown out and it's redone (usually by something in fs/super.c).
Results of _that_ get used.

Could you spell TOCTOU?  And yes, exploiting that takes a lot less than
being able to do mount(2) in the first place - just pass it
/proc/self/fd/69/<some acceptable path>/. with descriptor refering to
opened root directory.  With ~/<some acceptable path> being a symlink
to whatever you actually want to hit.  And descriptor 42 being your
opened homedir.  Now have that call of mount(2) overlap with dup2(42, 69)
from another thread sharing your descriptor table.  It doesn't take much
to get the timing right, especially if you can arrange for some other
activity frequently hitting namespace_sem at least shared (e.g. reading
/proc/mounts in a loop from another process); that's likely to stall
mount(2) at the point of lock_mount(), which comes *AFTER* the point
where LSM hook is stuck into.

Again, *ANY* checks on "dev_name" in ->sb_mount() instances are so much
snake oil.  Always had been.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-08 13:18     ` Al Viro
@ 2019-07-08 17:12       ` ebiederm
  2019-07-08 18:01         ` Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: ebiederm @ 2019-07-08 17:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Mon, Jul 08, 2019 at 09:02:10PM +0900, Tetsuo Handa wrote:
>> Hello, David Howells.
>> 
>> I realized via https://lwn.net/Articles/792622/ that a new set of
>> system calls for filesystem mounting has been added to Linux 5.2. But
>> I feel that LSM modules are not ready to support these system calls.
>> 
>> An example is move_mount() added by this patch. This patch added
>> security_move_mount() LSM hook but none of in-tree LSM modules are
>> providing "LSM_HOOK_INIT(move_mount, ...)" entry. Therefore, currently
>> security_move_mount() is a no-op. At least for TOMOYO, I want to check
>> mount manipulations caused by system calls because allowing mounts on
>> arbitrary location is not acceptable for pathname based access control.
>> What happened? I want TOMOYO to perform similar checks like mount() does.
>
> That would be like tomoyo_check_mount_acl(), right?
>         if (need_dev) {
>                 /* Get mount point or device file. */
>                 if (!dev_name || kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
>                         error = -ENOENT;
>                         goto out;
>                 }
>                 obj.path1 = path;
>                 requested_dev_name = tomoyo_realpath_from_path(&path);
>                 if (!requested_dev_name) {
>                         error = -ENOENT;
>                         goto out;
>                 }
>         } else {
> is an obvious crap for *ALL* cases.  You are doing pathname resolution,
> followed by normalization and checks.  Then the result of said pathname
> resolution is thrown out and it's redone (usually by something in fs/super.c).
> Results of _that_ get used.
>
> Could you spell TOCTOU?  And yes, exploiting that takes a lot less than
> being able to do mount(2) in the first place - just pass it
> /proc/self/fd/69/<some acceptable path>/. with descriptor refering to
> opened root directory.  With ~/<some acceptable path> being a symlink
> to whatever you actually want to hit.  And descriptor 42 being your
> opened homedir.  Now have that call of mount(2) overlap with dup2(42, 69)
> from another thread sharing your descriptor table.  It doesn't take much
> to get the timing right, especially if you can arrange for some other
> activity frequently hitting namespace_sem at least shared (e.g. reading
> /proc/mounts in a loop from another process); that's likely to stall
> mount(2) at the point of lock_mount(), which comes *AFTER* the point
> where LSM hook is stuck into.
>
> Again, *ANY* checks on "dev_name" in ->sb_mount() instances are so much
> snake oil.  Always had been.

Al you do realize that the TOCTOU you are talking about comes the system
call API.  TOMOYO can only be faulted for not playing in their own
sandbox and not reaching out and fixing the vfs implementation details.
Userspace has always had to very careful to only mount filesystems
on paths that root completely controls and won't change.

Further system calls for manipulating the mount tree have historically
done a crap job of validating their inputs.  Relying on the fact that
only root can call them.  So the idea of guarding against root doing
something silly was silly.

So I figure at the end of the day if the new security hooks for the new
mount system calls don't make it possible to remove the TOCTOU that is
on you and Dave.  You two touched that code last after all.

Not updating the new security hooks to at least do as good a job as the
old security hooks is the definition of regression.

So Al.  Please simmer down and take the valid criticism.

Eric

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-08 17:12       ` ebiederm
@ 2019-07-08 18:01         ` Al Viro
  2019-07-08 18:13           ` Al Viro
  2019-07-08 20:21           ` Al Viro
  0 siblings, 2 replies; 33+ messages in thread
From: Al Viro @ 2019-07-08 18:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module

On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote:

> Al you do realize that the TOCTOU you are talking about comes the system
> call API.  TOMOYO can only be faulted for not playing in their own
> sandbox and not reaching out and fixing the vfs implementation details.
>
> Userspace has always had to very careful to only mount filesystems
> on paths that root completely controls and won't change.

That has nothing whatsoever to do with the path where you are mounting
something.  _That_ is actually looked up before ->sb_mount() gets called;
no TOCTOU there.

The thing where ->sb_mount() is fucked by design is its handling of
	* device name
	* old tree in mount --bind
	* old tree in mount --move
	* things like journal name (not that any of the instances had tried
to do anything with that)

All of those *do* have TOCTOU, and that's an inevitable result of the
idiotic hook fetishism of LSM design.  Instead of "we want something
to happen when such-and-such predicate is about to change", it's
"lemme run my code, the earlier the better, I don't care about any
damn predicates, it's all too complicated anyway, whaddya mean
racy?"

Any time you have pathname resolution done twice, it's a built-in race.
If you want *ALL* checks on mount(2) to be done before the mean, nasty
kernel code gets to decide anything (bind/move/mount/etc. all squashed
together, just let us have at the syscall arguments, mmkay?) - that's
precisely what you get.

And no, that TOCTOU is not in syscall API.  "open() of an untrusted
pathname may end up trying to open hell knows what" is one thing;
"open() of an untrusted pathname may apply MAC checks to one object
and open something entirely different" is another.  The former is
inherent to syscall API.  The latter would be a badly fucked up
implementation (we don't have that issue on open(2), thankfully).

To make it clear, TOMOYO is not at fault here; LSM "architecture" is.
Note, BTW, that TOMOYO checks there do *NOT* limit the input pathname
at all - only the destination of the first pathwalk.  Repeating it
may easily lead to an entirely different place.

Canonicalized pathname is derived from pathwalk result; having concluded
that it's perfectly fine for the operation requested is pure security
theatre - it
	* says nothing about the trustedness of the original pathname
	* may have nothing whatsoever to the object yielded by the
second pathwalk, which is what'll end up actually used.
It's not even "this thing walks through /proc, and thus not to be trusted
to be stable" - the checks won't notice where the damn thing had been.

When somebody proposes _useful_ MAC for mount --move (and that really
can't be done at the level of syscall entry - we need to have already
figured out that with given combination of flags the 1st argument of
mount(2) will be a pathname *and* already looked it up), sure - it
will be added to do_move_mount(), which is where we have all lookups
done, and apply both for mount() and move_mount().

Right now anyone relying upon DAC enforced for MS_MOVE has worse problems
than "attacker will use move_mount(2) and bypass my policy" - the same
attacker can bloody well bypass those with nothing more exotic than
clone(2) and dup2(2) (and mount(2), of course).

And it's not just MS_MOVE (or MS_BIND).  Anyone trying to prevent
mounting e.g. ext2 from untrusted device and do that on the level of
->sb_mount() *is* *bloody* *well* *fucked*.  ->sb_mount() is simply
the wrong place for that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-08 18:01         ` Al Viro
@ 2019-07-08 18:13           ` Al Viro
  2019-07-08 20:21           ` Al Viro
  1 sibling, 0 replies; 33+ messages in thread
From: Al Viro @ 2019-07-08 18:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module

On Mon, Jul 08, 2019 at 07:01:32PM +0100, Al Viro wrote:

> Right now anyone relying upon DAC enforced for MS_MOVE has worse problems

That'd be MAC, of course.  Sorry.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-08 18:01         ` Al Viro
  2019-07-08 18:13           ` Al Viro
@ 2019-07-08 20:21           ` Al Viro
  2019-07-09  0:13             ` ebiederm
  2019-07-23 21:45             ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around James Morris
  1 sibling, 2 replies; 33+ messages in thread
From: Al Viro @ 2019-07-08 20:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module

On Mon, Jul 08, 2019 at 07:01:32PM +0100, Al Viro wrote:
> On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote:
> 
> > Al you do realize that the TOCTOU you are talking about comes the system
> > call API.  TOMOYO can only be faulted for not playing in their own
> > sandbox and not reaching out and fixing the vfs implementation details.

PS: the fact that mount(2) has been overloaded to hell and back (including
MS_MOVE, which goes back to v2.5.0.5) predates the introduction of ->sb_mount()
and LSM in general (2.5.27).  MS_BIND is 2.4.0-test9pre2.

In all the years since the introduction of ->sb_mount() I've seen zero
questions from LSM folks regarding a sane place for those checks.  What I have
seen was "we want it immediately upon the syscall entry, let the module
figure out what to do" in reply to several times I tried to tell them "folks,
it's called in a bad place; you want the checks applied to objects, not to
raw string arguments".

As it is, we have easily bypassable checks on mount(2) (by way of ->sb_mount();
there are other hooks also in the game for remounts and new mounts).

I see no point whatsoever trying to duplicate ->sb_mount() on the top level
of move_mount(2).  When and if sane checks are agreed upon for that thing,
they certainly should be used both for MS_MOVE case of mount(2) and for
move_mount(2).  And that'll come for free from calling those in do_move_mount().
They won't be the first thing called in mount(2) - we demultiplex first,
decide that we have a move and do pathname resolution on source.  And that's
precisely what we need to avoid the TOCTOU there.

I'm sorry, but this "run the hook at the very beginning, the modules know
better what they want, just give them as close to syscall arguments as
possible before even looking at the flags" model is wrong, plain and simple.

As for the MS_MOVE checks, the arguments are clear enough (two struct path *,
same as what we pass to do_move_mount()).  AFAICS, only tomoyo and
apparmor are trying to do anything for MS_MOVE in ->sb_mount(), and both
look like it should be easy enough to implement what said checks intend
to do (probably - assuming that aa_move_mount() doesn't depend upon
having its kern_path() inside the __begin_current_label_crit_section()/
__end_current_label_crit_section() pair; looks like it shouldn't be,
but that's for apparmor folks to decide).

That's really for LSM folks, though - I've given up on convincing
(or even getting a rationale out of) them on anything related to hook
semantics years ago.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-08 20:21           ` Al Viro
@ 2019-07-09  0:13             ` ebiederm
  2019-07-09 10:51               ` Tetsuo Handa
  2019-07-23 21:45             ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around James Morris
  1 sibling, 1 reply; 33+ messages in thread
From: ebiederm @ 2019-07-09  0:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Tetsuo Handa, David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Mon, Jul 08, 2019 at 07:01:32PM +0100, Al Viro wrote:
>> On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote:
>> 
>> > Al you do realize that the TOCTOU you are talking about comes the system
>> > call API.  TOMOYO can only be faulted for not playing in their own
>> > sandbox and not reaching out and fixing the vfs implementation details.
>
> PS: the fact that mount(2) has been overloaded to hell and back (including
> MS_MOVE, which goes back to v2.5.0.5) predates the introduction of ->sb_mount()
> and LSM in general (2.5.27).  MS_BIND is 2.4.0-test9pre2.
>
> In all the years since the introduction of ->sb_mount() I've seen zero
> questions from LSM folks regarding a sane place for those checks.  What I have
> seen was "we want it immediately upon the syscall entry, let the module
> figure out what to do" in reply to several times I tried to tell them "folks,
> it's called in a bad place; you want the checks applied to objects, not to
> raw string arguments".
>
> As it is, we have easily bypassable checks on mount(2) (by way of ->sb_mount();
> there are other hooks also in the game for remounts and new mounts).
>
> I see no point whatsoever trying to duplicate ->sb_mount() on the top level
> of move_mount(2).  When and if sane checks are agreed upon for that thing,
> they certainly should be used both for MS_MOVE case of mount(2) and for
> move_mount(2).  And that'll come for free from calling those in do_move_mount().
> They won't be the first thing called in mount(2) - we demultiplex first,
> decide that we have a move and do pathname resolution on source.  And that's
> precisely what we need to avoid the TOCTOU there.
>
> I'm sorry, but this "run the hook at the very beginning, the modules know
> better what they want, just give them as close to syscall arguments as
> possible before even looking at the flags" model is wrong, plain and simple.
>
> As for the MS_MOVE checks, the arguments are clear enough (two struct path *,
> same as what we pass to do_move_mount()).  AFAICS, only tomoyo and
> apparmor are trying to do anything for MS_MOVE in ->sb_mount(), and both
> look like it should be easy enough to implement what said checks intend
> to do (probably - assuming that aa_move_mount() doesn't depend upon
> having its kern_path() inside the __begin_current_label_crit_section()/
> __end_current_label_crit_section() pair; looks like it shouldn't be,
> but that's for apparmor folks to decide).
>
> That's really for LSM folks, though - I've given up on convincing
> (or even getting a rationale out of) them on anything related to hook
> semantics years ago.

I have found the LSM folks in recent years to be rather reasonable,
especially when something concrete has been proposed.

A quick look suggests that the new security_mount_move is a reasonable
hook for the mount_move check.

Tetsuo, do you think you can implement the checks you need for Tomoyo
for mount_move on top of the new security_mount_move?

Al is proposing that similar hooks be added for the other subcases of
mount so that less racy hooks can be implemented.  Tetsuo do you have
any problem with that?

Tetsuo whatever the virtues of this patchset getting merged into Linus's
tree it is merged now, so the only thing we can do now is roll up our
sleeves go through everything and fix the regressions/bugs/issues that
have emerged with the new mount api.

Eric




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-09  0:13             ` ebiederm
@ 2019-07-09 10:51               ` Tetsuo Handa
  2019-07-22 10:12                 ` Tetsuo Handa
  0 siblings, 1 reply; 33+ messages in thread
From: Tetsuo Handa @ 2019-07-09 10:51 UTC (permalink / raw)
  To: Eric W. Biederman, Al Viro
  Cc: David Howells, linux-api, linux-fsdevel, torvalds,
	linux-security-module, John Johansen

On 2019/07/09 9:13, Eric W. Biederman wrote:
> Tetsuo, do you think you can implement the checks you need for Tomoyo
> for mount_move on top of the new security_mount_move?

I hope "Yes", for I'm not aware what will become possible with the new mount
syscalls. For example, if it becomes possible to use multiple source device
files or directories, TOMOYO is not ready to check multiple source device
files or directories, for currently TOMOYO uses

  file mount $DEVICE $MOUNTPOINT $FILESYSTEM $OPTIONS

( https://tomoyo.osdn.jp/2.6/policy-specification/domain-policy-syntax.html#file_mount )
syntax. If only optional part (any arguments which were previously passed via
"const void *data" of mount() syscall) differs, current syntax can be reused.

> 
> Al is proposing that similar hooks be added for the other subcases of
> mount so that less racy hooks can be implemented.  Tetsuo do you have
> any problem with that?

I welcome improvements that get rid of calling kern_path() from LSM hooks.

In the past, I incorrectly implemented which MS_* flag should be checked
(commit df91e49477a9be15 ("TOMOYO: Fix mount flags checking order.")).
If LSM hooks for mount manipulation are divided into multiple LSM hooks
(along with changing the argument from "const char *" to "const struct path *"),
such mistakes can be avoided.

> Tetsuo whatever the virtues of this patchset getting merged into Linus's
> tree it is merged now, so the only thing we can do now is roll up our
> sleeves go through everything and fix the regressions/bugs/issues that
> have emerged with the new mount api.

For now, can we apply this patch for 5.2-stable ?


From dd62fab0592e02580fd5a34222a2d11bfc179f61 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 9 Jul 2019 19:05:49 +0900
Subject: [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled.

Commit 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts
around") introduced security_move_mount() LSM hook, but we missed that
TOMOYO and AppArmor did not implement hooks for checking move_mount(2).
For pathname based access controls like TOMOYO and AppArmor, unchecked
mount manipulation is not acceptable. Therefore, until TOMOYO and AppArmor
implement hooks, in order to avoid unchecked mount manipulation, pretend
as if move_mount(2) is unavailable when either TOMOYO or AppArmor is
enabled.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts around")
Cc: stable@vger.kernel.org # 5.2
---
 include/linux/lsm_hooks.h | 6 ++++++
 security/apparmor/lsm.c   | 1 +
 security/tomoyo/tomoyo.c  | 1 +
 3 files changed, 8 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 47f58cf..cd411b7 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -2142,4 +2142,10 @@ static inline void security_delete_hooks(struct security_hook_list *hooks,
 
 extern int lsm_inode_alloc(struct inode *inode);
 
+static inline int no_move_mount(const struct path *from_path,
+				const struct path *to_path)
+{
+	return -ENOSYS;
+}
+
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index ec3a928..5cdf63b 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1158,6 +1158,7 @@ struct lsm_blob_sizes apparmor_blob_sizes __lsm_ro_after_init = {
 	LSM_HOOK_INIT(capable, apparmor_capable),
 
 	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
+	LSM_HOOK_INIT(move_mount, no_move_mount),
 	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
 	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
 
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 716c92e..be1b1a1 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -558,6 +558,7 @@ static void tomoyo_task_free(struct task_struct *task)
 	LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
 	LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
 	LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
+	LSM_HOOK_INIT(move_mount, no_move_mount),
 	LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
 	LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),
 	LSM_HOOK_INIT(socket_bind, tomoyo_socket_bind),
-- 
1.8.3.1




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-09 10:51               ` Tetsuo Handa
@ 2019-07-22 10:12                 ` Tetsuo Handa
  2019-07-23  4:16                   ` John Johansen
  0 siblings, 1 reply; 33+ messages in thread
From: Tetsuo Handa @ 2019-07-22 10:12 UTC (permalink / raw)
  To: John Johansen
  Cc: Eric W. Biederman, Al Viro, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

I did not see AppArmor patch for this problem in 5.3-rc1. 
John, are you OK with this patch for 5.2-stable and 5.3 ?

On 2019/07/09 19:51, Tetsuo Handa wrote:
> For now, can we apply this patch for 5.2-stable ?
> 
> 
>>From dd62fab0592e02580fd5a34222a2d11bfc179f61 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Tue, 9 Jul 2019 19:05:49 +0900
> Subject: [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled.
> 
> Commit 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts
> around") introduced security_move_mount() LSM hook, but we missed that
> TOMOYO and AppArmor did not implement hooks for checking move_mount(2).
> For pathname based access controls like TOMOYO and AppArmor, unchecked
> mount manipulation is not acceptable. Therefore, until TOMOYO and AppArmor
> implement hooks, in order to avoid unchecked mount manipulation, pretend
> as if move_mount(2) is unavailable when either TOMOYO or AppArmor is
> enabled.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts around")
> Cc: stable@vger.kernel.org # 5.2
> ---
>  include/linux/lsm_hooks.h | 6 ++++++
>  security/apparmor/lsm.c   | 1 +
>  security/tomoyo/tomoyo.c  | 1 +
>  3 files changed, 8 insertions(+)
> 
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 47f58cf..cd411b7 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -2142,4 +2142,10 @@ static inline void security_delete_hooks(struct security_hook_list *hooks,
>  
>  extern int lsm_inode_alloc(struct inode *inode);
>  
> +static inline int no_move_mount(const struct path *from_path,
> +				const struct path *to_path)
> +{
> +	return -ENOSYS;
> +}
> +
>  #endif /* ! __LINUX_LSM_HOOKS_H */
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index ec3a928..5cdf63b 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -1158,6 +1158,7 @@ struct lsm_blob_sizes apparmor_blob_sizes __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(capable, apparmor_capable),
>  
>  	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
> +	LSM_HOOK_INIT(move_mount, no_move_mount),
>  	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
>  	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
>  
> diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
> index 716c92e..be1b1a1 100644
> --- a/security/tomoyo/tomoyo.c
> +++ b/security/tomoyo/tomoyo.c
> @@ -558,6 +558,7 @@ static void tomoyo_task_free(struct task_struct *task)
>  	LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
>  	LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
>  	LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
> +	LSM_HOOK_INIT(move_mount, no_move_mount),
>  	LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
>  	LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),
>  	LSM_HOOK_INIT(socket_bind, tomoyo_socket_bind),
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-22 10:12                 ` Tetsuo Handa
@ 2019-07-23  4:16                   ` John Johansen
  2019-07-23 13:45                     ` Tetsuo Handa
  0 siblings, 1 reply; 33+ messages in thread
From: John Johansen @ 2019-07-23  4:16 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric W. Biederman, Al Viro, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

On 7/22/19 3:12 AM, Tetsuo Handa wrote:
> I did not see AppArmor patch for this problem in 5.3-rc1. 
> John, are you OK with this patch for 5.2-stable and 5.3 ?
> 
yes, I have some larger mount rework and clean-up to do and an apparmor
patch for this is waiting on that. Looking at the thread I am wondering
where my previous reply is, probably lost in a mail client crash, I have
had a few of those lately.

Acked-by: John Johansen <john.johansen@canonical.com>


> On 2019/07/09 19:51, Tetsuo Handa wrote:
>> For now, can we apply this patch for 5.2-stable ?
>>
>>
>> >From dd62fab0592e02580fd5a34222a2d11bfc179f61 Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Tue, 9 Jul 2019 19:05:49 +0900
>> Subject: [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled.
>>
>> Commit 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts
>> around") introduced security_move_mount() LSM hook, but we missed that
>> TOMOYO and AppArmor did not implement hooks for checking move_mount(2).
>> For pathname based access controls like TOMOYO and AppArmor, unchecked
>> mount manipulation is not acceptable. Therefore, until TOMOYO and AppArmor
>> implement hooks, in order to avoid unchecked mount manipulation, pretend
>> as if move_mount(2) is unavailable when either TOMOYO or AppArmor is
>> enabled.
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Fixes: 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts around")
>> Cc: stable@vger.kernel.org # 5.2
>> ---
>>  include/linux/lsm_hooks.h | 6 ++++++
>>  security/apparmor/lsm.c   | 1 +
>>  security/tomoyo/tomoyo.c  | 1 +
>>  3 files changed, 8 insertions(+)
>>
>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>> index 47f58cf..cd411b7 100644
>> --- a/include/linux/lsm_hooks.h
>> +++ b/include/linux/lsm_hooks.h
>> @@ -2142,4 +2142,10 @@ static inline void security_delete_hooks(struct security_hook_list *hooks,
>>  
>>  extern int lsm_inode_alloc(struct inode *inode);
>>  
>> +static inline int no_move_mount(const struct path *from_path,
>> +				const struct path *to_path)
>> +{
>> +	return -ENOSYS;
>> +}
>> +
>>  #endif /* ! __LINUX_LSM_HOOKS_H */
>> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
>> index ec3a928..5cdf63b 100644
>> --- a/security/apparmor/lsm.c
>> +++ b/security/apparmor/lsm.c
>> @@ -1158,6 +1158,7 @@ struct lsm_blob_sizes apparmor_blob_sizes __lsm_ro_after_init = {
>>  	LSM_HOOK_INIT(capable, apparmor_capable),
>>  
>>  	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
>> +	LSM_HOOK_INIT(move_mount, no_move_mount),
>>  	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
>>  	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
>>  
>> diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
>> index 716c92e..be1b1a1 100644
>> --- a/security/tomoyo/tomoyo.c
>> +++ b/security/tomoyo/tomoyo.c
>> @@ -558,6 +558,7 @@ static void tomoyo_task_free(struct task_struct *task)
>>  	LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
>>  	LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
>>  	LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
>> +	LSM_HOOK_INIT(move_mount, no_move_mount),
>>  	LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
>>  	LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),
>>  	LSM_HOOK_INIT(socket_bind, tomoyo_socket_bind),
>>
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-23  4:16                   ` John Johansen
@ 2019-07-23 13:45                     ` Tetsuo Handa
  2019-08-06 10:43                       ` Tetsuo Handa
  0 siblings, 1 reply; 33+ messages in thread
From: Tetsuo Handa @ 2019-07-23 13:45 UTC (permalink / raw)
  To: Al Viro
  Cc: John Johansen, Eric W. Biederman, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

Al, will you send this patch to 5.3-rcX via vfs.git tree?

On 2019/07/23 13:16, John Johansen wrote:
> On 7/22/19 3:12 AM, Tetsuo Handa wrote:
>> I did not see AppArmor patch for this problem in 5.3-rc1. 
>> John, are you OK with this patch for 5.2-stable and 5.3 ?
>>
> yes, I have some larger mount rework and clean-up to do and an apparmor
> patch for this is waiting on that. Looking at the thread I am wondering
> where my previous reply is, probably lost in a mail client crash, I have
> had a few of those lately.
> 
> Acked-by: John Johansen <john.johansen@canonical.com>
> 
> 
>> On 2019/07/09 19:51, Tetsuo Handa wrote:
>>> For now, can we apply this patch for 5.2-stable ?
>>>
>>>
>>> >From dd62fab0592e02580fd5a34222a2d11bfc179f61 Mon Sep 17 00:00:00 2001
>>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>> Date: Tue, 9 Jul 2019 19:05:49 +0900
>>> Subject: [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled.
>>>
>>> Commit 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts
>>> around") introduced security_move_mount() LSM hook, but we missed that
>>> TOMOYO and AppArmor did not implement hooks for checking move_mount(2).
>>> For pathname based access controls like TOMOYO and AppArmor, unchecked
>>> mount manipulation is not acceptable. Therefore, until TOMOYO and AppArmor
>>> implement hooks, in order to avoid unchecked mount manipulation, pretend
>>> as if move_mount(2) is unavailable when either TOMOYO or AppArmor is
>>> enabled.
>>>
>>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>> Fixes: 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts around")
>>> Cc: stable@vger.kernel.org # 5.2
>>> ---
>>>  include/linux/lsm_hooks.h | 6 ++++++
>>>  security/apparmor/lsm.c   | 1 +
>>>  security/tomoyo/tomoyo.c  | 1 +
>>>  3 files changed, 8 insertions(+)
>>>
>>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>>> index 47f58cf..cd411b7 100644
>>> --- a/include/linux/lsm_hooks.h
>>> +++ b/include/linux/lsm_hooks.h
>>> @@ -2142,4 +2142,10 @@ static inline void security_delete_hooks(struct security_hook_list *hooks,
>>>  
>>>  extern int lsm_inode_alloc(struct inode *inode);
>>>  
>>> +static inline int no_move_mount(const struct path *from_path,
>>> +				const struct path *to_path)
>>> +{
>>> +	return -ENOSYS;
>>> +}
>>> +
>>>  #endif /* ! __LINUX_LSM_HOOKS_H */
>>> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
>>> index ec3a928..5cdf63b 100644
>>> --- a/security/apparmor/lsm.c
>>> +++ b/security/apparmor/lsm.c
>>> @@ -1158,6 +1158,7 @@ struct lsm_blob_sizes apparmor_blob_sizes __lsm_ro_after_init = {
>>>  	LSM_HOOK_INIT(capable, apparmor_capable),
>>>  
>>>  	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
>>> +	LSM_HOOK_INIT(move_mount, no_move_mount),
>>>  	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
>>>  	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
>>>  
>>> diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
>>> index 716c92e..be1b1a1 100644
>>> --- a/security/tomoyo/tomoyo.c
>>> +++ b/security/tomoyo/tomoyo.c
>>> @@ -558,6 +558,7 @@ static void tomoyo_task_free(struct task_struct *task)
>>>  	LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
>>>  	LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
>>>  	LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
>>> +	LSM_HOOK_INIT(move_mount, no_move_mount),
>>>  	LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
>>>  	LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),
>>>  	LSM_HOOK_INIT(socket_bind, tomoyo_socket_bind),
>>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-08 20:21           ` Al Viro
  2019-07-09  0:13             ` ebiederm
@ 2019-07-23 21:45             ` James Morris
  2019-07-23 23:30               ` Al Viro
  1 sibling, 1 reply; 33+ messages in thread
From: James Morris @ 2019-07-23 21:45 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric W. Biederman, Tetsuo Handa, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

On Mon, 8 Jul 2019, Al Viro wrote:

> On Mon, Jul 08, 2019 at 07:01:32PM +0100, Al Viro wrote:
> > On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote:
> > 
> > > Al you do realize that the TOCTOU you are talking about comes the system
> > > call API.  TOMOYO can only be faulted for not playing in their own
> > > sandbox and not reaching out and fixing the vfs implementation details.
> 
> PS: the fact that mount(2) has been overloaded to hell and back (including
> MS_MOVE, which goes back to v2.5.0.5) predates the introduction of ->sb_mount()
> and LSM in general (2.5.27).  MS_BIND is 2.4.0-test9pre2.
> 
> In all the years since the introduction of ->sb_mount() I've seen zero
> questions from LSM folks regarding a sane place for those checks.  What I have
> seen was "we want it immediately upon the syscall entry, let the module
> figure out what to do" in reply to several times I tried to tell them "folks,
> it's called in a bad place; you want the checks applied to objects, not to
> raw string arguments".
> 
> As it is, we have easily bypassable checks on mount(2) (by way of ->sb_mount();
> there are other hooks also in the game for remounts and new mounts).

What are your recommendations for placing these checks?

-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-23 21:45             ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around James Morris
@ 2019-07-23 23:30               ` Al Viro
  0 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2019-07-23 23:30 UTC (permalink / raw)
  To: James Morris
  Cc: Eric W. Biederman, Tetsuo Handa, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

On Wed, Jul 24, 2019 at 07:45:07AM +1000, James Morris wrote:
> On Mon, 8 Jul 2019, Al Viro wrote:
> 
> > On Mon, Jul 08, 2019 at 07:01:32PM +0100, Al Viro wrote:
> > > On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote:
> > > 
> > > > Al you do realize that the TOCTOU you are talking about comes the system
> > > > call API.  TOMOYO can only be faulted for not playing in their own
> > > > sandbox and not reaching out and fixing the vfs implementation details.
> > 
> > PS: the fact that mount(2) has been overloaded to hell and back (including
> > MS_MOVE, which goes back to v2.5.0.5) predates the introduction of ->sb_mount()
> > and LSM in general (2.5.27).  MS_BIND is 2.4.0-test9pre2.
> > 
> > In all the years since the introduction of ->sb_mount() I've seen zero
> > questions from LSM folks regarding a sane place for those checks.  What I have
> > seen was "we want it immediately upon the syscall entry, let the module
> > figure out what to do" in reply to several times I tried to tell them "folks,
> > it's called in a bad place; you want the checks applied to objects, not to
> > raw string arguments".
> > 
> > As it is, we have easily bypassable checks on mount(2) (by way of ->sb_mount();
> > there are other hooks also in the game for remounts and new mounts).
> 
> What are your recommendations for placing these checks?

For MS_MOVE: do_move_mount(), after lock_mount(), when the mount tree is stable
and pathnames are resolved.
For MS_BIND: do_loopback(), ditto.
Incidentally, for pivot_root(2) I would also suggest moving that past the
lock_mount(), for the same reasons.
For propagation flag changes: do_change_type(), after namespace_lock().
For per-mount flag changes: do_reconfigure_mnt(), possibly after having
grabbed ->s_umount.
For fs remount: IMO it should be handled purely in ->sb_remount().

For new mount: really depends upon the filesystem type, I'm afraid.  There's
nothing type-independent that can be done - in the best case you can say
"no mounting block filesystems from that device"; note that for NFS the
meaning of the argument is entirely different and you *can* have something
like foo.local.org: as a name of symlink (or directory), so blanket "you can
mount foo.local.org:/srv/nfs/blah" is asking for trouble -
mount -t ext4 foo.local.org:/srv/nfs/blah /mnt can bloody well end up
successfully mounting a very untrusted usb disk.

Note, BTW, that things like cramfs can be given
	* mtd:mtd_device_name
	* mtd<decimal number>
	* block device pathname
The last one needs to be resolved/canonicalized/whatnot.
The other two must *NOT* - there's nothing to stop the
attacker from ln -s /dev/mtdblock0 ./mtd12 and confusing
the fsck out of your LSM (it would assume that we are
trying to get mtd0 when in reality it'll mount mtd12).

The rules really need to be type-dependent; ->sb_set_mnt_opts() has the
state after the fs has been initialized to work with, but anything trying
to stop the attempt to set the damn thing up in the first place (as
current ->sb_mount() would) must be called from the inside of individual
->get_tree()/->mount() instance (or a helper used by such).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around
  2019-07-23 13:45                     ` Tetsuo Handa
@ 2019-08-06 10:43                       ` Tetsuo Handa
  2019-08-09 15:44                         ` [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled Tetsuo Handa
  2019-08-22  3:51                         ` [RFC][PATCH] fix d_absolute_path() interplay with fsmount() Al Viro
  0 siblings, 2 replies; 33+ messages in thread
From: Tetsuo Handa @ 2019-08-06 10:43 UTC (permalink / raw)
  To: Al Viro
  Cc: John Johansen, Eric W. Biederman, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

This 5.2-stable/5.3 patch is stalling. Should I ask different route?

On 2019/07/23 22:45, Tetsuo Handa wrote:
> Al, will you send this patch to 5.3-rcX via vfs.git tree?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled.
  2019-08-06 10:43                       ` Tetsuo Handa
@ 2019-08-09 15:44                         ` Tetsuo Handa
  2019-08-22  3:51                         ` [RFC][PATCH] fix d_absolute_path() interplay with fsmount() Al Viro
  1 sibling, 0 replies; 33+ messages in thread
From: Tetsuo Handa @ 2019-08-09 15:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-security-module, Tetsuo Handa, John Johansen

Commit 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts
around") introduced security_move_mount() LSM hook, but we missed that
TOMOYO and AppArmor did not implement hooks for checking move_mount(2).
For pathname based access controls like TOMOYO and AppArmor, unchecked
mount manipulation is not acceptable. Therefore, until TOMOYO and AppArmor
implement hooks, in order to avoid unchecked mount manipulation, pretend
as if move_mount(2) is unavailable when either TOMOYO or AppArmor is
enabled.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: John Johansen <john.johansen@canonical.com>
Fixes: 2db154b3ea8e14b0 ("vfs: syscall: Add move_mount(2) to move mounts around")
Cc: stable@vger.kernel.org # 5.2
---
 include/linux/lsm_hooks.h | 6 ++++++
 security/apparmor/lsm.c   | 1 +
 security/tomoyo/tomoyo.c  | 1 +
 3 files changed, 8 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 47f58cf..cd411b7 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -2142,4 +2142,10 @@ static inline void security_delete_hooks(struct security_hook_list *hooks,
 
 extern int lsm_inode_alloc(struct inode *inode);
 
+static inline int no_move_mount(const struct path *from_path,
+				const struct path *to_path)
+{
+	return -ENOSYS;
+}
+
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index ec3a928..5cdf63b 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1158,6 +1158,7 @@ struct lsm_blob_sizes apparmor_blob_sizes __lsm_ro_after_init = {
 	LSM_HOOK_INIT(capable, apparmor_capable),
 
 	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
+	LSM_HOOK_INIT(move_mount, no_move_mount),
 	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
 	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
 
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 716c92e..be1b1a1 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -558,6 +558,7 @@ static void tomoyo_task_free(struct task_struct *task)
 	LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
 	LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
 	LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
+	LSM_HOOK_INIT(move_mount, no_move_mount),
 	LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
 	LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),
 	LSM_HOOK_INIT(socket_bind, tomoyo_socket_bind),
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC][PATCH] fix d_absolute_path() interplay with fsmount()
  2019-08-06 10:43                       ` Tetsuo Handa
  2019-08-09 15:44                         ` [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled Tetsuo Handa
@ 2019-08-22  3:51                         ` Al Viro
  2019-08-30 10:11                           ` Tetsuo Handa
  1 sibling, 1 reply; 33+ messages in thread
From: Al Viro @ 2019-08-22  3:51 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: John Johansen, Eric W. Biederman, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

[bringing a private thread back to the lists]

There's a bug in interplay of fsmount() and d_absolute_path().
Namely, the check in d_absolute_path() treats the
not-yet-attached mount as "reached absolute root".
AFAICS, the right fix is this

diff --git a/fs/d_path.c b/fs/d_path.c
index a7d0a96b35ce..0f1fc1743302 100644
--- a/fs/d_path.c
+++ b/fs/d_path.c
@@ -116,8 +116,10 @@ static int prepend_path(const struct path *path,
 				vfsmnt = &mnt->mnt;
 				continue;
 			}
-			if (!error)
-				error = is_mounted(vfsmnt) ? 1 : 2;
+			if (is_mounted(vfsmnt) && !is_anon_ns(mnt->mnt_ns))
+				error = 1;	// absolute root
+			else
+				error = 2;	// detached or not attached yet
 			break;
 		}
 		parent = dentry->d_parent;

but that would slightly change the behaviour in another case.
Namely, nfs4 mount-time temporary namespaces.  There we have
the following: mount -t nfs4 server:/foo/bar/baz /mnt
will
        * set a temporary namespace, matching the mount tree as
exported by server
        * mount the root export there
        * traverse foo/bar/baz in that namespace, triggering
automounts when we cross the filesystem boundaries on server.
        * grab whatever we'd arrived at; that's what we'll
be mounting.
        * dissolve the temp namespace.

If you trigger some LSM hook (e.g. in permission checks on
that pathname traversal) for objects in that temp namespace,
do you want d_absolute_path() to succeed (and give a pathname
relative to server's root export), or should it rather fail?

AFAICS, apparmor and tomoyo are the only things that might
care either way; I would go with "fail, it's not an absolute
path" (and that's what the patch above will end up doing),
but it's really up to you.

It definitely ought to fail for yet-to-be-attached case, though;
it doesn't, and that's a bug that needs to be fixed.  Mea culpa.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH] fix d_absolute_path() interplay with fsmount()
  2019-08-22  3:51                         ` [RFC][PATCH] fix d_absolute_path() interplay with fsmount() Al Viro
@ 2019-08-30 10:11                           ` Tetsuo Handa
  0 siblings, 0 replies; 33+ messages in thread
From: Tetsuo Handa @ 2019-08-30 10:11 UTC (permalink / raw)
  To: Al Viro
  Cc: John Johansen, Eric W. Biederman, David Howells, linux-api,
	linux-fsdevel, torvalds, linux-security-module

On 2019/08/22 12:51, Al Viro wrote:
> [bringing a private thread back to the lists]
> 
> There's a bug in interplay of fsmount() and d_absolute_path().
> Namely, the check in d_absolute_path() treats the
> not-yet-attached mount as "reached absolute root".

I think you can send this patch to linux-next.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, back to index

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-19 17:08 [PATCH 00/10] VFS: Provide new mount UAPI David Howells
2019-02-19 17:08 ` [PATCH 01/10] vfs: syscall: Add open_tree(2) to reference or clone a mount David Howells
2019-02-19 17:08 ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around David Howells
2019-02-20 12:32   ` Alan Jenkins
2019-02-20 12:41     ` Alan Jenkins
2019-02-20 16:23   ` Jann Horn
2019-07-08 12:02   ` Tetsuo Handa
2019-07-08 13:18     ` Al Viro
2019-07-08 17:12       ` ebiederm
2019-07-08 18:01         ` Al Viro
2019-07-08 18:13           ` Al Viro
2019-07-08 20:21           ` Al Viro
2019-07-09  0:13             ` ebiederm
2019-07-09 10:51               ` Tetsuo Handa
2019-07-22 10:12                 ` Tetsuo Handa
2019-07-23  4:16                   ` John Johansen
2019-07-23 13:45                     ` Tetsuo Handa
2019-08-06 10:43                       ` Tetsuo Handa
2019-08-09 15:44                         ` [PATCH] LSM: Disable move_mount() syscall when TOMOYO or AppArmor is enabled Tetsuo Handa
2019-08-22  3:51                         ` [RFC][PATCH] fix d_absolute_path() interplay with fsmount() Al Viro
2019-08-30 10:11                           ` Tetsuo Handa
2019-07-23 21:45             ` [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around James Morris
2019-07-23 23:30               ` Al Viro
2019-02-19 17:08 ` [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE David Howells
2019-02-20 18:59   ` Alan Jenkins
2019-02-26 17:45   ` Alan Jenkins
2019-02-19 17:08 ` [PATCH 04/10] Make anon_inodes unconditional David Howells
2019-02-19 17:09 ` [PATCH 05/10] vfs: syscall: Add fsopen() to prepare for superblock creation David Howells
2019-02-19 17:09 ` [PATCH 06/10] vfs: Implement logging through fs_context David Howells
2019-02-19 17:09 ` [PATCH 07/10] vfs: syscall: Add fsconfig() for configuring and managing a context David Howells
2019-02-19 17:09 ` [PATCH 08/10] vfs: syscall: Add fsmount() to create a mount for a superblock David Howells
2019-02-19 17:09 ` [PATCH 09/10] vfs: syscall: Add fspick() to select a superblock for reconfiguration David Howells
2019-02-19 17:09 ` [PATCH 10/10] vfs: Add a sample program for the new mount API David Howells

Linux-Security-Module Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-security-module/0 linux-security-module/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-security-module linux-security-module/ https://lore.kernel.org/linux-security-module \
		linux-security-module@vger.kernel.org linux-security-module@archiver.kernel.org
	public-inbox-index linux-security-module


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-security-module


AGPL code for this site: git clone https://public-inbox.org/ public-inbox