linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Feature bug with the new mount API: no way of doing read only bind mounts
@ 2019-11-21  0:11 James Bottomley
  2019-11-21  8:10 ` David Howells
  0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2019-11-21  0:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: dhowells, Al Viro, Miklos Szeredi

My understanding is the new mount API will eventually replace the
current mount API, so it needs to be feature for feature compatible
before that happens.  I was looking to use the read only bind mount as
a template for reimplementing shiftfs when I discovered that you can't
actually create a read only bind mount with the new API.  The problem
is that fspick() will only reconfigure the underlying superblock, which
you don't want because you only want the bound subtree to become read
only and open_tree()/move_mount() doesn't give you any facility to add
or change options on the bind.

The way it works with the current API is that you create a bind mount
with MS_BIND and then remount it with MS_BIND|MS_REMOUNT|MS_RDONLY
which makes the subtree go read only.  It looks like the nearest way of
getting this to work with the new API would be to add an additional
fspick() flag, say FSPICK_BIND, that tells it to operate on the subtree
not the superblock and then pass this down to fsconfig() so it will
accept the "ro" and "rw" flags and the reconfigure command and nothing
else (although I'd like it to accept additional flags on down the road
for the shiftfs rewrite).

There are some caveats, in that fspick/fsconfig seem to be fairly
adamantly coded to work with the superblock, so the subtree handling
will have to be special cased, but if everyone's happy with that, I can
code up something.

Regards,

James


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Feature bug with the new mount API: no way of doing read only bind mounts
  2019-11-21  0:11 Feature bug with the new mount API: no way of doing read only bind mounts James Bottomley
@ 2019-11-21  8:10 ` David Howells
  2019-11-21 14:54   ` Christian Brauner
                     ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: David Howells @ 2019-11-21  8:10 UTC (permalink / raw)
  To: James Bottomley, Christian Brauner
  Cc: dhowells, linux-fsdevel, Al Viro, Miklos Szeredi

James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> I was looking to use the read only bind mount as a template for
> reimplementing shiftfs when I discovered that you can't actually create a
> read only bind mount with the new API.  The problem is that fspick() will
> only reconfigure the underlying superblock, which you don't want because you
> only want the bound subtree to become read only and open_tree()/move_mount()
> doesn't give you any facility to add or change options on the bind.

You'd use open_tree() with OPEN_TREE_CLONE and possibly AT_RECURSIVE rather
than fspick().  fspick() is, as you observed, more for reconfiguring the
superblock.

What is missing is a mount_setattr() syscall - something like:

	mount_setattr(int dfd, const char *path, unsigned int at_flags,
		      unsigned int attr_change_mask, unsigned int attrs);

which would allow what you want to be done like:

	fd = open_tree(AT_FDCWD, "/my/source/", OPEN_TREE_CLONE);
	mount_setattr(fd, "", AT_EMPTY_PATH | AT_RECURSIVE,
		      MOUNT_ATTR_RDONLY, MOUNT_ATTR_RDONLY);
	move_mount(fd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

Christian: you said you wanted to have a look at doing this - is that still
your intention?

Al: Is it too late to change OPEN_TREE_CLONE to be a definite value rather
than tying it to O_CLOEXEC?  OPEN_TREE_CLONE shares its space with the
AT_flags, but O_CLOEXEC has one of three different values, depending on arch.
Perhaps we can nail it to 02000000 (0x80000), which is what's used on
everything but alpha, mips, parisc and sparc.

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Feature bug with the new mount API: no way of doing read only bind mounts
  2019-11-21  8:10 ` David Howells
@ 2019-11-21 14:54   ` Christian Brauner
  2019-11-21 15:00   ` James Bottomley
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2019-11-21 14:54 UTC (permalink / raw)
  To: David Howells
  Cc: James Bottomley, Christian Brauner, linux-fsdevel, Al Viro,
	Miklos Szeredi

On Thu, Nov 21, 2019 at 08:10:39AM +0000, David Howells wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > I was looking to use the read only bind mount as a template for
> > reimplementing shiftfs when I discovered that you can't actually create a
> > read only bind mount with the new API.  The problem is that fspick() will
> > only reconfigure the underlying superblock, which you don't want because you
> > only want the bound subtree to become read only and open_tree()/move_mount()
> > doesn't give you any facility to add or change options on the bind.
> 
> You'd use open_tree() with OPEN_TREE_CLONE and possibly AT_RECURSIVE rather
> than fspick().  fspick() is, as you observed, more for reconfiguring the
> superblock.
> 
> What is missing is a mount_setattr() syscall - something like:
> 
> 	mount_setattr(int dfd, const char *path, unsigned int at_flags,
> 		      unsigned int attr_change_mask, unsigned int attrs);
> 
> which would allow what you want to be done like:
> 
> 	fd = open_tree(AT_FDCWD, "/my/source/", OPEN_TREE_CLONE);
> 	mount_setattr(fd, "", AT_EMPTY_PATH | AT_RECURSIVE,
> 		      MOUNT_ATTR_RDONLY, MOUNT_ATTR_RDONLY);
> 	move_mount(fd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> 
> Christian: you said you wanted to have a look at doing this - is that still
> your intention?

Yes, it is. I can't put an exact time-frame on this rn.
Also, I thought we've agreed a while back that the flags would move into
a struct since mount is gaining flags quickly too. :)

Christian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Feature bug with the new mount API: no way of doing read only bind mounts
  2019-11-21  8:10 ` David Howells
  2019-11-21 14:54   ` Christian Brauner
@ 2019-11-21 15:00   ` James Bottomley
  2019-11-21 15:04   ` David Howells
  2019-11-21 16:15   ` James Bottomley
  3 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-21 15:00 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

On Thu, 2019-11-21 at 08:10 +0000, David Howells wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > I was looking to use the read only bind mount as a template for
> > reimplementing shiftfs when I discovered that you can't actually
> > create a read only bind mount with the new API.  The problem is
> > that fspick() will only reconfigure the underlying superblock,
> > which you don't want because you only want the bound subtree to
> > become read only and open_tree()/move_mount() doesn't give you any
> > facility to add or change options on the bind.
> 
> You'd use open_tree() with OPEN_TREE_CLONE and possibly AT_RECURSIVE
> rather than fspick().  fspick() is, as you observed, more for
> reconfiguring the superblock.
> 
> What is missing is a mount_setattr() syscall - something like:
> 
> 	mount_setattr(int dfd, const char *path, unsigned int at_flags,
> 		      unsigned int attr_change_mask, unsigned int
> attrs);
> 
> which would allow what you want to be done like:
> 
> 	fd = open_tree(AT_FDCWD, "/my/source/", OPEN_TREE_CLONE);
> 	mount_setattr(fd, "", AT_EMPTY_PATH | AT_RECURSIVE,
> 		      MOUNT_ATTR_RDONLY, MOUNT_ATTR_RDONLY);
> 	move_mount(fd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

That would work for read only, which is a simple mount attribute flag,
but wouldn't work for anything more complex like shiftfs which wants
more complex parameters.  What I'm looking for in that case is the
ability to reconfigure the bind mount in the same way you would
reconfigure the superblock.

James


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Feature bug with the new mount API: no way of doing read only bind mounts
  2019-11-21  8:10 ` David Howells
  2019-11-21 14:54   ` Christian Brauner
  2019-11-21 15:00   ` James Bottomley
@ 2019-11-21 15:04   ` David Howells
  2019-11-21 16:15   ` James Bottomley
  3 siblings, 0 replies; 13+ messages in thread
From: David Howells @ 2019-11-21 15:04 UTC (permalink / raw)
  To: Christian Brauner
  Cc: dhowells, James Bottomley, Christian Brauner, linux-fsdevel,
	Al Viro, Miklos Szeredi

Christian Brauner <christian.brauner@ubuntu.com> wrote:

> Also, I thought we've agreed a while back that the flags would move into
> a struct since mount is gaining flags quickly too. :)

I think we did agree that.  What I just wrote in the email was more of a
tentative illustration.

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Feature bug with the new mount API: no way of doing read only bind mounts
  2019-11-21  8:10 ` David Howells
                     ` (2 preceding siblings ...)
  2019-11-21 15:04   ` David Howells
@ 2019-11-21 16:15   ` James Bottomley
  2019-11-27 20:32     ` James Bottomley
  3 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2019-11-21 16:15 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

On Thu, 2019-11-21 at 08:10 +0000, David Howells wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > I was looking to use the read only bind mount as a template for
> > reimplementing shiftfs when I discovered that you can't actually
> > create a read only bind mount with the new API.  The problem is
> > that fspick() will only reconfigure the underlying superblock,
> > which you don't want because you only want the bound subtree to
> > become read only and open_tree()/move_mount() doesn't give you any
> > facility to add or change options on the bind.
> 
> You'd use open_tree() with OPEN_TREE_CLONE and possibly AT_RECURSIVE
> rather than fspick().  fspick() is, as you observed, more for
> reconfiguring the superblock.

In the abstract, I think the concept of a configuration file descriptor
with the, open add parameters and execution to fd, and optionally
convert to representation or reconfigure in place is a very generic
one.  If we did agree to do that for bind mounts as well, I wouldn't
overload the existing logic, I'd lift it up to the generic level,
probably by hooking the execution parts, and make superblock and bind
two implementations of it.  It would basically be 3 system calls:
configopen, configparam and configconvert although obviously with more
appealing names.

The reason for thinking like this is I can see it having utility in
some of the more complex SCSI configuration operations we do today via
a bunch of mechanisms including configfs that could more compactly be
done by this file descriptor mechanism.

I'd also note that this plethora of system calls you have could then go
away: fspick itself would just become an open type to which the path
file descriptor would then be a required parameter, as would open_tree
and the missing mount_setattr would then just work.

James


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Feature bug with the new mount API: no way of doing read only bind mounts
  2019-11-21 16:15   ` James Bottomley
@ 2019-11-27 20:32     ` James Bottomley
  2019-11-27 20:33       ` [RFC 1/6] logger: add a limited buffer logging facility James Bottomley
                         ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-27 20:32 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

On Thu, 2019-11-21 at 08:15 -0800, James Bottomley wrote:
> On Thu, 2019-11-21 at 08:10 +0000, David Howells wrote:
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > 
> > > I was looking to use the read only bind mount as a template for
> > > reimplementing shiftfs when I discovered that you can't actually
> > > create a read only bind mount with the new API.  The problem is
> > > that fspick() will only reconfigure the underlying superblock,
> > > which you don't want because you only want the bound subtree to
> > > become read only and open_tree()/move_mount() doesn't give you
> > > any facility to add or change options on the bind.
> > 
> > You'd use open_tree() with OPEN_TREE_CLONE and possibly
> > AT_RECURSIVE rather than fspick().  fspick() is, as you observed,
> > more for reconfiguring the superblock.
> 
> In the abstract, I think the concept of a configuration file
> descriptor with the, open add parameters and execution to fd, and
> optionally convert to representation or reconfigure in place is a
> very generic one.  If we did agree to do that for bind mounts as
> well, I wouldn't overload the existing logic, I'd lift it up to the
> generic level, probably by hooking the execution parts, and make
> superblock and bind two implementations of it.  It would basically be
> 3 system calls: configopen, configparam and configconvert although
> obviously with more appealing names.
> 
> The reason for thinking like this is I can see it having utility in
> some of the more complex SCSI configuration operations we do today
> via a bunch of mechanisms including configfs that could more
> compactly be done by this file descriptor mechanism.
> 
> I'd also note that this plethora of system calls you have could then
> go away: fspick itself would just become an open type to which the
> path file descriptor would then be a required parameter, as would
> open_tree and the missing mount_setattr would then just work.

Well, that suggestion got crickets, so let me show you how it would
look.  I actually pulled the logger out into lib/ because that's not
filesystem dependent and implemented a generic configfd handler for
doing configuration via a file descriptor.  Configfd is completely
separated from the mechanics of filesystem reconfiguration, so it can
be used as a configuration mechanism for anything.

Once configuration file descriptor mechanics are made generic, you can
get away with two operations for everything, thus giving you the
ability to eliminate fsopen, fsmount, fsconfigure, fspick and
open_tree.  Note, I didn't do that: I just reimplemented them via
internal configfd calls instead, but it is a possibility.  Note also
I've now got the ability to do a ro bind mount without any new system
calls and I now have a scaffold on which I could successfully hang
shiftfs as a propertied bind mount.

The final patch introduces a bind configfd type that can be used both
in place of open_tree and to implement the ro bind mount as a bind
mount reconfigure operation.

James

---

James Bottomley (6):
  logger: add a limited buffer logging facility
  configfd: add generic file descriptor based configuration parser
  configfd: syscall: wire up configfd syscalls
  fs: implement fsconfig via configfd
  fs: expose internal interfaces open_detached_copy and
    do_reconfigure_mount
  fs: bind: add configfs type for bind mounts

 arch/alpha/kernel/syscalls/syscall.tbl      |   2 +
 arch/arm/tools/syscall.tbl                  |   2 +
 arch/arm64/include/asm/unistd.h             |   2 +-
 arch/arm64/include/asm/unistd32.h           |   4 +
 arch/ia64/kernel/syscalls/syscall.tbl       |   2 +
 arch/m68k/kernel/syscalls/syscall.tbl       |   2 +
 arch/microblaze/kernel/syscalls/syscall.tbl |   2 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   2 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   2 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   2 +
 arch/parisc/kernel/syscalls/syscall.tbl     |   2 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |   2 +
 arch/s390/kernel/syscalls/syscall.tbl       |   2 +
 arch/sh/kernel/syscalls/syscall.tbl         |   2 +
 arch/sparc/kernel/syscalls/syscall.tbl      |   2 +
 arch/x86/entry/syscalls/syscall_32.tbl      |   2 +
 arch/x86/entry/syscalls/syscall_64.tbl      |   2 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |   2 +
 fs/Makefile                                 |   3 +-
 fs/bind.c                                   | 193 ++++++++++
 fs/configfd.c                               | 450 +++++++++++++++++++++++
 fs/filesystems.c                            |   8 +-
 fs/fs_context.c                             | 124 +------
 fs/fsopen.c                                 | 535 ++++++++++++++--------------
 fs/internal.h                               |   7 +
 fs/namespace.c                              | 115 +++---
 include/linux/configfd.h                    |  61 ++++
 include/linux/fs.h                          |   2 +
 include/linux/fs_context.h                  |  23 +-
 include/linux/fs_parser.h                   |   2 +
 include/linux/logger.h                      |  34 ++
 include/linux/syscalls.h                    |   5 +
 include/uapi/asm-generic/unistd.h           |   7 +-
 include/uapi/linux/configfd.h               |  20 ++
 lib/Makefile                                |   3 +-
 lib/logger.c                                | 211 +++++++++++
 36 files changed, 1368 insertions(+), 473 deletions(-)
 create mode 100644 fs/bind.c
 create mode 100644 fs/configfd.c
 create mode 100644 include/linux/configfd.h
 create mode 100644 include/linux/logger.h
 create mode 100644 include/uapi/linux/configfd.h
 create mode 100644 lib/logger.c

-- 
2.16.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 1/6] logger: add a limited buffer logging facility
  2019-11-27 20:32     ` James Bottomley
@ 2019-11-27 20:33       ` James Bottomley
  2019-11-27 20:34       ` [RFC 2/6] configfd: add generic file descriptor based configuration parser James Bottomley
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-27 20:33 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

Can be used by anything that wants to add to a list of messages and
then spit them back at a particular time.  The current use case for
this is the filesystem configuration descriptor, which can accumulate
messages about config options which can then be read from the file
descriptor later via a .read operation.

This code was based on fc_log originally written by David Howells in
commit e7582e16a170db4c85995c1c03d194ea1ea621fc "vfs: Implement
logging through fs_context".

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 include/linux/logger.h |  34 ++++++++
 lib/Makefile           |   3 +-
 lib/logger.c           | 211 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 247 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/logger.h
 create mode 100644 lib/logger.c

diff --git a/include/linux/logger.h b/include/linux/logger.h
new file mode 100644
index 000000000000..cf7a87f1d8d2
--- /dev/null
+++ b/include/linux/logger.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_LOGGER_H
+#define _LINUX_LOGGER_H
+
+#include <stdarg.h>
+
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/refcount.h>
+#include <linux/types.h>
+
+struct logger {
+	refcount_t	usage;
+	struct mutex	mutex;
+	u8		head;		/* Insertion index in buffer[] */
+	u8		tail;		/* Removal index in buffer[] */
+	u8		need_free;	/* Mask of kfree'able items in buffer[] */
+	struct module	*owner;		/* Owner module for strings that don't then need freeing */
+	const char	*buffer[8];
+};
+
+extern __printf(2, 3)
+void logger_log(struct logger *, const char *fmt, ...);
+extern __printf(2, 0)
+void logger_valog(struct logger *, const char *fmt, va_list va);
+ssize_t logger_read(struct logger *log, char __user *_buf,  size_t len);
+struct logger *logger_alloc(struct module *owner);
+void logger_put(struct logger **logp);
+void logger_get(struct logger *log);
+
+#define logger_err(log, fmt, ...) ({ logger_log(log, "e "fmt, ## __VA_ARGS__); })
+#define logger_warn(log, fmt, ...) ({ logger_log(log, "w "fmt, ## __VA_ARGS__); })
+#define logger_info(log, fmt, ...) ({ logger_log(log, "i "fmt, ## __VA_ARGS__); })
+
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index c5892807e06f..57a90d1eed29 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -31,7 +31,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
-	 nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o
+	 nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o \
+	 logger.o
 
 lib-$(CONFIG_PRINTK) += dump_stack.o
 lib-$(CONFIG_MMU) += ioremap.o
diff --git a/lib/logger.c b/lib/logger.c
new file mode 100644
index 000000000000..5b8655959c77
--- /dev/null
+++ b/lib/logger.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Provide a logging function
+ *
+ * code is originally based on fc_log by David Howells and rewritten
+ * by James Bottomley.
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2019 James.Bottomley@HansenPartnership.com
+ */
+
+#include <linux/logger.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <asm/sections.h>
+
+/**
+ * logger_valog - Log a message to a logger context
+ * @log: The logger context
+ * @fmt: The format of the log entry.
+ * @va: the variable arguments pointer
+ *
+ * Currently @fmt is expected to have a letter prefix of w or e
+ * followed by a space to signal warning or error.  If @log is NULL
+ * then the message will be logged to the kernel via printk.
+ *
+ */
+void logger_valog(struct logger *log, const char *fmt, va_list va)
+{
+	static const char store_failure[] = "OOM: Can't store error string";
+	const char *p;
+	const char *q;
+	u8 freeable = 1;
+
+	if (!strchr(fmt, '%')) {
+		p = fmt;
+		goto unformatted_string;
+	}
+	if (strcmp(fmt, "%s") == 0) {
+		p = va_arg(va, const char *);
+
+	unformatted_string:
+		if (((unsigned long)p >= (unsigned long)__start_rodata &&
+		     (unsigned long)p <  (unsigned long)__end_rodata) ||
+		    (log && within_module_core((unsigned long)p, log->owner))) {
+			q = (char *)p;
+			freeable = 0;
+		} else {
+			q = kstrdup(p, GFP_KERNEL);
+		}
+	} else {
+		q = kvasprintf(GFP_KERNEL, fmt, va);
+	}
+
+	if (!q) {
+		/* this won't be the right format, but better than losing it */
+		vprintk(fmt, va);
+		q = store_failure;
+	}
+
+	if (!log) {
+		switch (fmt[0]) {
+		case 'w':
+			printk(KERN_WARNING "%s\n", q + 2);
+			break;
+		case 'e':
+			printk(KERN_ERR "%s\n", q + 2);
+			break;
+		default:
+			printk(KERN_NOTICE "%s\n", q + 2);
+			break;
+		}
+		if (freeable)
+			kfree(q);
+	} else {
+		unsigned int logsize = ARRAY_SIZE(log->buffer);
+		u8 index;
+
+		index = log->head & (logsize - 1);
+		BUILD_BUG_ON(sizeof(log->head) != sizeof(u8) ||
+			     sizeof(log->tail) != sizeof(u8));
+		if ((u8)(log->head - log->tail) == logsize) {
+			/* The buffer is full, discard the oldest message */
+			if (log->need_free & (1 << index))
+				kfree(log->buffer[index]);
+			log->tail++;
+		}
+
+		log->buffer[index] = q;
+		log->need_free &= ~(1 << index);
+		log->need_free |= freeable << index;
+		log->head++;
+	}
+}
+EXPORT_SYMBOL(logger_valog);
+
+/**
+ * logger_log - Log a message to a logger context
+ * @log: The logger context
+ * @fmt: The format of the log entry.
+ */
+void logger_log(struct logger *log, const char *fmt, ...)
+{
+	va_list va;
+
+	va_start(va, fmt);
+	logger_valog(log, fmt, va);
+	va_end(va);
+}
+EXPORT_SYMBOL(logger_log);
+
+/**
+ * logger_read - read back a log at an arbitrary position
+ * @logger: the logger context
+ * @buf: the buffer to read to (may be user space pointer)
+ * @len: the length of data to read
+ *
+ * Allow the user to read back any error, warning or informational
+ * messages. This is designed to be wrapped for a file descriptor read.
+ */
+ssize_t logger_read(struct logger *log, char __user *_buf,  size_t len)
+{
+	unsigned int logsize = ARRAY_SIZE(log->buffer);
+	ssize_t ret;
+	const char *p;
+	bool need_free;
+	int index, n;
+
+	ret = mutex_lock_interruptible(&log->mutex);
+	if (ret < 0)
+		return ret;
+
+	if (log->head == log->tail) {
+		mutex_unlock(&log->mutex);
+		return -ENODATA;
+	}
+
+	index = log->tail & (logsize - 1);
+	p = log->buffer[index];
+	need_free = log->need_free & (1 << index);
+	log->buffer[index] = NULL;
+	log->need_free &= ~(1 << index);
+	log->tail++;
+	mutex_unlock(&log->mutex);
+
+	ret = -EMSGSIZE;
+	n = strlen(p);
+	if (n > len)
+		goto err_free;
+	ret = -EFAULT;
+	if (copy_to_user(_buf, p, n) != 0)
+		goto err_free;
+	ret = n;
+
+err_free:
+	if (need_free)
+		kfree(p);
+	return ret;
+}
+EXPORT_SYMBOL(logger_read);
+
+struct logger *logger_alloc(struct module *owner)
+{
+	struct logger *log = kzalloc(sizeof(*log), GFP_KERNEL);
+
+	if (!log)
+		return ERR_PTR(-ENOMEM);
+
+	refcount_set(&log->usage, 1);
+	mutex_init(&log->mutex);
+	log->owner = owner;
+
+	return log;
+}
+EXPORT_SYMBOL(logger_alloc);
+
+/**
+ * logger_put - decrement refcount and free log if zero
+ * @logp: pointer to logger context to be freed
+ *
+ * if the logger is actually freed, then the @logp will be NULLed.
+ */
+void logger_put(struct logger **logp)
+{
+	int i;
+	struct logger *log;
+
+	if (!logp || !*logp)
+		return;
+
+	log = *logp;
+	if (refcount_dec_and_test(&log->usage)) {
+		*logp = NULL;
+		for (i = 0; i <= 7; i++)
+			if (log->need_free & (1 << i))
+				kfree(log->buffer[i]);
+		kfree(log);
+	}
+}
+EXPORT_SYMBOL(logger_put);
+
+/**
+ * logger_get - get a reference to a logger context
+ * @log: the logger context
+ */
+void logger_get(struct logger *log)
+{
+	if (!log)
+		return;
+	refcount_inc(&log->usage);
+}
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 2/6] configfd: add generic file descriptor based configuration parser
  2019-11-27 20:32     ` James Bottomley
  2019-11-27 20:33       ` [RFC 1/6] logger: add a limited buffer logging facility James Bottomley
@ 2019-11-27 20:34       ` James Bottomley
  2019-11-27 20:35       ` [RFC 3/6] configfd: syscall: wire up configfd syscalls James Bottomley
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-27 20:34 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

This code is based on the original filesystem context based
configuration parser by David Howells, but lifted up so it can apply
to anything rather than only filesystem contexts.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 fs/Makefile                   |   3 +-
 fs/configfd.c                 | 450 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/configfd.h      |  61 ++++++
 include/uapi/linux/configfd.h |  20 ++
 4 files changed, 533 insertions(+), 1 deletion(-)
 create mode 100644 fs/configfd.c
 create mode 100644 include/linux/configfd.h
 create mode 100644 include/uapi/linux/configfd.h

diff --git a/fs/Makefile b/fs/Makefile
index 14231b4cf383..569563f6c0d5 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -13,7 +13,8 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o splice.o sync.o utimes.o d_path.o \
 		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
-		fs_types.o fs_context.o fs_parser.o fsopen.o
+		fs_types.o fs_context.o fs_parser.o fsopen.o \
+		configfd.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
diff --git a/fs/configfd.c b/fs/configfd.c
new file mode 100644
index 000000000000..7f2c750ac7e3
--- /dev/null
+++ b/fs/configfd.c
@@ -0,0 +1,450 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Generic configuration file descriptor handling
+ *
+ * Copyright (C) 2019 James.Bottomley@HansenPartnership.com
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/configfd.h>
+#include <linux/namei.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/rwlock.h>
+#include <linux/uaccess.h>
+
+
+static struct configfd_type *configfds;
+static DEFINE_RWLOCK(configfds_lock);
+
+static ssize_t configfd_read(struct file *file,
+			     char __user *buf, size_t len, loff_t *pos)
+{
+	struct configfd_context *cfc = file->private_data;
+
+	return logger_read(cfc->log, buf, len);
+}
+
+static void configfd_type_put(const struct configfd_type *cft)
+{
+	module_put(cft->owner);
+}
+
+static void configfd_context_free(struct configfd_context *cfc)
+{
+	if (cfc->cft->ops->free)
+		cfc->cft->ops->free(cfc);
+	logger_put(&cfc->log);
+	configfd_type_put(cfc->cft);
+	kfree(cfc);
+}
+
+static int configfd_release(struct inode *inode, struct file *file)
+{
+	struct configfd_context *cfc = file->private_data;
+
+	if (cfc) {
+		file->private_data = NULL;
+		configfd_context_free(cfc);
+	}
+	return 0;
+}
+
+const struct file_operations configfd_context_fops = {
+	.read		= configfd_read,
+	.release	= configfd_release,
+	.llseek		= no_llseek,
+};
+
+static int configfd_create_fd(struct configfd_context *cfc,
+			      unsigned int flags)
+{
+	int fd;
+
+	fd = anon_inode_getfd("[configfd]", &configfd_context_fops, cfc,
+			      O_RDWR | flags);
+	if (fd < 0)
+		configfd_context_free(cfc);
+	return fd;
+}
+
+static struct configfd_type **configfd_type_find(const char *name)
+{
+	struct configfd_type **c;
+
+	for (c = &configfds; *c; c = &(*c)->next) {
+		if (strcmp((*c)->name, name) == 0)
+			break;
+	}
+
+	return c;
+}
+
+static struct configfd_type *configfd_type_get(const char *name)
+{
+	struct configfd_type *cft;
+
+	read_lock(&configfds_lock);
+	cft = *(configfd_type_find(name));
+	if (cft && !try_module_get(cft->owner))
+		cft = NULL;
+	read_unlock(&configfds_lock);
+
+	return cft;
+}
+
+struct configfd_context *configfd_context_alloc(const struct configfd_type *cft,
+						unsigned int op)
+{
+	struct configfd_context *cfc;
+	struct logger *log;
+	int ret;
+
+	cfc = kzalloc(sizeof(*cfc) + cft->data_size, GFP_KERNEL);
+	if (!cfc)
+		return ERR_PTR(-ENOMEM);
+
+	if (cft->data_size)
+		cfc->data = &cfc[1];
+
+	cfc->cft = cft;
+	cfc->op = op;
+	log = logger_alloc(cft->owner);
+	if (IS_ERR(log)) {
+		ret = PTR_ERR(log);
+		goto out_free;
+	}
+
+	cfc->log = log;
+	if (cft->ops->alloc) {
+		ret = cft->ops->alloc(cfc);
+		if (ret)
+			goto out_put;
+	}
+
+	return cfc;
+ out_put:
+	logger_put(&cfc->log);
+ out_free:
+	kfree(cfc);
+	return ERR_PTR(ret);
+}
+
+int kern_configfd_open(const char *config_name, unsigned int flags,
+			unsigned int op)
+{
+	const struct configfd_type *cft;
+	struct configfd_context *cfc;
+
+	if (flags & ~O_CLOEXEC)
+		return -EINVAL;
+	if (op != CONFIGFD_CMD_CREATE && op != CONFIGFD_CMD_RECONFIGURE)
+		return -EINVAL;
+
+	cft = configfd_type_get(config_name);
+	if (!cft)
+		return -ENODEV;
+	cfc = configfd_context_alloc(cft, op);
+	if (IS_ERR(cfc)) {
+		configfd_type_put(cft);
+		return PTR_ERR(cfc);
+	}
+
+	return configfd_create_fd(cfc, flags);
+}
+
+long ksys_configfd_open(const char __user *_config_name, unsigned int flags,
+			unsigned int op)
+{
+	const char *config_name = strndup_user(_config_name, PAGE_SIZE);
+	int ret;
+
+	if (IS_ERR(config_name))
+		return PTR_ERR(config_name);
+	ret = kern_configfd_open(config_name, flags, op);
+	kfree(config_name);
+
+	return ret;
+}
+
+SYSCALL_DEFINE3(configfd_open,
+		const char __user *, _config_name,
+		unsigned int, flags,
+		unsigned int, op)
+{
+	return ksys_configfd_open(_config_name, flags, op);
+}
+
+int kern_configfd_action(int fd, struct configfd_param *p)
+{
+	struct fd f = fdget(fd);
+	struct configfd_context *cfc;
+	const struct configfd_ops *ops;
+	int ret = -EINVAL;
+	/* upper 24 bits are available to consumers */
+	u8 our_cmd = p->cmd & 0xff;
+
+	if (!f.file)
+		return -EBADF;
+	if (f.file->f_op != &configfd_context_fops)
+		goto out_f;
+	cfc = f.file->private_data;
+
+	ops = cfc->cft->ops;
+
+	/* check allowability */
+	ret = -EOPNOTSUPP;
+	switch (our_cmd) {
+	case CONFIGFD_SET_FLAG:
+	case CONFIGFD_SET_STRING:
+	case CONFIGFD_SET_BINARY:
+	case CONFIGFD_SET_PATH:
+	case CONFIGFD_SET_PATH_EMPTY:
+	case CONFIGFD_SET_INT:
+		if (!ops->set)
+			goto out_f;
+		break;
+	case CONFIGFD_GET_FD:
+		if (!ops->get)
+			goto out_f;
+		break;
+	case CONFIGFD_CMD_CREATE:
+	case CONFIGFD_CMD_RECONFIGURE:
+		if (our_cmd != cfc->op) {
+			logger_err(cfc->log, "Wrong operation, we were opened for %d", cfc->op);
+			goto out_f;
+		}
+		if (!ops->act)
+			goto out_f;
+		break;
+	default:
+		break;
+	}
+
+	/*
+	 * Execute
+	 */
+	switch (our_cmd) {
+	case CONFIGFD_SET_STRING:
+	case CONFIGFD_SET_BINARY:
+	case CONFIGFD_SET_PATH:
+	case CONFIGFD_SET_PATH_EMPTY:
+	case CONFIGFD_SET_FD:
+	case CONFIGFD_SET_FLAG:
+	case CONFIGFD_SET_INT:
+		ret = ops->set(cfc, p);
+		break;
+	case CONFIGFD_GET_FD:
+		ret = ops->get(cfc, p);
+		if (ret == 0) {
+			int fd;
+
+			fd = get_unused_fd_flags(p->aux & O_CLOEXEC);
+			if (fd < 0) {
+				ret = fd;
+				break;
+			}
+			fd_install(fd, p->file);
+			p->file = NULL; /* consume the file */
+			p->aux = fd;
+		}
+		break;
+	case CONFIGFD_CMD_RECONFIGURE:
+	case CONFIGFD_CMD_CREATE:
+		ret = ops->act(cfc, p->cmd);
+		break;
+	default:
+		break;
+	}
+out_f:
+	fdput(f);
+	return ret;
+}
+
+long ksys_configfd_action(int fd, unsigned int cmd, const char __user *_key,
+			  void __user *_value, int aux)
+{
+	struct configfd_param param = {
+		.cmd = cmd,
+	};
+	u8 our_cmd = cmd & 0xff;
+	int ret;
+
+	/* check parameters required for action */
+	switch (our_cmd) {
+	case CONFIGFD_SET_FLAG:
+		if (!_key || _value || aux)
+			return -EINVAL;
+		break;
+	case CONFIGFD_SET_STRING:
+	case CONFIGFD_GET_FD:
+		if (!_key || !_value)
+			return -EINVAL;
+		break;
+	case CONFIGFD_SET_BINARY:
+		if (!_key || !_value || aux <= 0 || aux > 1024 * 1024)
+			return -EINVAL;
+		break;
+	case CONFIGFD_SET_PATH:
+	case CONFIGFD_SET_PATH_EMPTY:
+		if (!_key || !_value || (aux != AT_FDCWD && aux < 0))
+			return -EINVAL;
+		break;
+	case CONFIGFD_SET_FD:
+		if (!_key || _value || aux < 0)
+			return -EINVAL;
+		break;
+	case CONFIGFD_SET_INT:
+		if (!_key)
+			return -EINVAL;
+		break;
+	case CONFIGFD_CMD_CREATE:
+	case CONFIGFD_CMD_RECONFIGURE:
+		if (_key || _value || aux)
+			return -EINVAL;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	if (_key) {
+		param.key = strndup_user(_key, 256);
+		if (IS_ERR(param.key))
+			return PTR_ERR(param.key);
+	}
+
+	/* now pull the parameters into the kernel */
+	switch (our_cmd) {
+	case CONFIGFD_SET_STRING:
+		param.string = strndup_user(_value, 256);
+		if (IS_ERR(param.string)) {
+			ret = PTR_ERR(param.string);
+			goto out_key;
+		}
+		param.size = strlen(param.string);
+		break;
+	case CONFIGFD_SET_BINARY:
+		param.size = aux;
+		param.blob = memdup_user_nul(_value, aux);
+		if (IS_ERR(param.blob)) {
+			ret = PTR_ERR(param.blob);
+			goto out_key;
+		}
+		break;
+	case CONFIGFD_SET_PATH:
+		param.name = getname_flags(_value, 0, NULL);
+		if (IS_ERR(param.name)) {
+			ret = PTR_ERR(param.name);
+			goto out_key;
+		}
+		param.aux = aux;
+		param.size = strlen(param.name->name);
+		break;
+	case CONFIGFD_SET_PATH_EMPTY:
+		param.name = getname_flags(_value, LOOKUP_EMPTY, NULL);
+		if (IS_ERR(param.name)) {
+			ret = PTR_ERR(param.name);
+			goto out_key;
+		}
+		param.aux = aux;
+		param.size = strlen(param.name->name);
+		break;
+	case CONFIGFD_SET_FD:
+		ret = -EBADF;
+		param.file = fget_raw(aux);
+		if (!param.file)
+			goto out_key;
+		break;
+	case CONFIGFD_SET_INT:
+		param.aux = aux;
+		break;
+	default:
+		break;
+	}
+	ret = kern_configfd_action(fd, &param);
+	/* clean up unconsumed parameters */
+	switch (our_cmd) {
+	case CONFIGFD_SET_STRING:
+	case CONFIGFD_SET_BINARY:
+		kfree(param.string);
+		break;
+	case CONFIGFD_SET_PATH:
+	case CONFIGFD_SET_PATH_EMPTY:
+		if (param.name)
+			putname(param.name);
+		break;
+	case CONFIGFD_GET_FD:
+		if (!ret)
+			ret = put_user(param.aux, (int __user *)_value);
+		/* FALL THROUGH */
+	case CONFIGFD_SET_FD:
+		if (param.file)
+			fput(param.file);
+		break;
+	default:
+		break;
+	}
+ out_key:
+	kfree(param.key);
+
+	return ret;
+}
+
+
+SYSCALL_DEFINE5(configfd_action,
+		int, fd, unsigned int, cmd,
+		const char __user *, _key,
+		void __user *, _value,
+		int, aux)
+{
+	return ksys_configfd_action(fd, cmd, _key, _value, aux);
+}
+
+int configfd_type_register(struct configfd_type *cft)
+{
+	int ret = 0;
+	struct configfd_type **c;
+
+	if (WARN(cft->next,
+		 "BUG: registering already registered configfd_type: %s",
+		 cft->name))
+		return -EBUSY;
+
+	if (WARN(cft->ops == NULL,
+		 "BUG: configfd_type has no ops set: %s", cft->name))
+		return -EINVAL;
+
+	if (WARN(cft->ops->alloc && (!cft->ops->free),
+		 "BUG: if configfd ops alloc is set, free must also be"))
+		return -EINVAL;
+
+	if (WARN(cft->name == NULL || cft->name[0] == '\0',
+		 "BUG: configfd_type has no name"))
+		return -EINVAL;
+
+	write_lock(&configfds_lock);
+	c = configfd_type_find(cft->name);
+	if (WARN(*c, "BUG: configfd_type name already exists: %s",
+		 cft->name))
+		ret = -EBUSY;
+	else
+		*c = cft;
+	write_unlock(&configfds_lock);
+
+	return ret;
+}
+
+void configfd_type_unregister(struct configfd_type *cft)
+{
+	struct configfd_type **c;
+
+	write_lock(&configfds_lock);
+	c = configfd_type_find(cft->name);
+	if (WARN(*c != cft, "BUG: trying to register %s from wrong structure",
+		 cft->name))
+		goto out;
+	*c = cft->next;
+	cft->next = NULL;
+ out:
+	write_unlock(&configfds_lock);
+}
diff --git a/include/linux/configfd.h b/include/linux/configfd.h
new file mode 100644
index 000000000000..2a2efd7eabe2
--- /dev/null
+++ b/include/linux/configfd.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef _LINUX_CONFIGFD_H
+#define _LINUX_CONFIGFD_H
+
+#include <linux/logger.h>
+
+#include <uapi/linux/configfd.h>
+
+struct configfd_context;
+
+struct configfd_param {
+	const char			*key;
+	union {
+		char			*string;
+		void			*blob;
+		struct filename		*name;
+		struct file		*file;
+	};
+	int				aux;
+	unsigned int			cmd;
+	size_t				size;
+};
+
+struct configfd_ops {
+	int (*alloc)(struct configfd_context *cfc);
+	void (*free)(const struct configfd_context *cfc);
+	int (*set)(const struct configfd_context *cfc,
+		   struct configfd_param *p);
+	int (*get)(const struct configfd_context *cfc,
+		   struct configfd_param *p);
+	int (*act)(const struct configfd_context *cfc, unsigned int cmd);
+};
+
+struct configfd_type {
+	const char		*name;
+	size_t			data_size;
+	struct module		*owner;
+	struct configfd_ops	*ops;
+	struct configfd_type	*next;
+};
+
+struct configfd_context {
+	const struct configfd_type	*cft;
+	struct logger			*log;
+	void				*data;
+	unsigned int			op;
+};
+
+int configfd_type_register(struct configfd_type *cft);
+void configfd_type_unregister(struct configfd_type *cft);
+
+long ksys_configfd_open(const char __user *config_name, unsigned int flags,
+			unsigned int op);
+long ksys_configfd_action(int fd, unsigned int cmd, const char __user *key,
+			  void __user *value, int aux);
+int kern_configfd_action(int fd, struct configfd_param *p);
+int kern_configfd_open(const char *name, unsigned int flags,
+		       unsigned int op);
+
+#endif
diff --git a/include/uapi/linux/configfd.h b/include/uapi/linux/configfd.h
new file mode 100644
index 000000000000..3e54cfef0182
--- /dev/null
+++ b/include/uapi/linux/configfd.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_CONFIGFD_H
+#define _UAPI_LINUX_CONFIGFD_H
+
+enum configfd_cmd {
+	CONFIGFD_SET_FLAG		= 0,
+	CONFIGFD_SET_STRING		= 1,
+	CONFIGFD_SET_BINARY		= 2,
+	CONFIGFD_SET_PATH		= 3,
+	CONFIGFD_SET_PATH_EMPTY		= 4,
+	CONFIGFD_SET_FD			= 5,
+	CONFIGFD_CMD_CREATE		= 6,
+	CONFIGFD_CMD_RECONFIGURE	= 7,
+	/* gap for 30 other commands */
+	CONFIGFD_SET_INT		= 38,
+	CONFIGFD_GET_FD			= 39,
+};
+
+#endif
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 3/6] configfd: syscall: wire up configfd syscalls
  2019-11-27 20:32     ` James Bottomley
  2019-11-27 20:33       ` [RFC 1/6] logger: add a limited buffer logging facility James Bottomley
  2019-11-27 20:34       ` [RFC 2/6] configfd: add generic file descriptor based configuration parser James Bottomley
@ 2019-11-27 20:35       ` James Bottomley
  2019-11-27 20:36       ` [RFC 4/6] fs: implement fsconfig via configfd James Bottomley
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-27 20:35 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 2 ++
 arch/arm/tools/syscall.tbl                  | 2 ++
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 4 ++++
 arch/ia64/kernel/syscalls/syscall.tbl       | 2 ++
 arch/m68k/kernel/syscalls/syscall.tbl       | 2 ++
 arch/microblaze/kernel/syscalls/syscall.tbl | 2 ++
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 2 ++
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 2 ++
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 2 ++
 arch/parisc/kernel/syscalls/syscall.tbl     | 2 ++
 arch/powerpc/kernel/syscalls/syscall.tbl    | 2 ++
 arch/s390/kernel/syscalls/syscall.tbl       | 2 ++
 arch/sh/kernel/syscalls/syscall.tbl         | 2 ++
 arch/sparc/kernel/syscalls/syscall.tbl      | 2 ++
 arch/x86/entry/syscalls/syscall_32.tbl      | 2 ++
 arch/x86/entry/syscalls/syscall_64.tbl      | 2 ++
 arch/xtensa/kernel/syscalls/syscall.tbl     | 2 ++
 include/linux/syscalls.h                    | 5 +++++
 include/uapi/asm-generic/unistd.h           | 7 ++++++-
 20 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 728fe028c02c..eb13483c2867 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -475,3 +475,5 @@
 543	common	fspick				sys_fspick
 544	common	pidfd_open			sys_pidfd_open
 # 545 reserved for clone3
+546	common	configfd_open			configfd_open
+547	common	configfd_action			configfd_action
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 6da7dc4d79cc..024ae0ca80d1 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -449,3 +449,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 2629a68b8724..8aa00ccb0b96 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		436
+#define __NR_compat_syscalls		438
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 94ab29cf4f00..785fc47ede34 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -879,6 +879,10 @@ __SYSCALL(__NR_fspick, sys_fspick)
 __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
+#define __NR_configfd_open 436
+__SYSCALL(__NR_configfd_open, sys_configfd_open)
+#define __NR_configfd_action 437
+__SYSCALL(__NR_configfd_action, sys_configfd_action)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 36d5faf4c86c..9679b0e06dea 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -356,3 +356,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 # 435 reserved for clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index a88a285a0e5f..a78740ac8285 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -435,3 +435,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 # 435 reserved for clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 09b0cd7dab0a..c0ae38557315 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -441,3 +441,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index e7c5ab38e403..8a85da0a2fe5 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -374,3 +374,5 @@
 433	n32	fspick				sys_fspick
 434	n32	pidfd_open			sys_pidfd_open
 435	n32	clone3				__sys_clone3
+436	n32	configfd_open			sys_configfd_open
+437	n32	configfd_action			sys_configfd_action
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 13cd66581f3b..cafd5c2cf6f4 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -350,3 +350,5 @@
 433	n64	fspick				sys_fspick
 434	n64	pidfd_open			sys_pidfd_open
 435	n64	clone3				__sys_clone3
+436	n64	configfd_open			sys_configfd_open
+437	n64	configfd_action			sys_configfd_action
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 353539ea4140..8745d12c06c4 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -423,3 +423,5 @@
 433	o32	fspick				sys_fspick
 434	o32	pidfd_open			sys_pidfd_open
 435	o32	clone3				__sys_clone3
+436	o32	configfd_open			sys_configfd_open
+437	o32	configfd_action			sys_configfd_action
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 285ff516150c..fad358f6c107 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -433,3 +433,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3_wrapper
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 43f736ed47f2..2fa485b229fa 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -517,3 +517,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	nospu	clone3				ppc_clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 3054e9c035a3..e6401600f8e9 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -438,3 +438,5 @@
 433  common	fspick			sys_fspick			sys_fspick
 434  common	pidfd_open		sys_pidfd_open			sys_pidfd_open
 435  common	clone3			sys_clone3			sys_clone3
+436  common	configfd_open		sys_configfd_open		sys_configfd_open
+437  common	configfd_action		sys_configfd_action		sys_configfd_action
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index b5ed26c4c005..d97009f75bf9 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -438,3 +438,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 # 435 reserved for clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 8c8cc7537fb2..d899bcef3279 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -481,3 +481,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 # 435 reserved for clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 3fe02546aed3..077cdd659227 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -440,3 +440,5 @@
 433	i386	fspick			sys_fspick			__ia32_sys_fspick
 434	i386	pidfd_open		sys_pidfd_open			__ia32_sys_pidfd_open
 435	i386	clone3			sys_clone3			__ia32_sys_clone3
+436	i386	configfd_open		sys_configfd_open		__ia32_sys_configfd_open
+437	i386	configfd_action		sys_configfd_action		__ia32_sys_configfd_action
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c29976eca4a8..e5d24b9ed28c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -357,6 +357,8 @@
 433	common	fspick			__x64_sys_fspick
 434	common	pidfd_open		__x64_sys_pidfd_open
 435	common	clone3			__x64_sys_clone3/ptregs
+436	common	configfd_open		__x64_sys_configfd_open
+437	common	configfd_action		__x64_sys_configfd_action
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 25f4de729a6d..77ec1f6997e4 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -406,3 +406,5 @@
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
 435	common	clone3				sys_clone3
+436	common	configfd_open			sys_configfd_open
+437	common	configfd_action			sys_configfd_action
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f7c561c4dcdd..9dd70803e2b1 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1000,6 +1000,11 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags)
 asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
+asmlinkage long sys_configfd_open(const char __user *config_name,
+				  unsigned int flags, unsigned int op);
+asmlinkage long sys_configfd_action(int fd, unsigned int cmd,
+				    const char __user *key, void __user *value,
+				    int aux);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 1fc8faa6e973..6ae20cdfb081 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -849,10 +849,15 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 #ifdef __ARCH_WANT_SYS_CLONE3
 #define __NR_clone3 435
 __SYSCALL(__NR_clone3, sys_clone3)
+
 #endif
+#define __NR_configfd_open 436
+__SYSCALL(__NR_configfd_open, sys_configfd_open)
+#define __NR_configfd_action 437
+__SYSCALL(__NR_configfd_action, sys_configfd_action)
 
 #undef __NR_syscalls
-#define __NR_syscalls 436
+#define __NR_syscalls 438
 
 /*
  * 32 bit systems traditionally used different
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 4/6] fs: implement fsconfig via configfd
  2019-11-27 20:32     ` James Bottomley
                         ` (2 preceding siblings ...)
  2019-11-27 20:35       ` [RFC 3/6] configfd: syscall: wire up configfd syscalls James Bottomley
@ 2019-11-27 20:36       ` James Bottomley
  2019-11-27 20:37       ` [RFC 5/6] fs: expose internal interfaces open_detached_copy and do_reconfigure_mount James Bottomley
  2019-11-27 20:38       ` [RFC 6/6] fs: bind: add configfs type for bind mounts James Bottomley
  5 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-27 20:36 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

This changes the internal implementation of fsconfig, while keeping
all the external system calls.  However, it now becomes possible to
all the same operations via configfd instead of fsconfig.

For example a filesystem remount read-only can be done:

fd = configfd_open("mount", O_CLOEXEC, CONFIGFD_CMD_RECONFIGURE);
mntpnt = open("/path/to/mount", O_PATH);
configfd_action(fd, CONFIGFD_SET_FD, "pathfd", NULL, mntpnt);
configfd_action(fd, CONFIGFD_SET_FLAG, "ro", NULL, 0);
configfd_action(fd, CONFIGFD_CMD_RECONFIGURE, NULL, NULL, 0);

And mounting a tmpfs filesystem nodev,noexec:

fd = configfd_open("tmpfs", O_CLOEXEC, CONFIGFD_CMD_CREATE);
configfd_action(fd, CONFIGFD_SET_INT, "mount_attrs", NULL,
		MOUNT_ATTR_NODEV|MOUNT_ATTR_NOEXEC);
configfd_action(fd, CONFIGFD_CMD_CREATE, NULL, NULL, 0);
configfd_action(fd, CONFIGFD_GET_FD, "mountfd", &mfd, O_CLOEXEC);
mount_move("", mfd, AT_FDCWD, "/mountpoint", MOVE_MOUNT_F_EMPTY_PATH);

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 fs/filesystems.c           |   8 +-
 fs/fs_context.c            | 124 ++---------
 fs/fsopen.c                | 535 ++++++++++++++++++++++-----------------------
 fs/internal.h              |   4 +
 fs/namespace.c             | 111 ++++------
 include/linux/fs.h         |   2 +
 include/linux/fs_context.h |  23 +-
 include/linux/fs_parser.h  |   2 +
 8 files changed, 342 insertions(+), 467 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 9135646e41ac..ceb6532754c1 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -7,6 +7,7 @@
  *  table of configured filesystems
  */
 
+#include <linux/configfd.h>
 #include <linux/syscalls.h>
 #include <linux/fs.h>
 #include <linux/proc_fs.h>
@@ -82,10 +83,12 @@ int register_filesystem(struct file_system_type * fs)
 		return -EBUSY;
 	write_lock(&file_systems_lock);
 	p = find_filesystem(fs->name, strlen(fs->name));
-	if (*p)
+	if (*p) {
 		res = -EBUSY;
-	else
+	} else {
 		*p = fs;
+		res = fs_context_register(fs);
+	}
 	write_unlock(&file_systems_lock);
 	return res;
 }
@@ -114,6 +117,7 @@ int unregister_filesystem(struct file_system_type * fs)
 		if (fs == *tmp) {
 			*tmp = fs->next;
 			fs->next = NULL;
+			fs_context_unregister(fs);
 			write_unlock(&file_systems_lock);
 			synchronize_rcu();
 			return 0;
diff --git a/fs/fs_context.c b/fs/fs_context.c
index 138b5b4d621d..59bd0aeb0723 100644
--- a/fs/fs_context.c
+++ b/fs/fs_context.c
@@ -235,6 +235,13 @@ int generic_parse_monolithic(struct fs_context *fc, void *data)
 }
 EXPORT_SYMBOL(generic_parse_monolithic);
 
+void fs_context_set_reconfigure(struct fs_context *fc, struct dentry *reference)
+{
+	atomic_inc(&reference->d_sb->s_active);
+	fc->user_ns = get_user_ns(reference->d_sb->s_user_ns);
+	fc->root = dget(reference);
+}
+
 /**
  * alloc_fs_context - Create a filesystem context.
  * @fs_type: The filesystem type.
@@ -279,9 +286,8 @@ static struct fs_context *alloc_fs_context(struct file_system_type *fs_type,
 		fc->user_ns = get_user_ns(reference->d_sb->s_user_ns);
 		break;
 	case FS_CONTEXT_FOR_RECONFIGURE:
-		atomic_inc(&reference->d_sb->s_active);
-		fc->user_ns = get_user_ns(reference->d_sb->s_user_ns);
-		fc->root = dget(reference);
+		if (reference)
+			fs_context_set_reconfigure(fc, reference);
 		break;
 	}
 
@@ -313,7 +319,11 @@ struct fs_context *fs_context_for_reconfigure(struct dentry *dentry,
 					unsigned int sb_flags,
 					unsigned int sb_flags_mask)
 {
-	return alloc_fs_context(dentry->d_sb->s_type, dentry, sb_flags,
+	struct file_system_type *fs_type = NULL;
+
+	if (dentry)
+		fs_type = dentry->d_sb->s_type;
+	return alloc_fs_context(fs_type, dentry, sb_flags,
 				sb_flags_mask, FS_CONTEXT_FOR_RECONFIGURE);
 }
 EXPORT_SYMBOL(fs_context_for_reconfigure);
@@ -361,8 +371,7 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc)
 	get_net(fc->net_ns);
 	get_user_ns(fc->user_ns);
 	get_cred(fc->cred);
-	if (fc->log)
-		refcount_inc(&fc->log->usage);
+	logger_get(fc->cfc->log);
 
 	/* Can't call put until we've called ->dup */
 	ret = fc->ops->dup(fc, src_fc);
@@ -380,108 +389,6 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc)
 }
 EXPORT_SYMBOL(vfs_dup_fs_context);
 
-/**
- * logfc - Log a message to a filesystem context
- * @fc: The filesystem context to log to.
- * @fmt: The format of the buffer.
- */
-void logfc(struct fs_context *fc, const char *fmt, ...)
-{
-	static const char store_failure[] = "OOM: Can't store error string";
-	struct fc_log *log = fc ? fc->log : NULL;
-	const char *p;
-	va_list va;
-	char *q;
-	u8 freeable;
-
-	va_start(va, fmt);
-	if (!strchr(fmt, '%')) {
-		p = fmt;
-		goto unformatted_string;
-	}
-	if (strcmp(fmt, "%s") == 0) {
-		p = va_arg(va, const char *);
-		goto unformatted_string;
-	}
-
-	q = kvasprintf(GFP_KERNEL, fmt, va);
-copied_string:
-	if (!q)
-		goto store_failure;
-	freeable = 1;
-	goto store_string;
-
-unformatted_string:
-	if ((unsigned long)p >= (unsigned long)__start_rodata &&
-	    (unsigned long)p <  (unsigned long)__end_rodata)
-		goto const_string;
-	if (log && within_module_core((unsigned long)p, log->owner))
-		goto const_string;
-	q = kstrdup(p, GFP_KERNEL);
-	goto copied_string;
-
-store_failure:
-	p = store_failure;
-const_string:
-	q = (char *)p;
-	freeable = 0;
-store_string:
-	if (!log) {
-		switch (fmt[0]) {
-		case 'w':
-			printk(KERN_WARNING "%s\n", q + 2);
-			break;
-		case 'e':
-			printk(KERN_ERR "%s\n", q + 2);
-			break;
-		default:
-			printk(KERN_NOTICE "%s\n", q + 2);
-			break;
-		}
-		if (freeable)
-			kfree(q);
-	} else {
-		unsigned int logsize = ARRAY_SIZE(log->buffer);
-		u8 index;
-
-		index = log->head & (logsize - 1);
-		BUILD_BUG_ON(sizeof(log->head) != sizeof(u8) ||
-			     sizeof(log->tail) != sizeof(u8));
-		if ((u8)(log->head - log->tail) == logsize) {
-			/* The buffer is full, discard the oldest message */
-			if (log->need_free & (1 << index))
-				kfree(log->buffer[index]);
-			log->tail++;
-		}
-
-		log->buffer[index] = q;
-		log->need_free &= ~(1 << index);
-		log->need_free |= freeable << index;
-		log->head++;
-	}
-	va_end(va);
-}
-EXPORT_SYMBOL(logfc);
-
-/*
- * Free a logging structure.
- */
-static void put_fc_log(struct fs_context *fc)
-{
-	struct fc_log *log = fc->log;
-	int i;
-
-	if (log) {
-		if (refcount_dec_and_test(&log->usage)) {
-			fc->log = NULL;
-			for (i = 0; i <= 7; i++)
-				if (log->need_free & (1 << i))
-					kfree(log->buffer[i]);
-			kfree(log);
-		}
-	}
-}
-
 /**
  * put_fs_context - Dispose of a superblock configuration context.
  * @fc: The context to dispose of.
@@ -504,7 +411,6 @@ void put_fs_context(struct fs_context *fc)
 	put_net(fc->net_ns);
 	put_user_ns(fc->user_ns);
 	put_cred(fc->cred);
-	put_fc_log(fc);
 	put_filesystem(fc->fs_type);
 	kfree(fc->source);
 	kfree(fc);
diff --git a/fs/fsopen.c b/fs/fsopen.c
index 043ffa8dc263..e83df6fc2b08 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -5,6 +5,7 @@
  * Written by David Howells (dhowells@redhat.com)
  */
 
+#include <linux/configfd.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 #include <linux/slab.h>
@@ -18,90 +19,41 @@
 #include "internal.h"
 #include "mount.h"
 
-/*
- * Allow the user to read back any error, warning or informational messages.
- */
-static ssize_t fscontext_read(struct file *file,
-			      char __user *_buf, size_t len, loff_t *pos)
+static void fsopen_cf_free(const struct configfd_context *cfc)
 {
-	struct fs_context *fc = file->private_data;
-	struct fc_log *log = fc->log;
-	unsigned int logsize = ARRAY_SIZE(log->buffer);
-	ssize_t ret;
-	char *p;
-	bool need_free;
-	int index, n;
-
-	ret = mutex_lock_interruptible(&fc->uapi_mutex);
-	if (ret < 0)
-		return ret;
+	struct fs_context *fc = cfc->data;
 
-	if (log->head == log->tail) {
-		mutex_unlock(&fc->uapi_mutex);
-		return -ENODATA;
-	}
-
-	index = log->tail & (logsize - 1);
-	p = log->buffer[index];
-	need_free = log->need_free & (1 << index);
-	log->buffer[index] = NULL;
-	log->need_free &= ~(1 << index);
-	log->tail++;
-	mutex_unlock(&fc->uapi_mutex);
-
-	ret = -EMSGSIZE;
-	n = strlen(p);
-	if (n > len)
-		goto err_free;
-	ret = -EFAULT;
-	if (copy_to_user(_buf, p, n) != 0)
-		goto err_free;
-	ret = n;
-
-err_free:
-	if (need_free)
-		kfree(p);
-	return ret;
+	put_fs_context(fc);
 }
 
-static int fscontext_release(struct inode *inode, struct file *file)
+static int fsopen_cf_alloc(struct configfd_context *cfc)
 {
-	struct fs_context *fc = file->private_data;
+	struct fs_context *fc;
+	struct file_system_type *fs_type;
 
-	if (fc) {
-		file->private_data = NULL;
-		put_fs_context(fc);
-	}
-	return 0;
-}
+	if (cfc->op != CONFIGFD_CMD_CREATE)
+		return -EINVAL;
 
-const struct file_operations fscontext_fops = {
-	.read		= fscontext_read,
-	.release	= fscontext_release,
-	.llseek		= no_llseek,
-};
+	fs_type = get_fs_type(cfc->cft->name);
+	if (WARN(!fs_type, "BUG: fs_type %s should exist if configfd type does",
+		 cfc->cft->name))
+		return -ENODEV;
 
-/*
- * Attach a filesystem context to a file and an fd.
- */
-static int fscontext_create_fd(struct fs_context *fc, unsigned int o_flags)
-{
-	int fd;
+	if (cfc->op == CONFIGFD_CMD_RECONFIGURE)
+		fc = fs_context_for_reconfigure(NULL, 0, 0);
+	else
+		fc = fs_context_for_mount(fs_type, 0);
+	put_filesystem(fs_type);
+	if (IS_ERR(fc))
+		return PTR_ERR(fc);
 
-	fd = anon_inode_getfd("[fscontext]", &fscontext_fops, fc,
-			      O_RDWR | o_flags);
-	if (fd < 0)
-		put_fs_context(fc);
-	return fd;
-}
+	if (cfc->op == CONFIGFD_CMD_RECONFIGURE)
+		fc->phase = FS_CONTEXT_RECONF_PARAMS;
+	else
+		fc->phase = FS_CONTEXT_CREATE_PARAMS;
+	cfc->data = fc;
+	fc->cfc = cfc;
 
-static int fscontext_alloc_log(struct fs_context *fc)
-{
-	fc->log = kzalloc(sizeof(*fc->log), GFP_KERNEL);
-	if (!fc->log)
-		return -ENOMEM;
-	refcount_set(&fc->log->usage, 1);
-	fc->log->owner = fc->fs_type->owner;
 	return 0;
 }
 
@@ -114,42 +66,14 @@ static int fscontext_alloc_log(struct fs_context *fc)
  */
 SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags)
 {
-	struct file_system_type *fs_type;
-	struct fs_context *fc;
-	const char *fs_name;
-	int ret;
-
 	if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	if (flags & ~FSOPEN_CLOEXEC)
 		return -EINVAL;
 
-	fs_name = strndup_user(_fs_name, PAGE_SIZE);
-	if (IS_ERR(fs_name))
-		return PTR_ERR(fs_name);
-
-	fs_type = get_fs_type(fs_name);
-	kfree(fs_name);
-	if (!fs_type)
-		return -ENODEV;
-
-	fc = fs_context_for_mount(fs_type, 0);
-	put_filesystem(fs_type);
-	if (IS_ERR(fc))
-		return PTR_ERR(fc);
-
-	fc->phase = FS_CONTEXT_CREATE_PARAMS;
-
-	ret = fscontext_alloc_log(fc);
-	if (ret < 0)
-		goto err_fc;
-
-	return fscontext_create_fd(fc, flags & FSOPEN_CLOEXEC ? O_CLOEXEC : 0);
-
-err_fc:
-	put_fs_context(fc);
-	return ret;
+	return  ksys_configfd_open(_fs_name, flags ? O_CLOEXEC : 0,
+				   CONFIGFD_CMD_CREATE);
 }
 
 /*
@@ -157,10 +81,14 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags)
  */
 SYSCALL_DEFINE3(fspick, int, dfd, const char __user *, path, unsigned int, flags)
 {
-	struct fs_context *fc;
-	struct path target;
+	struct path *target;
 	unsigned int lookup_flags;
 	int ret;
+	int fd;
+	struct configfd_param cp;
+	struct open_flags of;
+	struct filename *name;
+	struct file *file;
 
 	if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
@@ -178,34 +106,49 @@ SYSCALL_DEFINE3(fspick, int, dfd, const char __user *, path, unsigned int, flags
 		lookup_flags &= ~LOOKUP_AUTOMOUNT;
 	if (flags & FSPICK_EMPTY_PATH)
 		lookup_flags |= LOOKUP_EMPTY;
-	ret = user_path_at(dfd, path, lookup_flags, &target);
-	if (ret < 0)
-		goto err;
 
-	ret = -EINVAL;
-	if (target.mnt->mnt_root != target.dentry)
-		goto err_path;
+	of.lookup_flags = lookup_flags;
+	of.intent = LOOKUP_OPEN;
+	of.acc_mode = 0;
+	of.mode = 0;
+	of.open_flag = O_PATH;
 
-	fc = fs_context_for_reconfigure(target.dentry, 0, 0);
-	if (IS_ERR(fc)) {
-		ret = PTR_ERR(fc);
-		goto err_path;
-	}
+	name = getname_kernel(path);
+	if (IS_ERR(name))
+		return PTR_ERR(name);
 
-	fc->phase = FS_CONTEXT_RECONF_PARAMS;
+	file = do_filp_open(dfd, name, &of);
+	putname(name);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
 
-	ret = fscontext_alloc_log(fc);
-	if (ret < 0)
-		goto err_fc;
+	target = &file->f_path;
+	ret = -EINVAL;
+	if (target->mnt->mnt_root != target->dentry)
+		goto err_file;
 
-	path_put(&target);
-	return fscontext_create_fd(fc, flags & FSPICK_CLOEXEC ? O_CLOEXEC : 0);
+	ret = fd = kern_configfd_open("mount",
+				      flags & FSPICK_CLOEXEC ? O_CLOEXEC : 0,
+				      CONFIGFD_CMD_RECONFIGURE);
+	if (ret < 0)
+		goto err_file;
+	cp = (struct configfd_param) {
+		.key = "pathfd",
+		.file = file,
+		.cmd = CONFIGFD_SET_FD,
+	};
+	ret = kern_configfd_action(fd, &cp);
+	/* file gets NULL'd if successfully installed otherwise we put */
+	if (cp.file)
+		fput(file);
+	if (ret < 0)
+		goto err_close;
+	return fd;
 
-err_fc:
-	put_fs_context(fc);
-err_path:
-	path_put(&target);
-err:
+ err_close:
+	ksys_close(fd);
+ err_file:
+	fput(file);
 	return ret;
 }
 
@@ -268,6 +211,161 @@ static int vfs_fsconfig_locked(struct fs_context *fc, int cmd,
 	return ret;
 }
 
+static int fsopen_cf_mount_set(const struct configfd_context *icfc,
+			       struct configfd_param *p)
+{
+	struct fs_context *fc;
+	struct path *path;
+	/* cheat: we're going to mutate the context so drop the const */
+	struct configfd_context *cfc = (struct configfd_context *)icfc;
+
+	if (strcmp(p->key, "pathfd") != 0 || p->cmd != CONFIGFD_SET_FD) {
+		logger_err(cfc->log, "must set pathfd before any other parameter");
+		return -EINVAL;
+	}
+	if (cfc->op != CONFIGFD_CMD_RECONFIGURE) {
+		logger_err(cfc->log, "may only be opened for reconfigure");
+		return -EINVAL;
+	}
+
+	path = &p->file->f_path;
+	if (path->mnt->mnt_root != path->dentry) {
+		logger_err(cfc->log, "pathfd must identify a mount point");
+		return -EINVAL;
+	}
+	fc = fs_context_for_reconfigure(path->dentry, 0, 0);
+	if (IS_ERR(fc))
+		return PTR_ERR(fc);
+	fc->phase = FS_CONTEXT_RECONF_PARAMS;
+	/* hacky: reconfigure the cfc so all ops now pass to the
+	 * correct filesystem type */
+	cfc->cft = &path->dentry->d_sb->s_type->cft;
+	/* more hackery: mount type is built in so no module ref, but
+	 * filesystem may be modular, so acquire a reference to the
+	 * module for configfd_free to put later */
+	try_module_get(cfc->cft->owner);
+	cfc->data = fc;
+	fc->cfc = cfc;
+
+	return 0;
+}
+
+static int fsopen_cf_set(const struct configfd_context *cfc,
+			 struct configfd_param *p)
+{
+	struct fs_context *fc = cfc->data;
+	int ret;
+
+	struct fs_parameter param = {
+		.type	= fs_value_is_undefined,
+		.key	= p->key,
+	};
+
+	/* parameter we intercept */
+	if (strcmp(p->key, "mount_attrs") == 0 &&
+		   p->cmd == CONFIGFD_SET_INT) {
+		fc->mnt_flags = 0;
+		if (p->aux & ~(MOUNT_ATTR_RDONLY |
+			       MOUNT_ATTR_NOSUID |
+			       MOUNT_ATTR_NODEV |
+			       MOUNT_ATTR_NOEXEC |
+			       MOUNT_ATTR__ATIME |
+			       MOUNT_ATTR_NODIRATIME))
+			return -EINVAL;
+
+		if (p->aux & MOUNT_ATTR_RDONLY)
+			fc->mnt_flags |= MNT_READONLY;
+		if (p->aux & MOUNT_ATTR_NOSUID)
+			fc->mnt_flags |= MNT_NOSUID;
+		if (p->aux & MOUNT_ATTR_NODEV)
+			fc->mnt_flags |= MNT_NODEV;
+		if (p->aux & MOUNT_ATTR_NOEXEC)
+			fc->mnt_flags |= MNT_NOEXEC;
+		if (p->aux & MOUNT_ATTR_NODIRATIME)
+			fc->mnt_flags |= MNT_NODIRATIME;
+
+		switch (p->aux & MOUNT_ATTR__ATIME) {
+		case MOUNT_ATTR_STRICTATIME:
+			break;
+		case MOUNT_ATTR_NOATIME:
+			fc->mnt_flags |= MNT_NOATIME;
+			break;
+		case MOUNT_ATTR_RELATIME:
+			fc->mnt_flags |= MNT_RELATIME;
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		return 0;
+	}
+
+	if (fc->ops == &legacy_fs_context_ops) {
+		switch (p->cmd) {
+		case FSCONFIG_SET_BINARY:
+		case FSCONFIG_SET_PATH:
+		case FSCONFIG_SET_PATH_EMPTY:
+		case FSCONFIG_SET_FD:
+			return -EOPNOTSUPP;
+		default:
+			break;
+		}
+	}
+	switch (p->cmd) {
+	case FSCONFIG_SET_FLAG:
+		param.type = fs_value_is_flag;
+		break;
+	case FSCONFIG_SET_STRING:
+		param.type = fs_value_is_string;
+		param.string = p->string;
+		param.size = p->size;
+		break;
+	case FSCONFIG_SET_BINARY:
+		param.type = fs_value_is_blob;
+		param.size = p->size;
+		param.blob = p->blob;
+		break;
+	case FSCONFIG_SET_PATH:
+		param.type = fs_value_is_filename;
+		param.name = p->name;
+		param.dirfd = p->aux;
+		param.size = p->size;
+		break;
+	case FSCONFIG_SET_PATH_EMPTY:
+		param.type = fs_value_is_filename_empty;
+		param.name = p->name;
+		param.dirfd = p->aux;
+		param.size = p->size;
+		break;
+	case FSCONFIG_SET_FD:
+		param.type = fs_value_is_file;
+		param.file = p->file;
+		break;
+	default:
+		break;
+	}
+
+	ret = mutex_lock_interruptible(&fc->uapi_mutex);
+	if (ret == 0) {
+		ret = vfs_fsconfig_locked(fc, p->cmd, &param);
+		mutex_unlock(&fc->uapi_mutex);
+	}
+	return ret;
+}
+
+static int fsopen_cf_act(const struct configfd_context *cfc,
+			 unsigned int cmd)
+{
+	struct fs_context *fc = cfc->data;
+	int ret = mutex_lock_interruptible(&fc->uapi_mutex);
+
+	if (ret == 0) {
+		ret = vfs_fsconfig_locked(fc, cmd, NULL);
+		mutex_unlock(&fc->uapi_mutex);
+	}
+	return ret;
+}
+
 /**
  * sys_fsconfig - Set parameters and trigger actions on a context
  * @fd: The filesystem context to act upon
@@ -315,161 +413,50 @@ SYSCALL_DEFINE5(fsconfig,
 		int, fd,
 		unsigned int, cmd,
 		const char __user *, _key,
-		const void __user *, _value,
+		void __user *, _value,
 		int, aux)
 {
-	struct fs_context *fc;
-	struct fd f;
-	int ret;
+	return ksys_configfd_action(fd, cmd, _key, _value, aux);
+}
 
-	struct fs_parameter param = {
-		.type	= fs_value_is_undefined,
-	};
+static struct configfd_ops fsopen_cf_ops = {
+	.alloc = fsopen_cf_alloc,
+	.free = fsopen_cf_free,
+	.set = fsopen_cf_set,
+	.act = fsopen_cf_act,
+	.get = fsopen_cf_get,
+};
 
-	if (fd < 0)
-		return -EINVAL;
+static struct configfd_ops fsopen_cf_mount_ops = {
+	.set = fsopen_cf_mount_set,
+};
 
-	switch (cmd) {
-	case FSCONFIG_SET_FLAG:
-		if (!_key || _value || aux)
-			return -EINVAL;
-		break;
-	case FSCONFIG_SET_STRING:
-		if (!_key || !_value || aux)
-			return -EINVAL;
-		break;
-	case FSCONFIG_SET_BINARY:
-		if (!_key || !_value || aux <= 0 || aux > 1024 * 1024)
-			return -EINVAL;
-		break;
-	case FSCONFIG_SET_PATH:
-	case FSCONFIG_SET_PATH_EMPTY:
-		if (!_key || !_value || (aux != AT_FDCWD && aux < 0))
-			return -EINVAL;
-		break;
-	case FSCONFIG_SET_FD:
-		if (!_key || _value || aux < 0)
-			return -EINVAL;
-		break;
-	case FSCONFIG_CMD_CREATE:
-	case FSCONFIG_CMD_RECONFIGURE:
-		if (_key || _value || aux)
-			return -EINVAL;
-		break;
-	default:
-		return -EOPNOTSUPP;
-	}
+static struct configfd_type fsopen_mount_type = {
+	.name = "mount",
+	.ops = &fsopen_cf_mount_ops,
+};
 
-	f = fdget(fd);
-	if (!f.file)
-		return -EBADF;
-	ret = -EINVAL;
-	if (f.file->f_op != &fscontext_fops)
-		goto out_f;
+int fs_context_register(struct file_system_type *fs)
+{
+	int res;
 
-	fc = f.file->private_data;
-	if (fc->ops == &legacy_fs_context_ops) {
-		switch (cmd) {
-		case FSCONFIG_SET_BINARY:
-		case FSCONFIG_SET_PATH:
-		case FSCONFIG_SET_PATH_EMPTY:
-		case FSCONFIG_SET_FD:
-			ret = -EOPNOTSUPP;
-			goto out_f;
-		}
-	}
+	fs->cft.name = fs->name;
+	fs->cft.ops = &fsopen_cf_ops;
+	fs->cft.owner = fs->owner;
+	res = configfd_type_register(&(fs->cft));
 
-	if (_key) {
-		param.key = strndup_user(_key, 256);
-		if (IS_ERR(param.key)) {
-			ret = PTR_ERR(param.key);
-			goto out_f;
-		}
-	}
+	return res;
+}
 
-	switch (cmd) {
-	case FSCONFIG_SET_FLAG:
-		param.type = fs_value_is_flag;
-		break;
-	case FSCONFIG_SET_STRING:
-		param.type = fs_value_is_string;
-		param.string = strndup_user(_value, 256);
-		if (IS_ERR(param.string)) {
-			ret = PTR_ERR(param.string);
-			goto out_key;
-		}
-		param.size = strlen(param.string);
-		break;
-	case FSCONFIG_SET_BINARY:
-		param.type = fs_value_is_blob;
-		param.size = aux;
-		param.blob = memdup_user_nul(_value, aux);
-		if (IS_ERR(param.blob)) {
-			ret = PTR_ERR(param.blob);
-			goto out_key;
-		}
-		break;
-	case FSCONFIG_SET_PATH:
-		param.type = fs_value_is_filename;
-		param.name = getname_flags(_value, 0, NULL);
-		if (IS_ERR(param.name)) {
-			ret = PTR_ERR(param.name);
-			goto out_key;
-		}
-		param.dirfd = aux;
-		param.size = strlen(param.name->name);
-		break;
-	case FSCONFIG_SET_PATH_EMPTY:
-		param.type = fs_value_is_filename_empty;
-		param.name = getname_flags(_value, LOOKUP_EMPTY, NULL);
-		if (IS_ERR(param.name)) {
-			ret = PTR_ERR(param.name);
-			goto out_key;
-		}
-		param.dirfd = aux;
-		param.size = strlen(param.name->name);
-		break;
-	case FSCONFIG_SET_FD:
-		param.type = fs_value_is_file;
-		ret = -EBADF;
-		param.file = fget(aux);
-		if (!param.file)
-			goto out_key;
-		break;
-	default:
-		break;
-	}
+void fs_context_unregister(struct file_system_type *fs)
+{
+	configfd_type_unregister(&(fs->cft));
+}
 
-	ret = mutex_lock_interruptible(&fc->uapi_mutex);
-	if (ret == 0) {
-		ret = vfs_fsconfig_locked(fc, cmd, &param);
-		mutex_unlock(&fc->uapi_mutex);
-	}
+static int __init fsopen_init(void)
+{
+	configfd_type_register(&fsopen_mount_type);
 
-	/* Clean up the our record of any value that we obtained from
-	 * userspace.  Note that the value may have been stolen by the LSM or
-	 * filesystem, in which case the value pointer will have been cleared.
-	 */
-	switch (cmd) {
-	case FSCONFIG_SET_STRING:
-	case FSCONFIG_SET_BINARY:
-		kfree(param.string);
-		break;
-	case FSCONFIG_SET_PATH:
-	case FSCONFIG_SET_PATH_EMPTY:
-		if (param.name)
-			putname(param.name);
-		break;
-	case FSCONFIG_SET_FD:
-		if (param.file)
-			fput(param.file);
-		break;
-	default:
-		break;
-	}
-out_key:
-	kfree(param.key);
-out_f:
-	fdput(f);
-	return ret;
+	return 0;
 }
+fs_initcall(fsopen_init);
diff --git a/fs/internal.h b/fs/internal.h
index 315fcd8d237c..e3039de79134 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -91,6 +91,10 @@ extern int __mnt_want_write_file(struct file *);
 extern void __mnt_drop_write_file(struct file *);
 
 extern void dissolve_on_fput(struct vfsmount *);
+
+int fsopen_cf_get(const struct configfd_context *cfc,
+		  struct configfd_param *p);
+
 /*
  * fs_struct.c
  */
diff --git a/fs/namespace.c b/fs/namespace.c
index 2adfe7b166a3..13be853e9225 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -30,6 +30,7 @@
 #include <uapi/linux/mount.h>
 #include <linux/fs_context.h>
 #include <linux/shmem_fs.h>
+#include <linux/configfd.h>
 
 #include "pnode.h"
 #include "internal.h"
@@ -3365,70 +3366,19 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
 	return ksys_mount(dev_name, dir_name, type, flags, data);
 }
 
-/*
- * Create a kernel mount representation for a new, prepared superblock
- * (specified by fs_fd) and attach to an open_tree-like file descriptor.
- */
-SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
-		unsigned int, attr_flags)
+int fsopen_cf_get(const struct configfd_context *cfc,
+			 struct configfd_param *p)
 {
 	struct mnt_namespace *ns;
-	struct fs_context *fc;
+	struct fs_context *fc = cfc->data;
 	struct file *file;
 	struct path newmount;
 	struct mount *mnt;
-	struct fd f;
-	unsigned int mnt_flags = 0;
 	long ret;
 
-	if (!may_mount())
-		return -EPERM;
-
-	if ((flags & ~(FSMOUNT_CLOEXEC)) != 0)
+	if (strcmp(p->key, "mountfd") != 0 || p->cmd != CONFIGFD_GET_FD)
 		return -EINVAL;
 
-	if (attr_flags & ~(MOUNT_ATTR_RDONLY |
-			   MOUNT_ATTR_NOSUID |
-			   MOUNT_ATTR_NODEV |
-			   MOUNT_ATTR_NOEXEC |
-			   MOUNT_ATTR__ATIME |
-			   MOUNT_ATTR_NODIRATIME))
-		return -EINVAL;
-
-	if (attr_flags & MOUNT_ATTR_RDONLY)
-		mnt_flags |= MNT_READONLY;
-	if (attr_flags & MOUNT_ATTR_NOSUID)
-		mnt_flags |= MNT_NOSUID;
-	if (attr_flags & MOUNT_ATTR_NODEV)
-		mnt_flags |= MNT_NODEV;
-	if (attr_flags & MOUNT_ATTR_NOEXEC)
-		mnt_flags |= MNT_NOEXEC;
-	if (attr_flags & MOUNT_ATTR_NODIRATIME)
-		mnt_flags |= MNT_NODIRATIME;
-
-	switch (attr_flags & MOUNT_ATTR__ATIME) {
-	case MOUNT_ATTR_STRICTATIME:
-		break;
-	case MOUNT_ATTR_NOATIME:
-		mnt_flags |= MNT_NOATIME;
-		break;
-	case MOUNT_ATTR_RELATIME:
-		mnt_flags |= MNT_RELATIME;
-		break;
-	default:
-		return -EINVAL;
-	}
-
-	f = fdget(fs_fd);
-	if (!f.file)
-		return -EBADF;
-
-	ret = -EINVAL;
-	if (f.file->f_op != &fscontext_fops)
-		goto err_fsfd;
-
-	fc = f.file->private_data;
-
 	ret = mutex_lock_interruptible(&fc->uapi_mutex);
 	if (ret < 0)
 		goto err_fsfd;
@@ -3439,7 +3389,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 		goto err_unlock;
 
 	ret = -EPERM;
-	if (mount_too_revealing(fc->root->d_sb, &mnt_flags)) {
+	if (mount_too_revealing(fc->root->d_sb, &fc->mnt_flags)) {
 		pr_warn("VFS: Mount too revealing\n");
 		goto err_unlock;
 	}
@@ -3458,7 +3408,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 		goto err_unlock;
 	}
 	newmount.dentry = dget(fc->root);
-	newmount.mnt->mnt_flags = mnt_flags;
+	newmount.mnt->mnt_flags = fc->mnt_flags;
 
 	/* We've done the mount bit - now move the file context into more or
 	 * less the same state as if we'd done an fspick().  We don't want to
@@ -3488,23 +3438,56 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 		ret = PTR_ERR(file);
 		goto err_path;
 	}
+	ret = 0;
 	file->f_mode |= FMODE_NEED_UNMOUNT;
-
-	ret = get_unused_fd_flags((flags & FSMOUNT_CLOEXEC) ? O_CLOEXEC : 0);
-	if (ret >= 0)
-		fd_install(ret, file);
-	else
-		fput(file);
+	p->file = file;
 
 err_path:
 	path_put(&newmount);
 err_unlock:
 	mutex_unlock(&fc->uapi_mutex);
 err_fsfd:
-	fdput(f);
+
 	return ret;
 }
 
+/*
+ * Create a kernel mount representation for a new, prepared superblock
+ * (specified by fs_fd) and attach to an open_tree-like file descriptor.
+ */
+SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
+		unsigned int, attr_flags)
+{
+	int ret;
+	int oflags;
+	struct configfd_param p = {
+		.key = "mount_opts",
+		.cmd = CONFIGFD_SET_INT,
+		.aux = attr_flags,
+	};
+
+	if (!may_mount())
+		return -EPERM;
+
+	if ((flags & ~(FSMOUNT_CLOEXEC)) != 0)
+		return -EINVAL;
+
+	oflags = (flags & ~(FSMOUNT_CLOEXEC)) ? O_CLOEXEC : 0;
+
+	ret = kern_configfd_action(fs_fd, &p);
+	if (ret < 0)
+		return ret;
+	p = (struct configfd_param) {
+		.key = "mountfd",
+		.cmd = CONFIGFD_GET_FD,
+		.aux = oflags,
+	};
+	ret = kern_configfd_action(fs_fd, &p);
+	if (ret < 0)
+		return ret;
+	return p.aux;
+}
+
 /*
  * Move a mount from one place to another.  In combination with
  * fsopen()/fsmount() this is used to install a new mount and in combination
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e0d909d35763..51f1408180f5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_FS_H
 #define _LINUX_FS_H
 
+#include <linux/configfd.h>
 #include <linux/linkage.h>
 #include <linux/wait_bit.h>
 #include <linux/kdev_t.h>
@@ -2233,6 +2234,7 @@ struct file_system_type {
 	struct lock_class_key i_lock_key;
 	struct lock_class_key i_mutex_key;
 	struct lock_class_key i_mutex_dir_key;
+	struct configfd_type cft;
 };
 
 #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME)
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index e5c14e2c53d3..83686c0c9fed 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -97,10 +97,12 @@ struct fs_context {
 	const char		*source;	/* The source name (eg. dev path) */
 	void			*security;	/* Linux S&M options */
 	void			*s_fs_info;	/* Proposed s_fs_info */
+	struct configfd_context	*cfc;
 	unsigned int		sb_flags;	/* Proposed superblock flags (SB_*) */
 	unsigned int		sb_flags_mask;	/* Superblock flags that were changed */
 	unsigned int		s_iflags;	/* OR'd with sb->s_iflags */
 	unsigned int		lsm_flags;	/* Information flags from the fs to the LSM */
+	unsigned int		mnt_flags;	/* mnt flags translated from MOUNT_ATTRS */
 	enum fs_context_purpose	purpose:8;
 	enum fs_context_phase	phase:8;	/* The phase the context is in */
 	bool			need_free:1;	/* Need to call ops->free() */
@@ -134,6 +136,8 @@ extern int vfs_parse_fs_string(struct fs_context *fc, const char *key,
 extern int generic_parse_monolithic(struct fs_context *fc, void *data);
 extern int vfs_get_tree(struct fs_context *fc);
 extern void put_fs_context(struct fs_context *fc);
+extern void fs_context_set_reconfigure(struct fs_context *fc,
+				       struct dentry *reference);
 
 /*
  * sget() wrappers to be called from the ->get_tree() op.
@@ -166,24 +170,7 @@ extern int get_tree_keyed(struct fs_context *fc,
 extern int get_tree_bdev(struct fs_context *fc,
 			       int (*fill_super)(struct super_block *sb,
 						 struct fs_context *fc));
-
-extern const struct file_operations fscontext_fops;
-
-/*
- * Mount error, warning and informational message logging.  This structure is
- * shareable between a mount and a subordinate mount.
- */
-struct fc_log {
-	refcount_t	usage;
-	u8		head;		/* Insertion index in buffer[] */
-	u8		tail;		/* Removal index in buffer[] */
-	u8		need_free;	/* Mask of kfree'able items in buffer[] */
-	struct module	*owner;		/* Owner module for strings that don't then need freeing */
-	char		*buffer[8];
-};
-
-extern __attribute__((format(printf, 2, 3)))
-void logfc(struct fs_context *fc, const char *fmt, ...);
+#define logfc(fc, fmt, ...) ({ logger_log(fc->cfc->log, fmt, ## __VA_ARGS__); })
 
 /**
  * infof - Store supplementary informational message
diff --git a/include/linux/fs_parser.h b/include/linux/fs_parser.h
index dee140db6240..c41af1b51af6 100644
--- a/include/linux/fs_parser.h
+++ b/include/linux/fs_parser.h
@@ -143,5 +143,7 @@ static inline bool fs_validate_description(const struct fs_parameter_description
 #define fsparam_path(NAME, OPT)	__fsparam(fs_param_is_path, NAME, OPT, 0)
 #define fsparam_fd(NAME, OPT)	__fsparam(fs_param_is_fd, NAME, OPT, 0)
 
+int fs_context_register(struct file_system_type *fs);
+void fs_context_unregister(struct file_system_type *fs);
 
 #endif /* _LINUX_FS_PARSER_H */
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 5/6] fs: expose internal interfaces open_detached_copy and do_reconfigure_mount
  2019-11-27 20:32     ` James Bottomley
                         ` (3 preceding siblings ...)
  2019-11-27 20:36       ` [RFC 4/6] fs: implement fsconfig via configfd James Bottomley
@ 2019-11-27 20:37       ` James Bottomley
  2019-11-27 20:38       ` [RFC 6/6] fs: bind: add configfs type for bind mounts James Bottomley
  5 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-27 20:37 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

These are needed for the forthcoming bind configure type to work.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 fs/internal.h  | 3 +++
 fs/namespace.c | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index e3039de79134..f27136058d7d 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -95,6 +95,9 @@ extern void dissolve_on_fput(struct vfsmount *);
 int fsopen_cf_get(const struct configfd_context *cfc,
 		  struct configfd_param *p);
 
+extern int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags);
+extern struct file *open_detached_copy(struct path *path, bool recursive);
+
 /*
  * fs_struct.c
  */
diff --git a/fs/namespace.c b/fs/namespace.c
index 13be853e9225..9dcbafe62e4e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2318,7 +2318,7 @@ static int do_loopback(struct path *path, const char *old_name,
 	return err;
 }
 
-static struct file *open_detached_copy(struct path *path, bool recursive)
+struct file *open_detached_copy(struct path *path, bool recursive)
 {
 	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
 	struct mnt_namespace *ns = alloc_mnt_ns(user_ns, true);
@@ -2494,7 +2494,7 @@ static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *
  * superblock it refers to.  This is triggered by specifying MS_REMOUNT|MS_BIND
  * to mount(2).
  */
-static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
+int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
 {
 	struct super_block *sb = path->mnt->mnt_sb;
 	struct mount *mnt = real_mount(path->mnt);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 6/6] fs: bind: add configfs type for bind mounts
  2019-11-27 20:32     ` James Bottomley
                         ` (4 preceding siblings ...)
  2019-11-27 20:37       ` [RFC 5/6] fs: expose internal interfaces open_detached_copy and do_reconfigure_mount James Bottomley
@ 2019-11-27 20:38       ` James Bottomley
  5 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-11-27 20:38 UTC (permalink / raw)
  To: David Howells, Christian Brauner; +Cc: linux-fsdevel, Al Viro, Miklos Szeredi

This can do the equivalent of open_tree and also do bind mount
reconfiguration from ro to rw and vice versa.

To get the equvalent of open tree you need to do

   mnt = open("/path/to/tree", O_PATH);
   fd = configfs_open(bind, O_CLOEXEC);
   configfs_action(fd, CONFIGFD_SET_FD, "pathfd", NULL, mnt);
   configfs_action(fd, CONFIGFD_SET_FLAG, "detached", NULL, 0);
   configfs_action(fd, CONFIGFD_SET_FLAG, "recursive", NULL, 0);
   configfs_action(fd, CONFIGFD_CMD_CREATE, NULL, NULL, 0);
   configfs_action(fd, CONFIGFD_GET_FD, "bindfd", &bfd, NULL, O_CLOEXEC);

And bfd will now contain the file descriptor to pass to move_tree.
There is a deficiency over the original implementation in that the
open system call has no way of clearing the LOOKUP_AUTOMOUNT path, but
that's fixable.

To do a mount reconfigure to change the bind mount to readonly do

   mnt = open("/path/to/tree", O_PATH);
   fd = configfs_open(bind, O_CLOEXEC);
   configfs_action(fd, CONFIGFD_SET_FD, "pathfd", NULL, mnt);
   configfs_action(fd, CONFIGFD_SET_FLAG, "ro", NULL, 0);
   configfs_action(fd, CONFIGFD_CMD_RECONFIGURE, NULL, NULL, 0);

And the bind properties will be changed.  You can also pass the "rw"
flag to reset the read only.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 fs/Makefile |   2 +-
 fs/bind.c   | 193 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 fs/bind.c

diff --git a/fs/Makefile b/fs/Makefile
index 569563f6c0d5..c676ca15c644 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -14,7 +14,7 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		pnode.o splice.o sync.o utimes.o d_path.o \
 		stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
 		fs_types.o fs_context.o fs_parser.o fsopen.o \
-		configfd.o
+		configfd.o bind.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
diff --git a/fs/bind.c b/fs/bind.c
new file mode 100644
index 000000000000..905572c70260
--- /dev/null
+++ b/fs/bind.c
@@ -0,0 +1,193 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Dummy configfd handler for doing context based configuration
+ * on bind mounts
+ *
+ * Copyright (C) James.Bottomley@HansenPartnership.com
+ */
+
+#include <linux/configfd.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/mount.h>
+#include <linux/nsproxy.h>
+
+#include "internal.h"
+#include "mount.h"
+
+struct bind_data {
+	bool		ro:1;
+	bool		detached:1;
+	bool		recursive:1;
+	struct file	*file;
+	struct file	*retfile;
+};
+
+struct bind_data *to_bind_data(const struct configfd_context *cfc)
+{
+	return cfc->data;
+}
+
+static int bind_set_fd(const struct configfd_context *cfc,
+		       struct configfd_param *p)
+{
+	struct bind_data *bd = to_bind_data(cfc);
+	struct path *path;
+
+	if (strcmp(p->key, "pathfd") != 0)
+		return -EINVAL;
+
+	path = &p->file->f_path;
+
+	if (cfc->op == CONFIGFD_CMD_RECONFIGURE &&
+	    path->mnt->mnt_root != path->dentry) {
+		logger_err(cfc->log, "pathfd must be a bind mount");
+		return -EINVAL;
+	}
+	bd->file = p->file;
+	p->file = NULL;	/* we now own */
+	return 0;
+}
+
+static int bind_set_flag(const struct configfd_context *cfc,
+			 struct configfd_param *p)
+{
+	struct bind_data *bd = to_bind_data(cfc);
+
+	if (strcmp(p->key, "ro") == 0 &&
+	    cfc->op == CONFIGFD_CMD_RECONFIGURE) {
+		bd->ro = true;
+	} else if (strcmp(p->key, "rw") == 0 &&
+		   cfc->op == CONFIGFD_CMD_RECONFIGURE) {
+		bd->ro = false;
+	} else if (strcmp(p->key, "recursive") == 0 &&
+		   cfc->op == CONFIGFD_CMD_CREATE) {
+		bd->recursive = true;
+	} else if (strcmp(p->key, "detached") == 0 &&
+		   cfc->op == CONFIGFD_CMD_CREATE) {
+		if (!ns_capable(current->nsproxy->mnt_ns->user_ns,
+				CAP_SYS_ADMIN)) {
+			logger_err(cfc->log, "bind set: insufficient permission for detached tree");
+			return -EPERM;
+		}
+		bd->detached = true;
+	} else {
+		logger_err(cfc->log, "bind set: invalid flag %s", p->key);
+		return -EINVAL;
+	}
+	return 0;
+}
+static int bind_set(const struct configfd_context *cfc,
+		    struct configfd_param *p)
+{
+	switch (p->cmd) {
+	case CONFIGFD_SET_FLAG:
+		return bind_set_flag(cfc, p);
+	case CONFIGFD_SET_FD:
+		return bind_set_fd(cfc, p);
+	default:
+		logger_err(cfc->log, "bind only takes a flag or fd argument");
+		return -EINVAL;
+	}
+}
+
+static int bind_get(const struct configfd_context *cfc,
+		    struct configfd_param *p)
+{
+	struct bind_data *bd = to_bind_data(cfc);
+
+	if (strcmp(p->key, "bindfd") != 0 || p->cmd != CONFIGFD_GET_FD)
+		return -EINVAL;
+
+	if (!bd->retfile)
+		return -EINVAL;
+
+	p->file = bd->retfile;
+	bd->retfile = NULL;
+
+	return 0;
+}
+
+static int bind_reconfigure(const struct configfd_context *cfc)
+{
+	struct bind_data *bd = to_bind_data(cfc);
+	unsigned int mnt_flags = 0;
+
+	if (!bd->file) {
+		logger_err(cfc->log, "bind reconfigure: fd must be set");
+		return -EINVAL;
+	}
+	if (bd->ro)
+		mnt_flags |= MNT_READONLY;
+
+	return do_reconfigure_mnt(&bd->file->f_path, mnt_flags);
+}
+
+static int bind_create(const struct configfd_context *cfc)
+{
+	struct bind_data *bd = to_bind_data(cfc);
+	struct path *p;
+	struct file *f;
+
+	if (!bd->file) {
+		logger_err(cfc->log, "bind create: fd must be set");
+		return -EINVAL;
+	}
+	if (bd->recursive && !bd->detached) {
+		logger_err(cfc->log, "bind create: recursive cannot be set without detached");
+		return -EINVAL;
+	}
+
+	p = &bd->file->f_path;
+
+	if (bd->detached)
+		f = open_detached_copy(p, bd->recursive);
+	else
+		f = dentry_open(p, O_PATH, current_cred());
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+
+	bd->retfile = f;
+	return 0;
+}
+
+static int bind_act(const struct configfd_context *cfc, unsigned int cmd)
+{
+	switch (cmd) {
+	case CONFIGFD_CMD_RECONFIGURE:
+		return bind_reconfigure(cfc);
+	case CONFIGFD_CMD_CREATE:
+		return bind_create(cfc);
+	default:
+		logger_err(cfc->log, "bind only responds to reconfigure or create actions");
+		return -EINVAL;
+	}
+}
+
+static void bind_free(const struct configfd_context *cfc)
+{
+	struct bind_data *bd = to_bind_data(cfc);
+
+	if (bd->file)
+		fput(bd->file);
+}
+
+static struct configfd_ops bind_type_ops = {
+	.free = bind_free,
+	.get = bind_get,
+	.set = bind_set,
+	.act = bind_act,
+};
+
+static struct configfd_type bind_type = {
+	.name		= "bind",
+	.ops		= &bind_type_ops,
+	.data_size	= sizeof(struct bind_data),
+};
+
+static int __init bind_setup(void)
+{
+	configfd_type_register(&bind_type);
+
+	return 0;
+}
+fs_initcall(bind_setup);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-11-27 21:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-21  0:11 Feature bug with the new mount API: no way of doing read only bind mounts James Bottomley
2019-11-21  8:10 ` David Howells
2019-11-21 14:54   ` Christian Brauner
2019-11-21 15:00   ` James Bottomley
2019-11-21 15:04   ` David Howells
2019-11-21 16:15   ` James Bottomley
2019-11-27 20:32     ` James Bottomley
2019-11-27 20:33       ` [RFC 1/6] logger: add a limited buffer logging facility James Bottomley
2019-11-27 20:34       ` [RFC 2/6] configfd: add generic file descriptor based configuration parser James Bottomley
2019-11-27 20:35       ` [RFC 3/6] configfd: syscall: wire up configfd syscalls James Bottomley
2019-11-27 20:36       ` [RFC 4/6] fs: implement fsconfig via configfd James Bottomley
2019-11-27 20:37       ` [RFC 5/6] fs: expose internal interfaces open_detached_copy and do_reconfigure_mount James Bottomley
2019-11-27 20:38       ` [RFC 6/6] fs: bind: add configfs type for bind mounts James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).