linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
@ 2020-02-18 17:04 David Howells
  2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
                   ` (23 more replies)
  0 siblings, 24 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:04 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel


Here are a set of patches that adds system calls, that (a) allow
information about the VFS, mount topology, superblock and files to be
retrieved and (b) allow for notifications of mount topology rearrangement
events, mount and superblock attribute changes and other superblock events,
such as errors.

============================
FILESYSTEM INFORMATION QUERY
============================

The first system call, fsinfo(), allows information about the filesystem at
a particular path point to be queried as a set of attributes, some of which
may have more than one value.

Attribute values are of four basic types:

 (1) Version dependent-length structure (size defined by type).

 (2) Variable-length string (up to 4096, including NUL).

 (3) List of structures (up to INT_MAX size).

 (4) Opaque blob (up to INT_MAX size).

Attributes can have multiple values either as a sequence of values or a
sequence-of-sequences of values and all the values of a particular
attribute must be of the same type.

Note that the values of an attribute *are* allowed to vary between dentries
within a single superblock, depending on the specific dentry that you're
looking at, but all the values of an attribute have to be of the same type.

I've tried to make the interface as light as possible, so integer/enum
attribute selector rather than string and the core does all the allocation
and extensibility support work rather than leaving that to the filesystems.
That means that for the first two attribute types, the filesystem will
always see a sufficiently-sized buffer allocated.  Further, this removes
the possibility of the filesystem gaining access to the userspace buffer.


fsinfo() allows a variety of information to be retrieved about a filesystem
and the mount topology:

 (1) General superblock attributes:

     - Filesystem identifiers (UUID, volume label, device numbers, ...)
     - The limits on a filesystem's capabilities
     - Information on supported statx fields and attributes and IOC flags.
     - A variety single-bit flags indicating supported capabilities.
     - Timestamp resolution and range.
     - The amount of space/free space in a filesystem (as statfs()).
     - Superblock notification counter.

 (2) Filesystem-specific superblock attributes:

     - Superblock-level timestamps.
     - Cell name.
     - Server names and addresses.
     - Filesystem-specific information.

 (3) VFS information:

     - Mount topology information.
     - Mount attributes.
     - Mount notification counter.

 (4) Information about what the fsinfo() syscall itself supports, including
     the type and struct/element size of attributes.

The system is extensible:

 (1) New attributes can be added.  There is no requirement that a
     filesystem implement every attribute.  Note that the core VFS keeps a
     table of types and sizes so it can handle future extensibility rather
     than delegating this to the filesystems.

 (2) Version length-dependent structure attributes can be made larger and
     have additional information tacked on the end, provided it keeps the
     layout of the existing fields.  If an older process asks for a shorter
     structure, it will only be given the bits it asks for.  If a newer
     process asks for a longer structure on an older kernel, the extra
     space will be set to 0.  In all cases, the size of the data actually
     available is returned.

     In essence, the size of a structure is that structure's version: a
     smaller size is an earlier version and a later version includes
     everything that the earlier version did.

 (3) New single-bit capability flags can be added.  This is a structure-typed
     attribute and, as such, (2) applies.  Any bits you wanted but the kernel
     doesn't support are automatically set to 0.

fsinfo() may be called like the following, for example:

	struct fsinfo_params params = {
		.at_flags	= AT_SYMLINK_NOFOLLOW,
		.flags		= FSINFO_FLAGS_QUERY_PATH,
		.request	= FSINFO_ATTR_AFS_SERVER_ADDRESSES,
		.Nth		= 2,
	};
	struct fsinfo_server_address address;
	len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
		     &address, sizeof(address));

The above example would query an AFS filesystem to retrieve the address
list for the 3rd server, and:

	struct fsinfo_params params = {
		.at_flags	= AT_SYMLINK_NOFOLLOW,
		.flags		= FSINFO_FLAGS_QUERY_PATH,
		.request	= FSINFO_ATTR_AFS_CELL_NAME;
	};
	char cell_name[256];
	len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
		     &cell_name, sizeof(cell_name));

would retrieve the name of an AFS cell as a string.

In future, I want to make fsinfo() capable of querying a context created by
fsopen() or fspick(), e.g.:

	fd = fsopen("ext4", 0);
	struct fsinfo_params params = {
		.flags		= FSINFO_FLAGS_QUERY_FSCONTEXT,
		.request	= FSINFO_ATTR_PARAMETERS;
	};
	char buffer[65536];
	fsinfo(fd, NULL, &params, &buffer, sizeof(buffer));

even if that context doesn't currently have a superblock attached.  I would
prefer this to contain length-prefixed strings so that there's no need to
insert escaping, especially as any character, including '\', can be used as
the separator in cifs and so that binary parameters can be returned (though
that is a lesser issue).


========================
FILESYSTEM NOTIFICATIONS
========================

The second system call, watch_mount(), places a watch on a point in the
mount topology specified by the dirfd, path and at_flags parameters.  All
mount topology change and mount attribute change notifications in the
subtree rooted at that point can be intercepted by the watch.  Watches are
ducted through pipes:

	int fd[2];
	pipe2(fd, O_NOTIFICATION_PIPE);
	ioctl(fd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
	watch_mount(AT_FDCWD, "/", 0, fd[0], 0x02);

Events include:

 - New mount made
 - Mount unmounted
 - Mount expired
 - R/O state changed
 - Other attribute changed
 - Mount moved from
 - Mount moved to

Using filtering, this may be limited in various ways (single mount watch vs
subtree watch, recursive vs non-recursive changes, to-R/O vs to-R/W, mount
vs submount).

Each mount now has a change counter.  Whenever a mount is changed, this
gets incremented.  It can be queried by fsinfo() using either
FSINFO_ATTR_MOUNT_INFO or FSINFO_ATTR_MOUNT_CHILDREN.  The ID of the mount
on which the notification is generated is placed into the notification
message (triggered_on).  If the event involves a second mount as well, such
as creation of a new mount, that gets returned too (changed_mount).


The third system call, watch_sb(), places a watch on the superblock
specified by the dirfd, path and at_flags parameters.  This allows various
superblock events to be monitored for, such as:

 - Transition between R/W and R/O
 - Filesystem errors
 - Quota overrun
 - Network status changes

Each superblock now gets a 64-bit unique superblock identifier and a
notification counter.  The counter is incremented each time one of these
notifications would be generated.  This attributes can be queried using
fsinfo() with FSINFO_ATTR_SB_NOTIFICATIONS.  The identifier is placed into
notification messages.


Two sample programs are provided, one to query filesystem attributes and
the other to display a mount subtree.  Both of them can be given a path or
a mount ID to start at.  Further, the watch_test sample program now watches
for mount events under "/" and for superblock events on whatever superblock
is backing "/mnt" when it the program is started.

The patches can be found here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git

on branch:

	fsinfo-core


===================
SIGNIFICANT CHANGES
===================

 ver #16:

 (*) Split the features bits out of the fsinfo() core into their own patch
     and got rid of the name encoding attributes.

 (*) Renamed the 'array' type to 'list' and made AFS use it for returning
     server address lists.

 (*) Changed the ->fsinfo() method into an ->fsinfo_attributes[] table,
     where each attribute has a ->get() method to deal with it.  These
     tables can then be returned with an fsinfo meta attribute.

 (*) Dropped the fscontext query and parameter/description retrieval
     attributes for now.

 (*) Picked the mount topology attributes into this branch.

 (*) Picked the mount notifications into this branch and rebased on top of
     notifications-pipe-core.

 (*) Picked the superblock notifications into this branch.

 (*) Add sample code for Ext4 and NFS.

David
---
David Howells (19):
      vfs: syscall: Add fsinfo() to query filesystem information
      fsinfo: Add syscalls to other arches
      fsinfo: Provide a bitmap of supported features
      vfs: Add mount change counter
      vfs: Introduce a non-repeating system-unique superblock ID
      vfs: Allow fsinfo() to look up a mount object by ID
      vfs: Allow mount information to be queried by fsinfo()
      vfs: fsinfo sample: Mount listing program
      fsinfo: Allow the mount topology propogation flags to be retrieved
      fsinfo: Add API documentation
      afs: Support fsinfo()
      security: Add hooks to rule on setting a superblock or mount watch
      vfs: Add a mount-notification facility
      notifications: sample: Display mount tree change notifications
      vfs: Add superblock notifications
      fsinfo: Provide superblock notification counter
      notifications: sample: Display superblock notifications
      ext4: Add example fsinfo information
      nfs: Add example filesystem information


 Documentation/filesystems/fsinfo.rst        |  490 +++++++++++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |    3 
 arch/arm/tools/syscall.tbl                  |    3 
 arch/arm64/include/asm/unistd.h             |    2 
 arch/arm64/include/asm/unistd32.h           |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    3 
 arch/m68k/kernel/syscalls/syscall.tbl       |    4 
 arch/microblaze/kernel/syscalls/syscall.tbl |    3 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    3 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    3 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    3 
 arch/parisc/kernel/syscalls/syscall.tbl     |    3 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    3 
 arch/s390/kernel/syscalls/syscall.tbl       |    3 
 arch/sh/kernel/syscalls/syscall.tbl         |    3 
 arch/sparc/kernel/syscalls/syscall.tbl      |    3 
 arch/x86/entry/syscalls/syscall_32.tbl      |    3 
 arch/x86/entry/syscalls/syscall_64.tbl      |    3 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    3 
 fs/Kconfig                                  |   28 +
 fs/Makefile                                 |    2 
 fs/afs/internal.h                           |    1 
 fs/afs/super.c                              |  229 +++++++
 fs/d_path.c                                 |    2 
 fs/ext4/Makefile                            |    1 
 fs/ext4/ext4.h                              |    9 
 fs/ext4/fsinfo.c                            |   40 +
 fs/ext4/super.c                             |    1 
 fs/fsinfo.c                                 |  635 ++++++++++++++++++++
 fs/internal.h                               |   12 
 fs/mount.h                                  |   31 +
 fs/mount_notify.c                           |  188 ++++++
 fs/namespace.c                              |  323 ++++++++++
 fs/nfs/Makefile                             |    1 
 fs/nfs/internal.h                           |    8 
 fs/nfs/nfs4super.c                          |    1 
 fs/nfs/super.c                              |    1 
 fs/super.c                                  |  149 +++++
 include/linux/dcache.h                      |    1 
 include/linux/fs.h                          |   89 +++
 include/linux/fsinfo.h                      |  102 +++
 include/linux/lsm_hooks.h                   |   24 +
 include/linux/security.h                    |   16 +
 include/linux/syscalls.h                    |    8 
 include/uapi/asm-generic/unistd.h           |    8 
 include/uapi/linux/fsinfo.h                 |  371 ++++++++++++
 include/uapi/linux/mount.h                  |   10 
 include/uapi/linux/watch_queue.h            |   61 ++
 kernel/sys_ni.c                             |    3 
 samples/vfs/Makefile                        |    7 
 samples/vfs/test-fsinfo.c                   |  858 +++++++++++++++++++++++++++
 samples/vfs/test-mntinfo.c                  |  243 ++++++++
 samples/watch_queue/watch_test.c            |   76 ++
 security/security.c                         |   14 
 54 files changed, 4081 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/filesystems/fsinfo.rst
 create mode 100644 fs/ext4/fsinfo.c
 create mode 100644 fs/fsinfo.c
 create mode 100644 fs/mount_notify.c
 create mode 100644 include/linux/fsinfo.h
 create mode 100644 include/uapi/linux/fsinfo.h
 create mode 100644 samples/vfs/test-fsinfo.c
 create mode 100644 samples/vfs/test-mntinfo.c



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-19 16:31   ` Darrick J. Wong
                     ` (3 more replies)
  2020-02-18 17:05 ` [PATCH 02/19] fsinfo: Add syscalls to other arches " David Howells
                   ` (22 subsequent siblings)
  23 siblings, 4 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add a system call to allow filesystem information to be queried.  A request
value can be given to indicate the desired attribute.  Support is provided
for enumerating multi-value attributes.

===============
NEW SYSTEM CALL
===============

The new system call looks like:

	int ret = fsinfo(int dfd,
			 const char *filename,
			 const struct fsinfo_params *params,
			 void *buffer,
			 size_t buf_size);

The params parameter optionally points to a block of parameters:

	struct fsinfo_params {
		__u32	at_flags;
		__u32	request;
		__u32	Nth;
		__u32	Mth;
		__u64	__reserved[3];
	};

If params is NULL, it is assumed params->request should be
fsinfo_attr_statfs, params->Nth should be 0, params->Mth should be 0 and
params->at_flags should be 0.

If params is given, all of params->__reserved[] must be 0.

dfd, filename and params->at_flags indicate the file to query.  There is no
equivalent of lstat() as that can be emulated with fsinfo() by setting
AT_SYMLINK_NOFOLLOW in params->at_flags.  There is also no equivalent of
fstat() as that can be emulated by passing a NULL filename to fsinfo() with
the fd of interest in dfd.  AT_NO_AUTOMOUNT can also be used to an allow
automount point to be queried without triggering it.

params->request indicates the attribute/attributes to be queried.  This can
be one of:

	FSINFO_ATTR_STATFS		- statfs-style info
	FSINFO_ATTR_IDS			- Filesystem IDs
	FSINFO_ATTR_LIMITS		- Filesystem limits
	FSINFO_ATTR_SUPPORTS		- What's supported in statx(), IOC flags
	FSINFO_ATTR_TIMESTAMP_INFO	- Inode timestamp info
	FSINFO_ATTR_VOLUME_ID		- Volume ID (string)
	FSINFO_ATTR_VOLUME_UUID		- Volume UUID
	FSINFO_ATTR_VOLUME_NAME		- Volume name (string)
	FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO - Information about attr Nth
	FSINFO_ATTR_FSINFO_ATTRIBUTES	- List of supported attrs

Some attributes (such as the servers backing a network filesystem) can have
multiple values.  These can be enumerated by setting params->Nth and
params->Mth to 0, 1, ... until ENODATA is returned.

buffer and buf_size point to the reply buffer.  The buffer is filled up to
the specified size, even if this means truncating the reply.  The full size
of the reply is returned.  In future versions, this will allow extra fields
to be tacked on to the end of the reply, but anyone not expecting them will
only get the subset they're expecting.  If either buffer of buf_size are 0,
no copy will take place and the data size will be returned.

At the moment, this will only work on x86_64 and i386 as it requires the
system call to be wired up.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
---

 arch/x86/entry/syscalls/syscall_32.tbl |    1 
 arch/x86/entry/syscalls/syscall_64.tbl |    1 
 fs/Kconfig                             |    7 
 fs/Makefile                            |    1 
 fs/fsinfo.c                            |  525 ++++++++++++++++++++++++++++
 include/linux/fs.h                     |    5 
 include/linux/fsinfo.h                 |   70 ++++
 include/linux/syscalls.h               |    4 
 include/uapi/asm-generic/unistd.h      |    4 
 include/uapi/linux/fsinfo.h            |  186 ++++++++++
 kernel/sys_ni.c                        |    1 
 samples/vfs/Makefile                   |    5 
 samples/vfs/test-fsinfo.c              |  599 ++++++++++++++++++++++++++++++++
 13 files changed, 1408 insertions(+), 1 deletion(-)
 create mode 100644 fs/fsinfo.c
 create mode 100644 include/linux/fsinfo.h
 create mode 100644 include/uapi/linux/fsinfo.h
 create mode 100644 samples/vfs/test-fsinfo.c

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index c17cb77eb150..b7817acb154b 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -442,3 +442,4 @@
 435	i386	clone3			sys_clone3			__ia32_sys_clone3
 437	i386	openat2			sys_openat2			__ia32_sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd			__ia32_sys_pidfd_getfd
+439	i386	fsinfo			sys_fsinfo			__ia32_sys_fsinfo
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 44d510bc9b78..3a45ed6d28cb 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -359,6 +359,7 @@
 435	common	clone3			__x64_sys_clone3/ptregs
 437	common	openat2			__x64_sys_openat2
 438	common	pidfd_getfd		__x64_sys_pidfd_getfd
+439	common	fsinfo			__x64_sys_fsinfo
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/Kconfig b/fs/Kconfig
index 708ba336e689..1d1b48059ec9 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -15,6 +15,13 @@ config VALIDATE_FS_PARSER
 	  Enable this to perform validation of the parameter description for a
 	  filesystem when it is registered.
 
+config FSINFO
+	bool "Enable the fsinfo() system call"
+	help
+	  Enable the file system information querying system call to allow
+	  comprehensive information to be retrieved about a filesystem,
+	  superblock or mount object.
+
 if BLOCK
 
 config FS_IOMAP
diff --git a/fs/Makefile b/fs/Makefile
index 505e51166973..b5cc9bcd17a4 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_COREDUMP)		+= coredump.o
 obj-$(CONFIG_SYSCTL)		+= drop_caches.o
 
 obj-$(CONFIG_FHANDLE)		+= fhandle.o
+obj-$(CONFIG_FSINFO)		+= fsinfo.o
 obj-y				+= iomap/
 
 obj-y				+= quota/
diff --git a/fs/fsinfo.c b/fs/fsinfo.c
new file mode 100644
index 000000000000..3bc35b91f20b
--- /dev/null
+++ b/fs/fsinfo.c
@@ -0,0 +1,525 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information query.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#include <linux/syscalls.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/statfs.h>
+#include <linux/security.h>
+#include <linux/uaccess.h>
+#include <linux/fsinfo.h>
+#include <uapi/linux/mount.h>
+#include "internal.h"
+
+static const struct fsinfo_attribute fsinfo_common_attributes[];
+
+/**
+ * fsinfo_string - Store a string as an fsinfo attribute value.
+ * @s: The string to store (may be NULL)
+ * @ctx: The parameter context
+ */
+int fsinfo_string(const char *s, struct fsinfo_context *ctx)
+{
+	int ret = 0;
+
+	if (s) {
+		ret = strlen(s);
+		memcpy(ctx->buffer, s, ret);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(fsinfo_string);
+
+/*
+ * Get basic filesystem stats from statfs.
+ */
+static int fsinfo_generic_statfs(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_statfs *p = ctx->buffer;
+	struct kstatfs buf;
+	int ret;
+
+	ret = vfs_statfs(path, &buf);
+	if (ret < 0)
+		return ret;
+
+	p->f_blocks.hi	= 0;
+	p->f_blocks.lo	= buf.f_blocks;
+	p->f_bfree.hi	= 0;
+	p->f_bfree.lo	= buf.f_bfree;
+	p->f_bavail.hi	= 0;
+	p->f_bavail.lo	= buf.f_bavail;
+	p->f_files.hi	= 0;
+	p->f_files.lo	= buf.f_files;
+	p->f_ffree.hi	= 0;
+	p->f_ffree.lo	= buf.f_ffree;
+	p->f_favail.hi	= 0;
+	p->f_favail.lo	= buf.f_ffree;
+	p->f_bsize	= buf.f_bsize;
+	p->f_frsize	= buf.f_frsize;
+	return sizeof(*p);
+}
+
+static int fsinfo_generic_ids(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_ids *p = ctx->buffer;
+	struct super_block *sb;
+	struct kstatfs buf;
+	int ret;
+
+	ret = vfs_statfs(path, &buf);
+	if (ret < 0 && ret != -ENOSYS)
+		return ret;
+
+	sb = path->dentry->d_sb;
+	p->f_fstype	= sb->s_magic;
+	p->f_dev_major	= MAJOR(sb->s_dev);
+	p->f_dev_minor	= MINOR(sb->s_dev);
+
+	memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
+	strlcpy(p->f_fs_name, path->dentry->d_sb->s_type->name,
+		sizeof(p->f_fs_name));
+	return sizeof(*p);
+}
+
+static int fsinfo_generic_limits(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_limits *lim = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	lim->max_file_size.hi	= 0;
+	lim->max_file_size.lo	= sb->s_maxbytes;
+	lim->max_ino.hi		= 0;
+	lim->max_ino.lo		= UINT_MAX;
+	lim->max_hard_links	= sb->s_max_links;
+	lim->max_uid		= UINT_MAX;
+	lim->max_gid		= UINT_MAX;
+	lim->max_projid		= UINT_MAX;
+	lim->max_filename_len	= NAME_MAX;
+	lim->max_symlink_len	= PAGE_SIZE;
+	lim->max_xattr_name_len	= XATTR_NAME_MAX;
+	lim->max_xattr_body_len	= XATTR_SIZE_MAX;
+	lim->max_dev_major	= 0xffffff;
+	lim->max_dev_minor	= 0xff;
+	return sizeof(*lim);
+}
+
+static int fsinfo_generic_supports(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_supports *c = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	c->stx_mask = STATX_BASIC_STATS;
+	if (sb->s_d_op && sb->s_d_op->d_automount)
+		c->stx_attributes |= STATX_ATTR_AUTOMOUNT;
+	return sizeof(*c);
+}
+
+static const struct fsinfo_timestamp_info fsinfo_default_timestamp_info = {
+	.atime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.mtime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.ctime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.btime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+};
+
+static int fsinfo_generic_timestamp_info(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_timestamp_info *ts = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+	s8 exponent;
+
+	*ts = fsinfo_default_timestamp_info;
+
+	if (sb->s_time_gran < 1000000000) {
+		if (sb->s_time_gran < 1000)
+			exponent = -9;
+		else if (sb->s_time_gran < 1000000)
+			exponent = -6;
+		else
+			exponent = -3;
+
+		ts->atime.gran_exponent = exponent;
+		ts->mtime.gran_exponent = exponent;
+		ts->ctime.gran_exponent = exponent;
+		ts->btime.gran_exponent = exponent;
+	}
+
+	return sizeof(*ts);
+}
+
+static int fsinfo_generic_volume_uuid(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_volume_uuid *vu = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	memcpy(vu, &sb->s_uuid, sizeof(*vu));
+	return sizeof(*vu);
+}
+
+static int fsinfo_generic_volume_id(struct path *path, struct fsinfo_context *ctx)
+{
+	return fsinfo_string(path->dentry->d_sb->s_id, ctx);
+}
+
+static int fsinfo_attribute_info(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct fsinfo_attribute *attr;
+	struct fsinfo_attribute_info *info = ctx->buffer;
+	struct dentry *dentry = path->dentry;
+
+	if (dentry->d_sb->s_op->fsinfo_attributes)
+		for (attr = dentry->d_sb->s_op->fsinfo_attributes; attr->get; attr++)
+			if (attr->attr_id == ctx->Nth)
+				goto found;
+	for (attr = fsinfo_common_attributes; attr->get; attr++)
+		if (attr->attr_id == ctx->Nth)
+			goto found;
+	return -ENODATA;
+
+found:
+	info->attr_id		= attr->attr_id;
+	info->type		= attr->type;
+	info->flags		= attr->flags;
+	info->size		= attr->size;
+	info->element_size	= attr->element_size;
+	return sizeof(*attr);
+}
+
+static void fsinfo_attributes_insert(struct fsinfo_context *ctx,
+				     const struct fsinfo_attribute *attr)
+{
+	__u32 *buffer = ctx->buffer;
+	unsigned int i;
+
+	if (ctx->usage > ctx->buf_size ||
+	    ctx->buf_size - ctx->usage < sizeof(__u32)) {
+		ctx->usage += sizeof(__u32);
+		return;
+	}
+
+	for (i = 0; i < ctx->usage / sizeof(__u32); i++)
+		if (buffer[i] == attr->attr_id)
+			return;
+
+	buffer[i] = attr->attr_id;
+	ctx->usage += sizeof(__u32);
+}
+
+static int fsinfo_attributes(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct fsinfo_attribute *attr;
+	struct super_block *sb = path->dentry->d_sb;
+
+	if (sb->s_op->fsinfo_attributes)
+		for (attr = sb->s_op->fsinfo_attributes; attr->get; attr++)
+			fsinfo_attributes_insert(ctx, attr);
+	for (attr = fsinfo_common_attributes; attr->get; attr++)
+		fsinfo_attributes_insert(ctx, attr);
+	return ctx->usage;
+}
+
+static const struct fsinfo_attribute fsinfo_common_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
+	FSINFO_STRING 	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
+	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_attribute_info),
+	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_attributes),
+	{}
+};
+
+/*
+ * Retrieve large filesystem information, such as an opaque blob or array of
+ * struct elements where the value isn't limited to the size of a page.
+ */
+static int vfs_fsinfo_large(struct path *path, struct fsinfo_context *ctx,
+			    const struct fsinfo_attribute *attr)
+{
+	int ret;
+
+	while (!signal_pending(current)) {
+		ctx->usage = 0;
+		ret = attr->get(path, ctx);
+		if (IS_ERR_VALUE((long)ret))
+			return ret; /* Error */
+		if ((unsigned int)ret <= ctx->buf_size)
+			return ret; /* It fitted */
+
+		/* We need to resize the buffer */
+		kvfree(ctx->buffer);
+		ctx->buffer = NULL;
+		ctx->buf_size = roundup(ret, PAGE_SIZE);
+		if (ctx->buf_size > INT_MAX)
+			return -EMSGSIZE;
+		ctx->buffer = kvmalloc(ctx->buf_size, GFP_KERNEL);
+		if (!ctx->buffer)
+			return -ENOMEM;
+	}
+
+	return -ERESTARTSYS;
+}
+
+static int vfs_do_fsinfo(struct path *path, struct fsinfo_context *ctx,
+			 const struct fsinfo_attribute *attr)
+{
+	if (ctx->Nth != 0 && !(attr->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)))
+		return -ENODATA;
+	if (ctx->Mth != 0 && !(attr->flags & FSINFO_FLAGS_NM))
+		return -ENODATA;
+
+	ctx->buf_size = attr->size;
+	if (ctx->want_size_only && attr->type == FSINFO_TYPE_VSTRUCT)
+		return attr->size;
+
+	ctx->buffer = kvzalloc(ctx->buf_size, GFP_KERNEL);
+	if (!ctx->buffer)
+		return -ENOMEM;
+
+	switch (attr->type) {
+	case FSINFO_TYPE_VSTRUCT:
+		ctx->clear_tail = true;
+		/* Fall through */
+	case FSINFO_TYPE_STRING:
+		return attr->get(path, ctx);
+
+	case FSINFO_TYPE_OPAQUE:
+	case FSINFO_TYPE_LIST:
+		return vfs_fsinfo_large(path, ctx, attr);
+
+	default:
+		return -ENOPKG;
+	}
+}
+
+/**
+ * vfs_fsinfo - Retrieve filesystem information
+ * @path: The object to query
+ * @params: Parameters to define a request and place to store result
+ *
+ * Get an attribute on a filesystem or an object within a filesystem.  The
+ * filesystem attribute to be queried is indicated by @ctx->requested_attr, and
+ * if it's a multi-valued attribute, the particular value is selected by
+ * @ctx->Nth and then @ctx->Mth.
+ *
+ * For common attributes, a value may be fabricated if it is not supported by
+ * the filesystem.
+ *
+ * On success, the size of the attribute's value is returned (0 is a valid
+ * size).  A buffer will have been allocated and will be pointed to by
+ * @ctx->buffer.  The caller must free this with kvfree().
+ *
+ * Errors can also be returned: -ENOMEM if a buffer cannot be allocated, -EPERM
+ * or -EACCES if permission is denied by the LSM, -EOPNOTSUPP if an attribute
+ * doesn't exist for the specified object or -ENODATA if the attribute exists,
+ * but the Nth,Mth value does not exist.  -EMSGSIZE indicates that the value is
+ * unmanageable internally and -ENOPKG indicates other internal failure.
+ *
+ * Errors such as -EIO may also come from attempts to access media or servers
+ * to obtain the requested information if it's not immediately to hand.
+ *
+ * [*] Note that the caller may set @ctx->want_size_only if it only wants the
+ *     size of the value and not the data.  If this is set, a buffer may not be
+ *     allocated under some circumstances.  This is intended for size query by
+ *     userspace.
+ *
+ * [*] Note that @ctx->clear_tail will be returned set if the data should be
+ *     padded out with zeros when writing it to userspace.
+ */
+static int vfs_fsinfo(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct fsinfo_attribute *attr;
+	struct dentry *dentry = path->dentry;
+	int ret;
+
+	if (dentry->d_sb->s_op->fsinfo_attributes)
+		for (attr = dentry->d_sb->s_op->fsinfo_attributes; attr->get; attr++)
+			if (attr->attr_id == ctx->requested_attr)
+				goto found;
+	for (attr = fsinfo_common_attributes; attr->get; attr++)
+		if (attr->attr_id == ctx->requested_attr)
+			goto found;
+	return -EOPNOTSUPP;
+
+found:
+	ret = security_sb_statfs(dentry);
+	if (ret)
+		return ret;
+
+	return vfs_do_fsinfo(path, ctx, attr);
+}
+
+static int vfs_fsinfo_path(int dfd, const char __user *pathname,
+			   unsigned int at_flags, struct fsinfo_context *ctx)
+{
+	struct path path;
+	unsigned lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+	int ret = -EINVAL;
+
+	if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
+			  AT_EMPTY_PATH)) != 0)
+		return -EINVAL;
+
+	if (at_flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (at_flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (at_flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+retry:
+	ret = user_path_at(dfd, pathname, lookup_flags, &path);
+	if (ret)
+		goto out;
+
+	ret = vfs_fsinfo(&path, ctx);
+	path_put(&path);
+	if (retry_estale(ret, lookup_flags)) {
+		lookup_flags |= LOOKUP_REVAL;
+		goto retry;
+	}
+out:
+	return ret;
+}
+
+static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_context *ctx)
+{
+	struct fd f = fdget_raw(fd);
+	int ret = -EBADF;
+
+	if (f.file) {
+		ret = vfs_fsinfo(&f.file->f_path, ctx);
+		fdput(f);
+	}
+	return ret;
+}
+
+/**
+ * sys_fsinfo - System call to get filesystem information
+ * @dfd: Base directory to pathwalk from or fd referring to filesystem.
+ * @pathname: Filesystem to query or NULL.
+ * @_params: Parameters to define request (or NULL for enhanced statfs).
+ * @user_buffer: Result buffer.
+ * @user_buf_size: Size of result buffer.
+ *
+ * Get information on a filesystem.  The filesystem attribute to be queried is
+ * indicated by @_params->request, and some of the attributes can have multiple
+ * values, indexed by @_params->Nth and @_params->Mth.  If @_params is NULL,
+ * then the 0th fsinfo_attr_statfs attribute is queried.  If an attribute does
+ * not exist, EOPNOTSUPP is returned; if the Nth,Mth value does not exist,
+ * ENODATA is returned.
+ *
+ * On success, the size of the attribute's value is returned.  If
+ * @user_buf_size is 0 or @user_buffer is NULL, only the size is returned.  If
+ * the size of the value is larger than @user_buf_size, it will be truncated by
+ * the copy.  If the size of the value is smaller than @user_buf_size then the
+ * excess buffer space will be cleared.  The full size of the value will be
+ * returned, irrespective of how much data is actually placed in the buffer.
+ */
+SYSCALL_DEFINE5(fsinfo,
+		int, dfd, const char __user *, pathname,
+		struct fsinfo_params __user *, params,
+		void __user *, user_buffer, size_t, user_buf_size)
+{
+	struct fsinfo_context ctx;
+	struct fsinfo_params user_params;
+	unsigned int at_flags = 0, result_size;
+	int ret;
+
+	if (!user_buffer && user_buf_size)
+		return -EINVAL;
+	if (user_buffer && !user_buf_size)
+		return -EINVAL;
+	if (user_buf_size > UINT_MAX)
+		return -EOVERFLOW;
+
+	memset(&ctx, 0, sizeof(ctx));
+	ctx.requested_attr = FSINFO_ATTR_STATFS;
+	if (user_buf_size == 0)
+		ctx.want_size_only = true;
+
+	if (params) {
+		if (copy_from_user(&user_params, params, sizeof(user_params)))
+			return -EFAULT;
+		if (user_params.__reserved32[0] ||
+		    user_params.__reserved[0] ||
+		    user_params.__reserved[1] ||
+		    user_params.__reserved[2] ||
+		    user_params.flags & ~FSINFO_FLAGS_QUERY_TYPE)
+			return -EINVAL;
+		at_flags = user_params.at_flags;
+		ctx.flags = user_params.flags;
+		ctx.requested_attr = user_params.request;
+		ctx.Nth = user_params.Nth;
+		ctx.Mth = user_params.Mth;
+	}
+
+	switch (ctx.flags & FSINFO_FLAGS_QUERY_TYPE) {
+	case FSINFO_FLAGS_QUERY_PATH:
+		ret = vfs_fsinfo_path(dfd, pathname, at_flags, &ctx);
+		break;
+	case FSINFO_FLAGS_QUERY_FD:
+		if (pathname)
+			return -EINVAL;
+		ret = vfs_fsinfo_fd(dfd, &ctx);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (ret < 0)
+		goto error;
+
+	result_size = ret;
+	if (result_size > user_buf_size)
+		result_size = user_buf_size;
+
+	if (result_size > 0 &&
+	    copy_to_user(user_buffer, ctx.buffer, result_size) != 0) {
+		ret = -EFAULT;
+		goto error;
+	}
+
+	/* Clear any part of the buffer that we won't fill if we're putting a
+	 * struct in there.  Strings, opaque objects and arrays are expected to
+	 * be variable length.
+	 */
+	if (ctx.clear_tail &&
+	    user_buf_size > result_size &&
+	    clear_user(user_buffer + result_size, user_buf_size - result_size) != 0) {
+		ret = -EFAULT;
+		goto error;
+	}
+
+error:
+	kvfree(ctx.buffer);
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3cd4fe6b845e..f74a4ee36eb3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -68,6 +68,8 @@ struct fsverity_info;
 struct fsverity_operations;
 struct fs_context;
 struct fs_parameter_spec;
+struct fsinfo_kparams;
+struct fsinfo_attribute;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -1954,6 +1956,9 @@ struct super_operations {
 	int (*thaw_super) (struct super_block *);
 	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
+#ifdef CONFIG_FSINFO
+	const struct fsinfo_attribute *fsinfo_attributes;
+#endif
 	int (*remount_fs) (struct super_block *, int *, char *);
 	void (*umount_begin) (struct super_block *);
 
diff --git a/include/linux/fsinfo.h b/include/linux/fsinfo.h
new file mode 100644
index 000000000000..dcd55dbb02fa
--- /dev/null
+++ b/include/linux/fsinfo.h
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information query
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#ifndef _LINUX_FSINFO_H
+#define _LINUX_FSINFO_H
+
+#ifdef CONFIG_FSINFO
+
+#include <uapi/linux/fsinfo.h>
+
+struct path;
+
+#define FSINFO_NORMAL_ATTR_MAX_SIZE 4096
+
+struct fsinfo_context {
+	__u32		flags;		/* [in] FSINFO_FLAGS_* */
+	__u32		requested_attr;	/* [in] What is being asking for */
+	__u32		Nth;		/* [in] Instance of it (some may have multiple) */
+	__u32		Mth;		/* [in] Subinstance */
+	bool		want_size_only;	/* [in] Just want to know the size, not the data */
+	bool		clear_tail;	/* [out] T if tail of buffer should be cleared */
+	unsigned int	usage;		/* [tmp] Amount of buffer used (if large) */
+	unsigned int	buf_size;	/* [tmp] Size of ->buffer[] */
+	void		*buffer;	/* [out] The reply buffer */
+};
+
+/*
+ * A filesystem information attribute definition.
+ */
+struct fsinfo_attribute {
+	unsigned int		attr_id;	/* The ID of the attribute */
+	enum fsinfo_value_type	type:8;		/* The type of the attribute's value(s) */
+	unsigned int		flags:8;
+	unsigned int		size:16;	/* - Value size (FSINFO_STRUCT) */
+	unsigned int		element_size:16; /* - Element size (FSINFO_LIST) */
+	int (*get)(struct path *path, struct fsinfo_context *params);
+};
+
+#define __FSINFO(A, T, S, E, G, F) \
+	{ .attr_id = A, .type = T, .flags = F, .size = S, .element_size = E, .get = G }
+
+#define _FSINFO(A, T, S, E, G)	  __FSINFO(A, T, S, E, G, 0)
+#define _FSINFO_N(A, T, S, E, G)  __FSINFO(A, T, S, E, G, FSINFO_FLAGS_N)
+#define _FSINFO_NM(A, T, S, E, G) __FSINFO(A, T, S, E, G, FSINFO_FLAGS_NM)
+
+#define _FSINFO_VSTRUCT(A,S,G)	  _FSINFO   (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
+#define _FSINFO_VSTRUCT_N(A,S,G)  _FSINFO_N (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
+#define _FSINFO_VSTRUCT_NM(A,S,G) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
+
+#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G)
+#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G)
+#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G)
+#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
+#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
+#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
+#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
+#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, FSINFO_NORMAL_ATTR_MAX_SIZE, \
+					   sizeof(A##__STRUCT), G)
+#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, FSINFO_NORMAL_ATTR_MAX_SIZE, \
+					   sizeof(A##__STRUCT), G)
+
+extern int fsinfo_string(const char *, struct fsinfo_context *);
+
+#endif /* CONFIG_FSINFO */
+
+#endif /* _LINUX_FSINFO_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1815065d52f3..6c3157e46e7c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -47,6 +47,7 @@ struct stat64;
 struct statfs;
 struct statfs64;
 struct statx;
+struct fsinfo_params;
 struct __sysctl_args;
 struct sysinfo;
 struct timespec;
@@ -1003,6 +1004,9 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 				       siginfo_t __user *info,
 				       unsigned int flags);
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
+asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
+			   struct fsinfo_params __user *params,
+			   void __user *buffer, size_t buf_size);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..9d00098a3f1b 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3)
 __SYSCALL(__NR_openat2, sys_openat2)
 #define __NR_pidfd_getfd 438
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
+#define __NR_fsinfo 439
+__SYSCALL(__NR_fsinfo, sys_fsinfo)
 
 #undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 440
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
new file mode 100644
index 000000000000..365d54fe9290
--- /dev/null
+++ b/include/uapi/linux/fsinfo.h
@@ -0,0 +1,186 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* fsinfo() definitions.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#ifndef _UAPI_LINUX_FSINFO_H
+#define _UAPI_LINUX_FSINFO_H
+
+#include <linux/types.h>
+#include <linux/socket.h>
+
+/*
+ * The filesystem attributes that can be requested.  Note that some attributes
+ * may have multiple instances which can be switched in the parameter block.
+ */
+#define FSINFO_ATTR_STATFS		0x00	/* statfs()-style state */
+#define FSINFO_ATTR_IDS			0x01	/* Filesystem IDs */
+#define FSINFO_ATTR_LIMITS		0x02	/* Filesystem limits */
+#define FSINFO_ATTR_SUPPORTS		0x03	/* What's supported in statx, iocflags, ... */
+#define FSINFO_ATTR_TIMESTAMP_INFO	0x04	/* Inode timestamp info */
+#define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
+#define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
+#define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
+
+#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
+#define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
+
+/*
+ * Optional fsinfo() parameter structure.
+ *
+ * If this is not given, it is assumed that fsinfo_attr_statfs instance 0,0 is
+ * desired.
+ */
+struct fsinfo_params {
+	__u32	at_flags;	/* AT_SYMLINK_NOFOLLOW and similar flags */
+	__u32	flags;		/* Flags controlling fsinfo() specifically */
+#define FSINFO_FLAGS_QUERY_TYPE	0x0007 /* What object should fsinfo() query? */
+#define FSINFO_FLAGS_QUERY_PATH	0x0000 /* - path, specified by dirfd,pathname,AT_EMPTY_PATH */
+#define FSINFO_FLAGS_QUERY_FD	0x0001 /* - fd specified by dirfd */
+	__u32	request;	/* ID of requested attribute */
+	__u32	Nth;		/* Instance of it (some may have multiple) */
+	__u32	Mth;		/* Subinstance of Nth instance */
+	__u32	__reserved32[1]; /* Reserved params; all must be 0 */
+	__u64	__reserved[3];
+};
+
+enum fsinfo_value_type {
+	FSINFO_TYPE_VSTRUCT	= 0,	/* Version-lengthed struct (up to 4096 bytes) */
+	FSINFO_TYPE_STRING	= 1,	/* NUL-term var-length string (up to 4095 chars) */
+	FSINFO_TYPE_OPAQUE	= 2,	/* Opaque blob (unlimited size) */
+	FSINFO_TYPE_LIST	= 3,	/* List of ints/structs (unlimited size) */
+};
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO).
+ *
+ * This gives information about the attributes supported by fsinfo for the
+ * given path.
+ */
+struct fsinfo_attribute_info {
+	unsigned int		attr_id;	/* The ID of the attribute */
+	enum fsinfo_value_type	type;		/* The type of the attribute's value(s) */
+	unsigned int		flags;
+#define FSINFO_FLAGS_N		0x01		/* - Attr has a set of values */
+#define FSINFO_FLAGS_NM		0x02		/* - Attr has a set of sets of values */
+	unsigned int		size;		/* - Value size (FSINFO_STRUCT) */
+	unsigned int		element_size;	/* - Element size (FSINFO_LIST) */
+};
+
+#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO__STRUCT struct fsinfo_attribute_info
+#define FSINFO_ATTR_FSINFO_ATTRIBUTES__STRUCT __u32
+
+struct fsinfo_u128 {
+#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+	__u64	hi;
+	__u64	lo;
+#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
+	__u64	lo;
+	__u64	hi;
+#endif
+};
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_STATFS).
+ * - This gives extended filesystem information.
+ */
+struct fsinfo_statfs {
+	struct fsinfo_u128 f_blocks;	/* Total number of blocks in fs */
+	struct fsinfo_u128 f_bfree;	/* Total number of free blocks */
+	struct fsinfo_u128 f_bavail;	/* Number of free blocks available to ordinary user */
+	struct fsinfo_u128 f_files;	/* Total number of file nodes in fs */
+	struct fsinfo_u128 f_ffree;	/* Number of free file nodes */
+	struct fsinfo_u128 f_favail;	/* Number of file nodes available to ordinary user */
+	__u64	f_bsize;		/* Optimal block size */
+	__u64	f_frsize;		/* Fragment size */
+};
+
+#define FSINFO_ATTR_STATFS__STRUCT struct fsinfo_statfs
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_IDS).
+ *
+ * List of basic identifiers as is normally found in statfs().
+ */
+struct fsinfo_ids {
+	char	f_fs_name[15 + 1];	/* Filesystem name */
+	__u64	f_fsid;			/* Short 64-bit Filesystem ID (as statfs) */
+	__u64	f_sb_id;		/* Internal superblock ID for sbnotify()/mntnotify() */
+	__u32	f_fstype;		/* Filesystem type from linux/magic.h [uncond] */
+	__u32	f_dev_major;		/* As st_dev_* from struct statx [uncond] */
+	__u32	f_dev_minor;
+	__u32	__reserved[1];
+};
+
+#define FSINFO_ATTR_IDS__STRUCT struct fsinfo_ids
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_LIMITS).
+ *
+ * List of supported filesystem limits.
+ */
+struct fsinfo_limits {
+	struct fsinfo_u128 max_file_size;	/* Maximum file size */
+	struct fsinfo_u128 max_ino;		/* Maximum inode number */
+	__u64	max_uid;			/* Maximum UID supported */
+	__u64	max_gid;			/* Maximum GID supported */
+	__u64	max_projid;			/* Maximum project ID supported */
+	__u64	max_hard_links;			/* Maximum number of hard links on a file */
+	__u64	max_xattr_body_len;		/* Maximum xattr content length */
+	__u32	max_xattr_name_len;		/* Maximum xattr name length */
+	__u32	max_filename_len;		/* Maximum filename length */
+	__u32	max_symlink_len;		/* Maximum symlink content length */
+	__u32	max_dev_major;			/* Maximum device major representable */
+	__u32	max_dev_minor;			/* Maximum device minor representable */
+	__u32	__reserved[1];
+};
+
+#define FSINFO_ATTR_LIMITS__STRUCT struct fsinfo_limits
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_SUPPORTS).
+ *
+ * What's supported in various masks, such as statx() attribute and mask bits
+ * and IOC flags.
+ */
+struct fsinfo_supports {
+	__u64	stx_attributes;		/* What statx::stx_attributes are supported */
+	__u32	stx_mask;		/* What statx::stx_mask bits are supported */
+	__u32	ioc_flags;		/* What FS_IOC_* flags are supported */
+	__u32	win_file_attrs;		/* What DOS/Windows FILE_* attributes are supported */
+	__u32	__reserved[1];
+};
+
+#define FSINFO_ATTR_SUPPORTS__STRUCT struct fsinfo_supports
+
+struct fsinfo_timestamp_one {
+	__s64	minimum;	/* Minimum timestamp value in seconds */
+	__u64	maximum;	/* Maximum timestamp value in seconds */
+	__u16	gran_mantissa;	/* Granularity(secs) = mant * 10^exp */
+	__s8	gran_exponent;
+	__u8	reserved[5];
+};
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_TIMESTAMP_INFO).
+ */
+struct fsinfo_timestamp_info {
+	struct fsinfo_timestamp_one	atime;	/* Access time */
+	struct fsinfo_timestamp_one	mtime;	/* Modification time */
+	struct fsinfo_timestamp_one	ctime;	/* Change time */
+	struct fsinfo_timestamp_one	btime;	/* Birth/creation time */
+};
+
+#define FSINFO_ATTR_TIMESTAMP_INFO__STRUCT struct fsinfo_timestamp_info
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_VOLUME_UUID).
+ */
+struct fsinfo_volume_uuid {
+	__u8	uuid[16];
+};
+
+#define FSINFO_ATTR_VOLUME_UUID__STRUCT struct fsinfo_volume_uuid
+
+#endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..58246e6b5603 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents);
 COND_SYSCALL(io_uring_setup);
 COND_SYSCALL(io_uring_enter);
 COND_SYSCALL(io_uring_register);
+COND_SYSCALL(fsinfo);
 
 /* fs/xattr.c */
 
diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 65acdde5c117..9159ad1d7fc5 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,10 +1,15 @@
 # SPDX-License-Identifier: GPL-2.0-only
 # List of programs to build
+
 hostprogs := \
+	test-fsinfo \
 	test-fsmount \
 	test-statx
 
 always-y := $(hostprogs)
 
+HOSTCFLAGS_test-fsinfo.o += -I$(objtree)/usr/include
+HOSTLDLIBS_test-fsinfo += -static -lm
+
 HOSTCFLAGS_test-fsmount.o += -I$(objtree)/usr/include
 HOSTCFLAGS_test-statx.o += -I$(objtree)/usr/include
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
new file mode 100644
index 000000000000..9a4d49db2996
--- /dev/null
+++ b/samples/vfs/test-fsinfo.c
@@ -0,0 +1,599 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Test the fsinfo() system call
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define _GNU_SOURCE
+#define _ATFILE_SOURCE
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <ctype.h>
+#include <errno.h>
+#include <time.h>
+#include <math.h>
+#include <fcntl.h>
+#include <sys/syscall.h>
+#include <linux/fsinfo.h>
+#include <linux/socket.h>
+#include <sys/stat.h>
+#include <arpa/inet.h>
+
+#ifndef __NR_fsinfo
+#define __NR_fsinfo -1
+#endif
+
+static bool debug = 0;
+
+static __attribute__((unused))
+ssize_t fsinfo(int dfd, const char *filename, struct fsinfo_params *params,
+	       void *buffer, size_t buf_size)
+{
+	return syscall(__NR_fsinfo, dfd, filename, params, buffer, buf_size);
+}
+
+struct fsinfo_attribute {
+	unsigned int		attr_id;
+	enum fsinfo_value_type	type;
+	unsigned int		size;
+	const char		*name;
+	void (*dump)(void *reply, unsigned int size);
+};
+
+static const struct fsinfo_attribute fsinfo_attributes[];
+
+static void dump_hex(unsigned int *data, int from, int to)
+{
+	unsigned offset, print_offset = 1, col = 0;
+
+	from /= 4;
+	to = (to + 3) / 4;
+
+	for (offset = from; offset < to; offset++) {
+		if (print_offset) {
+			printf("%04x: ", offset * 8);
+			print_offset = 0;
+		}
+		printf("%08x", data[offset]);
+		col++;
+		if ((col & 3) == 0) {
+			printf("\n");
+			print_offset = 1;
+		} else {
+			printf(" ");
+		}
+	}
+
+	if (!print_offset)
+		printf("\n");
+}
+
+static void dump_attribute_info(void *reply, unsigned int size)
+{
+	struct fsinfo_attribute_info *attr_info = reply;
+	const struct fsinfo_attribute *attr;
+	char type[32];
+
+	switch (attr_info->type) {
+	case FSINFO_TYPE_VSTRUCT:	strcpy(type, "V-STRUCT");	break;
+	case FSINFO_TYPE_STRING:	strcpy(type, "STRING");		break;
+	case FSINFO_TYPE_OPAQUE:	strcpy(type, "OPAQUE");		break;
+	case FSINFO_TYPE_LIST:		strcpy(type, "LIST");		break;
+	default:
+		sprintf(type, "type-%x", attr_info->type);
+		break;
+	}
+
+	if (attr_info->flags & FSINFO_FLAGS_N)
+		strcat(type, " x N");
+	else if (attr_info->flags & FSINFO_FLAGS_NM)
+		strcat(type, " x NM");
+
+	for (attr = fsinfo_attributes; attr->name; attr++)
+		if (attr->attr_id == attr_info->attr_id)
+			break;
+
+	printf("%8x %-12s %08x %5u %5u %s\n",
+	       attr_info->attr_id,
+	       type,
+	       attr_info->flags,
+	       attr_info->size,
+	       attr_info->element_size,
+	       attr->name ? attr->name : "");
+}
+
+static void dump_fsinfo_generic_statfs(void *reply, unsigned int size)
+{
+	struct fsinfo_statfs *f = reply;
+
+	printf("\n");
+	printf("\tblocks       : n=%llu fr=%llu av=%llu\n",
+	       (unsigned long long)f->f_blocks.lo,
+	       (unsigned long long)f->f_bfree.lo,
+	       (unsigned long long)f->f_bavail.lo);
+
+	printf("\tfiles        : n=%llu fr=%llu av=%llu\n",
+	       (unsigned long long)f->f_files.lo,
+	       (unsigned long long)f->f_ffree.lo,
+	       (unsigned long long)f->f_favail.lo);
+	printf("\tbsize        : %llu\n", f->f_bsize);
+	printf("\tfrsize       : %llu\n", f->f_frsize);
+}
+
+static void dump_fsinfo_generic_ids(void *reply, unsigned int size)
+{
+	struct fsinfo_ids *f = reply;
+
+	printf("\n");
+	printf("\tdev          : %02x:%02x\n", f->f_dev_major, f->f_dev_minor);
+	printf("\tfs           : type=%x name=%s\n", f->f_fstype, f->f_fs_name);
+	printf("\tfsid         : %llx\n", (unsigned long long)f->f_fsid);
+}
+
+static void dump_fsinfo_generic_limits(void *reply, unsigned int size)
+{
+	struct fsinfo_limits *f = reply;
+
+	printf("\n");
+	printf("\tmax file size: %llx%016llx\n",
+	       (unsigned long long)f->max_file_size.hi,
+	       (unsigned long long)f->max_file_size.lo);
+	printf("\tmax ino      : %llx%016llx\n",
+	       (unsigned long long)f->max_ino.hi,
+	       (unsigned long long)f->max_ino.lo);
+	printf("\tmax ids      : u=%llx g=%llx p=%llx\n",
+	       (unsigned long long)f->max_uid,
+	       (unsigned long long)f->max_gid,
+	       (unsigned long long)f->max_projid);
+	printf("\tmax dev      : maj=%x min=%x\n",
+	       f->max_dev_major, f->max_dev_minor);
+	printf("\tmax links    : %llx\n",
+	       (unsigned long long)f->max_hard_links);
+	printf("\tmax xattr    : n=%x b=%llx\n",
+	       f->max_xattr_name_len,
+	       (unsigned long long)f->max_xattr_body_len);
+	printf("\tmax len      : file=%x sym=%x\n",
+	       f->max_filename_len, f->max_symlink_len);
+}
+
+static void dump_fsinfo_generic_supports(void *reply, unsigned int size)
+{
+	struct fsinfo_supports *f = reply;
+
+	printf("\n");
+	printf("\tstx_attr     : %llx\n", (unsigned long long)f->stx_attributes);
+	printf("\tstx_mask     : %x\n", f->stx_mask);
+	printf("\tioc_flags    : %x\n", f->ioc_flags);
+	printf("\twin_fattrs   : %x\n", f->win_file_attrs);
+}
+
+static void print_time(struct fsinfo_timestamp_one *t, char stamp)
+{
+	printf("\t%ctime       : gran=%gs range=%llx-%llx\n",
+	       stamp,
+	       t->gran_mantissa * pow(10., t->gran_exponent),
+	       (long long)t->minimum,
+	       (long long)t->maximum);
+}
+
+static void dump_fsinfo_generic_timestamp_info(void *reply, unsigned int size)
+{
+	struct fsinfo_timestamp_info *f = reply;
+
+	printf("\n");
+	print_time(&f->atime, 'a');
+	print_time(&f->mtime, 'm');
+	print_time(&f->ctime, 'c');
+	print_time(&f->btime, 'b');
+}
+
+static void dump_fsinfo_generic_volume_uuid(void *reply, unsigned int size)
+{
+	struct fsinfo_volume_uuid *f = reply;
+
+	printf("%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x"
+	       "-%02x%02x%02x%02x%02x%02x\n",
+	       f->uuid[ 0], f->uuid[ 1],
+	       f->uuid[ 2], f->uuid[ 3],
+	       f->uuid[ 4], f->uuid[ 5],
+	       f->uuid[ 6], f->uuid[ 7],
+	       f->uuid[ 8], f->uuid[ 9],
+	       f->uuid[10], f->uuid[11],
+	       f->uuid[12], f->uuid[13],
+	       f->uuid[14], f->uuid[15]);
+}
+
+static void dump_string(void *reply, unsigned int size)
+{
+	char *s = reply, *p;
+
+	p = s;
+	if (size >= 4096) {
+		size = 4096;
+		p[4092] = '.';
+		p[4093] = '.';
+		p[4094] = '.';
+		p[4095] = 0;
+	} else {
+		p[size] = 0;
+	}
+
+	for (p = s; *p; p++) {
+		if (!isprint(*p)) {
+			printf("<non-printable>\n");
+			continue;
+		}
+	}
+
+	printf("%s\n", s);
+}
+
+#define dump_fsinfo_generic_volume_id		dump_string
+#define dump_fsinfo_generic_volume_name		dump_string
+#define dump_fsinfo_generic_name_encoding	dump_string
+
+/*
+ *
+ */
+#define __FSINFO(A, T, S, U, G, F) \
+	{ .attr_id = A, .type = T, .size = S, .name = #G, .dump = dump_##G }
+
+#define _FSINFO(A, T, S, U, G)	  __FSINFO(A, T, S, U, G, 0)
+#define _FSINFO_N(A, T, S, U, G)  __FSINFO(A, T, S, U, G, FSINFO_FLAGS_N)
+#define _FSINFO_NM(A, T, S, U, G) __FSINFO(A, T, S, U, G, FSINFO_FLAGS_NM)
+
+#define _FSINFO_VSTRUCT(A,S,G)	 _FSINFO    (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
+#define _FSINFO_VSTRUCT_N(A,S,G) _FSINFO_N  (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
+#define _FSINFO_VSTRUCT_NM(A,S,G) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
+
+#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G)
+#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G)
+#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G)
+#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, 0, 0, G)
+#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, 0, 0, G)
+#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, 0, 0, G)
+#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, 0, 0, G)
+#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, 0, sizeof(A##__STRUCT), G)
+#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, 0, sizeof(A##__STRUCT), G)
+
+static const struct fsinfo_attribute fsinfo_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	fsinfo_generic_volume_name),
+	{}
+};
+
+static void dump_value(unsigned int attr_id,
+		       const struct fsinfo_attribute *attr,
+		       const struct fsinfo_attribute_info *attr_info,
+		       void *reply, unsigned int size)
+{
+	if (!attr || !attr->dump) {
+		printf("<no dumper>\n");
+		return;
+	}
+
+	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
+		printf("<short data %u/%u>\n", size, attr->size);
+		return;
+	}
+
+	attr->dump(reply, size);
+}
+
+static void dump_list(unsigned int attr_id,
+		      const struct fsinfo_attribute *attr,
+		      const struct fsinfo_attribute_info *attr_info,
+		      void *reply, unsigned int size)
+{
+	size_t elem_size = attr_info->element_size;
+	unsigned int ix = 0;
+
+	printf("\n");
+	if (!attr || !attr->dump) {
+		printf("<no dumper>\n");
+		return;
+	}
+
+	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
+		printf("<short data %u/%u>\n", size, attr->size);
+		return;
+	}
+
+	while (size >= elem_size) {
+		printf("\t[%02x] ", ix);
+		attr->dump(reply, size);
+		reply += elem_size;
+		size -= elem_size;
+		ix++;
+	}
+}
+
+/*
+ * Call fsinfo, expanding the buffer as necessary.
+ */
+static ssize_t get_fsinfo(const char *file, const char *name,
+			  struct fsinfo_params *params, void **_r)
+{
+	ssize_t ret;
+	size_t buf_size = 4096;
+	void *r;
+
+	for (;;) {
+		r = malloc(buf_size);
+		if (!r) {
+			perror("malloc");
+			exit(1);
+		}
+		memset(r, 0xbd, buf_size);
+
+		errno = 0;
+		ret = fsinfo(AT_FDCWD, file, params, r, buf_size);
+		if (ret == -1) {
+			free(r);
+			*_r = NULL;
+			return ret;
+		}
+
+		if (ret <= buf_size)
+			break;
+		buf_size = (ret + 4096 - 1) & ~(4096 - 1);
+	}
+
+	if (debug) {
+		if (ret == -1)
+			printf("fsinfo(%s,%s,%u,%u) = %m\n",
+			       file, name, params->Nth, params->Mth);
+		else
+			printf("fsinfo(%s,%s,%u,%u) = %zd\n",
+			       file, name, params->Nth, params->Mth, ret);
+	}
+
+	*_r = r;
+	return ret;
+}
+
+/*
+ * Try one subinstance of an attribute.
+ */
+static int try_one(const char *file, struct fsinfo_params *params,
+		   const struct fsinfo_attribute_info *attr_info, bool raw)
+{
+	const struct fsinfo_attribute *attr;
+	const char *name;
+	size_t size = 4096;
+	char namebuf[32];
+	void *r;
+
+	//printf("try %03x[%u][%u]\n", params->request, params->Nth, params->Mth);
+
+	for (attr = fsinfo_attributes; attr->name; attr++) {
+		if (attr->attr_id == params->request) {
+			name = attr->name;
+			if (strncmp(name, "fsinfo_generic_", 15) == 0)
+				name += 15;
+			goto found;
+		}
+	}
+
+	sprintf(namebuf, "<unknown-%x>", params->request);
+	name = namebuf;
+	attr = NULL;
+
+found:
+	size = get_fsinfo(file, name, params, &r);
+
+	if (size == -1) {
+		if (errno == ENODATA) {
+			if (!(attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) &&
+			    params->Nth == 0 && params->Mth == 0) {
+				fprintf(stderr,
+					"Unexpected ENODATA (0x%x{%u}{%u})\n",
+					params->request, params->Nth, params->Mth);
+				exit(1);
+			}
+			free(r);
+			return (params->Mth == 0) ? 2 : 1;
+		}
+		if (errno == EOPNOTSUPP) {
+			if (params->Nth > 0 || params->Mth > 0) {
+				fprintf(stderr,
+					"Should return -ENODATA (0x%x{%u}{%u})\n",
+					params->request, params->Nth, params->Mth);
+				exit(1);
+			}
+			//printf("\e[33m%s\e[m: <not supported>\n",
+			//       fsinfo_attr_names[attr]);
+			free(r);
+			return 2;
+		}
+		perror(file);
+		exit(1);
+	}
+
+	if (raw) {
+		if (size > 4096)
+			size = 4096;
+		dump_hex(r, 0, size);
+		free(r);
+		return 0;
+	}
+
+	switch (attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) {
+	case 0:
+		printf("\e[33m%s\e[m: ", name);
+		break;
+	case FSINFO_FLAGS_N:
+		printf("\e[33m%s{%u}\e[m: ", name, params->Nth);
+		break;
+	case FSINFO_FLAGS_NM:
+		printf("\e[33m%s{%u,%u}\e[m: ", name, params->Nth, params->Mth);
+		break;
+	}
+
+	switch (attr_info->type) {
+	case FSINFO_TYPE_VSTRUCT:
+	case FSINFO_TYPE_STRING:
+		dump_value(params->request, attr, attr_info, r, size);
+		free(r);
+		return 0;
+
+	case FSINFO_TYPE_LIST:
+		dump_list(params->request, attr, attr_info, r, size);
+		free(r);
+		return 0;
+
+	case FSINFO_TYPE_OPAQUE:
+		free(r);
+		return 0;
+
+	default:
+		fprintf(stderr, "Fishy about %u 0x%x,%x,%x\n",
+			params->request, attr_info->type, attr_info->flags, attr_info->size);
+		exit(1);
+	}
+}
+
+static int cmp_u32(const void *a, const void *b)
+{
+	return *(const int *)a - *(const int *)b;
+}
+
+/*
+ *
+ */
+int main(int argc, char **argv)
+{
+	struct fsinfo_attribute_info attr_info;
+	struct fsinfo_params params = {
+		.at_flags	= AT_SYMLINK_NOFOLLOW,
+		.flags		= FSINFO_FLAGS_QUERY_PATH,
+	};
+	unsigned int *attrs, ret, nr, i;
+	bool meta = false;
+	int raw = 0, opt, Nth, Mth;
+
+	while ((opt = getopt(argc, argv, "adlmr"))) {
+		switch (opt) {
+		case 'a':
+			params.at_flags |= AT_NO_AUTOMOUNT;
+			continue;
+		case 'd':
+			debug = true;
+			continue;
+		case 'l':
+			params.at_flags &= ~AT_SYMLINK_NOFOLLOW;
+			continue;
+		case 'm':
+			meta = true;
+			continue;
+		case 'r':
+			raw = 1;
+			continue;
+		}
+		break;
+	}
+
+	argc -= optind;
+	argv += optind;
+
+	if (argc != 1) {
+		printf("Format: test-fsinfo [-alr] <file>\n");
+		exit(2);
+	}
+
+	/* Retrieve a list of supported attribute IDs */
+	params.request = FSINFO_ATTR_FSINFO_ATTRIBUTES;
+	params.Nth = 0;
+	params.Mth = 0;
+	ret = get_fsinfo(argv[0], "attributes", &params, (void **)&attrs);
+	if (ret == -1) {
+		fprintf(stderr, "Unable to get attribute list: %m\n");
+		exit(1);
+	}
+
+	if (ret % sizeof(attrs[0])) {
+		fprintf(stderr, "Bad length of attribute list (0x%x)\n", ret);
+		exit(2);
+	}
+
+	nr = ret / sizeof(attrs[0]);
+	qsort(attrs, nr, sizeof(attrs[0]), cmp_u32);
+
+	if (meta) {
+		printf("ATTR ID  TYPE         FLAGS    SIZE  ESIZE NAME\n");
+		printf("======== ============ ======== ===== ===== =========\n");
+		for (i = 0; i < nr; i++) {
+			params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
+			params.Nth = attrs[i];
+			params.Mth = 0;
+			ret = fsinfo(AT_FDCWD, argv[0], &params, &attr_info, sizeof(attr_info));
+			if (ret == -1) {
+				fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
+				exit(1);
+			}
+
+			dump_attribute_info(&attr_info, ret);
+		}
+		exit(0);
+	}
+
+	for (i = 0; i < nr; i++) {
+		params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
+		params.Nth = attrs[i];
+		params.Mth = 0;
+		ret = fsinfo(AT_FDCWD, argv[0], &params, &attr_info, sizeof(attr_info));
+		if (ret == -1) {
+			fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
+			exit(1);
+		}
+
+		if (attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
+		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTES)
+			continue;
+
+		Nth = 0;
+		do {
+			Mth = 0;
+			do {
+				params.request = attrs[i];
+				params.Nth = Nth;
+				params.Mth = Mth;
+
+				switch (try_one(argv[0], &params, &attr_info, raw)) {
+				case 0:
+					continue;
+				case 1:
+					goto done_M;
+				case 2:
+					goto done_N;
+				}
+			} while (++Mth < 100);
+
+		done_M:
+			if (Mth >= 100) {
+				fprintf(stderr, "Fishy: Mth %x[%u][%u]\n", attrs[i], Nth, Mth);
+				break;
+			}
+
+		} while (++Nth < 100);
+
+	done_N:
+		if (Nth >= 100) {
+			fprintf(stderr, "Fishy: Nth %x[%u]\n", attrs[i], Nth);
+			break;
+		}
+	}
+
+	return 0;
+}



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 02/19] fsinfo: Add syscalls to other arches [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
  2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-21 14:51   ` Christian Brauner
  2020-02-18 17:05 ` [PATCH 03/19] fsinfo: Provide a bitmap of supported features " David Howells
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add the fsinfo syscall to the other arches.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
 arch/arm/tools/syscall.tbl                  |    1 +
 arch/arm64/include/asm/unistd.h             |    2 +-
 arch/arm64/include/asm/unistd32.h           |    2 +-
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |    2 ++
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
 arch/s390/kernel/syscalls/syscall.tbl       |    1 +
 arch/sh/kernel/syscalls/syscall.tbl         |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
 16 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 36d42da7466a..961750417ef2 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -477,3 +477,4 @@
 # 545 reserved for clone3
 547	common	openat2				sys_openat2
 548	common	pidfd_getfd			sys_pidfd_getfd
+549	common	fsinfo				sys_fsinfo
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 4d1cf74a2caa..e6b9dfe01471 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -451,3 +451,4 @@
 435	common	clone3				sys_clone3
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 1dd22da1c3a9..75f04a1023be 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		439
+#define __NR_compat_syscalls		440
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index c1c61635f89c..7d1225de47e6 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -873,7 +873,7 @@ __SYSCALL(__NR_fsopen, sys_fsopen)
 __SYSCALL(__NR_fsconfig, sys_fsconfig)
 #define __NR_fsmount 432
 __SYSCALL(__NR_fsmount, sys_fsmount)
-#define __NR_fspick 433
+#define __NR_fspick 434
 __SYSCALL(__NR_fspick, sys_fspick)
 #define __NR_pidfd_open 434
 __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 042911e670b8..9018a3a6b067 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -358,3 +358,4 @@
 # 435 reserved for clone3
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index f4f49fcb76d0..10172bb6ba1f 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -437,3 +437,5 @@
 435	common	clone3				__sys_clone3
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+# 435 reserved for clone3
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 4c67b11f9c9e..58665073c1f0 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -443,3 +443,4 @@
 435	common	clone3				sys_clone3
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 1f9e8ad636cc..1f07a89473c3 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -376,3 +376,4 @@
 435	n32	clone3				__sys_clone3
 437	n32	openat2				sys_openat2
 438	n32	pidfd_getfd			sys_pidfd_getfd
+439	n32	fsinfo				sys_fsinfo
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index c0b9d802dbf6..3c853ca54901 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -352,3 +352,4 @@
 435	n64	clone3				__sys_clone3
 437	n64	openat2				sys_openat2
 438	n64	pidfd_getfd			sys_pidfd_getfd
+439	n64	fsinfo				sys_fsinfo
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index ac586774c980..727f54542bf4 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -425,3 +425,4 @@
 435	o32	clone3				__sys_clone3
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
+439	o32	fsinfo				sys_fsinfo
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 52a15f5cd130..2e9576638d80 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -435,3 +435,4 @@
 435	common	clone3				sys_clone3_wrapper
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 35b61bfc1b1a..397190734ca7 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -519,3 +519,4 @@
 435	nospu	clone3				ppc_clone3
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index bd7bd3581a0f..e9340d712dcd 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -440,3 +440,4 @@
 435  common	clone3			sys_clone3			sys_clone3
 437  common	openat2			sys_openat2			sys_openat2
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
+439  common	fsinfo			sys_fsinfo			sys_fsinfo
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index c7a30fcd135f..7bb5ec284fbb 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -440,3 +440,4 @@
 # 435 reserved for clone3
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index f13615ecdecc..a902b757ace2 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -483,3 +483,4 @@
 # 435 reserved for clone3
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 85a9ab1bc04d..f82702a7ab38 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -408,3 +408,4 @@
 435	common	clone3				sys_clone3
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
+439	common	fsinfo				sys_fsinfo



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 03/19] fsinfo: Provide a bitmap of supported features [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
  2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
  2020-02-18 17:05 ` [PATCH 02/19] fsinfo: Add syscalls to other arches " David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-19 16:37   ` Darrick J. Wong
  2020-02-20 12:22   ` David Howells
  2020-02-18 17:05 ` [PATCH 04/19] vfs: Add mount change counter " David Howells
                   ` (20 subsequent siblings)
  23 siblings, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Provide a bitmap of features that a filesystem may provide for the path
being queried.  Features include such things as:

 (1) The general class of filesystem, such as kernel-interface,
     block-based, flash-based, network-based.

 (2) Supported inode features, such as which timestamps are supported,
     whether simple numeric user, group or project IDs are supported and
     whether user identification is actually more complex behind the
     scenes.

 (3) Supported volume features, such as it having a UUID, a name or a
     filesystem ID.

 (4) Supported filesystem features, such as what types of file are
     supported, whether sparse files, extended attributes and quotas are
     supported.

 (5) Supported interface features, such as whether locking and leases are
     supported, what open flags are honoured and how i_version is managed.

For some filesystems, this may be an immutable set and can just be memcpy'd
into the reply buffer.
---

 fs/fsinfo.c                 |   41 +++++++++++++++++++
 include/linux/fsinfo.h      |   32 +++++++++++++++
 include/uapi/linux/fsinfo.h |   78 ++++++++++++++++++++++++++++++++++++
 samples/vfs/test-fsinfo.c   |   93 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 242 insertions(+), 2 deletions(-)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index 3bc35b91f20b..55710d6da327 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -36,6 +36,17 @@ int fsinfo_string(const char *s, struct fsinfo_context *ctx)
 }
 EXPORT_SYMBOL(fsinfo_string);
 
+/*
+ * Get information about fsinfo() itself.
+ */
+static int fsinfo_generic_fsinfo(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_fsinfo *info = ctx->buffer;
+
+	info->max_features = FSINFO_FEAT__NR;
+	return sizeof(*info);
+}
+
 /*
  * Get basic filesystem stats from statfs.
  */
@@ -121,6 +132,33 @@ static int fsinfo_generic_supports(struct path *path, struct fsinfo_context *ctx
 	return sizeof(*c);
 }
 
+static int fsinfo_generic_features(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_features *ft = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	if (sb->s_mtd)
+		fsinfo_set_feature(ft, FSINFO_FEAT_IS_FLASH_FS);
+	else if (sb->s_bdev)
+		fsinfo_set_feature(ft, FSINFO_FEAT_IS_BLOCK_FS);
+
+	if (sb->s_quota_types & QTYPE_MASK_USR)
+		fsinfo_set_feature(ft, FSINFO_FEAT_USER_QUOTAS);
+	if (sb->s_quota_types & QTYPE_MASK_GRP)
+		fsinfo_set_feature(ft, FSINFO_FEAT_GROUP_QUOTAS);
+	if (sb->s_quota_types & QTYPE_MASK_PRJ)
+		fsinfo_set_feature(ft, FSINFO_FEAT_PROJECT_QUOTAS);
+	if (sb->s_d_op && sb->s_d_op->d_automount)
+		fsinfo_set_feature(ft, FSINFO_FEAT_AUTOMOUNTS);
+	if (sb->s_id[0])
+		fsinfo_set_feature(ft, FSINFO_FEAT_VOLUME_ID);
+
+	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_ATIME);
+	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_CTIME);
+	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_MTIME);
+	return sizeof(*ft);
+}
+
 static const struct fsinfo_timestamp_info fsinfo_default_timestamp_info = {
 	.atime = {
 		.minimum	= S64_MIN,
@@ -252,6 +290,9 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
 	FSINFO_STRING 	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
+
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FSINFO,		fsinfo_generic_fsinfo),
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_attribute_info),
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_attributes),
 	{}
diff --git a/include/linux/fsinfo.h b/include/linux/fsinfo.h
index dcd55dbb02fa..564765c70659 100644
--- a/include/linux/fsinfo.h
+++ b/include/linux/fsinfo.h
@@ -65,6 +65,38 @@ struct fsinfo_attribute {
 
 extern int fsinfo_string(const char *, struct fsinfo_context *);
 
+static inline void fsinfo_set_feature(struct fsinfo_features *f,
+				      enum fsinfo_feature feature)
+{
+	f->features[feature / 8] |= 1 << (feature % 8);
+}
+
+static inline void fsinfo_clear_feature(struct fsinfo_features *f,
+					enum fsinfo_feature feature)
+{
+	f->features[feature / 8] &= ~(1 << (feature % 8));
+}
+
+/**
+ * fsinfo_set_unix_features - Set standard UNIX features.
+ * @f: The features mask to alter
+ */
+static inline void fsinfo_set_unix_features(struct fsinfo_features *f)
+{
+	fsinfo_set_feature(f, FSINFO_FEAT_UIDS);
+	fsinfo_set_feature(f, FSINFO_FEAT_GIDS);
+	fsinfo_set_feature(f, FSINFO_FEAT_DIRECTORIES);
+	fsinfo_set_feature(f, FSINFO_FEAT_SYMLINKS);
+	fsinfo_set_feature(f, FSINFO_FEAT_HARD_LINKS);
+	fsinfo_set_feature(f, FSINFO_FEAT_DEVICE_FILES);
+	fsinfo_set_feature(f, FSINFO_FEAT_UNIX_SPECIALS);
+	fsinfo_set_feature(f, FSINFO_FEAT_SPARSE);
+	fsinfo_set_feature(f, FSINFO_FEAT_HAS_ATIME);
+	fsinfo_set_feature(f, FSINFO_FEAT_HAS_CTIME);
+	fsinfo_set_feature(f, FSINFO_FEAT_HAS_MTIME);
+	fsinfo_set_feature(f, FSINFO_FEAT_HAS_INODE_NUMBERS);
+}
+
 #endif /* CONFIG_FSINFO */
 
 #endif /* _LINUX_FSINFO_H */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 365d54fe9290..f40b5c0b5516 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -22,9 +22,11 @@
 #define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
 #define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
 #define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
+#define FSINFO_ATTR_FEATURES		0x08	/* Filesystem features (bits) */
 
 #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
+#define FSINFO_ATTR_FSINFO		0x102	/* Information about fsinfo() as a whole */
 
 /*
  * Optional fsinfo() parameter structure.
@@ -45,6 +47,17 @@ struct fsinfo_params {
 	__u64	__reserved[3];
 };
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_FSINFO).
+ *
+ * This gives information about fsinfo() itself.
+ */
+struct fsinfo_fsinfo {
+	__u32	max_features;	/* Number of supported features (fsinfo_features__nr) */
+};
+
+#define FSINFO_ATTR_FSINFO__STRUCT struct fsinfo_fsinfo
+
 enum fsinfo_value_type {
 	FSINFO_TYPE_VSTRUCT	= 0,	/* Version-lengthed struct (up to 4096 bytes) */
 	FSINFO_TYPE_STRING	= 1,	/* NUL-term var-length string (up to 4095 chars) */
@@ -154,6 +167,71 @@ struct fsinfo_supports {
 
 #define FSINFO_ATTR_SUPPORTS__STRUCT struct fsinfo_supports
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_FEATURES).
+ *
+ * Bitmask indicating filesystem features where renderable as single bits.
+ */
+enum fsinfo_feature {
+	FSINFO_FEAT_IS_KERNEL_FS	= 0,	/* fs is kernel-special filesystem */
+	FSINFO_FEAT_IS_BLOCK_FS		= 1,	/* fs is block-based filesystem */
+	FSINFO_FEAT_IS_FLASH_FS		= 2,	/* fs is flash filesystem */
+	FSINFO_FEAT_IS_NETWORK_FS	= 3,	/* fs is network filesystem */
+	FSINFO_FEAT_IS_AUTOMOUNTER_FS	= 4,	/* fs is automounter special filesystem */
+	FSINFO_FEAT_IS_MEMORY_FS	= 5,	/* fs is memory-based filesystem */
+	FSINFO_FEAT_AUTOMOUNTS		= 6,	/* fs supports automounts */
+	FSINFO_FEAT_ADV_LOCKS		= 7,	/* fs supports advisory file locking */
+	FSINFO_FEAT_MAND_LOCKS		= 8,	/* fs supports mandatory file locking */
+	FSINFO_FEAT_LEASES		= 9,	/* fs supports file leases */
+	FSINFO_FEAT_UIDS		= 10,	/* fs supports numeric uids */
+	FSINFO_FEAT_GIDS		= 11,	/* fs supports numeric gids */
+	FSINFO_FEAT_PROJIDS		= 12,	/* fs supports numeric project ids */
+	FSINFO_FEAT_STRING_USER_IDS	= 13,	/* fs supports string user identifiers */
+	FSINFO_FEAT_GUID_USER_IDS	= 14,	/* fs supports GUID user identifiers */
+	FSINFO_FEAT_WINDOWS_ATTRS	= 15,	/* fs has windows attributes */
+	FSINFO_FEAT_USER_QUOTAS		= 16,	/* fs has per-user quotas */
+	FSINFO_FEAT_GROUP_QUOTAS	= 17,	/* fs has per-group quotas */
+	FSINFO_FEAT_PROJECT_QUOTAS	= 18,	/* fs has per-project quotas */
+	FSINFO_FEAT_XATTRS		= 19,	/* fs has xattrs */
+	FSINFO_FEAT_JOURNAL		= 20,	/* fs has a journal */
+	FSINFO_FEAT_DATA_IS_JOURNALLED	= 21,	/* fs is using data journalling */
+	FSINFO_FEAT_O_SYNC		= 22,	/* fs supports O_SYNC */
+	FSINFO_FEAT_O_DIRECT		= 23,	/* fs supports O_DIRECT */
+	FSINFO_FEAT_VOLUME_ID		= 24,	/* fs has a volume ID */
+	FSINFO_FEAT_VOLUME_UUID		= 25,	/* fs has a volume UUID */
+	FSINFO_FEAT_VOLUME_NAME		= 26,	/* fs has a volume name */
+	FSINFO_FEAT_VOLUME_FSID		= 27,	/* fs has a volume FSID */
+	FSINFO_FEAT_IVER_ALL_CHANGE	= 28,	/* i_version represents data + meta changes */
+	FSINFO_FEAT_IVER_DATA_CHANGE	= 29,	/* i_version represents data changes only */
+	FSINFO_FEAT_IVER_MONO_INCR	= 30,	/* i_version incremented monotonically */
+	FSINFO_FEAT_DIRECTORIES		= 31,	/* fs supports (sub)directories */
+	FSINFO_FEAT_SYMLINKS		= 32,	/* fs supports symlinks */
+	FSINFO_FEAT_HARD_LINKS		= 33,	/* fs supports hard links */
+	FSINFO_FEAT_HARD_LINKS_1DIR	= 34,	/* fs supports hard links in same dir only */
+	FSINFO_FEAT_DEVICE_FILES	= 35,	/* fs supports bdev, cdev */
+	FSINFO_FEAT_UNIX_SPECIALS	= 36,	/* fs supports pipe, fifo, socket */
+	FSINFO_FEAT_RESOURCE_FORKS	= 37,	/* fs supports resource forks/streams */
+	FSINFO_FEAT_NAME_CASE_INDEP	= 38,	/* Filename case independence is mandatory */
+	FSINFO_FEAT_NAME_NON_UTF8	= 39,	/* fs has non-utf8 names */
+	FSINFO_FEAT_NAME_HAS_CODEPAGE	= 40,	/* fs has a filename codepage */
+	FSINFO_FEAT_SPARSE		= 41,	/* fs supports sparse files */
+	FSINFO_FEAT_NOT_PERSISTENT	= 42,	/* fs is not persistent */
+	FSINFO_FEAT_NO_UNIX_MODE	= 43,	/* fs does not support unix mode bits */
+	FSINFO_FEAT_HAS_ATIME		= 44,	/* fs supports access time */
+	FSINFO_FEAT_HAS_BTIME		= 45,	/* fs supports birth/creation time */
+	FSINFO_FEAT_HAS_CTIME		= 46,	/* fs supports change time */
+	FSINFO_FEAT_HAS_MTIME		= 47,	/* fs supports modification time */
+	FSINFO_FEAT_HAS_ACL		= 48,	/* fs supports ACLs of some sort */
+	FSINFO_FEAT_HAS_INODE_NUMBERS	= 49,	/* fs has inode numbers */
+	FSINFO_FEAT__NR
+};
+
+struct fsinfo_features {
+	__u8	features[(FSINFO_FEAT__NR + 7) / 8];
+};
+
+#define FSINFO_ATTR_FEATURES__STRUCT struct fsinfo_features
+
 struct fsinfo_timestamp_one {
 	__s64	minimum;	/* Minimum timestamp value in seconds */
 	__u64	maximum;	/* Maximum timestamp value in seconds */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 9a4d49db2996..6fbf0ce099b2 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -107,6 +107,13 @@ static void dump_attribute_info(void *reply, unsigned int size)
 	       attr->name ? attr->name : "");
 }
 
+static void dump_fsinfo_info(void *reply, unsigned int size)
+{
+	struct fsinfo_fsinfo *f = reply;
+
+	printf("Number of feature bits: %u\n", f->max_features);
+}
+
 static void dump_fsinfo_generic_statfs(void *reply, unsigned int size)
 {
 	struct fsinfo_statfs *f = reply;
@@ -172,6 +179,74 @@ static void dump_fsinfo_generic_supports(void *reply, unsigned int size)
 	printf("\twin_fattrs   : %x\n", f->win_file_attrs);
 }
 
+#define FSINFO_FEATURE_NAME(C) [FSINFO_FEAT_##C] = #C
+static const char *fsinfo_feature_names[FSINFO_FEAT__NR] = {
+	FSINFO_FEATURE_NAME(IS_KERNEL_FS),
+	FSINFO_FEATURE_NAME(IS_BLOCK_FS),
+	FSINFO_FEATURE_NAME(IS_FLASH_FS),
+	FSINFO_FEATURE_NAME(IS_NETWORK_FS),
+	FSINFO_FEATURE_NAME(IS_AUTOMOUNTER_FS),
+	FSINFO_FEATURE_NAME(IS_MEMORY_FS),
+	FSINFO_FEATURE_NAME(AUTOMOUNTS),
+	FSINFO_FEATURE_NAME(ADV_LOCKS),
+	FSINFO_FEATURE_NAME(MAND_LOCKS),
+	FSINFO_FEATURE_NAME(LEASES),
+	FSINFO_FEATURE_NAME(UIDS),
+	FSINFO_FEATURE_NAME(GIDS),
+	FSINFO_FEATURE_NAME(PROJIDS),
+	FSINFO_FEATURE_NAME(STRING_USER_IDS),
+	FSINFO_FEATURE_NAME(GUID_USER_IDS),
+	FSINFO_FEATURE_NAME(WINDOWS_ATTRS),
+	FSINFO_FEATURE_NAME(USER_QUOTAS),
+	FSINFO_FEATURE_NAME(GROUP_QUOTAS),
+	FSINFO_FEATURE_NAME(PROJECT_QUOTAS),
+	FSINFO_FEATURE_NAME(XATTRS),
+	FSINFO_FEATURE_NAME(JOURNAL),
+	FSINFO_FEATURE_NAME(DATA_IS_JOURNALLED),
+	FSINFO_FEATURE_NAME(O_SYNC),
+	FSINFO_FEATURE_NAME(O_DIRECT),
+	FSINFO_FEATURE_NAME(VOLUME_ID),
+	FSINFO_FEATURE_NAME(VOLUME_UUID),
+	FSINFO_FEATURE_NAME(VOLUME_NAME),
+	FSINFO_FEATURE_NAME(VOLUME_FSID),
+	FSINFO_FEATURE_NAME(IVER_ALL_CHANGE),
+	FSINFO_FEATURE_NAME(IVER_DATA_CHANGE),
+	FSINFO_FEATURE_NAME(IVER_MONO_INCR),
+	FSINFO_FEATURE_NAME(DIRECTORIES),
+	FSINFO_FEATURE_NAME(SYMLINKS),
+	FSINFO_FEATURE_NAME(HARD_LINKS),
+	FSINFO_FEATURE_NAME(HARD_LINKS_1DIR),
+	FSINFO_FEATURE_NAME(DEVICE_FILES),
+	FSINFO_FEATURE_NAME(UNIX_SPECIALS),
+	FSINFO_FEATURE_NAME(RESOURCE_FORKS),
+	FSINFO_FEATURE_NAME(NAME_CASE_INDEP),
+	FSINFO_FEATURE_NAME(NAME_NON_UTF8),
+	FSINFO_FEATURE_NAME(NAME_HAS_CODEPAGE),
+	FSINFO_FEATURE_NAME(SPARSE),
+	FSINFO_FEATURE_NAME(NOT_PERSISTENT),
+	FSINFO_FEATURE_NAME(NO_UNIX_MODE),
+	FSINFO_FEATURE_NAME(HAS_ATIME),
+	FSINFO_FEATURE_NAME(HAS_BTIME),
+	FSINFO_FEATURE_NAME(HAS_CTIME),
+	FSINFO_FEATURE_NAME(HAS_MTIME),
+	FSINFO_FEATURE_NAME(HAS_ACL),
+	FSINFO_FEATURE_NAME(HAS_INODE_NUMBERS),
+};
+
+static void dump_fsinfo_generic_features(void *reply, unsigned int size)
+{
+	struct fsinfo_features *f = reply;
+	int i;
+
+	printf("\n\t");
+	for (i = 0; i < sizeof(f->features); i++)
+		printf("%02x", f->features[i]);
+	printf("\n");
+	for (i = 0; i < FSINFO_FEAT__NR; i++)
+		if (f->features[i / 8] & (1 << (i % 8)))
+			printf("\t- %s\n", fsinfo_feature_names[i]);
+}
+
 static void print_time(struct fsinfo_timestamp_one *t, char stamp)
 {
 	printf("\t%ctime       : gran=%gs range=%llx-%llx\n",
@@ -235,7 +310,6 @@ static void dump_string(void *reply, unsigned int size)
 
 #define dump_fsinfo_generic_volume_id		dump_string
 #define dump_fsinfo_generic_volume_name		dump_string
-#define dump_fsinfo_generic_name_encoding	dump_string
 
 /*
  *
@@ -266,6 +340,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
@@ -475,6 +550,7 @@ static int cmp_u32(const void *a, const void *b)
 int main(int argc, char **argv)
 {
 	struct fsinfo_attribute_info attr_info;
+	struct fsinfo_fsinfo fsinfo_info;
 	struct fsinfo_params params = {
 		.at_flags	= AT_SYMLINK_NOFOLLOW,
 		.flags		= FSINFO_FLAGS_QUERY_PATH,
@@ -531,6 +607,18 @@ int main(int argc, char **argv)
 	qsort(attrs, nr, sizeof(attrs[0]), cmp_u32);
 
 	if (meta) {
+		params.request = FSINFO_ATTR_FSINFO;
+		params.Nth = 0;
+		params.Mth = 0;
+		ret = fsinfo(AT_FDCWD, argv[0], &params, &fsinfo_info, sizeof(fsinfo_info));
+		if (ret == -1) {
+			fprintf(stderr, "Unable to get fsinfo information: %m\n");
+			exit(1);
+		}
+
+		dump_fsinfo_info(&fsinfo_info, ret);
+		printf("\n");
+
 		printf("ATTR ID  TYPE         FLAGS    SIZE  ESIZE NAME\n");
 		printf("======== ============ ======== ===== ===== =========\n");
 		for (i = 0; i < nr; i++) {
@@ -558,7 +646,8 @@ int main(int argc, char **argv)
 			exit(1);
 		}
 
-		if (attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
+		if (attrs[i] == FSINFO_ATTR_FSINFO ||
+		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
 		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTES)
 			continue;
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 04/19] vfs: Add mount change counter [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (2 preceding siblings ...)
  2020-02-18 17:05 ` [PATCH 03/19] fsinfo: Provide a bitmap of supported features " David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-21 14:48   ` Christian Brauner
  2020-02-18 17:05 ` [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID " David Howells
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add a change counter on each mount object so that the user can easily check
to see if a mount has changed its attributes or its children.

Future patches will:

 (1) Provide this value through fsinfo() attributes.

 (2) Implement a watch_mount() system call to provide a notification
     interface for userspace monitoring.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/mount.h     |   22 ++++++++++++++++++++++
 fs/namespace.c |   11 +++++++++++
 2 files changed, 33 insertions(+)

diff --git a/fs/mount.h b/fs/mount.h
index 711a4093e475..a1625924fe81 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -72,6 +72,7 @@ struct mount {
 	int mnt_expiry_mark;		/* true if marked for expiry */
 	struct hlist_head mnt_pins;
 	struct hlist_head mnt_stuck_children;
+	atomic_t mnt_change_counter;	/* Number of changed applied */
 } __randomize_layout;
 
 #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
@@ -153,3 +154,24 @@ static inline bool is_anon_ns(struct mnt_namespace *ns)
 {
 	return ns->seq == 0;
 }
+
+/*
+ * Type of mount topology change notification.
+ */
+enum mount_notification_subtype {
+	NOTIFY_MOUNT_NEW_MOUNT	= 0, /* New mount added */
+	NOTIFY_MOUNT_UNMOUNT	= 1, /* Mount removed manually */
+	NOTIFY_MOUNT_EXPIRY	= 2, /* Automount expired */
+	NOTIFY_MOUNT_READONLY	= 3, /* Mount R/O state changed */
+	NOTIFY_MOUNT_SETATTR	= 4, /* Mount attributes changed */
+	NOTIFY_MOUNT_MOVE_FROM	= 5, /* Mount moved from here */
+	NOTIFY_MOUNT_MOVE_TO	= 6, /* Mount moved to here (compare op_id) */
+};
+
+static inline void notify_mount(struct mount *changed,
+				struct mount *aux,
+				enum mount_notification_subtype subtype,
+				u32 info_flags)
+{
+	atomic_inc(&changed->mnt_change_counter);
+}
diff --git a/fs/namespace.c b/fs/namespace.c
index 85b5f7bea82e..5c84aadb6aa1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -498,6 +498,8 @@ static int mnt_make_readonly(struct mount *mnt)
 	smp_wmb();
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 	unlock_mount_hash();
+	if (ret == 0)
+		notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, 0x10000);
 	return ret;
 }
 
@@ -506,6 +508,7 @@ static int __mnt_unmake_readonly(struct mount *mnt)
 	lock_mount_hash();
 	mnt->mnt.mnt_flags &= ~MNT_READONLY;
 	unlock_mount_hash();
+	notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, 0);
 	return 0;
 }
 
@@ -819,6 +822,7 @@ static struct mountpoint *unhash_mnt(struct mount *mnt)
  */
 static void umount_mnt(struct mount *mnt)
 {
+	notify_mount(mnt->mnt_parent, mnt, NOTIFY_MOUNT_UNMOUNT, 0);
 	put_mountpoint(unhash_mnt(mnt));
 }
 
@@ -1453,6 +1457,7 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 		p = list_first_entry(&tmp_list, struct mount, mnt_list);
 		list_del_init(&p->mnt_expire);
 		list_del_init(&p->mnt_list);
+
 		ns = p->mnt_ns;
 		if (ns) {
 			ns->mounts--;
@@ -2078,7 +2083,10 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 		lock_mount_hash();
 	}
 	if (moving) {
+		notify_mount(source_mnt->mnt_parent, source_mnt,
+			     NOTIFY_MOUNT_MOVE_FROM, 0);
 		unhash_mnt(source_mnt);
+		notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_MOVE_TO, 0);
 		attach_mnt(source_mnt, dest_mnt, dest_mp);
 		touch_mnt_namespace(source_mnt->mnt_ns);
 	} else {
@@ -2087,6 +2095,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 			list_del_init(&source_mnt->mnt_ns->list);
 		}
 		mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
+		notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_NEW_MOUNT, 0);
 		commit_tree(source_mnt);
 	}
 
@@ -2464,6 +2473,7 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
 	mnt->mnt.mnt_flags = mnt_flags;
 	touch_mnt_namespace(mnt->mnt_ns);
 	unlock_mount_hash();
+	notify_mount(mnt, NULL, NOTIFY_MOUNT_SETATTR, 0);
 }
 
 static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt)
@@ -2898,6 +2908,7 @@ void mark_mounts_for_expiry(struct list_head *mounts)
 		if (!xchg(&mnt->mnt_expiry_mark, 1) ||
 			propagate_mount_busy(mnt, 1))
 			continue;
+		notify_mount(mnt, NULL, NOTIFY_MOUNT_EXPIRY, 0);
 		list_move(&mnt->mnt_expire, &graveyard);
 	}
 	while (!list_empty(&graveyard)) {



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (3 preceding siblings ...)
  2020-02-18 17:05 ` [PATCH 04/19] vfs: Add mount change counter " David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-19 16:53   ` Darrick J. Wong
  2020-02-20 12:45   ` David Howells
  2020-02-18 17:05 ` [PATCH 06/19] vfs: Allow fsinfo() to look up a mount object by " David Howells
                   ` (18 subsequent siblings)
  23 siblings, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Introduce an (effectively) non-repeating system-unique superblock ID that
can be used to determine that two object are in the same superblock without
risking reuse of the ID in the meantime (as is possible with device IDs).

The ID is time-based to make it harder to use it as a covert communications
channel.

Also make it so that this ID can be fetched by the fsinfo() system call.
The ID added so that superblock notification messages will also be able to
be tagged with it.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c               |    1 +
 fs/super.c                |   24 ++++++++++++++++++++++++
 include/linux/fs.h        |    3 +++
 samples/vfs/test-fsinfo.c |    1 +
 4 files changed, 29 insertions(+)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index 55710d6da327..f8e85762fc47 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -92,6 +92,7 @@ static int fsinfo_generic_ids(struct path *path, struct fsinfo_context *ctx)
 	p->f_fstype	= sb->s_magic;
 	p->f_dev_major	= MAJOR(sb->s_dev);
 	p->f_dev_minor	= MINOR(sb->s_dev);
+	p->f_sb_id	= sb->s_unique_id;
 
 	memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
 	strlcpy(p->f_fs_name, path->dentry->d_sb->s_type->name,
diff --git a/fs/super.c b/fs/super.c
index cd352530eca9..a63073e6127e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -44,6 +44,8 @@ static int thaw_super_locked(struct super_block *sb);
 
 static LIST_HEAD(super_blocks);
 static DEFINE_SPINLOCK(sb_lock);
+static u64 sb_last_identifier;
+static u64 sb_identifier_offset;
 
 static char *sb_writers_name[SB_FREEZE_LEVELS] = {
 	"sb_writers",
@@ -188,6 +190,27 @@ static void destroy_unused_super(struct super_block *s)
 	destroy_super_work(&s->destroy_work);
 }
 
+/*
+ * Generate a unique identifier for a superblock.
+ */
+static void generate_super_id(struct super_block *s)
+{
+	u64 id = ktime_to_ns(ktime_get());
+
+	spin_lock(&sb_lock);
+
+	id += sb_identifier_offset;
+	if (id <= sb_last_identifier) {
+		id = sb_last_identifier + 1;
+		sb_identifier_offset = sb_last_identifier - id;
+	}
+
+	sb_last_identifier = id;
+	spin_unlock(&sb_lock);
+
+	s->s_unique_id = id;
+}
+
 /**
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
@@ -273,6 +296,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 		goto fail;
 	if (list_lru_init_memcg(&s->s_inode_lru, &s->s_shrink))
 		goto fail;
+	generate_super_id(s);
 	return s;
 
 fail:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f74a4ee36eb3..e5db22d536a3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1550,6 +1550,9 @@ struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+
+	/* Superblock event notifications */
+	u64			s_unique_id;
 } __randomize_layout;
 
 /* Helper functions so that in most cases filesystems will
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 6fbf0ce099b2..d6ec5713364f 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -140,6 +140,7 @@ static void dump_fsinfo_generic_ids(void *reply, unsigned int size)
 	printf("\tdev          : %02x:%02x\n", f->f_dev_major, f->f_dev_minor);
 	printf("\tfs           : type=%x name=%s\n", f->f_fstype, f->f_fs_name);
 	printf("\tfsid         : %llx\n", (unsigned long long)f->f_fsid);
+	printf("\tsbid         : %llx\n", (unsigned long long)f->f_sb_id);
 }
 
 static void dump_fsinfo_generic_limits(void *reply, unsigned int size)



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 06/19] vfs: Allow fsinfo() to look up a mount object by ID [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (4 preceding siblings ...)
  2020-02-18 17:05 ` [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID " David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-21 15:09   ` Christian Brauner
  2020-02-18 17:05 ` [PATCH 07/19] vfs: Allow mount information to be queried by fsinfo() " David Howells
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Allow the fsinfo() syscall to look up a mount object by ID rather than by
pathname.  This is necessary as there can be multiple mounts stacked up at
the same pathname and there's no way to look through them otherwise.

This is done by passing FSINFO_FLAGS_QUERY_MOUNT to fsinfo() in the
parameters and then passing the mount ID as a string to fsinfo() in place
of the filename:

	struct fsinfo_params params = {
		.flags	 = FSINFO_FLAGS_QUERY_MOUNT,
		.request = FSINFO_ATTR_IDS,
	};

	ret = fsinfo(AT_FDCWD, "21", &params, buffer, sizeof(buffer));

The caller is only permitted to query a mount object if the root directory
of that mount connects directly to the current chroot if dfd == AT_FDCWD[*]
or the directory specified by dfd otherwise.  Note that this is not
available to the pathwalk of any other syscall.

[*] This needs to be something other than AT_FDCWD, perhaps AT_FDROOT.

[!] This probably needs an LSM hook.

[!] This might want to check the permissions on all the intervening dirs -
    but it would have to do that under RCU conditions.

[!] This might want to check a CAP_* flag.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                 |   53 +++++++++++++++++++
 fs/internal.h               |    2 +
 fs/namespace.c              |  117 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/fsinfo.h |    1 
 samples/vfs/test-fsinfo.c   |   11 +++-
 5 files changed, 179 insertions(+), 5 deletions(-)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index f8e85762fc47..ddc11cc40b45 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -464,6 +464,56 @@ static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_context *ctx)
 	return ret;
 }
 
+/*
+ * Look up the root of a mount object.  This allows access to mount objects
+ * (and their attached superblocks) that can't be retrieved by path because
+ * they're entirely covered.
+ *
+ * We only permit access to a mount that has a direct path between either the
+ * dentry pointed to by dfd or to our chroot (if dfd is AT_FDCWD).
+ */
+static int vfs_fsinfo_mount(int dfd, const char __user *filename,
+			    struct fsinfo_context *ctx)
+{
+	struct path path;
+	struct fd f = {};
+	char *name;
+	long mnt_id;
+	int ret;
+
+	if (!filename)
+		return -EINVAL;
+
+	name = strndup_user(filename, 32);
+	if (IS_ERR(name))
+		return PTR_ERR(name);
+	ret = kstrtoul(name, 0, &mnt_id);
+	if (ret < 0)
+		goto out_name;
+	if (mnt_id > INT_MAX)
+		goto out_name;
+
+	if (dfd != AT_FDCWD) {
+		ret = -EBADF;
+		f = fdget_raw(dfd);
+		if (!f.file)
+			goto out_name;
+	}
+
+	ret = lookup_mount_object(f.file ? &f.file->f_path : NULL,
+				  mnt_id, &path);
+	if (ret < 0)
+		goto out_fd;
+
+	ret = vfs_fsinfo(&path, ctx);
+	path_put(&path);
+out_fd:
+	fdput(f);
+out_name:
+	kfree(name);
+	return ret;
+}
+
 /**
  * sys_fsinfo - System call to get filesystem information
  * @dfd: Base directory to pathwalk from or fd referring to filesystem.
@@ -533,6 +583,9 @@ SYSCALL_DEFINE5(fsinfo,
 			return -EINVAL;
 		ret = vfs_fsinfo_fd(dfd, &ctx);
 		break;
+	case FSINFO_FLAGS_QUERY_MOUNT:
+		ret = vfs_fsinfo_mount(dfd, pathname, &ctx);
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/fs/internal.h b/fs/internal.h
index f3f280b952a3..2ccd2b2eae88 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -91,6 +91,8 @@ extern int __mnt_want_write_file(struct file *);
 extern void __mnt_drop_write_file(struct file *);
 
 extern void dissolve_on_fput(struct vfsmount *);
+extern int lookup_mount_object(struct path *, int, struct path *);
+
 /*
  * fs_struct.c
  */
diff --git a/fs/namespace.c b/fs/namespace.c
index 5c84aadb6aa1..c24d779e0095 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -63,7 +63,7 @@ static int __init set_mphash_entries(char *str)
 __setup("mphash_entries=", set_mphash_entries);
 
 static u64 event;
-static DEFINE_IDA(mnt_id_ida);
+static DEFINE_IDR(mnt_id_ida);
 static DEFINE_IDA(mnt_group_ida);
 
 static struct hlist_head *mount_hashtable __read_mostly;
@@ -104,17 +104,27 @@ static inline struct hlist_head *mp_hash(struct dentry *dentry)
 
 static int mnt_alloc_id(struct mount *mnt)
 {
-	int res = ida_alloc(&mnt_id_ida, GFP_KERNEL);
+	int res;
 
+	/* Allocate an ID, but don't set the pointer back to the mount until
+	 * later, as once we do that, we have to follow RCU protocols to get
+	 * rid of the mount struct.
+	 */
+	res = idr_alloc(&mnt_id_ida, NULL, 0, INT_MAX, GFP_KERNEL);
 	if (res < 0)
 		return res;
 	mnt->mnt_id = res;
 	return 0;
 }
 
+static void mnt_publish_id(struct mount *mnt)
+{
+	idr_replace(&mnt_id_ida, mnt, mnt->mnt_id);
+}
+
 static void mnt_free_id(struct mount *mnt)
 {
-	ida_free(&mnt_id_ida, mnt->mnt_id);
+	idr_remove(&mnt_id_ida, mnt->mnt_id);
 }
 
 /*
@@ -957,6 +967,7 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
 	lock_mount_hash();
 	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
 	unlock_mount_hash();
+	mnt_publish_id(mnt);
 	return &mnt->mnt;
 }
 EXPORT_SYMBOL(vfs_create_mount);
@@ -1050,6 +1061,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	lock_mount_hash();
 	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
 	unlock_mount_hash();
+	mnt_publish_id(mnt);
 
 	if ((flag & CL_SLAVE) ||
 	    ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
@@ -3986,3 +3998,102 @@ const struct proc_ns_operations mntns_operations = {
 	.install	= mntns_install,
 	.owner		= mntns_owner,
 };
+
+/*
+ * See if one path point connects directly to another by ancestral relationship
+ * across mountpoints.  Must call with the RCU read lock held.
+ */
+static bool are_paths_connected(struct path *ancestor, struct path *to_check)
+{
+	struct mount *mnt, *parent;
+	struct path cursor;
+	unsigned seq;
+	bool connected;
+
+	seq = 0;
+restart:
+	cursor = *to_check;
+
+	read_seqbegin_or_lock(&rename_lock, &seq);
+	while (cursor.mnt != ancestor->mnt) {
+		mnt = real_mount(cursor.mnt);
+		parent = READ_ONCE(mnt->mnt_parent);
+		if (mnt == parent)
+			goto failed;
+		cursor.dentry = READ_ONCE(mnt->mnt_mountpoint);
+		cursor.mnt = &parent->mnt;
+	}
+
+	while (cursor.dentry != ancestor->dentry) {
+		if (cursor.dentry == cursor.mnt->mnt_root ||
+		    IS_ROOT(cursor.dentry))
+			goto failed;
+		cursor.dentry = READ_ONCE(cursor.dentry->d_parent);
+	}
+
+	connected = true;
+out:
+	done_seqretry(&rename_lock, seq);
+	return connected;
+
+failed:
+	if (need_seqretry(&rename_lock, seq)) {
+		seq = 1;
+		goto restart;
+	}
+	connected = false;
+	goto out;
+}
+
+/**
+ * lookup_mount_object - Look up a vfsmount object by ID
+ * @root: The mount root must connect backwards to this point (or chroot if NULL).
+ * @id: The ID of the mountpoint.
+ * @_mntpt: Where to return the resulting mountpoint path.
+ *
+ * Look up the root of the mount with the corresponding ID.  This is only
+ * permitted if that mount connects directly to the specified root/chroot.
+ */
+int lookup_mount_object(struct path *root, int mnt_id, struct path *_mntpt)
+{
+	struct mount *mnt;
+	struct path stop, mntpt = {};
+	int ret = -EPERM;
+
+	if (!root)
+		get_fs_root(current->fs, &stop);
+	else
+		stop = *root;
+
+	rcu_read_lock();
+	lock_mount_hash();
+	mnt = idr_find(&mnt_id_ida, mnt_id);
+	if (!mnt)
+		goto out_unlock_mh;
+	if (mnt->mnt.mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
+		goto out_unlock_mh;
+	if (mnt_get_count(mnt) == 0)
+		goto out_unlock_mh;
+	mnt_add_count(mnt, 1);
+	mntpt.mnt = &mnt->mnt;
+	mntpt.dentry = dget(mnt->mnt.mnt_root);
+	unlock_mount_hash();
+
+	if (are_paths_connected(&stop, &mntpt)) {
+		*_mntpt = mntpt;
+		mntpt.mnt = NULL;
+		mntpt.dentry = NULL;
+		ret = 0;
+	}
+
+out_unlock:
+	rcu_read_unlock();
+	if (!root)
+		path_put(&stop);
+	path_put(&mntpt);
+	return ret;
+
+out_unlock_mh:
+	unlock_mount_hash();
+	goto out_unlock;
+}
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index f40b5c0b5516..7efc1169738d 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -40,6 +40,7 @@ struct fsinfo_params {
 #define FSINFO_FLAGS_QUERY_TYPE	0x0007 /* What object should fsinfo() query? */
 #define FSINFO_FLAGS_QUERY_PATH	0x0000 /* - path, specified by dirfd,pathname,AT_EMPTY_PATH */
 #define FSINFO_FLAGS_QUERY_FD	0x0001 /* - fd specified by dirfd */
+#define FSINFO_FLAGS_QUERY_MOUNT 0x0002	/* - mount object (path=>mount_id, dirfd=>subtree) */
 	__u32	request;	/* ID of requested attribute */
 	__u32	Nth;		/* Instance of it (some may have multiple) */
 	__u32	Mth;		/* Subinstance of Nth instance */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index d6ec5713364f..5bb4e817e5d7 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -560,16 +560,22 @@ int main(int argc, char **argv)
 	bool meta = false;
 	int raw = 0, opt, Nth, Mth;
 
-	while ((opt = getopt(argc, argv, "adlmr"))) {
+	while ((opt = getopt(argc, argv, "Madlmr"))) {
 		switch (opt) {
+		case 'M':
+			params.at_flags = 0;
+			params.flags = FSINFO_FLAGS_QUERY_MOUNT;
+			continue;
 		case 'a':
 			params.at_flags |= AT_NO_AUTOMOUNT;
+			params.flags |= FSINFO_FLAGS_QUERY_PATH;
 			continue;
 		case 'd':
 			debug = true;
 			continue;
 		case 'l':
 			params.at_flags &= ~AT_SYMLINK_NOFOLLOW;
+			params.flags |= FSINFO_FLAGS_QUERY_PATH;
 			continue;
 		case 'm':
 			meta = true;
@@ -585,7 +591,8 @@ int main(int argc, char **argv)
 	argv += optind;
 
 	if (argc != 1) {
-		printf("Format: test-fsinfo [-alr] <file>\n");
+		printf("Format: test-fsinfo [-adlr] <file>\n");
+		printf("Format: test-fsinfo [-dr] -M <mnt_id>\n");
 		exit(2);
 	}
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 07/19] vfs: Allow mount information to be queried by fsinfo() [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (5 preceding siblings ...)
  2020-02-18 17:05 ` [PATCH 06/19] vfs: Allow fsinfo() to look up a mount object by " David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-18 17:05 ` [PATCH 08/19] vfs: fsinfo sample: Mount listing program " David Howells
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Allow mount information, including information about the topology tree to
be queried with the fsinfo() system call.  Setting AT_FSINFO_QUERY_MOUNT
allows overlapping mounts to be queried by indicating that the syscall
should interpet the pathname as a number indicating the mount ID.

To this end, four fsinfo() attributes are provided:

 (1) FSINFO_ATTR_MOUNT_INFO.

     This is a structure providing information about a mount, including:

	- Mounted superblock ID.
	- Mount ID (can be used with AT_FSINFO_QUERY_MOUNT).
	- Parent mount ID.
	- Mount attributes (eg. R/O, NOEXEC).
	- A change counter.

     Note that the parent mount ID is overridden to the ID of the queried
     mount if the parent lies outside of the chroot or dfd tree.

 (2) FSINFO_ATTR_MOUNT_DEVNAME.

     This a string providing the device name associated with the mount.

     Note that the device name may be a path that lies outside of the root.

 (3) FSINFO_ATTR_MOUNT_POINT.

     This is a string indicating the name of the mountpoint within the
     parent mount, limited to the parent's mounted root and the chroot.

 (4) FSINFO_ATTR_MOUNT_CHILDREN.

     This produces an array of structures, one for each child and capped
     with one for the argument mount (checked after listing all the
     children).  Each element contains the mount ID and the change counter
     of the respective mount object.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/d_path.c                 |    2 
 fs/fsinfo.c                 |    4 +
 fs/internal.h               |   10 ++
 fs/namespace.c              |  179 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fsinfo.h |   34 ++++++++
 samples/vfs/test-fsinfo.c   |   27 ++++++
 6 files changed, 255 insertions(+), 1 deletion(-)

diff --git a/fs/d_path.c b/fs/d_path.c
index 0f1fc1743302..4c203f64e45e 100644
--- a/fs/d_path.c
+++ b/fs/d_path.c
@@ -229,7 +229,7 @@ static int prepend_unreachable(char **buffer, int *buflen)
 	return prepend(buffer, buflen, "(unreachable)", 13);
 }
 
-static void get_fs_root_rcu(struct fs_struct *fs, struct path *root)
+void get_fs_root_rcu(struct fs_struct *fs, struct path *root)
 {
 	unsigned seq;
 
diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index ddc11cc40b45..b57fbcd3a7a5 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -296,6 +296,10 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_VSTRUCT	(FSINFO_ATTR_FSINFO,		fsinfo_generic_fsinfo),
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_attribute_info),
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_attributes),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
+	FSINFO_STRING	(FSINFO_ATTR_MOUNT_DEVNAME,	fsinfo_generic_mount_devname),
+	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
+	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
 	{}
 };
 
diff --git a/fs/internal.h b/fs/internal.h
index 2ccd2b2eae88..6804cf54846d 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -15,6 +15,7 @@ struct mount;
 struct shrink_control;
 struct fs_context;
 struct user_namespace;
+struct fsinfo_context;
 
 /*
  * block_dev.c
@@ -47,6 +48,11 @@ extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
  */
 extern void __init chrdev_init(void);
 
+/*
+ * d_path.c
+ */
+extern void get_fs_root_rcu(struct fs_struct *fs, struct path *root);
+
 /*
  * fs_context.c
  */
@@ -92,6 +98,10 @@ extern void __mnt_drop_write_file(struct file *);
 
 extern void dissolve_on_fput(struct vfsmount *);
 extern int lookup_mount_object(struct path *, int, struct path *);
+extern int fsinfo_generic_mount_info(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_devname(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_point(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_children(struct path *, struct fsinfo_context *);
 
 /*
  * fs_struct.c
diff --git a/fs/namespace.c b/fs/namespace.c
index c24d779e0095..e009dacc08d4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -30,6 +30,7 @@
 #include <uapi/linux/mount.h>
 #include <linux/fs_context.h>
 #include <linux/shmem_fs.h>
+#include <linux/fsinfo.h>
 
 #include "pnode.h"
 #include "internal.h"
@@ -4097,3 +4098,181 @@ int lookup_mount_object(struct path *root, int mnt_id, struct path *_mntpt)
 	unlock_mount_hash();
 	goto out_unlock;
 }
+
+#ifdef CONFIG_FSINFO
+/*
+ * Retrieve information about the nominated mount.
+ */
+int fsinfo_generic_mount_info(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_mount_info *p = ctx->buffer;
+	struct super_block *sb;
+	struct mount *m;
+	struct path root;
+	unsigned int flags;
+
+	if (!path->mnt)
+		return -ENODATA;
+
+	m = real_mount(path->mnt);
+	sb = m->mnt.mnt_sb;
+
+	p->f_sb_id		= sb->s_unique_id;
+	p->mnt_id		= m->mnt_id;
+	p->parent_id		= m->mnt_parent->mnt_id;
+	p->change_counter	= atomic_read(&m->mnt_change_counter);
+
+	get_fs_root(current->fs, &root);
+	if (path->mnt == root.mnt) {
+		p->parent_id = p->mnt_id;
+	} else {
+		rcu_read_lock();
+		if (!are_paths_connected(&root, path))
+			p->parent_id = p->mnt_id;
+		rcu_read_unlock();
+	}
+	if (IS_MNT_SHARED(m))
+		p->group_id = m->mnt_group_id;
+	if (IS_MNT_SLAVE(m)) {
+		int master = m->mnt_master->mnt_group_id;
+		int dom = get_dominating_id(m, &root);
+		p->master_id = master;
+		if (dom && dom != master)
+			p->from_id = dom;
+	}
+	path_put(&root);
+
+	flags = READ_ONCE(m->mnt.mnt_flags);
+	if (flags & MNT_READONLY)
+		p->attr |= MOUNT_ATTR_RDONLY;
+	if (flags & MNT_NOSUID)
+		p->attr |= MOUNT_ATTR_NOSUID;
+	if (flags & MNT_NODEV)
+		p->attr |= MOUNT_ATTR_NODEV;
+	if (flags & MNT_NOEXEC)
+		p->attr |= MOUNT_ATTR_NOEXEC;
+	if (flags & MNT_NODIRATIME)
+		p->attr |= MOUNT_ATTR_NODIRATIME;
+
+	if (flags & MNT_NOATIME)
+		p->attr |= MOUNT_ATTR_NOATIME;
+	else if (flags & MNT_RELATIME)
+		p->attr |= MOUNT_ATTR_RELATIME;
+	else
+		p->attr |= MOUNT_ATTR_STRICTATIME;
+	return sizeof(*p);
+}
+
+int fsinfo_generic_mount_devname(struct path *path, struct fsinfo_context *ctx)
+{
+	if (!path->mnt)
+		return -ENODATA;
+
+	return fsinfo_string(real_mount(path->mnt)->mnt_devname, ctx);
+}
+
+/*
+ * Return the path of this mount relative to its parent and clipped to
+ * the current chroot.
+ */
+int fsinfo_generic_mount_point(struct path *path, struct fsinfo_context *ctx)
+{
+	struct mountpoint *mp;
+	struct mount *m, *parent;
+	struct path mountpoint, root;
+	size_t len;
+	void *p;
+
+	if (!path->mnt)
+		return -ENODATA;
+
+	rcu_read_lock();
+
+	m = real_mount(path->mnt);
+	parent = m->mnt_parent;
+	if (parent == m)
+		goto skip;
+	mp = READ_ONCE(m->mnt_mp);
+	if (mp)
+		goto found;
+skip:
+	rcu_read_unlock();
+	return -ENODATA;
+
+found:
+	mountpoint.mnt = &parent->mnt;
+	mountpoint.dentry = READ_ONCE(mp->m_dentry);
+
+	get_fs_root_rcu(current->fs, &root);
+	if (path->mnt == root.mnt) {
+		rcu_read_unlock();
+		len = snprintf(ctx->buffer, ctx->buf_size, "/");
+	} else {
+		if (root.mnt != &parent->mnt) {
+			root.mnt = &parent->mnt;
+			root.dentry = parent->mnt.mnt_root;
+		}
+
+		p = __d_path(&mountpoint, &root, ctx->buffer, ctx->buf_size);
+		rcu_read_unlock();
+
+		if (IS_ERR(p))
+			return PTR_ERR(p);
+		if (!p)
+			return -EPERM;
+
+		len = (ctx->buffer + ctx->buf_size) - p;
+		memmove(ctx->buffer, p, len);
+	}
+	return len;
+}
+
+/*
+ * Store a mount record into the fsinfo buffer.
+ */
+static void store_mount_fsinfo(struct fsinfo_context *ctx,
+			       struct fsinfo_mount_child *child)
+{
+	unsigned int usage = ctx->usage;
+	unsigned int total = sizeof(*child);
+
+	if (ctx->usage >= INT_MAX)
+		return;
+	ctx->usage = usage + total;
+	if (ctx->buffer && ctx->usage <= ctx->buf_size)
+		memcpy(ctx->buffer + usage, child, total);
+}
+
+/*
+ * Return information about the submounts relative to path.
+ */
+int fsinfo_generic_mount_children(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_mount_child record;
+	struct mount *m, *child;
+
+	if (!path->mnt)
+		return -ENODATA;
+
+	m = real_mount(path->mnt);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(child, &m->mnt_mounts, mnt_child) {
+		if (child->mnt_parent != m)
+			continue;
+		record.mnt_id = child->mnt_id;
+		record.change_counter = atomic_read(&child->mnt_change_counter);
+		store_mount_fsinfo(ctx, &record);
+	}
+	rcu_read_unlock();
+
+	/* End the list with a copy of the parameter mount's details so that
+	 * userspace can quickly check for changes.
+	 */
+	record.mnt_id = m->mnt_id;
+	record.change_counter = atomic_read(&m->mnt_change_counter);
+	store_mount_fsinfo(ctx, &record);
+	return ctx->usage;
+}
+
+#endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 7efc1169738d..2f67815c35af 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -28,6 +28,11 @@
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
 #define FSINFO_ATTR_FSINFO		0x102	/* Information about fsinfo() as a whole */
 
+#define FSINFO_ATTR_MOUNT_INFO		0x200	/* Mount object information */
+#define FSINFO_ATTR_MOUNT_DEVNAME	0x201	/* Mount object device name (string) */
+#define FSINFO_ATTR_MOUNT_POINT		0x202	/* Relative path of mount in parent (string) */
+#define FSINFO_ATTR_MOUNT_CHILDREN	0x203	/* Children of this mount (list) */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -82,6 +87,7 @@ struct fsinfo_attribute_info {
 	unsigned int		element_size;	/* - Element size (FSINFO_LIST) */
 };
 
+#define FSINFO_ATTR_FSINFO_ATTRIBUTES__STRUCT __u32
 #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO__STRUCT struct fsinfo_attribute_info
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES__STRUCT __u32
 
@@ -95,6 +101,34 @@ struct fsinfo_u128 {
 #endif
 };
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_MOUNT_INFO).
+ */
+struct fsinfo_mount_info {
+	__u64		f_sb_id;	/* Superblock ID */
+	__u32		mnt_id;		/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
+	__u32		parent_id;	/* Parent mount identifier */
+	__u32		group_id;	/* Mount group ID */
+	__u32		master_id;	/* Slave master group ID */
+	__u32		from_id;	/* Slave propagated from ID */
+	__u32		attr;		/* MOUNT_ATTR_* flags */
+	__u32		change_counter;	/* Number of changes applied. */
+	__u32		__reserved[1];
+};
+
+#define FSINFO_ATTR_MOUNT_INFO__STRUCT struct fsinfo_mount_info
+
+/*
+ * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_CHILDREN).
+ * - An extra element is placed on the end representing the parent mount.
+ */
+struct fsinfo_mount_child {
+	__u32		mnt_id;		/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
+	__u32		change_counter;	/* Number of changes applied to mount. */
+};
+
+#define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct fsinfo_mount_child
+
 /*
  * Information struct for fsinfo(FSINFO_ATTR_STATFS).
  * - This gives extended filesystem information.
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 5bb4e817e5d7..23a4d6d4c8b2 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -284,6 +284,26 @@ static void dump_fsinfo_generic_volume_uuid(void *reply, unsigned int size)
 	       f->uuid[14], f->uuid[15]);
 }
 
+static void dump_fsinfo_generic_mount_info(void *reply, unsigned int size)
+{
+	struct fsinfo_mount_info *f = reply;
+
+	printf("\n");
+	printf("\tsb_id   : %llx\n", (unsigned long long)f->f_sb_id);
+	printf("\tmnt_id  : %x\n", f->mnt_id);
+	printf("\tparent  : %x\n", f->parent_id);
+	printf("\tgroup   : %x\n", f->group_id);
+	printf("\tattr    : %x\n", f->attr);
+	printf("\tchanges : %x\n", f->change_counter);
+}
+
+static void dump_fsinfo_generic_mount_child(void *reply, unsigned int size)
+{
+	struct fsinfo_mount_child *f = reply;
+
+	printf("%8x %8x\n", f->mnt_id, f->change_counter);
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -311,6 +331,8 @@ static void dump_string(void *reply, unsigned int size)
 
 #define dump_fsinfo_generic_volume_id		dump_string
 #define dump_fsinfo_generic_volume_name		dump_string
+#define dump_fsinfo_generic_mount_devname	dump_string
+#define dump_fsinfo_generic_mount_point		dump_string
 
 /*
  *
@@ -346,6 +368,11 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	fsinfo_generic_volume_name),
+
+	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
+	FSINFO_STRING	(FSINFO_ATTR_MOUNT_DEVNAME,	fsinfo_generic_mount_devname),
+	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_child),
+	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
 	{}
 };
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 08/19] vfs: fsinfo sample: Mount listing program [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (6 preceding siblings ...)
  2020-02-18 17:05 ` [PATCH 07/19] vfs: Allow mount information to be queried by fsinfo() " David Howells
@ 2020-02-18 17:05 ` David Howells
  2020-02-18 17:06 ` [PATCH 09/19] fsinfo: Allow the mount topology propogation flags to be retrieved " David Howells
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:05 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Implement a program to demonstrate mount listing using the new fsinfo()
syscall, for example:

# ./test-mntinfo -M 21
MOUNT                                 MOUNT ID   CHANGE#    TYPE & DEVICE
------------------------------------- ---------- ---------- ---------------
21                                            21          8 sysfs 0:15
 \_ kernel/security                           24          0 securityfs 0:8
 \_ fs/cgroup                                 28         16 tmpfs 0:19
 |   \_ unified                               29          0 cgroup2 0:1a
 |   \_ systemd                               30          0 cgroup 0:1b
 |   \_ freezer                               34          0 cgroup 0:1f
 |   \_ cpu,cpuacct                           35          0 cgroup 0:20
 |   \_ devices                               36          0 cgroup 0:21
 |   \_ memory                                37          0 cgroup 0:22
 |   \_ cpuset                                38          0 cgroup 0:23
 |   \_ net_cls,net_prio                      39          0 cgroup 0:24
 |   \_ hugetlb                               40          0 cgroup 0:25
 |   \_ rdma                                  41          0 cgroup 0:26
 |   \_ blkio                                 42          0 cgroup 0:27
 |   \_ perf_event                            43          0 cgroup 0:28
 \_ fs/pstore                                 31          0 pstore 0:1c
 \_ firmware/efi/efivars                      32          0 efivarfs 0:1d
 \_ fs/bpf                                    33          0 bpf 0:1e
 \_ kernel/config                             92          0 configfs 0:10
 \_ fs/selinux                                44          0 selinuxfs 0:12
 \_ kernel/debug                              48          0 debugfs 0:7

Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/vfs/Makefile       |    2 
 samples/vfs/test-mntinfo.c |  243 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 245 insertions(+)
 create mode 100644 samples/vfs/test-mntinfo.c

diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 9159ad1d7fc5..19be60ab950e 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -4,12 +4,14 @@
 hostprogs := \
 	test-fsinfo \
 	test-fsmount \
+	test-mntinfo \
 	test-statx
 
 always-y := $(hostprogs)
 
 HOSTCFLAGS_test-fsinfo.o += -I$(objtree)/usr/include
 HOSTLDLIBS_test-fsinfo += -static -lm
+HOSTCFLAGS_test-mntinfo.o += -I$(objtree)/usr/include
 
 HOSTCFLAGS_test-fsmount.o += -I$(objtree)/usr/include
 HOSTCFLAGS_test-statx.o += -I$(objtree)/usr/include
diff --git a/samples/vfs/test-mntinfo.c b/samples/vfs/test-mntinfo.c
new file mode 100644
index 000000000000..f4d90d0671c5
--- /dev/null
+++ b/samples/vfs/test-mntinfo.c
@@ -0,0 +1,243 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Test the fsinfo() system call
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define _GNU_SOURCE
+#define _ATFILE_SOURCE
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <ctype.h>
+#include <errno.h>
+#include <time.h>
+#include <math.h>
+#include <sys/syscall.h>
+#include <linux/fsinfo.h>
+#include <linux/socket.h>
+#include <linux/fcntl.h>
+#include <sys/stat.h>
+#include <arpa/inet.h>
+
+#ifndef __NR_fsinfo
+#define __NR_fsinfo -1
+#endif
+
+static __attribute__((unused))
+ssize_t fsinfo(int dfd, const char *filename, struct fsinfo_params *params,
+	       void *buffer, size_t buf_size)
+{
+	return syscall(__NR_fsinfo, dfd, filename, params, buffer, buf_size);
+}
+
+static char tree_buf[4096];
+static char bar_buf[4096];
+
+/*
+ * Get an fsinfo attribute in a statically allocated buffer.
+ */
+static void get_attr(unsigned int mnt_id, unsigned int attr,
+		     void *buf, size_t buf_size)
+{
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
+		.request	= attr,
+	};
+	char file[32];
+	long ret;
+
+	sprintf(file, "%u", mnt_id);
+
+	memset(buf, 0xbd, buf_size);
+
+	ret = fsinfo(AT_FDCWD, file, &params, buf, buf_size);
+	if (ret == -1) {
+		fprintf(stderr, "mount-%s: %m\n", file);
+		exit(1);
+	}
+}
+
+/*
+ * Get an fsinfo attribute in a dynamically allocated buffer.
+ */
+static void *get_attr_alloc(unsigned int mnt_id, unsigned int attr,
+			    unsigned int Nth, size_t *_size)
+{
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
+		.request	= attr,
+		.Nth		= Nth,
+	};
+	size_t buf_size = 4096;
+	char file[32];
+	void *r;
+	long ret;
+
+	sprintf(file, "%u", mnt_id);
+
+	for (;;) {
+		r = malloc(buf_size);
+		if (!r) {
+			perror("malloc");
+			exit(1);
+		}
+		memset(r, 0xbd, buf_size);
+
+		ret = fsinfo(AT_FDCWD, file, &params, r, buf_size);
+		if (ret == -1) {
+			fprintf(stderr, "mount-%s: %x,%x,%x %m\n",
+				file, params.request, params.Nth, params.Mth);
+			exit(1);
+		}
+
+		if (ret <= buf_size) {
+			*_size = ret;
+			break;
+		}
+		buf_size = (ret + 4096 - 1) & ~(4096 - 1);
+	}
+
+	return r;
+}
+
+/*
+ * Display a mount and then recurse through its children.
+ */
+static void display_mount(unsigned int mnt_id, unsigned int depth, char *path)
+{
+	struct fsinfo_mount_child *children;
+	struct fsinfo_mount_info info;
+	struct fsinfo_ids ids;
+	unsigned int d;
+	size_t ch_size, p_size;
+	char dev[64];
+	int i, n, s;
+
+	get_attr(mnt_id, FSINFO_ATTR_MOUNT_INFO, &info, sizeof(info));
+	get_attr(mnt_id, FSINFO_ATTR_IDS, &ids, sizeof(ids));
+	if (depth > 0)
+		printf("%s", tree_buf);
+
+	s = strlen(path);
+	printf("%s", !s ? "\"\"" : path);
+	if (!s)
+		s += 2;
+	s += depth;
+	if (s < 38)
+		s = 38 - s;
+	else
+		s = 1;
+	printf("%*.*s", s, s, "");
+
+	sprintf(dev, "%x:%x", ids.f_dev_major, ids.f_dev_minor);
+	printf("%10u %8x %2x %5s %s",
+	       info.mnt_id, info.change_counter,
+	       info.attr,
+	       dev, ids.f_fs_name);
+	putchar('\n');
+
+	children = get_attr_alloc(mnt_id, FSINFO_ATTR_MOUNT_CHILDREN, 0, &ch_size);
+	n = ch_size / sizeof(children[0]) - 1;
+
+	bar_buf[depth + 1] = '|';
+	if (depth > 0) {
+		tree_buf[depth - 4 + 1] = bar_buf[depth - 4 + 1];
+		tree_buf[depth - 4 + 2] = ' ';
+	}
+
+	tree_buf[depth + 0] = ' ';
+	tree_buf[depth + 1] = '\\';
+	tree_buf[depth + 2] = '_';
+	tree_buf[depth + 3] = ' ';
+	tree_buf[depth + 4] = 0;
+	d = depth + 4;
+
+	for (i = 0; i < n; i++) {
+		if (i == n - 1)
+			bar_buf[depth + 1] = ' ';
+		path = get_attr_alloc(children[i].mnt_id, FSINFO_ATTR_MOUNT_POINT,
+				      0, &p_size);
+		display_mount(children[i].mnt_id, d, path + 1);
+		free(path);
+	}
+
+	free(children);
+	if (depth > 0) {
+		tree_buf[depth - 4 + 1] = '\\';
+		tree_buf[depth - 4 + 2] = '_';
+	}
+	tree_buf[depth] = 0;
+}
+
+/*
+ * Find the ID of whatever is at the nominated path.
+ */
+static unsigned int lookup_mnt_by_path(const char *path)
+{
+	struct fsinfo_mount_info mnt;
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_PATH,
+		.request	= FSINFO_ATTR_MOUNT_INFO,
+	};
+
+	if (fsinfo(AT_FDCWD, path, &params, &mnt, sizeof(mnt)) == -1) {
+		perror(path);
+		exit(1);
+	}
+
+	return mnt.mnt_id;
+}
+
+/*
+ *
+ */
+int main(int argc, char **argv)
+{
+	unsigned int mnt_id;
+	char *path;
+	bool use_mnt_id = false;
+	int opt;
+
+	while ((opt = getopt(argc, argv, "M"))) {
+		switch (opt) {
+		case 'M':
+			use_mnt_id = true;
+			continue;
+		}
+		break;
+	}
+
+	argc -= optind;
+	argv += optind;
+
+	switch (argc) {
+	case 0:
+		mnt_id = lookup_mnt_by_path("/");
+		path = "ROOT";
+		break;
+	case 1:
+		path = argv[0];
+		if (use_mnt_id) {
+			mnt_id = strtoul(argv[0], NULL, 0);
+			break;
+		}
+
+		mnt_id = lookup_mnt_by_path(argv[0]);
+		break;
+	default:
+		printf("Format: test-mntinfo\n");
+		printf("Format: test-mntinfo <path>\n");
+		printf("Format: test-mntinfo -M <mnt_id>\n");
+		exit(2);
+	}
+
+	printf("MOUNT                                 MOUNT ID   CHANGE#  AT DEV   TYPE\n");
+	printf("------------------------------------- ---------- -------- -- ----- --------\n");
+	display_mount(mnt_id, 0, path);
+	return 0;
+}



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 09/19] fsinfo: Allow the mount topology propogation flags to be retrieved [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (7 preceding siblings ...)
  2020-02-18 17:05 ` [PATCH 08/19] vfs: fsinfo sample: Mount listing program " David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-18 17:06 ` [PATCH 10/19] fsinfo: Add API documentation " David Howells
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Allow the mount topology propogation flags to be retrieved as part of the
FSINFO_ATTR_MOUNT_INFO attributes.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/namespace.c              |    7 ++++++-
 include/uapi/linux/fsinfo.h |    2 +-
 include/uapi/linux/mount.h  |   10 +++++++++-
 samples/vfs/test-fsinfo.c   |    1 +
 samples/vfs/test-mntinfo.c  |    8 ++++----
 5 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e009dacc08d4..184c1aaf669a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4131,15 +4131,20 @@ int fsinfo_generic_mount_info(struct path *path, struct fsinfo_context *ctx)
 			p->parent_id = p->mnt_id;
 		rcu_read_unlock();
 	}
-	if (IS_MNT_SHARED(m))
+	if (IS_MNT_SHARED(m)) {
 		p->group_id = m->mnt_group_id;
+		p->propagation |= MOUNT_PROPAGATION_SHARED;
+	}
 	if (IS_MNT_SLAVE(m)) {
 		int master = m->mnt_master->mnt_group_id;
 		int dom = get_dominating_id(m, &root);
 		p->master_id = master;
 		if (dom && dom != master)
 			p->from_id = dom;
+		p->propagation |= MOUNT_PROPAGATION_SLAVE;
 	}
+	if (IS_MNT_UNBINDABLE(m))
+		p->propagation |= MOUNT_PROPAGATION_UNBINDABLE;
 	path_put(&root);
 
 	flags = READ_ONCE(m->mnt.mnt_flags);
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 2f67815c35af..bf12900455b8 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -113,7 +113,7 @@ struct fsinfo_mount_info {
 	__u32		from_id;	/* Slave propagated from ID */
 	__u32		attr;		/* MOUNT_ATTR_* flags */
 	__u32		change_counter;	/* Number of changes applied. */
-	__u32		__reserved[1];
+	__u32		propagation;	/* MOUNT_PROPAGATION_* flags */
 };
 
 #define FSINFO_ATTR_MOUNT_INFO__STRUCT struct fsinfo_mount_info
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 96a0240f23fe..39e50fe9d8d9 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -105,7 +105,7 @@ enum fsconfig_command {
 #define FSMOUNT_CLOEXEC		0x00000001
 
 /*
- * Mount attributes.
+ * Mount object attributes (these are separate to filesystem attributes).
  */
 #define MOUNT_ATTR_RDONLY	0x00000001 /* Mount read-only */
 #define MOUNT_ATTR_NOSUID	0x00000002 /* Ignore suid and sgid bits */
@@ -117,4 +117,12 @@ enum fsconfig_command {
 #define MOUNT_ATTR_STRICTATIME	0x00000020 /* - Always perform atime updates */
 #define MOUNT_ATTR_NODIRATIME	0x00000080 /* Do not update directory access times */
 
+/*
+ * Mount object propogation attributes.
+ */
+#define MOUNT_PROPAGATION_UNBINDABLE	0x00000001 /* Mount is unbindable */
+#define MOUNT_PROPAGATION_SLAVE		0x00000002 /* Mount is slave */
+#define MOUNT_PROPAGATION_PRIVATE	0x00000000 /* Mount is private (ie. not shared) */
+#define MOUNT_PROPAGATION_SHARED	0x00000004 /* Mount is shared */
+
 #endif /* _UAPI_LINUX_MOUNT_H */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 23a4d6d4c8b2..1411cadc4a90 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -293,6 +293,7 @@ static void dump_fsinfo_generic_mount_info(void *reply, unsigned int size)
 	printf("\tmnt_id  : %x\n", f->mnt_id);
 	printf("\tparent  : %x\n", f->parent_id);
 	printf("\tgroup   : %x\n", f->group_id);
+	printf("\tpropag  : %x\n", f->propagation);
 	printf("\tattr    : %x\n", f->attr);
 	printf("\tchanges : %x\n", f->change_counter);
 }
diff --git a/samples/vfs/test-mntinfo.c b/samples/vfs/test-mntinfo.c
index f4d90d0671c5..5a3d6b917447 100644
--- a/samples/vfs/test-mntinfo.c
+++ b/samples/vfs/test-mntinfo.c
@@ -135,9 +135,9 @@ static void display_mount(unsigned int mnt_id, unsigned int depth, char *path)
 	printf("%*.*s", s, s, "");
 
 	sprintf(dev, "%x:%x", ids.f_dev_major, ids.f_dev_minor);
-	printf("%10u %8x %2x %5s %s",
+	printf("%10u %8x %2x %x %5s %s",
 	       info.mnt_id, info.change_counter,
-	       info.attr,
+	       info.attr, info.propagation,
 	       dev, ids.f_fs_name);
 	putchar('\n');
 
@@ -236,8 +236,8 @@ int main(int argc, char **argv)
 		exit(2);
 	}
 
-	printf("MOUNT                                 MOUNT ID   CHANGE#  AT DEV   TYPE\n");
-	printf("------------------------------------- ---------- -------- -- ----- --------\n");
+	printf("MOUNT                                 MOUNT ID   CHANGE#  AT P DEV   TYPE\n");
+	printf("------------------------------------- ---------- -------- -- - ----- --------\n");
 	display_mount(mnt_id, 0, path);
 	return 0;
 }



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 10/19] fsinfo: Add API documentation [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (8 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 09/19] fsinfo: Allow the mount topology propogation flags to be retrieved " David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-18 17:06 ` [PATCH 11/19] afs: Support fsinfo() " David Howells
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add API documentation for fsinfo.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 Documentation/filesystems/fsinfo.rst |  490 ++++++++++++++++++++++++++++++++++
 1 file changed, 490 insertions(+)
 create mode 100644 Documentation/filesystems/fsinfo.rst

diff --git a/Documentation/filesystems/fsinfo.rst b/Documentation/filesystems/fsinfo.rst
new file mode 100644
index 000000000000..044eebd3493c
--- /dev/null
+++ b/Documentation/filesystems/fsinfo.rst
@@ -0,0 +1,490 @@
+============================
+Filesystem Information Query
+============================
+
+The fsinfo() system call allows the querying of filesystem and filesystem
+security information beyond what stat(), statx() and statfs() can obtain.  It
+does not require a file to be opened as does ioctl().
+
+fsinfo() may be called with a path, with open file descriptor or a with a mount
+object identifier.
+
+The fsinfo() system call needs to be configured on by enabling:
+
+	"File systems"/"Enable the fsinfo() system call" (CONFIG_FSINFO)
+
+This document has the following sections:
+
+.. contents:: :local:
+
+
+Overview
+========
+
+The fsinfo() system call retrieves one of a number of attributes, the IDs of
+which can be found in include/uapi/linux/fsinfo.h::
+
+	FSINFO_ATTR_STATFS	- statfs()-style state
+	FSINFO_ATTR_IDS		- Filesystem IDs
+	FSINFO_ATTR_LIMITS	- Filesystem limits
+	...
+	FSINFO_ATTR_FSINFO	- Information about fsinfo() itself
+	...
+	FSINFO_ATTR_MOUNT_INFO	- Information about the mount topology
+	...
+
+Each attribute can have zero or more values, which can be of one of the
+following types:
+
+ * ``VStruct``.  This is a structure with a version-dependent length.  New
+   versions of the kernel may append more fields, though they are not
+   permitted to remove or replace old ones.
+
+   Older applications, expecting an older version of the field, can ask for a
+   shorter struct and will only get the fields they requested; newer
+   applications running on an older kernel will get the extra fields they
+   requested filled with zeros.  Either way, the system call returns the size
+   of the internal struct, regardless of how much data it returned.
+
+   This allows for struct-type fields to be extended in future.
+
+ * ``String``.  This is a variable-length string of up to 4096 characters (no
+   NUL character is included).  The returned string will be truncated if the
+   output buffer is too small.  The total size of the string is returned,
+   regardless of any truncation.
+
+ * ``Opaque``.  This is a variable-length blob of indeterminate structure.  It
+   may be up to INT_MAX bytes in size.
+
+ * ``List``.  This is a variable-length list of fixed-size structures.  The
+   element size may not vary over time, so the element format must be designed
+   with care.  The maximum length is INT_MAX bytes, though this depends on the
+   kernel being able to allocate an internal buffer large enough.
+
+Value type is an inherent propery of an attribute and all the values of an
+attribute must be of that type.  Each attribute can have a single value, a
+sequence of values or a sequence-of-sequences of values.
+
+
+Filesystem API
+==============
+
+If the filesystem wishes to provide a list of queryable attributes, it should
+set the table pointer in the superblock::
+
+	const struct fsinfo_attribute *fsinfo_attributes;
+
+terminating it with a blank entry.  Each entry is a ``struct fsinfo_attribute``
+and these can be created with a set of helper macros::
+
+	FSINFO_VSTRUCT(A,G)
+	FSINFO_VSTRUCT_N(A,G)
+	FSINFO_VSTRUCT_NM(A,G)
+	FSINFO_STRING(A,G)
+	FSINFO_STRING_N(A,G)
+	FSINFO_STRING_NM(A,G)
+	FSINFO_OPAQUE(A,G)
+	FSINFO_LIST(A,G)
+	FSINFO_LIST_N(A,G)
+
+The names of the macro are a combination of type (vstruct, string, opaque and
+list) and an optional qualifier, if the attribute has N values or N lots of M
+values.  ``A`` is the name of the attribute and ``G`` is a function to get a
+value for that attribute.
+
+For vstruct- and list-type attributes, it is expected that there is a macro
+defined with the name ``A##__STRUCT`` that indicates the structure or element
+type.
+
+The get function needs to match the following type::
+
+	int (*get)(struct path *path, struct fsinfo_context *ctx);
+
+where "path" indicates the object to be queried and ctx is a context describing
+the parameters and the output buffer.  The function should return the total
+size of the data it would like to produce or an error.
+
+The parameter struct looks like::
+
+	struct fsinfo_context {
+		__u32		requested_attr;
+		__u32		Nth;
+		__u32		Mth;
+		bool		want_size_only;
+		unsigned int	buf_size;
+		unsigned int	usage;
+		void		*buffer;
+		...
+	};
+
+The fields relevant to the filesystem are as follows:
+
+ * ``requested_attr``
+
+   Which attribute is being requested.  EOPNOTSUPP should be returned if the
+   attribute is not supported by the filesystem or the LSM.
+
+ * ``Nth`` and ``Mth``
+
+   Which value of an attribute is being requested.
+
+   For a single-value attribute Nth and Mth will both be 0.
+
+   For a "1D" attribute, Nth will indicate which value and Mth will always
+   be 0.  Take, for example, FSINFO_ATTR_SERVER_NAME - for a network
+   filesystem, the superblock will be backed by a number of servers.  This will
+   return the name of the Nth server.  ENODATA will be returned if Nth goes
+   beyond the end of the array.
+
+   For a "2D" attribute, Mth will indicate the index in the Nth set of values.
+   Take, for example, an attribute for a network filesystems that returns
+   server addresses - each server may have one or more addresses.  This could
+   return the Mth address of the Nth server.  ENODATA should be returned if the
+   Nth set doesn't exist or the Mth element of the Nth set doesn't exist.
+
+ * ``want_size_only``
+
+   Is set to true if the caller only wants the size of the value so that the
+   get function doesn't have to make expensive calculations or calls to
+   retrieve the value.
+
+ * ``buf_size``
+
+   This indicates the current size of the buffer.  For the list type and the
+   opaque type this will be increased if the current buffer won't hold the
+   value and the filesystem will be called again.
+
+ * ``usage``
+
+   This indicates how much of the buffer has been used so far for an list or
+   opaque type attribute.  This is updated by the fsinfo_note_param*()
+   functions.
+
+ * ``buffer``
+
+   This points to the output buffer.  For struct- and string-type attributes it
+   will always be big enough; for list- and opaque-type, it will be buf_size in
+   size and will be resized if the returned size is larger than this.
+
+To simplify filesystem code, there will always be at least a minimal buffer
+available if the ->fsinfo() method gets called - and the filesystem should
+always write what it can into the buffer.  It's possible that the fsinfo()
+system call will then throw the contents away and just return the length.
+
+
+Helper Functions
+================
+
+The API includes a number of helper functions:
+
+ * ``void fsinfo_set_feature(struct fsinfo_features *ft,
+			     enum fsinfo_feature feature);``
+
+   This function sets a feature flag.
+
+ * ``void fsinfo_clear_feature(struct fsinfo_features *ft,
+			       enum fsinfo_feature feature);``
+
+   This function clears a feature flag.
+
+ * ``void fsinfo_set_unix_features(struct fsinfo_features *ft);``
+
+   Set feature flags appropriate to the features of a standard UNIX filesystem,
+   such as having numeric UIDS and GIDS; allowing the creation of directories,
+   symbolic links, hard links, device files, FIFO and socket files; permitting
+   sparse files; and having access, change and modification times.
+
+
+Attribute Summary
+=================
+
+To summarise the attributes that are defined::
+
+  Symbolic name				Type
+  =====================================	===============
+  FSINFO_ATTR_STATFS			vstruct
+  FSINFO_ATTR_IDS			vstruct
+  FSINFO_ATTR_LIMITS			vstruct
+  FSINFO_ATTR_SUPPORTS			vstruct
+  FSINFO_ATTR_FEATURES			vstruct
+  FSINFO_ATTR_TIMESTAMP_INFO		vstruct
+  FSINFO_ATTR_VOLUME_ID			string
+  FSINFO_ATTR_VOLUME_UUID		vstruct
+  FSINFO_ATTR_VOLUME_NAME		string
+  FSINFO_ATTR_NAME_ENCODING		string
+  FSINFO_ATTR_NAME_CODEPAGE		string
+  FSINFO_ATTR_FSINFO			vstruct
+  FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO	vstruct
+  FSINFO_ATTR_FSINFO_ATTRIBUTES		list
+  FSINFO_ATTR_MOUNT_INFO		vstruct
+  FSINFO_ATTR_MOUNT_DEVNAME		string
+  FSINFO_ATTR_MOUNT_POINT		string
+  FSINFO_ATTR_MOUNT_CHILDREN		list
+  FSINFO_ATTR_AFS_CELL_NAME		string
+  FSINFO_ATTR_AFS_SERVER_NAME		N × string
+  FSINFO_ATTR_AFS_SERVER_ADDRESS	N × struct
+
+
+Attribute Catalogue
+===================
+
+A number of the attributes convey information about a filesystem superblock:
+
+ *  ``FSINFO_ATTR_STATFS``
+
+    This struct-type attribute gives most of the equivalent data to statfs(),
+    but with all the fields as unconditional 64-bit or 128-bit integers.  Note
+    that static data like IDs that don't change are retrieved with
+    FSINFO_ATTR_IDS instead.
+
+    Further, superblock flags (such as MS_RDONLY) are not exposed by this
+    attribute; rather the parameters must be listed and the attributes picked
+    out from that.
+
+ *  ``FSINFO_ATTR_IDS``
+
+    This struct-type attribute conveys various identifiers used by the target
+    filesystem.  This includes the filesystem name, the NFS filesystem ID, the
+    superblock ID used in notifications, the filesystem magic type number and
+    the primary device ID.
+
+ *  ``FSINFO_ATTR_LIMITS``
+
+    This struct-type attribute conveys the limits on various aspects of a
+    filesystem, such as maximum file, symlink and xattr sizes, maxiumm filename
+    and xattr name length, maximum number of symlinks, maximum device major and
+    minor numbers and maximum UID, GID and project ID numbers.
+
+ *  ``FSINFO_ATTR_SUPPORTS``
+
+    This struct-type attribute conveys information about the support the
+    filesystem has for various UAPI features of a filesystem.  This includes
+    information about which bits are supported in various masks employed by the
+    statx system call, what FS_IOC_* flags are supported by ioctls and what
+    DOS/Windows file attribute flags are supported.
+
+ *  ``FSINFO_ATTR_TIMESTAMP_INFO``
+
+    This struct-type attribute conveys information about the resolution and
+    range of the timestamps available in a filesystem.  The resolutions are
+    given as a mantissa and exponent (resolution = mantissa * 10^exponent
+    seconds), where the exponent can be negative to indicate a sub-second
+    resolution (-9 being nanoseconds, for example).
+
+ *  ``FSINFO_ATTR_VOLUME_ID``
+
+    This is a string-type attribute that conveys the superblock identifier for
+    the volume.  By default it will be filled in from the contents of s_id from
+    the superblock.  For a block-based filesystem, for example, this might be
+    the name of the primary block device.
+
+ *  ``FSINFO_ATTR_VOLUME_UUID``
+
+    This is a struct-type attribute that conveys the UUID identifier for the
+    volume.  By default it will be filled in from the contents of s_uuid from
+    the superblock.  If this doesn't exist, it will be an entirely zeros.
+
+ *  ``FSINFO_ATTR_VOLUME_NAME``
+
+    This is a string-type attribute that conveys the name of the volume.  By
+    default it will return EOPNOTSUPP.  For a disk-based filesystem, it might
+    convey the partition label; for a network-based filesystem, it might convey
+    the name of the remote volume.
+
+ *  ``FSINFO_ATTR_FEATURES``
+
+    This is a special attribute, being a set of single-bit feature flags,
+    formatted as struct-type attribute.  The meanings of the feature bits are
+    listed below - see the "Feature Bit Catalogue" section.  The feature bits
+    are grouped numerically into bytes, such that features 0-7 are in byte 0,
+    8-15 are in byte 1, 16-23 in byte 2 and so on.
+
+    Any feature bit that's not supported by the kernel will be set to false if
+    asked for.  The highest supported feature can be obtained from attribute
+    "FSINFO_ATTR_FSINFO".
+
+
+Some attributes give information about fsinfo itself:
+
+ *  ``FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO``
+
+    This struct-type attribute gives metadata about the attribute with the ID
+    specified by the Nth parameter, including its type, default size and
+    element size.
+
+ *  ``FSINFO_ATTR_FSINFO_ATTRIBUTES``
+
+    This list-type attribute gives a list of the attribute IDs available at the
+    point of reference.  FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO can then be used to
+    query each attribute.
+
+ *  ``FSINFO_ATTR_FSINFO``
+
+    This struct-type attribute gives information about the fsinfo() system call
+    itself, including the maximum number of feature bits supported.
+
+
+Then there are filesystem-specific attributes, e.g.:
+
+ *  ``FSINFO_ATTR_AFS_CELL_NAME``
+
+    This is a string-type attribute that retrieves the AFS cell name of the
+    target object.
+
+ *  ``FSINFO_ATTR_AFS_SERVER_NAME``
+
+    This is a string-type attribute that conveys the name of the Nth server
+    backing a network-filesystem superblock.
+
+ *  ``FSINFO_ATTR_AFS_SERVER_ADDRESSES``
+
+    This is a list-type attribute that conveys the Mth address of the Nth
+    server, as returned by FSINFO_ATTR_SERVER_NAME.
+
+
+Feature Bit Catalogue
+=====================
+
+The feature bits convey single true/false assertions about a specific instance
+of a filesystem (ie. a specific superblock).  They are accessed using the
+"FSINFO_ATTR_FEATURE" attribute:
+
+ *  ``FSINFO_FEAT_IS_KERNEL_FS``
+ *  ``FSINFO_FEAT_IS_BLOCK_FS``
+ *  ``FSINFO_FEAT_IS_FLASH_FS``
+ *  ``FSINFO_FEAT_IS_NETWORK_FS``
+ *  ``FSINFO_FEAT_IS_AUTOMOUNTER_FS``
+ *  ``FSINFO_FEAT_IS_MEMORY_FS``
+
+    These indicate what kind of filesystem the target is: kernel API (proc),
+    block-based (ext4), flash/nvm-based (jffs2), remote over the network (NFS),
+    local quasi-filesystem that acts as a tray of mountpoints (autofs), plain
+    in-memory filesystem (shmem).
+
+ *  ``FSINFO_FEAT_AUTOMOUNTS``
+
+    This indicate if a filesystem may have objects that are automount points.
+
+ *  ``FSINFO_FEAT_ADV_LOCKS``
+ *  ``FSINFO_FEAT_MAND_LOCKS``
+ *  ``FSINFO_FEAT_LEASES``
+
+    These indicate if a filesystem supports advisory locks, mandatory locks or
+    leases.
+
+ *  ``FSINFO_FEAT_UIDS``
+ *  ``FSINFO_FEAT_GIDS``
+ *  ``FSINFO_FEAT_PROJIDS``
+
+    These indicate if a filesystem supports/stores/transports numeric user IDs,
+    group IDs or project IDs.  The "FSINFO_ATTR_LIMITS" attribute can be used
+    to find out the upper limits on the IDs values.
+
+ *  ``FSINFO_FEAT_STRING_USER_IDS``
+
+    This indicates if a filesystem supports/stores/transports string user
+    identifiers.
+
+ *  ``FSINFO_FEAT_GUID_USER_IDS``
+
+    This indicates if a filesystem supports/stores/transports Windows GUIDs as
+    user identifiers (eg. ntfs).
+
+ *  ``FSINFO_FEAT_WINDOWS_ATTRS``
+
+    This indicates if a filesystem supports Windows FILE_* attribute bits
+    (eg. cifs, jfs).  The "FSINFO_ATTR_SUPPORTS" attribute can be used to find
+    out which windows file attributes are supported by the filesystem.
+
+ *  ``FSINFO_FEAT_USER_QUOTAS``
+ *  ``FSINFO_FEAT_GROUP_QUOTAS``
+ *  ``FSINFO_FEAT_PROJECT_QUOTAS``
+
+    These indicate if a filesystem supports quotas for users, groups or
+    projects.
+
+ *  ``FSINFO_FEAT_XATTRS``
+
+    These indicate if a filesystem supports extended attributes.  The
+    "FSINFO_ATTR_LIMITS" attribute can be used to find out the upper limits on
+    the supported name and body lengths.
+
+ *  ``FSINFO_FEAT_JOURNAL``
+ *  ``FSINFO_FEAT_DATA_IS_JOURNALLED``
+
+    These indicate whether the filesystem has a journal and whether data
+    changes are logged to it.
+
+ *  ``FSINFO_FEAT_O_SYNC``
+ *  ``FSINFO_FEAT_O_DIRECT``
+
+    These indicate whether the filesystem supports the O_SYNC and O_DIRECT
+    flags.
+
+ *  ``FSINFO_FEAT_VOLUME_ID``
+ *  ``FSINFO_FEAT_VOLUME_UUID``
+ *  ``FSINFO_FEAT_VOLUME_NAME``
+ *  ``FSINFO_FEAT_VOLUME_FSID``
+
+    These indicate whether ID, UUID, name and FSID identifiers actually exist
+    in the filesystem and thus might be considered persistent.
+
+ *  ``FSINFO_FEAT_IVER_ALL_CHANGE``
+ *  ``FSINFO_FEAT_IVER_DATA_CHANGE``
+ *  ``FSINFO_FEAT_IVER_MONO_INCR``
+
+    These indicate whether i_version in the inode is supported and, if so, what
+    mode it operates in.  The first two indicate if it's changed for any data
+    or metadata change, or whether it's only changed for any data changes; the
+    last indicates whether or not it's monotonically increasing for each such
+    change.
+
+ *  ``FSINFO_FEAT_HARD_LINKS``
+ *  ``FSINFO_FEAT_HARD_LINKS_1DIR``
+
+    These indicate whether the filesystem can have hard links made in it, and
+    whether they can be made between directory or only within the same
+    directory.
+
+ *  ``FSINFO_FEAT_DIRECTORIES``
+ *  ``FSINFO_FEAT_SYMLINKS``
+ *  ``FSINFO_FEAT_DEVICE_FILES``
+ *  ``FSINFO_FEAT_UNIX_SPECIALS``
+
+    These indicate whether directories; symbolic links; device files; or pipes
+    and sockets can be made within the filesystem.
+
+ *  ``FSINFO_FEAT_RESOURCE_FORKS``
+
+    This indicates if the filesystem supports resource forks.
+
+ *  ``FSINFO_FEAT_NAME_CASE_INDEP``
+ *  ``FSINFO_FEAT_NAME_NON_UTF8``
+ *  ``FSINFO_FEAT_NAME_HAS_CODEPAGE``
+
+    These indicate if the filesystem supports case-independent file names,
+    whether the filenames are non-utf8 (see the "FSINFO_ATTR_NAME_ENCODING"
+    attribute) and whether a codepage is in use to transliterate them (see
+    the "FSINFO_ATTR_NAME_CODEPAGE" attribute).
+
+ *  ``FSINFO_FEAT_SPARSE``
+
+    This indicates if a filesystem supports sparse files.
+
+ *  ``FSINFO_FEAT_NOT_PERSISTENT``
+
+    This indicates if a filesystem is not persistent.
+
+ *  ``FSINFO_FEAT_NO_UNIX_MODE``
+
+    This indicates if a filesystem doesn't support UNIX mode bits (though they
+    may be manufactured from other bits, such as Windows file attribute flags).
+
+ *  ``FSINFO_FEAT_HAS_ATIME``
+ *  ``FSINFO_FEAT_HAS_BTIME``
+ *  ``FSINFO_FEAT_HAS_CTIME``
+ *  ``FSINFO_FEAT_HAS_MTIME``
+
+    These indicate which timestamps a filesystem supports (access, birth,
+    change, modify).  The range and resolutions can be queried with the
+    "FSINFO_ATTR_TIMESTAMPS" attribute).



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 11/19] afs: Support fsinfo() [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (9 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 10/19] fsinfo: Add API documentation " David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-19 21:01   ` Jann Horn
  2020-02-20 12:58   ` David Howells
  2020-02-18 17:06 ` [PATCH 12/19] security: Add hooks to rule on setting a superblock or mount watch " David Howells
                   ` (12 subsequent siblings)
  23 siblings, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add fsinfo support to the AFS filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/afs/internal.h           |    1 
 fs/afs/super.c              |  229 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/fsinfo.h |   15 +++
 samples/vfs/test-fsinfo.c   |   51 ++++++++++
 4 files changed, 291 insertions(+), 5 deletions(-)

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 1d81fc4c3058..b4b2a8a18e9f 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -248,6 +248,7 @@ struct afs_super_info {
 	struct afs_volume	*volume;	/* volume record */
 	enum afs_flock_mode	flock_mode:8;	/* File locking emulation mode */
 	bool			dyn_root;	/* True if dynamic root */
+	bool			autocell;	/* True if autocell */
 };
 
 static inline struct afs_super_info *AFS_FS_S(struct super_block *sb)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index dda7a9a66848..e13167a9a2f8 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -26,9 +26,14 @@
 #include <linux/sched.h>
 #include <linux/nsproxy.h>
 #include <linux/magic.h>
+#include <linux/fsinfo.h>
 #include <net/net_namespace.h>
 #include "internal.h"
 
+#ifdef CONFIG_FSINFO
+static const struct fsinfo_attribute afs_fsinfo_attributes[];
+static const struct fsinfo_attribute afs_dyn_fsinfo_attributes[];
+#endif
 static void afs_i_init_once(void *foo);
 static void afs_kill_super(struct super_block *sb);
 static struct inode *afs_alloc_inode(struct super_block *sb);
@@ -54,6 +59,23 @@ int afs_net_id;
 
 static const struct super_operations afs_super_ops = {
 	.statfs		= afs_statfs,
+#ifdef CONFIG_FSINFO
+	.fsinfo_attributes = afs_fsinfo_attributes,
+#endif
+	.alloc_inode	= afs_alloc_inode,
+	.drop_inode	= afs_drop_inode,
+	.destroy_inode	= afs_destroy_inode,
+	.free_inode	= afs_free_inode,
+	.evict_inode	= afs_evict_inode,
+	.show_devname	= afs_show_devname,
+	.show_options	= afs_show_options,
+};
+
+static const struct super_operations afs_dyn_super_ops = {
+	.statfs		= afs_statfs,
+#ifdef CONFIG_FSINFO
+	.fsinfo_attributes = afs_dyn_fsinfo_attributes,
+#endif
 	.alloc_inode	= afs_alloc_inode,
 	.drop_inode	= afs_drop_inode,
 	.destroy_inode	= afs_destroy_inode,
@@ -193,7 +215,7 @@ static int afs_show_options(struct seq_file *m, struct dentry *root)
 
 	if (as->dyn_root)
 		seq_puts(m, ",dyn");
-	if (test_bit(AFS_VNODE_AUTOCELL, &AFS_FS_I(d_inode(root))->flags))
+	if (as->autocell)
 		seq_puts(m, ",autocell");
 	switch (as->flock_mode) {
 	case afs_flock_mode_unset:	break;
@@ -432,9 +454,12 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
 	sb->s_blocksize_bits	= PAGE_SHIFT;
 	sb->s_maxbytes		= MAX_LFS_FILESIZE;
 	sb->s_magic		= AFS_FS_MAGIC;
-	sb->s_op		= &afs_super_ops;
-	if (!as->dyn_root)
+	if (!as->dyn_root) {
+		sb->s_op	= &afs_super_ops;
 		sb->s_xattr	= afs_xattr_handlers;
+	} else {
+		sb->s_op	= &afs_dyn_super_ops;
+	}
 	ret = super_setup_bdi(sb);
 	if (ret)
 		return ret;
@@ -444,7 +469,7 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
 	if (as->dyn_root) {
 		inode = afs_iget_pseudo_dir(sb, true);
 	} else {
-		sprintf(sb->s_id, "%llu", as->volume->vid);
+		sprintf(sb->s_id, "%llx", as->volume->vid);
 		afs_activate_volume(as->volume);
 		iget_data.fid.vid	= as->volume->vid;
 		iget_data.fid.vnode	= 1;
@@ -458,7 +483,7 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
-	if (ctx->autocell || as->dyn_root)
+	if (as->autocell || as->dyn_root)
 		set_bit(AFS_VNODE_AUTOCELL, &AFS_FS_I(inode)->flags);
 
 	ret = -ENOMEM;
@@ -498,6 +523,8 @@ static struct afs_super_info *afs_alloc_sbi(struct fs_context *fc)
 			as->cell = afs_get_cell(ctx->cell);
 			as->volume = __afs_get_volume(ctx->volume);
 		}
+		if (ctx->autocell)
+			as->autocell = true;
 	}
 	return as;
 }
@@ -760,3 +787,195 @@ static int afs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 	return ret;
 }
+
+#ifdef CONFIG_FSINFO
+static const struct fsinfo_timestamp_info afs_timestamp_info = {
+	.atime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.mtime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.ctime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.btime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+};
+
+static int afs_fsinfo_get_timestamp(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_timestamp_info *tsinfo = ctx->buffer;
+	*tsinfo = afs_timestamp_info;
+	return sizeof(*tsinfo);
+}
+
+static int afs_fsinfo_get_limits(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_limits *lim = ctx->buffer;
+
+	lim->max_file_size.hi	= 0;
+	lim->max_file_size.lo	= MAX_LFS_FILESIZE;
+	/* Inode numbers can be 96-bit on YFS, but that's hard to determine. */
+	lim->max_ino.hi		= 0;
+	lim->max_ino.lo		= UINT_MAX;
+	lim->max_hard_links	= UINT_MAX;
+	lim->max_uid		= UINT_MAX;
+	lim->max_gid		= UINT_MAX;
+	lim->max_filename_len	= AFSNAMEMAX - 1;
+	lim->max_symlink_len	= AFSPATHMAX - 1;
+	return sizeof(*lim);
+}
+
+static int afs_fsinfo_get_supports(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_supports *sup = ctx->buffer;
+
+	sup = ctx->buffer;
+	sup->stx_mask = (STATX_TYPE | STATX_MODE |
+			 STATX_NLINK |
+			 STATX_UID | STATX_GID |
+			 STATX_MTIME | STATX_INO |
+			 STATX_SIZE);
+	sup->stx_attributes = STATX_ATTR_AUTOMOUNT;
+	return sizeof(*sup);
+}
+
+static int afs_fsinfo_get_features(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_features *ft = ctx->buffer;
+
+	fsinfo_set_feature(ft, FSINFO_FEAT_IS_NETWORK_FS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_AUTOMOUNTS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_ADV_LOCKS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_UIDS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_GIDS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_VOLUME_ID);
+	fsinfo_set_feature(ft, FSINFO_FEAT_VOLUME_NAME);
+	fsinfo_set_feature(ft, FSINFO_FEAT_IVER_MONO_INCR);
+	fsinfo_set_feature(ft, FSINFO_FEAT_SYMLINKS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_HARD_LINKS_1DIR);
+	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_MTIME);
+	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_INODE_NUMBERS);
+	return sizeof(*ft);
+}
+
+static int afs_dyn_fsinfo_get_features(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_features *ft = ctx->buffer;
+
+	fsinfo_set_feature(ft, FSINFO_FEAT_IS_AUTOMOUNTER_FS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_AUTOMOUNTS);
+	return sizeof(*ft);
+}
+
+static int afs_fsinfo_get_volume_name(struct path *path, struct fsinfo_context *ctx)
+{
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_volume *volume = as->volume;
+
+	memcpy(ctx->buffer, volume->name, volume->name_len);
+	return volume->name_len;
+}
+
+static int afs_fsinfo_get_cell_name(struct path *path, struct fsinfo_context *ctx)
+{
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_cell *cell = as->cell;
+
+	memcpy(ctx->buffer, cell->name, cell->name_len);
+	return cell->name_len;
+}
+
+static int afs_fsinfo_get_server_name(struct path *path, struct fsinfo_context *ctx)
+{
+	struct afs_server_list *slist;
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_volume *volume = as->volume;
+	struct afs_server *server;
+	int ret = -ENODATA;
+
+	read_lock(&volume->servers_lock);
+	slist = volume->servers;
+	if (slist) {
+		if (ctx->Nth < slist->nr_servers) {
+			server = slist->servers[ctx->Nth].server;
+			ret = sprintf(ctx->buffer, "%pU", &server->uuid);
+		}
+	}
+
+	read_unlock(&volume->servers_lock);
+	return ret;
+}
+
+static int afs_fsinfo_get_server_address(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_afs_server_address *addr = ctx->buffer;
+	struct afs_server_list *slist;
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_addr_list *alist;
+	struct afs_volume *volume = as->volume;
+	struct afs_server *server;
+	struct afs_net *net = afs_d2net(path->dentry);
+	unsigned int i;
+	int ret = -ENODATA;
+
+	read_lock(&volume->servers_lock);
+	slist = afs_get_serverlist(volume->servers);
+	read_unlock(&volume->servers_lock);
+
+	if (ctx->Nth >= slist->nr_servers)
+		goto put_slist;
+	server = slist->servers[ctx->Nth].server;
+
+	read_lock(&server->fs_lock);
+	alist = afs_get_addrlist(rcu_access_pointer(server->addresses));
+	read_unlock(&server->fs_lock);
+	if (!alist)
+		goto put_slist;
+
+	ret = alist->nr_addrs * sizeof(*addr);
+	if (ret <= ctx->buf_size) {
+		for (i = 0; i < alist->nr_addrs; i++)
+			memcpy(&addr[i].address, &alist->addrs[i],
+			       sizeof(struct sockaddr_rxrpc));
+	}
+
+	afs_put_addrlist(alist);
+put_slist:
+	afs_put_serverlist(net, slist);
+	return ret;
+}
+
+static const struct fsinfo_attribute afs_fsinfo_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	afs_fsinfo_get_timestamp),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		afs_fsinfo_get_limits),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		afs_fsinfo_get_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		afs_fsinfo_get_features),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	afs_fsinfo_get_volume_name),
+	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	afs_fsinfo_get_cell_name),
+	FSINFO_STRING_N	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_fsinfo_get_server_name),
+	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_get_server_address),
+	{}
+};
+
+static const struct fsinfo_attribute afs_dyn_fsinfo_attributes[] = {
+	FSINFO_VSTRUCT(FSINFO_ATTR_TIMESTAMP_INFO,	afs_fsinfo_get_timestamp),
+	FSINFO_VSTRUCT(FSINFO_ATTR_FEATURES,		afs_dyn_fsinfo_get_features),
+	{}
+};
+
+#endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index bf12900455b8..5926b16aac4e 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -33,6 +33,10 @@
 #define FSINFO_ATTR_MOUNT_POINT		0x202	/* Relative path of mount in parent (string) */
 #define FSINFO_ATTR_MOUNT_CHILDREN	0x203	/* Children of this mount (list) */
 
+#define FSINFO_ATTR_AFS_CELL_NAME	0x300	/* AFS cell name (string) */
+#define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of the Nth server (string) */
+#define FSINFO_ATTR_AFS_SERVER_ADDRESSES 0x302	/* List of addresses of the Nth server */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -296,4 +300,15 @@ struct fsinfo_volume_uuid {
 
 #define FSINFO_ATTR_VOLUME_UUID__STRUCT struct fsinfo_volume_uuid
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_AFS_SERVER_ADDRESSES).
+ *
+ * Get the addresses of the Nth server for a network filesystem.
+ */
+struct fsinfo_afs_server_address {
+	struct __kernel_sockaddr_storage address;
+};
+
+#define FSINFO_ATTR_AFS_SERVER_ADDRESSES__STRUCT struct fsinfo_afs_server_address
+
 #endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 1411cadc4a90..6ad0f84c4327 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -23,6 +23,7 @@
 #include <linux/socket.h>
 #include <sys/stat.h>
 #include <arpa/inet.h>
+#include <linux/rxrpc.h>
 
 #ifndef __NR_fsinfo
 #define __NR_fsinfo -1
@@ -305,6 +306,50 @@ static void dump_fsinfo_generic_mount_child(void *reply, unsigned int size)
 	printf("%8x %8x\n", f->mnt_id, f->change_counter);
 }
 
+static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
+{
+	struct fsinfo_afs_server_address *f = reply;
+	struct sockaddr_storage *ss = (struct sockaddr_storage *)&f->address;
+	struct sockaddr_rxrpc *srx;
+	struct sockaddr_in6 *sin6;
+	struct sockaddr_in *sin;
+	char proto[32], buf[1024];
+
+	if (ss->ss_family == AF_RXRPC) {
+		srx = (struct sockaddr_rxrpc *)ss;
+		printf("%5u ", srx->srx_service);
+		switch (srx->transport_type) {
+		case SOCK_DGRAM:
+			sprintf(proto, "udp");
+			break;
+		case SOCK_STREAM:
+			sprintf(proto, "tcp");
+			break;
+		default:
+			sprintf(proto, "%3u", srx->transport_type);
+			break;
+		}
+		ss = (struct sockaddr_storage *)&srx->transport;
+	}
+
+	switch (ss->ss_family) {
+	case AF_INET:
+		sin = (struct sockaddr_in *)ss;
+		if (!inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u/%s %s\n", ntohs(sin->sin_port), proto, buf);
+		return;
+	case AF_INET6:
+		sin6 = (struct sockaddr_in6 *)ss;
+		if (!inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u/%s %s\n", ntohs(sin6->sin6_port), proto, buf);
+		return;
+	}
+
+	printf("family=%u\n", ss->ss_family);
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -334,6 +379,8 @@ static void dump_string(void *reply, unsigned int size)
 #define dump_fsinfo_generic_volume_name		dump_string
 #define dump_fsinfo_generic_mount_devname	dump_string
 #define dump_fsinfo_generic_mount_point		dump_string
+#define dump_afs_cell_name			dump_string
+#define dump_afs_server_name			dump_string
 
 /*
  *
@@ -374,6 +421,10 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_DEVNAME,	fsinfo_generic_mount_devname),
 	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_child),
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
+
+	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	afs_cell_name),
+	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_server_name),
+	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),
 	{}
 };
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 12/19] security: Add hooks to rule on setting a superblock or mount watch [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (10 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 11/19] afs: Support fsinfo() " David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-18 17:06 ` [PATCH 13/19] vfs: Add a mount-notification facility " David Howells
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add security hooks that will allow an LSM to rule on whether or not a watch
may be set on a mount or on a superblock.  More than one hook is required
as the watches watch different types of object.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-security-module@vger.kernel.org
---

 include/linux/lsm_hooks.h |   24 ++++++++++++++++++++++++
 include/linux/security.h  |   16 ++++++++++++++++
 security/security.c       |   14 ++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 16530255dc11..c4451ac197ae 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1427,6 +1427,18 @@
  *	Check to see if a process is allowed to watch for event notifications
  *	from devices (as a global set).
  *
+ * @watch_mount:
+ *	Check to see if a process is allowed to watch for mount topology change
+ *	notifications on a mount subtree.
+ *	@watch: The watch object
+ *	@path: The root of the subtree to watch.
+ *
+ * @watch_sb:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from a superblock.
+ *	@watch: The watch object
+ *	@sb: The superblock to watch.
+ *
  * @post_notification:
  *	Check to see if a watch notification can be posted to a particular
  *	queue.
@@ -1722,6 +1734,12 @@ union security_list_options {
 #ifdef CONFIG_DEVICE_NOTIFICATIONS
 	int (*watch_devices)(void);
 #endif
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	int (*watch_mount)(struct watch *watch, struct path *path);
+#endif
+#ifdef CONFIG_SB_NOTIFICATIONS
+	int (*watch_sb)(struct watch *watch, struct super_block *sb);
+#endif
 #ifdef CONFIG_WATCH_QUEUE
 	int (*post_notification)(const struct cred *w_cred,
 				 const struct cred *cred,
@@ -2020,6 +2038,12 @@ struct security_hook_heads {
 #ifdef CONFIG_DEVICE_NOTIFICATIONS
 	struct hlist_head watch_devices;
 #endif
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	struct hlist_head watch_mount;
+#endif
+#ifdef CONFIG_SB_NOTIFICATIONS
+	struct hlist_head watch_sb;
+#endif
 #ifdef CONFIG_WATCH_QUEUE
 	struct hlist_head post_notification;
 #endif /* CONFIG_WATCH_QUEUE */
diff --git a/include/linux/security.h b/include/linux/security.h
index 910a1efa9a79..2ca2569bc12c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1306,6 +1306,22 @@ static inline int security_post_notification(const struct cred *w_cred,
 	return 0;
 }
 #endif
+#if defined(CONFIG_SECURITY) && defined(CONFIG_MOUNT_NOTIFICATIONS)
+int security_watch_mount(struct watch *watch, struct path *path);
+#else
+static inline int security_watch_mount(struct watch *watch, struct path *path)
+{
+	return 0;
+}
+#endif
+#if defined(CONFIG_SECURITY) && defined(CONFIG_SB_NOTIFICATIONS)
+int security_watch_sb(struct watch *watch, struct super_block *sb);
+#else
+static inline int security_watch_sb(struct watch *watch, struct super_block *sb)
+{
+	return 0;
+}
+#endif
 
 #ifdef CONFIG_SECURITY_NETWORK
 
diff --git a/security/security.c b/security/security.c
index db7b574c9c70..5c0463444a90 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2004,6 +2004,20 @@ int security_watch_key(struct key *key)
 }
 #endif
 
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+int security_watch_mount(struct watch *watch, struct path *path)
+{
+	return call_int_hook(watch_mount, 0, watch, path);
+}
+#endif
+
+#ifdef CONFIG_SB_NOTIFICATIONS
+int security_watch_sb(struct watch *watch, struct super_block *sb)
+{
+	return call_int_hook(watch_sb, 0, watch, sb);
+}
+#endif
+
 #ifdef CONFIG_DEVICE_NOTIFICATIONS
 int security_watch_devices(void)
 {



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 13/19] vfs: Add a mount-notification facility [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (11 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 12/19] security: Add hooks to rule on setting a superblock or mount watch " David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-19 22:40   ` Jann Horn
  2020-02-21 12:24   ` David Howells
  2020-02-18 17:06 ` [PATCH 14/19] notifications: sample: Display mount tree change notifications [ver #16] David Howells
                   ` (10 subsequent siblings)
  23 siblings, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add a mount notification facility whereby notifications about changes in
mount topology and configuration can be received.  Note that this only
covers vfsmount topology changes and not superblock events.  A separate
facility will be added for that.

Firstly, an event queue needs to be created:

	fd = open("/dev/event_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a notification can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_MOUNT_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	watch_mount(AT_FDCWD, "/", 0, fd, 0x02);

In this case, it would let me monitor the mount topology subtree rooted at
"/" for events.  Mount notifications propagate up the tree towards the
root, so a watch will catch all of the events happening in the subtree
rooted at the watch.

After setting the watch, records will be placed into the queue when, for
example, as superblock switches between read-write and read-only.  Records
are of the following format:

	struct mount_notification {
		struct watch_notification watch;
		__u32	triggered_on;
		__u32	changed_mount;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_MOUNT_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_MOUNT_NEW_MOUNT.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the fifth argument to
	watch_mount(), shifted.

	n->watch.info & NOTIFY_MOUNT_IN_SUBTREE if true indicates that the
	notifcation was generated in the mount subtree rooted at the watch,
	and not actually in the watch itself.

	n->watch.info & NOTIFY_MOUNT_IS_RECURSIVE if true indicates that
	the notifcation was generated by an event (eg. SETATTR) that was
	applied recursively.  The notification is only generated for the
	object that initially triggered it.

	n->watch.info & NOTIFY_MOUNT_IS_NOW_RO will be used for
	NOTIFY_MOUNT_READONLY, being set if the superblock becomes R/O, and
	being cleared otherwise, and for NOTIFY_MOUNT_NEW_MOUNT, being set
	if the new mount is a submount (e.g. an automount).

	n->watch.info & NOTIFY_MOUNT_IS_SUBMOUNT if true indicates that the
	NOTIFY_MOUNT_NEW_MOUNT notification is in response to a mount
	performed by the kernel (e.g. an automount).

	n->triggered_on indicates the ID of the mount on which the watch
	was installed.

	n->changed_mount indicates the ID of the mount that was affected.

The mount IDs can be retrieved with the fsinfo() syscall, using the
fsinfo_mount_info and fsinfo_mount_child attributes.  There are change
notification counters there too for when a buffer overrun occurs, thereby
allowing the mount tree to be quickly rescanned.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.  Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 fs/Kconfig                                  |    9 +
 fs/Makefile                                 |    1 
 fs/mount.h                                  |   33 +++--
 fs/mount_notify.c                           |  188 +++++++++++++++++++++++++++
 fs/namespace.c                              |   17 ++
 include/linux/dcache.h                      |    1 
 include/linux/syscalls.h                    |    2 
 include/uapi/asm-generic/unistd.h           |    4 -
 include/uapi/linux/watch_queue.h            |   32 ++++-
 kernel/sys_ni.c                             |    1 
 26 files changed, 287 insertions(+), 17 deletions(-)
 create mode 100644 fs/mount_notify.c

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 961750417ef2..72bc7b33c59d 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -478,3 +478,4 @@
 547	common	openat2				sys_openat2
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	fsinfo				sys_fsinfo
+550	common	watch_mount			sys_watch_mount
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index e6b9dfe01471..67777fd0b19e 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -452,3 +452,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 9018a3a6b067..cd18dc112902 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -359,3 +359,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 10172bb6ba1f..de5c7303899f 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 # 435 reserved for clone3
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 58665073c1f0..7387a44767c3 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -444,3 +444,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 1f07a89473c3..e2c76157a580 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -377,3 +377,4 @@
 437	n32	openat2				sys_openat2
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	fsinfo				sys_fsinfo
+440	n32	watch_mount			sys_watch_mount
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 3c853ca54901..e5da9a13b074 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -353,3 +353,4 @@
 437	n64	openat2				sys_openat2
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	fsinfo				sys_fsinfo
+440	n64	watch_mount			sys_watch_mount
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 727f54542bf4..fe135759d2a8 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -426,3 +426,4 @@
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	fsinfo				sys_fsinfo
+440	o32	watch_mount			sys_watch_mount
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 2e9576638d80..5ac7a58af305 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 397190734ca7..c77a1cf377ec 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -520,3 +520,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index e9340d712dcd..d81d30d02aaf 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
 437  common	openat2			sys_openat2			sys_openat2
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	fsinfo			sys_fsinfo			sys_fsinfo
+440	common	watch_mount		sys_watch_mount			sys_watch_mount
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 7bb5ec284fbb..dcdc747fa430 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index a902b757ace2..b4f82e5a08bf 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -484,3 +484,4 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index b7817acb154b..07572644779d 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -443,3 +443,4 @@
 437	i386	openat2			sys_openat2			__ia32_sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd			__ia32_sys_pidfd_getfd
 439	i386	fsinfo			sys_fsinfo			__ia32_sys_fsinfo
+440	i386	watch_mount		sys_watch_mount			__ia32_sys_watch_mount
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 3a45ed6d28cb..1b51791fe104 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -360,6 +360,7 @@
 437	common	openat2			__x64_sys_openat2
 438	common	pidfd_getfd		__x64_sys_pidfd_getfd
 439	common	fsinfo			__x64_sys_fsinfo
+440	common	watch_mount		__x64_sys_watch_mount
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index f82702a7ab38..dfcdd3036d3e 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -409,3 +409,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
+440	common	watch_mount			sys_watch_mount
diff --git a/fs/Kconfig b/fs/Kconfig
index 1d1b48059ec9..76224bc015cb 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -124,6 +124,15 @@ source "fs/verity/Kconfig"
 
 source "fs/notify/Kconfig"
 
+config MOUNT_NOTIFICATIONS
+	bool "Mount topology change notifications"
+	select WATCH_QUEUE
+	help
+	  This option provides support for getting change notifications on the
+	  mount tree topology.  This makes use of the /dev/watch_queue misc
+	  device to handle the notification buffer and provides the
+	  mount_notify() system call to enable/disable watchpoints.
+
 source "fs/quota/Kconfig"
 
 source "fs/autofs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index b5cc9bcd17a4..b6bf2424c7f7 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -22,6 +22,7 @@ obj-y +=	no-block.o
 endif
 
 obj-$(CONFIG_PROC_FS) += proc_namespace.o
+obj-$(CONFIG_MOUNT_NOTIFICATIONS) += mount_notify.o
 
 obj-y				+= notify/
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
diff --git a/fs/mount.h b/fs/mount.h
index a1625924fe81..fc791d8b274c 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -4,6 +4,7 @@
 #include <linux/poll.h>
 #include <linux/ns_common.h>
 #include <linux/fs_pin.h>
+#include <linux/watch_queue.h>
 
 struct mnt_namespace {
 	atomic_t		count;
@@ -70,9 +71,13 @@ struct mount {
 	int mnt_id;			/* mount identifier */
 	int mnt_group_id;		/* peer group identifier */
 	int mnt_expiry_mark;		/* true if marked for expiry */
+	int mnt_nr_watchers;		/* The number of subtree watches tracking this */
 	struct hlist_head mnt_pins;
 	struct hlist_head mnt_stuck_children;
 	atomic_t mnt_change_counter;	/* Number of changed applied */
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	struct watch_list *mnt_watchers; /* Watches on dentries within this mount */
+#endif
 } __randomize_layout;
 
 #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
@@ -155,18 +160,8 @@ static inline bool is_anon_ns(struct mnt_namespace *ns)
 	return ns->seq == 0;
 }
 
-/*
- * Type of mount topology change notification.
- */
-enum mount_notification_subtype {
-	NOTIFY_MOUNT_NEW_MOUNT	= 0, /* New mount added */
-	NOTIFY_MOUNT_UNMOUNT	= 1, /* Mount removed manually */
-	NOTIFY_MOUNT_EXPIRY	= 2, /* Automount expired */
-	NOTIFY_MOUNT_READONLY	= 3, /* Mount R/O state changed */
-	NOTIFY_MOUNT_SETATTR	= 4, /* Mount attributes changed */
-	NOTIFY_MOUNT_MOVE_FROM	= 5, /* Mount moved from here */
-	NOTIFY_MOUNT_MOVE_TO	= 6, /* Mount moved to here (compare op_id) */
-};
+extern void post_mount_notification(struct mount *changed,
+				    struct mount_notification *notify);
 
 static inline void notify_mount(struct mount *changed,
 				struct mount *aux,
@@ -174,4 +169,18 @@ static inline void notify_mount(struct mount *changed,
 				u32 info_flags)
 {
 	atomic_inc(&changed->mnt_change_counter);
+
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	{
+		struct mount_notification n = {
+			.watch.type	= WATCH_TYPE_MOUNT_NOTIFY,
+			.watch.subtype	= subtype,
+			.watch.info	= info_flags | watch_sizeof(n),
+			.triggered_on	= changed->mnt_id,
+			.changed_mount	= aux ? aux->mnt_id : 0,
+		};
+
+		post_mount_notification(changed, &n);
+	}
+#endif
 }
diff --git a/fs/mount_notify.c b/fs/mount_notify.c
new file mode 100644
index 000000000000..20644544802a
--- /dev/null
+++ b/fs/mount_notify.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Provide mount topology/attribute change notifications.
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/syscalls.h>
+#include <linux/slab.h>
+#include <linux/security.h>
+#include "mount.h"
+
+/*
+ * Post mount notifications to all watches going rootwards along the tree.
+ *
+ * Must be called with the mount_lock held.
+ */
+void post_mount_notification(struct mount *changed,
+			     struct mount_notification *notify)
+{
+	const struct cred *cred = current_cred();
+	struct path cursor;
+	struct mount *mnt;
+	unsigned seq;
+
+	seq = 0;
+	rcu_read_lock();
+restart:
+	cursor.mnt = &changed->mnt;
+	cursor.dentry = changed->mnt.mnt_root;
+	mnt = real_mount(cursor.mnt);
+	notify->watch.info &= ~NOTIFY_MOUNT_IN_SUBTREE;
+
+	read_seqbegin_or_lock(&rename_lock, &seq);
+	for (;;) {
+		if (mnt->mnt_watchers &&
+		    !hlist_empty(&mnt->mnt_watchers->watchers)) {
+			if (cursor.dentry->d_flags & DCACHE_MOUNT_WATCH)
+				post_watch_notification(mnt->mnt_watchers,
+							&notify->watch, cred,
+							(unsigned long)cursor.dentry);
+		} else {
+			cursor.dentry = mnt->mnt.mnt_root;
+		}
+		notify->watch.info |= NOTIFY_MOUNT_IN_SUBTREE;
+
+		if (cursor.dentry == cursor.mnt->mnt_root ||
+		    IS_ROOT(cursor.dentry)) {
+			struct mount *parent = READ_ONCE(mnt->mnt_parent);
+
+			/* Escaped? */
+			if (cursor.dentry != cursor.mnt->mnt_root)
+				break;
+
+			/* Global root? */
+			if (mnt == parent)
+				break;
+
+			cursor.dentry = READ_ONCE(mnt->mnt_mountpoint);
+			mnt = parent;
+			cursor.mnt = &mnt->mnt;
+		} else {
+			cursor.dentry = cursor.dentry->d_parent;
+		}
+	}
+
+	if (need_seqretry(&rename_lock, seq)) {
+		seq = 1;
+		goto restart;
+	}
+
+	done_seqretry(&rename_lock, seq);
+	rcu_read_unlock();
+}
+
+static void release_mount_watch(struct watch *watch)
+{
+	struct dentry *dentry = (struct dentry *)(unsigned long)watch->id;
+
+	dput(dentry);
+}
+
+/**
+ * sys_watch_mount - Watch for mount topology/attribute changes
+ * @dfd: Base directory to pathwalk from or fd referring to mount.
+ * @filename: Path to mount to place the watch upon
+ * @at_flags: Pathwalk control flags
+ * @watch_fd: The watch queue to send notifications to.
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch)
+ */
+SYSCALL_DEFINE5(watch_mount,
+		int, dfd,
+		const char __user *, filename,
+		unsigned int, at_flags,
+		int, watch_fd,
+		int, watch_id)
+{
+	struct watch_queue *wqueue;
+	struct watch_list *wlist = NULL;
+	struct watch *watch = NULL;
+	struct mount *m;
+	struct path path;
+	unsigned int lookup_flags =
+		LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+	int ret;
+
+	if (watch_id < -1 || watch_id > 0xff)
+		return -EINVAL;
+	if ((at_flags & ~(AT_NO_AUTOMOUNT | AT_EMPTY_PATH)) != 0)
+		return -EINVAL;
+	if (at_flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (at_flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+	ret = user_path_at(dfd, filename, lookup_flags, &path);
+	if (ret)
+		return ret;
+
+	ret = inode_permission(path.dentry->d_inode, MAY_EXEC);
+	if (ret)
+		goto err_path;
+
+	wqueue = get_watch_queue(watch_fd);
+	if (IS_ERR(wqueue))
+		goto err_path;
+
+	m = real_mount(path.mnt);
+
+	if (watch_id >= 0) {
+		ret = -ENOMEM;
+		if (!m->mnt_watchers) {
+			wlist = kzalloc(sizeof(*wlist), GFP_KERNEL);
+			if (!wlist)
+				goto err_wqueue;
+			init_watch_list(wlist, release_mount_watch);
+		}
+
+		watch = kzalloc(sizeof(*watch), GFP_KERNEL);
+		if (!watch)
+			goto err_wlist;
+
+		init_watch(watch, wqueue);
+		watch->id		= (unsigned long)path.dentry;
+		watch->info_id		= (u32)watch_id << 24;
+
+		ret = security_watch_mount(watch, &path);
+		if (ret < 0)
+			goto err_watch;
+
+		down_write(&m->mnt.mnt_sb->s_umount);
+		if (!m->mnt_watchers) {
+			m->mnt_watchers = wlist;
+			wlist = NULL;
+		}
+
+		ret = add_watch_to_object(watch, m->mnt_watchers);
+		if (ret == 0) {
+			spin_lock(&path.dentry->d_lock);
+			path.dentry->d_flags |= DCACHE_MOUNT_WATCH;
+			spin_unlock(&path.dentry->d_lock);
+			dget(path.dentry);
+			watch = NULL;
+		}
+		up_write(&m->mnt.mnt_sb->s_umount);
+	} else {
+		ret = -EBADSLT;
+		if (m->mnt_watchers) {
+			down_write(&m->mnt.mnt_sb->s_umount);
+			ret = remove_watch_from_object(m->mnt_watchers, wqueue,
+						       (unsigned long)path.dentry,
+						       false);
+			up_write(&m->mnt.mnt_sb->s_umount);
+		}
+	}
+
+err_watch:
+	kfree(watch);
+err_wlist:
+	kfree(wlist);
+err_wqueue:
+	put_watch_queue(wqueue);
+err_path:
+	path_put(&path);
+	return ret;
+}
diff --git a/fs/namespace.c b/fs/namespace.c
index 184c1aaf669a..bbfd6cd5c501 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -510,7 +510,8 @@ static int mnt_make_readonly(struct mount *mnt)
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 	unlock_mount_hash();
 	if (ret == 0)
-		notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, 0x10000);
+		notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY,
+			     NOTIFY_MOUNT_IS_NOW_RO);
 	return ret;
 }
 
@@ -1176,6 +1177,11 @@ static void mntput_no_expire(struct mount *mnt)
 	mnt->mnt.mnt_flags |= MNT_DOOMED;
 	rcu_read_unlock();
 
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	if (mnt->mnt_watchers)
+		remove_watch_list(mnt->mnt_watchers, mnt->mnt_id);
+#endif
+
 	list_del(&mnt->mnt_instance);
 
 	if (unlikely(!list_empty(&mnt->mnt_mounts))) {
@@ -2108,7 +2114,11 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 			list_del_init(&source_mnt->mnt_ns->list);
 		}
 		mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
-		notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_NEW_MOUNT, 0);
+		notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_NEW_MOUNT,
+			     (source_mnt->mnt.mnt_sb->s_flags & SB_RDONLY ?
+			      NOTIFY_MOUNT_IS_NOW_RO : 0) |
+			     (source_mnt->mnt.mnt_sb->s_flags & SB_SUBMOUNT ?
+			      NOTIFY_MOUNT_IS_SUBMOUNT : 0));
 		commit_tree(source_mnt);
 	}
 
@@ -2486,7 +2496,8 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
 	mnt->mnt.mnt_flags = mnt_flags;
 	touch_mnt_namespace(mnt->mnt_ns);
 	unlock_mount_hash();
-	notify_mount(mnt, NULL, NOTIFY_MOUNT_SETATTR, 0);
+	notify_mount(mnt, NULL, NOTIFY_MOUNT_SETATTR,
+		     (mnt_flags & SB_RDONLY ? NOTIFY_MOUNT_IS_NOW_RO : 0));
 }
 
 static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..7b194d778155 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -217,6 +217,7 @@ struct dentry_operations {
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
 #define DCACHE_NORCU			0x40000000 /* No RCU delay for freeing */
+#define DCACHE_MOUNT_WATCH		0x80000000 /* There's a mount watch here */
 
 extern seqlock_t rename_lock;
 
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 6c3157e46e7c..1687e064751d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1007,6 +1007,8 @@ asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
 asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
 			   struct fsinfo_params __user *params,
 			   void __user *buffer, size_t buf_size);
+asmlinkage long sys_watch_mount(int dfd, const char __user *path,
+				unsigned int at_flags, int watch_fd, int watch_id);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 9d00098a3f1b..d6b6c45ad31a 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_fsinfo 439
 __SYSCALL(__NR_fsinfo, sys_fsinfo)
+#define __NR_watch_mount 440
+__SYSCALL(__NR_watch_mount, sys_watch_mount)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index c3d8320b5d3a..b0f35cf51394 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -14,7 +14,8 @@
 enum watch_notification_type {
 	WATCH_TYPE_META		= 0,	/* Special record */
 	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
-	WATCH_TYPE__NR		= 2
+	WATCH_TYPE_MOUNT_NOTIFY	= 2,	/* Mount topology change notification */
+	WATCH_TYPE___NR		= 3
 };
 
 enum watch_meta_notification_subtype {
@@ -101,4 +102,33 @@ struct key_notification {
 	__u32	aux;		/* Per-type auxiliary data */
 };
 
+/*
+ * Type of mount topology change notification.
+ */
+enum mount_notification_subtype {
+	NOTIFY_MOUNT_NEW_MOUNT	= 0, /* New mount added */
+	NOTIFY_MOUNT_UNMOUNT	= 1, /* Mount removed manually */
+	NOTIFY_MOUNT_EXPIRY	= 2, /* Automount expired */
+	NOTIFY_MOUNT_READONLY	= 3, /* Mount R/O state changed */
+	NOTIFY_MOUNT_SETATTR	= 4, /* Mount attributes changed */
+	NOTIFY_MOUNT_MOVE_FROM	= 5, /* Mount moved from here */
+	NOTIFY_MOUNT_MOVE_TO	= 6, /* Mount moved to here (compare op_id) */
+};
+
+#define NOTIFY_MOUNT_IN_SUBTREE		WATCH_INFO_FLAG_0 /* Event not actually at watched dentry */
+#define NOTIFY_MOUNT_IS_RECURSIVE	WATCH_INFO_FLAG_1 /* Change applied recursively */
+#define NOTIFY_MOUNT_IS_NOW_RO		WATCH_INFO_FLAG_2 /* Mount changed to R/O */
+#define NOTIFY_MOUNT_IS_SUBMOUNT	WATCH_INFO_FLAG_3 /* New mount is submount */
+
+/*
+ * Mount topology/configuration change notification record.
+ * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
+ * - watch.subtype = enum mount_notification_subtype
+ */
+struct mount_notification {
+	struct watch_notification watch; /* WATCH_TYPE_MOUNT_NOTIFY */
+	__u32	triggered_on;		/* The mount that the notify was on */
+	__u32	changed_mount;		/* The mount that got changed */
+};
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 58246e6b5603..1a1eb7b61914 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -52,6 +52,7 @@ COND_SYSCALL(io_uring_setup);
 COND_SYSCALL(io_uring_enter);
 COND_SYSCALL(io_uring_register);
 COND_SYSCALL(fsinfo);
+COND_SYSCALL(watch_mount);
 
 /* fs/xattr.c */
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 14/19] notifications: sample: Display mount tree change notifications [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (12 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 13/19] vfs: Add a mount-notification facility " David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-18 17:06 ` [PATCH 15/19] vfs: Add superblock " David Howells
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

This is run like:

	./watch_test

and watches "/" for changes to the mount topology and the attributes of
individual mount objects.

	# mount -t tmpfs none /mnt
	# mount -o remount,ro /mnt
	# mount -o remount,rw /mnt

producing:

	# ./watch_test
	read() = 16
	NOTIFY[000]: ty=000002 sy=00 i=02000010
	MOUNT 00000060 change=0[new_mount] aux=416
	read() = 16
	NOTIFY[000]: ty=000002 sy=04 i=02010010
	MOUNT 000001a0 change=4[setattr] aux=0
	read() = 16
	NOTIFY[000]: ty=000002 sy=04 i=02010010
	MOUNT 000001a0 change=4[setattr] aux=0

Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/watch_queue/watch_test.c |   39 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c
index 0eaff5dc04c3..49d185150506 100644
--- a/samples/watch_queue/watch_test.c
+++ b/samples/watch_queue/watch_test.c
@@ -26,6 +26,9 @@
 #ifndef __NR_watch_devices
 #define __NR_watch_devices -1
 #endif
+#ifndef __NR_watch_mount
+#define __NR_watch_mount -1
+#endif
 
 #define BUF_SIZE 256
 
@@ -58,6 +61,27 @@ static void saw_key_change(struct watch_notification *n, size_t len)
 	       k->key_id, n->subtype, key_subtypes[n->subtype], k->aux);
 }
 
+static const char *mount_subtypes[256] = {
+	[NOTIFY_MOUNT_NEW_MOUNT]	= "new_mount",
+	[NOTIFY_MOUNT_UNMOUNT]		= "unmount",
+	[NOTIFY_MOUNT_EXPIRY]		= "expiry",
+	[NOTIFY_MOUNT_READONLY]		= "readonly",
+	[NOTIFY_MOUNT_SETATTR]		= "setattr",
+	[NOTIFY_MOUNT_MOVE_FROM]	= "move_from",
+	[NOTIFY_MOUNT_MOVE_TO]		= "move_to",
+};
+
+static void saw_mount_change(struct watch_notification *n, size_t len)
+{
+	struct mount_notification *m = (struct mount_notification *)n;
+
+	if (len != sizeof(struct mount_notification))
+		return;
+
+	printf("MOUNT %08x change=%u[%s] aux=%u\n",
+	       m->triggered_on, n->subtype, mount_subtypes[n->subtype], m->changed_mount);
+}
+
 /*
  * Consume and display events.
  */
@@ -134,6 +158,9 @@ static void consumer(int fd)
 			default:
 				printf("other type\n");
 				break;
+			case WATCH_TYPE_MOUNT_NOTIFY:
+				saw_mount_change(&n.n, len);
+				break;
 			}
 
 			p += len;
@@ -142,12 +169,17 @@ static void consumer(int fd)
 }
 
 static struct watch_notification_filter filter = {
-	.nr_filters	= 1,
+	.nr_filters	= 2,
 	.filters = {
 		[0]	= {
 			.type			= WATCH_TYPE_KEY_NOTIFY,
 			.subtype_filter[0]	= UINT_MAX,
 		},
+		[1] = {
+			.type			= WATCH_TYPE_MOUNT_NOTIFY,
+			// Reject move-from notifications
+			.subtype_filter[0]	= UINT_MAX & ~(1 << NOTIFY_MOUNT_MOVE_FROM),
+		},
 	},
 };
 
@@ -181,6 +213,11 @@ int main(int argc, char **argv)
 		exit(1);
 	}
 
+	if (syscall(__NR_watch_mount, AT_FDCWD, "/", 0, fd, 0x02) == -1) {
+		perror("watch_mount");
+		exit(1);
+	}
+
 	consumer(fd);
 	exit(0);
 }



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 15/19] vfs: Add superblock notifications [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (13 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 14/19] notifications: sample: Display mount tree change notifications [ver #16] David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-19 23:08   ` Jann Horn
  2020-02-21 14:23   ` David Howells
  2020-02-18 17:06 ` [PATCH 16/19] fsinfo: Provide superblock notification counter " David Howells
                   ` (8 subsequent siblings)
  23 siblings, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add a superblock event notification facility whereby notifications about
superblock events, such as I/O errors (EIO), quota limits being hit
(EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
process asynchronously.  Note that this does not cover vfsmount topology
changes.  watch_mount() is used for that.

Firstly, an event queue needs to be created:

	fd = open("/dev/event_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a notification can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_SB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	watch_sb(AT_FDCWD, "/home/dhowells", 0, fd, 0x03);

In this case, it would let me monitor my own homedir for events.  After
setting the watch, records will be placed into the queue when, for example,
as superblock switches between read-write and read-only.  Records are of
the following format:

	struct superblock_notification {
		struct watch_notification watch;
		__u64	sb_id;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_SB_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_SUPERBLOCK_READONLY.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the fifth argument to
	watch_sb(), shifted.

	n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
	NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
	R/O, and being cleared otherwise.

	n->sb_id will be the ID of the superblock, as can be retrieved with
	the fsinfo() syscall, as part of the fsinfo_sb_notifications
	attribute in the the watch_id field.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.  Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 fs/Kconfig                                  |   12 +++
 fs/super.c                                  |  125 +++++++++++++++++++++++++++
 include/linux/fs.h                          |   77 +++++++++++++++++
 include/linux/syscalls.h                    |    2 
 include/uapi/asm-generic/unistd.h           |    4 +
 include/uapi/linux/watch_queue.h            |   31 ++++++-
 kernel/sys_ni.c                             |    1 
 24 files changed, 267 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 72bc7b33c59d..cd39492e4f7d 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	fsinfo				sys_fsinfo
 550	common	watch_mount			sys_watch_mount
+551	common	watch_sb			sys_watch_sb
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 67777fd0b19e..a26bc42b9464 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -453,3 +453,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 75f04a1023be..388eeb71cff0 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		442
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index cd18dc112902..b13b94de9a01 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -360,3 +360,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index de5c7303899f..4a163d0200b2 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -440,3 +440,4 @@
 # 435 reserved for clone3
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 7387a44767c3..b0fed3b73462 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -445,3 +445,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index e2c76157a580..8a33cc08ed39 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -378,3 +378,4 @@
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	fsinfo				sys_fsinfo
 440	n32	watch_mount			sys_watch_mount
+441	n32	watch_sb			sys_watch_sb
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index e5da9a13b074..8a11d81717d3 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -354,3 +354,4 @@
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	fsinfo				sys_fsinfo
 440	n64	watch_mount			sys_watch_mount
+441	n64	watch_sb			sys_watch_sb
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index fe135759d2a8..76787af4a8f2 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -427,3 +427,4 @@
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	fsinfo				sys_fsinfo
 440	o32	watch_mount			sys_watch_mount
+441	o32	watch_sb			sys_watch_sb
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 5ac7a58af305..1c35cf2c0938 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -437,3 +437,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c77a1cf377ec..c5ea6f8e95b6 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -521,3 +521,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index d81d30d02aaf..4577426e09f5 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	fsinfo			sys_fsinfo			sys_fsinfo
 440	common	watch_mount		sys_watch_mount			sys_watch_mount
+441	common	watch_sb		sys_watch_sb			sys_watch_sb
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index dcdc747fa430..e57c03fd5ba3 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index b4f82e5a08bf..1b2b19873319 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -485,3 +485,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 07572644779d..8b3a00860524 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -444,3 +444,4 @@
 438	i386	pidfd_getfd		sys_pidfd_getfd			__ia32_sys_pidfd_getfd
 439	i386	fsinfo			sys_fsinfo			__ia32_sys_fsinfo
 440	i386	watch_mount		sys_watch_mount			__ia32_sys_watch_mount
+441	i386	watch_sb		sys_watch_sb			__ia32_sys_watch_sb
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 1b51791fe104..8522ff13308c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -361,6 +361,7 @@
 438	common	pidfd_getfd		__x64_sys_pidfd_getfd
 439	common	fsinfo			__x64_sys_fsinfo
 440	common	watch_mount		__x64_sys_watch_mount
+441	common	watch_sb		__x64_sys_watch_sb
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index dfcdd3036d3e..70f0292ed37a 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -410,3 +410,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	fsinfo				sys_fsinfo
 440	common	watch_mount			sys_watch_mount
+441	common	watch_sb			sys_watch_sb
diff --git a/fs/Kconfig b/fs/Kconfig
index 76224bc015cb..01d0d436b3cd 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -133,6 +133,18 @@ config MOUNT_NOTIFICATIONS
 	  device to handle the notification buffer and provides the
 	  mount_notify() system call to enable/disable watchpoints.
 
+config SB_NOTIFICATIONS
+	bool "Superblock event notifications"
+	select WATCH_QUEUE
+	help
+	  This option provides support for receiving superblock event
+	  notifications.  This makes use of the /dev/watch_queue misc device to
+	  handle the notification buffer and provides the sb_notify() system
+	  call to enable/disable watches.
+
+	  Events can include things like changing between R/W and R/O, EIO
+	  generation, ENOSPC generation and EDQUOT generation.
+
 source "fs/quota/Kconfig"
 
 source "fs/autofs/Kconfig"
diff --git a/fs/super.c b/fs/super.c
index a63073e6127e..ec16e6f88c16 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -37,6 +37,8 @@
 #include <linux/lockdep.h>
 #include <linux/user_namespace.h>
 #include <linux/fs_context.h>
+#include <linux/syscalls.h>
+#include <linux/namei.h>
 #include <uapi/linux/mount.h>
 #include "internal.h"
 
@@ -354,6 +356,10 @@ void deactivate_locked_super(struct super_block *s)
 {
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
+#ifdef CONFIG_SB_NOTIFICATIONS
+		if (s->s_watchers)
+			remove_watch_list(s->s_watchers, s->s_unique_id);
+#endif
 		cleancache_invalidate_fs(s);
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
@@ -993,6 +999,8 @@ int reconfigure_super(struct fs_context *fc)
 	/* Needs to be ordered wrt mnt_is_readonly() */
 	smp_wmb();
 	sb->s_readonly_remount = 0;
+	notify_sb(sb, NOTIFY_SUPERBLOCK_READONLY,
+		  remount_ro ? NOTIFY_SUPERBLOCK_IS_NOW_RO : 0);
 
 	/*
 	 * Some filesystems modify their metadata via some other path than the
@@ -1891,3 +1899,120 @@ int thaw_super(struct super_block *sb)
 	return thaw_super_locked(sb);
 }
 EXPORT_SYMBOL(thaw_super);
+
+#ifdef CONFIG_SB_NOTIFICATIONS
+/*
+ * Post superblock notifications.
+ */
+void post_sb_notification(struct super_block *s, struct superblock_notification *n)
+{
+	post_watch_notification(s->s_watchers, &n->watch, current_cred(),
+				s->s_unique_id);
+}
+
+/**
+ * sys_watch_sb - Watch for superblock events.
+ * @dfd: Base directory to pathwalk from or fd referring to superblock.
+ * @filename: Path to superblock to place the watch upon
+ * @at_flags: Pathwalk control flags
+ * @watch_fd: The watch queue to send notifications to.
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch)
+ */
+SYSCALL_DEFINE5(watch_sb,
+		int, dfd,
+		const char __user *, filename,
+		unsigned int, at_flags,
+		int, watch_fd,
+		int, watch_id)
+{
+	struct watch_queue *wqueue;
+	struct super_block *s;
+	struct watch_list *wlist = NULL;
+	struct watch *watch = NULL;
+	struct path path;
+	unsigned int lookup_flags =
+		LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+	int ret;
+
+	if (watch_id < -1 || watch_id > 0xff)
+		return -EINVAL;
+	if ((at_flags & ~(AT_NO_AUTOMOUNT | AT_EMPTY_PATH)) != 0)
+		return -EINVAL;
+	if (at_flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (at_flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+	ret = user_path_at(dfd, filename, at_flags, &path);
+	if (ret)
+		return ret;
+
+	ret = inode_permission(path.dentry->d_inode, MAY_EXEC);
+	if (ret)
+		goto err_path;
+
+	wqueue = get_watch_queue(watch_fd);
+	if (IS_ERR(wqueue))
+		goto err_path;
+
+	s = path.dentry->d_sb;
+	if (watch_id >= 0) {
+		ret = -ENOMEM;
+		if (!s->s_watchers) {
+			wlist = kzalloc(sizeof(*wlist), GFP_KERNEL);
+			if (!wlist)
+				goto err_wqueue;
+			init_watch_list(wlist, NULL);
+		}
+
+		watch = kzalloc(sizeof(*watch), GFP_KERNEL);
+		if (!watch)
+			goto err_wlist;
+
+		init_watch(watch, wqueue);
+		watch->id		= s->s_unique_id;
+		watch->private		= s;
+		watch->info_id		= (u32)watch_id << 24;
+
+		ret = security_watch_sb(watch, s);
+		if (ret < 0)
+			goto err_watch;
+
+		down_write(&s->s_umount);
+		ret = -EIO;
+		if (atomic_read(&s->s_active)) {
+			if (!s->s_watchers) {
+				s->s_watchers = wlist;
+				wlist = NULL;
+			}
+
+			ret = add_watch_to_object(watch, s->s_watchers);
+			if (ret == 0) {
+				spin_lock(&sb_lock);
+				s->s_count++;
+				spin_unlock(&sb_lock);
+				watch = NULL;
+			}
+		}
+		up_write(&s->s_umount);
+	} else {
+		ret = -EBADSLT;
+		if (READ_ONCE(s->s_watchers)) {
+			down_write(&s->s_umount);
+			ret = remove_watch_from_object(s->s_watchers, wqueue,
+						       s->s_unique_id, false);
+			up_write(&s->s_umount);
+		}
+	}
+
+err_watch:
+	kfree(watch);
+err_wlist:
+	kfree(wlist);
+err_wqueue:
+	put_watch_queue(wqueue);
+err_path:
+	path_put(&path);
+	return ret;
+}
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e5db22d536a3..423a6f03cdf8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -40,6 +40,7 @@
 #include <linux/fs_types.h>
 #include <linux/build_bug.h>
 #include <linux/stddef.h>
+#include <linux/watch_queue.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1553,6 +1554,10 @@ struct super_block {
 
 	/* Superblock event notifications */
 	u64			s_unique_id;
+
+#ifdef CONFIG_SB_NOTIFICATIONS
+	struct watch_list	*s_watchers;
+#endif
 } __randomize_layout;
 
 /* Helper functions so that in most cases filesystems will
@@ -3659,4 +3664,76 @@ static inline int inode_drain_writes(struct inode *inode)
 	return filemap_write_and_wait(inode->i_mapping);
 }
 
+extern void post_sb_notification(struct super_block *, struct superblock_notification *);
+
+/**
+ * notify_sb: Post simple superblock notification.
+ * @s: The superblock the notification is about.
+ * @subtype: The type of notification.
+ * @info: WATCH_INFO_FLAG_* flags to be set in the record.
+ */
+static inline void notify_sb(struct super_block *s,
+			     enum superblock_notification_type subtype,
+			     u32 info)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_notification n = {
+			.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.watch.subtype	= subtype,
+			.watch.info	= watch_sizeof(n) | info,
+			.sb_id		= s->s_unique_id,
+		};
+
+		post_sb_notification(s, &n);
+	}
+			     
+#endif
+}
+
+/**
+ * notify_sb_error: Post superblock error notification.
+ * @s: The superblock the notification is about.
+ * @error: The error number to be recorded.
+ */
+static inline int notify_sb_error(struct super_block *s, int error)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_error_notification n = {
+			.s.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.s.watch.subtype = NOTIFY_SUPERBLOCK_ERROR,
+			.s.watch.info	= watch_sizeof(n),
+			.s.sb_id	= s->s_unique_id,
+			.error_number	= error,
+			.error_cookie	= 0,
+		};
+
+		post_sb_notification(s, &n.s);
+	}
+#endif
+	return error;
+}
+
+/**
+ * notify_sb_EDQUOT: Post superblock quota overrun notification.
+ * @s: The superblock the notification is about.
+ */
+static inline int notify_sb_EQDUOT(struct super_block *s)
+{
+#ifdef CONFIG_SB_NOTIFICATIONS
+	if (unlikely(s->s_watchers)) {
+		struct superblock_notification n = {
+			.watch.type	= WATCH_TYPE_SB_NOTIFY,
+			.watch.subtype	= NOTIFY_SUPERBLOCK_EDQUOT,
+			.watch.info	= watch_sizeof(n),
+			.sb_id		= s->s_unique_id,
+		};
+
+		post_sb_notification(s, &n);
+	}
+#endif
+	return -EDQUOT;
+}
+
 #endif /* _LINUX_FS_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1687e064751d..af66fe97a586 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1009,6 +1009,8 @@ asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
 			   void __user *buffer, size_t buf_size);
 asmlinkage long sys_watch_mount(int dfd, const char __user *path,
 				unsigned int at_flags, int watch_fd, int watch_id);
+asmlinkage long sys_watch_sb(int dfd, const char __user *path,
+			     unsigned int at_flags, int watch_fd, int watch_id);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index d6b6c45ad31a..882c0fae4f37 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 __SYSCALL(__NR_fsinfo, sys_fsinfo)
 #define __NR_watch_mount 440
 __SYSCALL(__NR_watch_mount, sys_watch_mount)
+#define __NR_watch_sb 441
+__SYSCALL(__NR_watch_sb, sys_watch_sb)
 
 #undef __NR_syscalls
-#define __NR_syscalls 441
+#define __NR_syscalls 442
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index b0f35cf51394..190d27073302 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -15,7 +15,8 @@ enum watch_notification_type {
 	WATCH_TYPE_META		= 0,	/* Special record */
 	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
 	WATCH_TYPE_MOUNT_NOTIFY	= 2,	/* Mount topology change notification */
-	WATCH_TYPE___NR		= 3
+	WATCH_TYPE_SB_NOTIFY	= 3,	/* Superblock event notification */
+	WATCH_TYPE___NR		= 4
 };
 
 enum watch_meta_notification_subtype {
@@ -131,4 +132,32 @@ struct mount_notification {
 	__u32	changed_mount;		/* The mount that got changed */
 };
 
+/*
+ * Type of superblock notification.
+ */
+enum superblock_notification_type {
+	NOTIFY_SUPERBLOCK_READONLY	= 0, /* Filesystem toggled between R/O and R/W */
+	NOTIFY_SUPERBLOCK_ERROR		= 1, /* Error in filesystem or blockdev */
+	NOTIFY_SUPERBLOCK_EDQUOT	= 2, /* EDQUOT notification */
+	NOTIFY_SUPERBLOCK_NETWORK	= 3, /* Network status change */
+};
+
+#define NOTIFY_SUPERBLOCK_IS_NOW_RO	WATCH_INFO_FLAG_0 /* Superblock changed to R/O */
+
+/*
+ * Superblock notification record.
+ * - watch.type = WATCH_TYPE_MOUNT_NOTIFY
+ * - watch.subtype = enum superblock_notification_subtype
+ */
+struct superblock_notification {
+	struct watch_notification watch; /* WATCH_TYPE_SB_NOTIFY */
+	__u64	sb_id;			/* 64-bit superblock ID [fsinfo_ids::f_sb_id] */
+};
+
+struct superblock_error_notification {
+	struct superblock_notification s; /* subtype = notify_superblock_error */
+	__u32	error_number;
+	__u32	error_cookie;
+};
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 1a1eb7b61914..bc2e6885ef2d 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -53,6 +53,7 @@ COND_SYSCALL(io_uring_enter);
 COND_SYSCALL(io_uring_register);
 COND_SYSCALL(fsinfo);
 COND_SYSCALL(watch_mount);
+COND_SYSCALL(watch_sb);
 
 /* fs/xattr.c */
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 16/19] fsinfo: Provide superblock notification counter [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (14 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 15/19] vfs: Add superblock " David Howells
@ 2020-02-18 17:06 ` David Howells
  2020-02-18 17:07 ` [PATCH 17/19] notifications: sample: Display superblock notifications " David Howells
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:06 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Provide an fsinfo attribute to export the superblock notification counter
so that it can be polled in the case of a notification buffer overrun.
This is accessed with:

	struct fsinfo_params params = {
		.request = FSINFO_ATTR_SB_NOTIFICATIONS,
	};

and returns a structure that looks like:

	struct fsinfo_sb_notifications {
		__u64	watch_id;
		__u32	notify_counter;
		__u32	__reserved[1];
	};

Where watch_id is a number uniquely identifying the superblock in
notification records and notify_counter is incremented for each
superblock notification posted.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                      |   11 +++++++++++
 include/linux/fs.h               |    4 ++++
 include/uapi/linux/fsinfo.h      |   12 ++++++++++++
 include/uapi/linux/watch_queue.h |    2 +-
 samples/vfs/test-fsinfo.c        |   12 ++++++++++--
 5 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index b57fbcd3a7a5..72fc4f790a5a 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -226,6 +226,16 @@ static int fsinfo_generic_volume_id(struct path *path, struct fsinfo_context *ct
 	return fsinfo_string(path->dentry->d_sb->s_id, ctx);
 }
 
+static int fsinfo_generic_sb_notifications(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_sb_notifications *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	p->watch_id		= sb->s_unique_id;
+	p->notify_counter	= atomic_read(&sb->s_notify_counter);
+	return sizeof(*p);
+}
+
 static int fsinfo_attribute_info(struct path *path, struct fsinfo_context *ctx)
 {
 	const struct fsinfo_attribute *attr;
@@ -292,6 +302,7 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_STRING 	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SB_NOTIFICATIONS,	fsinfo_generic_sb_notifications),
 
 	FSINFO_VSTRUCT	(FSINFO_ATTR_FSINFO,		fsinfo_generic_fsinfo),
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_attribute_info),
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 423a6f03cdf8..30b910d591db 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1558,6 +1558,7 @@ struct super_block {
 #ifdef CONFIG_SB_NOTIFICATIONS
 	struct watch_list	*s_watchers;
 #endif
+	atomic_t		s_notify_counter;
 } __randomize_layout;
 
 /* Helper functions so that in most cases filesystems will
@@ -3677,6 +3678,7 @@ static inline void notify_sb(struct super_block *s,
 			     u32 info)
 {
 #ifdef CONFIG_SB_NOTIFICATIONS
+	atomic_inc(&s->s_notify_counter);
 	if (unlikely(s->s_watchers)) {
 		struct superblock_notification n = {
 			.watch.type	= WATCH_TYPE_SB_NOTIFY,
@@ -3699,6 +3701,7 @@ static inline void notify_sb(struct super_block *s,
 static inline int notify_sb_error(struct super_block *s, int error)
 {
 #ifdef CONFIG_SB_NOTIFICATIONS
+	atomic_inc(&s->s_notify_counter);
 	if (unlikely(s->s_watchers)) {
 		struct superblock_error_notification n = {
 			.s.watch.type	= WATCH_TYPE_SB_NOTIFY,
@@ -3722,6 +3725,7 @@ static inline int notify_sb_error(struct super_block *s, int error)
 static inline int notify_sb_EQDUOT(struct super_block *s)
 {
 #ifdef CONFIG_SB_NOTIFICATIONS
+	atomic_inc(&s->s_notify_counter);
 	if (unlikely(s->s_watchers)) {
 		struct superblock_notification n = {
 			.watch.type	= WATCH_TYPE_SB_NOTIFY,
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 5926b16aac4e..5467f88ca9b0 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -23,6 +23,7 @@
 #define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
 #define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
 #define FSINFO_ATTR_FEATURES		0x08	/* Filesystem features (bits) */
+#define FSINFO_ATTR_SB_NOTIFICATIONS	0x09	/* sb_notify() information */
 
 #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
@@ -300,6 +301,17 @@ struct fsinfo_volume_uuid {
 
 #define FSINFO_ATTR_VOLUME_UUID__STRUCT struct fsinfo_volume_uuid
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_SB_NOTIFICATIONS).
+ */
+struct fsinfo_sb_notifications {
+	__u64		watch_id;	/* Watch ID for superblock. */
+	__u32		notify_counter;	/* Number of notifications. */
+	__u32		__reserved[1];
+};
+
+#define FSINFO_ATTR_SB_NOTIFICATIONS__STRUCT struct fsinfo_sb_notifications
+
 /*
  * Information struct for fsinfo(FSINFO_ATTR_AFS_SERVER_ADDRESSES).
  *
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 190d27073302..586f4af965ac 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -151,7 +151,7 @@ enum superblock_notification_type {
  */
 struct superblock_notification {
 	struct watch_notification watch; /* WATCH_TYPE_SB_NOTIFY */
-	__u64	sb_id;			/* 64-bit superblock ID [fsinfo_ids::f_sb_id] */
+	__u64	sb_id;		/* 64-bit superblock ID [fsinfo_sb_notifications::watch_id] */
 };
 
 struct superblock_error_notification {
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 6ad0f84c4327..fd425c08b00b 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -306,6 +306,15 @@ static void dump_fsinfo_generic_mount_child(void *reply, unsigned int size)
 	printf("%8x %8x\n", f->mnt_id, f->change_counter);
 }
 
+static void dump_fsinfo_generic_sb_notifications(void *reply, unsigned int size)
+{
+	struct fsinfo_sb_notifications *f = reply;
+
+	printf("\n");
+	printf("\twatch_id: %llx\n", (unsigned long long)f->watch_id);
+	printf("\tnotifs  : %llx\n", (unsigned long long)f->notify_counter);
+}
+
 static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
 {
 	struct fsinfo_afs_server_address *f = reply;
@@ -416,12 +425,11 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	fsinfo_generic_volume_name),
-
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SB_NOTIFICATIONS,	fsinfo_generic_sb_notifications),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_DEVNAME,	fsinfo_generic_mount_devname),
 	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_child),
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
-
 	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	afs_cell_name),
 	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_server_name),
 	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 17/19] notifications: sample: Display superblock notifications [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (15 preceding siblings ...)
  2020-02-18 17:06 ` [PATCH 16/19] fsinfo: Provide superblock notification counter " David Howells
@ 2020-02-18 17:07 ` David Howells
  2020-02-18 17:07 ` [PATCH 18/19] ext4: Add example fsinfo information " David Howells
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:07 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

The notification is run as:

	./watch_test

and it then watches "/mnt" for superblock notifications:

	# mount -t tmpfs none /mnt
	# ./watch_test &
	# mount -o remount,ro /mnt
	# mount -o remount,rw /mnt

producing:

	# ./watch_test
	NOTIFY[000]: ty=000003 sy=00 i=03010010
	SUPER 157eb57ca7 change=0[readonly]
	read() = 16
	NOTIFY[000]: ty=000002 sy=04 i=02010010
	MOUNT 000001a0 change=4[setattr] aux=0
	read() = 16
	NOTIFY[000]: ty=000002 sy=04 i=02010010
	MOUNT 000001a0 change=4[setattr] aux=0

Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/watch_queue/watch_test.c |   39 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c
index 49d185150506..eea3bd8c6569 100644
--- a/samples/watch_queue/watch_test.c
+++ b/samples/watch_queue/watch_test.c
@@ -29,6 +29,9 @@
 #ifndef __NR_watch_mount
 #define __NR_watch_mount -1
 #endif
+#ifndef __NR_watch_sb
+#define __NR_watch_sb -1
+#endif
 
 #define BUF_SIZE 256
 
@@ -82,6 +85,24 @@ static void saw_mount_change(struct watch_notification *n, size_t len)
 	       m->triggered_on, n->subtype, mount_subtypes[n->subtype], m->changed_mount);
 }
 
+static const char *super_subtypes[256] = {
+	[NOTIFY_SUPERBLOCK_READONLY]	= "readonly",
+	[NOTIFY_SUPERBLOCK_ERROR]	= "error",
+	[NOTIFY_SUPERBLOCK_EDQUOT]	= "edquot",
+	[NOTIFY_SUPERBLOCK_NETWORK]	= "network",
+};
+
+static void saw_super_change(struct watch_notification *n, size_t len)
+{
+	struct superblock_notification *s = (struct superblock_notification *)n;
+
+	if (len < sizeof(struct superblock_notification))
+		return;
+
+	printf("SUPER %08llx change=%u[%s]\n",
+	       s->sb_id, n->subtype, super_subtypes[n->subtype]);
+}
+
 /*
  * Consume and display events.
  */
@@ -161,6 +182,9 @@ static void consumer(int fd)
 			case WATCH_TYPE_MOUNT_NOTIFY:
 				saw_mount_change(&n.n, len);
 				break;
+			case WATCH_TYPE_SB_NOTIFY:
+				saw_super_change(&n.n, len);
+				break;
 			}
 
 			p += len;
@@ -169,7 +193,7 @@ static void consumer(int fd)
 }
 
 static struct watch_notification_filter filter = {
-	.nr_filters	= 2,
+	.nr_filters	= 3,
 	.filters = {
 		[0]	= {
 			.type			= WATCH_TYPE_KEY_NOTIFY,
@@ -180,6 +204,14 @@ static struct watch_notification_filter filter = {
 			// Reject move-from notifications
 			.subtype_filter[0]	= UINT_MAX & ~(1 << NOTIFY_MOUNT_MOVE_FROM),
 		},
+		[2]	= {
+			.type			= WATCH_TYPE_SB_NOTIFY,
+			// Only accept notification of changes to R/O state
+			.subtype_filter[0]	= (1 << NOTIFY_SUPERBLOCK_READONLY),
+			// Only accept notifications of change-to-R/O
+			.info_mask		= WATCH_INFO_FLAG_0,
+			.info_filter		= WATCH_INFO_FLAG_0,
+		},
 	},
 };
 
@@ -218,6 +250,11 @@ int main(int argc, char **argv)
 		exit(1);
 	}
 
+	if (syscall(__NR_watch_sb, AT_FDCWD, "/mnt", 0, fd, 0x03) == -1) {
+		perror("watch_sb");
+		exit(1);
+	}
+
 	consumer(fd);
 	exit(0);
 }



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 18/19] ext4: Add example fsinfo information [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (16 preceding siblings ...)
  2020-02-18 17:07 ` [PATCH 17/19] notifications: sample: Display superblock notifications " David Howells
@ 2020-02-18 17:07 ` David Howells
  2020-02-19 17:04   ` Darrick J. Wong
                     ` (2 more replies)
  2020-02-18 17:07 ` [PATCH 19/19] nfs: Add example filesystem " David Howells
                   ` (5 subsequent siblings)
  23 siblings, 3 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:07 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add the ability to list some ext4 volume timestamps as an example.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-ext4@vger.kernel.org
---

 fs/ext4/Makefile            |    1 +
 fs/ext4/ext4.h              |    9 +++++++++
 fs/ext4/fsinfo.c            |   40 ++++++++++++++++++++++++++++++++++++++++
 fs/ext4/super.c             |    1 +
 include/uapi/linux/fsinfo.h |   16 ++++++++++++++++
 samples/vfs/test-fsinfo.c   |   35 +++++++++++++++++++++++++++++++++++
 6 files changed, 102 insertions(+)
 create mode 100644 fs/ext4/fsinfo.c

diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 4ccb3c9189d8..71d5b460c7c7 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -16,3 +16,4 @@ ext4-$(CONFIG_EXT4_FS_SECURITY)		+= xattr_security.o
 ext4-inode-test-objs			+= inode-test.o
 obj-$(CONFIG_EXT4_KUNIT_TESTS)		+= ext4-inode-test.o
 ext4-$(CONFIG_FS_VERITY)		+= verity.o
+ext4-$(CONFIG_FSINFO)			+= fsinfo.o
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9a2ee2428ecc..d81b04227da7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -42,6 +42,7 @@
 
 #include <linux/fscrypt.h>
 #include <linux/fsverity.h>
+#include <linux/fsinfo.h>
 
 #include <linux/compiler.h>
 
@@ -3166,6 +3167,14 @@ extern const struct inode_operations ext4_file_inode_operations;
 extern const struct file_operations ext4_file_operations;
 extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
 
+/* fsinfo.c */
+#ifdef CONFIG_FSINFO
+struct fsinfo_attribute;
+extern const struct fsinfo_attribute ext4_fsinfo_attributes[];
+#else
+#define ext4_fsinfo_attributes NULL
+#endif
+
 /* inline.c */
 extern int ext4_get_max_inline_size(struct inode *inode);
 extern int ext4_find_inline_data_nolock(struct inode *inode);
diff --git a/fs/ext4/fsinfo.c b/fs/ext4/fsinfo.c
new file mode 100644
index 000000000000..545424c410ff
--- /dev/null
+++ b/fs/ext4/fsinfo.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information for ext4
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include "ext4.h"
+
+static int ext4_fsinfo_get_volume_name(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct ext4_sb_info *sbi = EXT4_SB(path->mnt->mnt_sb);
+	const struct ext4_super_block *es = sbi->s_es;
+
+	memcpy(ctx->buffer, es->s_volume_name, sizeof(es->s_volume_name));
+	return strlen(ctx->buffer);
+}
+
+static int ext4_fsinfo_get_timestamps(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct ext4_sb_info *sbi = EXT4_SB(path->mnt->mnt_sb);
+	const struct ext4_super_block *es = sbi->s_es;
+	struct fsinfo_ext4_timestamps *ts = ctx->buffer;
+
+#define Z(R,S) R = S | (((u64)S##_hi) << 32)
+	Z(ts->mkfs_time,	es->s_mkfs_time);
+	Z(ts->mount_time,	es->s_mtime);
+	Z(ts->write_time,	es->s_wtime);
+	Z(ts->last_check_time,	es->s_lastcheck);
+	Z(ts->first_error_time,	es->s_first_error_time);
+	Z(ts->last_error_time,	es->s_last_error_time);
+	return sizeof(*ts);
+}
+
+const struct fsinfo_attribute ext4_fsinfo_attributes[] = {
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	ext4_fsinfo_get_volume_name),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_EXT4_TIMESTAMPS,	ext4_fsinfo_get_timestamps),
+	{}
+};
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8434217549b3..e21c3d99747e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1477,6 +1477,7 @@ static const struct super_operations ext4_sops = {
 	.freeze_fs	= ext4_freeze,
 	.unfreeze_fs	= ext4_unfreeze,
 	.statfs		= ext4_statfs,
+	.fsinfo_attributes = ext4_fsinfo_attributes,
 	.remount_fs	= ext4_remount,
 	.show_options	= ext4_show_options,
 #ifdef CONFIG_QUOTA
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 5467f88ca9b0..da9a6f48ec5b 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -38,6 +38,8 @@
 #define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of the Nth server (string) */
 #define FSINFO_ATTR_AFS_SERVER_ADDRESSES 0x302	/* List of addresses of the Nth server */
 
+#define FSINFO_ATTR_EXT4_TIMESTAMPS	0x400	/* Ext4 superblock timestamps */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -323,4 +325,18 @@ struct fsinfo_afs_server_address {
 
 #define FSINFO_ATTR_AFS_SERVER_ADDRESSES__STRUCT struct fsinfo_afs_server_address
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_EXT4_TIMESTAMPS).
+ */
+struct fsinfo_ext4_timestamps {
+	__u64		mkfs_time;
+	__u64		mount_time;
+	__u64		write_time;
+	__u64		last_check_time;
+	__u64		first_error_time;
+	__u64		last_error_time;
+};
+
+#define FSINFO_ATTR_EXT4_TIMESTAMPS__STRUCT struct fsinfo_ext4_timestamps
+
 #endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index fd425c08b00b..53251ee98d1c 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -359,6 +359,40 @@ static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
 	printf("family=%u\n", ss->ss_family);
 }
 
+static char *dump_ext4_time(char *buffer, time_t tim)
+{
+	struct tm tm;
+	int len;
+
+	if (tim == 0)
+		return "-";
+
+	if (!localtime_r(&tim, &tm)) {
+		perror("localtime_r");
+		exit(1);
+	}
+	len = strftime(buffer, 100, "%F %T", &tm);
+	if (len == 0) {
+		perror("strftime");
+		exit(1);
+	}
+	return buffer;
+}
+
+static void dump_ext4_fsinfo_timestamps(void *reply, unsigned int size)
+{
+	struct fsinfo_ext4_timestamps *r = reply;
+	char buffer[100];
+
+	printf("\n");
+	printf("\tmkfs    : %s\n", dump_ext4_time(buffer, r->mkfs_time));
+	printf("\tmount   : %s\n", dump_ext4_time(buffer, r->mount_time));
+	printf("\twrite   : %s\n", dump_ext4_time(buffer, r->write_time));
+	printf("\tfsck    : %s\n", dump_ext4_time(buffer, r->last_check_time));
+	printf("\t1st-err : %s\n", dump_ext4_time(buffer, r->first_error_time));
+	printf("\tlast-err: %s\n", dump_ext4_time(buffer, r->last_error_time));
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -433,6 +467,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	afs_cell_name),
 	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_server_name),
 	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_EXT4_TIMESTAMPS,	ext4_fsinfo_timestamps),
 	{}
 };
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 19/19] nfs: Add example filesystem information [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (17 preceding siblings ...)
  2020-02-18 17:07 ` [PATCH 18/19] ext4: Add example fsinfo information " David Howells
@ 2020-02-18 17:07 ` David Howells
  2020-02-20  2:13   ` kbuild test robot
  2020-02-20  2:20   ` kbuild test robot
  2020-02-18 18:12 ` David Howells
                   ` (4 subsequent siblings)
  23 siblings, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 17:07 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel

Add the ability to list NFS server addresses and hostname, timestamp
information and capabilities as an example.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-nfs@vger.kernel.org
---

 fs/nfs/Makefile             |    1 +
 fs/nfs/internal.h           |    8 ++++++++
 fs/nfs/nfs4super.c          |    1 +
 fs/nfs/super.c              |    1 +
 include/uapi/linux/fsinfo.h |   29 +++++++++++++++++++++++++++++
 samples/vfs/test-fsinfo.c   |   40 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 80 insertions(+)

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 2433c3e03cfa..20fbc9596833 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -13,6 +13,7 @@ nfs-y 			:= client.o dir.o file.o getroot.o inode.o super.o \
 nfs-$(CONFIG_ROOT_NFS)	+= nfsroot.o
 nfs-$(CONFIG_SYSCTL)	+= sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
+nfs-$(CONFIG_FSINFO)	+= fsinfo.o
 
 obj-$(CONFIG_NFS_V2) += nfsv2.o
 nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f80c47d5ff27..4ddf0da25740 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -10,6 +10,7 @@
 #include <linux/sunrpc/addr.h>
 #include <linux/nfs_page.h>
 #include <linux/wait_bit.h>
+#include <linux/fsinfo.h>
 
 #define NFS_SB_MASK (SB_RDONLY|SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
 
@@ -247,6 +248,13 @@ extern const struct svc_version nfs4_callback_version4;
 /* fs_context.c */
 extern struct file_system_type nfs_fs_type;
 
+/* fsinfo.c */
+#ifdef CONFIG_FSINFO
+extern const struct fsinfo_attribute nfs_fsinfo_attributes[];
+#else
+#define nfs_fsinfo_attributes NULL
+#endif
+
 /* pagelist.c */
 extern int __init nfs_init_nfspagecache(void);
 extern void nfs_destroy_nfspagecache(void);
diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index 1475f932d7da..1b75144e24f4 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -26,6 +26,7 @@ static const struct super_operations nfs4_sops = {
 	.write_inode	= nfs4_write_inode,
 	.drop_inode	= nfs_drop_inode,
 	.statfs		= nfs_statfs,
+	.fsinfo_attributes = nfs_fsinfo_attributes,
 	.evict_inode	= nfs4_evict_inode,
 	.umount_begin	= nfs_umount_begin,
 	.show_options	= nfs_show_options,
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index dada09b391c6..fbc2cf5f803b 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -76,6 +76,7 @@ const struct super_operations nfs_sops = {
 	.write_inode	= nfs_write_inode,
 	.drop_inode	= nfs_drop_inode,
 	.statfs		= nfs_statfs,
+	.fsinfo_attributes = nfs_fsinfo_attributes,
 	.evict_inode	= nfs_evict_inode,
 	.umount_begin	= nfs_umount_begin,
 	.show_options	= nfs_show_options,
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index da9a6f48ec5b..7c97d65333ec 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -40,6 +40,11 @@
 
 #define FSINFO_ATTR_EXT4_TIMESTAMPS	0x400	/* Ext4 superblock timestamps */
 
+#define FSINFO_ATTR_NFS_INFO		0x500	/* Information about an NFS mount */
+#define FSINFO_ATTR_NFS_SERVER_NAME	0x501	/* Name of the server (string) */
+#define FSINFO_ATTR_NFS_SERVER_ADDRESSES 0x502	/* List of addresses of the server */
+#define FSINFO_ATTR_NFS_GSSAPI_NAME	0x503	/* GSSAPI acceptor name */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -339,4 +344,28 @@ struct fsinfo_ext4_timestamps {
 
 #define FSINFO_ATTR_EXT4_TIMESTAMPS__STRUCT struct fsinfo_ext4_timestamps
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_NFS_INFO).
+ *
+ * Get information about an NFS mount.
+ */
+struct fsinfo_nfs_info {
+	__u32		version;
+	__u32		minor_version;
+	__u32		transport_proto;
+};
+
+#define FSINFO_ATTR_NFS_INFO__STRUCT struct fsinfo_nfs_info
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_NFS_SERVER_ADDRESSES).
+ *
+ * Get the addresses of the server for an NFS mount.
+ */
+struct fsinfo_nfs_server_address {
+	struct __kernel_sockaddr_storage address;
+};
+
+#define FSINFO_ATTR_NFS_SERVER_ADDRESSES__STRUCT struct fsinfo_nfs_server_address
+
 #endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 53251ee98d1c..68652db686e8 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -393,6 +393,40 @@ static void dump_ext4_fsinfo_timestamps(void *reply, unsigned int size)
 	printf("\tlast-err: %s\n", dump_ext4_time(buffer, r->last_error_time));
 }
 
+static void dump_nfs_fsinfo_info(void *reply, unsigned int size)
+{
+	struct fsinfo_nfs_info *r = reply;
+
+	printf("ver=%u.%u proto=%u\n", r->version, r->minor_version, r->transport_proto);
+}
+
+static void dump_nfs_fsinfo_server_addresses(void *reply, unsigned int size)
+{
+	struct fsinfo_nfs_server_address *r = reply;
+	struct sockaddr_storage *ss = (struct sockaddr_storage *)&r->address;
+	struct sockaddr_in6 *sin6;
+	struct sockaddr_in *sin;
+	char buf[1024];
+
+	switch (ss->ss_family) {
+	case AF_INET:
+		sin = (struct sockaddr_in *)ss;
+		if (!inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u %s\n", ntohs(sin->sin_port), buf);
+		return;
+	case AF_INET6:
+		sin6 = (struct sockaddr_in6 *)ss;
+		if (!inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u %s\n", ntohs(sin6->sin6_port), buf);
+		return;
+	default:
+		printf("family=%u\n", ss->ss_family);
+		return;
+	}
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -424,6 +458,8 @@ static void dump_string(void *reply, unsigned int size)
 #define dump_fsinfo_generic_mount_point		dump_string
 #define dump_afs_cell_name			dump_string
 #define dump_afs_server_name			dump_string
+#define dump_nfs_fsinfo_server_name		dump_string
+#define dump_nfs_fsinfo_gssapi_name		dump_string
 
 /*
  *
@@ -468,6 +504,10 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_server_name),
 	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_EXT4_TIMESTAMPS,	ext4_fsinfo_timestamps),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_NFS_INFO,		nfs_fsinfo_info),
+	FSINFO_STRING	(FSINFO_ATTR_NFS_SERVER_NAME,	nfs_fsinfo_server_name),
+	FSINFO_LIST	(FSINFO_ATTR_NFS_SERVER_ADDRESSES, nfs_fsinfo_server_addresses),
+	FSINFO_STRING	(FSINFO_ATTR_NFS_GSSAPI_NAME,	nfs_fsinfo_gssapi_name),
 	{}
 };
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH 19/19] nfs: Add example filesystem information [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (18 preceding siblings ...)
  2020-02-18 17:07 ` [PATCH 19/19] nfs: Add example filesystem " David Howells
@ 2020-02-18 18:12 ` David Howells
  2020-02-19 10:23 ` [PATCH 00/19] VFS: Filesystem information and notifications " Stefan Metzmacher
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-18 18:12 UTC (permalink / raw)
  To: viro
  Cc: dhowells, raven, mszeredi, christian, linux-api, linux-fsdevel,
	linux-kernel, linux-nfs

Oops.  I forgot to add a couple of files before committing.  Here's the
corrected patch.

David
---
nfs: Add example filesystem information

Add the ability to list NFS server addresses and hostname, timestamp
information and capabilities as an example.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-nfs@vger.kernel.org

---
 fs/nfs/Makefile              |    1 
 fs/nfs/fsinfo.c              |  225 +++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/internal.h            |    8 +
 fs/nfs/nfs4super.c           |    1 
 fs/nfs/super.c               |    1 
 include/uapi/linux/fsinfo.h  |   29 +++++
 include/uapi/linux/windows.h |   35 ++++++
 samples/vfs/test-fsinfo.c    |   40 +++++++
 8 files changed, 340 insertions(+)

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 2433c3e03cfa..20fbc9596833 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -13,6 +13,7 @@ nfs-y 			:= client.o dir.o file.o getroot.o inode.o super.o \
 nfs-$(CONFIG_ROOT_NFS)	+= nfsroot.o
 nfs-$(CONFIG_SYSCTL)	+= sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
+nfs-$(CONFIG_FSINFO)	+= fsinfo.o
 
 obj-$(CONFIG_NFS_V2) += nfsv2.o
 nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/fsinfo.c b/fs/nfs/fsinfo.c
new file mode 100644
index 000000000000..22f7e6a16cb4
--- /dev/null
+++ b/fs/nfs/fsinfo.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information for NFS
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/nfs_fs.h>
+#include <linux/windows.h>
+#include "internal.h"
+
+static const struct fsinfo_timestamp_info nfs_timestamp_info = {
+	.atime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.mtime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.ctime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.btime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+};
+
+static int nfs_fsinfo_get_timestamp_info(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	struct fsinfo_timestamp_info *r = ctx->buffer;
+	unsigned long long nsec;
+	unsigned int rem, mant;
+	int exp = -9;
+
+	*r = nfs_timestamp_info;
+
+	nsec = server->time_delta.tv_nsec;
+	nsec += server->time_delta.tv_sec * 1000000000ULL;
+	if (nsec == 0)
+		goto out;
+
+	do {
+		mant = nsec;
+		rem = do_div(nsec, 10);
+		if (rem)
+			break;
+		exp++;
+	} while (nsec);
+
+	r->atime.gran_mantissa = mant;
+	r->atime.gran_exponent = exp;
+	r->btime.gran_mantissa = mant;
+	r->btime.gran_exponent = exp;
+	r->ctime.gran_mantissa = mant;
+	r->ctime.gran_exponent = exp;
+	r->mtime.gran_mantissa = mant;
+	r->mtime.gran_exponent = exp;
+
+out:
+	return sizeof(*r);
+}
+
+static int nfs_fsinfo_get_info(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	const struct nfs_client *clp = server->nfs_client;
+	struct fsinfo_nfs_info *r = ctx->buffer;
+
+	r->version		= clp->rpc_ops->version;
+	r->minor_version	= clp->cl_minorversion;
+	r->transport_proto	= clp->cl_proto;
+	return sizeof(*r);
+}
+
+static int nfs_fsinfo_get_server_name(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	const struct nfs_client *clp = server->nfs_client;
+
+	return fsinfo_string(clp->cl_hostname, ctx);
+}
+
+static int nfs_fsinfo_get_server_addresses(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	const struct nfs_client *clp = server->nfs_client;
+	struct fsinfo_nfs_server_address *addr = ctx->buffer;
+	int ret;
+
+	ret = 1 * sizeof(*addr);
+	if (ret <= ctx->buf_size)
+		memcpy(&addr[0].address, &clp->cl_addr, clp->cl_addrlen);
+	return ret;
+
+}
+
+static int nfs_fsinfo_get_gssapi_name(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	const struct nfs_client *clp = server->nfs_client;
+
+	return fsinfo_string(clp->cl_acceptor, ctx);
+}
+
+static int nfs_fsinfo_get_limits(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	struct fsinfo_limits *lim = ctx->buffer;
+
+	lim->max_file_size.hi	= 0;
+	lim->max_file_size.lo	= server->maxfilesize;
+	lim->max_ino.hi		= 0;
+	lim->max_ino.lo		= U64_MAX;
+	lim->max_hard_links	= UINT_MAX;
+	lim->max_uid		= UINT_MAX;
+	lim->max_gid		= UINT_MAX;
+	lim->max_filename_len	= NAME_MAX - 1;
+	lim->max_symlink_len	= PATH_MAX - 1;
+	return sizeof(*lim);
+}
+
+static int nfs_fsinfo_get_supports(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	struct fsinfo_supports *sup = ctx->buffer;
+
+	/* Don't set STATX_INO as i_ino is fabricated and may not be unique. */
+
+	if (!(server->caps & NFS_CAP_MODE))
+		sup->stx_mask |= STATX_TYPE | STATX_MODE;
+	if (server->caps & NFS_CAP_OWNER)
+		sup->stx_mask |= STATX_UID;
+	if (server->caps & NFS_CAP_OWNER_GROUP)
+		sup->stx_mask |= STATX_GID;
+	if (server->caps & NFS_CAP_ATIME)
+		sup->stx_mask |= STATX_ATIME;
+	if (server->caps & NFS_CAP_CTIME)
+		sup->stx_mask |= STATX_CTIME;
+	if (server->caps & NFS_CAP_MTIME)
+		sup->stx_mask |= STATX_MTIME;
+	if (server->attr_bitmask[0] & FATTR4_WORD0_SIZE)
+		sup->stx_mask |= STATX_SIZE;
+	if (server->attr_bitmask[1] & FATTR4_WORD1_NUMLINKS)
+		sup->stx_mask |= STATX_NLINK;
+
+	if (server->attr_bitmask[0] & FATTR4_WORD0_ARCHIVE)
+		sup->win_file_attrs |= ATTR_ARCHIVE;
+	if (server->attr_bitmask[0] & FATTR4_WORD0_HIDDEN)
+		sup->win_file_attrs |= ATTR_HIDDEN;
+	if (server->attr_bitmask[1] & FATTR4_WORD1_SYSTEM)
+		sup->win_file_attrs |= ATTR_SYSTEM;
+
+	sup->stx_attributes = STATX_ATTR_AUTOMOUNT;
+	return sizeof(*sup);
+}
+
+static int nfs_fsinfo_get_features(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct nfs_server *server = NFS_SB(path->dentry->d_sb);
+	struct fsinfo_features *ft = ctx->buffer;
+
+	fsinfo_set_feature(ft, FSINFO_FEAT_IS_NETWORK_FS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_AUTOMOUNTS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_O_SYNC);
+	fsinfo_set_feature(ft, FSINFO_FEAT_O_DIRECT);
+	fsinfo_set_feature(ft, FSINFO_FEAT_ADV_LOCKS);
+	fsinfo_set_feature(ft, FSINFO_FEAT_DEVICE_FILES);
+	fsinfo_set_feature(ft, FSINFO_FEAT_UNIX_SPECIALS);
+	if (server->nfs_client->rpc_ops->version == 4) {
+		fsinfo_set_feature(ft, FSINFO_FEAT_LEASES);
+		fsinfo_set_feature(ft, FSINFO_FEAT_IVER_ALL_CHANGE);
+	}
+
+	if (server->caps & NFS_CAP_OWNER)
+		fsinfo_set_feature(ft, FSINFO_FEAT_UIDS);
+	if (server->caps & NFS_CAP_OWNER_GROUP)
+		fsinfo_set_feature(ft, FSINFO_FEAT_GIDS);
+	if (!(server->caps & NFS_CAP_MODE))
+		fsinfo_set_feature(ft, FSINFO_FEAT_NO_UNIX_MODE);
+	if (server->caps & NFS_CAP_ACLS)
+		fsinfo_set_feature(ft, FSINFO_FEAT_HAS_ACL);
+	if (server->caps & NFS_CAP_SYMLINKS)
+		fsinfo_set_feature(ft, FSINFO_FEAT_SYMLINKS);
+	if (server->caps & NFS_CAP_HARDLINKS)
+		fsinfo_set_feature(ft, FSINFO_FEAT_HARD_LINKS);
+	if (server->caps & NFS_CAP_ATIME)
+		fsinfo_set_feature(ft, FSINFO_FEAT_HAS_ATIME);
+	if (server->caps & NFS_CAP_CTIME)
+		fsinfo_set_feature(ft, FSINFO_FEAT_HAS_CTIME);
+	if (server->caps & NFS_CAP_MTIME)
+		fsinfo_set_feature(ft, FSINFO_FEAT_HAS_MTIME);
+
+	if (server->attr_bitmask[0] & FATTR4_WORD0_CASE_INSENSITIVE)
+		fsinfo_set_feature(ft, FSINFO_FEAT_NAME_CASE_INDEP);
+	if ((server->attr_bitmask[0] & FATTR4_WORD0_ARCHIVE) ||
+	    (server->attr_bitmask[0] & FATTR4_WORD0_HIDDEN) ||
+	    (server->attr_bitmask[1] & FATTR4_WORD1_SYSTEM))
+		fsinfo_set_feature(ft, FSINFO_FEAT_WINDOWS_ATTRS);
+
+	return sizeof(*ft);
+}
+
+const struct fsinfo_attribute nfs_fsinfo_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	nfs_fsinfo_get_timestamp_info),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		nfs_fsinfo_get_limits),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		nfs_fsinfo_get_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		nfs_fsinfo_get_features),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_NFS_INFO,		nfs_fsinfo_get_info),
+	FSINFO_STRING	(FSINFO_ATTR_NFS_SERVER_NAME,	nfs_fsinfo_get_server_name),
+	FSINFO_LIST	(FSINFO_ATTR_NFS_SERVER_ADDRESSES, nfs_fsinfo_get_server_addresses),
+	FSINFO_STRING	(FSINFO_ATTR_NFS_GSSAPI_NAME,	nfs_fsinfo_get_gssapi_name),
+	{}
+};
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f80c47d5ff27..4ddf0da25740 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -10,6 +10,7 @@
 #include <linux/sunrpc/addr.h>
 #include <linux/nfs_page.h>
 #include <linux/wait_bit.h>
+#include <linux/fsinfo.h>
 
 #define NFS_SB_MASK (SB_RDONLY|SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
 
@@ -247,6 +248,13 @@ extern const struct svc_version nfs4_callback_version4;
 /* fs_context.c */
 extern struct file_system_type nfs_fs_type;
 
+/* fsinfo.c */
+#ifdef CONFIG_FSINFO
+extern const struct fsinfo_attribute nfs_fsinfo_attributes[];
+#else
+#define nfs_fsinfo_attributes NULL
+#endif
+
 /* pagelist.c */
 extern int __init nfs_init_nfspagecache(void);
 extern void nfs_destroy_nfspagecache(void);
diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index 1475f932d7da..1b75144e24f4 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -26,6 +26,7 @@ static const struct super_operations nfs4_sops = {
 	.write_inode	= nfs4_write_inode,
 	.drop_inode	= nfs_drop_inode,
 	.statfs		= nfs_statfs,
+	.fsinfo_attributes = nfs_fsinfo_attributes,
 	.evict_inode	= nfs4_evict_inode,
 	.umount_begin	= nfs_umount_begin,
 	.show_options	= nfs_show_options,
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index dada09b391c6..fbc2cf5f803b 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -76,6 +76,7 @@ const struct super_operations nfs_sops = {
 	.write_inode	= nfs_write_inode,
 	.drop_inode	= nfs_drop_inode,
 	.statfs		= nfs_statfs,
+	.fsinfo_attributes = nfs_fsinfo_attributes,
 	.evict_inode	= nfs_evict_inode,
 	.umount_begin	= nfs_umount_begin,
 	.show_options	= nfs_show_options,
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index da9a6f48ec5b..7c97d65333ec 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -40,6 +40,11 @@
 
 #define FSINFO_ATTR_EXT4_TIMESTAMPS	0x400	/* Ext4 superblock timestamps */
 
+#define FSINFO_ATTR_NFS_INFO		0x500	/* Information about an NFS mount */
+#define FSINFO_ATTR_NFS_SERVER_NAME	0x501	/* Name of the server (string) */
+#define FSINFO_ATTR_NFS_SERVER_ADDRESSES 0x502	/* List of addresses of the server */
+#define FSINFO_ATTR_NFS_GSSAPI_NAME	0x503	/* GSSAPI acceptor name */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -339,4 +344,28 @@ struct fsinfo_ext4_timestamps {
 
 #define FSINFO_ATTR_EXT4_TIMESTAMPS__STRUCT struct fsinfo_ext4_timestamps
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_NFS_INFO).
+ *
+ * Get information about an NFS mount.
+ */
+struct fsinfo_nfs_info {
+	__u32		version;
+	__u32		minor_version;
+	__u32		transport_proto;
+};
+
+#define FSINFO_ATTR_NFS_INFO__STRUCT struct fsinfo_nfs_info
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_NFS_SERVER_ADDRESSES).
+ *
+ * Get the addresses of the server for an NFS mount.
+ */
+struct fsinfo_nfs_server_address {
+	struct __kernel_sockaddr_storage address;
+};
+
+#define FSINFO_ATTR_NFS_SERVER_ADDRESSES__STRUCT struct fsinfo_nfs_server_address
+
 #endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/include/uapi/linux/windows.h b/include/uapi/linux/windows.h
new file mode 100644
index 000000000000..17efb9a40529
--- /dev/null
+++ b/include/uapi/linux/windows.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Common windows attributes
+ */
+#ifndef _UAPI_LINUX_WINDOWS_H
+#define _UAPI_LINUX_WINDOWS_H
+
+/*
+ * File Attribute flags
+ */
+#define ATTR_READONLY		0x0001
+#define ATTR_HIDDEN		0x0002
+#define ATTR_SYSTEM		0x0004
+#define ATTR_VOLUME		0x0008
+#define ATTR_DIRECTORY		0x0010
+#define ATTR_ARCHIVE		0x0020
+#define ATTR_DEVICE		0x0040
+#define ATTR_NORMAL		0x0080
+#define ATTR_TEMPORARY		0x0100
+#define ATTR_SPARSE		0x0200
+#define ATTR_REPARSE		0x0400
+#define ATTR_COMPRESSED		0x0800
+#define ATTR_OFFLINE		0x1000	/* ie file not immediately available -
+					   on offline storage */
+#define ATTR_NOT_CONTENT_INDEXED 0x2000
+#define ATTR_ENCRYPTED		0x4000
+#define ATTR_POSIX_SEMANTICS	0x01000000
+#define ATTR_BACKUP_SEMANTICS	0x02000000
+#define ATTR_DELETE_ON_CLOSE	0x04000000
+#define ATTR_SEQUENTIAL_SCAN	0x08000000
+#define ATTR_RANDOM_ACCESS	0x10000000
+#define ATTR_NO_BUFFERING	0x20000000
+#define ATTR_WRITE_THROUGH	0x80000000
+
+#endif /* _UAPI_LINUX_WINDOWS_H */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 53251ee98d1c..68652db686e8 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -393,6 +393,40 @@ static void dump_ext4_fsinfo_timestamps(void *reply, unsigned int size)
 	printf("\tlast-err: %s\n", dump_ext4_time(buffer, r->last_error_time));
 }
 
+static void dump_nfs_fsinfo_info(void *reply, unsigned int size)
+{
+	struct fsinfo_nfs_info *r = reply;
+
+	printf("ver=%u.%u proto=%u\n", r->version, r->minor_version, r->transport_proto);
+}
+
+static void dump_nfs_fsinfo_server_addresses(void *reply, unsigned int size)
+{
+	struct fsinfo_nfs_server_address *r = reply;
+	struct sockaddr_storage *ss = (struct sockaddr_storage *)&r->address;
+	struct sockaddr_in6 *sin6;
+	struct sockaddr_in *sin;
+	char buf[1024];
+
+	switch (ss->ss_family) {
+	case AF_INET:
+		sin = (struct sockaddr_in *)ss;
+		if (!inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u %s\n", ntohs(sin->sin_port), buf);
+		return;
+	case AF_INET6:
+		sin6 = (struct sockaddr_in6 *)ss;
+		if (!inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u %s\n", ntohs(sin6->sin6_port), buf);
+		return;
+	default:
+		printf("family=%u\n", ss->ss_family);
+		return;
+	}
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -424,6 +458,8 @@ static void dump_string(void *reply, unsigned int size)
 #define dump_fsinfo_generic_mount_point		dump_string
 #define dump_afs_cell_name			dump_string
 #define dump_afs_server_name			dump_string
+#define dump_nfs_fsinfo_server_name		dump_string
+#define dump_nfs_fsinfo_gssapi_name		dump_string
 
 /*
  *
@@ -468,6 +504,10 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_server_name),
 	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_EXT4_TIMESTAMPS,	ext4_fsinfo_timestamps),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_NFS_INFO,		nfs_fsinfo_info),
+	FSINFO_STRING	(FSINFO_ATTR_NFS_SERVER_NAME,	nfs_fsinfo_server_name),
+	FSINFO_LIST	(FSINFO_ATTR_NFS_SERVER_ADDRESSES, nfs_fsinfo_server_addresses),
+	FSINFO_STRING	(FSINFO_ATTR_NFS_GSSAPI_NAME,	nfs_fsinfo_gssapi_name),
 	{}
 };
 

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (19 preceding siblings ...)
  2020-02-18 18:12 ` David Howells
@ 2020-02-19 10:23 ` Stefan Metzmacher
  2020-02-19 14:46 ` Christian Brauner
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 67+ messages in thread
From: Stefan Metzmacher @ 2020-02-19 10:23 UTC (permalink / raw)
  To: David Howells, viro
  Cc: raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1608 bytes --]

Hi David,

I have a few generic remarks for new syscalls...

>  (3) New single-bit capability flags can be added.  This is a structure-typed
>      attribute and, as such, (2) applies.  Any bits you wanted but the kernel
>      doesn't support are automatically set to 0.
> 
> fsinfo() may be called like the following, for example:
> 
> 	struct fsinfo_params params = {
> 		.at_flags	= AT_SYMLINK_NOFOLLOW,

Shouldn't all new syscalls be able to provide the RESOLVE_

Shouldn't all new syscalls be able to provide the RESOLVE_ flags
supported in openat2?

> 		.flags		= FSINFO_FLAGS_QUERY_PATH,
> 		.request	= FSINFO_ATTR_AFS_SERVER_ADDRESSES,
> 		.Nth		= 2,
> 	};
> 	struct fsinfo_server_address address;
> 	len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
> 		     &address, sizeof(address));

Also passing sizeof(params) would allow future updates of fsinfo_params,
also similar to openat2(), clone3()...

> ========================
> FILESYSTEM NOTIFICATIONS
> ========================
> 
> The second system call, watch_mount(), places a watch on a point in the
> mount topology specified by the dirfd, path and at_flags parameters.  All
> mount topology change and mount attribute change notifications in the
> subtree rooted at that point can be intercepted by the watch.  Watches are
> ducted through pipes:
> 
> 	int fd[2];
> 	pipe2(fd, O_NOTIFICATION_PIPE);
> 	ioctl(fd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
> 	watch_mount(AT_FDCWD, "/", 0, fd[0], 0x02);

I guess similar things apply here.

Does that make sense to you?

metze



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (20 preceding siblings ...)
  2020-02-19 10:23 ` [PATCH 00/19] VFS: Filesystem information and notifications " Stefan Metzmacher
@ 2020-02-19 14:46 ` Christian Brauner
  2020-02-19 15:50   ` Darrick J. Wong
  2020-02-20  4:42   ` Ian Kent
  2020-02-19 16:16 ` David Howells
  2020-02-21 12:57 ` David Howells
  23 siblings, 2 replies; 67+ messages in thread
From: Christian Brauner @ 2020-02-19 14:46 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote:
> 
> Here are a set of patches that adds system calls, that (a) allow
> information about the VFS, mount topology, superblock and files to be
> retrieved and (b) allow for notifications of mount topology rearrangement
> events, mount and superblock attribute changes and other superblock events,
> such as errors.
> 
> ============================
> FILESYSTEM INFORMATION QUERY
> ============================
> 
> The first system call, fsinfo(), allows information about the filesystem at
> a particular path point to be queried as a set of attributes, some of which
> may have more than one value.
> 
> Attribute values are of four basic types:
> 
>  (1) Version dependent-length structure (size defined by type).
> 
>  (2) Variable-length string (up to 4096, including NUL).
> 
>  (3) List of structures (up to INT_MAX size).
> 
>  (4) Opaque blob (up to INT_MAX size).

I mainly have an organizational question. :) This is a huge patchset
with lots and lots of (good) features. Wouldn't it make sense to make
the fsinfo() syscall a completely separate patchset from the
watch_mount() and watch_sb() syscalls? It seems that they don't need to
depend on each other at all. This would make reviewing this so much
nicer and likely would mean that fsinfo() could proceed a little faster.

Christian

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-19 14:46 ` Christian Brauner
@ 2020-02-19 15:50   ` Darrick J. Wong
  2020-02-20  4:42   ` Ian Kent
  1 sibling, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-19 15:50 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, viro, raven, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

On Wed, Feb 19, 2020 at 03:46:13PM +0100, Christian Brauner wrote:
> On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote:
> > 
> > Here are a set of patches that adds system calls, that (a) allow
> > information about the VFS, mount topology, superblock and files to be
> > retrieved and (b) allow for notifications of mount topology rearrangement
> > events, mount and superblock attribute changes and other superblock events,
> > such as errors.
> > 
> > ============================
> > FILESYSTEM INFORMATION QUERY
> > ============================
> > 
> > The first system call, fsinfo(), allows information about the filesystem at
> > a particular path point to be queried as a set of attributes, some of which
> > may have more than one value.
> > 
> > Attribute values are of four basic types:
> > 
> >  (1) Version dependent-length structure (size defined by type).
> > 
> >  (2) Variable-length string (up to 4096, including NUL).
> > 
> >  (3) List of structures (up to INT_MAX size).
> > 
> >  (4) Opaque blob (up to INT_MAX size).
> 
> I mainly have an organizational question. :) This is a huge patchset
> with lots and lots of (good) features. Wouldn't it make sense to make
> the fsinfo() syscall a completely separate patchset from the
> watch_mount() and watch_sb() syscalls? It seems that they don't need to
> depend on each other at all. This would make reviewing this so much
> nicer and likely would mean that fsinfo() could proceed a little faster.

Agreed; I was also wondering why it was necessary to have three new
features in the same large(ish) patchset.

--D

> Christian

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (21 preceding siblings ...)
  2020-02-19 14:46 ` Christian Brauner
@ 2020-02-19 16:16 ` David Howells
  2020-02-21 12:57 ` David Howells
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-19 16:16 UTC (permalink / raw)
  To: Christian Brauner
  Cc: dhowells, viro, raven, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

Christian Brauner <christian.brauner@ubuntu.com> wrote:

> I mainly have an organizational question. :) This is a huge patchset
> with lots and lots of (good) features. Wouldn't it make sense to make
> the fsinfo() syscall a completely separate patchset from the
> watch_mount() and watch_sb() syscalls? It seems that they don't need to
> depend on each other at all. This would make reviewing this so much
> nicer and likely would mean that fsinfo() could proceed a little faster.

I can split it up again, but it's not quite as independent as it seems.

There's a notification counter added to both the mount struct and the
super_block struct.  This is bumped by notifications and retrieved by
fsinfo().  You need this in the event that there's an overrun and you have to
rescan the whole tree.

So to actually make use of the mount/sb notification facilities, you need
fsinfo() as well.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
@ 2020-02-19 16:31   ` Darrick J. Wong
  2020-02-19 20:07   ` Jann Horn
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-19 16:31 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:05:02PM +0000, David Howells wrote:
> Add a system call to allow filesystem information to be queried.  A request
> value can be given to indicate the desired attribute.  Support is provided
> for enumerating multi-value attributes.
> 
> ===============
> NEW SYSTEM CALL
> ===============
> 
> The new system call looks like:
> 
> 	int ret = fsinfo(int dfd,
> 			 const char *filename,
> 			 const struct fsinfo_params *params,
> 			 void *buffer,
> 			 size_t buf_size);
> 
> The params parameter optionally points to a block of parameters:
> 
> 	struct fsinfo_params {
> 		__u32	at_flags;
> 		__u32	request;
> 		__u32	Nth;
> 		__u32	Mth;
> 		__u64	__reserved[3];
> 	};
> 
> If params is NULL, it is assumed params->request should be
> fsinfo_attr_statfs, params->Nth should be 0, params->Mth should be 0 and
> params->at_flags should be 0.
> 
> If params is given, all of params->__reserved[] must be 0.
> 
> dfd, filename and params->at_flags indicate the file to query.  There is no
> equivalent of lstat() as that can be emulated with fsinfo() by setting
> AT_SYMLINK_NOFOLLOW in params->at_flags.  There is also no equivalent of
> fstat() as that can be emulated by passing a NULL filename to fsinfo() with
> the fd of interest in dfd.  AT_NO_AUTOMOUNT can also be used to an allow
> automount point to be queried without triggering it.
> 
> params->request indicates the attribute/attributes to be queried.  This can
> be one of:
> 
> 	FSINFO_ATTR_STATFS		- statfs-style info
> 	FSINFO_ATTR_IDS			- Filesystem IDs
> 	FSINFO_ATTR_LIMITS		- Filesystem limits
> 	FSINFO_ATTR_SUPPORTS		- What's supported in statx(), IOC flags
> 	FSINFO_ATTR_TIMESTAMP_INFO	- Inode timestamp info
> 	FSINFO_ATTR_VOLUME_ID		- Volume ID (string)
> 	FSINFO_ATTR_VOLUME_UUID		- Volume UUID
> 	FSINFO_ATTR_VOLUME_NAME		- Volume name (string)
> 	FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO - Information about attr Nth
> 	FSINFO_ATTR_FSINFO_ATTRIBUTES	- List of supported attrs
> 
> Some attributes (such as the servers backing a network filesystem) can have
> multiple values.  These can be enumerated by setting params->Nth and
> params->Mth to 0, 1, ... until ENODATA is returned.
> 
> buffer and buf_size point to the reply buffer.  The buffer is filled up to
> the specified size, even if this means truncating the reply.  The full size
> of the reply is returned.  In future versions, this will allow extra fields
> to be tacked on to the end of the reply, but anyone not expecting them will
> only get the subset they're expecting.  If either buffer of buf_size are 0,
> no copy will take place and the data size will be returned.
> 
> At the moment, this will only work on x86_64 and i386 as it requires the
> system call to be wired up.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: linux-api@vger.kernel.org
> ---
> 
>  arch/x86/entry/syscalls/syscall_32.tbl |    1 
>  arch/x86/entry/syscalls/syscall_64.tbl |    1 
>  fs/Kconfig                             |    7 
>  fs/Makefile                            |    1 
>  fs/fsinfo.c                            |  525 ++++++++++++++++++++++++++++
>  include/linux/fs.h                     |    5 
>  include/linux/fsinfo.h                 |   70 ++++
>  include/linux/syscalls.h               |    4 
>  include/uapi/asm-generic/unistd.h      |    4 
>  include/uapi/linux/fsinfo.h            |  186 ++++++++++
>  kernel/sys_ni.c                        |    1 
>  samples/vfs/Makefile                   |    5 
>  samples/vfs/test-fsinfo.c              |  599 ++++++++++++++++++++++++++++++++
>  13 files changed, 1408 insertions(+), 1 deletion(-)
>  create mode 100644 fs/fsinfo.c
>  create mode 100644 include/linux/fsinfo.h
>  create mode 100644 include/uapi/linux/fsinfo.h
>  create mode 100644 samples/vfs/test-fsinfo.c
> 
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index c17cb77eb150..b7817acb154b 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -442,3 +442,4 @@
>  435	i386	clone3			sys_clone3			__ia32_sys_clone3
>  437	i386	openat2			sys_openat2			__ia32_sys_openat2
>  438	i386	pidfd_getfd		sys_pidfd_getfd			__ia32_sys_pidfd_getfd
> +439	i386	fsinfo			sys_fsinfo			__ia32_sys_fsinfo
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 44d510bc9b78..3a45ed6d28cb 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -359,6 +359,7 @@
>  435	common	clone3			__x64_sys_clone3/ptregs
>  437	common	openat2			__x64_sys_openat2
>  438	common	pidfd_getfd		__x64_sys_pidfd_getfd
> +439	common	fsinfo			__x64_sys_fsinfo
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 708ba336e689..1d1b48059ec9 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -15,6 +15,13 @@ config VALIDATE_FS_PARSER
>  	  Enable this to perform validation of the parameter description for a
>  	  filesystem when it is registered.
>  
> +config FSINFO
> +	bool "Enable the fsinfo() system call"
> +	help
> +	  Enable the file system information querying system call to allow
> +	  comprehensive information to be retrieved about a filesystem,
> +	  superblock or mount object.
> +
>  if BLOCK
>  
>  config FS_IOMAP
> diff --git a/fs/Makefile b/fs/Makefile
> index 505e51166973..b5cc9bcd17a4 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -54,6 +54,7 @@ obj-$(CONFIG_COREDUMP)		+= coredump.o
>  obj-$(CONFIG_SYSCTL)		+= drop_caches.o
>  
>  obj-$(CONFIG_FHANDLE)		+= fhandle.o
> +obj-$(CONFIG_FSINFO)		+= fsinfo.o
>  obj-y				+= iomap/
>  
>  obj-y				+= quota/
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> new file mode 100644
> index 000000000000..3bc35b91f20b
> --- /dev/null
> +++ b/fs/fsinfo.c
> @@ -0,0 +1,525 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Filesystem information query.
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +#include <linux/syscalls.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/mount.h>
> +#include <linux/namei.h>
> +#include <linux/statfs.h>
> +#include <linux/security.h>
> +#include <linux/uaccess.h>
> +#include <linux/fsinfo.h>
> +#include <uapi/linux/mount.h>
> +#include "internal.h"
> +
> +static const struct fsinfo_attribute fsinfo_common_attributes[];
> +
> +/**
> + * fsinfo_string - Store a string as an fsinfo attribute value.
> + * @s: The string to store (may be NULL)
> + * @ctx: The parameter context
> + */
> +int fsinfo_string(const char *s, struct fsinfo_context *ctx)
> +{
> +	int ret = 0;
> +
> +	if (s) {
> +		ret = strlen(s);
> +		memcpy(ctx->buffer, s, ret);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(fsinfo_string);
> +
> +/*
> + * Get basic filesystem stats from statfs.
> + */
> +static int fsinfo_generic_statfs(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_statfs *p = ctx->buffer;
> +	struct kstatfs buf;
> +	int ret;
> +
> +	ret = vfs_statfs(path, &buf);
> +	if (ret < 0)
> +		return ret;
> +
> +	p->f_blocks.hi	= 0;
> +	p->f_blocks.lo	= buf.f_blocks;

Er... are there filesystems (besides that (xfs++)++ one) that require
u128 counters?  I suspect that the Very Large Fields are for future
expandability, but I also wonder about the whether it's worth the
complexity of doing this, since the structures can always be
version-revved later.

(I'm not opposed to u128, I'm just curious about it. :))

> +	p->f_bfree.hi	= 0;
> +	p->f_bfree.lo	= buf.f_bfree;
> +	p->f_bavail.hi	= 0;
> +	p->f_bavail.lo	= buf.f_bavail;
> +	p->f_files.hi	= 0;
> +	p->f_files.lo	= buf.f_files;
> +	p->f_ffree.hi	= 0;
> +	p->f_ffree.lo	= buf.f_ffree;
> +	p->f_favail.hi	= 0;
> +	p->f_favail.lo	= buf.f_ffree;
> +	p->f_bsize	= buf.f_bsize;
> +	p->f_frsize	= buf.f_frsize;
> +	return sizeof(*p);
> +}
> +
> +static int fsinfo_generic_ids(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_ids *p = ctx->buffer;
> +	struct super_block *sb;
> +	struct kstatfs buf;
> +	int ret;
> +
> +	ret = vfs_statfs(path, &buf);
> +	if (ret < 0 && ret != -ENOSYS)
> +		return ret;
> +
> +	sb = path->dentry->d_sb;
> +	p->f_fstype	= sb->s_magic;
> +	p->f_dev_major	= MAJOR(sb->s_dev);
> +	p->f_dev_minor	= MINOR(sb->s_dev);
> +
> +	memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
> +	strlcpy(p->f_fs_name, path->dentry->d_sb->s_type->name,
> +		sizeof(p->f_fs_name));
> +	return sizeof(*p);
> +}
> +
> +static int fsinfo_generic_limits(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_limits *lim = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	lim->max_file_size.hi	= 0;
> +	lim->max_file_size.lo	= sb->s_maxbytes;
> +	lim->max_ino.hi		= 0;
> +	lim->max_ino.lo		= UINT_MAX;

XFS inodes are u64 values...

> +	lim->max_hard_links	= sb->s_max_links;
> +	lim->max_uid		= UINT_MAX;
> +	lim->max_gid		= UINT_MAX;
> +	lim->max_projid		= UINT_MAX;
> +	lim->max_filename_len	= NAME_MAX;
> +	lim->max_symlink_len	= PAGE_SIZE;

...and the max symlink target length is 1k, not PAGE_SIZE...

> +	lim->max_xattr_name_len	= XATTR_NAME_MAX;
> +	lim->max_xattr_body_len	= XATTR_SIZE_MAX;

...so is the usage model here that XFS should call fsinfo_generic_limits
to fill out the fsinfo_limits structure, modify the values in
ctx->buffer as appropriate for XFS, and then return the structure size?

> +	lim->max_dev_major	= 0xffffff;
> +	lim->max_dev_minor	= 0xff;
> +	return sizeof(*lim);
> +}
> +
> +static int fsinfo_generic_supports(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_supports *c = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	c->stx_mask = STATX_BASIC_STATS;
> +	if (sb->s_d_op && sb->s_d_op->d_automount)
> +		c->stx_attributes |= STATX_ATTR_AUTOMOUNT;
> +	return sizeof(*c);
> +}
> +
> +static const struct fsinfo_timestamp_info fsinfo_default_timestamp_info = {
> +	.atime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +	.mtime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +	.ctime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +	.btime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +};
> +
> +static int fsinfo_generic_timestamp_info(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_timestamp_info *ts = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +	s8 exponent;
> +
> +	*ts = fsinfo_default_timestamp_info;
> +
> +	if (sb->s_time_gran < 1000000000) {
> +		if (sb->s_time_gran < 1000)
> +			exponent = -9;
> +		else if (sb->s_time_gran < 1000000)
> +			exponent = -6;
> +		else
> +			exponent = -3;
> +
> +		ts->atime.gran_exponent = exponent;
> +		ts->mtime.gran_exponent = exponent;
> +		ts->ctime.gran_exponent = exponent;
> +		ts->btime.gran_exponent = exponent;
> +	}
> +
> +	return sizeof(*ts);
> +}
> +
> +static int fsinfo_generic_volume_uuid(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_volume_uuid *vu = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	memcpy(vu, &sb->s_uuid, sizeof(*vu));
> +	return sizeof(*vu);
> +}
> +
> +static int fsinfo_generic_volume_id(struct path *path, struct fsinfo_context *ctx)
> +{
> +	return fsinfo_string(path->dentry->d_sb->s_id, ctx);
> +}
> +
> +static int fsinfo_attribute_info(struct path *path, struct fsinfo_context *ctx)
> +{
> +	const struct fsinfo_attribute *attr;
> +	struct fsinfo_attribute_info *info = ctx->buffer;
> +	struct dentry *dentry = path->dentry;
> +
> +	if (dentry->d_sb->s_op->fsinfo_attributes)
> +		for (attr = dentry->d_sb->s_op->fsinfo_attributes; attr->get; attr++)
> +			if (attr->attr_id == ctx->Nth)
> +				goto found;
> +	for (attr = fsinfo_common_attributes; attr->get; attr++)
> +		if (attr->attr_id == ctx->Nth)
> +			goto found;
> +	return -ENODATA;
> +
> +found:
> +	info->attr_id		= attr->attr_id;
> +	info->type		= attr->type;
> +	info->flags		= attr->flags;
> +	info->size		= attr->size;
> +	info->element_size	= attr->element_size;
> +	return sizeof(*attr);
> +}
> +
> +static void fsinfo_attributes_insert(struct fsinfo_context *ctx,
> +				     const struct fsinfo_attribute *attr)
> +{
> +	__u32 *buffer = ctx->buffer;
> +	unsigned int i;
> +
> +	if (ctx->usage > ctx->buf_size ||
> +	    ctx->buf_size - ctx->usage < sizeof(__u32)) {
> +		ctx->usage += sizeof(__u32);
> +		return;
> +	}
> +
> +	for (i = 0; i < ctx->usage / sizeof(__u32); i++)
> +		if (buffer[i] == attr->attr_id)
> +			return;
> +
> +	buffer[i] = attr->attr_id;
> +	ctx->usage += sizeof(__u32);
> +}
> +
> +static int fsinfo_attributes(struct path *path, struct fsinfo_context *ctx)
> +{
> +	const struct fsinfo_attribute *attr;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	if (sb->s_op->fsinfo_attributes)
> +		for (attr = sb->s_op->fsinfo_attributes; attr->get; attr++)
> +			fsinfo_attributes_insert(ctx, attr);
> +	for (attr = fsinfo_common_attributes; attr->get; attr++)
> +		fsinfo_attributes_insert(ctx, attr);
> +	return ctx->usage;
> +}
> +
> +static const struct fsinfo_attribute fsinfo_common_attributes[] = {
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
> +	FSINFO_STRING 	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),

There's a space      ^ before the tab here.

> +	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
> +	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_attribute_info),
> +	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_attributes),
> +	{}
> +};
> +
> +/*
> + * Retrieve large filesystem information, such as an opaque blob or array of
> + * struct elements where the value isn't limited to the size of a page.
> + */
> +static int vfs_fsinfo_large(struct path *path, struct fsinfo_context *ctx,
> +			    const struct fsinfo_attribute *attr)
> +{
> +	int ret;
> +
> +	while (!signal_pending(current)) {
> +		ctx->usage = 0;
> +		ret = attr->get(path, ctx);
> +		if (IS_ERR_VALUE((long)ret))
> +			return ret; /* Error */
> +		if ((unsigned int)ret <= ctx->buf_size)
> +			return ret; /* It fitted */
> +
> +		/* We need to resize the buffer */
> +		kvfree(ctx->buffer);
> +		ctx->buffer = NULL;
> +		ctx->buf_size = roundup(ret, PAGE_SIZE);
> +		if (ctx->buf_size > INT_MAX)
> +			return -EMSGSIZE;
> +		ctx->buffer = kvmalloc(ctx->buf_size, GFP_KERNEL);
> +		if (!ctx->buffer)
> +			return -ENOMEM;
> +	}
> +
> +	return -ERESTARTSYS;
> +}
> +
> +static int vfs_do_fsinfo(struct path *path, struct fsinfo_context *ctx,
> +			 const struct fsinfo_attribute *attr)
> +{
> +	if (ctx->Nth != 0 && !(attr->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)))
> +		return -ENODATA;
> +	if (ctx->Mth != 0 && !(attr->flags & FSINFO_FLAGS_NM))
> +		return -ENODATA;
> +
> +	ctx->buf_size = attr->size;
> +	if (ctx->want_size_only && attr->type == FSINFO_TYPE_VSTRUCT)
> +		return attr->size;
> +
> +	ctx->buffer = kvzalloc(ctx->buf_size, GFP_KERNEL);
> +	if (!ctx->buffer)
> +		return -ENOMEM;
> +
> +	switch (attr->type) {
> +	case FSINFO_TYPE_VSTRUCT:
> +		ctx->clear_tail = true;
> +		/* Fall through */
> +	case FSINFO_TYPE_STRING:
> +		return attr->get(path, ctx);
> +
> +	case FSINFO_TYPE_OPAQUE:
> +	case FSINFO_TYPE_LIST:
> +		return vfs_fsinfo_large(path, ctx, attr);
> +
> +	default:
> +		return -ENOPKG;
> +	}
> +}
> +
> +/**
> + * vfs_fsinfo - Retrieve filesystem information
> + * @path: The object to query
> + * @params: Parameters to define a request and place to store result
> + *
> + * Get an attribute on a filesystem or an object within a filesystem.  The
> + * filesystem attribute to be queried is indicated by @ctx->requested_attr, and
> + * if it's a multi-valued attribute, the particular value is selected by
> + * @ctx->Nth and then @ctx->Mth.
> + *
> + * For common attributes, a value may be fabricated if it is not supported by
> + * the filesystem.
> + *
> + * On success, the size of the attribute's value is returned (0 is a valid
> + * size).  A buffer will have been allocated and will be pointed to by
> + * @ctx->buffer.  The caller must free this with kvfree().
> + *
> + * Errors can also be returned: -ENOMEM if a buffer cannot be allocated, -EPERM
> + * or -EACCES if permission is denied by the LSM, -EOPNOTSUPP if an attribute
> + * doesn't exist for the specified object or -ENODATA if the attribute exists,
> + * but the Nth,Mth value does not exist.  -EMSGSIZE indicates that the value is
> + * unmanageable internally and -ENOPKG indicates other internal failure.
> + *
> + * Errors such as -EIO may also come from attempts to access media or servers
> + * to obtain the requested information if it's not immediately to hand.
> + *
> + * [*] Note that the caller may set @ctx->want_size_only if it only wants the
> + *     size of the value and not the data.  If this is set, a buffer may not be
> + *     allocated under some circumstances.  This is intended for size query by
> + *     userspace.
> + *
> + * [*] Note that @ctx->clear_tail will be returned set if the data should be
> + *     padded out with zeros when writing it to userspace.
> + */
> +static int vfs_fsinfo(struct path *path, struct fsinfo_context *ctx)
> +{
> +	const struct fsinfo_attribute *attr;
> +	struct dentry *dentry = path->dentry;
> +	int ret;
> +
> +	if (dentry->d_sb->s_op->fsinfo_attributes)
> +		for (attr = dentry->d_sb->s_op->fsinfo_attributes; attr->get; attr++)
> +			if (attr->attr_id == ctx->requested_attr)
> +				goto found;
> +	for (attr = fsinfo_common_attributes; attr->get; attr++)
> +		if (attr->attr_id == ctx->requested_attr)
> +			goto found;
> +	return -EOPNOTSUPP;
> +
> +found:
> +	ret = security_sb_statfs(dentry);
> +	if (ret)
> +		return ret;
> +
> +	return vfs_do_fsinfo(path, ctx, attr);
> +}
> +
> +static int vfs_fsinfo_path(int dfd, const char __user *pathname,
> +			   unsigned int at_flags, struct fsinfo_context *ctx)
> +{
> +	struct path path;
> +	unsigned lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
> +	int ret = -EINVAL;
> +
> +	if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
> +			  AT_EMPTY_PATH)) != 0)
> +		return -EINVAL;
> +
> +	if (at_flags & AT_SYMLINK_NOFOLLOW)
> +		lookup_flags &= ~LOOKUP_FOLLOW;
> +	if (at_flags & AT_NO_AUTOMOUNT)
> +		lookup_flags &= ~LOOKUP_AUTOMOUNT;
> +	if (at_flags & AT_EMPTY_PATH)
> +		lookup_flags |= LOOKUP_EMPTY;
> +
> +retry:
> +	ret = user_path_at(dfd, pathname, lookup_flags, &path);
> +	if (ret)
> +		goto out;
> +
> +	ret = vfs_fsinfo(&path, ctx);
> +	path_put(&path);
> +	if (retry_estale(ret, lookup_flags)) {
> +		lookup_flags |= LOOKUP_REVAL;
> +		goto retry;
> +	}
> +out:
> +	return ret;
> +}
> +
> +static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_context *ctx)
> +{
> +	struct fd f = fdget_raw(fd);
> +	int ret = -EBADF;
> +
> +	if (f.file) {
> +		ret = vfs_fsinfo(&f.file->f_path, ctx);
> +		fdput(f);
> +	}
> +	return ret;
> +}
> +
> +/**
> + * sys_fsinfo - System call to get filesystem information
> + * @dfd: Base directory to pathwalk from or fd referring to filesystem.
> + * @pathname: Filesystem to query or NULL.
> + * @_params: Parameters to define request (or NULL for enhanced statfs).
> + * @user_buffer: Result buffer.
> + * @user_buf_size: Size of result buffer.
> + *
> + * Get information on a filesystem.  The filesystem attribute to be queried is
> + * indicated by @_params->request, and some of the attributes can have multiple
> + * values, indexed by @_params->Nth and @_params->Mth.  If @_params is NULL,
> + * then the 0th fsinfo_attr_statfs attribute is queried.  If an attribute does
> + * not exist, EOPNOTSUPP is returned; if the Nth,Mth value does not exist,
> + * ENODATA is returned.
> + *
> + * On success, the size of the attribute's value is returned.  If
> + * @user_buf_size is 0 or @user_buffer is NULL, only the size is returned.  If
> + * the size of the value is larger than @user_buf_size, it will be truncated by
> + * the copy.  If the size of the value is smaller than @user_buf_size then the
> + * excess buffer space will be cleared.  The full size of the value will be
> + * returned, irrespective of how much data is actually placed in the buffer.
> + */
> +SYSCALL_DEFINE5(fsinfo,
> +		int, dfd, const char __user *, pathname,
> +		struct fsinfo_params __user *, params,
> +		void __user *, user_buffer, size_t, user_buf_size)
> +{
> +	struct fsinfo_context ctx;
> +	struct fsinfo_params user_params;
> +	unsigned int at_flags = 0, result_size;
> +	int ret;
> +
> +	if (!user_buffer && user_buf_size)
> +		return -EINVAL;
> +	if (user_buffer && !user_buf_size)
> +		return -EINVAL;
> +	if (user_buf_size > UINT_MAX)
> +		return -EOVERFLOW;
> +
> +	memset(&ctx, 0, sizeof(ctx));
> +	ctx.requested_attr = FSINFO_ATTR_STATFS;
> +	if (user_buf_size == 0)
> +		ctx.want_size_only = true;
> +
> +	if (params) {
> +		if (copy_from_user(&user_params, params, sizeof(user_params)))
> +			return -EFAULT;
> +		if (user_params.__reserved32[0] ||
> +		    user_params.__reserved[0] ||
> +		    user_params.__reserved[1] ||
> +		    user_params.__reserved[2] ||
> +		    user_params.flags & ~FSINFO_FLAGS_QUERY_TYPE)
> +			return -EINVAL;
> +		at_flags = user_params.at_flags;
> +		ctx.flags = user_params.flags;
> +		ctx.requested_attr = user_params.request;
> +		ctx.Nth = user_params.Nth;
> +		ctx.Mth = user_params.Mth;
> +	}
> +
> +	switch (ctx.flags & FSINFO_FLAGS_QUERY_TYPE) {
> +	case FSINFO_FLAGS_QUERY_PATH:
> +		ret = vfs_fsinfo_path(dfd, pathname, at_flags, &ctx);
> +		break;
> +	case FSINFO_FLAGS_QUERY_FD:
> +		if (pathname)
> +			return -EINVAL;
> +		ret = vfs_fsinfo_fd(dfd, &ctx);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (ret < 0)
> +		goto error;
> +
> +	result_size = ret;
> +	if (result_size > user_buf_size)
> +		result_size = user_buf_size;
> +
> +	if (result_size > 0 &&
> +	    copy_to_user(user_buffer, ctx.buffer, result_size) != 0) {
> +		ret = -EFAULT;
> +		goto error;
> +	}
> +
> +	/* Clear any part of the buffer that we won't fill if we're putting a
> +	 * struct in there.  Strings, opaque objects and arrays are expected to
> +	 * be variable length.
> +	 */
> +	if (ctx.clear_tail &&
> +	    user_buf_size > result_size &&
> +	    clear_user(user_buffer + result_size, user_buf_size - result_size) != 0) {
> +		ret = -EFAULT;
> +		goto error;
> +	}
> +
> +error:
> +	kvfree(ctx.buffer);
> +	return ret;
> +}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3cd4fe6b845e..f74a4ee36eb3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -68,6 +68,8 @@ struct fsverity_info;
>  struct fsverity_operations;
>  struct fs_context;
>  struct fs_parameter_spec;
> +struct fsinfo_kparams;
> +struct fsinfo_attribute;
>  
>  extern void __init inode_init(void);
>  extern void __init inode_init_early(void);
> @@ -1954,6 +1956,9 @@ struct super_operations {
>  	int (*thaw_super) (struct super_block *);
>  	int (*unfreeze_fs) (struct super_block *);
>  	int (*statfs) (struct dentry *, struct kstatfs *);
> +#ifdef CONFIG_FSINFO
> +	const struct fsinfo_attribute *fsinfo_attributes;
> +#endif
>  	int (*remount_fs) (struct super_block *, int *, char *);
>  	void (*umount_begin) (struct super_block *);
>  
> diff --git a/include/linux/fsinfo.h b/include/linux/fsinfo.h
> new file mode 100644
> index 000000000000..dcd55dbb02fa
> --- /dev/null
> +++ b/include/linux/fsinfo.h
> @@ -0,0 +1,70 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Filesystem information query
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#ifndef _LINUX_FSINFO_H
> +#define _LINUX_FSINFO_H
> +
> +#ifdef CONFIG_FSINFO
> +
> +#include <uapi/linux/fsinfo.h>
> +
> +struct path;
> +
> +#define FSINFO_NORMAL_ATTR_MAX_SIZE 4096
> +
> +struct fsinfo_context {
> +	__u32		flags;		/* [in] FSINFO_FLAGS_* */
> +	__u32		requested_attr;	/* [in] What is being asking for */
> +	__u32		Nth;		/* [in] Instance of it (some may have multiple) */
> +	__u32		Mth;		/* [in] Subinstance */
> +	bool		want_size_only;	/* [in] Just want to know the size, not the data */
> +	bool		clear_tail;	/* [out] T if tail of buffer should be cleared */
> +	unsigned int	usage;		/* [tmp] Amount of buffer used (if large) */
> +	unsigned int	buf_size;	/* [tmp] Size of ->buffer[] */
> +	void		*buffer;	/* [out] The reply buffer */
> +};
> +
> +/*
> + * A filesystem information attribute definition.
> + */
> +struct fsinfo_attribute {
> +	unsigned int		attr_id;	/* The ID of the attribute */
> +	enum fsinfo_value_type	type:8;		/* The type of the attribute's value(s) */
> +	unsigned int		flags:8;
> +	unsigned int		size:16;	/* - Value size (FSINFO_STRUCT) */
> +	unsigned int		element_size:16; /* - Element size (FSINFO_LIST) */
> +	int (*get)(struct path *path, struct fsinfo_context *params);
> +};
> +
> +#define __FSINFO(A, T, S, E, G, F) \
> +	{ .attr_id = A, .type = T, .flags = F, .size = S, .element_size = E, .get = G }
> +
> +#define _FSINFO(A, T, S, E, G)	  __FSINFO(A, T, S, E, G, 0)
> +#define _FSINFO_N(A, T, S, E, G)  __FSINFO(A, T, S, E, G, FSINFO_FLAGS_N)
> +#define _FSINFO_NM(A, T, S, E, G) __FSINFO(A, T, S, E, G, FSINFO_FLAGS_NM)
> +
> +#define _FSINFO_VSTRUCT(A,S,G)	  _FSINFO   (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
> +#define _FSINFO_VSTRUCT_N(A,S,G)  _FSINFO_N (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
> +#define _FSINFO_VSTRUCT_NM(A,S,G) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
> +
> +#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G)
> +#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G)
> +#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G)
> +#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
> +#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
> +#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
> +#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, FSINFO_NORMAL_ATTR_MAX_SIZE, 0, G)
> +#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, FSINFO_NORMAL_ATTR_MAX_SIZE, \
> +					   sizeof(A##__STRUCT), G)
> +#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, FSINFO_NORMAL_ATTR_MAX_SIZE, \
> +					   sizeof(A##__STRUCT), G)
> +
> +extern int fsinfo_string(const char *, struct fsinfo_context *);
> +
> +#endif /* CONFIG_FSINFO */
> +
> +#endif /* _LINUX_FSINFO_H */
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 1815065d52f3..6c3157e46e7c 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -47,6 +47,7 @@ struct stat64;
>  struct statfs;
>  struct statfs64;
>  struct statx;
> +struct fsinfo_params;
>  struct __sysctl_args;
>  struct sysinfo;
>  struct timespec;
> @@ -1003,6 +1004,9 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
>  				       siginfo_t __user *info,
>  				       unsigned int flags);
>  asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
> +asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
> +			   struct fsinfo_params __user *params,
> +			   void __user *buffer, size_t buf_size);
>  
>  /*
>   * Architecture-specific system calls
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 3a3201e4618e..9d00098a3f1b 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3)
>  __SYSCALL(__NR_openat2, sys_openat2)
>  #define __NR_pidfd_getfd 438
>  __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
> +#define __NR_fsinfo 439
> +__SYSCALL(__NR_fsinfo, sys_fsinfo)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 439
> +#define __NR_syscalls 440
>  
>  /*
>   * 32 bit systems traditionally used different
> diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
> new file mode 100644
> index 000000000000..365d54fe9290
> --- /dev/null
> +++ b/include/uapi/linux/fsinfo.h
> @@ -0,0 +1,186 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* fsinfo() definitions.
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +#ifndef _UAPI_LINUX_FSINFO_H
> +#define _UAPI_LINUX_FSINFO_H
> +
> +#include <linux/types.h>
> +#include <linux/socket.h>
> +
> +/*
> + * The filesystem attributes that can be requested.  Note that some attributes
> + * may have multiple instances which can be switched in the parameter block.
> + */
> +#define FSINFO_ATTR_STATFS		0x00	/* statfs()-style state */
> +#define FSINFO_ATTR_IDS			0x01	/* Filesystem IDs */
> +#define FSINFO_ATTR_LIMITS		0x02	/* Filesystem limits */
> +#define FSINFO_ATTR_SUPPORTS		0x03	/* What's supported in statx, iocflags, ... */
> +#define FSINFO_ATTR_TIMESTAMP_INFO	0x04	/* Inode timestamp info */
> +#define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
> +#define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
> +#define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */

I think I've muttered about the distinction between volume id and
volume name before, but I'm still wondering how confusing that will be
for users?  Let me check my assumptions, though:

Volume ID is whatever's in super_block.s_id, which (at least for xfs and
ext4) is the device name (e.g. "sda1").  I guess that's useful for
correlating a thing you can call fsinfo() on against strings that were
logged in dmesg.

Volume name I think is the fs label (e.g. "home"), which I think will
have to be implemented separately by each filesystem, and that's why
there's no generic vfs implementation.

Do I have that correct?

> +
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
> +
> +/*
> + * Optional fsinfo() parameter structure.
> + *
> + * If this is not given, it is assumed that fsinfo_attr_statfs instance 0,0 is
> + * desired.
> + */
> +struct fsinfo_params {
> +	__u32	at_flags;	/* AT_SYMLINK_NOFOLLOW and similar flags */
> +	__u32	flags;		/* Flags controlling fsinfo() specifically */
> +#define FSINFO_FLAGS_QUERY_TYPE	0x0007 /* What object should fsinfo() query? */
> +#define FSINFO_FLAGS_QUERY_PATH	0x0000 /* - path, specified by dirfd,pathname,AT_EMPTY_PATH */
> +#define FSINFO_FLAGS_QUERY_FD	0x0001 /* - fd specified by dirfd */

The 7 -> 0 -> 1 sequence here confused me until I figured out that
QUERY_TYPE is the mask for QUERY_{PATH,FD}.

> +	__u32	request;	/* ID of requested attribute */
> +	__u32	Nth;		/* Instance of it (some may have multiple) */
> +	__u32	Mth;		/* Subinstance of Nth instance */
> +	__u32	__reserved32[1]; /* Reserved params; all must be 0 */
> +	__u64	__reserved[3];
> +};
> +
> +enum fsinfo_value_type {
> +	FSINFO_TYPE_VSTRUCT	= 0,	/* Version-lengthed struct (up to 4096 bytes) */
> +	FSINFO_TYPE_STRING	= 1,	/* NUL-term var-length string (up to 4095 chars) */
> +	FSINFO_TYPE_OPAQUE	= 2,	/* Opaque blob (unlimited size) */
> +	FSINFO_TYPE_LIST	= 3,	/* List of ints/structs (unlimited size) */
> +};
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO).
> + *
> + * This gives information about the attributes supported by fsinfo for the
> + * given path.
> + */
> +struct fsinfo_attribute_info {
> +	unsigned int		attr_id;	/* The ID of the attribute */
> +	enum fsinfo_value_type	type;		/* The type of the attribute's value(s) */
> +	unsigned int		flags;
> +#define FSINFO_FLAGS_N		0x01		/* - Attr has a set of values */
> +#define FSINFO_FLAGS_NM		0x02		/* - Attr has a set of sets of values */
> +	unsigned int		size;		/* - Value size (FSINFO_STRUCT) */
> +	unsigned int		element_size;	/* - Element size (FSINFO_LIST) */
> +};
> +
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO__STRUCT struct fsinfo_attribute_info
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTES__STRUCT __u32
> +
> +struct fsinfo_u128 {
> +#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
> +	__u64	hi;
> +	__u64	lo;
> +#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
> +	__u64	lo;
> +	__u64	hi;
> +#endif
> +};
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_STATFS).
> + * - This gives extended filesystem information.
> + */
> +struct fsinfo_statfs {
> +	struct fsinfo_u128 f_blocks;	/* Total number of blocks in fs */
> +	struct fsinfo_u128 f_bfree;	/* Total number of free blocks */
> +	struct fsinfo_u128 f_bavail;	/* Number of free blocks available to ordinary user */
> +	struct fsinfo_u128 f_files;	/* Total number of file nodes in fs */
> +	struct fsinfo_u128 f_ffree;	/* Number of free file nodes */
> +	struct fsinfo_u128 f_favail;	/* Number of file nodes available to ordinary user */
> +	__u64	f_bsize;		/* Optimal block size */
> +	__u64	f_frsize;		/* Fragment size */
> +};
> +
> +#define FSINFO_ATTR_STATFS__STRUCT struct fsinfo_statfs
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_IDS).
> + *
> + * List of basic identifiers as is normally found in statfs().
> + */
> +struct fsinfo_ids {
> +	char	f_fs_name[15 + 1];	/* Filesystem name */
> +	__u64	f_fsid;			/* Short 64-bit Filesystem ID (as statfs) */
> +	__u64	f_sb_id;		/* Internal superblock ID for sbnotify()/mntnotify() */
> +	__u32	f_fstype;		/* Filesystem type from linux/magic.h [uncond] */
> +	__u32	f_dev_major;		/* As st_dev_* from struct statx [uncond] */
> +	__u32	f_dev_minor;
> +	__u32	__reserved[1];
> +};
> +
> +#define FSINFO_ATTR_IDS__STRUCT struct fsinfo_ids
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_LIMITS).
> + *
> + * List of supported filesystem limits.
> + */
> +struct fsinfo_limits {
> +	struct fsinfo_u128 max_file_size;	/* Maximum file size */
> +	struct fsinfo_u128 max_ino;		/* Maximum inode number */
> +	__u64	max_uid;			/* Maximum UID supported */
> +	__u64	max_gid;			/* Maximum GID supported */
> +	__u64	max_projid;			/* Maximum project ID supported */
> +	__u64	max_hard_links;			/* Maximum number of hard links on a file */
> +	__u64	max_xattr_body_len;		/* Maximum xattr content length */
> +	__u32	max_xattr_name_len;		/* Maximum xattr name length */
> +	__u32	max_filename_len;		/* Maximum filename length */
> +	__u32	max_symlink_len;		/* Maximum symlink content length */
> +	__u32	max_dev_major;			/* Maximum device major representable */
> +	__u32	max_dev_minor;			/* Maximum device minor representable */
> +	__u32	__reserved[1];

I wonder if these structures ought to reserve more space than a single u32...

> +};
> +
> +#define FSINFO_ATTR_LIMITS__STRUCT struct fsinfo_limits
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_SUPPORTS).
> + *
> + * What's supported in various masks, such as statx() attribute and mask bits
> + * and IOC flags.
> + */
> +struct fsinfo_supports {
> +	__u64	stx_attributes;		/* What statx::stx_attributes are supported */
> +	__u32	stx_mask;		/* What statx::stx_mask bits are supported */
> +	__u32	ioc_flags;		/* What FS_IOC_* flags are supported */

"IOC"?  That just means 'ioctl'.  Is this field supposed to return the
supported FS_IOC_GETFLAGS flags, or the supported FS_IOC_FSGETXATTR
flags?

I suspect it would also be a big help to be able to tell userspace which
of the flags can be set, and which can be cleared.

> +	__u32	win_file_attrs;		/* What DOS/Windows FILE_* attributes are supported */
> +	__u32	__reserved[1];
> +};
> +
> +#define FSINFO_ATTR_SUPPORTS__STRUCT struct fsinfo_supports
> +
> +struct fsinfo_timestamp_one {
> +	__s64	minimum;	/* Minimum timestamp value in seconds */
> +	__u64	maximum;	/* Maximum timestamp value in seconds */

Given that time64_t is s64, why is the maximum here u64?

> +	__u16	gran_mantissa;	/* Granularity(secs) = mant * 10^exp */
> +	__s8	gran_exponent;
> +	__u8	reserved[5];
> +};
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_TIMESTAMP_INFO).
> + */
> +struct fsinfo_timestamp_info {
> +	struct fsinfo_timestamp_one	atime;	/* Access time */
> +	struct fsinfo_timestamp_one	mtime;	/* Modification time */
> +	struct fsinfo_timestamp_one	ctime;	/* Change time */
> +	struct fsinfo_timestamp_one	btime;	/* Birth/creation time */
> +};
> +
> +#define FSINFO_ATTR_TIMESTAMP_INFO__STRUCT struct fsinfo_timestamp_info
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_VOLUME_UUID).
> + */
> +struct fsinfo_volume_uuid {
> +	__u8	uuid[16];
> +};
> +
> +#define FSINFO_ATTR_VOLUME_UUID__STRUCT struct fsinfo_volume_uuid
> +
> +#endif /* _UAPI_LINUX_FSINFO_H */
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 3b69a560a7ac..58246e6b5603 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents);
>  COND_SYSCALL(io_uring_setup);
>  COND_SYSCALL(io_uring_enter);
>  COND_SYSCALL(io_uring_register);
> +COND_SYSCALL(fsinfo);
>  
>  /* fs/xattr.c */
>  
> diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
> index 65acdde5c117..9159ad1d7fc5 100644
> --- a/samples/vfs/Makefile
> +++ b/samples/vfs/Makefile
> @@ -1,10 +1,15 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  # List of programs to build
> +
>  hostprogs := \
> +	test-fsinfo \
>  	test-fsmount \
>  	test-statx
>  
>  always-y := $(hostprogs)
>  
> +HOSTCFLAGS_test-fsinfo.o += -I$(objtree)/usr/include
> +HOSTLDLIBS_test-fsinfo += -static -lm
> +
>  HOSTCFLAGS_test-fsmount.o += -I$(objtree)/usr/include
>  HOSTCFLAGS_test-statx.o += -I$(objtree)/usr/include
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> new file mode 100644
> index 000000000000..9a4d49db2996
> --- /dev/null
> +++ b/samples/vfs/test-fsinfo.c
> @@ -0,0 +1,599 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Test the fsinfo() system call
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#define _GNU_SOURCE
> +#define _ATFILE_SOURCE
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <ctype.h>
> +#include <errno.h>
> +#include <time.h>
> +#include <math.h>
> +#include <fcntl.h>
> +#include <sys/syscall.h>
> +#include <linux/fsinfo.h>
> +#include <linux/socket.h>
> +#include <sys/stat.h>
> +#include <arpa/inet.h>
> +
> +#ifndef __NR_fsinfo
> +#define __NR_fsinfo -1
> +#endif
> +
> +static bool debug = 0;
> +
> +static __attribute__((unused))
> +ssize_t fsinfo(int dfd, const char *filename, struct fsinfo_params *params,
> +	       void *buffer, size_t buf_size)
> +{
> +	return syscall(__NR_fsinfo, dfd, filename, params, buffer, buf_size);
> +}
> +
> +struct fsinfo_attribute {
> +	unsigned int		attr_id;
> +	enum fsinfo_value_type	type;
> +	unsigned int		size;
> +	const char		*name;
> +	void (*dump)(void *reply, unsigned int size);
> +};
> +
> +static const struct fsinfo_attribute fsinfo_attributes[];
> +
> +static void dump_hex(unsigned int *data, int from, int to)
> +{
> +	unsigned offset, print_offset = 1, col = 0;
> +
> +	from /= 4;
> +	to = (to + 3) / 4;
> +
> +	for (offset = from; offset < to; offset++) {
> +		if (print_offset) {
> +			printf("%04x: ", offset * 8);
> +			print_offset = 0;
> +		}
> +		printf("%08x", data[offset]);
> +		col++;
> +		if ((col & 3) == 0) {
> +			printf("\n");
> +			print_offset = 1;
> +		} else {
> +			printf(" ");
> +		}
> +	}
> +
> +	if (!print_offset)
> +		printf("\n");
> +}
> +
> +static void dump_attribute_info(void *reply, unsigned int size)
> +{
> +	struct fsinfo_attribute_info *attr_info = reply;
> +	const struct fsinfo_attribute *attr;
> +	char type[32];
> +
> +	switch (attr_info->type) {
> +	case FSINFO_TYPE_VSTRUCT:	strcpy(type, "V-STRUCT");	break;
> +	case FSINFO_TYPE_STRING:	strcpy(type, "STRING");		break;
> +	case FSINFO_TYPE_OPAQUE:	strcpy(type, "OPAQUE");		break;
> +	case FSINFO_TYPE_LIST:		strcpy(type, "LIST");		break;
> +	default:
> +		sprintf(type, "type-%x", attr_info->type);
> +		break;
> +	}
> +
> +	if (attr_info->flags & FSINFO_FLAGS_N)
> +		strcat(type, " x N");
> +	else if (attr_info->flags & FSINFO_FLAGS_NM)
> +		strcat(type, " x NM");
> +
> +	for (attr = fsinfo_attributes; attr->name; attr++)
> +		if (attr->attr_id == attr_info->attr_id)
> +			break;
> +
> +	printf("%8x %-12s %08x %5u %5u %s\n",
> +	       attr_info->attr_id,
> +	       type,
> +	       attr_info->flags,
> +	       attr_info->size,
> +	       attr_info->element_size,
> +	       attr->name ? attr->name : "");
> +}
> +
> +static void dump_fsinfo_generic_statfs(void *reply, unsigned int size)
> +{
> +	struct fsinfo_statfs *f = reply;
> +
> +	printf("\n");
> +	printf("\tblocks       : n=%llu fr=%llu av=%llu\n",
> +	       (unsigned long long)f->f_blocks.lo,
> +	       (unsigned long long)f->f_bfree.lo,
> +	       (unsigned long long)f->f_bavail.lo);
> +
> +	printf("\tfiles        : n=%llu fr=%llu av=%llu\n",
> +	       (unsigned long long)f->f_files.lo,
> +	       (unsigned long long)f->f_ffree.lo,
> +	       (unsigned long long)f->f_favail.lo);
> +	printf("\tbsize        : %llu\n", f->f_bsize);
> +	printf("\tfrsize       : %llu\n", f->f_frsize);
> +}
> +
> +static void dump_fsinfo_generic_ids(void *reply, unsigned int size)
> +{
> +	struct fsinfo_ids *f = reply;
> +
> +	printf("\n");
> +	printf("\tdev          : %02x:%02x\n", f->f_dev_major, f->f_dev_minor);
> +	printf("\tfs           : type=%x name=%s\n", f->f_fstype, f->f_fs_name);
> +	printf("\tfsid         : %llx\n", (unsigned long long)f->f_fsid);
> +}
> +
> +static void dump_fsinfo_generic_limits(void *reply, unsigned int size)
> +{
> +	struct fsinfo_limits *f = reply;
> +
> +	printf("\n");
> +	printf("\tmax file size: %llx%016llx\n",
> +	       (unsigned long long)f->max_file_size.hi,
> +	       (unsigned long long)f->max_file_size.lo);
> +	printf("\tmax ino      : %llx%016llx\n",
> +	       (unsigned long long)f->max_ino.hi,
> +	       (unsigned long long)f->max_ino.lo);
> +	printf("\tmax ids      : u=%llx g=%llx p=%llx\n",
> +	       (unsigned long long)f->max_uid,
> +	       (unsigned long long)f->max_gid,
> +	       (unsigned long long)f->max_projid);
> +	printf("\tmax dev      : maj=%x min=%x\n",
> +	       f->max_dev_major, f->max_dev_minor);
> +	printf("\tmax links    : %llx\n",
> +	       (unsigned long long)f->max_hard_links);
> +	printf("\tmax xattr    : n=%x b=%llx\n",
> +	       f->max_xattr_name_len,
> +	       (unsigned long long)f->max_xattr_body_len);
> +	printf("\tmax len      : file=%x sym=%x\n",
> +	       f->max_filename_len, f->max_symlink_len);
> +}
> +
> +static void dump_fsinfo_generic_supports(void *reply, unsigned int size)
> +{
> +	struct fsinfo_supports *f = reply;
> +
> +	printf("\n");
> +	printf("\tstx_attr     : %llx\n", (unsigned long long)f->stx_attributes);
> +	printf("\tstx_mask     : %x\n", f->stx_mask);
> +	printf("\tioc_flags    : %x\n", f->ioc_flags);
> +	printf("\twin_fattrs   : %x\n", f->win_file_attrs);
> +}
> +
> +static void print_time(struct fsinfo_timestamp_one *t, char stamp)
> +{
> +	printf("\t%ctime       : gran=%gs range=%llx-%llx\n",
> +	       stamp,
> +	       t->gran_mantissa * pow(10., t->gran_exponent),
> +	       (long long)t->minimum,
> +	       (long long)t->maximum);
> +}
> +
> +static void dump_fsinfo_generic_timestamp_info(void *reply, unsigned int size)
> +{
> +	struct fsinfo_timestamp_info *f = reply;
> +
> +	printf("\n");
> +	print_time(&f->atime, 'a');
> +	print_time(&f->mtime, 'm');
> +	print_time(&f->ctime, 'c');
> +	print_time(&f->btime, 'b');
> +}
> +
> +static void dump_fsinfo_generic_volume_uuid(void *reply, unsigned int size)
> +{
> +	struct fsinfo_volume_uuid *f = reply;
> +
> +	printf("%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x"
> +	       "-%02x%02x%02x%02x%02x%02x\n",
> +	       f->uuid[ 0], f->uuid[ 1],
> +	       f->uuid[ 2], f->uuid[ 3],
> +	       f->uuid[ 4], f->uuid[ 5],
> +	       f->uuid[ 6], f->uuid[ 7],
> +	       f->uuid[ 8], f->uuid[ 9],
> +	       f->uuid[10], f->uuid[11],
> +	       f->uuid[12], f->uuid[13],
> +	       f->uuid[14], f->uuid[15]);
> +}
> +
> +static void dump_string(void *reply, unsigned int size)
> +{
> +	char *s = reply, *p;
> +
> +	p = s;
> +	if (size >= 4096) {
> +		size = 4096;
> +		p[4092] = '.';
> +		p[4093] = '.';
> +		p[4094] = '.';
> +		p[4095] = 0;
> +	} else {
> +		p[size] = 0;
> +	}
> +
> +	for (p = s; *p; p++) {
> +		if (!isprint(*p)) {
> +			printf("<non-printable>\n");
> +			continue;
> +		}
> +	}
> +
> +	printf("%s\n", s);
> +}
> +
> +#define dump_fsinfo_generic_volume_id		dump_string
> +#define dump_fsinfo_generic_volume_name		dump_string
> +#define dump_fsinfo_generic_name_encoding	dump_string
> +
> +/*
> + *
> + */
> +#define __FSINFO(A, T, S, U, G, F) \
> +	{ .attr_id = A, .type = T, .size = S, .name = #G, .dump = dump_##G }
> +
> +#define _FSINFO(A, T, S, U, G)	  __FSINFO(A, T, S, U, G, 0)
> +#define _FSINFO_N(A, T, S, U, G)  __FSINFO(A, T, S, U, G, FSINFO_FLAGS_N)
> +#define _FSINFO_NM(A, T, S, U, G) __FSINFO(A, T, S, U, G, FSINFO_FLAGS_NM)
> +
> +#define _FSINFO_VSTRUCT(A,S,G)	 _FSINFO    (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
> +#define _FSINFO_VSTRUCT_N(A,S,G) _FSINFO_N  (A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
> +#define _FSINFO_VSTRUCT_NM(A,S,G) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), 0, G)
> +
> +#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G)
> +#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G)
> +#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G)
> +#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, 0, 0, G)
> +#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, 0, 0, G)
> +#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, 0, 0, G)
> +#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, 0, 0, G)
> +#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, 0, sizeof(A##__STRUCT), G)
> +#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, 0, sizeof(A##__STRUCT), G)
> +
> +static const struct fsinfo_attribute fsinfo_attributes[] = {
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
> +	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
> +	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	fsinfo_generic_volume_name),
> +	{}
> +};
> +
> +static void dump_value(unsigned int attr_id,
> +		       const struct fsinfo_attribute *attr,
> +		       const struct fsinfo_attribute_info *attr_info,
> +		       void *reply, unsigned int size)
> +{
> +	if (!attr || !attr->dump) {
> +		printf("<no dumper>\n");
> +		return;
> +	}
> +
> +	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
> +		printf("<short data %u/%u>\n", size, attr->size);
> +		return;
> +	}
> +
> +	attr->dump(reply, size);
> +}
> +
> +static void dump_list(unsigned int attr_id,
> +		      const struct fsinfo_attribute *attr,
> +		      const struct fsinfo_attribute_info *attr_info,
> +		      void *reply, unsigned int size)
> +{
> +	size_t elem_size = attr_info->element_size;
> +	unsigned int ix = 0;
> +
> +	printf("\n");
> +	if (!attr || !attr->dump) {
> +		printf("<no dumper>\n");
> +		return;
> +	}
> +
> +	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
> +		printf("<short data %u/%u>\n", size, attr->size);
> +		return;
> +	}
> +
> +	while (size >= elem_size) {
> +		printf("\t[%02x] ", ix);
> +		attr->dump(reply, size);
> +		reply += elem_size;
> +		size -= elem_size;
> +		ix++;
> +	}
> +}
> +
> +/*
> + * Call fsinfo, expanding the buffer as necessary.
> + */
> +static ssize_t get_fsinfo(const char *file, const char *name,
> +			  struct fsinfo_params *params, void **_r)
> +{
> +	ssize_t ret;
> +	size_t buf_size = 4096;
> +	void *r;
> +
> +	for (;;) {
> +		r = malloc(buf_size);
> +		if (!r) {
> +			perror("malloc");
> +			exit(1);
> +		}
> +		memset(r, 0xbd, buf_size);
> +
> +		errno = 0;
> +		ret = fsinfo(AT_FDCWD, file, params, r, buf_size);
> +		if (ret == -1) {
> +			free(r);
> +			*_r = NULL;
> +			return ret;
> +		}
> +
> +		if (ret <= buf_size)
> +			break;
> +		buf_size = (ret + 4096 - 1) & ~(4096 - 1);
> +	}
> +
> +	if (debug) {
> +		if (ret == -1)
> +			printf("fsinfo(%s,%s,%u,%u) = %m\n",
> +			       file, name, params->Nth, params->Mth);
> +		else
> +			printf("fsinfo(%s,%s,%u,%u) = %zd\n",
> +			       file, name, params->Nth, params->Mth, ret);
> +	}
> +
> +	*_r = r;
> +	return ret;
> +}
> +
> +/*
> + * Try one subinstance of an attribute.
> + */
> +static int try_one(const char *file, struct fsinfo_params *params,
> +		   const struct fsinfo_attribute_info *attr_info, bool raw)
> +{
> +	const struct fsinfo_attribute *attr;
> +	const char *name;
> +	size_t size = 4096;
> +	char namebuf[32];
> +	void *r;
> +
> +	//printf("try %03x[%u][%u]\n", params->request, params->Nth, params->Mth);

Stray debugging statement?

--D

> +
> +	for (attr = fsinfo_attributes; attr->name; attr++) {
> +		if (attr->attr_id == params->request) {
> +			name = attr->name;
> +			if (strncmp(name, "fsinfo_generic_", 15) == 0)
> +				name += 15;
> +			goto found;
> +		}
> +	}
> +
> +	sprintf(namebuf, "<unknown-%x>", params->request);
> +	name = namebuf;
> +	attr = NULL;
> +
> +found:
> +	size = get_fsinfo(file, name, params, &r);
> +
> +	if (size == -1) {
> +		if (errno == ENODATA) {
> +			if (!(attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) &&
> +			    params->Nth == 0 && params->Mth == 0) {
> +				fprintf(stderr,
> +					"Unexpected ENODATA (0x%x{%u}{%u})\n",
> +					params->request, params->Nth, params->Mth);
> +				exit(1);
> +			}
> +			free(r);
> +			return (params->Mth == 0) ? 2 : 1;
> +		}
> +		if (errno == EOPNOTSUPP) {
> +			if (params->Nth > 0 || params->Mth > 0) {
> +				fprintf(stderr,
> +					"Should return -ENODATA (0x%x{%u}{%u})\n",
> +					params->request, params->Nth, params->Mth);
> +				exit(1);
> +			}
> +			//printf("\e[33m%s\e[m: <not supported>\n",
> +			//       fsinfo_attr_names[attr]);
> +			free(r);
> +			return 2;
> +		}
> +		perror(file);
> +		exit(1);
> +	}
> +
> +	if (raw) {
> +		if (size > 4096)
> +			size = 4096;
> +		dump_hex(r, 0, size);
> +		free(r);
> +		return 0;
> +	}
> +
> +	switch (attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) {
> +	case 0:
> +		printf("\e[33m%s\e[m: ", name);
> +		break;
> +	case FSINFO_FLAGS_N:
> +		printf("\e[33m%s{%u}\e[m: ", name, params->Nth);
> +		break;
> +	case FSINFO_FLAGS_NM:
> +		printf("\e[33m%s{%u,%u}\e[m: ", name, params->Nth, params->Mth);
> +		break;
> +	}
> +
> +	switch (attr_info->type) {
> +	case FSINFO_TYPE_VSTRUCT:
> +	case FSINFO_TYPE_STRING:
> +		dump_value(params->request, attr, attr_info, r, size);
> +		free(r);
> +		return 0;
> +
> +	case FSINFO_TYPE_LIST:
> +		dump_list(params->request, attr, attr_info, r, size);
> +		free(r);
> +		return 0;
> +
> +	case FSINFO_TYPE_OPAQUE:
> +		free(r);
> +		return 0;
> +
> +	default:
> +		fprintf(stderr, "Fishy about %u 0x%x,%x,%x\n",
> +			params->request, attr_info->type, attr_info->flags, attr_info->size);
> +		exit(1);
> +	}
> +}
> +
> +static int cmp_u32(const void *a, const void *b)
> +{
> +	return *(const int *)a - *(const int *)b;
> +}
> +
> +/*
> + *
> + */
> +int main(int argc, char **argv)
> +{
> +	struct fsinfo_attribute_info attr_info;
> +	struct fsinfo_params params = {
> +		.at_flags	= AT_SYMLINK_NOFOLLOW,
> +		.flags		= FSINFO_FLAGS_QUERY_PATH,
> +	};
> +	unsigned int *attrs, ret, nr, i;
> +	bool meta = false;
> +	int raw = 0, opt, Nth, Mth;
> +
> +	while ((opt = getopt(argc, argv, "adlmr"))) {
> +		switch (opt) {
> +		case 'a':
> +			params.at_flags |= AT_NO_AUTOMOUNT;
> +			continue;
> +		case 'd':
> +			debug = true;
> +			continue;
> +		case 'l':
> +			params.at_flags &= ~AT_SYMLINK_NOFOLLOW;
> +			continue;
> +		case 'm':
> +			meta = true;
> +			continue;
> +		case 'r':
> +			raw = 1;
> +			continue;
> +		}
> +		break;
> +	}
> +
> +	argc -= optind;
> +	argv += optind;
> +
> +	if (argc != 1) {
> +		printf("Format: test-fsinfo [-alr] <file>\n");
> +		exit(2);
> +	}
> +
> +	/* Retrieve a list of supported attribute IDs */
> +	params.request = FSINFO_ATTR_FSINFO_ATTRIBUTES;
> +	params.Nth = 0;
> +	params.Mth = 0;
> +	ret = get_fsinfo(argv[0], "attributes", &params, (void **)&attrs);
> +	if (ret == -1) {
> +		fprintf(stderr, "Unable to get attribute list: %m\n");
> +		exit(1);
> +	}
> +
> +	if (ret % sizeof(attrs[0])) {
> +		fprintf(stderr, "Bad length of attribute list (0x%x)\n", ret);
> +		exit(2);
> +	}
> +
> +	nr = ret / sizeof(attrs[0]);
> +	qsort(attrs, nr, sizeof(attrs[0]), cmp_u32);
> +
> +	if (meta) {
> +		printf("ATTR ID  TYPE         FLAGS    SIZE  ESIZE NAME\n");
> +		printf("======== ============ ======== ===== ===== =========\n");
> +		for (i = 0; i < nr; i++) {
> +			params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
> +			params.Nth = attrs[i];
> +			params.Mth = 0;
> +			ret = fsinfo(AT_FDCWD, argv[0], &params, &attr_info, sizeof(attr_info));
> +			if (ret == -1) {
> +				fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
> +				exit(1);
> +			}
> +
> +			dump_attribute_info(&attr_info, ret);
> +		}
> +		exit(0);
> +	}
> +
> +	for (i = 0; i < nr; i++) {
> +		params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
> +		params.Nth = attrs[i];
> +		params.Mth = 0;
> +		ret = fsinfo(AT_FDCWD, argv[0], &params, &attr_info, sizeof(attr_info));
> +		if (ret == -1) {
> +			fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
> +			exit(1);
> +		}
> +
> +		if (attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
> +		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTES)
> +			continue;
> +
> +		Nth = 0;
> +		do {
> +			Mth = 0;
> +			do {
> +				params.request = attrs[i];
> +				params.Nth = Nth;
> +				params.Mth = Mth;
> +
> +				switch (try_one(argv[0], &params, &attr_info, raw)) {
> +				case 0:
> +					continue;
> +				case 1:
> +					goto done_M;
> +				case 2:
> +					goto done_N;
> +				}
> +			} while (++Mth < 100);
> +
> +		done_M:
> +			if (Mth >= 100) {
> +				fprintf(stderr, "Fishy: Mth %x[%u][%u]\n", attrs[i], Nth, Mth);
> +				break;
> +			}
> +
> +		} while (++Nth < 100);
> +
> +	done_N:
> +		if (Nth >= 100) {
> +			fprintf(stderr, "Fishy: Nth %x[%u]\n", attrs[i], Nth);
> +			break;
> +		}
> +	}
> +
> +	return 0;
> +}
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 03/19] fsinfo: Provide a bitmap of supported features [ver #16]
  2020-02-18 17:05 ` [PATCH 03/19] fsinfo: Provide a bitmap of supported features " David Howells
@ 2020-02-19 16:37   ` Darrick J. Wong
  2020-02-20 12:22   ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-19 16:37 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:05:20PM +0000, David Howells wrote:
> Provide a bitmap of features that a filesystem may provide for the path
> being queried.  Features include such things as:
> 
>  (1) The general class of filesystem, such as kernel-interface,
>      block-based, flash-based, network-based.
> 
>  (2) Supported inode features, such as which timestamps are supported,
>      whether simple numeric user, group or project IDs are supported and
>      whether user identification is actually more complex behind the
>      scenes.
> 
>  (3) Supported volume features, such as it having a UUID, a name or a
>      filesystem ID.
> 
>  (4) Supported filesystem features, such as what types of file are
>      supported, whether sparse files, extended attributes and quotas are
>      supported.
> 
>  (5) Supported interface features, such as whether locking and leases are
>      supported, what open flags are honoured and how i_version is managed.
> 
> For some filesystems, this may be an immutable set and can just be memcpy'd
> into the reply buffer.
> ---
> 
>  fs/fsinfo.c                 |   41 +++++++++++++++++++
>  include/linux/fsinfo.h      |   32 +++++++++++++++
>  include/uapi/linux/fsinfo.h |   78 ++++++++++++++++++++++++++++++++++++
>  samples/vfs/test-fsinfo.c   |   93 ++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 242 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> index 3bc35b91f20b..55710d6da327 100644
> --- a/fs/fsinfo.c
> +++ b/fs/fsinfo.c
> @@ -36,6 +36,17 @@ int fsinfo_string(const char *s, struct fsinfo_context *ctx)
>  }
>  EXPORT_SYMBOL(fsinfo_string);
>  
> +/*
> + * Get information about fsinfo() itself.
> + */
> +static int fsinfo_generic_fsinfo(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_fsinfo *info = ctx->buffer;
> +
> +	info->max_features = FSINFO_FEAT__NR;
> +	return sizeof(*info);
> +}
> +
>  /*
>   * Get basic filesystem stats from statfs.
>   */
> @@ -121,6 +132,33 @@ static int fsinfo_generic_supports(struct path *path, struct fsinfo_context *ctx
>  	return sizeof(*c);
>  }
>  
> +static int fsinfo_generic_features(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_features *ft = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	if (sb->s_mtd)
> +		fsinfo_set_feature(ft, FSINFO_FEAT_IS_FLASH_FS);
> +	else if (sb->s_bdev)
> +		fsinfo_set_feature(ft, FSINFO_FEAT_IS_BLOCK_FS);
> +
> +	if (sb->s_quota_types & QTYPE_MASK_USR)
> +		fsinfo_set_feature(ft, FSINFO_FEAT_USER_QUOTAS);
> +	if (sb->s_quota_types & QTYPE_MASK_GRP)
> +		fsinfo_set_feature(ft, FSINFO_FEAT_GROUP_QUOTAS);
> +	if (sb->s_quota_types & QTYPE_MASK_PRJ)
> +		fsinfo_set_feature(ft, FSINFO_FEAT_PROJECT_QUOTAS);
> +	if (sb->s_d_op && sb->s_d_op->d_automount)
> +		fsinfo_set_feature(ft, FSINFO_FEAT_AUTOMOUNTS);
> +	if (sb->s_id[0])
> +		fsinfo_set_feature(ft, FSINFO_FEAT_VOLUME_ID);
> +
> +	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_ATIME);
> +	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_CTIME);
> +	fsinfo_set_feature(ft, FSINFO_FEAT_HAS_MTIME);
> +	return sizeof(*ft);
> +}
> +
>  static const struct fsinfo_timestamp_info fsinfo_default_timestamp_info = {
>  	.atime = {
>  		.minimum	= S64_MIN,
> @@ -252,6 +290,9 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
>  	FSINFO_STRING 	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
> +
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_FSINFO,		fsinfo_generic_fsinfo),
>  	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_attribute_info),
>  	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_attributes),
>  	{}
> diff --git a/include/linux/fsinfo.h b/include/linux/fsinfo.h
> index dcd55dbb02fa..564765c70659 100644
> --- a/include/linux/fsinfo.h
> +++ b/include/linux/fsinfo.h
> @@ -65,6 +65,38 @@ struct fsinfo_attribute {
>  
>  extern int fsinfo_string(const char *, struct fsinfo_context *);
>  
> +static inline void fsinfo_set_feature(struct fsinfo_features *f,
> +				      enum fsinfo_feature feature)
> +{
> +	f->features[feature / 8] |= 1 << (feature % 8);
> +}
> +
> +static inline void fsinfo_clear_feature(struct fsinfo_features *f,
> +					enum fsinfo_feature feature)
> +{
> +	f->features[feature / 8] &= ~(1 << (feature % 8));
> +}
> +
> +/**
> + * fsinfo_set_unix_features - Set standard UNIX features.
> + * @f: The features mask to alter
> + */
> +static inline void fsinfo_set_unix_features(struct fsinfo_features *f)
> +{
> +	fsinfo_set_feature(f, FSINFO_FEAT_UIDS);
> +	fsinfo_set_feature(f, FSINFO_FEAT_GIDS);
> +	fsinfo_set_feature(f, FSINFO_FEAT_DIRECTORIES);
> +	fsinfo_set_feature(f, FSINFO_FEAT_SYMLINKS);
> +	fsinfo_set_feature(f, FSINFO_FEAT_HARD_LINKS);
> +	fsinfo_set_feature(f, FSINFO_FEAT_DEVICE_FILES);
> +	fsinfo_set_feature(f, FSINFO_FEAT_UNIX_SPECIALS);
> +	fsinfo_set_feature(f, FSINFO_FEAT_SPARSE);
> +	fsinfo_set_feature(f, FSINFO_FEAT_HAS_ATIME);
> +	fsinfo_set_feature(f, FSINFO_FEAT_HAS_CTIME);
> +	fsinfo_set_feature(f, FSINFO_FEAT_HAS_MTIME);
> +	fsinfo_set_feature(f, FSINFO_FEAT_HAS_INODE_NUMBERS);
> +}
> +
>  #endif /* CONFIG_FSINFO */
>  
>  #endif /* _LINUX_FSINFO_H */
> diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
> index 365d54fe9290..f40b5c0b5516 100644
> --- a/include/uapi/linux/fsinfo.h
> +++ b/include/uapi/linux/fsinfo.h
> @@ -22,9 +22,11 @@
>  #define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
>  #define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
>  #define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
> +#define FSINFO_ATTR_FEATURES		0x08	/* Filesystem features (bits) */
>  
>  #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
>  #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
> +#define FSINFO_ATTR_FSINFO		0x102	/* Information about fsinfo() as a whole */
>  
>  /*
>   * Optional fsinfo() parameter structure.
> @@ -45,6 +47,17 @@ struct fsinfo_params {
>  	__u64	__reserved[3];
>  };
>  
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_FSINFO).
> + *
> + * This gives information about fsinfo() itself.
> + */
> +struct fsinfo_fsinfo {
> +	__u32	max_features;	/* Number of supported features (fsinfo_features__nr) */
> +};
> +
> +#define FSINFO_ATTR_FSINFO__STRUCT struct fsinfo_fsinfo
> +
>  enum fsinfo_value_type {
>  	FSINFO_TYPE_VSTRUCT	= 0,	/* Version-lengthed struct (up to 4096 bytes) */
>  	FSINFO_TYPE_STRING	= 1,	/* NUL-term var-length string (up to 4095 chars) */
> @@ -154,6 +167,71 @@ struct fsinfo_supports {
>  
>  #define FSINFO_ATTR_SUPPORTS__STRUCT struct fsinfo_supports
>  
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_FEATURES).
> + *
> + * Bitmask indicating filesystem features where renderable as single bits.
> + */
> +enum fsinfo_feature {
> +	FSINFO_FEAT_IS_KERNEL_FS	= 0,	/* fs is kernel-special filesystem */
> +	FSINFO_FEAT_IS_BLOCK_FS		= 1,	/* fs is block-based filesystem */
> +	FSINFO_FEAT_IS_FLASH_FS		= 2,	/* fs is flash filesystem */
> +	FSINFO_FEAT_IS_NETWORK_FS	= 3,	/* fs is network filesystem */
> +	FSINFO_FEAT_IS_AUTOMOUNTER_FS	= 4,	/* fs is automounter special filesystem */
> +	FSINFO_FEAT_IS_MEMORY_FS	= 5,	/* fs is memory-based filesystem */
> +	FSINFO_FEAT_AUTOMOUNTS		= 6,	/* fs supports automounts */
> +	FSINFO_FEAT_ADV_LOCKS		= 7,	/* fs supports advisory file locking */
> +	FSINFO_FEAT_MAND_LOCKS		= 8,	/* fs supports mandatory file locking */
> +	FSINFO_FEAT_LEASES		= 9,	/* fs supports file leases */
> +	FSINFO_FEAT_UIDS		= 10,	/* fs supports numeric uids */
> +	FSINFO_FEAT_GIDS		= 11,	/* fs supports numeric gids */
> +	FSINFO_FEAT_PROJIDS		= 12,	/* fs supports numeric project ids */
> +	FSINFO_FEAT_STRING_USER_IDS	= 13,	/* fs supports string user identifiers */
> +	FSINFO_FEAT_GUID_USER_IDS	= 14,	/* fs supports GUID user identifiers */
> +	FSINFO_FEAT_WINDOWS_ATTRS	= 15,	/* fs has windows attributes */
> +	FSINFO_FEAT_USER_QUOTAS		= 16,	/* fs has per-user quotas */
> +	FSINFO_FEAT_GROUP_QUOTAS	= 17,	/* fs has per-group quotas */
> +	FSINFO_FEAT_PROJECT_QUOTAS	= 18,	/* fs has per-project quotas */
> +	FSINFO_FEAT_XATTRS		= 19,	/* fs has xattrs */
> +	FSINFO_FEAT_JOURNAL		= 20,	/* fs has a journal */
> +	FSINFO_FEAT_DATA_IS_JOURNALLED	= 21,	/* fs is using data journalling */
> +	FSINFO_FEAT_O_SYNC		= 22,	/* fs supports O_SYNC */
> +	FSINFO_FEAT_O_DIRECT		= 23,	/* fs supports O_DIRECT */
> +	FSINFO_FEAT_VOLUME_ID		= 24,	/* fs has a volume ID */
> +	FSINFO_FEAT_VOLUME_UUID		= 25,	/* fs has a volume UUID */
> +	FSINFO_FEAT_VOLUME_NAME		= 26,	/* fs has a volume name */
> +	FSINFO_FEAT_VOLUME_FSID		= 27,	/* fs has a volume FSID */
> +	FSINFO_FEAT_IVER_ALL_CHANGE	= 28,	/* i_version represents data + meta changes */
> +	FSINFO_FEAT_IVER_DATA_CHANGE	= 29,	/* i_version represents data changes only */
> +	FSINFO_FEAT_IVER_MONO_INCR	= 30,	/* i_version incremented monotonically */
> +	FSINFO_FEAT_DIRECTORIES		= 31,	/* fs supports (sub)directories */
> +	FSINFO_FEAT_SYMLINKS		= 32,	/* fs supports symlinks */
> +	FSINFO_FEAT_HARD_LINKS		= 33,	/* fs supports hard links */
> +	FSINFO_FEAT_HARD_LINKS_1DIR	= 34,	/* fs supports hard links in same dir only */
> +	FSINFO_FEAT_DEVICE_FILES	= 35,	/* fs supports bdev, cdev */
> +	FSINFO_FEAT_UNIX_SPECIALS	= 36,	/* fs supports pipe, fifo, socket */
> +	FSINFO_FEAT_RESOURCE_FORKS	= 37,	/* fs supports resource forks/streams */
> +	FSINFO_FEAT_NAME_CASE_INDEP	= 38,	/* Filename case independence is mandatory */
> +	FSINFO_FEAT_NAME_NON_UTF8	= 39,	/* fs has non-utf8 names */
> +	FSINFO_FEAT_NAME_HAS_CODEPAGE	= 40,	/* fs has a filename codepage */
> +	FSINFO_FEAT_SPARSE		= 41,	/* fs supports sparse files */
> +	FSINFO_FEAT_NOT_PERSISTENT	= 42,	/* fs is not persistent */
> +	FSINFO_FEAT_NO_UNIX_MODE	= 43,	/* fs does not support unix mode bits */
> +	FSINFO_FEAT_HAS_ATIME		= 44,	/* fs supports access time */
> +	FSINFO_FEAT_HAS_BTIME		= 45,	/* fs supports birth/creation time */
> +	FSINFO_FEAT_HAS_CTIME		= 46,	/* fs supports change time */
> +	FSINFO_FEAT_HAS_MTIME		= 47,	/* fs supports modification time */
> +	FSINFO_FEAT_HAS_ACL		= 48,	/* fs supports ACLs of some sort */
> +	FSINFO_FEAT_HAS_INODE_NUMBERS	= 49,	/* fs has inode numbers */
> +	FSINFO_FEAT__NR
> +};
> +
> +struct fsinfo_features {
> +	__u8	features[(FSINFO_FEAT__NR + 7) / 8];

Hm...the structure size is pretty small (56 bytes) and will expand as we
add new _FEAT flags.  Is this ok because the fsinfo code will truncate
its response to userspace to whatever buffer size userspace tells it?

--D

> +};
> +
> +#define FSINFO_ATTR_FEATURES__STRUCT struct fsinfo_features
> +
>  struct fsinfo_timestamp_one {
>  	__s64	minimum;	/* Minimum timestamp value in seconds */
>  	__u64	maximum;	/* Maximum timestamp value in seconds */
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> index 9a4d49db2996..6fbf0ce099b2 100644
> --- a/samples/vfs/test-fsinfo.c
> +++ b/samples/vfs/test-fsinfo.c
> @@ -107,6 +107,13 @@ static void dump_attribute_info(void *reply, unsigned int size)
>  	       attr->name ? attr->name : "");
>  }
>  
> +static void dump_fsinfo_info(void *reply, unsigned int size)
> +{
> +	struct fsinfo_fsinfo *f = reply;
> +
> +	printf("Number of feature bits: %u\n", f->max_features);
> +}
> +
>  static void dump_fsinfo_generic_statfs(void *reply, unsigned int size)
>  {
>  	struct fsinfo_statfs *f = reply;
> @@ -172,6 +179,74 @@ static void dump_fsinfo_generic_supports(void *reply, unsigned int size)
>  	printf("\twin_fattrs   : %x\n", f->win_file_attrs);
>  }
>  
> +#define FSINFO_FEATURE_NAME(C) [FSINFO_FEAT_##C] = #C
> +static const char *fsinfo_feature_names[FSINFO_FEAT__NR] = {
> +	FSINFO_FEATURE_NAME(IS_KERNEL_FS),
> +	FSINFO_FEATURE_NAME(IS_BLOCK_FS),
> +	FSINFO_FEATURE_NAME(IS_FLASH_FS),
> +	FSINFO_FEATURE_NAME(IS_NETWORK_FS),
> +	FSINFO_FEATURE_NAME(IS_AUTOMOUNTER_FS),
> +	FSINFO_FEATURE_NAME(IS_MEMORY_FS),
> +	FSINFO_FEATURE_NAME(AUTOMOUNTS),
> +	FSINFO_FEATURE_NAME(ADV_LOCKS),
> +	FSINFO_FEATURE_NAME(MAND_LOCKS),
> +	FSINFO_FEATURE_NAME(LEASES),
> +	FSINFO_FEATURE_NAME(UIDS),
> +	FSINFO_FEATURE_NAME(GIDS),
> +	FSINFO_FEATURE_NAME(PROJIDS),
> +	FSINFO_FEATURE_NAME(STRING_USER_IDS),
> +	FSINFO_FEATURE_NAME(GUID_USER_IDS),
> +	FSINFO_FEATURE_NAME(WINDOWS_ATTRS),
> +	FSINFO_FEATURE_NAME(USER_QUOTAS),
> +	FSINFO_FEATURE_NAME(GROUP_QUOTAS),
> +	FSINFO_FEATURE_NAME(PROJECT_QUOTAS),
> +	FSINFO_FEATURE_NAME(XATTRS),
> +	FSINFO_FEATURE_NAME(JOURNAL),
> +	FSINFO_FEATURE_NAME(DATA_IS_JOURNALLED),
> +	FSINFO_FEATURE_NAME(O_SYNC),
> +	FSINFO_FEATURE_NAME(O_DIRECT),
> +	FSINFO_FEATURE_NAME(VOLUME_ID),
> +	FSINFO_FEATURE_NAME(VOLUME_UUID),
> +	FSINFO_FEATURE_NAME(VOLUME_NAME),
> +	FSINFO_FEATURE_NAME(VOLUME_FSID),
> +	FSINFO_FEATURE_NAME(IVER_ALL_CHANGE),
> +	FSINFO_FEATURE_NAME(IVER_DATA_CHANGE),
> +	FSINFO_FEATURE_NAME(IVER_MONO_INCR),
> +	FSINFO_FEATURE_NAME(DIRECTORIES),
> +	FSINFO_FEATURE_NAME(SYMLINKS),
> +	FSINFO_FEATURE_NAME(HARD_LINKS),
> +	FSINFO_FEATURE_NAME(HARD_LINKS_1DIR),
> +	FSINFO_FEATURE_NAME(DEVICE_FILES),
> +	FSINFO_FEATURE_NAME(UNIX_SPECIALS),
> +	FSINFO_FEATURE_NAME(RESOURCE_FORKS),
> +	FSINFO_FEATURE_NAME(NAME_CASE_INDEP),
> +	FSINFO_FEATURE_NAME(NAME_NON_UTF8),
> +	FSINFO_FEATURE_NAME(NAME_HAS_CODEPAGE),
> +	FSINFO_FEATURE_NAME(SPARSE),
> +	FSINFO_FEATURE_NAME(NOT_PERSISTENT),
> +	FSINFO_FEATURE_NAME(NO_UNIX_MODE),
> +	FSINFO_FEATURE_NAME(HAS_ATIME),
> +	FSINFO_FEATURE_NAME(HAS_BTIME),
> +	FSINFO_FEATURE_NAME(HAS_CTIME),
> +	FSINFO_FEATURE_NAME(HAS_MTIME),
> +	FSINFO_FEATURE_NAME(HAS_ACL),
> +	FSINFO_FEATURE_NAME(HAS_INODE_NUMBERS),
> +};
> +
> +static void dump_fsinfo_generic_features(void *reply, unsigned int size)
> +{
> +	struct fsinfo_features *f = reply;
> +	int i;
> +
> +	printf("\n\t");
> +	for (i = 0; i < sizeof(f->features); i++)
> +		printf("%02x", f->features[i]);
> +	printf("\n");
> +	for (i = 0; i < FSINFO_FEAT__NR; i++)
> +		if (f->features[i / 8] & (1 << (i % 8)))
> +			printf("\t- %s\n", fsinfo_feature_names[i]);
> +}
> +
>  static void print_time(struct fsinfo_timestamp_one *t, char stamp)
>  {
>  	printf("\t%ctime       : gran=%gs range=%llx-%llx\n",
> @@ -235,7 +310,6 @@ static void dump_string(void *reply, unsigned int size)
>  
>  #define dump_fsinfo_generic_volume_id		dump_string
>  #define dump_fsinfo_generic_volume_name		dump_string
> -#define dump_fsinfo_generic_name_encoding	dump_string
>  
>  /*
>   *
> @@ -266,6 +340,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
>  	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
> @@ -475,6 +550,7 @@ static int cmp_u32(const void *a, const void *b)
>  int main(int argc, char **argv)
>  {
>  	struct fsinfo_attribute_info attr_info;
> +	struct fsinfo_fsinfo fsinfo_info;
>  	struct fsinfo_params params = {
>  		.at_flags	= AT_SYMLINK_NOFOLLOW,
>  		.flags		= FSINFO_FLAGS_QUERY_PATH,
> @@ -531,6 +607,18 @@ int main(int argc, char **argv)
>  	qsort(attrs, nr, sizeof(attrs[0]), cmp_u32);
>  
>  	if (meta) {
> +		params.request = FSINFO_ATTR_FSINFO;
> +		params.Nth = 0;
> +		params.Mth = 0;
> +		ret = fsinfo(AT_FDCWD, argv[0], &params, &fsinfo_info, sizeof(fsinfo_info));
> +		if (ret == -1) {
> +			fprintf(stderr, "Unable to get fsinfo information: %m\n");
> +			exit(1);
> +		}
> +
> +		dump_fsinfo_info(&fsinfo_info, ret);
> +		printf("\n");
> +
>  		printf("ATTR ID  TYPE         FLAGS    SIZE  ESIZE NAME\n");
>  		printf("======== ============ ======== ===== ===== =========\n");
>  		for (i = 0; i < nr; i++) {
> @@ -558,7 +646,8 @@ int main(int argc, char **argv)
>  			exit(1);
>  		}
>  
> -		if (attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
> +		if (attrs[i] == FSINFO_ATTR_FSINFO ||
> +		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
>  		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTES)
>  			continue;
>  
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID [ver #16]
  2020-02-18 17:05 ` [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID " David Howells
@ 2020-02-19 16:53   ` Darrick J. Wong
  2020-02-20 12:45   ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-19 16:53 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:05:35PM +0000, David Howells wrote:
> Introduce an (effectively) non-repeating system-unique superblock ID that
> can be used to determine that two object are in the same superblock without
> risking reuse of the ID in the meantime (as is possible with device IDs).
> 
> The ID is time-based to make it harder to use it as a covert communications
> channel.
> 
> Also make it so that this ID can be fetched by the fsinfo() system call.
> The ID added so that superblock notification messages will also be able to
> be tagged with it.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/fsinfo.c               |    1 +
>  fs/super.c                |   24 ++++++++++++++++++++++++
>  include/linux/fs.h        |    3 +++
>  samples/vfs/test-fsinfo.c |    1 +
>  4 files changed, 29 insertions(+)
> 
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> index 55710d6da327..f8e85762fc47 100644
> --- a/fs/fsinfo.c
> +++ b/fs/fsinfo.c
> @@ -92,6 +92,7 @@ static int fsinfo_generic_ids(struct path *path, struct fsinfo_context *ctx)
>  	p->f_fstype	= sb->s_magic;
>  	p->f_dev_major	= MAJOR(sb->s_dev);
>  	p->f_dev_minor	= MINOR(sb->s_dev);
> +	p->f_sb_id	= sb->s_unique_id;

Ahah, this is what the f_sb_id field is for.  I noticed a few patches
ago that it was in a header file but was never set.

I'm losing track of which IDs do what...

* f_fsid is that old int[2] thing that we used for statfs.  It sucks but
  we can't remove it because it's been in statfs since the beginning of
  time.

* f_fs_name is a string coming from s_type, which is the name of the fs
  (e.g. "XFS")?

* f_fstype comes from s_magic, which (for XFS) is 0x58465342.

* f_sb_id is basically an incore u64 cookie that one can use with the
  mount events thing that comes later in this patchset?

* FSINFO_ATTR_VOLUME_ID comes from s_id, which tends to be the block
  device name (at least for local filesystems)

* FSINFO_ATTR_VOLUME_UUID comes from s_uuid, which some filesystems fill
  in at mount time.

* FSINFO_ATTR_VOLUME_NAME is ... left to individual filesystems to
  implement, and (AFAICT) can be the label that one uses for things
  like: "mount LABEL=foo /home" ?

Assuming I got all of that right, can we please capture what all of
these "IDs" mean in the documentation?

(Assuming I got all that right, the code looks ok.)

--D

>  
>  	memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
>  	strlcpy(p->f_fs_name, path->dentry->d_sb->s_type->name,
> diff --git a/fs/super.c b/fs/super.c
> index cd352530eca9..a63073e6127e 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -44,6 +44,8 @@ static int thaw_super_locked(struct super_block *sb);
>  
>  static LIST_HEAD(super_blocks);
>  static DEFINE_SPINLOCK(sb_lock);
> +static u64 sb_last_identifier;
> +static u64 sb_identifier_offset;
>  
>  static char *sb_writers_name[SB_FREEZE_LEVELS] = {
>  	"sb_writers",
> @@ -188,6 +190,27 @@ static void destroy_unused_super(struct super_block *s)
>  	destroy_super_work(&s->destroy_work);
>  }
>  
> +/*
> + * Generate a unique identifier for a superblock.
> + */
> +static void generate_super_id(struct super_block *s)
> +{
> +	u64 id = ktime_to_ns(ktime_get());
> +
> +	spin_lock(&sb_lock);
> +
> +	id += sb_identifier_offset;
> +	if (id <= sb_last_identifier) {
> +		id = sb_last_identifier + 1;
> +		sb_identifier_offset = sb_last_identifier - id;
> +	}
> +
> +	sb_last_identifier = id;
> +	spin_unlock(&sb_lock);
> +
> +	s->s_unique_id = id;
> +}
> +
>  /**
>   *	alloc_super	-	create new superblock
>   *	@type:	filesystem type superblock should belong to
> @@ -273,6 +296,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  		goto fail;
>  	if (list_lru_init_memcg(&s->s_inode_lru, &s->s_shrink))
>  		goto fail;
> +	generate_super_id(s);
>  	return s;
>  
>  fail:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f74a4ee36eb3..e5db22d536a3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1550,6 +1550,9 @@ struct super_block {
>  
>  	spinlock_t		s_inode_wblist_lock;
>  	struct list_head	s_inodes_wb;	/* writeback inodes */
> +
> +	/* Superblock event notifications */
> +	u64			s_unique_id;
>  } __randomize_layout;
>  
>  /* Helper functions so that in most cases filesystems will
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> index 6fbf0ce099b2..d6ec5713364f 100644
> --- a/samples/vfs/test-fsinfo.c
> +++ b/samples/vfs/test-fsinfo.c
> @@ -140,6 +140,7 @@ static void dump_fsinfo_generic_ids(void *reply, unsigned int size)
>  	printf("\tdev          : %02x:%02x\n", f->f_dev_major, f->f_dev_minor);
>  	printf("\tfs           : type=%x name=%s\n", f->f_fstype, f->f_fs_name);
>  	printf("\tfsid         : %llx\n", (unsigned long long)f->f_fsid);
> +	printf("\tsbid         : %llx\n", (unsigned long long)f->f_sb_id);
>  }
>  
>  static void dump_fsinfo_generic_limits(void *reply, unsigned int size)
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 18/19] ext4: Add example fsinfo information [ver #16]
  2020-02-18 17:07 ` [PATCH 18/19] ext4: Add example fsinfo information " David Howells
@ 2020-02-19 17:04   ` Darrick J. Wong
  2020-02-20  0:53   ` kbuild test robot
  2020-02-21 14:43   ` David Howells
  2 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-19 17:04 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:07:14PM +0000, David Howells wrote:
> Add the ability to list some ext4 volume timestamps as an example.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: linux-ext4@vger.kernel.org
> ---
> 
>  fs/ext4/Makefile            |    1 +
>  fs/ext4/ext4.h              |    9 +++++++++
>  fs/ext4/fsinfo.c            |   40 ++++++++++++++++++++++++++++++++++++++++
>  fs/ext4/super.c             |    1 +
>  include/uapi/linux/fsinfo.h |   16 ++++++++++++++++
>  samples/vfs/test-fsinfo.c   |   35 +++++++++++++++++++++++++++++++++++
>  6 files changed, 102 insertions(+)
>  create mode 100644 fs/ext4/fsinfo.c
> 
> diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
> index 4ccb3c9189d8..71d5b460c7c7 100644
> --- a/fs/ext4/Makefile
> +++ b/fs/ext4/Makefile
> @@ -16,3 +16,4 @@ ext4-$(CONFIG_EXT4_FS_SECURITY)		+= xattr_security.o
>  ext4-inode-test-objs			+= inode-test.o
>  obj-$(CONFIG_EXT4_KUNIT_TESTS)		+= ext4-inode-test.o
>  ext4-$(CONFIG_FS_VERITY)		+= verity.o
> +ext4-$(CONFIG_FSINFO)			+= fsinfo.o
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 9a2ee2428ecc..d81b04227da7 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -42,6 +42,7 @@
>  
>  #include <linux/fscrypt.h>
>  #include <linux/fsverity.h>
> +#include <linux/fsinfo.h>
>  
>  #include <linux/compiler.h>
>  
> @@ -3166,6 +3167,14 @@ extern const struct inode_operations ext4_file_inode_operations;
>  extern const struct file_operations ext4_file_operations;
>  extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
>  
> +/* fsinfo.c */
> +#ifdef CONFIG_FSINFO
> +struct fsinfo_attribute;
> +extern const struct fsinfo_attribute ext4_fsinfo_attributes[];
> +#else
> +#define ext4_fsinfo_attributes NULL
> +#endif
> +
>  /* inline.c */
>  extern int ext4_get_max_inline_size(struct inode *inode);
>  extern int ext4_find_inline_data_nolock(struct inode *inode);
> diff --git a/fs/ext4/fsinfo.c b/fs/ext4/fsinfo.c
> new file mode 100644
> index 000000000000..545424c410ff
> --- /dev/null
> +++ b/fs/ext4/fsinfo.c
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Filesystem information for ext4
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#include <linux/mount.h>
> +#include "ext4.h"
> +
> +static int ext4_fsinfo_get_volume_name(struct path *path, struct fsinfo_context *ctx)
> +{
> +	const struct ext4_sb_info *sbi = EXT4_SB(path->mnt->mnt_sb);
> +	const struct ext4_super_block *es = sbi->s_es;
> +
> +	memcpy(ctx->buffer, es->s_volume_name, sizeof(es->s_volume_name));

Shouldn't this be checking that ctx->buffer is large enough to hold
s_volume_name?

> +	return strlen(ctx->buffer);

s_volume_name is /not/ a null-terminated string if the label is 16
characters long.

> +}
> +
> +static int ext4_fsinfo_get_timestamps(struct path *path, struct fsinfo_context *ctx)
> +{
> +	const struct ext4_sb_info *sbi = EXT4_SB(path->mnt->mnt_sb);
> +	const struct ext4_super_block *es = sbi->s_es;
> +	struct fsinfo_ext4_timestamps *ts = ctx->buffer;
> +
> +#define Z(R,S) R = S | (((u64)S##_hi) << 32)
> +	Z(ts->mkfs_time,	es->s_mkfs_time);
> +	Z(ts->mount_time,	es->s_mtime);
> +	Z(ts->write_time,	es->s_wtime);
> +	Z(ts->last_check_time,	es->s_lastcheck);
> +	Z(ts->first_error_time,	es->s_first_error_time);
> +	Z(ts->last_error_time,	es->s_last_error_time);
> +	return sizeof(*ts);
> +}
> +
> +const struct fsinfo_attribute ext4_fsinfo_attributes[] = {
> +	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	ext4_fsinfo_get_volume_name),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_EXT4_TIMESTAMPS,	ext4_fsinfo_get_timestamps),
> +	{}
> +};
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 8434217549b3..e21c3d99747e 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1477,6 +1477,7 @@ static const struct super_operations ext4_sops = {
>  	.freeze_fs	= ext4_freeze,
>  	.unfreeze_fs	= ext4_unfreeze,
>  	.statfs		= ext4_statfs,
> +	.fsinfo_attributes = ext4_fsinfo_attributes,
>  	.remount_fs	= ext4_remount,
>  	.show_options	= ext4_show_options,
>  #ifdef CONFIG_QUOTA
> diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
> index 5467f88ca9b0..da9a6f48ec5b 100644
> --- a/include/uapi/linux/fsinfo.h
> +++ b/include/uapi/linux/fsinfo.h
> @@ -38,6 +38,8 @@
>  #define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of the Nth server (string) */
>  #define FSINFO_ATTR_AFS_SERVER_ADDRESSES 0x302	/* List of addresses of the Nth server */
>  
> +#define FSINFO_ATTR_EXT4_TIMESTAMPS	0x400	/* Ext4 superblock timestamps */

I guess each filesystem gets ... 256 different attrs, and the third
nibble determines the namespace?

--D

>  /*
>   * Optional fsinfo() parameter structure.
>   *
> @@ -323,4 +325,18 @@ struct fsinfo_afs_server_address {
>  
>  #define FSINFO_ATTR_AFS_SERVER_ADDRESSES__STRUCT struct fsinfo_afs_server_address
>  
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_EXT4_TIMESTAMPS).
> + */
> +struct fsinfo_ext4_timestamps {
> +	__u64		mkfs_time;
> +	__u64		mount_time;
> +	__u64		write_time;
> +	__u64		last_check_time;
> +	__u64		first_error_time;
> +	__u64		last_error_time;
> +};
> +
> +#define FSINFO_ATTR_EXT4_TIMESTAMPS__STRUCT struct fsinfo_ext4_timestamps
> +
>  #endif /* _UAPI_LINUX_FSINFO_H */
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> index fd425c08b00b..53251ee98d1c 100644
> --- a/samples/vfs/test-fsinfo.c
> +++ b/samples/vfs/test-fsinfo.c
> @@ -359,6 +359,40 @@ static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
>  	printf("family=%u\n", ss->ss_family);
>  }
>  
> +static char *dump_ext4_time(char *buffer, time_t tim)
> +{
> +	struct tm tm;
> +	int len;
> +
> +	if (tim == 0)
> +		return "-";
> +
> +	if (!localtime_r(&tim, &tm)) {
> +		perror("localtime_r");
> +		exit(1);
> +	}
> +	len = strftime(buffer, 100, "%F %T", &tm);
> +	if (len == 0) {
> +		perror("strftime");
> +		exit(1);
> +	}
> +	return buffer;
> +}
> +
> +static void dump_ext4_fsinfo_timestamps(void *reply, unsigned int size)
> +{
> +	struct fsinfo_ext4_timestamps *r = reply;
> +	char buffer[100];
> +
> +	printf("\n");
> +	printf("\tmkfs    : %s\n", dump_ext4_time(buffer, r->mkfs_time));
> +	printf("\tmount   : %s\n", dump_ext4_time(buffer, r->mount_time));
> +	printf("\twrite   : %s\n", dump_ext4_time(buffer, r->write_time));
> +	printf("\tfsck    : %s\n", dump_ext4_time(buffer, r->last_check_time));
> +	printf("\t1st-err : %s\n", dump_ext4_time(buffer, r->first_error_time));
> +	printf("\tlast-err: %s\n", dump_ext4_time(buffer, r->last_error_time));
> +}
> +
>  static void dump_string(void *reply, unsigned int size)
>  {
>  	char *s = reply, *p;
> @@ -433,6 +467,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
>  	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	afs_cell_name),
>  	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_server_name),
>  	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_EXT4_TIMESTAMPS,	ext4_fsinfo_timestamps),
>  	{}
>  };
>  
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
  2020-02-19 16:31   ` Darrick J. Wong
@ 2020-02-19 20:07   ` Jann Horn
  2020-02-20 10:34   ` David Howells
  2020-02-20 11:03   ` David Howells
  3 siblings, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-19 20:07 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Tue, Feb 18, 2020 at 6:05 PM David Howells <dhowells@redhat.com> wrote:
> Add a system call to allow filesystem information to be queried.  A request
> value can be given to indicate the desired attribute.  Support is provided
> for enumerating multi-value attributes.
[...]
> +static const struct fsinfo_attribute fsinfo_common_attributes[];
> +
> +/**
> + * fsinfo_string - Store a string as an fsinfo attribute value.
> + * @s: The string to store (may be NULL)
> + * @ctx: The parameter context
> + */
> +int fsinfo_string(const char *s, struct fsinfo_context *ctx)
> +{
> +       int ret = 0;
> +
> +       if (s) {
> +               ret = strlen(s);
> +               memcpy(ctx->buffer, s, ret);
> +       }
> +
> +       return ret;
> +}

Please add a check here to ensure that "ret" actually fits into the
buffer (and use WARN_ON() if you think the check should never fire).
Otherwise I think this is too fragile.

[...]
> +static int fsinfo_generic_ids(struct path *path, struct fsinfo_context *ctx)
> +{
> +       struct fsinfo_ids *p = ctx->buffer;
> +       struct super_block *sb;
> +       struct kstatfs buf;
> +       int ret;
> +
> +       ret = vfs_statfs(path, &buf);
> +       if (ret < 0 && ret != -ENOSYS)
> +               return ret;

What's going on here? If vfs_statfs() returns -ENOSYS, we just use the
(AFAICS uninitialized) buf.f_fsid anyway in the memcpy() below and
return it to userspace?

> +       sb = path->dentry->d_sb;
> +       p->f_fstype     = sb->s_magic;
> +       p->f_dev_major  = MAJOR(sb->s_dev);
> +       p->f_dev_minor  = MINOR(sb->s_dev);
> +
> +       memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
> +       strlcpy(p->f_fs_name, path->dentry->d_sb->s_type->name,
> +               sizeof(p->f_fs_name));
> +       return sizeof(*p);
> +}
[...]
> +static int fsinfo_attribute_info(struct path *path, struct fsinfo_context *ctx)
> +{
> +       const struct fsinfo_attribute *attr;
> +       struct fsinfo_attribute_info *info = ctx->buffer;
> +       struct dentry *dentry = path->dentry;
> +
> +       if (dentry->d_sb->s_op->fsinfo_attributes)
> +               for (attr = dentry->d_sb->s_op->fsinfo_attributes; attr->get; attr++)
> +                       if (attr->attr_id == ctx->Nth)
> +                               goto found;
> +       for (attr = fsinfo_common_attributes; attr->get; attr++)
> +               if (attr->attr_id == ctx->Nth)
> +                       goto found;
> +       return -ENODATA;
> +
> +found:
> +       info->attr_id           = attr->attr_id;
> +       info->type              = attr->type;
> +       info->flags             = attr->flags;
> +       info->size              = attr->size;
> +       info->element_size      = attr->element_size;
> +       return sizeof(*attr);

I think you meant sizeof(*info).

[...]
> +static int fsinfo_attributes(struct path *path, struct fsinfo_context *ctx)
> +{
> +       const struct fsinfo_attribute *attr;
> +       struct super_block *sb = path->dentry->d_sb;
> +
> +       if (sb->s_op->fsinfo_attributes)
> +               for (attr = sb->s_op->fsinfo_attributes; attr->get; attr++)
> +                       fsinfo_attributes_insert(ctx, attr);
> +       for (attr = fsinfo_common_attributes; attr->get; attr++)
> +               fsinfo_attributes_insert(ctx, attr);
> +       return ctx->usage;

It is kind of weird that you have to return the ctx->usage everywhere
even though the caller already has ctx...

> +}
> +
> +static const struct fsinfo_attribute fsinfo_common_attributes[] = {
> +       FSINFO_VSTRUCT  (FSINFO_ATTR_STATFS,            fsinfo_generic_statfs),
> +       FSINFO_VSTRUCT  (FSINFO_ATTR_IDS,               fsinfo_generic_ids),
> +       FSINFO_VSTRUCT  (FSINFO_ATTR_LIMITS,            fsinfo_generic_limits),
> +       FSINFO_VSTRUCT  (FSINFO_ATTR_SUPPORTS,          fsinfo_generic_supports),
> +       FSINFO_VSTRUCT  (FSINFO_ATTR_TIMESTAMP_INFO,    fsinfo_generic_timestamp_info),
> +       FSINFO_STRING   (FSINFO_ATTR_VOLUME_ID,         fsinfo_generic_volume_id),
> +       FSINFO_VSTRUCT  (FSINFO_ATTR_VOLUME_UUID,       fsinfo_generic_volume_uuid),
> +       FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_attribute_info),
> +       FSINFO_LIST     (FSINFO_ATTR_FSINFO_ATTRIBUTES, fsinfo_attributes),
> +       {}
> +};
> +
> +/*
> + * Retrieve large filesystem information, such as an opaque blob or array of
> + * struct elements where the value isn't limited to the size of a page.
> + */
> +static int vfs_fsinfo_large(struct path *path, struct fsinfo_context *ctx,
> +                           const struct fsinfo_attribute *attr)
> +{
> +       int ret;
> +
> +       while (!signal_pending(current)) {
> +               ctx->usage = 0;
> +               ret = attr->get(path, ctx);
> +               if (IS_ERR_VALUE((long)ret))
> +                       return ret; /* Error */
> +               if ((unsigned int)ret <= ctx->buf_size)
> +                       return ret; /* It fitted */
> +
> +               /* We need to resize the buffer */
> +               kvfree(ctx->buffer);
> +               ctx->buffer = NULL;
> +               ctx->buf_size = roundup(ret, PAGE_SIZE);
> +               if (ctx->buf_size > INT_MAX)
> +                       return -EMSGSIZE;
> +               ctx->buffer = kvmalloc(ctx->buf_size, GFP_KERNEL);

ctx->buffer is _almost_ always pre-zeroed (see vfs_do_fsinfo() below),
except if we have FSINFO_TYPE_OPAQUE or FSINFO_TYPE_LIST with a size
bigger than what the attribute's ->size field said? Is that
intentional?

> +               if (!ctx->buffer)
> +                       return -ENOMEM;
> +       }
> +
> +       return -ERESTARTSYS;
> +}
> +
> +static int vfs_do_fsinfo(struct path *path, struct fsinfo_context *ctx,
> +                        const struct fsinfo_attribute *attr)
> +{
> +       if (ctx->Nth != 0 && !(attr->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)))
> +               return -ENODATA;
> +       if (ctx->Mth != 0 && !(attr->flags & FSINFO_FLAGS_NM))
> +               return -ENODATA;
> +
> +       ctx->buf_size = attr->size;
> +       if (ctx->want_size_only && attr->type == FSINFO_TYPE_VSTRUCT)
> +               return attr->size;
> +
> +       ctx->buffer = kvzalloc(ctx->buf_size, GFP_KERNEL);
> +       if (!ctx->buffer)
> +               return -ENOMEM;
> +
> +       switch (attr->type) {
> +       case FSINFO_TYPE_VSTRUCT:
> +               ctx->clear_tail = true;
> +               /* Fall through */
> +       case FSINFO_TYPE_STRING:
> +               return attr->get(path, ctx);
> +
> +       case FSINFO_TYPE_OPAQUE:
> +       case FSINFO_TYPE_LIST:
> +               return vfs_fsinfo_large(path, ctx, attr);
> +
> +       default:
> +               return -ENOPKG;
> +       }
> +}
[...]
> +SYSCALL_DEFINE5(fsinfo,
> +               int, dfd, const char __user *, pathname,
> +               struct fsinfo_params __user *, params,
> +               void __user *, user_buffer, size_t, user_buf_size)
> +{
[...]
> +       if (ret < 0)
> +               goto error;
> +
> +       result_size = ret;
> +       if (result_size > user_buf_size)
> +               result_size = user_buf_size;

This is "result_size = min_t(size_t, ret, user_buf_size)".

[...]
> +/*
> + * A filesystem information attribute definition.
> + */
> +struct fsinfo_attribute {
> +       unsigned int            attr_id;        /* The ID of the attribute */
> +       enum fsinfo_value_type  type:8;         /* The type of the attribute's value(s) */
> +       unsigned int            flags:8;
> +       unsigned int            size:16;        /* - Value size (FSINFO_STRUCT) */
> +       unsigned int            element_size:16; /* - Element size (FSINFO_LIST) */
> +       int (*get)(struct path *path, struct fsinfo_context *params);
> +};

Why the bitfields? It doesn't look like that's going to help you much,
you'll just end up with 6 bytes of holes on x86-64:

$ cat fsinfo_attribute_layout.c
enum fsinfo_value_type {
  FSINFO_TYPE_VSTRUCT     = 0,    /* Version-lengthed struct (up to
4096 bytes) */
  FSINFO_TYPE_STRING      = 1,    /* NUL-term var-length string (up to
4095 chars) */
  FSINFO_TYPE_OPAQUE      = 2,    /* Opaque blob (unlimited size) */
  FSINFO_TYPE_LIST        = 3,    /* List of ints/structs (unlimited size) */
};

struct fsinfo_attribute {
  unsigned int            attr_id;        /* The ID of the attribute */
  enum fsinfo_value_type  type:8;         /* The type of the
attribute's value(s) */
  unsigned int            flags:8;
  unsigned int            size:16;        /* - Value size (FSINFO_STRUCT) */
  unsigned int            element_size:16; /* - Element size (FSINFO_LIST) */
  void *get;
};
void *blah(struct fsinfo_attribute *p) {
  return p->get;
}
$ gcc -c -o fsinfo_attribute_layout.o fsinfo_attribute_layout.c -ggdb
$ pahole -C fsinfo_attribute -E --hex fsinfo_attribute_layout.o
struct fsinfo_attribute {
        unsigned int               attr_id;          /*     0   0x4 */
        enum fsinfo_value_type type:8;               /*   0x4: 0 0x4 */
        unsigned int               flags:8;          /*   0x4:0x8 0x4 */
        unsigned int               size:16;          /*   0x4:0x10 0x4 */
        unsigned int               element_size:16;  /*   0x8: 0 0x4 */

        /* XXX 16 bits hole, try to pack */
        /* XXX 4 bytes hole, try to pack */

        void *                     get;              /*  0x10   0x8 */

        /* size: 24, cachelines: 1, members: 6 */
        /* sum members: 12, holes: 1, sum holes: 4 */
        /* sum bitfield members: 48 bits, bit holes: 1, sum bit holes:
16 bits */
        /* last cacheline: 24 bytes */
};

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/19] afs: Support fsinfo() [ver #16]
  2020-02-18 17:06 ` [PATCH 11/19] afs: Support fsinfo() " David Howells
@ 2020-02-19 21:01   ` Jann Horn
  2020-02-20 12:58   ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-19 21:01 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Tue, Feb 18, 2020 at 6:07 PM David Howells <dhowells@redhat.com> wrote:
> Add fsinfo support to the AFS filesystem.
[...]
>  static const struct super_operations afs_super_ops = {
>         .statfs         = afs_statfs,
> +#ifdef CONFIG_FSINFO
> +       .fsinfo_attributes = afs_fsinfo_attributes,
> +#endif
> +       .alloc_inode    = afs_alloc_inode,
> +       .drop_inode     = afs_drop_inode,
> +       .destroy_inode  = afs_destroy_inode,
> +       .free_inode     = afs_free_inode,
> +       .evict_inode    = afs_evict_inode,
> +       .show_devname   = afs_show_devname,
> +       .show_options   = afs_show_options,
> +};
> +
> +static const struct super_operations afs_dyn_super_ops = {
> +       .statfs         = afs_statfs,
> +#ifdef CONFIG_FSINFO
> +       .fsinfo_attributes = afs_dyn_fsinfo_attributes,
> +#endif
>         .alloc_inode    = afs_alloc_inode,
>         .drop_inode     = afs_drop_inode,
>         .destroy_inode  = afs_destroy_inode,
[...]
> @@ -432,9 +454,12 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
>         sb->s_blocksize_bits    = PAGE_SHIFT;
>         sb->s_maxbytes          = MAX_LFS_FILESIZE;
>         sb->s_magic             = AFS_FS_MAGIC;
> -       sb->s_op                = &afs_super_ops;
> -       if (!as->dyn_root)
> +       if (!as->dyn_root) {
> +               sb->s_op        = &afs_super_ops;
>                 sb->s_xattr     = afs_xattr_handlers;
> +       } else {
> +               sb->s_op        = &afs_dyn_super_ops;
> +       }

Ewww. So basically, having one static set of .fsinfo_attributes is not
sufficiently flexible for everyone, but instead of allowing the
filesystem to dynamically provide a list of supported attributes, you
just duplicate the super_operations? Seems to me like it'd be cleaner
to add a function pointer to the super_operations that can dynamically
fill out the supported fsinfo attributes.

It seems to me like the current API is going to be a dead end if you
ever want to have decent passthrough of these things for e.g. FUSE, or
overlayfs, or VirtFS?

>         ret = super_setup_bdi(sb);
>         if (ret)
>                 return ret;
> @@ -444,7 +469,7 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
>         if (as->dyn_root) {
>                 inode = afs_iget_pseudo_dir(sb, true);
>         } else {
> -               sprintf(sb->s_id, "%llu", as->volume->vid);
> +               sprintf(sb->s_id, "%llx", as->volume->vid);

(This is technically a (small) UAPI change for audit logging of AFS
filesystems, right? You may want to note that in the commit message.)

>                 afs_activate_volume(as->volume);
>                 iget_data.fid.vid       = as->volume->vid;
>                 iget_data.fid.vnode     = 1;
[...]
> +static int afs_fsinfo_get_supports(struct path *path, struct fsinfo_context *ctx)
> +{
> +       struct fsinfo_supports *sup = ctx->buffer;
> +
> +       sup = ctx->buffer;

Duplicate assignment to "sup".

> +       sup->stx_mask = (STATX_TYPE | STATX_MODE |
> +                        STATX_NLINK |
> +                        STATX_UID | STATX_GID |
> +                        STATX_MTIME | STATX_INO |
> +                        STATX_SIZE);
> +       sup->stx_attributes = STATX_ATTR_AUTOMOUNT;
> +       return sizeof(*sup);
> +}
[...]
> +static int afs_fsinfo_get_server_address(struct path *path, struct fsinfo_context *ctx)
> +{
> +       struct fsinfo_afs_server_address *addr = ctx->buffer;
> +       struct afs_server_list *slist;
> +       struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
> +       struct afs_addr_list *alist;
> +       struct afs_volume *volume = as->volume;
> +       struct afs_server *server;
> +       struct afs_net *net = afs_d2net(path->dentry);
> +       unsigned int i;
> +       int ret = -ENODATA;
> +
> +       read_lock(&volume->servers_lock);
> +       slist = afs_get_serverlist(volume->servers);
> +       read_unlock(&volume->servers_lock);
> +
> +       if (ctx->Nth >= slist->nr_servers)
> +               goto put_slist;
> +       server = slist->servers[ctx->Nth].server;
> +
> +       read_lock(&server->fs_lock);
> +       alist = afs_get_addrlist(rcu_access_pointer(server->addresses));

Documentation for rcu_access_pointer() says:

 * Return the value of the specified RCU-protected pointer, but omit the
 * lockdep checks for being in an RCU read-side critical section.  This is
 * useful when the value of this pointer is accessed, but the pointer is
 * not dereferenced, for example, when testing an RCU-protected pointer
 * against NULL.  Although rcu_access_pointer() may also be used in cases
 * where update-side locks prevent the value of the pointer from changing,
 * you should instead use rcu_dereference_protected() for this use case.
 *
 * It is also permissible to use rcu_access_pointer() when read-side
 * access to the pointer was removed at least one grace period ago, as
 * is the case in the context of the RCU callback that is freeing up
 * the data, or after a synchronize_rcu() returns.  This can be useful
 * when tearing down multi-linked structures after a grace period
 * has elapsed.

> +       read_unlock(&server->fs_lock);

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 13/19] vfs: Add a mount-notification facility [ver #16]
  2020-02-18 17:06 ` [PATCH 13/19] vfs: Add a mount-notification facility " David Howells
@ 2020-02-19 22:40   ` Jann Horn
  2020-02-19 22:55     ` Jann Horn
  2020-02-21 12:24   ` David Howells
  1 sibling, 1 reply; 67+ messages in thread
From: Jann Horn @ 2020-02-19 22:40 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Tue, Feb 18, 2020 at 6:07 PM David Howells <dhowells@redhat.com> wrote:
> Add a mount notification facility whereby notifications about changes in
> mount topology and configuration can be received.  Note that this only
> covers vfsmount topology changes and not superblock events.  A separate
> facility will be added for that.
[...]
> @@ -70,9 +71,13 @@ struct mount {
>         int mnt_id;                     /* mount identifier */
>         int mnt_group_id;               /* peer group identifier */
>         int mnt_expiry_mark;            /* true if marked for expiry */
> +       int mnt_nr_watchers;            /* The number of subtree watches tracking this */

You're never referencing this variable elsewhere in the patch, and it
also isn't gated by #ifdef.

>         struct hlist_head mnt_pins;
>         struct hlist_head mnt_stuck_children;
>         atomic_t mnt_change_counter;    /* Number of changed applied */
> +#ifdef CONFIG_MOUNT_NOTIFICATIONS
> +       struct watch_list *mnt_watchers; /* Watches on dentries within this mount */

Please document lifetime semantics. Something like "This pointer can't
change once it has been set to a non-NULL value".

> +#endif
>  } __randomize_layout;
>
>  #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
> @@ -155,18 +160,8 @@ static inline bool is_anon_ns(struct mnt_namespace *ns)
>         return ns->seq == 0;
>  }
>
> -/*
> - * Type of mount topology change notification.
> - */
> -enum mount_notification_subtype {
> -       NOTIFY_MOUNT_NEW_MOUNT  = 0, /* New mount added */
> -       NOTIFY_MOUNT_UNMOUNT    = 1, /* Mount removed manually */
> -       NOTIFY_MOUNT_EXPIRY     = 2, /* Automount expired */
> -       NOTIFY_MOUNT_READONLY   = 3, /* Mount R/O state changed */
> -       NOTIFY_MOUNT_SETATTR    = 4, /* Mount attributes changed */
> -       NOTIFY_MOUNT_MOVE_FROM  = 5, /* Mount moved from here */
> -       NOTIFY_MOUNT_MOVE_TO    = 6, /* Mount moved to here (compare op_id) */
> -};

Is there a reason why you introduce these in "vfs: Add mount change
counter", then in this patch move them elsewhere?

[...]
> @@ -174,4 +169,18 @@ static inline void notify_mount(struct mount *changed,
>                                 u32 info_flags)
>  {
>         atomic_inc(&changed->mnt_change_counter);
> +
> +#ifdef CONFIG_MOUNT_NOTIFICATIONS
> +       {
> +               struct mount_notification n = {
> +                       .watch.type     = WATCH_TYPE_MOUNT_NOTIFY,
> +                       .watch.subtype  = subtype,
> +                       .watch.info     = info_flags | watch_sizeof(n),
> +                       .triggered_on   = changed->mnt_id,
> +                       .changed_mount  = aux ? aux->mnt_id : 0,
> +               };
> +
> +               post_mount_notification(changed, &n);
> +       }
> +#endif
[...]
> +/*
> + * Post mount notifications to all watches going rootwards along the tree.
> + *
> + * Must be called with the mount_lock held.

Please put such constraints into lockdep assertions instead of
comments; that way, violations can actually be detected.

> + */
> +void post_mount_notification(struct mount *changed,
> +                            struct mount_notification *notify)
> +{
> +       const struct cred *cred = current_cred();
> +       struct path cursor;
> +       struct mount *mnt;
> +       unsigned seq;
> +
> +       seq = 0;
> +       rcu_read_lock();
> +restart:
> +       cursor.mnt = &changed->mnt;
> +       cursor.dentry = changed->mnt.mnt_root;
> +       mnt = real_mount(cursor.mnt);
> +       notify->watch.info &= ~NOTIFY_MOUNT_IN_SUBTREE;
> +
> +       read_seqbegin_or_lock(&rename_lock, &seq);
> +       for (;;) {
> +               if (mnt->mnt_watchers &&

unlocked test should use READ_ONCE() to document that the read value
can concurrently change

> +                   !hlist_empty(&mnt->mnt_watchers->watchers)) {
> +                       if (cursor.dentry->d_flags & DCACHE_MOUNT_WATCH)
> +                               post_watch_notification(mnt->mnt_watchers,
> +                                                       &notify->watch, cred,
> +                                                       (unsigned long)cursor.dentry);
> +               } else {
> +                       cursor.dentry = mnt->mnt.mnt_root;
> +               }
> +               notify->watch.info |= NOTIFY_MOUNT_IN_SUBTREE;
> +
> +               if (cursor.dentry == cursor.mnt->mnt_root ||
> +                   IS_ROOT(cursor.dentry)) {
> +                       struct mount *parent = READ_ONCE(mnt->mnt_parent);
> +
> +                       /* Escaped? */
> +                       if (cursor.dentry != cursor.mnt->mnt_root)
> +                               break;
> +
> +                       /* Global root? */
> +                       if (mnt == parent)
> +                               break;
> +
> +                       cursor.dentry = READ_ONCE(mnt->mnt_mountpoint);
> +                       mnt = parent;
> +                       cursor.mnt = &mnt->mnt;
> +               } else {
> +                       cursor.dentry = cursor.dentry->d_parent;
> +               }
> +       }
> +
> +       if (need_seqretry(&rename_lock, seq)) {
> +               seq = 1;
> +               goto restart;
> +       }
> +
> +       done_seqretry(&rename_lock, seq);
> +       rcu_read_unlock();
> +}
> +
> +static void release_mount_watch(struct watch *watch)
> +{
> +       struct dentry *dentry = (struct dentry *)(unsigned long)watch->id;
> +
> +       dput(dentry);
> +}
> +
> +/**
> + * sys_watch_mount - Watch for mount topology/attribute changes
> + * @dfd: Base directory to pathwalk from or fd referring to mount.
> + * @filename: Path to mount to place the watch upon
> + * @at_flags: Pathwalk control flags
> + * @watch_fd: The watch queue to send notifications to.
> + * @watch_id: The watch ID to be placed in the notification (-1 to remove watch)
> + */
> +SYSCALL_DEFINE5(watch_mount,
> +               int, dfd,
> +               const char __user *, filename,
> +               unsigned int, at_flags,
> +               int, watch_fd,
> +               int, watch_id)
> +{
> +       struct watch_queue *wqueue;
> +       struct watch_list *wlist = NULL;
> +       struct watch *watch = NULL;
> +       struct mount *m;
> +       struct path path;
> +       unsigned int lookup_flags =
> +               LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
> +       int ret;
> +
> +       if (watch_id < -1 || watch_id > 0xff)
> +               return -EINVAL;
> +       if ((at_flags & ~(AT_NO_AUTOMOUNT | AT_EMPTY_PATH)) != 0)
> +               return -EINVAL;
> +       if (at_flags & AT_NO_AUTOMOUNT)
> +               lookup_flags &= ~LOOKUP_AUTOMOUNT;
> +       if (at_flags & AT_EMPTY_PATH)
> +               lookup_flags |= LOOKUP_EMPTY;
> +
> +       ret = user_path_at(dfd, filename, lookup_flags, &path);
> +       if (ret)
> +               return ret;
> +
> +       ret = inode_permission(path.dentry->d_inode, MAY_EXEC);
> +       if (ret)
> +               goto err_path;
> +
> +       wqueue = get_watch_queue(watch_fd);
> +       if (IS_ERR(wqueue))
> +               goto err_path;
> +
> +       m = real_mount(path.mnt);
> +
> +       if (watch_id >= 0) {
> +               ret = -ENOMEM;
> +               if (!m->mnt_watchers) {

unlocked test should use READ_ONCE

> +                       wlist = kzalloc(sizeof(*wlist), GFP_KERNEL);
> +                       if (!wlist)
> +                               goto err_wqueue;
> +                       init_watch_list(wlist, release_mount_watch);
> +               }
> +
> +               watch = kzalloc(sizeof(*watch), GFP_KERNEL);
> +               if (!watch)
> +                       goto err_wlist;
> +
> +               init_watch(watch, wqueue);
> +               watch->id               = (unsigned long)path.dentry;
> +               watch->info_id          = (u32)watch_id << 24;
> +
> +               ret = security_watch_mount(watch, &path);
> +               if (ret < 0)
> +                       goto err_watch;
> +
> +               down_write(&m->mnt.mnt_sb->s_umount);
> +               if (!m->mnt_watchers) {
> +                       m->mnt_watchers = wlist;
> +                       wlist = NULL;
> +               }
> +
> +               ret = add_watch_to_object(watch, m->mnt_watchers);

If another thread concurrently runs close(watch_fd) at this point,
pipe_release -> put_pipe_info -> free_pipe_info -> watch_queue_clear
will run, correct? And then watch_queue_clear() will find the watch
that we've just created and call its ->release_watch() handler, which
causes dput() on path.dentry? At that point, we no longer hold any
reference to the dentry...

> +               if (ret == 0) {
> +                       spin_lock(&path.dentry->d_lock);
> +                       path.dentry->d_flags |= DCACHE_MOUNT_WATCH;
> +                       spin_unlock(&path.dentry->d_lock);
> +                       dget(path.dentry);

... but then here we call dget() on it?


In general, the following pattern indicates a bug unless a surrounding
lock provides the necessary protection:

ret = operation_that_hands_off_the_reference_on_success(ptr);
if (ret == SUCCESS) {
  increment_refcount(ptr);
}

and should be replaced with the following pattern:

increment_refcount(ptr);
ret = operation_that_hands_off_the_reference_on_success(ptr);
if (ret == FAILURE) {
  decrement_refcount(ptr);
}

> +                       watch = NULL;
> +               }
> +               up_write(&m->mnt.mnt_sb->s_umount);
> +       } else {
> +               ret = -EBADSLT;
> +               if (m->mnt_watchers) {
> +                       down_write(&m->mnt.mnt_sb->s_umount);
> +                       ret = remove_watch_from_object(m->mnt_watchers, wqueue,
> +                                                      (unsigned long)path.dentry,
> +                                                      false);

What exactly is the implication of only using the dentry as key here
(and not the mount)? Does this mean that if you watch install watches
on two bind mounts of the same underlying filesystem, the notification
mechanism gets confused?

> +                       up_write(&m->mnt.mnt_sb->s_umount);
> +               }
> +       }
> +
> +err_watch:
> +       kfree(watch);
> +err_wlist:
> +       kfree(wlist);
> +err_wqueue:
> +       put_watch_queue(wqueue);
> +err_path:
> +       path_put(&path);
> +       return ret;
> +}
[...]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 13/19] vfs: Add a mount-notification facility [ver #16]
  2020-02-19 22:40   ` Jann Horn
@ 2020-02-19 22:55     ` Jann Horn
  0 siblings, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-19 22:55 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Wed, Feb 19, 2020 at 11:40 PM Jann Horn <jannh@google.com> wrote:
> On Tue, Feb 18, 2020 at 6:07 PM David Howells <dhowells@redhat.com> wrote:
[...]
> > +                       watch = NULL;
> > +               }
> > +               up_write(&m->mnt.mnt_sb->s_umount);
> > +       } else {
> > +               ret = -EBADSLT;
> > +               if (m->mnt_watchers) {
> > +                       down_write(&m->mnt.mnt_sb->s_umount);
> > +                       ret = remove_watch_from_object(m->mnt_watchers, wqueue,
> > +                                                      (unsigned long)path.dentry,
> > +                                                      false);
>
> What exactly is the implication of only using the dentry as key here
> (and not the mount)? Does this mean that if you watch install watches
> on two bind mounts of the same underlying filesystem, the notification
> mechanism gets confused?

Ah, nevermind, I understand this one now... this operation only
removes watches on this mount with that dentry, and so together, that
means it effectively removes watches by the full path.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]
  2020-02-18 17:06 ` [PATCH 15/19] vfs: Add superblock " David Howells
@ 2020-02-19 23:08   ` Jann Horn
  2020-02-21 14:23   ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-19 23:08 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Tue, Feb 18, 2020 at 6:07 PM David Howells <dhowells@redhat.com> wrote:
> Add a superblock event notification facility whereby notifications about
> superblock events, such as I/O errors (EIO), quota limits being hit
> (EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
> process asynchronously.  Note that this does not cover vfsmount topology
> changes.  watch_mount() is used for that.
[...]
> @@ -354,6 +356,10 @@ void deactivate_locked_super(struct super_block *s)
>  {
>         struct file_system_type *fs = s->s_type;
>         if (atomic_dec_and_test(&s->s_active)) {
> +#ifdef CONFIG_SB_NOTIFICATIONS
> +               if (s->s_watchers)
> +                       remove_watch_list(s->s_watchers, s->s_unique_id);
> +#endif
>                 cleancache_invalidate_fs(s);
>                 unregister_shrinker(&s->s_shrink);
>                 fs->kill_sb(s);
[...]
> +/**
> + * sys_watch_sb - Watch for superblock events.
> + * @dfd: Base directory to pathwalk from or fd referring to superblock.
> + * @filename: Path to superblock to place the watch upon
> + * @at_flags: Pathwalk control flags
> + * @watch_fd: The watch queue to send notifications to.
> + * @watch_id: The watch ID to be placed in the notification (-1 to remove watch)
> + */
> +SYSCALL_DEFINE5(watch_sb,
> +               int, dfd,
> +               const char __user *, filename,
> +               unsigned int, at_flags,
> +               int, watch_fd,
> +               int, watch_id)
> +{
> +       struct watch_queue *wqueue;
> +       struct super_block *s;
> +       struct watch_list *wlist = NULL;
> +       struct watch *watch = NULL;
> +       struct path path;
> +       unsigned int lookup_flags =
> +               LOOKUP_DIRECTORY | LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
> +       int ret;
[...]
> +       wqueue = get_watch_queue(watch_fd);
> +       if (IS_ERR(wqueue))
> +               goto err_path;
> +
> +       s = path.dentry->d_sb;
> +       if (watch_id >= 0) {
> +               ret = -ENOMEM;
> +               if (!s->s_watchers) {

READ_ONCE() ?

> +                       wlist = kzalloc(sizeof(*wlist), GFP_KERNEL);
> +                       if (!wlist)
> +                               goto err_wqueue;
> +                       init_watch_list(wlist, NULL);
> +               }
> +
> +               watch = kzalloc(sizeof(*watch), GFP_KERNEL);
> +               if (!watch)
> +                       goto err_wlist;
> +
> +               init_watch(watch, wqueue);
> +               watch->id               = s->s_unique_id;
> +               watch->private          = s;
> +               watch->info_id          = (u32)watch_id << 24;
> +
> +               ret = security_watch_sb(watch, s);
> +               if (ret < 0)
> +                       goto err_watch;
> +
> +               down_write(&s->s_umount);
> +               ret = -EIO;
> +               if (atomic_read(&s->s_active)) {
> +                       if (!s->s_watchers) {
> +                               s->s_watchers = wlist;
> +                               wlist = NULL;
> +                       }
> +
> +                       ret = add_watch_to_object(watch, s->s_watchers);
> +                       if (ret == 0) {
> +                               spin_lock(&sb_lock);
> +                               s->s_count++;
> +                               spin_unlock(&sb_lock);

Where is the corresponding decrement of s->s_count? I'm guessing that
it should be in the ->release_watch() handler, except that there isn't
one...

> +                               watch = NULL;
> +                       }
> +               }
> +               up_write(&s->s_umount);
> +       } else {
> +               ret = -EBADSLT;
> +               if (READ_ONCE(s->s_watchers)) {

(Nit: I don't get why you do a lockless check here before taking the
lock - it'd be more straightforward to take the lock first, and it's
not like you want to optimize for the case where someone calls
sys_watch_sb() with invalid arguments...)

> +                       down_write(&s->s_umount);
> +                       ret = remove_watch_from_object(s->s_watchers, wqueue,
> +                                                      s->s_unique_id, false);
> +                       up_write(&s->s_umount);
> +               }
> +       }
> +
> +err_watch:
> +       kfree(watch);
> +err_wlist:
> +       kfree(wlist);
> +err_wqueue:
> +       put_watch_queue(wqueue);
> +err_path:
> +       path_put(&path);
> +       return ret;
> +}
> +#endif
[...]
> +/**
> + * notify_sb: Post simple superblock notification.
> + * @s: The superblock the notification is about.
> + * @subtype: The type of notification.
> + * @info: WATCH_INFO_FLAG_* flags to be set in the record.
> + */
> +static inline void notify_sb(struct super_block *s,
> +                            enum superblock_notification_type subtype,
> +                            u32 info)
> +{
> +#ifdef CONFIG_SB_NOTIFICATIONS
> +       if (unlikely(s->s_watchers)) {

READ_ONCE() ?

> +               struct superblock_notification n = {
> +                       .watch.type     = WATCH_TYPE_SB_NOTIFY,
> +                       .watch.subtype  = subtype,
> +                       .watch.info     = watch_sizeof(n) | info,
> +                       .sb_id          = s->s_unique_id,
> +               };
> +
> +               post_sb_notification(s, &n);
> +       }
> +
> +#endif
> +}
> +
> +/**
> + * notify_sb_error: Post superblock error notification.
> + * @s: The superblock the notification is about.
> + * @error: The error number to be recorded.
> + */
> +static inline int notify_sb_error(struct super_block *s, int error)
> +{
> +#ifdef CONFIG_SB_NOTIFICATIONS
> +       if (unlikely(s->s_watchers)) {

READ_ONCE() ?

> +               struct superblock_error_notification n = {
> +                       .s.watch.type   = WATCH_TYPE_SB_NOTIFY,
> +                       .s.watch.subtype = NOTIFY_SUPERBLOCK_ERROR,
> +                       .s.watch.info   = watch_sizeof(n),
> +                       .s.sb_id        = s->s_unique_id,
> +                       .error_number   = error,
> +                       .error_cookie   = 0,
> +               };
> +
> +               post_sb_notification(s, &n.s);
> +       }
> +#endif
> +       return error;
> +}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 18/19] ext4: Add example fsinfo information [ver #16]
  2020-02-18 17:07 ` [PATCH 18/19] ext4: Add example fsinfo information " David Howells
  2020-02-19 17:04   ` Darrick J. Wong
@ 2020-02-20  0:53   ` kbuild test robot
  2020-02-21 14:43   ` David Howells
  2 siblings, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2020-02-20  0:53 UTC (permalink / raw)
  To: David Howells
  Cc: kbuild-all, viro, dhowells, raven, mszeredi, christian,
	linux-api, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2167 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on next-20200219]
[cannot apply to tip/x86/asm nfs/linux-next ext4/dev linus/master v5.6-rc2 v5.6-rc1 v5.5 v5.6-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/David-Howells/VFS-Filesystem-information-and-notifications-ver-16/20200220-072538
base:    1d7f85df0f9c0456520ae86dc597bca87980d253
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=um SUBARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> fs/ext4/super.c:1480:3: error: 'const struct super_operations' has no member named 'fsinfo_attributes'
     .fsinfo_attributes = ext4_fsinfo_attributes,
      ^~~~~~~~~~~~~~~~~

vim +1480 fs/ext4/super.c

  1466	
  1467	static const struct super_operations ext4_sops = {
  1468		.alloc_inode	= ext4_alloc_inode,
  1469		.free_inode	= ext4_free_in_core_inode,
  1470		.destroy_inode	= ext4_destroy_inode,
  1471		.write_inode	= ext4_write_inode,
  1472		.dirty_inode	= ext4_dirty_inode,
  1473		.drop_inode	= ext4_drop_inode,
  1474		.evict_inode	= ext4_evict_inode,
  1475		.put_super	= ext4_put_super,
  1476		.sync_fs	= ext4_sync_fs,
  1477		.freeze_fs	= ext4_freeze,
  1478		.unfreeze_fs	= ext4_unfreeze,
  1479		.statfs		= ext4_statfs,
> 1480		.fsinfo_attributes = ext4_fsinfo_attributes,
  1481		.remount_fs	= ext4_remount,
  1482		.show_options	= ext4_show_options,
  1483	#ifdef CONFIG_QUOTA
  1484		.quota_read	= ext4_quota_read,
  1485		.quota_write	= ext4_quota_write,
  1486		.get_dquots	= ext4_get_dquots,
  1487	#endif
  1488		.bdev_try_to_free_page = bdev_try_to_free_page,
  1489	};
  1490	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8539 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 19/19] nfs: Add example filesystem information [ver #16]
  2020-02-18 17:07 ` [PATCH 19/19] nfs: Add example filesystem " David Howells
@ 2020-02-20  2:13   ` kbuild test robot
  2020-02-20  2:20   ` kbuild test robot
  1 sibling, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2020-02-20  2:13 UTC (permalink / raw)
  To: David Howells
  Cc: kbuild-all, viro, dhowells, raven, mszeredi, christian,
	linux-api, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1998 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on next-20200219]
[cannot apply to tip/x86/asm nfs/linux-next ext4/dev linus/master v5.6-rc2 v5.6-rc1 v5.5 v5.6-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/David-Howells/VFS-Filesystem-information-and-notifications-ver-16/20200220-072538
base:    1d7f85df0f9c0456520ae86dc597bca87980d253
config: sh-rsk7269_defconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> fs/nfs/super.c:79:3: error: 'const struct super_operations' has no member named 'fsinfo_attributes'
     .fsinfo_attributes = nfs_fsinfo_attributes,
      ^~~~~~~~~~~~~~~~~

vim +79 fs/nfs/super.c

    72	
    73	const struct super_operations nfs_sops = {
    74		.alloc_inode	= nfs_alloc_inode,
    75		.free_inode	= nfs_free_inode,
    76		.write_inode	= nfs_write_inode,
    77		.drop_inode	= nfs_drop_inode,
    78		.statfs		= nfs_statfs,
  > 79		.fsinfo_attributes = nfs_fsinfo_attributes,
    80		.evict_inode	= nfs_evict_inode,
    81		.umount_begin	= nfs_umount_begin,
    82		.show_options	= nfs_show_options,
    83		.show_devname	= nfs_show_devname,
    84		.show_path	= nfs_show_path,
    85		.show_stats	= nfs_show_stats,
    86	};
    87	EXPORT_SYMBOL_GPL(nfs_sops);
    88	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 11799 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 19/19] nfs: Add example filesystem information [ver #16]
  2020-02-18 17:07 ` [PATCH 19/19] nfs: Add example filesystem " David Howells
  2020-02-20  2:13   ` kbuild test robot
@ 2020-02-20  2:20   ` kbuild test robot
  1 sibling, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2020-02-20  2:20 UTC (permalink / raw)
  To: David Howells
  Cc: kbuild-all, viro, dhowells, raven, mszeredi, christian,
	linux-api, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1985 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on next-20200219]
[cannot apply to tip/x86/asm nfs/linux-next ext4/dev linus/master v5.6-rc2 v5.6-rc1 v5.5 v5.6-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/David-Howells/VFS-Filesystem-information-and-notifications-ver-16/20200220-072538
base:    1d7f85df0f9c0456520ae86dc597bca87980d253
config: mips-nlm_xlr_defconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> fs/nfs/nfs4super.c:29:3: error: 'const struct super_operations' has no member named 'fsinfo_attributes'
     .fsinfo_attributes = nfs_fsinfo_attributes,
      ^~~~~~~~~~~~~~~~~

vim +29 fs/nfs/nfs4super.c

    22	
    23	static const struct super_operations nfs4_sops = {
    24		.alloc_inode	= nfs_alloc_inode,
    25		.free_inode	= nfs_free_inode,
    26		.write_inode	= nfs4_write_inode,
    27		.drop_inode	= nfs_drop_inode,
    28		.statfs		= nfs_statfs,
  > 29		.fsinfo_attributes = nfs_fsinfo_attributes,
    30		.evict_inode	= nfs4_evict_inode,
    31		.umount_begin	= nfs_umount_begin,
    32		.show_options	= nfs_show_options,
    33		.show_devname	= nfs_show_devname,
    34		.show_path	= nfs_show_path,
    35		.show_stats	= nfs_show_stats,
    36	};
    37	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 19411 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-19 14:46 ` Christian Brauner
  2020-02-19 15:50   ` Darrick J. Wong
@ 2020-02-20  4:42   ` Ian Kent
  2020-02-20  9:09     ` Christian Brauner
  1 sibling, 1 reply; 67+ messages in thread
From: Ian Kent @ 2020-02-20  4:42 UTC (permalink / raw)
  To: Christian Brauner, David Howells
  Cc: viro, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Wed, 2020-02-19 at 15:46 +0100, Christian Brauner wrote:
> On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote:
> > Here are a set of patches that adds system calls, that (a) allow
> > information about the VFS, mount topology, superblock and files to
> > be
> > retrieved and (b) allow for notifications of mount topology
> > rearrangement
> > events, mount and superblock attribute changes and other superblock
> > events,
> > such as errors.
> > 
> > ============================
> > FILESYSTEM INFORMATION QUERY
> > ============================
> > 
> > The first system call, fsinfo(), allows information about the
> > filesystem at
> > a particular path point to be queried as a set of attributes, some
> > of which
> > may have more than one value.
> > 
> > Attribute values are of four basic types:
> > 
> >  (1) Version dependent-length structure (size defined by type).
> > 
> >  (2) Variable-length string (up to 4096, including NUL).
> > 
> >  (3) List of structures (up to INT_MAX size).
> > 
> >  (4) Opaque blob (up to INT_MAX size).
> 
> I mainly have an organizational question. :) This is a huge patchset
> with lots and lots of (good) features. Wouldn't it make sense to make
> the fsinfo() syscall a completely separate patchset from the
> watch_mount() and watch_sb() syscalls? It seems that they don't need
> to
> depend on each other at all. This would make reviewing this so much
> nicer and likely would mean that fsinfo() could proceed a little
> faster.

The remainder of the fsinfo() series would need to remain useful
if this was done.

For context I want work on improving handling of large mount
tables.

Ultimately I expect to solve a very long standing autofs problem
of using large direct mount maps without prohibitive performance
overhead (and there a lot of rather challenging autofs changes to
do for this too) and I believe the fsinfo() system call, and
related bits, is the way to do this.

But improving the handling of large mount tables for autofs
will have the side effect of improvements for other mount table
users, even in the early stages of this work.

For example I want to use this for mount table handling improvements
in libmount. Clearly that ultimately needs mount change notification
in the end but ...

There's a bunch of things that need to be done alone the way
to even get started.

One thing that's needed is the ability to call fsinfo() to get
information on a mount to avoid constant reading of the proc based
mount table, which happens a lot (since the mount info. needs
to be up to date) so systemd (and others) would see an improvement
with the fsinfo() system call alone able to be used in libmount.

But for the fsinfo() system call to be used for this the file
system specific mount options need to also be obtained when
using fsinfo(). That means the super block operation fsinfo uses
to provide this must be implemented for at least most file systems.

So separating out the notifications part, leaving whatever is needed
to still be able to do this, should be fine and the system call
would be immediately useful once the super operation is implemented
for the needed file systems.

Whether the implementation of the super operation should be done
as part of this series is another question but would certainly
be a challenge and make the series more complicated. But is needed
for the change to be useful in my case.

Ian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-20  4:42   ` Ian Kent
@ 2020-02-20  9:09     ` Christian Brauner
  2020-02-20 11:30       ` Ian Kent
  0 siblings, 1 reply; 67+ messages in thread
From: Christian Brauner @ 2020-02-20  9:09 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, viro, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

On Thu, Feb 20, 2020 at 12:42:15PM +0800, Ian Kent wrote:
> On Wed, 2020-02-19 at 15:46 +0100, Christian Brauner wrote:
> > On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote:
> > > Here are a set of patches that adds system calls, that (a) allow
> > > information about the VFS, mount topology, superblock and files to
> > > be
> > > retrieved and (b) allow for notifications of mount topology
> > > rearrangement
> > > events, mount and superblock attribute changes and other superblock
> > > events,
> > > such as errors.
> > > 
> > > ============================
> > > FILESYSTEM INFORMATION QUERY
> > > ============================
> > > 
> > > The first system call, fsinfo(), allows information about the
> > > filesystem at
> > > a particular path point to be queried as a set of attributes, some
> > > of which
> > > may have more than one value.
> > > 
> > > Attribute values are of four basic types:
> > > 
> > >  (1) Version dependent-length structure (size defined by type).
> > > 
> > >  (2) Variable-length string (up to 4096, including NUL).
> > > 
> > >  (3) List of structures (up to INT_MAX size).
> > > 
> > >  (4) Opaque blob (up to INT_MAX size).
> > 
> > I mainly have an organizational question. :) This is a huge patchset
> > with lots and lots of (good) features. Wouldn't it make sense to make
> > the fsinfo() syscall a completely separate patchset from the
> > watch_mount() and watch_sb() syscalls? It seems that they don't need
> > to
> > depend on each other at all. This would make reviewing this so much
> > nicer and likely would mean that fsinfo() could proceed a little
> > faster.
> 
> The remainder of the fsinfo() series would need to remain useful
> if this was done.
> 
> For context I want work on improving handling of large mount
> tables.

Yeah, I've talked to David about this; polling on a large mountinfo file
is not great, I agree.

> 
> Ultimately I expect to solve a very long standing autofs problem
> of using large direct mount maps without prohibitive performance
> overhead (and there a lot of rather challenging autofs changes to
> do for this too) and I believe the fsinfo() system call, and
> related bits, is the way to do this.
> 
> But improving the handling of large mount tables for autofs
> will have the side effect of improvements for other mount table
> users, even in the early stages of this work.
> 
> For example I want to use this for mount table handling improvements
> in libmount. Clearly that ultimately needs mount change notification
> in the end but ...
> 
> There's a bunch of things that need to be done alone the way
> to even get started.
> 
> One thing that's needed is the ability to call fsinfo() to get
> information on a mount to avoid constant reading of the proc based
> mount table, which happens a lot (since the mount info. needs
> to be up to date) so systemd (and others) would see an improvement
> with the fsinfo() system call alone able to be used in libmount.
> 
> But for the fsinfo() system call to be used for this the file
> system specific mount options need to also be obtained when
> using fsinfo(). That means the super block operation fsinfo uses
> to provide this must be implemented for at least most file systems.
> 
> So separating out the notifications part, leaving whatever is needed
> to still be able to do this, should be fine and the system call
> would be immediately useful once the super operation is implemented
> for the needed file systems.
> 
> Whether the implementation of the super operation should be done
> as part of this series is another question but would certainly
> be a challenge and make the series more complicated. But is needed
> for the change to be useful in my case.

I think what would might work - and what David had already brought up
briefly - is to either base the fsinfo branch on top of the mount
notificaiton branch or break the notification counters pieces into a
separate patch and base both mount notifications and fsinfo on top of
it.

Christian

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
  2020-02-19 16:31   ` Darrick J. Wong
  2020-02-19 20:07   ` Jann Horn
@ 2020-02-20 10:34   ` David Howells
  2020-02-20 15:48     ` Darrick J. Wong
  2020-02-20 11:03   ` David Howells
  3 siblings, 1 reply; 67+ messages in thread
From: David Howells @ 2020-02-20 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dhowells, viro, raven, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

Darrick J. Wong <darrick.wong@oracle.com> wrote:

> > +	p->f_blocks.hi	= 0;
> > +	p->f_blocks.lo	= buf.f_blocks;
> 
> Er... are there filesystems (besides that (xfs++)++ one) that require
> u128 counters?  I suspect that the Very Large Fields are for future
> expandability, but I also wonder about the whether it's worth the
> complexity of doing this, since the structures can always be
> version-revved later.

I'm making a relatively cheap allowance for future expansion.  Dave Chinner
has mentioned at one of the LSFs that 16EiB may be exceeded soon (though I
hate to think of fscking such a beastie).  I know that the YFS variant of AFS
supports 96-bit vnode numbers (which I translate to inode numbers).  What I'm
trying to avoid is the problem we have with stat/statfs where under some
circumstances we have to return an error (ERANGE?) because we can't represent
the number if someone asks for an older version of the struct.

Since the buffer is (meant to be) pre-cleared, the filesystem can just ignore
the high word if it's never going to set it.  In fact, fsinfo_generic_statfs
doesn't need to set them either.

> XFS inodes are u64 values...
> ...and the max symlink target length is 1k, not PAGE_SIZE...

Yeah, and AFS(YFS) has 96-bit inode numbers.  The filesystem's fsinfo table is
read first so that the filesystem can override this.

> ...so is the usage model here that XFS should call fsinfo_generic_limits
> to fill out the fsinfo_limits structure, modify the values in
> ctx->buffer as appropriate for XFS, and then return the structure size?

Actually, I should export some these so that you can do that.  I'll export
fsinfo_generic_{timestamp_info,supports,limits} for now.

> > +#define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
> > +#define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
> > +#define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
> 
> I think I've muttered about the distinction between volume id and
> volume name before, but I'm still wondering how confusing that will be
> for users?  Let me check my assumptions, though:
> 
> Volume ID is whatever's in super_block.s_id, which (at least for xfs and
> ext4) is the device name (e.g. "sda1").  I guess that's useful for
> correlating a thing you can call fsinfo() on against strings that were
> logged in dmesg.
>
> Volume name I think is the fs label (e.g. "home"), which I think will
> have to be implemented separately by each filesystem, and that's why
> there's no generic vfs implementation.

Yes.  For AFS, for example, this would be the name of the volume (which may be
changed), whereas the volume ID is the number in the protocol that actually
refers to the volume (which cannot be changed).

And, as you say, for blockdev mounts, the ID is the device name and the volume
name is filesystem specific.

> The 7 -> 0 -> 1 sequence here confused me until I figured out that
> QUERY_TYPE is the mask for QUERY_{PATH,FD}.

Changed to FSINFO_FLAGS_QUERY_MASK.

> > +struct fsinfo_limits {
> > +...
> > +	__u32	__reserved[1];
> 
> I wonder if these structures ought to reserve more space than a single u32...

No need.  Part of the way the interface is designed is that the version number
for a particular VSTRUCT-type attribute is also the length.  So a newer
version is also longer.  All the old fields must be retained and filled in.
New fields are tagged on the end.

If userspace asks for an older version than is supported, it gets a truncated
return.  If it asks for a newer version, the extra fields it is expecting are
all set to 0.  Either way, the length (and thus the version) the kernel
supports is returned - not the length copied.

The __reserved fields are there because they represent padding (the struct is
going to be aligned/padded according to __u64 in this case).  Ideally, I'd
mark the structs __packed, but this messes with the alignment and may make the
compiler do funny tricks to get out any field larger than a byte.

I've renamed them to __padding.

> > +struct fsinfo_supports {
> > +	__u64	stx_attributes;		/* What statx::stx_attributes are supported */
> > +	__u32	stx_mask;		/* What statx::stx_mask bits are supported */
> > +	__u32	ioc_flags;		/* What FS_IOC_* flags are supported */
> 
> "IOC"?  That just means 'ioctl'.  Is this field supposed to return the
> supported FS_IOC_GETFLAGS flags, or the supported FS_IOC_FSGETXATTR
> flags?

FS_IOC_[GS]ETFLAGS is what I meant.

> I suspect it would also be a big help to be able to tell userspace which
> of the flags can be set, and which can be cleared.

How about:

	__u32	fs_ioc_getflags;	/* What FS_IOC_GETFLAGS may return */
	__u32	fs_ioc_setflags_set;	/* What FS_IOC_SETFLAGS may set */
	__u32	fs_ioc_setflags_clear;	/* What FS_IOC_SETFLAGS may clear */

> > +struct fsinfo_timestamp_one {
> > +	__s64	minimum;	/* Minimum timestamp value in seconds */
> > +	__u64	maximum;	/* Maximum timestamp value in seconds */
> 
> Given that time64_t is s64, why is the maximum here u64?

Well, I assume it extremely unlikely that the maximum can be before 1970, so
there doesn't seem any need to allow the maximum to be negative.  Furthermore,
it would be feasible that you could encounter a filesystem with a u64
filesystem that doesn't support dates before 1970.

On the other hand, if Linux is still going when __s64 seconds from 1970 wraps,
it will be impressive, but I'm not sure we'll be around to see it...  Someone
will have to cast a resurrection spell on Arnd to fix that one.

However, since signed/unsigned comparisons may have issues, I can turn it into
an __s64 also if that is preferred.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
                     ` (2 preceding siblings ...)
  2020-02-20 10:34   ` David Howells
@ 2020-02-20 11:03   ` David Howells
  2020-02-20 14:54     ` Jann Horn
  3 siblings, 1 reply; 67+ messages in thread
From: David Howells @ 2020-02-20 11:03 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> > +int fsinfo_string(const char *s, struct fsinfo_context *ctx)
> ...
> Please add a check here to ensure that "ret" actually fits into the
> buffer (and use WARN_ON() if you think the check should never fire).
> Otherwise I think this is too fragile.

How about:

	int fsinfo_string(const char *s, struct fsinfo_context *ctx)
	{
		unsigned int len;
		char *p = ctx->buffer;
		int ret = 0;
		if (s) {
			len = strlen(s);
			if (len > ctx->buf_size - 1)
				len = ctx->buf_size;
			if (!ctx->want_size_only) {
				memcpy(p, s, len);
				p[len] = 0;
			}
			ret = len;
		}
		return ret;
	}

I've also added a check to eliminate the copy if userspace didn't actually
supply a buffer.

> > +       ret = vfs_statfs(path, &buf);
> > +       if (ret < 0 && ret != -ENOSYS)
> > +               return ret;
> ...
> > +       memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
> 
> What's going on here? If vfs_statfs() returns -ENOSYS, we just use the
> (AFAICS uninitialized) buf.f_fsid anyway in the memcpy() below and
> return it to userspace?

Good point.  I've made the access to the buffer contingent on ret==0.  If I
don't set it, it will just be left pre-cleared.

> > +       return sizeof(*attr);
> 
> I think you meant sizeof(*info).

Yes.  I've renamed the buffer point to "p" in all cases so that it's more
obvious.

> > +       return ctx->usage;
> 
> It is kind of weird that you have to return the ctx->usage everywhere
> even though the caller already has ctx...

At this point, it's only used and returned by fsinfo_attributes() and really
is only for the use of the attribute getter function.

I could, I suppose, return the amount of data in ctx->usage and then preset it
for VSTRUCT-type objects.  Unfortunately, I can't make the getter return void
since it might have to return an error.

> > +               ctx->buffer = kvmalloc(ctx->buf_size, GFP_KERNEL);
> 
> ctx->buffer is _almost_ always pre-zeroed (see vfs_do_fsinfo() below),
> except if we have FSINFO_TYPE_OPAQUE or FSINFO_TYPE_LIST with a size
> bigger than what the attribute's ->size field said? Is that
> intentional?

Fixed.

> > +struct fsinfo_attribute {
> > +       unsigned int            attr_id;        /* The ID of the attribute */
> > +       enum fsinfo_value_type  type:8;         /* The type of the attribute's value(s) */
> > +       unsigned int            flags:8;
> > +       unsigned int            size:16;        /* - Value size (FSINFO_STRUCT) */
> > +       unsigned int            element_size:16; /* - Element size (FSINFO_LIST) */
> > +       int (*get)(struct path *path, struct fsinfo_context *params);
> > +};
> 
> Why the bitfields? It doesn't look like that's going to help you much,
> you'll just end up with 6 bytes of holes on x86-64:

Expanding them to non-bitfields will require an extra 10 bytes, making the
struct 8 bytes bigger with 4 bytes of padding.  I can do that if you'd rather.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-20  9:09     ` Christian Brauner
@ 2020-02-20 11:30       ` Ian Kent
  0 siblings, 0 replies; 67+ messages in thread
From: Ian Kent @ 2020-02-20 11:30 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, viro, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

On Thu, 2020-02-20 at 10:09 +0100, Christian Brauner wrote:
> On Thu, Feb 20, 2020 at 12:42:15PM +0800, Ian Kent wrote:
> > On Wed, 2020-02-19 at 15:46 +0100, Christian Brauner wrote:
> > > On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote:
> > > > Here are a set of patches that adds system calls, that (a)
> > > > allow
> > > > information about the VFS, mount topology, superblock and files
> > > > to
> > > > be
> > > > retrieved and (b) allow for notifications of mount topology
> > > > rearrangement
> > > > events, mount and superblock attribute changes and other
> > > > superblock
> > > > events,
> > > > such as errors.
> > > > 
> > > > ============================
> > > > FILESYSTEM INFORMATION QUERY
> > > > ============================
> > > > 
> > > > The first system call, fsinfo(), allows information about the
> > > > filesystem at
> > > > a particular path point to be queried as a set of attributes,
> > > > some
> > > > of which
> > > > may have more than one value.
> > > > 
> > > > Attribute values are of four basic types:
> > > > 
> > > >  (1) Version dependent-length structure (size defined by type).
> > > > 
> > > >  (2) Variable-length string (up to 4096, including NUL).
> > > > 
> > > >  (3) List of structures (up to INT_MAX size).
> > > > 
> > > >  (4) Opaque blob (up to INT_MAX size).
> > > 
> > > I mainly have an organizational question. :) This is a huge
> > > patchset
> > > with lots and lots of (good) features. Wouldn't it make sense to
> > > make
> > > the fsinfo() syscall a completely separate patchset from the
> > > watch_mount() and watch_sb() syscalls? It seems that they don't
> > > need
> > > to
> > > depend on each other at all. This would make reviewing this so
> > > much
> > > nicer and likely would mean that fsinfo() could proceed a little
> > > faster.
> > 
> > The remainder of the fsinfo() series would need to remain useful
> > if this was done.
> > 
> > For context I want work on improving handling of large mount
> > tables.
> 
> Yeah, I've talked to David about this; polling on a large mountinfo
> file
> is not great, I agree.
> 
> > Ultimately I expect to solve a very long standing autofs problem
> > of using large direct mount maps without prohibitive performance
> > overhead (and there a lot of rather challenging autofs changes to
> > do for this too) and I believe the fsinfo() system call, and
> > related bits, is the way to do this.
> > 
> > But improving the handling of large mount tables for autofs
> > will have the side effect of improvements for other mount table
> > users, even in the early stages of this work.
> > 
> > For example I want to use this for mount table handling
> > improvements
> > in libmount. Clearly that ultimately needs mount change
> > notification
> > in the end but ...
> > 
> > There's a bunch of things that need to be done alone the way
> > to even get started.
> > 
> > One thing that's needed is the ability to call fsinfo() to get
> > information on a mount to avoid constant reading of the proc based
> > mount table, which happens a lot (since the mount info. needs
> > to be up to date) so systemd (and others) would see an improvement
> > with the fsinfo() system call alone able to be used in libmount.
> > 
> > But for the fsinfo() system call to be used for this the file
> > system specific mount options need to also be obtained when
> > using fsinfo(). That means the super block operation fsinfo uses
> > to provide this must be implemented for at least most file systems.
> > 
> > So separating out the notifications part, leaving whatever is
> > needed
> > to still be able to do this, should be fine and the system call
> > would be immediately useful once the super operation is implemented
> > for the needed file systems.
> > 
> > Whether the implementation of the super operation should be done
> > as part of this series is another question but would certainly
> > be a challenge and make the series more complicated. But is needed
> > for the change to be useful in my case.
> 
> I think what would might work - and what David had already brought up
> briefly - is to either base the fsinfo branch on top of the mount
> notificaiton branch or break the notification counters pieces into a
> separate patch and base both mount notifications and fsinfo on top of
> it.

Possibly, but I'm pretty sure David has more fsinfo patches.

So I suspect there will be a right time to post patches for the
fsinfo super block operation that David doesn't already have. I'm
going to have to find time for that ...

The post was more to let David know what my first goal is and what
I need for it, and to let others know there is someone wanting to
use this for user space improvements and give some initial insight
into my longer term goals.

Ian


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 03/19] fsinfo: Provide a bitmap of supported features [ver #16]
  2020-02-18 17:05 ` [PATCH 03/19] fsinfo: Provide a bitmap of supported features " David Howells
  2020-02-19 16:37   ` Darrick J. Wong
@ 2020-02-20 12:22   ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-20 12:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dhowells, viro, raven, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

Darrick J. Wong <darrick.wong@oracle.com> wrote:

> > +struct fsinfo_features {
> > +	__u8	features[(FSINFO_FEAT__NR + 7) / 8];
> 
> Hm...the structure size is pretty small (56 bytes) and will expand as we
> add new _FEAT flags.  Is this ok because the fsinfo code will truncate
> its response to userspace to whatever buffer size userspace tells it?

Yes.  Also, you can ask fsinfo how many feature bits it supports.

I should put this in the struct first rather than putting it elsewhere.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID [ver #16]
  2020-02-18 17:05 ` [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID " David Howells
  2020-02-19 16:53   ` Darrick J. Wong
@ 2020-02-20 12:45   ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-20 12:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dhowells, viro, raven, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

Darrick J. Wong <darrick.wong@oracle.com> wrote:

> Ahah, this is what the f_sb_id field is for.  I noticed a few patches
> ago that it was in a header file but was never set.
> 
> I'm losing track of which IDs do what...
> 
> * f_fsid is that old int[2] thing that we used for statfs.  It sucks but
>   we can't remove it because it's been in statfs since the beginning of
>   time.
> 
> * f_fs_name is a string coming from s_type, which is the name of the fs
>   (e.g. "XFS")?
> 
> * f_fstype comes from s_magic, which (for XFS) is 0x58465342.
> 
> * f_sb_id is basically an incore u64 cookie that one can use with the
>   mount events thing that comes later in this patchset?
> 
> * FSINFO_ATTR_VOLUME_ID comes from s_id, which tends to be the block
>   device name (at least for local filesystems)
> 
> * FSINFO_ATTR_VOLUME_UUID comes from s_uuid, which some filesystems fill
>   in at mount time.
> 
> * FSINFO_ATTR_VOLUME_NAME is ... left to individual filesystems to
>   implement, and (AFAICT) can be the label that one uses for things
>   like: "mount LABEL=foo /home" ?
> 
> Assuming I got all of that right, can we please capture what all of
> these "IDs" mean in the documentation?

Basically, yes.  Would it help if I:

 (1) Put the ID generation into its own patch, first.

 (2) Put the notification counter patches right after that.

 (3) Renamed the fields a bit, say:

	f_fsid		-> fsid
	f_fs_name	-> filesystem_name
	f_fstype	-> filesystem_magic
	f_sb_id		-> superblock_id
	f_dev_*		-> backing_dev_*

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/19] afs: Support fsinfo() [ver #16]
  2020-02-18 17:06 ` [PATCH 11/19] afs: Support fsinfo() " David Howells
  2020-02-19 21:01   ` Jann Horn
@ 2020-02-20 12:58   ` David Howells
  2020-02-20 14:58     ` Jann Horn
  2020-02-21 13:26     ` David Howells
  1 sibling, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-20 12:58 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> Ewww. So basically, having one static set of .fsinfo_attributes is not
> sufficiently flexible for everyone, but instead of allowing the
> filesystem to dynamically provide a list of supported attributes, you
> just duplicate the super_operations? Seems to me like it'd be cleaner
> to add a function pointer to the super_operations that can dynamically
> fill out the supported fsinfo attributes.
>
> It seems to me like the current API is going to be a dead end if you
> ever want to have decent passthrough of these things for e.g. FUSE, or
> overlayfs, or VirtFS?

Ummm...

Would it be sufficient to have a function that returns a list of attributes?
Or does it need to be able to call to vfs_do_fsinfo() if it supports an
attribute?

There are two things I want to be able to do:

 (1) Do the buffer wrangling in the core - which means the core needs to see
     the type of the attribute.  That's fine if, say, afs_fsinfo() can call
     vfs_do_fsinfo() with the definition for any attribute it wants to handle
     and, say, return -ENOPKG otherways so that the core can then fall back to
     its private list.

 (2) Be able to retrieve the list of attributes and/or query an attribute.
     Now, I can probably manage this even through the same interface.  If,
     say, seeing FSINFO_ATTR_FSINFO_ATTRIBUTES causes the handler to simply
     append on the IDs of its own supported attributes (a helper can be
     provided for that).

     If it sees FSINFO_ATR_FSINFO_ATTRIBUTE_INFO, it can just look to see if
     it has the attribute with the ID matching Nth and return that, else
     ENOPKG - again a helper could be provided.

Chaining through overlayfs gets tricky.  You end up with multiple contributory
filesystems with different properties - and any one of those layers could
perhaps be another overlay.  Overlayfs would probably needs to integrate the
info and derive the lowest common set.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-20 11:03   ` David Howells
@ 2020-02-20 14:54     ` Jann Horn
  2020-02-20 15:31       ` Darrick J. Wong
  0 siblings, 1 reply; 67+ messages in thread
From: Jann Horn @ 2020-02-20 14:54 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Thu, Feb 20, 2020 at 12:04 PM David Howells <dhowells@redhat.com> wrote:
> Jann Horn <jannh@google.com> wrote:
>
> > > +int fsinfo_string(const char *s, struct fsinfo_context *ctx)
> > ...
> > Please add a check here to ensure that "ret" actually fits into the
> > buffer (and use WARN_ON() if you think the check should never fire).
> > Otherwise I think this is too fragile.
>
> How about:
>
>         int fsinfo_string(const char *s, struct fsinfo_context *ctx)
>         {
>                 unsigned int len;
>                 char *p = ctx->buffer;
>                 int ret = 0;
>                 if (s) {
>                         len = strlen(s);
>                         if (len > ctx->buf_size - 1)
>                                 len = ctx->buf_size;
>                         if (!ctx->want_size_only) {
>                                 memcpy(p, s, len);
>                                 p[len] = 0;

I think this is off-by-one? If len was too big, it is set to
ctx->buf_size, so in that case this effectively becomes
`ctx->buffer[ctx->buf_size] = 0`, which is one byte out of bounds,
right?

Maybe use something like `len = min_t(size_t, strlen(s), ctx->buf_size-1)` ?

Looks good apart from that, I think.

>                         }
>                         ret = len;
>                 }
>                 return ret;
>         }
[...]
> > > +       return ctx->usage;
> >
> > It is kind of weird that you have to return the ctx->usage everywhere
> > even though the caller already has ctx...
>
> At this point, it's only used and returned by fsinfo_attributes() and really
> is only for the use of the attribute getter function.
>
> I could, I suppose, return the amount of data in ctx->usage and then preset it
> for VSTRUCT-type objects.  Unfortunately, I can't make the getter return void
> since it might have to return an error.

Yeah, then you'd be passing around the error separately from the
length... I don't know whether that'd make things better or worse.

[...]
> > > +struct fsinfo_attribute {
> > > +       unsigned int            attr_id;        /* The ID of the attribute */
> > > +       enum fsinfo_value_type  type:8;         /* The type of the attribute's value(s) */
> > > +       unsigned int            flags:8;
> > > +       unsigned int            size:16;        /* - Value size (FSINFO_STRUCT) */
> > > +       unsigned int            element_size:16; /* - Element size (FSINFO_LIST) */
> > > +       int (*get)(struct path *path, struct fsinfo_context *params);
> > > +};
> >
> > Why the bitfields? It doesn't look like that's going to help you much,
> > you'll just end up with 6 bytes of holes on x86-64:
>
> Expanding them to non-bitfields will require an extra 10 bytes, making the
> struct 8 bytes bigger with 4 bytes of padding.  I can do that if you'd rather.

Wouldn't this still have the same total size?

struct fsinfo_attribute {
  unsigned int attr_id;        /* 0x0-0x3 */
  enum fsinfo_value_type type; /* 0x4-0x7 */
  u8 flags;                    /* 0x8-0x8 */
  /* 1-byte hole */
  u16 size;                    /* 0xa-0xb */
  u16 element_size;            /* 0xc-0xd */
  /* 2-byte hole */
  int (*get)(...);             /* 0x10-0x18 */
};

But it's not like I really care about this detail all that much, feel
free to leave it as-is.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/19] afs: Support fsinfo() [ver #16]
  2020-02-20 12:58   ` David Howells
@ 2020-02-20 14:58     ` Jann Horn
  2020-02-21 13:26     ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-20 14:58 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Thu, Feb 20, 2020 at 1:59 PM David Howells <dhowells@redhat.com> wrote:
> Jann Horn <jannh@google.com> wrote:
>
> > Ewww. So basically, having one static set of .fsinfo_attributes is not
> > sufficiently flexible for everyone, but instead of allowing the
> > filesystem to dynamically provide a list of supported attributes, you
> > just duplicate the super_operations? Seems to me like it'd be cleaner
> > to add a function pointer to the super_operations that can dynamically
> > fill out the supported fsinfo attributes.
> >
> > It seems to me like the current API is going to be a dead end if you
> > ever want to have decent passthrough of these things for e.g. FUSE, or
> > overlayfs, or VirtFS?
>
> Ummm...
>
> Would it be sufficient to have a function that returns a list of attributes?
> Or does it need to be able to call to vfs_do_fsinfo() if it supports an
> attribute?
>
> There are two things I want to be able to do:
>
>  (1) Do the buffer wrangling in the core - which means the core needs to see
>      the type of the attribute.  That's fine if, say, afs_fsinfo() can call
>      vfs_do_fsinfo() with the definition for any attribute it wants to handle
>      and, say, return -ENOPKG otherways so that the core can then fall back to
>      its private list.
>
>  (2) Be able to retrieve the list of attributes and/or query an attribute.
>      Now, I can probably manage this even through the same interface.  If,
>      say, seeing FSINFO_ATTR_FSINFO_ATTRIBUTES causes the handler to simply
>      append on the IDs of its own supported attributes (a helper can be
>      provided for that).
>
>      If it sees FSINFO_ATR_FSINFO_ATTRIBUTE_INFO, it can just look to see if
>      it has the attribute with the ID matching Nth and return that, else
>      ENOPKG - again a helper could be provided.
>
> Chaining through overlayfs gets tricky.  You end up with multiple contributory
> filesystems with different properties - and any one of those layers could
> perhaps be another overlay.  Overlayfs would probably needs to integrate the
> info and derive the lowest common set.

Hm - I guess just returning a list of attributes ought to be fine?
Then AFS can just return one of its two statically-allocated attribute
lists there, and a filesystem with more complicated circumstances
(like FUSE or overlayfs or whatever) can compute a heap-allocated list
on mount that is freed when the superblock goes away, or something
like that?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-20 14:54     ` Jann Horn
@ 2020-02-20 15:31       ` Darrick J. Wong
  0 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-20 15:31 UTC (permalink / raw)
  To: Jann Horn
  Cc: David Howells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

On Thu, Feb 20, 2020 at 03:54:25PM +0100, Jann Horn wrote:
> On Thu, Feb 20, 2020 at 12:04 PM David Howells <dhowells@redhat.com> wrote:
> > Jann Horn <jannh@google.com> wrote:
> >
> > > > +int fsinfo_string(const char *s, struct fsinfo_context *ctx)
> > > ...
> > > Please add a check here to ensure that "ret" actually fits into the
> > > buffer (and use WARN_ON() if you think the check should never fire).
> > > Otherwise I think this is too fragile.
> >
> > How about:
> >
> >         int fsinfo_string(const char *s, struct fsinfo_context *ctx)
> >         {
> >                 unsigned int len;
> >                 char *p = ctx->buffer;
> >                 int ret = 0;
> >                 if (s) {
> >                         len = strlen(s);
> >                         if (len > ctx->buf_size - 1)
> >                                 len = ctx->buf_size;
> >                         if (!ctx->want_size_only) {
> >                                 memcpy(p, s, len);
> >                                 p[len] = 0;
> 
> I think this is off-by-one? If len was too big, it is set to
> ctx->buf_size, so in that case this effectively becomes
> `ctx->buffer[ctx->buf_size] = 0`, which is one byte out of bounds,
> right?
> 
> Maybe use something like `len = min_t(size_t, strlen(s), ctx->buf_size-1)` ?
> 
> Looks good apart from that, I think.
> 
> >                         }
> >                         ret = len;
> >                 }
> >                 return ret;
> >         }
> [...]
> > > > +       return ctx->usage;
> > >
> > > It is kind of weird that you have to return the ctx->usage everywhere
> > > even though the caller already has ctx...
> >
> > At this point, it's only used and returned by fsinfo_attributes() and really
> > is only for the use of the attribute getter function.
> >
> > I could, I suppose, return the amount of data in ctx->usage and then preset it
> > for VSTRUCT-type objects.  Unfortunately, I can't make the getter return void
> > since it might have to return an error.
> 
> Yeah, then you'd be passing around the error separately from the
> length... I don't know whether that'd make things better or worse.
> 
> [...]
> > > > +struct fsinfo_attribute {
> > > > +       unsigned int            attr_id;        /* The ID of the attribute */
> > > > +       enum fsinfo_value_type  type:8;         /* The type of the attribute's value(s) */
> > > > +       unsigned int            flags:8;
> > > > +       unsigned int            size:16;        /* - Value size (FSINFO_STRUCT) */
> > > > +       unsigned int            element_size:16; /* - Element size (FSINFO_LIST) */
> > > > +       int (*get)(struct path *path, struct fsinfo_context *params);
> > > > +};
> > >
> > > Why the bitfields? It doesn't look like that's going to help you much,
> > > you'll just end up with 6 bytes of holes on x86-64:
> >
> > Expanding them to non-bitfields will require an extra 10 bytes, making the
> > struct 8 bytes bigger with 4 bytes of padding.  I can do that if you'd rather.
> 
> Wouldn't this still have the same total size?
> 
> struct fsinfo_attribute {
>   unsigned int attr_id;        /* 0x0-0x3 */
>   enum fsinfo_value_type type; /* 0x4-0x7 */
>   u8 flags;                    /* 0x8-0x8 */
>   /* 1-byte hole */
>   u16 size;                    /* 0xa-0xb */
>   u16 element_size;            /* 0xc-0xd */
>   /* 2-byte hole */
>   int (*get)(...);             /* 0x10-0x18 */
> };
> 
> But it's not like I really care about this detail all that much, feel
> free to leave it as-is.

I was thinking, why not just have unsigned int flags from the start?
That replaces the padding holes with usable flag space, though I guess
this is in-core only so I'm not that passionate.  I doubt we're going to
have millions of fsinfo attributes. :)

--D

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information [ver #16]
  2020-02-20 10:34   ` David Howells
@ 2020-02-20 15:48     ` Darrick J. Wong
  0 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-20 15:48 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Thu, Feb 20, 2020 at 10:34:35AM +0000, David Howells wrote:
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > > +	p->f_blocks.hi	= 0;
> > > +	p->f_blocks.lo	= buf.f_blocks;
> > 
> > Er... are there filesystems (besides that (xfs++)++ one) that require
> > u128 counters?  I suspect that the Very Large Fields are for future
> > expandability, but I also wonder about the whether it's worth the
> > complexity of doing this, since the structures can always be
> > version-revved later.
> 
> I'm making a relatively cheap allowance for future expansion.  Dave Chinner
> has mentioned at one of the LSFs that 16EiB may be exceeded soon (though I
> hate to think of fscking such a beastie).  I know that the YFS variant of AFS
> supports 96-bit vnode numbers (which I translate to inode numbers).  What I'm

Aha, you /already/ have a use for larger-than-64bit numbers.  Got it. :)

> trying to avoid is the problem we have with stat/statfs where under some
> circumstances we have to return an error (ERANGE?) because we can't represent
> the number if someone asks for an older version of the struct.
> 
> Since the buffer is (meant to be) pre-cleared, the filesystem can just ignore
> the high word if it's never going to set it.  In fact, fsinfo_generic_statfs
> doesn't need to set them either.
> 
> > XFS inodes are u64 values...
> > ...and the max symlink target length is 1k, not PAGE_SIZE...
> 
> Yeah, and AFS(YFS) has 96-bit inode numbers.  The filesystem's fsinfo table is
> read first so that the filesystem can override this.
> 
> > ...so is the usage model here that XFS should call fsinfo_generic_limits
> > to fill out the fsinfo_limits structure, modify the values in
> > ctx->buffer as appropriate for XFS, and then return the structure size?
> 
> Actually, I should export some these so that you can do that.  I'll export
> fsinfo_generic_{timestamp_info,supports,limits} for now.

Thank you.

> > > +#define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
> > > +#define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
> > > +#define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
> > 
> > I think I've muttered about the distinction between volume id and
> > volume name before, but I'm still wondering how confusing that will be
> > for users?  Let me check my assumptions, though:
> > 
> > Volume ID is whatever's in super_block.s_id, which (at least for xfs and
> > ext4) is the device name (e.g. "sda1").  I guess that's useful for
> > correlating a thing you can call fsinfo() on against strings that were
> > logged in dmesg.
> >
> > Volume name I think is the fs label (e.g. "home"), which I think will
> > have to be implemented separately by each filesystem, and that's why
> > there's no generic vfs implementation.
> 
> Yes.  For AFS, for example, this would be the name of the volume (which may be
> changed), whereas the volume ID is the number in the protocol that actually
> refers to the volume (which cannot be changed).
> 
> And, as you say, for blockdev mounts, the ID is the device name and the volume
> name is filesystem specific.
> 
> > The 7 -> 0 -> 1 sequence here confused me until I figured out that
> > QUERY_TYPE is the mask for QUERY_{PATH,FD}.
> 
> Changed to FSINFO_FLAGS_QUERY_MASK.
> 
> > > +struct fsinfo_limits {
> > > +...
> > > +	__u32	__reserved[1];
> > 
> > I wonder if these structures ought to reserve more space than a single u32...
> 
> No need.  Part of the way the interface is designed is that the version number
> for a particular VSTRUCT-type attribute is also the length.  So a newer
> version is also longer.  All the old fields must be retained and filled in.
> New fields are tagged on the end.
> 
> If userspace asks for an older version than is supported, it gets a truncated
> return.  If it asks for a newer version, the extra fields it is expecting are
> all set to 0.  Either way, the length (and thus the version) the kernel
> supports is returned - not the length copied.
> 
> The __reserved fields are there because they represent padding (the struct is
> going to be aligned/padded according to __u64 in this case).  Ideally, I'd
> mark the structs __packed, but this messes with the alignment and may make the
> compiler do funny tricks to get out any field larger than a byte.
> 
> I've renamed them to __padding.
> 
> > > +struct fsinfo_supports {
> > > +	__u64	stx_attributes;		/* What statx::stx_attributes are supported */
> > > +	__u32	stx_mask;		/* What statx::stx_mask bits are supported */
> > > +	__u32	ioc_flags;		/* What FS_IOC_* flags are supported */
> > 
> > "IOC"?  That just means 'ioctl'.  Is this field supposed to return the
> > supported FS_IOC_GETFLAGS flags, or the supported FS_IOC_FSGETXATTR
> > flags?
> 
> FS_IOC_[GS]ETFLAGS is what I meant.
> 
> > I suspect it would also be a big help to be able to tell userspace which
> > of the flags can be set, and which can be cleared.
> 
> How about:
> 
> 	__u32	fs_ioc_getflags;	/* What FS_IOC_GETFLAGS may return */
> 	__u32	fs_ioc_setflags_set;	/* What FS_IOC_SETFLAGS may set */
> 	__u32	fs_ioc_setflags_clear;	/* What FS_IOC_SETFLAGS may clear */

Looks good to me.  At some point it'll be good to add another fsinfo
verb or something to query exactly what parts of struct fsxattr can be
set, cleared, or returned, but that can wait.

> > > +struct fsinfo_timestamp_one {
> > > +	__s64	minimum;	/* Minimum timestamp value in seconds */
> > > +	__u64	maximum;	/* Maximum timestamp value in seconds */
> > 
> > Given that time64_t is s64, why is the maximum here u64?
> 
> Well, I assume it extremely unlikely that the maximum can be before 1970, so
> there doesn't seem any need to allow the maximum to be negative.  Furthermore,
> it would be feasible that you could encounter a filesystem with a u64
> filesystem that doesn't support dates before 1970.
> 
> On the other hand, if Linux is still going when __s64 seconds from 1970 wraps,
> it will be impressive, but I'm not sure we'll be around to see it...  Someone
> will have to cast a resurrection spell on Arnd to fix that one.

I fear we /all/ will be around, having forcibly been uploaded to The
Cloud because nobody else knows how to maintain the Linux kernel, and we
can't have the drinks dispenser on B deck going bonkers twice per stardate.
(j/k)

> However, since signed/unsigned comparisons may have issues, I can turn it into
> an __s64 also if that is preferred.

It /would/ be more consistent with the rest of the kernel, and since the
kernel & userspace don't support timestamps beyond 8Es, there's no point
in advertising a maximum beyond that.

--D

> David
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 13/19] vfs: Add a mount-notification facility [ver #16]
  2020-02-18 17:06 ` [PATCH 13/19] vfs: Add a mount-notification facility " David Howells
  2020-02-19 22:40   ` Jann Horn
@ 2020-02-21 12:24   ` David Howells
  2020-02-21 15:49     ` Jann Horn
  2020-02-21 17:06     ` David Howells
  1 sibling, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-21 12:24 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> > + * Post mount notifications to all watches going rootwards along the tree.
> > + *
> > + * Must be called with the mount_lock held.
> 
> Please put such constraints into lockdep assertions instead of
> comments; that way, violations can actually be detected.

What's the best way to write a lockdep assertion?

	BUG_ON(!lockdep_is_held(lock));

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 00/19] VFS: Filesystem information and notifications [ver #16]
  2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
                   ` (22 preceding siblings ...)
  2020-02-19 16:16 ` David Howells
@ 2020-02-21 12:57 ` David Howells
  23 siblings, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-21 12:57 UTC (permalink / raw)
  To: Stefan Metzmacher
  Cc: dhowells, viro, raven, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

Stefan Metzmacher <metze@samba.org> wrote:

> > fsinfo() may be called like the following, for example:
> > 
> > 	struct fsinfo_params params = {
> > 		.at_flags	= AT_SYMLINK_NOFOLLOW,
> 
> Shouldn't all new syscalls be able to provide the RESOLVE_ flags
> supported in openat2?

If that's the rule, then fine.  I presume these are a replacement for AT_*.
But the set of RESOLVE_* flags does not appear to be complete - and why's it
not in linux/fs.h if it's meant to be used by everything?

Anyway, it lacks a RESOLVE_NO_AUTOMOUNT flag.  This is not quite the same as
the documented behaviour of RESOLVE_NO_XDEV.

> > 	len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
> > 		     &address, sizeof(address));
> 
> Also passing sizeof(params) would allow future updates of fsinfo_params,
> also similar to openat2(), clone3()...

I can put that at the beginning of the params block or put dirfd in there.  If
I remember rightly, 6-arg syscalls are discouraged because they may need
special handling on some arches.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/19] afs: Support fsinfo() [ver #16]
  2020-02-20 12:58   ` David Howells
  2020-02-20 14:58     ` Jann Horn
@ 2020-02-21 13:26     ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-21 13:26 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> Hm - I guess just returning a list of attributes ought to be fine?
> Then AFS can just return one of its two statically-allocated attribute
> lists there, and a filesystem with more complicated circumstances
> (like FUSE or overlayfs or whatever) can compute a heap-allocated list
> on mount that is freed when the superblock goes away, or something
> like that?

I've changed it so that the core calls into the filesystem with no buffer
allocated first.  If the fs finds an appropriate attribute, it calls a helper
to handle it.  As there's no buffer, this will just return the size.

If the fs doesn't have a handler, it returns -EOPNOTSUPP and the core looks
for a common attribute instead and calls the helper on that if found.

At this point, if a valid length was returned and if userspace didn't specify
a buffer, we just return the proposed size to userspace.

If userspace did specify a buffer, then core will allocate a buffer of the
requested size and call into the filesystem again.  The helper will call the
->get() function to retrieve the value.  The ->get() function returns the
size.

If the returned size exceeds the buffer size, a bigger buffer will be
allocated and it will repeat the last step.

A simple example looks like:

	int ext4_fsinfo(struct path *path, struct fsinfo_context *ctx)
	{
		return fsinfo_get_attribute(path, ctx, ext4_fsinfo_attributes);
	}

where the ext4_fsinfo_attributes is an array of attribute defs.  The helper,
fsinfo_get_attribute() scans the list.  The helper can be called multiple
times if there's more than one list to process.  The caller should stop if one
doesn't return -EOPNOTSUPP.


When the attribute IDs are being listed, the helper will detect that and just
add all the IDs to the list, returning -EOPNOTSUPP when it's done so that all
the attributes get listed.

When the metadata for an attribute is being retrieved, the helper detects that
and searches the given table for that attribute.  If it finds it, it will
return information about that attribute rather than calling the attribute
helper.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]
  2020-02-18 17:06 ` [PATCH 15/19] vfs: Add superblock " David Howells
  2020-02-19 23:08   ` Jann Horn
@ 2020-02-21 14:23   ` David Howells
  2020-02-21 15:44     ` Jann Horn
  2020-02-21 16:33     ` David Howells
  1 sibling, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-21 14:23 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> > +               if (!s->s_watchers) {
> 
> READ_ONCE() ?

I'm not sure it matters.  It can only be set once, and the next time we read
it we're inside the lock.  And at this point, I don't actually dereference it,
and if it's non-NULL, it's not going to change.

> > +                       ret = add_watch_to_object(watch, s->s_watchers);
> > +                       if (ret == 0) {
> > +                               spin_lock(&sb_lock);
> > +                               s->s_count++;
> > +                               spin_unlock(&sb_lock);
> 
> Where is the corresponding decrement of s->s_count? I'm guessing that
> it should be in the ->release_watch() handler, except that there isn't
> one...

Um.  Good question.  I think this should do the job:

	static void sb_release_watch(struct watch *watch)
	{
		put_super(watch->private);
	}

And this then has to be set later:

	init_watch_list(wlist, sb_release_watch);

> > +       } else {
> > +               ret = -EBADSLT;
> > +               if (READ_ONCE(s->s_watchers)) {
> 
> (Nit: I don't get why you do a lockless check here before taking the
> lock - it'd be more straightforward to take the lock first, and it's
> not like you want to optimize for the case where someone calls
> sys_watch_sb() with invalid arguments...)

Fair enough.  I'll remove it.

> > +#ifdef CONFIG_SB_NOTIFICATIONS
> > +       if (unlikely(s->s_watchers)) {
> 
> READ_ONCE() ?

Shouldn't matter.  It's only read once and then a decision is made on it
immediately thereafter.  And if it's non-NULL, the value cannot change
thereafter.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 18/19] ext4: Add example fsinfo information [ver #16]
  2020-02-18 17:07 ` [PATCH 18/19] ext4: Add example fsinfo information " David Howells
  2020-02-19 17:04   ` Darrick J. Wong
  2020-02-20  0:53   ` kbuild test robot
@ 2020-02-21 14:43   ` David Howells
  2020-02-21 16:26     ` Darrick J. Wong
  2 siblings, 1 reply; 67+ messages in thread
From: David Howells @ 2020-02-21 14:43 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dhowells, viro, raven, mszeredi, christian, linux-api,
	linux-fsdevel, linux-kernel

Darrick J. Wong <darrick.wong@oracle.com> wrote:

> > +	memcpy(ctx->buffer, es->s_volume_name, sizeof(es->s_volume_name));
> 
> Shouldn't this be checking that ctx->buffer is large enough to hold
> s_volume_name?

Well, the buffer is guaranteed to be 4KiB in size.

> > +	return strlen(ctx->buffer);
> 
> s_volume_name is /not/ a null-terminated string if the label is 16
> characters long.

And the buffer is precleared, so it's automatically NULL terminated.

> > +#define FSINFO_ATTR_EXT4_TIMESTAMPS	0x400	/* Ext4 superblock timestamps */
> 
> I guess each filesystem gets ... 256 different attrs, and the third
> nibble determines the namespace?

No.  Think of it as allocating namespace in 256-number blocks.  That means
there are 16 million of them.  If a filesystem uses up an entire block, it can
always allocate another one.  I don't think it likely that we'll get
sufficient filesystems to eat them all.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 04/19] vfs: Add mount change counter [ver #16]
  2020-02-18 17:05 ` [PATCH 04/19] vfs: Add mount change counter " David Howells
@ 2020-02-21 14:48   ` Christian Brauner
  0 siblings, 0 replies; 67+ messages in thread
From: Christian Brauner @ 2020-02-21 14:48 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:05:28PM +0000, David Howells wrote:
> Add a change counter on each mount object so that the user can easily check
> to see if a mount has changed its attributes or its children.
> 
> Future patches will:
> 
>  (1) Provide this value through fsinfo() attributes.
> 
>  (2) Implement a watch_mount() system call to provide a notification
>      interface for userspace monitoring.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/mount.h     |   22 ++++++++++++++++++++++
>  fs/namespace.c |   11 +++++++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index 711a4093e475..a1625924fe81 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -72,6 +72,7 @@ struct mount {
>  	int mnt_expiry_mark;		/* true if marked for expiry */
>  	struct hlist_head mnt_pins;
>  	struct hlist_head mnt_stuck_children;
> +	atomic_t mnt_change_counter;	/* Number of changed applied */

Nit: "Number of changes applied"?

>  } __randomize_layout;
>  
>  #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
> @@ -153,3 +154,24 @@ static inline bool is_anon_ns(struct mnt_namespace *ns)
>  {
>  	return ns->seq == 0;
>  }
> +
> +/*
> + * Type of mount topology change notification.
> + */
> +enum mount_notification_subtype {
> +	NOTIFY_MOUNT_NEW_MOUNT	= 0, /* New mount added */
> +	NOTIFY_MOUNT_UNMOUNT	= 1, /* Mount removed manually */
> +	NOTIFY_MOUNT_EXPIRY	= 2, /* Automount expired */
> +	NOTIFY_MOUNT_READONLY	= 3, /* Mount R/O state changed */
> +	NOTIFY_MOUNT_SETATTR	= 4, /* Mount attributes changed */
> +	NOTIFY_MOUNT_MOVE_FROM	= 5, /* Mount moved from here */
> +	NOTIFY_MOUNT_MOVE_TO	= 6, /* Mount moved to here (compare op_id) */
> +};
> +
> +static inline void notify_mount(struct mount *changed,
> +				struct mount *aux,
> +				enum mount_notification_subtype subtype,
> +				u32 info_flags)
> +{
> +	atomic_inc(&changed->mnt_change_counter);
> +}
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 85b5f7bea82e..5c84aadb6aa1 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -498,6 +498,8 @@ static int mnt_make_readonly(struct mount *mnt)
>  	smp_wmb();
>  	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
>  	unlock_mount_hash();
> +	if (ret == 0)
> +		notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, 0x10000);

Why is 0x10000 hard-coded here?
Hm, maybe for future patches where this will be replaced with a #define
but not sure...

>  	return ret;
>  }
>  
> @@ -506,6 +508,7 @@ static int __mnt_unmake_readonly(struct mount *mnt)
>  	lock_mount_hash();
>  	mnt->mnt.mnt_flags &= ~MNT_READONLY;
>  	unlock_mount_hash();
> +	notify_mount(mnt, NULL, NOTIFY_MOUNT_READONLY, 0);
>  	return 0;
>  }
>  
> @@ -819,6 +822,7 @@ static struct mountpoint *unhash_mnt(struct mount *mnt)
>   */
>  static void umount_mnt(struct mount *mnt)
>  {
> +	notify_mount(mnt->mnt_parent, mnt, NOTIFY_MOUNT_UNMOUNT, 0);
>  	put_mountpoint(unhash_mnt(mnt));
>  }
>  
> @@ -1453,6 +1457,7 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
>  		p = list_first_entry(&tmp_list, struct mount, mnt_list);
>  		list_del_init(&p->mnt_expire);
>  		list_del_init(&p->mnt_list);
> +
>  		ns = p->mnt_ns;
>  		if (ns) {
>  			ns->mounts--;
> @@ -2078,7 +2083,10 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  		lock_mount_hash();
>  	}
>  	if (moving) {
> +		notify_mount(source_mnt->mnt_parent, source_mnt,
> +			     NOTIFY_MOUNT_MOVE_FROM, 0);
>  		unhash_mnt(source_mnt);
> +		notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_MOVE_TO, 0);
>  		attach_mnt(source_mnt, dest_mnt, dest_mp);
>  		touch_mnt_namespace(source_mnt->mnt_ns);
>  	} else {
> @@ -2087,6 +2095,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  			list_del_init(&source_mnt->mnt_ns->list);
>  		}
>  		mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
> +		notify_mount(dest_mnt, source_mnt, NOTIFY_MOUNT_NEW_MOUNT, 0);
>  		commit_tree(source_mnt);
>  	}
>  
> @@ -2464,6 +2473,7 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
>  	mnt->mnt.mnt_flags = mnt_flags;
>  	touch_mnt_namespace(mnt->mnt_ns);
>  	unlock_mount_hash();
> +	notify_mount(mnt, NULL, NOTIFY_MOUNT_SETATTR, 0);
>  }
>  
>  static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt)
> @@ -2898,6 +2908,7 @@ void mark_mounts_for_expiry(struct list_head *mounts)
>  		if (!xchg(&mnt->mnt_expiry_mark, 1) ||
>  			propagate_mount_busy(mnt, 1))
>  			continue;
> +		notify_mount(mnt, NULL, NOTIFY_MOUNT_EXPIRY, 0);
>  		list_move(&mnt->mnt_expire, &graveyard);
>  	}
>  	while (!list_empty(&graveyard)) {
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/19] fsinfo: Add syscalls to other arches [ver #16]
  2020-02-18 17:05 ` [PATCH 02/19] fsinfo: Add syscalls to other arches " David Howells
@ 2020-02-21 14:51   ` Christian Brauner
  2020-02-21 18:10     ` Geert Uytterhoeven
  0 siblings, 1 reply; 67+ messages in thread
From: Christian Brauner @ 2020-02-21 14:51 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:05:13PM +0000, David Howells wrote:
> Add the fsinfo syscall to the other arches.

Over the last couple of kernel releases we have established the
convention that we wire-up a syscall for all arches at the same time and
not e.g. x86 and then the other separately.

> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
>  arch/arm/tools/syscall.tbl                  |    1 +
>  arch/arm64/include/asm/unistd.h             |    2 +-
>  arch/arm64/include/asm/unistd32.h           |    2 +-
>  arch/ia64/kernel/syscalls/syscall.tbl       |    1 +
>  arch/m68k/kernel/syscalls/syscall.tbl       |    2 ++
>  arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
>  arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
>  arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
>  arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
>  arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
>  arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
>  arch/s390/kernel/syscalls/syscall.tbl       |    1 +
>  arch/sh/kernel/syscalls/syscall.tbl         |    1 +
>  arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
>  arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
>  16 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
> index 36d42da7466a..961750417ef2 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -477,3 +477,4 @@
>  # 545 reserved for clone3
>  547	common	openat2				sys_openat2
>  548	common	pidfd_getfd			sys_pidfd_getfd
> +549	common	fsinfo				sys_fsinfo
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index 4d1cf74a2caa..e6b9dfe01471 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -451,3 +451,4 @@
>  435	common	clone3				sys_clone3
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index 1dd22da1c3a9..75f04a1023be 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -38,7 +38,7 @@
>  #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
>  #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
>  
> -#define __NR_compat_syscalls		439
> +#define __NR_compat_syscalls		440
>  #endif
>  
>  #define __ARCH_WANT_SYS_CLONE
> diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> index c1c61635f89c..7d1225de47e6 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -873,7 +873,7 @@ __SYSCALL(__NR_fsopen, sys_fsopen)
>  __SYSCALL(__NR_fsconfig, sys_fsconfig)
>  #define __NR_fsmount 432
>  __SYSCALL(__NR_fsmount, sys_fsmount)
> -#define __NR_fspick 433
> +#define __NR_fspick 434
>  __SYSCALL(__NR_fspick, sys_fspick)
>  #define __NR_pidfd_open 434
>  __SYSCALL(__NR_pidfd_open, sys_pidfd_open)
> diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
> index 042911e670b8..9018a3a6b067 100644
> --- a/arch/ia64/kernel/syscalls/syscall.tbl
> +++ b/arch/ia64/kernel/syscalls/syscall.tbl
> @@ -358,3 +358,4 @@
>  # 435 reserved for clone3
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
> index f4f49fcb76d0..10172bb6ba1f 100644
> --- a/arch/m68k/kernel/syscalls/syscall.tbl
> +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> @@ -437,3 +437,5 @@
>  435	common	clone3				__sys_clone3
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +# 435 reserved for clone3
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> index 4c67b11f9c9e..58665073c1f0 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -443,3 +443,4 @@
>  435	common	clone3				sys_clone3
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index 1f9e8ad636cc..1f07a89473c3 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -376,3 +376,4 @@
>  435	n32	clone3				__sys_clone3
>  437	n32	openat2				sys_openat2
>  438	n32	pidfd_getfd			sys_pidfd_getfd
> +439	n32	fsinfo				sys_fsinfo
> diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> index c0b9d802dbf6..3c853ca54901 100644
> --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> @@ -352,3 +352,4 @@
>  435	n64	clone3				__sys_clone3
>  437	n64	openat2				sys_openat2
>  438	n64	pidfd_getfd			sys_pidfd_getfd
> +439	n64	fsinfo				sys_fsinfo
> diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
> index ac586774c980..727f54542bf4 100644
> --- a/arch/mips/kernel/syscalls/syscall_o32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
> @@ -425,3 +425,4 @@
>  435	o32	clone3				__sys_clone3
>  437	o32	openat2				sys_openat2
>  438	o32	pidfd_getfd			sys_pidfd_getfd
> +439	o32	fsinfo				sys_fsinfo
> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
> index 52a15f5cd130..2e9576638d80 100644
> --- a/arch/parisc/kernel/syscalls/syscall.tbl
> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> @@ -435,3 +435,4 @@
>  435	common	clone3				sys_clone3_wrapper
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index 35b61bfc1b1a..397190734ca7 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -519,3 +519,4 @@
>  435	nospu	clone3				ppc_clone3
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> index bd7bd3581a0f..e9340d712dcd 100644
> --- a/arch/s390/kernel/syscalls/syscall.tbl
> +++ b/arch/s390/kernel/syscalls/syscall.tbl
> @@ -440,3 +440,4 @@
>  435  common	clone3			sys_clone3			sys_clone3
>  437  common	openat2			sys_openat2			sys_openat2
>  438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
> +439  common	fsinfo			sys_fsinfo			sys_fsinfo
> diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> index c7a30fcd135f..7bb5ec284fbb 100644
> --- a/arch/sh/kernel/syscalls/syscall.tbl
> +++ b/arch/sh/kernel/syscalls/syscall.tbl
> @@ -440,3 +440,4 @@
>  # 435 reserved for clone3
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
> index f13615ecdecc..a902b757ace2 100644
> --- a/arch/sparc/kernel/syscalls/syscall.tbl
> +++ b/arch/sparc/kernel/syscalls/syscall.tbl
> @@ -483,3 +483,4 @@
>  # 435 reserved for clone3
>  437	common	openat2			sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> index 85a9ab1bc04d..f82702a7ab38 100644
> --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> @@ -408,3 +408,4 @@
>  435	common	clone3				sys_clone3
>  437	common	openat2				sys_openat2
>  438	common	pidfd_getfd			sys_pidfd_getfd
> +439	common	fsinfo				sys_fsinfo
> 
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 06/19] vfs: Allow fsinfo() to look up a mount object by ID [ver #16]
  2020-02-18 17:05 ` [PATCH 06/19] vfs: Allow fsinfo() to look up a mount object by " David Howells
@ 2020-02-21 15:09   ` Christian Brauner
  0 siblings, 0 replies; 67+ messages in thread
From: Christian Brauner @ 2020-02-21 15:09 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Tue, Feb 18, 2020 at 05:05:43PM +0000, David Howells wrote:
> Allow the fsinfo() syscall to look up a mount object by ID rather than by
> pathname.  This is necessary as there can be multiple mounts stacked up at
> the same pathname and there's no way to look through them otherwise.
> 
> This is done by passing FSINFO_FLAGS_QUERY_MOUNT to fsinfo() in the
> parameters and then passing the mount ID as a string to fsinfo() in place
> of the filename:
> 
> 	struct fsinfo_params params = {
> 		.flags	 = FSINFO_FLAGS_QUERY_MOUNT,
> 		.request = FSINFO_ATTR_IDS,
> 	};
> 
> 	ret = fsinfo(AT_FDCWD, "21", &params, buffer, sizeof(buffer));
> 
> The caller is only permitted to query a mount object if the root directory
> of that mount connects directly to the current chroot if dfd == AT_FDCWD[*]
> or the directory specified by dfd otherwise.  Note that this is not
> available to the pathwalk of any other syscall.
> 
> [*] This needs to be something other than AT_FDCWD, perhaps AT_FDROOT.

Sounds like it should accept LOOKUP_BENEATH.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]
  2020-02-21 14:23   ` David Howells
@ 2020-02-21 15:44     ` Jann Horn
  2020-02-21 16:33     ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-21 15:44 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Fri, Feb 21, 2020 at 3:24 PM David Howells <dhowells@redhat.com> wrote:
>
> Jann Horn <jannh@google.com> wrote:
>
> > > +               if (!s->s_watchers) {
> >
> > READ_ONCE() ?
>
> I'm not sure it matters.  It can only be set once, and the next time we read
> it we're inside the lock.  And at this point, I don't actually dereference it,
> and if it's non-NULL, it's not going to change.

I'd really like these READ_ONCE() things to be *anywhere* the value
can concurrently change, for two reasons:

First, it tells the reader "keep in mind that this value may
concurrently change in some way, don't just assume that it'll stay the
same".

But also, it tells the compiler that if it generates multiple loads
here and assumes that they return the same value, *really* bad stuff
may happen. GCC has some really fun behavior when compiling a switch()
on a value that might change concurrently without using READ_ONCE():
It sometimes generates multiple loads, where the first load is used to
test whether the value is in a specific range and then the second load
is used for actually indexing into a table of jump destinations. If
the value is concurrently mutated from an in-bounds value to an
out-of-bounds value, this code will load a jump destination from
random out-of-bounds memory.

An example:

$ cat gcc-jump.c
int blah(int *x, int y) {
  switch (*x) {
    case 0: return y+1;
    case 1: return y*2;
    case 2: return y-3;
    case 3: return y^1;
    case 4: return y+6;
    case 5: return y-5;
    case 6: return y|1;
    case 7: return y&4;
    case 8: return y|5;
    case 9: return y-3;
    case 10: return y&8;
    case 11: return y|9;
    default: return y;
  }
}
$ gcc-9 -O2 -c -o gcc-jump.o gcc-jump.c
$ objdump -dr gcc-jump.o
[...]
0000000000000000 <blah>:
   0: 83 3f 0b              cmpl   $0xb,(%rdi)
   3: 0f 87 00 00 00 00    ja     9 <blah+0x9>
5: R_X86_64_PC32 .text.unlikely-0x4
   9: 8b 07                mov    (%rdi),%eax
   b: 48 8d 15 00 00 00 00 lea    0x0(%rip),%rdx        # 12 <blah+0x12>
e: R_X86_64_PC32 .rodata-0x4
  12: 48 63 04 82          movslq (%rdx,%rax,4),%rax
  16: 48 01 d0              add    %rdx,%rax
  19: ff e0                jmpq   *%rax
[...]


Or if you want to see a full example that actually crashes:

$ cat gcc-jump-crash.c
#include <pthread.h>

int mutating_number;

__attribute__((noinline)) int blah(int *x, int y) {
  switch (*x) {
    case 0: return y+1;
    case 1: return y*2;
    case 2: return y-3;
    case 3: return y^1;
    case 4: return y+6;
    case 5: return y-5;
    case 6: return y|1;
    case 7: return y&4;
    case 8: return y|5;
    case 9: return y-3;
    case 10: return y&8;
    case 11: return y|9;
    default: return y;
  }
}

int blah_num;
void *thread_fn(void *dummy) {
  while (1) {
    blah_num = blah(&mutating_number, blah_num);
  }
}

int main(void) {
  pthread_t thread;
  pthread_create(&thread, NULL, thread_fn, NULL);
  while (1) {
    *(volatile int *)&mutating_number = 1;
    *(volatile int *)&mutating_number = 100000000;
  }
}
$ gcc-9 -O2 -pthread -o gcc-jump-crash gcc-jump-crash.c -ggdb -Wall
$ gdb ./gcc-jump-crash
[...]
(gdb) run
[...]
Thread 2 "gcc-jump-crash" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7db6700 (LWP 33237)]
0x00005555555551a2 in blah (x=0x555555558034 <mutating_number>, y=0)
at gcc-jump-crash.c:6
6   switch (*x) {
(gdb) x/10i blah
   0x555555555190 <blah>: cmp    DWORD PTR [rdi],0xb
   0x555555555193 <blah+3>: ja     0x555555555050 <blah+4294966976>
   0x555555555199 <blah+9>: mov    eax,DWORD PTR [rdi]
   0x55555555519b <blah+11>: lea    rdx,[rip+0xe62]        # 0x555555556004
=> 0x5555555551a2 <blah+18>: movsxd rax,DWORD PTR [rdx+rax*4]
   0x5555555551a6 <blah+22>: add    rax,rdx
   0x5555555551a9 <blah+25>: jmp    rax
   0x5555555551ab <blah+27>: nop    DWORD PTR [rax+rax*1+0x0]
   0x5555555551b0 <blah+32>: lea    eax,[rsi-0x3]
   0x5555555551b3 <blah+35>: ret
(gdb)


Here's a presentation from Felix Wilhelm, a security researcher who
managed to find a case in the Xen hypervisor where a switch() on a
value in shared memory was exploitable to compromise the hypervisor
from inside a guest (see slides 35 and following):
<https://www.blackhat.com/docs/us-16/materials/us-16-Wilhelm-Xenpwn-Breaking-Paravirtualized-Devices.pdf>

I realize that a compiler is extremely unlikely to make such an
optimization decision for a simple "if (!a->b)" branch; but still, I
would prefer to have READ_ONCE() everywhere where it is semantically
required, not just everywhere where you can think of a concrete
compiler optimization that will break stuff.

> > > +                       ret = add_watch_to_object(watch, s->s_watchers);
> > > +                       if (ret == 0) {
> > > +                               spin_lock(&sb_lock);
> > > +                               s->s_count++;
> > > +                               spin_unlock(&sb_lock);
> >
> > Where is the corresponding decrement of s->s_count? I'm guessing that
> > it should be in the ->release_watch() handler, except that there isn't
> > one...
>
> Um.  Good question.  I think this should do the job:
>
>         static void sb_release_watch(struct watch *watch)
>         {
>                 put_super(watch->private);
>         }
>
> And this then has to be set later:
>
>         init_watch_list(wlist, sb_release_watch);

(And as in the other case, the s->s_count increment will probably have
to be moved above the add_watch_to_object(), unless you hold the
sb_lock around it?)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 13/19] vfs: Add a mount-notification facility [ver #16]
  2020-02-21 12:24   ` David Howells
@ 2020-02-21 15:49     ` Jann Horn
  2020-02-21 17:06     ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-21 15:49 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Fri, Feb 21, 2020 at 1:24 PM David Howells <dhowells@redhat.com> wrote:
> Jann Horn <jannh@google.com> wrote:
>
> > > + * Post mount notifications to all watches going rootwards along the tree.
> > > + *
> > > + * Must be called with the mount_lock held.
> >
> > Please put such constraints into lockdep assertions instead of
> > comments; that way, violations can actually be detected.
>
> What's the best way to write a lockdep assertion?
>
>         BUG_ON(!lockdep_is_held(lock));

lockdep_assert_held(lock) is the normal way, I think - that will
WARN() if lockdep is enabled and the lock is not held.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 18/19] ext4: Add example fsinfo information [ver #16]
  2020-02-21 14:43   ` David Howells
@ 2020-02-21 16:26     ` Darrick J. Wong
  0 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2020-02-21 16:26 UTC (permalink / raw)
  To: David Howells
  Cc: viro, raven, mszeredi, christian, linux-api, linux-fsdevel, linux-kernel

On Fri, Feb 21, 2020 at 02:43:05PM +0000, David Howells wrote:
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > > +	memcpy(ctx->buffer, es->s_volume_name, sizeof(es->s_volume_name));
> > 
> > Shouldn't this be checking that ctx->buffer is large enough to hold
> > s_volume_name?
> 
> Well, the buffer is guaranteed to be 4KiB in size.

Ah, ok.

> > > +	return strlen(ctx->buffer);
> > 
> > s_volume_name is /not/ a null-terminated string if the label is 16
> > characters long.
> 
> And the buffer is precleared, so it's automatically NULL terminated.

<nod>

> > > +#define FSINFO_ATTR_EXT4_TIMESTAMPS	0x400	/* Ext4 superblock timestamps */
> > 
> > I guess each filesystem gets ... 256 different attrs, and the third
> > nibble determines the namespace?
> 
> No.  Think of it as allocating namespace in 256-number blocks.  That means
> there are 16 million of them.  If a filesystem uses up an entire block, it can
> always allocate another one.  I don't think it likely that we'll get
> sufficient filesystems to eat them all.

Ah.  In that case I declare that we would like to reserve 0x5800-0x58FF
for XFS. :)

--D

> David
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]
  2020-02-21 14:23   ` David Howells
  2020-02-21 15:44     ` Jann Horn
@ 2020-02-21 16:33     ` David Howells
  2020-02-21 16:41       ` Jann Horn
  2020-02-21 17:11       ` David Howells
  1 sibling, 2 replies; 67+ messages in thread
From: David Howells @ 2020-02-21 16:33 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> (And as in the other case, the s->s_count increment will probably have
> to be moved above the add_watch_to_object(), unless you hold the
> sb_lock around it?)

It shouldn't matter as I'm holding s->s_umount across the add and increment.
That prevents the watch from being removed: watch_sb() would have to get the
lock first to do that.  It also deactivate_locked_super() from removing all
the watchers.

I can move it before, but I probably have to drop s_umount before I can call
put_super().

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]
  2020-02-21 16:33     ` David Howells
@ 2020-02-21 16:41       ` Jann Horn
  2020-02-21 17:11       ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: Jann Horn @ 2020-02-21 16:41 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Fri, Feb 21, 2020 at 5:33 PM David Howells <dhowells@redhat.com> wrote:
> Jann Horn <jannh@google.com> wrote:
>
> > (And as in the other case, the s->s_count increment will probably have
> > to be moved above the add_watch_to_object(), unless you hold the
> > sb_lock around it?)
>
> It shouldn't matter as I'm holding s->s_umount across the add and increment.
> That prevents the watch from being removed: watch_sb() would have to get the
> lock first to do that.  It also deactivate_locked_super() from removing all
> the watchers.

Can't the same thing I already pointed out on "[PATCH 13/19] vfs: Add
a mount-notification facility [ver #16]" also happen here?

If another thread concurrently runs close(watch_fd) before the
spin_lock(&sb_lock), pipe_release -> put_pipe_info -> free_pipe_info
-> watch_queue_clear will run, correct? And then watch_queue_clear()
will find the watch that we've just created and call its
->release_watch() handler, which causes put_super(), potentially
dropping the refcount to zero? And then stuff will blow up.

> I can move it before, but I probably have to drop s_umount before I can call
> put_super().

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 13/19] vfs: Add a mount-notification facility [ver #16]
  2020-02-21 12:24   ` David Howells
  2020-02-21 15:49     ` Jann Horn
@ 2020-02-21 17:06     ` David Howells
  2020-02-21 17:36       ` seq_lock and lockdep_is_held() assertions Jann Horn
  1 sibling, 1 reply; 67+ messages in thread
From: David Howells @ 2020-02-21 17:06 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> > What's the best way to write a lockdep assertion?
> >
> >         BUG_ON(!lockdep_is_held(lock));
> 
> lockdep_assert_held(lock) is the normal way, I think - that will
> WARN() if lockdep is enabled and the lock is not held.

Okay.  But what's the best way with a seqlock_t?  It has two dep maps in it.
Do I just ignore the one attached to the spinlock?

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]
  2020-02-21 16:33     ` David Howells
  2020-02-21 16:41       ` Jann Horn
@ 2020-02-21 17:11       ` David Howells
  1 sibling, 0 replies; 67+ messages in thread
From: David Howells @ 2020-02-21 17:11 UTC (permalink / raw)
  To: Jann Horn
  Cc: dhowells, Al Viro, raven, Miklos Szeredi, Christian Brauner,
	Linux API, linux-fsdevel, kernel list

Jann Horn <jannh@google.com> wrote:

> If another thread concurrently runs close(watch_fd)

Fair point.  We have the watch queue pinned, but watch_queue_clear() is called
before the ref is released.

David


^ permalink raw reply	[flat|nested] 67+ messages in thread

* seq_lock and lockdep_is_held() assertions
  2020-02-21 17:06     ` David Howells
@ 2020-02-21 17:36       ` Jann Horn
  2020-02-21 18:02         ` John Stultz
  0 siblings, 1 reply; 67+ messages in thread
From: Jann Horn @ 2020-02-21 17:36 UTC (permalink / raw)
  To: David Howells, John Stultz, Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: Al Viro, raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

adding some locking folks to the thread...

On Fri, Feb 21, 2020 at 6:06 PM David Howells <dhowells@redhat.com> wrote:
> Jann Horn <jannh@google.com> wrote:
> > On Fri, Feb 21, 2020 at 1:24 PM David Howells <dhowells@redhat.com> wrote:
> > > What's the best way to write a lockdep assertion?
> > >
> > >         BUG_ON(!lockdep_is_held(lock));
> >
> > lockdep_assert_held(lock) is the normal way, I think - that will
> > WARN() if lockdep is enabled and the lock is not held.
>
> Okay.  But what's the best way with a seqlock_t?  It has two dep maps in it.
> Do I just ignore the one attached to the spinlock?

Uuuh... very good question. Looking at how the seqlock_t helpers use
the dep map of the seqlock, I don't think lockdep asserts work for
asserting that you're in the read side of a seqlock?

read_seqbegin_or_lock() -> read_seqbegin() -> read_seqcount_begin() ->
seqcount_lockdep_reader_access() does seqcount_acquire_read() (which
maps to lock_acquire_shared_recursive()), but immediately following
that calls seqcount_release() (which maps to lock_release())?

So I think lockdep won't consider you to be holding any locks after
read_seqbegin_or_lock() if the lock wasn't taken?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: seq_lock and lockdep_is_held() assertions
  2020-02-21 17:36       ` seq_lock and lockdep_is_held() assertions Jann Horn
@ 2020-02-21 18:02         ` John Stultz
  0 siblings, 0 replies; 67+ messages in thread
From: John Stultz @ 2020-02-21 18:02 UTC (permalink / raw)
  To: Jann Horn
  Cc: David Howells, Peter Zijlstra, Ingo Molnar, Will Deacon, Al Viro,
	raven, Miklos Szeredi, Christian Brauner, Linux API,
	linux-fsdevel, kernel list

On Fri, Feb 21, 2020 at 9:36 AM Jann Horn <jannh@google.com> wrote:
>
> adding some locking folks to the thread...
>
> On Fri, Feb 21, 2020 at 6:06 PM David Howells <dhowells@redhat.com> wrote:
> > Jann Horn <jannh@google.com> wrote:
> > > On Fri, Feb 21, 2020 at 1:24 PM David Howells <dhowells@redhat.com> wrote:
> > > > What's the best way to write a lockdep assertion?
> > > >
> > > >         BUG_ON(!lockdep_is_held(lock));
> > >
> > > lockdep_assert_held(lock) is the normal way, I think - that will
> > > WARN() if lockdep is enabled and the lock is not held.
> >
> > Okay.  But what's the best way with a seqlock_t?  It has two dep maps in it.
> > Do I just ignore the one attached to the spinlock?
>
> Uuuh... very good question. Looking at how the seqlock_t helpers use
> the dep map of the seqlock, I don't think lockdep asserts work for
> asserting that you're in the read side of a seqlock?
>
> read_seqbegin_or_lock() -> read_seqbegin() -> read_seqcount_begin() ->
> seqcount_lockdep_reader_access() does seqcount_acquire_read() (which
> maps to lock_acquire_shared_recursive()), but immediately following
> that calls seqcount_release() (which maps to lock_release())?
>
> So I think lockdep won't consider you to be holding any locks after
> read_seqbegin_or_lock() if the lock wasn't taken?

Yea. It's a bit foggy now, but the main concern at the time was
wanting to catch seqlock readers that happened under a writer which
was a common cause of deadlocks between the timekeeping core and stuff
like printks (or anything we called out that might try to read the
time).

I think it was because writers can properly interrupt readers, we
couldn't hold the depmap across the read critical section? That's why
we just take and release the depmap, since that will still catch any
reads made while holding the write, which would deadlock.

But take that with a grain of salt, as its been awhile.

thanks
-john

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/19] fsinfo: Add syscalls to other arches [ver #16]
  2020-02-21 14:51   ` Christian Brauner
@ 2020-02-21 18:10     ` Geert Uytterhoeven
  0 siblings, 0 replies; 67+ messages in thread
From: Geert Uytterhoeven @ 2020-02-21 18:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David Howells, Al Viro, Ian Kent, mszeredi, Christian Brauner,
	Linux API, Linux FS Devel, Linux Kernel Mailing List

On Fri, Feb 21, 2020 at 3:52 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> On Tue, Feb 18, 2020 at 05:05:13PM +0000, David Howells wrote:
> > Add the fsinfo syscall to the other arches.
>
> Over the last couple of kernel releases we have established the
> convention that we wire-up a syscall for all arches at the same time and
> not e.g. x86 and then the other separately.

Indeed.

> > Signed-off-by: David Howells <dhowells@redhat.com>
> > ---
> >
> >  arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
> >  arch/arm/tools/syscall.tbl                  |    1 +
> >  arch/arm64/include/asm/unistd.h             |    2 +-
> >  arch/arm64/include/asm/unistd32.h           |    2 +-
> >  arch/ia64/kernel/syscalls/syscall.tbl       |    1 +
> >  arch/m68k/kernel/syscalls/syscall.tbl       |    2 ++

Suspicious diffstat...

> >  arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
> >  arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
> >  arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
> >  arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
> >  arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
> >  arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
> >  arch/s390/kernel/syscalls/syscall.tbl       |    1 +
> >  arch/sh/kernel/syscalls/syscall.tbl         |    1 +
> >  arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
> >  arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
> >  16 files changed, 17 insertions(+), 2 deletions(-)

> > --- a/arch/m68k/kernel/syscalls/syscall.tbl
> > +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> > @@ -437,3 +437,5 @@
> >  435  common  clone3                          __sys_clone3
> >  437  common  openat2                         sys_openat2
> >  438  common  pidfd_getfd                     sys_pidfd_getfd
> > +# 435 reserved for clone3

Indeed, bogus line above (bad rebase?).

> > +439  common  fsinfo                          sys_fsinfo

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2020-02-21 18:10 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
2020-02-19 16:31   ` Darrick J. Wong
2020-02-19 20:07   ` Jann Horn
2020-02-20 10:34   ` David Howells
2020-02-20 15:48     ` Darrick J. Wong
2020-02-20 11:03   ` David Howells
2020-02-20 14:54     ` Jann Horn
2020-02-20 15:31       ` Darrick J. Wong
2020-02-18 17:05 ` [PATCH 02/19] fsinfo: Add syscalls to other arches " David Howells
2020-02-21 14:51   ` Christian Brauner
2020-02-21 18:10     ` Geert Uytterhoeven
2020-02-18 17:05 ` [PATCH 03/19] fsinfo: Provide a bitmap of supported features " David Howells
2020-02-19 16:37   ` Darrick J. Wong
2020-02-20 12:22   ` David Howells
2020-02-18 17:05 ` [PATCH 04/19] vfs: Add mount change counter " David Howells
2020-02-21 14:48   ` Christian Brauner
2020-02-18 17:05 ` [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID " David Howells
2020-02-19 16:53   ` Darrick J. Wong
2020-02-20 12:45   ` David Howells
2020-02-18 17:05 ` [PATCH 06/19] vfs: Allow fsinfo() to look up a mount object by " David Howells
2020-02-21 15:09   ` Christian Brauner
2020-02-18 17:05 ` [PATCH 07/19] vfs: Allow mount information to be queried by fsinfo() " David Howells
2020-02-18 17:05 ` [PATCH 08/19] vfs: fsinfo sample: Mount listing program " David Howells
2020-02-18 17:06 ` [PATCH 09/19] fsinfo: Allow the mount topology propogation flags to be retrieved " David Howells
2020-02-18 17:06 ` [PATCH 10/19] fsinfo: Add API documentation " David Howells
2020-02-18 17:06 ` [PATCH 11/19] afs: Support fsinfo() " David Howells
2020-02-19 21:01   ` Jann Horn
2020-02-20 12:58   ` David Howells
2020-02-20 14:58     ` Jann Horn
2020-02-21 13:26     ` David Howells
2020-02-18 17:06 ` [PATCH 12/19] security: Add hooks to rule on setting a superblock or mount watch " David Howells
2020-02-18 17:06 ` [PATCH 13/19] vfs: Add a mount-notification facility " David Howells
2020-02-19 22:40   ` Jann Horn
2020-02-19 22:55     ` Jann Horn
2020-02-21 12:24   ` David Howells
2020-02-21 15:49     ` Jann Horn
2020-02-21 17:06     ` David Howells
2020-02-21 17:36       ` seq_lock and lockdep_is_held() assertions Jann Horn
2020-02-21 18:02         ` John Stultz
2020-02-18 17:06 ` [PATCH 14/19] notifications: sample: Display mount tree change notifications [ver #16] David Howells
2020-02-18 17:06 ` [PATCH 15/19] vfs: Add superblock " David Howells
2020-02-19 23:08   ` Jann Horn
2020-02-21 14:23   ` David Howells
2020-02-21 15:44     ` Jann Horn
2020-02-21 16:33     ` David Howells
2020-02-21 16:41       ` Jann Horn
2020-02-21 17:11       ` David Howells
2020-02-18 17:06 ` [PATCH 16/19] fsinfo: Provide superblock notification counter " David Howells
2020-02-18 17:07 ` [PATCH 17/19] notifications: sample: Display superblock notifications " David Howells
2020-02-18 17:07 ` [PATCH 18/19] ext4: Add example fsinfo information " David Howells
2020-02-19 17:04   ` Darrick J. Wong
2020-02-20  0:53   ` kbuild test robot
2020-02-21 14:43   ` David Howells
2020-02-21 16:26     ` Darrick J. Wong
2020-02-18 17:07 ` [PATCH 19/19] nfs: Add example filesystem " David Howells
2020-02-20  2:13   ` kbuild test robot
2020-02-20  2:20   ` kbuild test robot
2020-02-18 18:12 ` David Howells
2020-02-19 10:23 ` [PATCH 00/19] VFS: Filesystem information and notifications " Stefan Metzmacher
2020-02-19 14:46 ` Christian Brauner
2020-02-19 15:50   ` Darrick J. Wong
2020-02-20  4:42   ` Ian Kent
2020-02-20  9:09     ` Christian Brauner
2020-02-20 11:30       ` Ian Kent
2020-02-19 16:16 ` David Howells
2020-02-21 12:57 ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).