linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/18] VFS: Filesystem information [ver #21]
@ 2020-08-03 13:36 David Howells
  2020-08-03 13:36 ` [PATCH 01/18] fsinfo: Introduce a non-repeating system-unique superblock ID " David Howells
                   ` (19 more replies)
  0 siblings, 20 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:36 UTC
  To: viro
  Cc: Theodore Ts'o, Andreas Dilger, Eric Biggers, Jeff Layton,
	linux-ext4, Carlos Maiolino, Darrick J. Wong, linux-api,
	dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel


Here's a set of patches that adds a system call, fsinfo(), that allows
information about the VFS, mount topology, superblock and files to be
retrieved.

The patchset is based on top of the notifications patchset and allows event
counters implemented in the latter to be retrieved to allow overruns to be
efficiently managed.


=======
THE WHY
=======

Why do we want this?

Using /proc/mounts (or similar) has problems:

 (1) Reading from it holds a global lock (namespace_sem) that prevents
     mounting and unmounting.  Lots of data is encoded and mangled into
     text whilst the lock is held, including superblock option strings and
     mount point paths.  This causes performance problems when there are a
     lot of mount objects in a system.

 (2) Even though namespace_sem is held during a read, reading the whole
     file isn't necessarily atomic with respect to mount-type operations.
     If a read isn't satisfied in one go, then it may return to userspace
     briefly and then continue reading some way into the file.  But changes
     can occur in the interval that may then go unseen.

 (3) Determining what has changed means parsing and comparing consecutive
     outputs of /proc/mounts.

 (4) Querying a specific mount or superblock means searching through
     /proc/mounts and searching by path or mount ID - but we might have an
     fd we want to query.

 (5) Whilst you can poll() it for events, it only tells you that something
     changed in the namespace, not what or whether you can even see the
     change.

To fix the notification issues, the preceding notifications patchset added
mount watch notifications whereby you can watch for notifications in a
specific mount subtree.  The notification messages include the ID(s) of the
affected mounts.

To support notifications, however, we need to be able to handle overruns in
the notification queue.  I added a number of event counters to struct
super_block and struct mount to allow you to pin down the changes, but
there needs to be a way to retrieve them.  Exposing them through /proc
would require adding yet another /proc/mounts-type file.  We could add
per-mount directories full of attributes in sysfs, but that has issues also
(see below).

Adding an extensible system call interface for retrieving filesystem
information also allows other things to be exposed:

 (1) Jeff Layton's error handling changes need a way to allow error event
     information to be retrieved.

 (2) Bits in masks returned by things like statx() and FS_IOC_GETFLAGS are
     actually 3-state { Set, Unset, Not supported }.  It could be useful to
     provide a way to expose information like this[*].

 (3) Limits of the numerical metadata values in a filesystem[*].

 (4) Filesystem capability information[*].  Filesystems don't all have the
     same capabilities, and even different instances may have different
     capabilities, particularly with network filesystems where the set of
     capabilities may be server-dependent.  Capabilities might even vary at
     file granularity - though possibly such information should be conveyed
     through statx() instead.

 (5) ID mapping/shifting tables in use for a superblock.

 (6) Filesystem-specific information.  I need something for AFS so that I
     can do pioctl()-emulation, thereby allowing me to implement certain of
     the AFS command line utilities that query state of a particular file.
     This could also have application for other filesystems, such as NFS,
     CIFS and ext4.

 [*] In a lot of cases these are probably invariant and can be memcpy'd
     from static data.

There's a further consideration: I want to make it possible to have
fsconfig(fd, FSCONFIG_CMD_CREATE) be intercepted by a container manager
such that the manager can supervise a mount attempted inside the container.
The manager would be given an fd pointing to the fs_context struct and
would then need some way to query it (fsinfo()) and modify it (fsconfig()).
This could also be used to arbitrate user-requested mounts when containers
are not in play.


================
DESIGN DECISIONS
================

 (1) Information is partitioned into sets of attributes.

 (2) Attribute IDs are integers as they're fast to compare.

 (3) Attribute values are typed (struct, list of structs, string, opaque
     blob).  The type is fixed for a particular attribute.

 (4) For structure types, the length is also a version.  New fields can be
     tacked onto the end.

 (5) When copying a versioned struct to userspace, the core handles a
     version mismatch by truncating or zero-padding the data as necessary.
     This is transparent to the filesystem.

 (6) The core handles all the buffering and buffer resizing.

 (7) The filesystem never gets any access to the userspace parameter buffer
     or result buffer.

 (8) "Meta" attributes can describe other attributes.


========
OVERVIEW
========

fsinfo() is a system call that allows information about the filesystem at a
particular path point to be queried as a set of attributes.

Attribute values are of four basic types:

 (1) Structure with version-dependent length (the length is the version).

 (2) Variable-length string.

 (3) List of structures (all the same length).

 (4) Opaque blob.

Attributes can have multiple values either as a sequence of values or a
sequence-of-sequences of values and all the values of a particular
attribute must be of the same type.  Values can be up to INT_MAX size,
subject to memory availability.

Note that the values of an attribute *are* allowed to vary between dentries
within a single superblock, depending on the specific dentry that you're
looking at, but the values still have to be of the type for that attribute.

I've tried to make the interface as light as possible: attribute IDs are
integers rather than strings, and the core does all the buffer allocation
and expansion and all the extensibility support work rather than leaving
that to the filesystems.  This also means that userspace pointers are not
exposed to the filesystem.


fsinfo() allows a variety of information to be retrieved about a filesystem
and the mount topology:

 (1) General superblock attributes:

     - Filesystem identifiers (UUID, volume label, device numbers, ...)
     - The limits on a filesystem's capabilities
     - Information on supported statx fields and attributes and IOC flags.
     - A variety of single-bit flags indicating supported capabilities.
     - Timestamp resolution and range.
     - The amount of space/free space in a filesystem (as statfs()).
     - Superblock notification counter.

 (2) Filesystem-specific superblock attributes:

     - Superblock-level timestamps.
     - Cell name, workgroup or other netfs grouping concept.
     - Server names and addresses.

 (3) VFS information:

     - Mount topology information.
     - Mount attributes.
     - Mount notification counter.
     - Mount point path.

 (4) Information about what the fsinfo() syscall itself supports, including
     the type and struct size of attributes.

The system is extensible:

 (1) New attributes can be added.  There is no requirement that a
     filesystem implement every attribute.  A helper function is provided
     to scan a list of attributes and a filesystem can have multiple such
     lists.

 (2) Structure attributes with version-dependent length can be made larger and
     have additional information tacked on the end, provided it keeps the
     layout of the existing fields.  If an older process asks for a shorter
     structure, it will only be given the bits it asks for.  If a newer
     process asks for a longer structure on an older kernel, the extra
     space will be set to 0.  In all cases, the size of the data actually
     available is returned.

     In essence, the size of a structure is that structure's version: a
     smaller size is an earlier version and a later version includes
     everything that the earlier version did.

 (3) New single-bit capability flags can be added.  This is a structure-typed
     attribute and, as such, (2) applies.  Any bits you wanted but the kernel
     doesn't support are automatically set to 0.
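
As an illustration of the size-is-version rule in (2), here is a minimal
userspace sketch.  It is a model under stated assumptions, not the code in
fs/fsinfo.c: copy_versioned(), struct attr_v1 and struct attr_v2 are
hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the copy-out rule; not the kernel implementation.
 * "have" is the struct size the kernel side knows about, "want" is the
 * size the caller asked for.  The size actually available is returned.
 */
static size_t copy_versioned(void *dst, size_t want,
			     const void *src, size_t have)
{
	size_t n = want < have ? want : have;

	assert(src != NULL);
	if (dst && want) {
		memcpy(dst, src, n);			/* older caller: truncate */
		memset((char *)dst + n, 0, want - n);	/* newer caller: zero-pad */
	}
	return have;
}

/* Two illustrative versions of one attribute; v2 appends a field to v1. */
struct attr_v1 { unsigned int a; };
struct attr_v2 { unsigned int a; unsigned int b; };
```

A caller built against attr_v1 gets only the leading field from a v2
producer, whilst a caller built against attr_v2 sees b == 0 from a v1
producer.  In both cases the return value tells the caller how much data
was really available.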

fsinfo() may be called like the following, for example:

	struct fsinfo_params params = {
		.at_flags	= AT_SYMLINK_NOFOLLOW,
		.flags		= FSINFO_FLAGS_QUERY_PATH,
		.request	= FSINFO_ATTR_AFS_SERVER_ADDRESSES,
		.Nth		= 2,
	};
	struct fsinfo_server_address address;
	len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc",
		     &params, sizeof(params),
		     &address, sizeof(address));

The above example would query an AFS filesystem to retrieve the address
list for the 3rd server, and:

	struct fsinfo_params params = {
		.at_flags	= AT_SYMLINK_NOFOLLOW,
		.flags		= FSINFO_FLAGS_QUERY_PATH,
		.request	= FSINFO_ATTR_NFS_SERVER_NAME,
	};
	char server_name[256];
	len = fsinfo(AT_FDCWD, "/home/dhowells/",
		     &params, sizeof(params),
		     &server_name, sizeof(server_name));

would retrieve the name of the NFS server as a string.

In future, I want to make fsinfo() capable of querying a context created by
fsopen() or fspick(), e.g.:

	fd = fsopen("ext4", 0);
	struct fsinfo_params params = {
		.flags		= FSINFO_FLAGS_QUERY_FSCONTEXT,
		.request	= FSINFO_ATTR_CONFIGURATION,
	};
	char buffer[65536];
	fsinfo(fd, NULL, &params, sizeof(params), &buffer, sizeof(buffer));

even if that context doesn't currently have a superblock attached.

The patches can be found here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git

on branch:

	fsinfo-core


===================
SIGNIFICANT CHANGES
===================

 ver #21:

 (*) Moved the mount event counters here from the mount notifications
     patchset.  Made the counters in the kernel atomic_long_t and the UAPI
     counters __u64.

 (*) Added Jeff Layton's patches to allow userspace to retrieve writeback
     error information through fsinfo().

 ver #20:

 (*) Changed MOUNT_PROPAGATION_SLAVE to MOUNT_PROPAGATION_DEPENDENT and
     renamed the fields in the fsinfo_mount_topology struct.  The
     MOUNT_PROPAGATION_* settings have been turned into an enum and will
     also be passed to mount_setattr().

 (*) Adjusted the Ext4 patch from feedback and removed the example status
     from it.

 (*) Dropped the NFS patch.

 (*) I've dropped the superblock notifications for now.

 ver #19:

 (*) Split FSINFO_ATTR_MOUNT_TOPOLOGY from FSINFO_ATTR_MOUNT_INFO.  The
     latter requires no locking as it looks no further than the mount
     object it's dealing with.  The topology attribute, however, has to
     take the namespace lock.  That said, the info attribute includes a
     counter that indicates how many times a mount object's position in the
     topology has changed.

 (*) A bit of patch rearrangement to put the mount topology-exposing
     attributes into one patch.

 (*) Pass both AT_* and RESOLVE_* flags to fsinfo() as suggested by Linus,
     rather than adding missing RESOLVE_* flags.

David
---
David Howells (15):
      fsinfo: Introduce a non-repeating system-unique superblock ID
      fsinfo: Add fsinfo() syscall to query filesystem information
      fsinfo: Provide a bitmap of the features a filesystem supports
      fsinfo: Allow retrieval of superblock devname, options and stats
      fsinfo: Allow fsinfo() to look up a mount object by ID
      fsinfo: Add a uniquifier ID to struct mount
      fsinfo: Allow mount information to be queried
      fsinfo: Allow mount topology and propagation info to be retrieved
      watch_queue: Mount event counters
      fsinfo: Provide notification overrun handling support
      fsinfo: sample: Mount listing program
      fsinfo: Add API documentation
      fsinfo: Add support for AFS
      fsinfo: Add support to ext4
      fsinfo: Add an attribute that lists all the visible mounts in a namespace

Jeff Layton (3):
      errseq: add a new errseq_scrape function
      vfs: allow fsinfo to fetch the current state of s_wb_err
      samples: add error state information to test-fsinfo.c


 Documentation/filesystems/fsinfo.rst        | 574 +++++++++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |   1 +
 arch/arm/tools/syscall.tbl                  |   1 +
 arch/arm64/include/asm/unistd.h             |   2 +-
 arch/arm64/include/asm/unistd32.h           |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1 +
 arch/s390/kernel/syscalls/syscall.tbl       |   1 +
 arch/sh/kernel/syscalls/syscall.tbl         |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1 +
 fs/Kconfig                                  |   7 +
 fs/Makefile                                 |   1 +
 fs/afs/internal.h                           |   1 +
 fs/afs/super.c                              | 216 ++++-
 fs/d_path.c                                 |   2 +-
 fs/ext4/Makefile                            |   1 +
 fs/ext4/ext4.h                              |   6 +
 fs/ext4/fsinfo.c                            |  97 +++
 fs/ext4/super.c                             |   3 +
 fs/fsinfo.c                                 | 748 +++++++++++++++++
 fs/internal.h                               |  15 +
 fs/mount.h                                  |   6 +
 fs/mount_notify.c                           |  10 +-
 fs/namespace.c                              | 427 +++++++++-
 include/linux/errseq.h                      |   1 +
 include/linux/fs.h                          |   4 +
 include/linux/fsinfo.h                      | 112 +++
 include/linux/syscalls.h                    |   4 +
 include/uapi/asm-generic/unistd.h           |   4 +-
 include/uapi/linux/fsinfo.h                 | 344 ++++++++
 include/uapi/linux/mount.h                  |  13 +-
 kernel/sys_ni.c                             |   1 +
 lib/errseq.c                                |  33 +-
 samples/vfs/Makefile                        |   6 +-
 samples/vfs/test-fsinfo.c                   | 883 ++++++++++++++++++++
 samples/vfs/test-mntinfo.c                  | 277 ++++++
 45 files changed, 3802 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/filesystems/fsinfo.rst
 create mode 100644 fs/ext4/fsinfo.c
 create mode 100644 fs/fsinfo.c
 create mode 100644 include/linux/fsinfo.h
 create mode 100644 include/uapi/linux/fsinfo.h
 create mode 100644 samples/vfs/test-fsinfo.c
 create mode 100644 samples/vfs/test-mntinfo.c




* [PATCH 01/18] fsinfo: Introduce a non-repeating system-unique superblock ID [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
@ 2020-08-03 13:36 ` David Howells
  2020-08-04  9:34   ` Miklos Szeredi
  2020-08-03 13:36 ` [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information " David Howells
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: David Howells @ 2020-08-03 13:36 UTC
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Introduce an (effectively) non-repeating system-unique superblock ID that
can be used to determine that two objects are in the same superblock
without needing to worry about the ID changing in the meantime (as is
possible with device IDs).

The counter could also be used to tag other features, such as mount
objects.
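
The allocation scheme can be modelled in userspace with C11 atomics.  This
is only a sketch: unique_counter and alloc_unique_id() are hypothetical
stand-ins for vfs_unique_counter and the atomic64_inc_return() call in the
patch below, and the wrap check is an addition made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel counter; starts at 0. */
static atomic_uint_fast64_t unique_counter;

/* Hand out the next ID.  The first caller gets 1; IDs never repeat
 * until the 64-bit counter wraps. */
static uint64_t alloc_unique_id(void)
{
	uint64_t id = (uint64_t)atomic_fetch_add(&unique_counter, 1) + 1;

	assert(id != 0);	/* a wrap would need 2^64 allocations */
	return id;
}
```

Because an ID is never handed out twice, two objects that report the same
ID must belong to the same superblock, which a recycled device number
cannot guarantee.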

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/internal.h      |    1 +
 fs/super.c         |    2 ++
 include/linux/fs.h |    3 +++
 3 files changed, 6 insertions(+)

diff --git a/fs/internal.h b/fs/internal.h
index 9b863a7bd708..ea60d864a8cb 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -103,6 +103,7 @@ extern struct file *alloc_empty_file_noaccount(int, const struct cred *);
 /*
  * super.c
  */
+extern atomic64_t vfs_unique_counter;
 extern int reconfigure_super(struct fs_context *);
 extern bool trylock_super(struct super_block *sb);
 extern struct super_block *user_get_super(dev_t);
diff --git a/fs/super.c b/fs/super.c
index 904459b35119..21ae8afeba3a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -44,6 +44,7 @@ static int thaw_super_locked(struct super_block *sb);
 
 static LIST_HEAD(super_blocks);
 static DEFINE_SPINLOCK(sb_lock);
+atomic64_t vfs_unique_counter; /* Unique identifier counter */
 
 static char *sb_writers_name[SB_FREEZE_LEVELS] = {
 	"sb_writers",
@@ -273,6 +274,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 		goto fail;
 	if (list_lru_init_memcg(&s->s_inode_lru, &s->s_shrink))
 		goto fail;
+	s->s_unique_id = atomic64_inc_return(&vfs_unique_counter);
 	return s;
 
 fail:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5abba86107d..28a29356eace 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1564,6 +1564,9 @@ struct super_block {
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
+
+	/* Superblock information */
+	u64			s_unique_id;
 } __randomize_layout;
 
 /* Helper functions so that in most cases filesystems will




* [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
  2020-08-03 13:36 ` [PATCH 01/18] fsinfo: Introduce a non-repeating system-unique superblock ID " David Howells
@ 2020-08-03 13:36 ` David Howells
  2020-08-04 10:16   ` Miklos Szeredi
                     ` (2 more replies)
  2020-08-03 13:36 ` [PATCH 03/18] fsinfo: Provide a bitmap of the features a filesystem supports " David Howells
                   ` (17 subsequent siblings)
  19 siblings, 3 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:36 UTC
  To: viro
  Cc: linux-api, dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add a system call to allow filesystem information to be queried.  A request
value can be given to indicate the desired attribute.  Support is provided
for enumerating multi-value attributes.

===============
NEW SYSTEM CALL
===============

The new system call looks like:

	int ret = fsinfo(int dfd,
			 const char *pathname,
			 const struct fsinfo_params *params,
			 size_t params_size,
			 void *result_buffer,
			 size_t result_buf_size);

The params parameter optionally points to a block of parameters:

	struct fsinfo_params {
		__u64	resolve_flags;
		__u32	at_flags;
		__u32	flags;
		__u32	request;
		__u32	Nth;
		__u32	Mth;
	};

If params is NULL, the default is that params->request is
FSINFO_ATTR_STATFS and all the other fields are 0.  params_size indicates
the size of the parameter struct.  If the parameter block is short compared
to what the kernel expects, the missing length will be set to 0; if the
parameter block is longer, an error will be given if the excess is not all
zeros.
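
The size rule for the parameter block can be sketched in userspace as
follows.  This is a model, not the syscall's code: struct params_model
mirrors the layout shown above, load_params() is a hypothetical helper, and
returning -E2BIG for a non-zero excess is an assumption (the text only says
that an error is given).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the parameter block layout shown above. */
struct params_model {
	uint64_t resolve_flags;
	uint32_t at_flags;
	uint32_t flags;
	uint32_t request;
	uint32_t Nth;
	uint32_t Mth;
};

/*
 * Hypothetical model of the size handling: a short block is zero-extended,
 * a long block is accepted only if the excess bytes are all zero.
 */
static int load_params(struct params_model *kparams,
		       const void *uparams, size_t usize)
{
	const unsigned char *p = uparams;
	size_t ksize = sizeof(*kparams);
	size_t i;

	assert(kparams && uparams);
	memset(kparams, 0, ksize);		/* missing tail reads as 0 */
	if (usize > ksize) {
		for (i = ksize; i < usize; i++)
			if (p[i])
				return -E2BIG;	/* assumed error code */
		usize = ksize;
	}
	memcpy(kparams, p, usize);
	return 0;
}
```

The effect is that an old binary passing a shorter block keeps working on a
newer kernel, and a new binary passing a longer block works on an older
kernel as long as it leaves the fields the kernel doesn't know about set to
zero.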

The object to be queried is specified as follows - part of params->flags
indicates the type of reference:

 (1) FSINFO_FLAGS_QUERY_PATH - dfd, pathname and at_flags indicate a
     filesystem object to query.

     There is no separate system call providing an analogue of lstat() -
     AT_SYMLINK_NOFOLLOW should be set in at_flags instead.
     AT_NO_AUTOMOUNT can also be used to allow an automount point to be
     queried without triggering it.

     RESOLVE_* flags can also be set in resolve_flags to further restrict
     the pathwalk.

 (2) FSINFO_FLAGS_QUERY_FD - dfd indicates a file descriptor pointing to
     the filesystem object to query.  pathname should be NULL.

 (3) FSINFO_FLAGS_QUERY_MOUNT - pathname indicates the numeric ID of the
     mountpoint to query as a string.  dfd is used to constrain which
     mounts can be accessed.  If dfd is AT_FDCWD, the mount must be within
     the subtree rooted at chroot, otherwise the mount must be within the
     subtree rooted at the directory specified by dfd.

 (4) In the future FSINFO_FLAGS_QUERY_FSCONTEXT will be added - dfd will
     indicate a context handle fd obtained from fsopen() or fspick(),
     allowing that to be queried before the target superblock is attached
     to the filesystem or even created.

params->request indicates the attribute/attributes to be queried.  This can
be one of:

	FSINFO_ATTR_STATFS		- statfs-style info
	FSINFO_ATTR_IDS			- Filesystem IDs
	FSINFO_ATTR_LIMITS		- Filesystem limits
	FSINFO_ATTR_SUPPORTS		- Support for statx, ioctl, etc.
	FSINFO_ATTR_TIMESTAMP_INFO	- Inode timestamp info
	FSINFO_ATTR_VOLUME_ID		- Volume ID (string)
	FSINFO_ATTR_VOLUME_UUID		- Volume UUID
	FSINFO_ATTR_VOLUME_NAME		- Volume name (string)
	FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO - Information about attr Nth
	FSINFO_ATTR_FSINFO_ATTRIBUTES	- List of supported attrs

Some attributes (such as the servers backing a network filesystem) can have
multiple values.  These can be enumerated by setting params->Nth and
params->Mth to 0, 1, ... until ENODATA is returned.
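
The enumeration loop can be sketched like this.  Since fsinfo() has no libc
wrapper, the example runs against a mock: mock_query_nth() and its server
list are hypothetical, and it returns a negative errno directly rather than
setting errno the way a real syscall wrapper would.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fsinfo() on one multi-valued string attribute. */
static const char *const mock_servers[] = { "alpha", "beta", "gamma" };

static long mock_query_nth(unsigned int nth, char *buf, size_t size)
{
	size_t count = sizeof(mock_servers) / sizeof(mock_servers[0]);
	size_t len;

	if (nth >= count)
		return -ENODATA;		/* ran off the end of the list */
	len = strlen(mock_servers[nth]) + 1;	/* include the NUL */
	if (buf && size)
		memcpy(buf, mock_servers[nth], len < size ? len : size);
	return (long)len;			/* full size, even if truncated */
}

/* Walk Nth = 0, 1, ... and return the number of values, or a -errno. */
static long enumerate_values(long (*query)(unsigned int, char *, size_t))
{
	char buf[64];
	unsigned int nth;
	long ret;

	assert(query != NULL);
	for (nth = 0; ; nth++) {
		ret = query(nth, buf, sizeof(buf));
		if (ret == -ENODATA)
			return (long)nth;	/* normal end of enumeration */
		if (ret < 0)
			return ret;		/* any other error is fatal */
	}
}
```

For an attribute that is a sequence-of-sequences, the same loop would be
nested, with Mth stepped inside each Nth.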

result_buffer and result_buf_size point to the reply buffer.  The buffer is
filled up to the specified size, even if this means truncating the reply.
The size of the full reply is returned, irrespective of the amount of data
that was copied.  In future versions, this will allow extra fields to be
tacked on to the end of the reply, but anyone not expecting them will only
get the subset they're expecting.  If either result_buffer or
result_buf_size is 0, no copy will take place and the data size will be
returned.
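
One way to use the returned size is to probe first and then allocate.  The
following is a sketch against a mock, not a real fsinfo() call:
mock_query_value(), its canned value and fetch_whole_value() are all
hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for fsinfo() on a single string attribute. */
static const char mock_value[] = "a-rather-long-volume-name";

static long mock_query_value(void *buf, size_t size)
{
	size_t len = sizeof(mock_value);	/* includes the NUL */

	if (buf && size)
		memcpy(buf, mock_value, len < size ? len : size);
	return (long)len;			/* always the full size */
}

/* Probe for the size, allocate, then fetch.  The caller frees the result. */
static char *fetch_whole_value(size_t *len_out)
{
	long need = mock_query_value(NULL, 0);	/* size-only query */
	char *buf;

	assert(len_out != NULL);
	if (need <= 0)
		return NULL;
	buf = malloc((size_t)need);
	if (!buf)
		return NULL;
	if (mock_query_value(buf, (size_t)need) != need) {
		free(buf);			/* value changed size; give up */
		return NULL;
	}
	*len_out = (size_t)need;
	return buf;
}
```

With a real attribute the value could grow between the two calls, so a
caller would probably want to retry with the larger size rather than give
up as this sketch does.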

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
---

 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 
 arch/arm64/include/asm/unistd32.h           |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 fs/Kconfig                                  |    7 
 fs/Makefile                                 |    1 
 fs/fsinfo.c                                 |  596 +++++++++++++++++++++++++
 include/linux/fs.h                          |    4 
 include/linux/fsinfo.h                      |   74 +++
 include/linux/syscalls.h                    |    4 
 include/uapi/asm-generic/unistd.h           |    4 
 include/uapi/linux/fsinfo.h                 |  189 ++++++++
 kernel/sys_ni.c                             |    1 
 samples/vfs/Makefile                        |    2 
 samples/vfs/test-fsinfo.c                   |  646 +++++++++++++++++++++++++++
 29 files changed, 1545 insertions(+), 3 deletions(-)
 create mode 100644 fs/fsinfo.c
 create mode 100644 include/linux/fsinfo.h
 create mode 100644 include/uapi/linux/fsinfo.h
 create mode 100644 samples/vfs/test-fsinfo.c

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index b6cf8403da35..984abd1ac058 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	faccessat2			sys_faccessat2
 550	common	watch_mount			sys_watch_mount
+551	common	fsinfo				sys_fsinfo
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 27cc1f53f4a0..bd791f91f5bb 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -453,3 +453,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index b3b2019f8d16..86a9d7b3eabe 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		441
+#define __NR_compat_syscalls		442
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 4f9cf98cdf0f..bd78eb2c487a 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -887,6 +887,8 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
 #define __NR_watch_mount 440
 __SYSCALL(__NR_watch_mount, sys_watch_mount)
+#define __NR_fsinfo 441
+__SYSCALL(__NR_fsinfo, sys_fsinfo)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index fc6d87903781..09d144487b7d 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -360,3 +360,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index c671aa0e4d25..1bdc26af3c54 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 65cc53f129ef..fb8543122904 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -445,3 +445,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 7f034a239930..b8362bd6bd4a 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -378,3 +378,4 @@
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	faccessat2			sys_faccessat2
 440	n32	watch_mount			sys_watch_mount
+441	n32	fsinfo				sys_fsinfo
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index d39b90de3642..60ca4091d378 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -354,3 +354,4 @@
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	faccessat2			sys_faccessat2
 440	n64	watch_mount			sys_watch_mount
+441	n64	fsinfo				sys_fsinfo
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 09f426cb45b1..07aea9379ca0 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -427,3 +427,4 @@
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	faccessat2			sys_faccessat2
 440	o32	watch_mount			sys_watch_mount
+441	o32	fsinfo				sys_fsinfo
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 52ff3454baa1..f8060767f11a 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -437,3 +437,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 10b7ed3c7a1b..3036bf1336d2 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -529,3 +529,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 86f317bf52df..c0a111fdb3ce 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
 440	common	watch_mount		sys_watch_mount			sys_watch_mount
+441	common	fsinfo			sys_fsinfo			sys_fsinfo
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 0bb0f0b372c7..03b55c32441f 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 369ab65c1e9a..a0144db9fb8c 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -485,3 +485,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index e760ba92c58d..edf90a2be0b9 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -444,3 +444,4 @@
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
 440	i386	watch_mount		sys_watch_mount
+441	i386	fsinfo			sys_fsinfo
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5b58621d4f75..ab0eda639d67 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -361,6 +361,7 @@
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
 440	common	watch_mount		sys_watch_mount
+441	common	fsinfo			sys_fsinfo
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 5b28ee39f70f..979013890caf 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -410,3 +410,4 @@
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
 440	common	watch_mount			sys_watch_mount
+441	common	fsinfo				sys_fsinfo
diff --git a/fs/Kconfig b/fs/Kconfig
index 1a55e56d5c54..df76451ab49a 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -15,6 +15,13 @@ config VALIDATE_FS_PARSER
 	  Enable this to perform validation of the parameter description for a
 	  filesystem when it is registered.
 
+config FSINFO
+	bool "Enable the fsinfo() system call"
+	help
+	  Enable the file system information querying system call to allow
+	  comprehensive information to be retrieved about a filesystem,
+	  superblock or mount object.
+
 if BLOCK
 
 config FS_IOMAP
diff --git a/fs/Makefile b/fs/Makefile
index dd0d87e2ef19..93a7f8047585 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_COREDUMP)		+= coredump.o
 obj-$(CONFIG_SYSCTL)		+= drop_caches.o
 
 obj-$(CONFIG_FHANDLE)		+= fhandle.o
+obj-$(CONFIG_FSINFO)		+= fsinfo.o
 obj-y				+= iomap/
 
 obj-y				+= quota/
diff --git a/fs/fsinfo.c b/fs/fsinfo.c
new file mode 100644
index 000000000000..7d9c73e9cbde
--- /dev/null
+++ b/fs/fsinfo.c
@@ -0,0 +1,596 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information query.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#include <linux/syscalls.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/statfs.h>
+#include <linux/security.h>
+#include <linux/uaccess.h>
+#include <linux/fsinfo.h>
+#include <uapi/linux/mount.h>
+#include "internal.h"
+
+/**
+ * fsinfo_opaque - Store opaque blob as an fsinfo attribute value.
+ * @s: The blob to store (may be NULL)
+ * @ctx: The parameter context
+ * @len: The length of the blob
+ */
+int fsinfo_opaque(const void *s, struct fsinfo_context *ctx, unsigned int len)
+{
+	void *p = ctx->buffer;
+	int ret = 0;
+
+	if (s) {
+		if (!ctx->want_size_only)
+			memcpy(p, s, len);
+		ret = len;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(fsinfo_opaque);
+
+/**
+ * fsinfo_string - Store a NUL-terminated string as an fsinfo attribute value.
+ * @s: The string to store (may be NULL)
+ * @ctx: The parameter context
+ */
+int fsinfo_string(const char *s, struct fsinfo_context *ctx)
+{
+	if (!s)
+		return 1;
+	return fsinfo_opaque(s, ctx, min_t(size_t, strlen(s) + 1, ctx->buf_size));
+}
+EXPORT_SYMBOL(fsinfo_string);
+
+/*
+ * Get basic filesystem stats from statfs.
+ */
+static int fsinfo_generic_statfs(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_statfs *p = ctx->buffer;
+	struct kstatfs buf;
+	int ret;
+
+	ret = vfs_statfs(path, &buf);
+	if (ret < 0)
+		return ret;
+
+	p->f_blocks.lo	= buf.f_blocks;
+	p->f_bfree.lo	= buf.f_bfree;
+	p->f_bavail.lo	= buf.f_bavail;
+	p->f_files.lo	= buf.f_files;
+	p->f_ffree.lo	= buf.f_ffree;
+	p->f_favail.lo	= buf.f_ffree;
+	p->f_bsize	= buf.f_bsize;
+	p->f_frsize	= buf.f_frsize;
+	return sizeof(*p);
+}
+
+static int fsinfo_generic_ids(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_ids *p = ctx->buffer;
+	struct super_block *sb;
+	struct kstatfs buf;
+	int ret;
+
+	ret = vfs_statfs(path, &buf);
+	if (ret < 0 && ret != -ENOSYS)
+		return ret;
+	if (ret == 0)
+		memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
+
+	sb = path->dentry->d_sb;
+	p->f_fstype	= sb->s_magic;
+	p->f_dev_major	= MAJOR(sb->s_dev);
+	p->f_dev_minor	= MINOR(sb->s_dev);
+	p->f_sb_id	= sb->s_unique_id;
+	strlcpy(p->f_fs_name, sb->s_type->name, sizeof(p->f_fs_name));
+	return sizeof(*p);
+}
+
+int fsinfo_generic_limits(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_limits *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	p->max_file_size.hi	= 0;
+	p->max_file_size.lo	= sb->s_maxbytes;
+	p->max_ino.hi		= 0;
+	p->max_ino.lo		= UINT_MAX;
+	p->max_hard_links	= sb->s_max_links;
+	p->max_uid		= UINT_MAX;
+	p->max_gid		= UINT_MAX;
+	p->max_projid		= UINT_MAX;
+	p->max_filename_len	= NAME_MAX;
+	p->max_symlink_len	= PATH_MAX;
+	p->max_xattr_name_len	= XATTR_NAME_MAX;
+	p->max_xattr_body_len	= XATTR_SIZE_MAX;
+	p->max_dev_major	= 0xffffff;
+	p->max_dev_minor	= 0xff;
+	return sizeof(*p);
+}
+EXPORT_SYMBOL(fsinfo_generic_limits);
+
+int fsinfo_generic_supports(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_supports *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	p->stx_mask = STATX_BASIC_STATS;
+	if (sb->s_d_op && sb->s_d_op->d_automount)
+		p->stx_attributes |= STATX_ATTR_AUTOMOUNT;
+	return sizeof(*p);
+}
+EXPORT_SYMBOL(fsinfo_generic_supports);
+
+static const struct fsinfo_timestamp_info fsinfo_default_timestamp_info = {
+	.atime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.mtime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.ctime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.btime = {
+		.minimum	= S64_MIN,
+		.maximum	= S64_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+};
+
+int fsinfo_generic_timestamp_info(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_timestamp_info *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+	s8 exponent;
+
+	*p = fsinfo_default_timestamp_info;
+
+	if (sb->s_time_gran < 1000000000) {
+		if (sb->s_time_gran < 1000)
+			exponent = -9;
+		else if (sb->s_time_gran < 1000000)
+			exponent = -6;
+		else
+			exponent = -3;
+
+		p->atime.gran_exponent = exponent;
+		p->mtime.gran_exponent = exponent;
+		p->ctime.gran_exponent = exponent;
+		p->btime.gran_exponent = exponent;
+	}
+
+	return sizeof(*p);
+}
+EXPORT_SYMBOL(fsinfo_generic_timestamp_info);
+
+static int fsinfo_generic_volume_uuid(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_volume_uuid *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	memcpy(p, &sb->s_uuid, sizeof(*p));
+	return sizeof(*p);
+}
+
+static int fsinfo_generic_volume_id(struct path *path, struct fsinfo_context *ctx)
+{
+	return fsinfo_string(path->dentry->d_sb->s_id, ctx);
+}
+
+static const struct fsinfo_attribute fsinfo_common_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
+
+	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	(void *)123UL),
+	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
+	{}
+};
+
+/*
+ * Determine an attribute's minimum buffer size and, if the buffer is large
+ * enough, get the attribute value.
+ */
+static int fsinfo_get_this_attribute(struct path *path,
+				     struct fsinfo_context *ctx,
+				     const struct fsinfo_attribute *attr)
+{
+	int buf_size;
+
+	if (ctx->Nth != 0 && !(attr->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)))
+		return -ENODATA;
+	if (ctx->Mth != 0 && !(attr->flags & FSINFO_FLAGS_NM))
+		return -ENODATA;
+
+	switch (attr->type) {
+	case FSINFO_TYPE_VSTRUCT:
+		ctx->clear_tail = true;
+		buf_size = attr->size;
+		break;
+	case FSINFO_TYPE_STRING:
+	case FSINFO_TYPE_OPAQUE:
+	case FSINFO_TYPE_LIST:
+		buf_size = 4096;
+		break;
+	default:
+		return -ENOPKG;
+	}
+
+	if (ctx->buf_size < buf_size)
+		return buf_size;
+
+	return attr->get(path, ctx);
+}
+
+static void fsinfo_attributes_insert(struct fsinfo_context *ctx,
+				     const struct fsinfo_attribute *attr)
+{
+	__u32 *p = ctx->buffer;
+	unsigned int i;
+
+	if (ctx->usage >= ctx->buf_size ||
+	    ctx->buf_size - ctx->usage < sizeof(__u32)) {
+		ctx->usage += sizeof(__u32);
+		return;
+	}
+
+	for (i = 0; i < ctx->usage / sizeof(__u32); i++)
+		if (p[i] == attr->attr_id)
+			return;
+
+	p[i] = attr->attr_id;
+	ctx->usage += sizeof(__u32);
+}
+
+static int fsinfo_list_attributes(struct path *path,
+				  struct fsinfo_context *ctx,
+				  const struct fsinfo_attribute *attributes)
+{
+	const struct fsinfo_attribute *a;
+
+	for (a = attributes; a->get; a++)
+		fsinfo_attributes_insert(ctx, a);
+	return -EOPNOTSUPP; /* We want to go through all the lists */
+}
+
+static int fsinfo_get_attribute_info(struct path *path,
+				     struct fsinfo_context *ctx,
+				     const struct fsinfo_attribute *attributes)
+{
+	const struct fsinfo_attribute *a;
+	struct fsinfo_attribute_info *p = ctx->buffer;
+
+	if (!ctx->buf_size)
+		return sizeof(*p);
+
+	for (a = attributes; a->get; a++) {
+		if (a->attr_id == ctx->Nth) {
+			p->attr_id	= a->attr_id;
+			p->type		= a->type;
+			p->flags	= a->flags;
+			p->size		= a->size;
+			p->size		= a->size;
+			return sizeof(*p);
+		}
+	}
+	return -EOPNOTSUPP; /* We want to go through all the lists */
+}
+
+/**
+ * fsinfo_get_attribute - Look up and handle an attribute
+ * @path: The object to query
+ * @ctx: Parameters to define a request and place to store result
+ * @attributes: List of attributes to search.
+ *
+ * Look through a list of attributes for one that matches the requested
+ * attribute then call the handler for it.
+ */
+int fsinfo_get_attribute(struct path *path, struct fsinfo_context *ctx,
+			 const struct fsinfo_attribute *attributes)
+{
+	const struct fsinfo_attribute *a;
+
+	switch (ctx->requested_attr) {
+	case FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO:
+		return fsinfo_get_attribute_info(path, ctx, attributes);
+	case FSINFO_ATTR_FSINFO_ATTRIBUTES:
+		return fsinfo_list_attributes(path, ctx, attributes);
+	default:
+		for (a = attributes; a->get; a++)
+			if (a->attr_id == ctx->requested_attr)
+				return fsinfo_get_this_attribute(path, ctx, a);
+		return -EOPNOTSUPP;
+	}
+}
+EXPORT_SYMBOL(fsinfo_get_attribute);
+
+/**
+ * fsinfo_call - Handle an fsinfo attribute generically
+ * @path: The object to query
+ * @ctx: Parameters to define a request and place to store result
+ */
+static int fsinfo_call(struct path *path, struct fsinfo_context *ctx)
+{
+	int ret;
+
+	if (path->dentry->d_sb->s_op->fsinfo) {
+		ret = path->dentry->d_sb->s_op->fsinfo(path, ctx);
+		if (ret != -EOPNOTSUPP)
+			return ret;
+	}
+	ret = fsinfo_get_attribute(path, ctx, fsinfo_common_attributes);
+	if (ret != -EOPNOTSUPP)
+		return ret;
+
+	switch (ctx->requested_attr) {
+	case FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO:
+		return -ENODATA;
+	case FSINFO_ATTR_FSINFO_ATTRIBUTES:
+		return ctx->usage;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+/**
+ * vfs_fsinfo - Retrieve filesystem information
+ * @path: The object to query
+ * @ctx: Parameters to define a request and place to store result
+ *
+ * Get an attribute on a filesystem or an object within a filesystem.  The
+ * filesystem attribute to be queried is indicated by @ctx->requested_attr, and
+ * if it's a multi-valued attribute, the particular value is selected by
+ * @ctx->Nth and then @ctx->Mth.
+ *
+ * For common attributes, a value may be fabricated if it is not supported by
+ * the filesystem.
+ *
+ * On success, the size of the attribute's value is returned (0 is a valid
+ * size).  A buffer will have been allocated and will be pointed to by
+ * @ctx->buffer.  The caller must free this with kvfree().
+ *
+ * Errors can also be returned: -ENOMEM if a buffer cannot be allocated, -EPERM
+ * or -EACCES if permission is denied by the LSM, -EOPNOTSUPP if an attribute
+ * doesn't exist for the specified object or -ENODATA if the attribute exists,
+ * but the Nth,Mth value does not exist.  -EMSGSIZE indicates that the value is
+ * unmanageable internally and -ENOPKG indicates other internal failure.
+ *
+ * Errors such as -EIO may also come from attempts to access media or servers
+ * to obtain the requested information if it's not immediately to hand.
+ *
+ * [*] Note that the caller may set @ctx->want_size_only if it only wants the
+ *     size of the value and not the data.  If this is set, a buffer may not be
+ *     allocated under some circumstances.  This is intended for size query by
+ *     userspace.
+ *
+ * [*] Note that @ctx->clear_tail will be returned set if the data should be
+ *     padded out with zeros when writing it to userspace.
+ */
+static int vfs_fsinfo(struct path *path, struct fsinfo_context *ctx)
+{
+	struct dentry *dentry = path->dentry;
+	int ret;
+
+	ret = security_sb_statfs(dentry);
+	if (ret)
+		return ret;
+
+	/* Call the handler to find out the buffer size required. */
+	ctx->buf_size = 0;
+	ret = fsinfo_call(path, ctx);
+	if (ret < 0 || ctx->want_size_only)
+		return ret;
+	ctx->buf_size = ret;
+
+	do {
+		/* Allocate a buffer of the requested size. */
+		if (ctx->buf_size > INT_MAX)
+			return -EMSGSIZE;
+		ctx->buffer = kvzalloc(ctx->buf_size, GFP_KERNEL);
+		if (!ctx->buffer)
+			return -ENOMEM;
+
+		ctx->usage = 0;
+		ctx->skip = 0;
+		ret = fsinfo_call(path, ctx);
+		if (IS_ERR_VALUE((long)ret))
+			return ret;
+		if ((unsigned int)ret <= ctx->buf_size)
+			return ret; /* It fitted */
+
+		/* We need to resize the buffer */
+		ctx->buf_size = roundup(ret, PAGE_SIZE);
+		kvfree(ctx->buffer);
+		ctx->buffer = NULL;
+	} while (!signal_pending(current));
+
+	return -ERESTARTSYS;
+}
+
+static int vfs_fsinfo_path(int dfd, const char __user *pathname,
+			   const struct fsinfo_params *up,
+			   struct fsinfo_context *ctx)
+{
+	struct path path;
+	unsigned lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
+	int ret = -EINVAL;
+
+	if (up->resolve_flags & ~VALID_RESOLVE_FLAGS)
+		return -EINVAL;
+	if (up->at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
+			     AT_EMPTY_PATH))
+		return -EINVAL;
+
+	if (up->resolve_flags & RESOLVE_NO_XDEV)
+		lookup_flags |= LOOKUP_NO_XDEV;
+	if (up->resolve_flags & RESOLVE_NO_MAGICLINKS)
+		lookup_flags |= LOOKUP_NO_MAGICLINKS;
+	if (up->resolve_flags & RESOLVE_NO_SYMLINKS)
+		lookup_flags |= LOOKUP_NO_SYMLINKS;
+	if (up->resolve_flags & RESOLVE_BENEATH)
+		lookup_flags |= LOOKUP_BENEATH;
+	if (up->resolve_flags & RESOLVE_IN_ROOT)
+		lookup_flags |= LOOKUP_IN_ROOT;
+	if (up->at_flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (up->at_flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (up->at_flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+retry:
+	ret = user_path_at(dfd, pathname, lookup_flags, &path);
+	if (ret)
+		goto out;
+
+	ret = vfs_fsinfo(&path, ctx);
+	path_put(&path);
+	if (retry_estale(ret, lookup_flags)) {
+		lookup_flags |= LOOKUP_REVAL;
+		goto retry;
+	}
+out:
+	return ret;
+}
+
+static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_context *ctx)
+{
+	struct fd f = fdget_raw(fd);
+	int ret = -EBADF;
+
+	if (f.file) {
+		ret = vfs_fsinfo(&f.file->f_path, ctx);
+		fdput(f);
+	}
+	return ret;
+}
+
+/**
+ * sys_fsinfo - System call to get filesystem information
+ * @dfd: Base directory to pathwalk from or fd referring to filesystem.
+ * @pathname: Filesystem to query or NULL.
+ * @params: Parameters to define request (NULL: FSINFO_ATTR_STATFS).
+ * @params_size: Size of parameter buffer.
+ * @result_buffer: Result buffer.
+ * @result_buf_size: Size of result buffer.
+ *
+ * Get information on a filesystem.  The filesystem attribute to be queried is
+ * indicated by @params->request, and some of the attributes can have multiple
+ * values, indexed by @params->Nth and @params->Mth.  If @params is NULL,
+ * then the 0th FSINFO_ATTR_STATFS attribute is queried.  If an attribute does
+ * not exist, EOPNOTSUPP is returned; if the Nth,Mth value does not exist,
+ * ENODATA is returned.
+ *
+ * On success, the size of the attribute's value is returned.  If
+ * @result_buf_size is 0 or @result_buffer is NULL, only the size is returned.
+ * If the size of the value is larger than @result_buf_size, it will be
+ * truncated by the copy.  If the size of the value is smaller than
+ * @result_buf_size then the excess buffer space will be cleared.  The full
+ * size of the value will be returned, irrespective of how much data is
+ * actually placed in the buffer.
+ */
+SYSCALL_DEFINE6(fsinfo,
+		int, dfd,
+		const char __user *, pathname,
+		const struct fsinfo_params __user *, params,
+		size_t, params_size,
+		void __user *, result_buffer,
+		size_t, result_buf_size)
+{
+	struct fsinfo_context ctx;
+	struct fsinfo_params user_params = {};	/* Read by vfs_fsinfo_path() even if params is NULL */
+	unsigned int result_size;
+	void *r;
+	int ret;
+
+	if ((!params &&  params_size) ||
+	    ( params && !params_size) ||
+	    (!result_buffer &&  result_buf_size) ||
+	    ( result_buffer && !result_buf_size))
+		return -EINVAL;
+	if (result_buf_size > UINT_MAX)
+		return -EOVERFLOW;
+
+	memset(&ctx, 0, sizeof(ctx));
+	ctx.requested_attr	= FSINFO_ATTR_STATFS;
+	ctx.flags		= FSINFO_FLAGS_QUERY_PATH;
+	ctx.want_size_only	= (result_buf_size == 0);
+
+	if (params) {
+		ret = copy_struct_from_user(&user_params, sizeof(user_params),
+					    params, params_size);
+		if (ret < 0)
+			return ret;
+		if (user_params.flags & ~FSINFO_FLAGS_QUERY_MASK)
+			return -EINVAL;
+		ctx.flags = user_params.flags;
+		ctx.requested_attr = user_params.request;
+		ctx.Nth = user_params.Nth;
+		ctx.Mth = user_params.Mth;
+	}
+
+	switch (ctx.flags & FSINFO_FLAGS_QUERY_MASK) {
+	case FSINFO_FLAGS_QUERY_PATH:
+		ret = vfs_fsinfo_path(dfd, pathname, &user_params, &ctx);
+		break;
+	case FSINFO_FLAGS_QUERY_FD:
+		if (pathname)
+			return -EINVAL;
+		ret = vfs_fsinfo_fd(dfd, &ctx);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (ret < 0)
+		goto error;
+
+	r = ctx.buffer + ctx.skip;
+	result_size = min_t(size_t, ret, result_buf_size);
+	if (result_size > 0 &&
+	    copy_to_user(result_buffer, r, result_size) != 0) {
+		ret = -EFAULT;
+		goto error;
+	}
+
+	/* Clear any part of the buffer that we won't fill if we're putting a
+	 * struct in there.  Strings, opaque objects and arrays are expected to
+	 * be variable length.
+	 */
+	if (ctx.clear_tail &&
+	    result_buf_size > result_size &&
+	    clear_user(result_buffer + result_size,
+		       result_buf_size - result_size) != 0) {
+		ret = -EFAULT;
+		goto error;
+	}
+
+error:
+	kvfree(ctx.buffer);
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 28a29356eace..3284f497de0a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -68,6 +68,7 @@ struct fsverity_info;
 struct fsverity_operations;
 struct fs_context;
 struct fs_parameter_spec;
+struct fsinfo_context;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -1963,6 +1964,9 @@ struct super_operations {
 	int (*thaw_super) (struct super_block *);
 	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
+#ifdef CONFIG_FSINFO
+	int (*fsinfo)(struct path *, struct fsinfo_context *);
+#endif
 	int (*remount_fs) (struct super_block *, int *, char *);
 	void (*umount_begin) (struct super_block *);
 
diff --git a/include/linux/fsinfo.h b/include/linux/fsinfo.h
new file mode 100644
index 000000000000..a811d69b02ff
--- /dev/null
+++ b/include/linux/fsinfo.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Filesystem information query
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#ifndef _LINUX_FSINFO_H
+#define _LINUX_FSINFO_H
+
+#ifdef CONFIG_FSINFO
+
+#include <uapi/linux/fsinfo.h>
+
+struct path;
+
+#define FSINFO_NORMAL_ATTR_MAX_SIZE 4096
+
+struct fsinfo_context {
+	__u32		flags;		/* [in] FSINFO_FLAGS_* */
+	__u32		requested_attr;	/* [in] What is being asked for */
+	__u32		Nth;		/* [in] Instance of it (some may have multiple) */
+	__u32		Mth;		/* [in] Subinstance */
+	bool		want_size_only;	/* [in] Just want to know the size, not the data */
+	bool		clear_tail;	/* [out] T if tail of buffer should be cleared */
+	unsigned int	skip;		/* [out] Number of bytes to skip in buffer */
+	unsigned int	usage;		/* [tmp] Amount of buffer used (if large) */
+	unsigned int	buf_size;	/* [tmp] Size of ->buffer[] */
+	void		*buffer;	/* [out] The reply buffer */
+};
+
+/*
+ * A filesystem information attribute definition.
+ */
+struct fsinfo_attribute {
+	unsigned int		attr_id;	/* The ID of the attribute */
+	enum fsinfo_value_type	type:8;		/* The type of the attribute's value(s) */
+	unsigned int		flags:8;
+	unsigned int		size:16;	/* - Value size (FSINFO_TYPE_VSTRUCT/LIST) */
+	int (*get)(struct path *path, struct fsinfo_context *params);
+};
+
+#define __FSINFO(A, T, S, G, F) \
+	{ .attr_id = A, .type = T, .flags = F, .size = S, .get = G }
+
+#define _FSINFO(A, T, S, G)	__FSINFO(A, T, S, G, 0)
+#define _FSINFO_N(A, T, S, G)	__FSINFO(A, T, S, G, FSINFO_FLAGS_N)
+#define _FSINFO_NM(A, T, S, G)	__FSINFO(A, T, S, G, FSINFO_FLAGS_NM)
+
+#define _FSINFO_VSTRUCT(A,S,G)	  _FSINFO   (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G)
+#define _FSINFO_VSTRUCT_N(A,S,G)  _FSINFO_N (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G)
+#define _FSINFO_VSTRUCT_NM(A,S,G) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), G)
+
+#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G)
+#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G)
+#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G)
+#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, 0, G)
+#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, 0, G)
+#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, 0, G)
+#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, 0, G)
+#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G)
+#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G)
+
+extern int fsinfo_opaque(const void *, struct fsinfo_context *, unsigned int);
+extern int fsinfo_string(const char *, struct fsinfo_context *);
+extern int fsinfo_generic_timestamp_info(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_supports(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_limits(struct path *, struct fsinfo_context *);
+extern int fsinfo_get_attribute(struct path *, struct fsinfo_context *,
+				const struct fsinfo_attribute *);
+
+#endif /* CONFIG_FSINFO */
+
+#endif /* _LINUX_FSINFO_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 88d03fd627ab..e31ad49af4c3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -47,6 +47,7 @@ struct stat64;
 struct statfs;
 struct statfs64;
 struct statx;
+struct fsinfo_params;
 struct __sysctl_args;
 struct sysinfo;
 struct timespec;
@@ -1007,6 +1008,9 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
 asmlinkage long sys_watch_mount(int dfd, const char __user *path,
 				unsigned int at_flags, int watch_fd, int watch_id);
+asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
+			   const struct fsinfo_params __user *params, size_t params_size,
+			   void __user *result_buffer, size_t result_buf_size);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index fcdca8c7d30a..9e38f611ab56 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -859,9 +859,11 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
 #define __NR_watch_mount 440
 __SYSCALL(__NR_watch_mount, sys_watch_mount)
+#define __NR_fsinfo 441
+__SYSCALL(__NR_fsinfo, sys_fsinfo)
 
 #undef __NR_syscalls
-#define __NR_syscalls 441
+#define __NR_syscalls 442
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
new file mode 100644
index 000000000000..65892239ba86
--- /dev/null
+++ b/include/uapi/linux/fsinfo.h
@@ -0,0 +1,189 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* fsinfo() definitions.
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+#ifndef _UAPI_LINUX_FSINFO_H
+#define _UAPI_LINUX_FSINFO_H
+
+#include <linux/types.h>
+#include <linux/socket.h>
+#include <linux/openat2.h>
+
+/*
+ * The filesystem attributes that can be requested.  Note that some attributes
+ * may have multiple instances which can be switched in the parameter block.
+ */
+#define FSINFO_ATTR_STATFS		0x00	/* statfs()-style state */
+#define FSINFO_ATTR_IDS			0x01	/* Filesystem IDs */
+#define FSINFO_ATTR_LIMITS		0x02	/* Filesystem limits */
+#define FSINFO_ATTR_SUPPORTS		0x03	/* What's supported in statx, iocflags, ... */
+#define FSINFO_ATTR_TIMESTAMP_INFO	0x04	/* Inode timestamp info */
+#define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
+#define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
+#define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
+
+#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
+#define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
+
+/*
+ * Optional fsinfo() parameter structure.
+ *
+ * If this is not given, it is assumed that FSINFO_ATTR_STATFS instance 0,0 is
+ * desired.
+ */
+struct fsinfo_params {
+	__u64	resolve_flags;	/* RESOLVE_* flags */
+	__u32	at_flags;	/* AT_* flags */
+	__u32	flags;		/* Flags controlling fsinfo() specifically */
+#define FSINFO_FLAGS_QUERY_MASK	0x0007 /* What object should fsinfo() query? */
+#define FSINFO_FLAGS_QUERY_PATH	0x0000 /* - path, specified by dirfd,pathname,AT_EMPTY_PATH */
+#define FSINFO_FLAGS_QUERY_FD	0x0001 /* - fd specified by dirfd */
+	__u32	request;	/* ID of requested attribute */
+	__u32	Nth;		/* Instance of it (some may have multiple) */
+	__u32	Mth;		/* Subinstance of Nth instance */
+};
+
+enum fsinfo_value_type {
+	FSINFO_TYPE_VSTRUCT	= 0,	/* Version-lengthed struct (up to 4096 bytes) */
+	FSINFO_TYPE_STRING	= 1,	/* NUL-term var-length string (up to 4095 chars) */
+	FSINFO_TYPE_OPAQUE	= 2,	/* Opaque blob (unlimited size) */
+	FSINFO_TYPE_LIST	= 3,	/* List of ints/structs (unlimited size) */
+};
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO).
+ *
+ * This gives information about the attributes supported by fsinfo for the
+ * given path.
+ */
+struct fsinfo_attribute_info {
+	unsigned int		attr_id;	/* The ID of the attribute */
+	enum fsinfo_value_type	type;		/* The type of the attribute's value(s) */
+	unsigned int		flags;
+#define FSINFO_FLAGS_N		0x01		/* - Attr has a set of values */
+#define FSINFO_FLAGS_NM		0x02		/* - Attr has a set of sets of values */
+	unsigned int		size;		/* - Value size (FSINFO_TYPE_VSTRUCT/FSINFO_TYPE_LIST) */
+};
+
+#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO__STRUCT struct fsinfo_attribute_info
+#define FSINFO_ATTR_FSINFO_ATTRIBUTES__STRUCT __u32
+
+struct fsinfo_u128 {
+#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+	__u64	hi;
+	__u64	lo;
+#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
+	__u64	lo;
+	__u64	hi;
+#endif
+};
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_STATFS).
+ * - This gives extended filesystem information.
+ */
+struct fsinfo_statfs {
+	struct fsinfo_u128 f_blocks;	/* Total number of blocks in fs */
+	struct fsinfo_u128 f_bfree;	/* Total number of free blocks */
+	struct fsinfo_u128 f_bavail;	/* Number of free blocks available to ordinary user */
+	struct fsinfo_u128 f_files;	/* Total number of file nodes in fs */
+	struct fsinfo_u128 f_ffree;	/* Number of free file nodes */
+	struct fsinfo_u128 f_favail;	/* Number of file nodes available to ordinary user */
+	__u64	f_bsize;		/* Optimal block size */
+	__u64	f_frsize;		/* Fragment size */
+};
+
+#define FSINFO_ATTR_STATFS__STRUCT struct fsinfo_statfs
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_IDS).
+ *
+ * List of basic identifiers as are normally found in statfs().
+ */
+struct fsinfo_ids {
+	char	f_fs_name[15 + 1];	/* Filesystem name */
+	__u64	f_fsid;			/* Short 64-bit Filesystem ID (as statfs) */
+	__u64	f_sb_id;		/* Internal superblock ID for sbnotify()/mntnotify() */
+	__u32	f_fstype;		/* Filesystem type from linux/magic.h [uncond] */
+	__u32	f_dev_major;		/* As st_dev_* from struct statx [uncond] */
+	__u32	f_dev_minor;
+	__u32	__padding[1];
+};
+
+#define FSINFO_ATTR_IDS__STRUCT struct fsinfo_ids
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_LIMITS).
+ *
+ * List of supported filesystem limits.
+ */
+struct fsinfo_limits {
+	struct fsinfo_u128 max_file_size;	/* Maximum file size */
+	struct fsinfo_u128 max_ino;		/* Maximum inode number */
+	__u64	max_uid;			/* Maximum UID supported */
+	__u64	max_gid;			/* Maximum GID supported */
+	__u64	max_projid;			/* Maximum project ID supported */
+	__u64	max_hard_links;			/* Maximum number of hard links on a file */
+	__u64	max_xattr_body_len;		/* Maximum xattr content length */
+	__u32	max_xattr_name_len;		/* Maximum xattr name length */
+	__u32	max_filename_len;		/* Maximum filename length */
+	__u32	max_symlink_len;		/* Maximum symlink content length */
+	__u32	max_dev_major;			/* Maximum device major representable */
+	__u32	max_dev_minor;			/* Maximum device minor representable */
+	__u32	__padding[1];
+};
+
+#define FSINFO_ATTR_LIMITS__STRUCT struct fsinfo_limits
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_SUPPORTS).
+ *
+ * What's supported in various masks, such as statx() attribute and mask bits
+ * and IOC flags.
+ */
+struct fsinfo_supports {
+	__u64	stx_attributes;		/* What statx::stx_attributes are supported */
+	__u32	stx_mask;		/* What statx::stx_mask bits are supported */
+	__u32	fs_ioc_getflags;	/* What FS_IOC_GETFLAGS may return */
+	__u32	fs_ioc_setflags_set;	/* What FS_IOC_SETFLAGS may set */
+	__u32	fs_ioc_setflags_clear;	/* What FS_IOC_SETFLAGS may clear */
+	__u32	fs_ioc_fsgetxattr_xflags; /* What FS_IOC_FSGETXATTR[A] may return in fsx_xflags */
+	__u32	fs_ioc_fssetxattr_xflags_set; /* What FS_IOC_FSSETXATTR may set in fsx_xflags */
+	__u32	fs_ioc_fssetxattr_xflags_clear; /* What FS_IOC_FSSETXATTR may clear in fsx_xflags */
+	__u32	win_file_attrs;		/* What DOS/Windows FILE_* attributes are supported */
+};
+
+#define FSINFO_ATTR_SUPPORTS__STRUCT struct fsinfo_supports
+
+struct fsinfo_timestamp_one {
+	__s64	minimum;	/* Minimum timestamp value in seconds */
+	__s64	maximum;	/* Maximum timestamp value in seconds */
+	__u16	gran_mantissa;	/* Granularity(secs) = mant * 10^exp */
+	__s8	gran_exponent;
+	__u8	__padding[5];
+};
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_TIMESTAMP_INFO).
+ */
+struct fsinfo_timestamp_info {
+	struct fsinfo_timestamp_one	atime;	/* Access time */
+	struct fsinfo_timestamp_one	mtime;	/* Modification time */
+	struct fsinfo_timestamp_one	ctime;	/* Change time */
+	struct fsinfo_timestamp_one	btime;	/* Birth/creation time */
+};
+
+#define FSINFO_ATTR_TIMESTAMP_INFO__STRUCT struct fsinfo_timestamp_info
+
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_VOLUME_UUID).
+ */
+struct fsinfo_volume_uuid {
+	__u8	uuid[16];
+};
+
+#define FSINFO_ATTR_VOLUME_UUID__STRUCT struct fsinfo_volume_uuid
+
+#endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3e1c5c9d2efe..f72a9e4ddc9a 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents);
 COND_SYSCALL(io_uring_setup);
 COND_SYSCALL(io_uring_enter);
 COND_SYSCALL(io_uring_register);
+COND_SYSCALL(fsinfo);
 
 /* fs/xattr.c */
 
diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index 00b6824f9237..d63af5106fc2 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-userprogs := test-fsmount test-statx
+userprogs := test-fsinfo test-fsmount test-statx
 always-y := $(userprogs)
 
 userccflags += -I usr/include
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
new file mode 100644
index 000000000000..934b25399ffe
--- /dev/null
+++ b/samples/vfs/test-fsinfo.c
@@ -0,0 +1,646 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Test the fsinfo() system call
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define _GNU_SOURCE
+#define _ATFILE_SOURCE
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <ctype.h>
+#include <errno.h>
+#include <time.h>
+#include <math.h>
+#include <fcntl.h>
+#include <sys/syscall.h>
+#include <linux/fsinfo.h>
+#include <linux/socket.h>
+#include <sys/stat.h>
+#include <arpa/inet.h>
+
+#ifndef __NR_fsinfo
+#define __NR_fsinfo -1
+#endif
+
+static bool debug = 0;
+static bool list_last;
+
+static __attribute__((unused))
+ssize_t fsinfo(int dfd, const char *filename,
+	       struct fsinfo_params *params, size_t params_size,
+	       void *result_buffer, size_t result_buf_size)
+{
+	return syscall(__NR_fsinfo, dfd, filename,
+		       params, params_size,
+		       result_buffer, result_buf_size);
+}
+
+struct fsinfo_attribute {
+	unsigned int		attr_id;
+	enum fsinfo_value_type	type;
+	unsigned int		size;
+	const char		*name;
+	void (*dump)(void *reply, unsigned int size);
+};
+
+static const struct fsinfo_attribute fsinfo_attributes[];
+
+static ssize_t get_fsinfo(const char *, const char *, struct fsinfo_params *, void **);
+
+static void dump_hex(FILE *f, unsigned char *data, int from, int to)
+{
+	unsigned offset, col = 0;
+	bool print_offset = true;
+
+	for (offset = from; offset < to; offset++) {
+		if (print_offset) {
+			fprintf(f, "%04x: ", offset);
+			print_offset = 0;
+		}
+		fprintf(f, "%02x", data[offset]);
+		col++;
+		if ((col & 3) == 0) {
+			if ((col & 15) == 0) {
+				fprintf(f, "\n");
+				print_offset = 1;
+			} else {
+				fprintf(f, " ");
+			}
+		}
+	}
+
+	if (!print_offset)
+		fprintf(f, "\n");
+}
+
+static void dump_attribute_info(void *reply, unsigned int size)
+{
+	struct fsinfo_attribute_info *attr_info = reply;
+	const struct fsinfo_attribute *attr;
+	char type[32], val_size[32];
+
+	switch (attr_info->type) {
+	case FSINFO_TYPE_VSTRUCT:	strcpy(type, "V-STRUCT");	break;
+	case FSINFO_TYPE_STRING:	strcpy(type, "STRING");		break;
+	case FSINFO_TYPE_OPAQUE:	strcpy(type, "OPAQUE");		break;
+	case FSINFO_TYPE_LIST:		strcpy(type, "LIST");		break;
+	default:
+		sprintf(type, "type-%x", attr_info->type);
+		break;
+	}
+
+	if (attr_info->flags & FSINFO_FLAGS_N)
+		strcat(type, " x N");
+	else if (attr_info->flags & FSINFO_FLAGS_NM)
+		strcat(type, " x NM");
+
+	for (attr = fsinfo_attributes; attr->name; attr++)
+		if (attr->attr_id == attr_info->attr_id)
+			break;
+
+	if (attr_info->size)
+		sprintf(val_size, "%u", attr_info->size);
+	else
+		strcpy(val_size, "-");
+
+	printf("%8x %-12s %08x %5s %s\n",
+	       attr_info->attr_id,
+	       type,
+	       attr_info->flags,
+	       val_size,
+	       attr->name ? attr->name : "");
+}
+
+static void dump_fsinfo_generic_statfs(void *reply, unsigned int size)
+{
+	struct fsinfo_statfs *f = reply;
+
+	printf("\n");
+	printf("\tblocks       : n=%llu fr=%llu av=%llu\n",
+	       (unsigned long long)f->f_blocks.lo,
+	       (unsigned long long)f->f_bfree.lo,
+	       (unsigned long long)f->f_bavail.lo);
+
+	printf("\tfiles        : n=%llu fr=%llu av=%llu\n",
+	       (unsigned long long)f->f_files.lo,
+	       (unsigned long long)f->f_ffree.lo,
+	       (unsigned long long)f->f_favail.lo);
+	printf("\tbsize        : %llu\n",
+	       (unsigned long long)f->f_bsize);
+	printf("\tfrsize       : %llu\n",
+	       (unsigned long long)f->f_frsize);
+}
+
+static void dump_fsinfo_generic_ids(void *reply, unsigned int size)
+{
+	struct fsinfo_ids *f = reply;
+
+	printf("\n");
+	printf("\tdev          : %02x:%02x\n", f->f_dev_major, f->f_dev_minor);
+	printf("\tfs           : type=%x name=%s\n", f->f_fstype, f->f_fs_name);
+	printf("\tfsid         : %llx\n", (unsigned long long)f->f_fsid);
+	printf("\tsbid         : %llx\n", (unsigned long long)f->f_sb_id);
+}
+
+static void dump_fsinfo_generic_limits(void *reply, unsigned int size)
+{
+	struct fsinfo_limits *f = reply;
+
+	printf("\n");
+	printf("\tmax file size: %llx%016llx\n",
+	       (unsigned long long)f->max_file_size.hi,
+	       (unsigned long long)f->max_file_size.lo);
+	printf("\tmax ino      : %llx%016llx\n",
+	       (unsigned long long)f->max_ino.hi,
+	       (unsigned long long)f->max_ino.lo);
+	printf("\tmax ids      : u=%llx g=%llx p=%llx\n",
+	       (unsigned long long)f->max_uid,
+	       (unsigned long long)f->max_gid,
+	       (unsigned long long)f->max_projid);
+	printf("\tmax dev      : maj=%x min=%x\n",
+	       f->max_dev_major, f->max_dev_minor);
+	printf("\tmax links    : %llx\n",
+	       (unsigned long long)f->max_hard_links);
+	printf("\tmax xattr    : n=%x b=%llx\n",
+	       f->max_xattr_name_len,
+	       (unsigned long long)f->max_xattr_body_len);
+	printf("\tmax len      : file=%x sym=%x\n",
+	       f->max_filename_len, f->max_symlink_len);
+}
+
+static void dump_fsinfo_generic_supports(void *reply, unsigned int size)
+{
+	struct fsinfo_supports *f = reply;
+
+	printf("\n");
+	printf("\tstx_attr     : %llx\n", (unsigned long long)f->stx_attributes);
+	printf("\tstx_mask     : %x\n", f->stx_mask);
+	printf("\tfs_ioc_*flags: get=%x set=%x clr=%x\n",
+	       f->fs_ioc_getflags, f->fs_ioc_setflags_set, f->fs_ioc_setflags_clear);
+	printf("\tfs_ioc_*xattr: fsx_xflags: get=%x set=%x clr=%x\n",
+	       f->fs_ioc_fsgetxattr_xflags,
+	       f->fs_ioc_fssetxattr_xflags_set,
+	       f->fs_ioc_fssetxattr_xflags_clear);
+	printf("\twin_fattrs   : %x\n", f->win_file_attrs);
+}
+
+static void print_time(struct fsinfo_timestamp_one *t, char stamp)
+{
+	printf("\t%ctime       : gran=%uE%d range=%llx-%llx\n",
+	       stamp,
+	       t->gran_mantissa, t->gran_exponent,
+	       (long long)t->minimum, (long long)t->maximum);
+}
+
+static void dump_fsinfo_generic_timestamp_info(void *reply, unsigned int size)
+{
+	struct fsinfo_timestamp_info *f = reply;
+
+	printf("\n");
+	print_time(&f->atime, 'a');
+	print_time(&f->mtime, 'm');
+	print_time(&f->ctime, 'c');
+	print_time(&f->btime, 'b');
+}
+
+static void dump_fsinfo_generic_volume_uuid(void *reply, unsigned int size)
+{
+	struct fsinfo_volume_uuid *f = reply;
+
+	printf("%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x"
+	       "-%02x%02x%02x%02x%02x%02x\n",
+	       f->uuid[ 0], f->uuid[ 1],
+	       f->uuid[ 2], f->uuid[ 3],
+	       f->uuid[ 4], f->uuid[ 5],
+	       f->uuid[ 6], f->uuid[ 7],
+	       f->uuid[ 8], f->uuid[ 9],
+	       f->uuid[10], f->uuid[11],
+	       f->uuid[12], f->uuid[13],
+	       f->uuid[14], f->uuid[15]);
+}
+
+static void dump_string(void *reply, unsigned int size)
+{
+	char *s = reply, *p;
+	bool nl = false, last_nl = false;
+
+	p = s;
+	if (size >= 4096) {
+		size = 4096;
+		p[4092] = '.';
+		p[4093] = '.';
+		p[4094] = '.';
+		p[4095] = 0;
+	} else {
+		p[size] = 0;
+	}
+
+	for (p = s; *p; p++) {
+		if (*p == '\n') {
+			last_nl = nl = true;
+			continue;
+		}
+		last_nl = false;
+		if (!isprint(*p) && *p != '\t')
+			*p = '?';
+	}
+
+	if (nl)
+		putchar('\n');
+	printf("%s", s);
+	if (!last_nl)
+		putchar('\n');
+}
+
+#define dump_fsinfo_meta_attribute_info		(void *)0x123
+#define dump_fsinfo_meta_attributes		(void *)0x123
+
+/*
+ *
+ */
+#define __FSINFO(A, T, S, G, F, N)					\
+	{ .attr_id = A, .type = T, .size = S, .name = N, .dump = dump_##G }
+
+#define _FSINFO(A,T,S,G,N)	__FSINFO(A, T, S, G, 0, N)
+#define _FSINFO_N(A,T,S,G,N)	__FSINFO(A, T, S, G, FSINFO_FLAGS_N, N)
+#define _FSINFO_NM(A,T,S,G,N)	__FSINFO(A, T, S, G, FSINFO_FLAGS_NM, N)
+
+#define _FSINFO_VSTRUCT(A,S,G,N)    _FSINFO   (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G, N)
+#define _FSINFO_VSTRUCT_N(A,S,G,N)  _FSINFO_N (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G, N)
+#define _FSINFO_VSTRUCT_NM(A,S,G,N) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), G, N)
+
+#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G, #A)
+#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G, #A)
+#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G, #A)
+#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, 0, G, #A)
+#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, 0, G, #A)
+#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, 0, G, #A)
+#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, 0, G, #A)
+#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G, #A)
+#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G, #A)
+
+static const struct fsinfo_attribute fsinfo_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		string),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	string),
+	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_meta_attribute_info),
+	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_meta_attributes),
+	{}
+};
+
+static __attribute__((noreturn))
+void bad_value(const char *what,
+	       struct fsinfo_params *params,
+	       const struct fsinfo_attribute *attr,
+	       const struct fsinfo_attribute_info *attr_info,
+	       void *reply, unsigned int size)
+{
+	printf("\n");
+	fprintf(stderr, "%s %s{%u}{%u} t=%x f=%x s=%x\n",
+		what, attr->name, params->Nth, params->Mth,
+		attr_info->type, attr_info->flags, attr_info->size);
+	fprintf(stderr, "size=%u\n", size);
+	dump_hex(stderr, reply, 0, size);
+	exit(1);
+}
+
+static void dump_value(unsigned int attr_id,
+		       const struct fsinfo_attribute *attr,
+		       const struct fsinfo_attribute_info *attr_info,
+		       void *reply, unsigned int size)
+{
+	if (!attr || !attr->dump) {
+		printf("<no dumper>\n");
+		return;
+	}
+
+	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
+		printf("<short data %u/%u>\n", size, attr->size);
+		return;
+	}
+
+	attr->dump(reply, size);
+}
+
+static void dump_list(unsigned int attr_id,
+		      const struct fsinfo_attribute *attr,
+		      const struct fsinfo_attribute_info *attr_info,
+		      void *reply, unsigned int size)
+{
+	size_t elem_size = attr_info->size;
+	unsigned int ix = 0;
+
+	printf("\n");
+	if (!attr || !attr->dump) {
+		printf("<no dumper>\n");
+		return;
+	}
+
+	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
+		printf("<short data %u/%u>\n", size, attr->size);
+		return;
+	}
+
+	list_last = false;
+	while (size >= elem_size) {
+		printf("\t[%02x] ", ix);
+		if (size == elem_size)
+			list_last = true;
+		attr->dump(reply, size);
+		reply += elem_size;
+		size -= elem_size;
+		ix++;
+	}
+}
+
+/*
+ * Call fsinfo, expanding the buffer as necessary.
+ */
+static ssize_t get_fsinfo(const char *file, const char *name,
+			  struct fsinfo_params *params, void **_r)
+{
+	ssize_t ret;
+	size_t buf_size = 4096;
+	void *r = NULL;
+
+	for (;;) {
+		r = realloc(r, buf_size);	/* Grows the previous buffer, if any */
+		if (!r) {
+			perror("realloc");
+			exit(1);
+		}
+		memset(r, 0xbd, buf_size);
+
+		errno = 0;
+		ret = fsinfo(AT_FDCWD, file, params, sizeof(*params), r, buf_size - 1);
+		if (ret == -1)
+			goto error;
+
+		if (ret <= buf_size - 1)
+			break;
+		buf_size = (ret + 4096) & ~(4096 - 1);	/* Leave room for the NUL */
+	}
+
+	if (debug)
+		printf("fsinfo(%s,%s,%u,%u) = %zd\n",
+		       file, name, params->Nth, params->Mth, ret);
+
+	((char *)r)[ret] = 0;
+	*_r = r;
+	return ret;
+
+error:
+	*_r = NULL;
+	free(r);
+	if (debug)
+		printf("fsinfo(%s,%s,%u,%u) = %m\n",
+		       file, name, params->Nth, params->Mth);
+	return ret;
+}
+
+/*
+ * Try one subinstance of an attribute.
+ */
+static int try_one(const char *file, struct fsinfo_params *params,
+		   const struct fsinfo_attribute_info *attr_info, bool raw)
+{
+	const struct fsinfo_attribute *attr;
+	const char *name;
+	size_t size = 4096;
+	char namebuf[32];
+	void *r;
+
+	for (attr = fsinfo_attributes; attr->name; attr++) {
+		if (attr->attr_id == params->request) {
+			name = attr->name;
+			if (strncmp(name, "fsinfo_generic_", 15) == 0)
+				name += 15;
+			goto found;
+		}
+	}
+
+	sprintf(namebuf, "<unknown-%x>", params->request);
+	name = namebuf;
+	attr = NULL;
+
+found:
+	size = get_fsinfo(file, name, params, &r);
+
+	if (size == -1) {
+		if (errno == ENODATA) {
+			if (!(attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) &&
+			    params->Nth == 0 && params->Mth == 0)
+				bad_value("Unexpected ENODATA",
+					  params, attr, attr_info, r, size);
+			free(r);
+			return (params->Mth == 0) ? 2 : 1;
+		}
+		if (errno == EOPNOTSUPP) {
+			if (params->Nth > 0 || params->Mth > 0)
+				bad_value("Should return ENODATA",
+					  params, attr, attr_info, r, size);
+			//printf("\e[33m%s\e[m: <not supported>\n",
+			//       fsinfo_attr_names[attr]);
+			free(r);
+			return 2;
+		}
+		perror(file);
+		exit(1);
+	}
+
+	if (raw) {
+		if (size > 4096)
+			size = 4096;
+		dump_hex(stdout, r, 0, size);
+		free(r);
+		return 0;
+	}
+
+	switch (attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) {
+	case 0:
+		printf("\e[33m%s\e[m: ", name);
+		break;
+	case FSINFO_FLAGS_N:
+		printf("\e[33m%s{%u}\e[m: ", name, params->Nth);
+		break;
+	case FSINFO_FLAGS_NM:
+		printf("\e[33m%s{%u,%u}\e[m: ", name, params->Nth, params->Mth);
+		break;
+	}
+
+	switch (attr_info->type) {
+	case FSINFO_TYPE_STRING:
+		if (size == 0 || ((char *)r)[size - 1] != 0)
+			bad_value("Unterminated string",
+				  params, attr, attr_info, r, size);
+	case FSINFO_TYPE_VSTRUCT:
+	case FSINFO_TYPE_OPAQUE:
+		dump_value(params->request, attr, attr_info, r, size);
+		free(r);
+		return 0;
+
+	case FSINFO_TYPE_LIST:
+		dump_list(params->request, attr, attr_info, r, size);
+		free(r);
+		return 0;
+
+	default:
+		bad_value("Fishy type", params, attr, attr_info, r, size);
+	}
+}
+
+static int cmp_u32(const void *a, const void *b)
+{
+	return (*(const __u32 *)a > *(const __u32 *)b) - (*(const __u32 *)a < *(const __u32 *)b);
+}
+
+/*
+ *
+ */
+int main(int argc, char **argv)
+{
+	struct fsinfo_attribute_info attr_info;
+	struct fsinfo_params params = {
+		.at_flags	= AT_SYMLINK_NOFOLLOW,
+		.flags		= FSINFO_FLAGS_QUERY_PATH,
+	};
+	unsigned int *attrs, ret, nr, i;
+	bool meta = false;
+	int raw = 0, opt, Nth, Mth;
+
+	while ((opt = getopt(argc, argv, "Madlr")) != -1) {
+		switch (opt) {
+		case 'M':
+			meta = true;
+			continue;
+		case 'a':
+			params.at_flags |= AT_NO_AUTOMOUNT;
+			params.flags = FSINFO_FLAGS_QUERY_PATH;
+			continue;
+		case 'd':
+			debug = true;
+			continue;
+		case 'l':
+			params.at_flags &= ~AT_SYMLINK_NOFOLLOW;
+			params.flags = FSINFO_FLAGS_QUERY_PATH;
+			continue;
+		case 'r':
+			raw = 1;
+			continue;
+		}
+		break;
+	}
+
+	argc -= optind;
+	argv += optind;
+
+	if (argc != 1) {
+		printf("Format: test-fsinfo [-Madlr] <path>\n");
+		exit(2);
+	}
+
+	/* Retrieve a list of supported attribute IDs */
+	params.request = FSINFO_ATTR_FSINFO_ATTRIBUTES;
+	params.Nth = 0;
+	params.Mth = 0;
+	ret = get_fsinfo(argv[0], "attributes", &params, (void **)&attrs);
+	if (ret == -1) {
+		fprintf(stderr, "Unable to get attribute list: %m\n");
+		exit(1);
+	}
+
+	if (ret % sizeof(attrs[0])) {
+		fprintf(stderr, "Bad length of attribute list (0x%x)\n", ret);
+		exit(2);
+	}
+
+	nr = ret / sizeof(attrs[0]);
+	qsort(attrs, nr, sizeof(attrs[0]), cmp_u32);
+
+	if (meta) {
+		printf("ATTR ID  TYPE         FLAGS    SIZE  NAME\n");
+		printf("======== ============ ======== ===== =========\n");
+		for (i = 0; i < nr; i++) {
+			params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
+			params.Nth = attrs[i];
+			params.Mth = 0;
+			ret = fsinfo(AT_FDCWD, argv[0],
+				     &params, sizeof(params),
+				     &attr_info, sizeof(attr_info));
+			if (ret == -1) {
+				fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
+				exit(1);
+			}
+
+			dump_attribute_info(&attr_info, ret);
+		}
+		exit(0);
+	}
+
+	for (i = 0; i < nr; i++) {
+		params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
+		params.Nth = attrs[i];
+		params.Mth = 0;
+		ret = fsinfo(AT_FDCWD, argv[0],
+			     &params, sizeof(params),
+			     &attr_info, sizeof(attr_info));
+		if (ret == -1) {
+			fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
+			exit(1);
+		}
+
+		if (attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
+		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTES)
+			continue;
+
+		if (attrs[i] != attr_info.attr_id) {
+			fprintf(stderr, "ID for %03x returned %03x\n",
+				attrs[i], attr_info.attr_id);
+			break;
+		}
+		Nth = 0;
+		do {
+			Mth = 0;
+			do {
+				params.request = attrs[i];
+				params.Nth = Nth;
+				params.Mth = Mth;
+
+				switch (try_one(argv[0], &params, &attr_info, raw)) {
+				case 0:
+					continue;
+				case 1:
+					goto done_M;
+				case 2:
+					goto done_N;
+				}
+			} while (++Mth < 100);
+
+		done_M:
+			if (Mth >= 100) {
+				fprintf(stderr, "Fishy: Mth %x[%u][%u]\n", attrs[i], Nth, Mth);
+				break;
+			}
+
+		} while (++Nth < 100);
+
+	done_N:
+		if (Nth >= 100) {
+			fprintf(stderr, "Fishy: Nth %x[%u]\n", attrs[i], Nth);
+			break;
+		}
+	}
+
+	return 0;
+}




* [PATCH 03/18] fsinfo: Provide a bitmap of the features a filesystem supports [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
  2020-08-03 13:36 ` [PATCH 01/18] fsinfo: Introduce a non-repeating system-unique superblock ID " David Howells
  2020-08-03 13:36 ` [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information " David Howells
@ 2020-08-03 13:36 ` David Howells
  2020-08-03 13:37 ` [PATCH 04/18] fsinfo: Allow retrieval of superblock devname, options and stats " David Howells
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:36 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Provide a bitmap of features that a filesystem may provide for the path
being queried.  Features include such things as:

 (1) The general class of filesystem, such as kernel-interface,
     block-based, flash-based, network-based.

 (2) Supported inode features, such as which timestamps are supported,
     whether simple numeric user, group or project IDs are supported and
     whether user identification is actually more complex behind the
     scenes.

 (3) Supported volume features, such as whether the volume has a UUID, a
     name or a filesystem ID.

 (4) Supported filesystem features, such as what types of file are
     supported, whether sparse files, extended attributes and quotas are
     supported.

 (5) Supported interface features, such as whether locking and leases are
     supported, what open flags are honoured and how i_version is managed.

For some filesystems, this may be an immutable set and can just be memcpy'd
into the reply buffer.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                 |   34 +++++++++++++++++++++
 include/linux/fsinfo.h      |   38 +++++++++++++++++++++++
 include/uapi/linux/fsinfo.h |   68 ++++++++++++++++++++++++++++++++++++++++++
 samples/vfs/test-fsinfo.c   |   70 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 210 insertions(+)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index 7d9c73e9cbde..79c222d465d8 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -131,6 +131,39 @@ int fsinfo_generic_supports(struct path *path, struct fsinfo_context *ctx)
 }
 EXPORT_SYMBOL(fsinfo_generic_supports);
 
+int fsinfo_generic_features(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_features *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+
+	fsinfo_init_features(p);
+	if (sb->s_mtd)
+		fsinfo_set_feature(p, FSINFO_FEAT_IS_FLASH_FS);
+	else if (sb->s_bdev)
+		fsinfo_set_feature(p, FSINFO_FEAT_IS_BLOCK_FS);
+
+	if (sb->s_quota_types & QTYPE_MASK_USR)
+		fsinfo_set_feature(p, FSINFO_FEAT_USER_QUOTAS);
+	if (sb->s_quota_types & QTYPE_MASK_GRP)
+		fsinfo_set_feature(p, FSINFO_FEAT_GROUP_QUOTAS);
+	if (sb->s_quota_types & QTYPE_MASK_PRJ)
+		fsinfo_set_feature(p, FSINFO_FEAT_PROJECT_QUOTAS);
+	if (sb->s_d_op && sb->s_d_op->d_automount)
+		fsinfo_set_feature(p, FSINFO_FEAT_AUTOMOUNTS);
+	if (sb->s_id[0])
+		fsinfo_set_feature(p, FSINFO_FEAT_VOLUME_ID);
+	if (sb->s_flags & SB_MANDLOCK)
+		fsinfo_set_feature(p, FSINFO_FEAT_MAND_LOCKS);
+	if (sb->s_flags & SB_POSIXACL)
+		fsinfo_set_feature(p, FSINFO_FEAT_HAS_ACL);
+
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_ATIME);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_CTIME);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_MTIME);
+	return sizeof(*p);
+}
+EXPORT_SYMBOL(fsinfo_generic_features);
+
 static const struct fsinfo_timestamp_info fsinfo_default_timestamp_info = {
 	.atime = {
 		.minimum	= S64_MIN,
@@ -206,6 +239,7 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
 
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	(void *)123UL),
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
diff --git a/include/linux/fsinfo.h b/include/linux/fsinfo.h
index a811d69b02ff..517edd5a2791 100644
--- a/include/linux/fsinfo.h
+++ b/include/linux/fsinfo.h
@@ -68,6 +68,44 @@ extern int fsinfo_generic_supports(struct path *, struct fsinfo_context *);
 extern int fsinfo_generic_limits(struct path *, struct fsinfo_context *);
 extern int fsinfo_get_attribute(struct path *, struct fsinfo_context *,
 				const struct fsinfo_attribute *);
+extern int fsinfo_generic_features(struct path *, struct fsinfo_context *);
+
+static inline void fsinfo_init_features(struct fsinfo_features *p)
+{
+	p->nr_features = FSINFO_FEAT__NR;
+}
+
+static inline void fsinfo_set_feature(struct fsinfo_features *p,
+				      enum fsinfo_feature feature)
+{
+	p->features[feature / 8] |= 1 << (feature % 8);
+}
+
+static inline void fsinfo_clear_feature(struct fsinfo_features *p,
+					enum fsinfo_feature feature)
+{
+	p->features[feature / 8] &= ~(1 << (feature % 8));
+}
+
+/**
+ * fsinfo_set_unix_features - Set standard UNIX features.
+ * @p: The features mask to alter
+ */
+static inline void fsinfo_set_unix_features(struct fsinfo_features *p)
+{
+	fsinfo_set_feature(p, FSINFO_FEAT_UIDS);
+	fsinfo_set_feature(p, FSINFO_FEAT_GIDS);
+	fsinfo_set_feature(p, FSINFO_FEAT_DIRECTORIES);
+	fsinfo_set_feature(p, FSINFO_FEAT_SYMLINKS);
+	fsinfo_set_feature(p, FSINFO_FEAT_HARD_LINKS);
+	fsinfo_set_feature(p, FSINFO_FEAT_DEVICE_FILES);
+	fsinfo_set_feature(p, FSINFO_FEAT_UNIX_SPECIALS);
+	fsinfo_set_feature(p, FSINFO_FEAT_SPARSE);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_ATIME);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_CTIME);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_MTIME);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_INODE_NUMBERS);
+}
 
 #endif /* CONFIG_FSINFO */
 
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 65892239ba86..b8b2c836267b 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -23,6 +23,7 @@
 #define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
 #define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
 #define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
+#define FSINFO_ATTR_FEATURES		0x08	/* Filesystem features (bits) */
 
 #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
@@ -157,6 +158,73 @@ struct fsinfo_supports {
 
 #define FSINFO_ATTR_SUPPORTS__STRUCT struct fsinfo_supports
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_FEATURES).
+ *
+ * Bitmask of those filesystem features that can be rendered as single bits.
+ */
+enum fsinfo_feature {
+	FSINFO_FEAT_IS_KERNEL_FS	= 0,	/* fs is kernel-special filesystem */
+	FSINFO_FEAT_IS_BLOCK_FS		= 1,	/* fs is block-based filesystem */
+	FSINFO_FEAT_IS_FLASH_FS		= 2,	/* fs is flash filesystem */
+	FSINFO_FEAT_IS_NETWORK_FS	= 3,	/* fs is network filesystem */
+	FSINFO_FEAT_IS_AUTOMOUNTER_FS	= 4,	/* fs is automounter special filesystem */
+	FSINFO_FEAT_IS_MEMORY_FS	= 5,	/* fs is memory-based filesystem */
+	FSINFO_FEAT_AUTOMOUNTS		= 6,	/* fs supports automounts */
+	FSINFO_FEAT_ADV_LOCKS		= 7,	/* fs supports advisory file locking */
+	FSINFO_FEAT_MAND_LOCKS		= 8,	/* fs supports mandatory file locking */
+	FSINFO_FEAT_LEASES		= 9,	/* fs supports file leases */
+	FSINFO_FEAT_UIDS		= 10,	/* fs supports numeric uids */
+	FSINFO_FEAT_GIDS		= 11,	/* fs supports numeric gids */
+	FSINFO_FEAT_PROJIDS		= 12,	/* fs supports numeric project ids */
+	FSINFO_FEAT_STRING_USER_IDS	= 13,	/* fs supports string user identifiers */
+	FSINFO_FEAT_GUID_USER_IDS	= 14,	/* fs supports GUID user identifiers */
+	FSINFO_FEAT_WINDOWS_ATTRS	= 15,	/* fs has windows attributes */
+	FSINFO_FEAT_USER_QUOTAS		= 16,	/* fs has per-user quotas */
+	FSINFO_FEAT_GROUP_QUOTAS	= 17,	/* fs has per-group quotas */
+	FSINFO_FEAT_PROJECT_QUOTAS	= 18,	/* fs has per-project quotas */
+	FSINFO_FEAT_XATTRS		= 19,	/* fs has xattrs */
+	FSINFO_FEAT_JOURNAL		= 20,	/* fs has a journal */
+	FSINFO_FEAT_DATA_IS_JOURNALLED	= 21,	/* fs is using data journalling */
+	FSINFO_FEAT_O_SYNC		= 22,	/* fs supports O_SYNC */
+	FSINFO_FEAT_O_DIRECT		= 23,	/* fs supports O_DIRECT */
+	FSINFO_FEAT_VOLUME_ID		= 24,	/* fs has a volume ID */
+	FSINFO_FEAT_VOLUME_UUID		= 25,	/* fs has a volume UUID */
+	FSINFO_FEAT_VOLUME_NAME		= 26,	/* fs has a volume name */
+	FSINFO_FEAT_VOLUME_FSID		= 27,	/* fs has a volume FSID */
+	FSINFO_FEAT_IVER_ALL_CHANGE	= 28,	/* i_version represents data + meta changes */
+	FSINFO_FEAT_IVER_DATA_CHANGE	= 29,	/* i_version represents data changes only */
+	FSINFO_FEAT_IVER_MONO_INCR	= 30,	/* i_version incremented monotonically */
+	FSINFO_FEAT_DIRECTORIES		= 31,	/* fs supports (sub)directories */
+	FSINFO_FEAT_SYMLINKS		= 32,	/* fs supports symlinks */
+	FSINFO_FEAT_HARD_LINKS		= 33,	/* fs supports hard links */
+	FSINFO_FEAT_HARD_LINKS_1DIR	= 34,	/* fs supports hard links in same dir only */
+	FSINFO_FEAT_DEVICE_FILES	= 35,	/* fs supports bdev, cdev */
+	FSINFO_FEAT_UNIX_SPECIALS	= 36,	/* fs supports pipe, fifo, socket */
+	FSINFO_FEAT_RESOURCE_FORKS	= 37,	/* fs supports resource forks/streams */
+	FSINFO_FEAT_NAME_CASE_INDEP	= 38,	/* Filename case independence is mandatory */
+	FSINFO_FEAT_NAME_CASE_FOLD	= 39,	/* Filename case is folded on medium */
+	FSINFO_FEAT_NAME_NON_UTF8	= 40,	/* fs has non-utf8 names */
+	FSINFO_FEAT_NAME_HAS_CODEPAGE	= 41,	/* fs has a filename codepage */
+	FSINFO_FEAT_SPARSE		= 42,	/* fs supports sparse files */
+	FSINFO_FEAT_NOT_PERSISTENT	= 43,	/* fs is not persistent */
+	FSINFO_FEAT_NO_UNIX_MODE	= 44,	/* fs does not support unix mode bits */
+	FSINFO_FEAT_HAS_ATIME		= 45,	/* fs supports access time */
+	FSINFO_FEAT_HAS_BTIME		= 46,	/* fs supports birth/creation time */
+	FSINFO_FEAT_HAS_CTIME		= 47,	/* fs supports change time */
+	FSINFO_FEAT_HAS_MTIME		= 48,	/* fs supports modification time */
+	FSINFO_FEAT_HAS_ACL		= 49,	/* fs supports ACLs of some sort */
+	FSINFO_FEAT_HAS_INODE_NUMBERS	= 50,	/* fs has inode numbers */
+	FSINFO_FEAT__NR
+};
+
+struct fsinfo_features {
+	__u32	nr_features;	/* Number of supported features (FSINFO_FEAT__NR) */
+	__u8	features[(FSINFO_FEAT__NR + 7) / 8];
+};
+
+#define FSINFO_ATTR_FEATURES__STRUCT struct fsinfo_features
+
 struct fsinfo_timestamp_one {
 	__s64	minimum;	/* Minimum timestamp value in seconds */
 	__s64	maximum;	/* Maximum timestamp value in seconds */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 934b25399ffe..c5932109f683 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -190,6 +190,75 @@ static void dump_fsinfo_generic_supports(void *reply, unsigned int size)
 	printf("\twin_fattrs   : %x\n", f->win_file_attrs);
 }
 
+#define FSINFO_FEATURE_NAME(C) [FSINFO_FEAT_##C] = #C
+static const char *fsinfo_feature_names[FSINFO_FEAT__NR] = {
+	FSINFO_FEATURE_NAME(IS_KERNEL_FS),
+	FSINFO_FEATURE_NAME(IS_BLOCK_FS),
+	FSINFO_FEATURE_NAME(IS_FLASH_FS),
+	FSINFO_FEATURE_NAME(IS_NETWORK_FS),
+	FSINFO_FEATURE_NAME(IS_AUTOMOUNTER_FS),
+	FSINFO_FEATURE_NAME(IS_MEMORY_FS),
+	FSINFO_FEATURE_NAME(AUTOMOUNTS),
+	FSINFO_FEATURE_NAME(ADV_LOCKS),
+	FSINFO_FEATURE_NAME(MAND_LOCKS),
+	FSINFO_FEATURE_NAME(LEASES),
+	FSINFO_FEATURE_NAME(UIDS),
+	FSINFO_FEATURE_NAME(GIDS),
+	FSINFO_FEATURE_NAME(PROJIDS),
+	FSINFO_FEATURE_NAME(STRING_USER_IDS),
+	FSINFO_FEATURE_NAME(GUID_USER_IDS),
+	FSINFO_FEATURE_NAME(WINDOWS_ATTRS),
+	FSINFO_FEATURE_NAME(USER_QUOTAS),
+	FSINFO_FEATURE_NAME(GROUP_QUOTAS),
+	FSINFO_FEATURE_NAME(PROJECT_QUOTAS),
+	FSINFO_FEATURE_NAME(XATTRS),
+	FSINFO_FEATURE_NAME(JOURNAL),
+	FSINFO_FEATURE_NAME(DATA_IS_JOURNALLED),
+	FSINFO_FEATURE_NAME(O_SYNC),
+	FSINFO_FEATURE_NAME(O_DIRECT),
+	FSINFO_FEATURE_NAME(VOLUME_ID),
+	FSINFO_FEATURE_NAME(VOLUME_UUID),
+	FSINFO_FEATURE_NAME(VOLUME_NAME),
+	FSINFO_FEATURE_NAME(VOLUME_FSID),
+	FSINFO_FEATURE_NAME(IVER_ALL_CHANGE),
+	FSINFO_FEATURE_NAME(IVER_DATA_CHANGE),
+	FSINFO_FEATURE_NAME(IVER_MONO_INCR),
+	FSINFO_FEATURE_NAME(DIRECTORIES),
+	FSINFO_FEATURE_NAME(SYMLINKS),
+	FSINFO_FEATURE_NAME(HARD_LINKS),
+	FSINFO_FEATURE_NAME(HARD_LINKS_1DIR),
+	FSINFO_FEATURE_NAME(DEVICE_FILES),
+	FSINFO_FEATURE_NAME(UNIX_SPECIALS),
+	FSINFO_FEATURE_NAME(RESOURCE_FORKS),
+	FSINFO_FEATURE_NAME(NAME_CASE_INDEP),
+	FSINFO_FEATURE_NAME(NAME_CASE_FOLD),
+	FSINFO_FEATURE_NAME(NAME_NON_UTF8),
+	FSINFO_FEATURE_NAME(NAME_HAS_CODEPAGE),
+	FSINFO_FEATURE_NAME(SPARSE),
+	FSINFO_FEATURE_NAME(NOT_PERSISTENT),
+	FSINFO_FEATURE_NAME(NO_UNIX_MODE),
+	FSINFO_FEATURE_NAME(HAS_ATIME),
+	FSINFO_FEATURE_NAME(HAS_BTIME),
+	FSINFO_FEATURE_NAME(HAS_CTIME),
+	FSINFO_FEATURE_NAME(HAS_MTIME),
+	FSINFO_FEATURE_NAME(HAS_ACL),
+	FSINFO_FEATURE_NAME(HAS_INODE_NUMBERS),
+};
+
+static void dump_fsinfo_generic_features(void *reply, unsigned int size)
+{
+	struct fsinfo_features *f = reply;
+	int i;
+
+	printf("\n\t");
+	for (i = 0; i < sizeof(f->features); i++)
+		printf("%02x", f->features[i]);
+	printf(" (nr=%u)\n", f->nr_features);
+	for (i = 0; i < FSINFO_FEAT__NR; i++)
+		if (f->features[i / 8] & (1 << (i % 8)))
+			printf("\t- %s\n", fsinfo_feature_names[i]);
+}
+
 static void print_time(struct fsinfo_timestamp_one *t, char stamp)
 {
 	printf("\t%ctime       : gran=%uE%d range=%llx-%llx\n",
@@ -290,6 +359,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		string),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),




* [PATCH 04/18] fsinfo: Allow retrieval of superblock devname, options and stats [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (2 preceding siblings ...)
  2020-08-03 13:36 ` [PATCH 03/18] fsinfo: Provide a bitmap of the features a filesystem supports " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-03 13:37 ` [PATCH 05/18] fsinfo: Allow fsinfo() to look up a mount object by ID " David Howells
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Provide fsinfo() attributes to retrieve superblock device name, options,
and statistics in string form.  The following attributes are defined:

	FSINFO_ATTR_SOURCE		- Mount-specific device name
	FSINFO_ATTR_CONFIGURATION	- Mount options
	FSINFO_ATTR_FS_STATISTICS	- Filesystem statistics

FSINFO_ATTR_SOURCE could be made indexable by params->Nth to handle the
case where there is more than one source (e.g. the bcachefs filesystem).
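
As a rough illustration of how these string attributes are sized: the
handlers in this patch fill the supplied buffer and, if the text would
not fit, return a size larger than the buffer so that the request can be
retried with more space.  The following is only a user-space sketch of
that handshake, not the kernel code.  fill_string(), read_string_attr()
and GROW_STEP are made-up names for illustration, with GROW_STEP standing
in for PAGE_SIZE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GROW_STEP 4096	/* stand-in for PAGE_SIZE; illustrative only */

/*
 * Hypothetical stand-in for a string attribute handler: copy src plus a NUL
 * into buf if it fits and return the bytes used; otherwise return a size
 * larger than buf_size to ask the caller to retry with a bigger buffer.
 */
static size_t fill_string(char *buf, size_t buf_size, const char *src)
{
	size_t len = strlen(src);

	assert(buf != NULL);
	if (len + 1 > buf_size)
		return buf_size + GROW_STEP;
	memcpy(buf, src, len + 1);
	return len + 1;
}

/*
 * Retry loop: grow the buffer until the handler reports a size that fits.
 * Returns a malloc()'d string that the caller frees, or NULL on ENOMEM.
 */
static char *read_string_attr(const char *src, size_t initial_size)
{
	size_t size = initial_size ? initial_size : 1;
	char *buf = malloc(size);

	while (buf) {
		size_t need = fill_string(buf, size, src);
		char *bigger;

		if (need <= size)
			return buf;	/* it fitted; need is the length used */
		bigger = realloc(buf, need);
		if (!bigger)
			free(buf);
		buf = bigger;
		size = need;
	}
	return NULL;
}
```

A caller would do something like read_string_attr("rw,relatime", 4) and
get the whole string back after one resize.  The point of the design is
that the producer never truncates silently: a result larger than the
buffer always means "try again with at least this much".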

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                 |   39 +++++++++++++++++++++++++++++++++++++++
 fs/internal.h               |    2 ++
 fs/namespace.c              |   41 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fsinfo.h |    3 +++
 samples/vfs/test-fsinfo.c   |    4 ++++
 5 files changed, 89 insertions(+)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index 79c222d465d8..aef7a736e8fc 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -231,6 +231,42 @@ static int fsinfo_generic_volume_id(struct path *path, struct fsinfo_context *ct
 	return fsinfo_string(path->dentry->d_sb->s_id, ctx);
 }
 
+/*
+ * Retrieve the superblock configuration (mount options) as a comma-separated
+ * string.  The initial comma is stripped off and NUL termination is added.
+ */
+static int fsinfo_generic_seq_read(struct path *path, struct fsinfo_context *ctx)
+{
+	struct super_block *sb = path->dentry->d_sb;
+	struct seq_file m = {
+		.buf	= ctx->buffer,
+		.size	= ctx->buf_size - 1,
+	};
+	int ret = 0;
+
+	switch (ctx->requested_attr) {
+	case FSINFO_ATTR_CONFIGURATION:
+		seq_puts(&m, sb_rdonly(sb) ? "ro" : "rw");
+		ret = security_sb_show_options(&m, sb);
+		if (!ret && sb->s_op->show_options)
+			ret = sb->s_op->show_options(&m, path->mnt->mnt_root);
+		break;
+
+	case FSINFO_ATTR_FS_STATISTICS:
+		if (sb->s_op->show_stats)
+			ret = sb->s_op->show_stats(&m, path->mnt->mnt_root);
+		break;
+	}
+
+	if (ret < 0)
+		return ret;
+	if (seq_has_overflowed(&m))
+		return ctx->buf_size + PAGE_SIZE;
+
+	((char *)ctx->buffer)[ctx->skip + m.count] = 0;
+	return m.count + 1;
+}
+
 static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
@@ -240,6 +276,9 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		fsinfo_generic_features),
+	FSINFO_STRING	(FSINFO_ATTR_SOURCE,		fsinfo_generic_mount_source),
+	FSINFO_STRING	(FSINFO_ATTR_CONFIGURATION,	fsinfo_generic_seq_read),
+	FSINFO_STRING	(FSINFO_ATTR_FS_STATISTICS,	fsinfo_generic_seq_read),
 
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	(void *)123UL),
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
diff --git a/fs/internal.h b/fs/internal.h
index ea60d864a8cb..0b57da498f06 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -89,6 +89,8 @@ extern int __mnt_want_write_file(struct file *);
 extern void __mnt_drop_write_file(struct file *);
 
 extern void dissolve_on_fput(struct vfsmount *);
+extern int fsinfo_generic_mount_source(struct path *, struct fsinfo_context *);
+
 /*
  * fs_struct.c
  */
diff --git a/fs/namespace.c b/fs/namespace.c
index 73ff5bf0c9af..ead8d1a16610 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -30,6 +30,7 @@
 #include <uapi/linux/mount.h>
 #include <linux/fs_context.h>
 #include <linux/shmem_fs.h>
+#include <linux/fsinfo.h>
 
 #include "pnode.h"
 #include "internal.h"
@@ -4111,3 +4112,43 @@ const struct proc_ns_operations mntns_operations = {
 	.install	= mntns_install,
 	.owner		= mntns_owner,
 };
+
+#ifdef CONFIG_FSINFO
+static inline void mangle(struct seq_file *m, const char *s)
+{
+	seq_escape(m, s, " \t\n\\");
+}
+
+/*
+ * Return the mount source/device name as seen from this mountpoint.  Shared
+ * mounts may vary here and the filesystem is permitted to substitute its own
+ * rendering.
+ */
+int fsinfo_generic_mount_source(struct path *path, struct fsinfo_context *ctx)
+{
+	struct super_block *sb = path->mnt->mnt_sb;
+	struct mount *mnt = real_mount(path->mnt);
+	struct seq_file m = {
+		.buf	= ctx->buffer,
+		.size	= ctx->buf_size - 1,
+	};
+	int ret;
+
+	if (sb->s_op->show_devname) {
+		ret = sb->s_op->show_devname(&m, mnt->mnt.mnt_root);
+		if (ret < 0)
+			return ret;
+	} else {
+		if (!mnt->mnt_devname)
+			return fsinfo_string("none", ctx);
+		mangle(&m, mnt->mnt_devname);
+	}
+
+	if (seq_has_overflowed(&m))
+		return ctx->buf_size + PAGE_SIZE;
+
+	((char *)ctx->buffer)[m.count] = 0;
+	return m.count + 1;
+}
+
+#endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index b8b2c836267b..a27e92b68266 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -24,6 +24,9 @@
 #define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
 #define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
 #define FSINFO_ATTR_FEATURES		0x08	/* Filesystem features (bits) */
+#define FSINFO_ATTR_SOURCE		0x09	/* Superblock source/device name (string) */
+#define FSINFO_ATTR_CONFIGURATION	0x0a	/* Superblock configuration/options (string) */
+#define FSINFO_ATTR_FS_STATISTICS	0x0b	/* Superblock filesystem statistics (string) */
 
 #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index c5932109f683..634f30b7e67f 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -364,6 +364,10 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		string),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
 	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	string),
+	FSINFO_STRING	(FSINFO_ATTR_SOURCE,		string),
+	FSINFO_STRING	(FSINFO_ATTR_CONFIGURATION,	string),
+	FSINFO_STRING	(FSINFO_ATTR_FS_STATISTICS,	string),
+
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_meta_attribute_info),
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_meta_attributes),
 	{}




* [PATCH 05/18] fsinfo: Allow fsinfo() to look up a mount object by ID [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (3 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 04/18] fsinfo: Allow retrieval of superblock devname, options and stats " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-04 10:33   ` Miklos Szeredi
  2020-08-03 13:37 ` [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount " David Howells
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Allow the fsinfo() syscall to look up a mount object by ID rather than by
pathname.  This is necessary as there can be multiple mounts stacked up at
the same pathname and there's no way to look through them otherwise.

This is done by passing FSINFO_FLAGS_QUERY_MOUNT to fsinfo() in the
parameters and then passing the mount ID as a string to fsinfo() in place
of the filename:

	struct fsinfo_params params = {
		.flags	 = FSINFO_FLAGS_QUERY_MOUNT,
		.request = FSINFO_ATTR_IDS,
	};

	ret = fsinfo(AT_FDCWD, "21", &params, buffer, sizeof(buffer));
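
Since the mount ID travels as a string, it has to be validated before
use.  The kernel side in this patch uses kstrtoul() and rejects values
above INT_MAX.  A user-space approximation, assuming strtoul() as a
substitute and a made-up helper name parse_mount_id(), might look like
this (the exact error codes here are this sketch's choice):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a mount ID given as a string, accepting any base
 * that strtoul() auto-detects.  Returns 0 and stores the ID, or a negative
 * errno value.
 */
static int parse_mount_id(const char *name, int *mnt_id)
{
	unsigned long val;
	char *end;

	assert(name != NULL && mnt_id != NULL);
	if (*name < '0' || *name > '9')
		return -EINVAL;		/* reject empty, signed or padded input */
	errno = 0;
	val = strtoul(name, &end, 0);
	if (*end != '\0')
		return -EINVAL;		/* trailing junk */
	if (errno == ERANGE || val > INT_MAX)
		return -ERANGE;		/* doesn't fit a mount ID */
	*mnt_id = (int)val;
	return 0;
}
```

Note that this is stricter than kstrtoul() in one respect: it does not
tolerate a trailing newline.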

The caller is only permitted to query a mount object if the root directory
of that mount connects directly to the current chroot if dfd == AT_FDCWD[*]
or the directory specified by dfd otherwise.  Note that this is not
available to the pathwalk of any other syscall.

[*] This needs to be something other than AT_FDCWD, perhaps AT_FDROOT.

[!] This probably needs an LSM hook.

[!] This might want to check the permissions on all the intervening dirs -
    but it would have to do that under RCU conditions.

[!] This might want to check a CAP_* flag.
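
To illustrate the connectivity rule described above, here is a
simplified user-space model of the walk: first step up through parent
mounts until the ancestor's mount is reached, then step up through
parent directories until the ancestor's directory is reached.  All of
the demo_* types and names are invented for the sketch, and it leaves
out the RCU and rename_lock handling that the real check needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_dentry {
	struct demo_dentry *parent;	/* points to itself at the fs root */
};

struct demo_mount {
	struct demo_mount *parent;	/* points to itself at the top mount */
	struct demo_dentry *mountpoint;	/* dentry in the parent that we cover */
	struct demo_dentry *root;	/* root dentry of this mount */
};

struct demo_path {
	struct demo_mount *mnt;
	struct demo_dentry *dentry;
};

static bool demo_paths_connected(const struct demo_path *ancestor,
				 const struct demo_path *to_check)
{
	struct demo_path cursor = *to_check;

	assert(ancestor->mnt && ancestor->dentry);

	/* Step up through parent mounts until we are in the ancestor's mount. */
	while (cursor.mnt != ancestor->mnt) {
		struct demo_mount *mnt = cursor.mnt;

		if (mnt->parent == mnt)
			return false;	/* hit the top without finding it */
		cursor.dentry = mnt->mountpoint;
		cursor.mnt = mnt->parent;
	}

	/* Step up through parent directories until we reach the ancestor. */
	while (cursor.dentry != ancestor->dentry) {
		if (cursor.dentry == cursor.mnt->root ||
		    cursor.dentry->parent == cursor.dentry)
			return false;	/* left the subtree */
		cursor.dentry = cursor.dentry->parent;
	}
	return true;
}
```

With a mount stacked on /sub, a query rooted at / finds it connected,
whereas a query rooted at a sibling directory such as /other does not.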

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                 |   53 +++++++++++++++++++
 fs/internal.h               |    1 
 fs/namespace.c              |  117 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/fsinfo.h |    1 
 samples/vfs/test-fsinfo.c   |    7 ++-
 5 files changed, 175 insertions(+), 4 deletions(-)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index aef7a736e8fc..8ccbcddb4f16 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -563,6 +563,56 @@ static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_context *ctx)
 	return ret;
 }
 
+/*
+ * Look up the root of a mount object.  This allows access to mount objects
+ * (and their attached superblocks) that can't be retrieved by path because
+ * they're entirely covered.
+ *
+ * We only permit access to a mount that has a direct path to either the
+ * dentry pointed to by dfd or our chroot (if dfd is AT_FDCWD).
+ */
+static int vfs_fsinfo_mount(int dfd, const char __user *filename,
+			    struct fsinfo_context *ctx)
+{
+	struct path path;
+	struct fd f = {};
+	char *name;
+	unsigned long mnt_id;
+	int ret;
+
+	if (!filename)
+		return -EINVAL;
+
+	name = strndup_user(filename, 32);
+	if (IS_ERR(name))
+		return PTR_ERR(name);
+	ret = kstrtoul(name, 0, &mnt_id);
+	if (ret == 0 && mnt_id > INT_MAX)
+		ret = -ERANGE;	/* Don't return 0 for an out-of-range ID */
+	if (ret < 0)
+		goto out_name;
+
+	if (dfd != AT_FDCWD) {
+		ret = -EBADF;
+		f = fdget_raw(dfd);
+		if (!f.file)
+			goto out_name;
+	}
+
+	ret = lookup_mount_object(f.file ? &f.file->f_path : NULL,
+				  mnt_id, &path);
+	if (ret < 0)
+		goto out_fd;
+
+	ret = vfs_fsinfo(&path, ctx);
+	path_put(&path);
+out_fd:
+	fdput(f);
+out_name:
+	kfree(name);
+	return ret;
+}
+
 /**
  * sys_fsinfo - System call to get filesystem information
  * @dfd: Base directory to pathwalk from or fd referring to filesystem.
@@ -636,6 +686,9 @@ SYSCALL_DEFINE6(fsinfo,
 			return -EINVAL;
 		ret = vfs_fsinfo_fd(dfd, &ctx);
 		break;
+	case FSINFO_FLAGS_QUERY_MOUNT:
+		ret = vfs_fsinfo_mount(dfd, pathname, &ctx);
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/fs/internal.h b/fs/internal.h
index 0b57da498f06..84bbb743a5ac 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -89,6 +89,7 @@ extern int __mnt_want_write_file(struct file *);
 extern void __mnt_drop_write_file(struct file *);
 
 extern void dissolve_on_fput(struct vfsmount *);
+extern int lookup_mount_object(struct path *, unsigned int, struct path *);
 extern int fsinfo_generic_mount_source(struct path *, struct fsinfo_context *);
 
 /*
diff --git a/fs/namespace.c b/fs/namespace.c
index ead8d1a16610..b2b9920ffd3c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -64,7 +64,7 @@ static int __init set_mphash_entries(char *str)
 __setup("mphash_entries=", set_mphash_entries);
 
 static u64 event;
-static DEFINE_IDA(mnt_id_ida);
+static DEFINE_IDR(mnt_id_ida);
 static DEFINE_IDA(mnt_group_ida);
 
 static struct hlist_head *mount_hashtable __read_mostly;
@@ -105,17 +105,27 @@ static inline struct hlist_head *mp_hash(struct dentry *dentry)
 
 static int mnt_alloc_id(struct mount *mnt)
 {
-	int res = ida_alloc(&mnt_id_ida, GFP_KERNEL);
+	int res;
 
+	/* Allocate an ID, but don't set the pointer back to the mount until
+	 * later, as once we do that, we have to follow RCU protocols to get
+	 * rid of the mount struct.
+	 */
+	res = idr_alloc(&mnt_id_ida, NULL, 0, INT_MAX, GFP_KERNEL);
 	if (res < 0)
 		return res;
 	mnt->mnt_id = res;
 	return 0;
 }
 
+static void mnt_publish_id(struct mount *mnt)
+{
+	idr_replace(&mnt_id_ida, mnt, mnt->mnt_id);
+}
+
 static void mnt_free_id(struct mount *mnt)
 {
-	ida_free(&mnt_id_ida, mnt->mnt_id);
+	idr_remove(&mnt_id_ida, mnt->mnt_id);
 }
 
 /*
@@ -975,6 +985,7 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
 	lock_mount_hash();
 	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
 	unlock_mount_hash();
+	mnt_publish_id(mnt);
 	return &mnt->mnt;
 }
 EXPORT_SYMBOL(vfs_create_mount);
@@ -1068,6 +1079,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	lock_mount_hash();
 	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
 	unlock_mount_hash();
+	mnt_publish_id(mnt);
 
 	if ((flag & CL_SLAVE) ||
 	    ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
@@ -4151,4 +4163,103 @@ int fsinfo_generic_mount_source(struct path *path, struct fsinfo_context *ctx)
 	return m.count + 1;
 }
 
+/*
+ * See if one path point connects directly to another by ancestral relationship
+ * across mountpoints.  Must call with the RCU read lock held.
+ */
+static bool are_paths_connected(struct path *ancestor, struct path *to_check)
+{
+	struct mount *mnt, *parent;
+	struct path cursor;
+	unsigned seq;
+	bool connected;
+
+	seq = 0;
+restart:
+	cursor = *to_check;
+
+	read_seqbegin_or_lock(&rename_lock, &seq);
+	while (cursor.mnt != ancestor->mnt) {
+		mnt = real_mount(cursor.mnt);
+		parent = READ_ONCE(mnt->mnt_parent);
+		if (mnt == parent)
+			goto failed;
+		cursor.dentry = READ_ONCE(mnt->mnt_mountpoint);
+		cursor.mnt = &parent->mnt;
+	}
+
+	while (cursor.dentry != ancestor->dentry) {
+		if (cursor.dentry == cursor.mnt->mnt_root ||
+		    IS_ROOT(cursor.dentry))
+			goto failed;
+		cursor.dentry = READ_ONCE(cursor.dentry->d_parent);
+	}
+
+	connected = true;
+out:
+	done_seqretry(&rename_lock, seq);
+	return connected;
+
+failed:
+	if (need_seqretry(&rename_lock, seq)) {
+		seq = 1;
+		goto restart;
+	}
+	connected = false;
+	goto out;
+}
+
+/**
+ * lookup_mount_object - Look up a vfsmount object by ID
+ * @root: The mount root must connect backwards to this point (or chroot if NULL).
+ * @mnt_id: The ID of the mount to look up.
+ * @_mntpt: Where to return the resulting mountpoint path.
+ *
+ * Look up the root of the mount with the corresponding ID.  This is only
+ * permitted if that mount connects directly to the specified root/chroot.
+ */
+int lookup_mount_object(struct path *root, unsigned int mnt_id, struct path *_mntpt)
+{
+	struct mount *mnt;
+	struct path stop, mntpt = {};
+	int ret = -EPERM;
+
+	if (!root)
+		get_fs_root(current->fs, &stop);
+	else
+		stop = *root;
+
+	rcu_read_lock();
+	lock_mount_hash();
+	mnt = idr_find(&mnt_id_ida, mnt_id);
+	if (!mnt)
+		goto out_unlock_mh;
+	if (mnt->mnt.mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
+		goto out_unlock_mh;
+	if (mnt_get_count(mnt) == 0)
+		goto out_unlock_mh;
+	mnt_add_count(mnt, 1);
+	mntpt.mnt = &mnt->mnt;
+	mntpt.dentry = dget(mnt->mnt.mnt_root);
+	unlock_mount_hash();
+
+	if (are_paths_connected(&stop, &mntpt)) {
+		*_mntpt = mntpt;
+		mntpt.mnt = NULL;
+		mntpt.dentry = NULL;
+		ret = 0;
+	}
+
+out_unlock:
+	rcu_read_unlock();
+	if (!root)
+		path_put(&stop);
+	path_put(&mntpt);
+	return ret;
+
+out_unlock_mh:
+	unlock_mount_hash();
+	goto out_unlock;
+}
+
 #endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index a27e92b68266..d24e47762a07 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -44,6 +44,7 @@ struct fsinfo_params {
 #define FSINFO_FLAGS_QUERY_MASK	0x0007 /* What object should fsinfo() query? */
 #define FSINFO_FLAGS_QUERY_PATH	0x0000 /* - path, specified by dirfd,pathname,AT_EMPTY_PATH */
 #define FSINFO_FLAGS_QUERY_FD	0x0001 /* - fd specified by dirfd */
+#define FSINFO_FLAGS_QUERY_MOUNT 0x0002	/* - mount object (path=>mount_id, dirfd=>subtree) */
 	__u32	request;	/* ID of requested attribute */
 	__u32	Nth;		/* Instance of it (some may have multiple) */
 	__u32	Mth;		/* Subinstance of Nth instance */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 634f30b7e67f..dfa44bba8bbd 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -593,7 +593,7 @@ int main(int argc, char **argv)
 	bool meta = false;
 	int raw = 0, opt, Nth, Mth;
 
-	while ((opt = getopt(argc, argv, "Madlr"))) {
+	while ((opt = getopt(argc, argv, "Madmlr"))) {
 		switch (opt) {
 		case 'M':
 			meta = true;
@@ -609,6 +609,10 @@ int main(int argc, char **argv)
 			params.at_flags &= ~AT_SYMLINK_NOFOLLOW;
 			params.flags = FSINFO_FLAGS_QUERY_PATH;
 			continue;
+		case 'm':
+			params.resolve_flags = 0;
+			params.flags = FSINFO_FLAGS_QUERY_MOUNT;
+			continue;
 		case 'r':
 			raw = 1;
 			continue;
@@ -621,6 +625,7 @@ int main(int argc, char **argv)
 
 	if (argc != 1) {
 		printf("Format: test-fsinfo [-Madlr] <path>\n");
+		printf("Format: test-fsinfo [-Mdr] -m <mnt_id>\n");
 		exit(2);
 	}
 




* [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (4 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 05/18] fsinfo: Allow fsinfo() to look up a mount object by ID " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-04 10:41   ` Miklos Szeredi
  2020-08-05 14:13   ` David Howells
  2020-08-03 13:37 ` [PATCH 07/18] fsinfo: Allow mount information to be queried " David Howells
                   ` (13 subsequent siblings)
  19 siblings, 2 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add a uniquifier ID to struct mount that is effectively unique over the
kernel lifetime to deal with mnt_id values being reused.  This can then
be exported through fsinfo() to allow detection of replacement mounts that
happen to end up with the same mount ID.

The normal mount handle is still used for referring to a particular mount.

The mount notification is then changed to convey these unique mount IDs
rather than the mount handle.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/mount.h        |    3 +++
 fs/mount_notify.c |    4 ++--
 fs/namespace.c    |    3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 85456a5f5a3a..1037781be055 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -79,6 +79,9 @@ struct mount {
 	int mnt_expiry_mark;		/* true if marked for expiry */
 	struct hlist_head mnt_pins;
 	struct hlist_head mnt_stuck_children;
+#ifdef CONFIG_FSINFO
+	u64	mnt_unique_id;		/* ID unique over lifetime of kernel */
+#endif
 #ifdef CONFIG_MOUNT_NOTIFICATIONS
 	struct watch_list *mnt_watchers; /* Watches on dentries within this mount */
 #endif
diff --git a/fs/mount_notify.c b/fs/mount_notify.c
index 44f570e4cebe..d8ba66ed5f77 100644
--- a/fs/mount_notify.c
+++ b/fs/mount_notify.c
@@ -90,7 +90,7 @@ void notify_mount(struct mount *trigger,
 	n.watch.type	= WATCH_TYPE_MOUNT_NOTIFY;
 	n.watch.subtype	= subtype;
 	n.watch.info	= info_flags | watch_sizeof(n);
-	n.triggered_on	= trigger->mnt_id;
+	n.triggered_on	= trigger->mnt_unique_id;
 
 	switch (subtype) {
 	case NOTIFY_MOUNT_EXPIRY:
@@ -102,7 +102,7 @@ void notify_mount(struct mount *trigger,
 	case NOTIFY_MOUNT_UNMOUNT:
 	case NOTIFY_MOUNT_MOVE_FROM:
 	case NOTIFY_MOUNT_MOVE_TO:
-		n.auxiliary_mount	= aux->mnt_id;
+		n.auxiliary_mount = aux->mnt_unique_id;
 		break;
 
 	default:
diff --git a/fs/namespace.c b/fs/namespace.c
index b2b9920ffd3c..1db8a64cd76f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -115,6 +115,9 @@ static int mnt_alloc_id(struct mount *mnt)
 	if (res < 0)
 		return res;
 	mnt->mnt_id = res;
+#ifdef CONFIG_FSINFO
+	mnt->mnt_unique_id = atomic64_inc_return(&vfs_unique_counter);
+#endif
 	return 0;
 }
 




* [PATCH 07/18] fsinfo: Allow mount information to be queried [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (5 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-03 13:37 ` [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved " David Howells
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Allow mount information, including information about a mount object, to be
queried with the fsinfo() system call.  Setting FSINFO_FLAGS_QUERY_MOUNT
allows overlapping mounts to be queried by indicating that the syscall
should interpret the pathname as a number indicating the mount ID.

To this end, a number of fsinfo() attributes are provided:

 (1) FSINFO_ATTR_MOUNT_INFO.

     This is a structure providing information about a mount, including:

	- Mount ID (can be used with FSINFO_FLAGS_QUERY_MOUNT).
	- Mount uniquifier ID.
	- Mount attributes (eg. R/O, NOEXEC).
	- Mount change/notification counters.
	- Superblock ID.
	- Superblock change/notification counters.

 (2) FSINFO_ATTR_MOUNT_PATH.

     This is a string providing information about a bind mount relative to
     the root that was bound off, though it may get overridden by the
     filesystem (NFS unconditionally sets it to "/", for example).

 (3) FSINFO_ATTR_MOUNT_POINT.

     This is a string indicating the name of the mountpoint within the
     parent mount, limited to the parent's mounted root and the chroot.

 (4) FSINFO_ATTR_MOUNT_POINT_FULL.

     This is a string indicating the full path of the mountpoint, limited to
     the chroot.
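
The mount attributes in FSINFO_ATTR_MOUNT_INFO are derived from the
internal mount flags.  The independent flags map one to one, but the
atime modes are mutually exclusive: noatime takes precedence, then
relatime, and strictatime is reported when neither is set.  A reduced
sketch of that translation follows.  The DEMO_* bit values are
placeholders chosen for the example and are not the real MNT_* or
MOUNT_ATTR_* values.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
enum {
	DEMO_MNT_READONLY	= 1 << 0,
	DEMO_MNT_NOEXEC		= 1 << 1,
	DEMO_MNT_NOATIME	= 1 << 2,
	DEMO_MNT_RELATIME	= 1 << 3,
};

enum {
	DEMO_ATTR_RDONLY	= 1 << 0,
	DEMO_ATTR_NOEXEC	= 1 << 1,
	DEMO_ATTR_NOATIME	= 1 << 2,
	DEMO_ATTR_RELATIME	= 1 << 3,
	DEMO_ATTR_STRICTATIME	= 1 << 4,
	DEMO_ATTR_ATIME_MASK	= DEMO_ATTR_NOATIME | DEMO_ATTR_RELATIME |
				  DEMO_ATTR_STRICTATIME,
};

static uint32_t demo_mount_attr(unsigned int flags)
{
	uint32_t attr = 0, atime;

	/* Independent flags translate one to one. */
	if (flags & DEMO_MNT_READONLY)
		attr |= DEMO_ATTR_RDONLY;
	if (flags & DEMO_MNT_NOEXEC)
		attr |= DEMO_ATTR_NOEXEC;

	/* Exactly one atime mode is reported. */
	if (flags & DEMO_MNT_NOATIME)
		attr |= DEMO_ATTR_NOATIME;
	else if (flags & DEMO_MNT_RELATIME)
		attr |= DEMO_ATTR_RELATIME;
	else
		attr |= DEMO_ATTR_STRICTATIME;

	atime = attr & DEMO_ATTR_ATIME_MASK;
	assert(atime != 0 && (atime & (atime - 1)) == 0);
	return attr;
}
```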

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/d_path.c                 |    2 -
 fs/fsinfo.c                 |   12 +++++
 fs/internal.h               |    9 +++
 fs/namespace.c              |  114 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fsinfo.h |   17 ++++++
 samples/vfs/test-fsinfo.c   |   16 ++++++
 6 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/fs/d_path.c b/fs/d_path.c
index 0f1fc1743302..4c203f64e45e 100644
--- a/fs/d_path.c
+++ b/fs/d_path.c
@@ -229,7 +229,7 @@ static int prepend_unreachable(char **buffer, int *buflen)
 	return prepend(buffer, buflen, "(unreachable)", 13);
 }
 
-static void get_fs_root_rcu(struct fs_struct *fs, struct path *root)
+void get_fs_root_rcu(struct fs_struct *fs, struct path *root)
 {
 	unsigned seq;
 
diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index 8ccbcddb4f16..f276857709ee 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -252,6 +252,13 @@ static int fsinfo_generic_seq_read(struct path *path, struct fsinfo_context *ctx
 			ret = sb->s_op->show_options(&m, path->mnt->mnt_root);
 		break;
 
+	case FSINFO_ATTR_MOUNT_PATH:
+		if (sb->s_op->show_path)
+			ret = sb->s_op->show_path(&m, path->mnt->mnt_root);
+		else
+			seq_dentry(&m, path->mnt->mnt_root, " \t\n\\");
+		break;
+
 	case FSINFO_ATTR_FS_STATISTICS:
 		if (sb->s_op->show_stats)
 			ret = sb->s_op->show_stats(&m, path->mnt->mnt_root);
@@ -282,6 +289,11 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	(void *)123UL),
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
+
+	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
+	FSINFO_STRING	(FSINFO_ATTR_MOUNT_PATH,	fsinfo_generic_seq_read),
+	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
+	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT_FULL,	fsinfo_generic_mount_point_full),
 	{}
 };
 
diff --git a/fs/internal.h b/fs/internal.h
index 84bbb743a5ac..a56008b7f3ec 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -15,6 +15,7 @@ struct mount;
 struct shrink_control;
 struct fs_context;
 struct user_namespace;
+struct fsinfo_context;
 
 /*
  * block_dev.c
@@ -46,6 +47,11 @@ extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
  */
 extern void __init chrdev_init(void);
 
+/*
+ * d_path.c
+ */
+extern void get_fs_root_rcu(struct fs_struct *fs, struct path *root);
+
 /*
  * fs_context.c
  */
@@ -91,6 +97,9 @@ extern void __mnt_drop_write_file(struct file *);
 extern void dissolve_on_fput(struct vfsmount *);
 extern int lookup_mount_object(struct path *, unsigned int, struct path *);
 extern int fsinfo_generic_mount_source(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_info(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_point(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_point_full(struct path *, struct fsinfo_context *);
 
 /*
  * fs_struct.c
diff --git a/fs/namespace.c b/fs/namespace.c
index 1db8a64cd76f..c196af35d39d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4265,4 +4265,118 @@ int lookup_mount_object(struct path *root, unsigned int mnt_id, struct path *_mn
 	goto out_unlock;
 }
 
+/*
+ * Retrieve information about the nominated mount.
+ */
+int fsinfo_generic_mount_info(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_mount_info *p = ctx->buffer;
+	struct super_block *sb;
+	struct mount *m;
+	unsigned int flags;
+
+	m = real_mount(path->mnt);
+	sb = m->mnt.mnt_sb;
+
+	p->sb_unique_id		= sb->s_unique_id;
+	p->mnt_unique_id	= m->mnt_unique_id;
+	p->mnt_id		= m->mnt_id;
+
+	flags = READ_ONCE(m->mnt.mnt_flags);
+	if (flags & MNT_READONLY)
+		p->attr |= MOUNT_ATTR_RDONLY;
+	if (flags & MNT_NOSUID)
+		p->attr |= MOUNT_ATTR_NOSUID;
+	if (flags & MNT_NODEV)
+		p->attr |= MOUNT_ATTR_NODEV;
+	if (flags & MNT_NOEXEC)
+		p->attr |= MOUNT_ATTR_NOEXEC;
+	if (flags & MNT_NODIRATIME)
+		p->attr |= MOUNT_ATTR_NODIRATIME;
+
+	if (flags & MNT_NOATIME)
+		p->attr |= MOUNT_ATTR_NOATIME;
+	else if (flags & MNT_RELATIME)
+		p->attr |= MOUNT_ATTR_RELATIME;
+	else
+		p->attr |= MOUNT_ATTR_STRICTATIME;
+	return sizeof(*p);
+}
+
+/*
+ * Return the path of this mount relative to its parent and clipped to
+ * the current chroot.
+ */
+int fsinfo_generic_mount_point(struct path *path, struct fsinfo_context *ctx)
+{
+	struct mountpoint *mp;
+	struct mount *m, *parent;
+	struct path mountpoint, root;
+	void *p;
+
+	rcu_read_lock();
+
+	m = real_mount(path->mnt);
+	parent = m->mnt_parent;
+	if (parent == m)
+		goto skip;
+	mp = READ_ONCE(m->mnt_mp);
+	if (mp)
+		goto found;
+skip:
+	rcu_read_unlock();
+	return -ENODATA;
+
+found:
+	mountpoint.mnt = &parent->mnt;
+	mountpoint.dentry = READ_ONCE(mp->m_dentry);
+
+	get_fs_root_rcu(current->fs, &root);
+	if (path->mnt == root.mnt) {
+		rcu_read_unlock();
+		return fsinfo_string("/", ctx);
+	}
+
+	if (root.mnt != &parent->mnt) {
+		root.mnt = &parent->mnt;
+		root.dentry = parent->mnt.mnt_root;
+	}
+
+	((char *)ctx->buffer)[ctx->buf_size - 1] = 0;
+	p = __d_path(&mountpoint, &root, ctx->buffer, ctx->buf_size - 1);
+	rcu_read_unlock();
+
+	if (IS_ERR(p))
+		return PTR_ERR(p);
+	if (!p)
+		return -EPERM;
+
+	ctx->skip = p - ctx->buffer;
+	return (ctx->buffer + ctx->buf_size) - p;
+}
+
+/*
+ * Return the path of this mount from the current chroot.
+ */
+int fsinfo_generic_mount_point_full(struct path *path, struct fsinfo_context *ctx)
+{
+	struct path root;
+	void *p;
+
+	((char *)ctx->buffer)[ctx->buf_size - 1] = 0;
+
+	rcu_read_lock();
+	get_fs_root_rcu(current->fs, &root);
+	p = __d_path(path, &root, ctx->buffer, ctx->buf_size - 1);
+	rcu_read_unlock();
+
+	if (IS_ERR(p))
+		return PTR_ERR(p);
+	if (!p)
+		return -EPERM;
+
+	ctx->skip = p - ctx->buffer;
+	return (ctx->buffer + ctx->buf_size) - p;
+}
+
 #endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index d24e47762a07..15ef161905cd 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -31,6 +31,11 @@
 #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
 
+#define FSINFO_ATTR_MOUNT_INFO		0x200	/* Mount object information */
+#define FSINFO_ATTR_MOUNT_PATH		0x201	/* Bind mount/superblock path (string) */
+#define FSINFO_ATTR_MOUNT_POINT		0x202	/* Relative path of mount in parent (string) */
+#define FSINFO_ATTR_MOUNT_POINT_FULL	0x203	/* Absolute path of mount (string) */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -85,6 +90,18 @@ struct fsinfo_u128 {
 #endif
 };
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_MOUNT_INFO).
+ */
+struct fsinfo_mount_info {
+	__u64	sb_unique_id;		/* Kernel-lifetime unique superblock ID */
+	__u64	mnt_unique_id;		/* Kernel-lifetime unique mount ID */
+	__u32	mnt_id;			/* Mount identifier (use with FSINFO_FLAGS_QUERY_MOUNT) */
+	__u32	attr;			/* MOUNT_ATTR_* flags */
+};
+
+#define FSINFO_ATTR_MOUNT_INFO__STRUCT struct fsinfo_mount_info
+
 /*
  * Information struct for fsinfo(FSINFO_ATTR_STATFS).
  * - This gives extended filesystem information.
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index dfa44bba8bbd..f3bebb7318d9 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -294,6 +294,17 @@ static void dump_fsinfo_generic_volume_uuid(void *reply, unsigned int size)
 	       f->uuid[14], f->uuid[15]);
 }
 
+static void dump_fsinfo_generic_mount_info(void *reply, unsigned int size)
+{
+	struct fsinfo_mount_info *r = reply;
+
+	printf("\n");
+	printf("\tsb_uniq : %llx\n", (unsigned long long)r->sb_unique_id);
+	printf("\tmnt_uniq: %llx\n", (unsigned long long)r->mnt_unique_id);
+	printf("\tmnt_id  : %x\n", r->mnt_id);
+	printf("\tattr    : %x\n", r->attr);
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -370,6 +381,11 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_meta_attribute_info),
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_meta_attributes),
+
+	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
+	FSINFO_STRING	(FSINFO_ATTR_MOUNT_PATH,	string),
+	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	string),
+	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT_FULL,	string),
 	{}
 };
 




* [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (6 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 07/18] fsinfo: Allow mount information to be queried " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-04 13:38   ` Miklos Szeredi
  2020-08-05 15:37   ` David Howells
  2020-08-03 13:37 ` [PATCH 09/18] watch_queue: Mount event counters " David Howells
                   ` (11 subsequent siblings)
  19 siblings, 2 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add a couple of attributes to allow information about the mount topology
and propagation to be retrieved:

 (1) FSINFO_ATTR_MOUNT_TOPOLOGY.

     Information about a mount's parentage in the mount topology tree and
     its propagation attributes.

     This has to be collected with the VFS namespace lock held, so it's
     separate from FSINFO_ATTR_MOUNT_INFO.  A subsequent patch exports a
     topology change counter through the cheaper _INFO attribute, which
     can be used to work out whether the more expensive _TOPOLOGY
     attribute needs requerying.

     MOUNT_PROPAGATION_* flags are added to linux/mount.h for UAPI
     consumption.  At some point a mount_setattr() system call needs to be
     added.

 (2) FSINFO_ATTR_MOUNT_CHILDREN.

     Information about a mount's children in the mount topology tree.

     This is formatted as an array of structures, one for each child and
     capped with one for the argument mount (checked after listing all the
     children).  Each element contains the static IDs of the respective
     mount object along with a sum of its change counters.
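
As a rough userspace illustration of the list layout described in (2), the
sketch below walks a children buffer whose last record describes the queried
mount.  It assumes the three-field record added by this patch (a later patch
in the series appends a counter sum), mirrors it in a local struct rather
than including a real header, and uses hypothetical helper names.  The
element stride is passed in separately because a caller may learn it at run
time and it need not equal the size of the local struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the record proposed in this patch; not a real UAPI header. */
struct mount_child_rec {
	uint64_t mnt_unique_id;	/* Kernel-lifetime unique mount ID */
	uint32_t mnt_id;	/* Mount identifier */
	uint32_t parent_id;	/* Parent mount identifier */
};

/*
 * Hypothetical helper: number of real children in a list of 'size' bytes
 * with elements 'stride' bytes apart.  The trailing record is the queried
 * mount itself, so it is not counted.
 */
static size_t nr_children(size_t size, size_t stride)
{
	if (stride == 0 || size < stride)
		return 0;
	return size / stride - 1;
}

/*
 * Hypothetical helper: copy record 'idx' out of the list.  Only the bytes
 * common to the kernel's element and the local struct are copied; any
 * remainder of 'out' is left zeroed.  Returns 0 on success, -1 on a bad
 * index or stride.
 */
static int get_child_rec(const void *buf, size_t size, size_t stride,
			 size_t idx, struct mount_child_rec *out)
{
	if (stride == 0 || size < stride || idx >= size / stride)
		return -1;
	memset(out, 0, sizeof(*out));
	memcpy(out, (const char *)buf + idx * stride,
	       stride < sizeof(*out) ? stride : sizeof(*out));
	return 0;
}
```

A caller would typically process records 0 to nr_children() - 1 as children
and then compare the final record against its cached copy of the parent to
see whether anything changed while the list was being read.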

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                 |    2 +
 fs/internal.h               |    2 +
 fs/namespace.c              |   94 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fsinfo.h |   27 ++++++++++++
 include/uapi/linux/mount.h  |   13 +++++-
 samples/vfs/test-fsinfo.c   |   55 +++++++++++++++++++++++++
 6 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index f276857709ee..0540cce89555 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -291,9 +291,11 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
 
 	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_TOPOLOGY,	fsinfo_generic_mount_topology),
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_PATH,	fsinfo_generic_seq_read),
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT_FULL,	fsinfo_generic_mount_point_full),
+	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
 	{}
 };
 
diff --git a/fs/internal.h b/fs/internal.h
index a56008b7f3ec..cb5edcc7125a 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -98,8 +98,10 @@ extern void dissolve_on_fput(struct vfsmount *);
 extern int lookup_mount_object(struct path *, unsigned int, struct path *);
 extern int fsinfo_generic_mount_source(struct path *, struct fsinfo_context *);
 extern int fsinfo_generic_mount_info(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_topology(struct path *, struct fsinfo_context *);
 extern int fsinfo_generic_mount_point(struct path *, struct fsinfo_context *);
 extern int fsinfo_generic_mount_point_full(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_children(struct path *, struct fsinfo_context *);
 
 /*
  * fs_struct.c
diff --git a/fs/namespace.c b/fs/namespace.c
index c196af35d39d..b5c2a3b4f96d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4303,6 +4303,54 @@ int fsinfo_generic_mount_info(struct path *path, struct fsinfo_context *ctx)
 	return sizeof(*p);
 }
 
+/*
+ * Retrieve information about the topology at the nominated mount and
+ * its propagation attributes.
+ */
+int fsinfo_generic_mount_topology(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_mount_topology *p = ctx->buffer;
+	struct mount *m;
+	struct path root;
+
+	get_fs_root(current->fs, &root);
+
+	namespace_lock();
+
+	m = real_mount(path->mnt);
+
+	p->parent_id = m->mnt_parent->mnt_id;
+
+	if (path->mnt == root.mnt) {
+		p->parent_id = m->mnt_id;
+	} else {
+		rcu_read_lock();
+		if (!are_paths_connected(&root, path))
+			p->parent_id = m->mnt_id;
+		rcu_read_unlock();
+	}
+
+	if (IS_MNT_SHARED(m)) {
+		p->shared_group_id = m->mnt_group_id;
+		p->propagation_type |= MOUNT_PROPAGATION_SHARED;
+	} else if (IS_MNT_SLAVE(m)) {
+		int source = m->mnt_master->mnt_group_id;
+		int from = get_dominating_id(m, &root);
+		p->dependent_source_id = source;
+		if (from && from != source)
+			p->dependent_clone_of_id = from;
+		p->propagation_type |= MOUNT_PROPAGATION_DEPENDENT;
+	} else if (IS_MNT_UNBINDABLE(m)) {
+		p->propagation_type |= MOUNT_PROPAGATION_UNBINDABLE;
+	} else {
+		p->propagation_type |= MOUNT_PROPAGATION_PRIVATE;
+	}
+
+	namespace_unlock();
+	path_put(&root);
+	return sizeof(*p);
+}
+
 /*
  * Return the path of this mount relative to its parent and clipped to
  * the current chroot.
@@ -4379,4 +4427,50 @@ int fsinfo_generic_mount_point_full(struct path *path, struct fsinfo_context *ct
 	return (ctx->buffer + ctx->buf_size) - p;
 }
 
+/*
+ * Store a mount record into the fsinfo buffer.
+ */
+static void fsinfo_store_mount(struct fsinfo_context *ctx, const struct mount *p,
+			       bool is_root)
+{
+	struct fsinfo_mount_child record = {};
+	unsigned int usage = ctx->usage;
+
+	if (ctx->usage >= INT_MAX)
+		return;
+	ctx->usage = usage + sizeof(record);
+	if (!ctx->buffer || ctx->usage > ctx->buf_size)
+		return;
+
+	record.mnt_unique_id	= p->mnt_unique_id;
+	record.mnt_id		= p->mnt_id;
+	record.parent_id	= is_root ? p->mnt_id : p->mnt_parent->mnt_id;
+	memcpy(ctx->buffer + usage, &record, sizeof(record));
+}
+
+/*
+ * Return information about the submounts relative to path.
+ */
+int fsinfo_generic_mount_children(struct path *path, struct fsinfo_context *ctx)
+{
+	struct mount *m, *child;
+
+	m = real_mount(path->mnt);
+
+	read_seqlock_excl(&mount_lock);
+
+	list_for_each_entry_rcu(child, &m->mnt_mounts, mnt_child) {
+		if (child->mnt_parent != m)
+			continue;
+		fsinfo_store_mount(ctx, child, false);
+	}
+
+	/* End the list with a copy of the parameter mount's details so that
+	 * userspace can quickly check for changes.
+	 */
+	fsinfo_store_mount(ctx, m, true);
+	read_sequnlock_excl(&mount_lock);
+	return ctx->usage;
+}
+
 #endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 15ef161905cd..f0a352b7028e 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -35,6 +35,8 @@
 #define FSINFO_ATTR_MOUNT_PATH		0x201	/* Bind mount/superblock path (string) */
 #define FSINFO_ATTR_MOUNT_POINT		0x202	/* Relative path of mount in parent (string) */
 #define FSINFO_ATTR_MOUNT_POINT_FULL	0x203	/* Absolute path of mount (string) */
+#define FSINFO_ATTR_MOUNT_TOPOLOGY	0x204	/* Mount object topology */
+#define FSINFO_ATTR_MOUNT_CHILDREN	0x205	/* Children of this mount (list) */
 
 /*
  * Optional fsinfo() parameter structure.
@@ -102,6 +104,31 @@ struct fsinfo_mount_info {
 
 #define FSINFO_ATTR_MOUNT_INFO__STRUCT struct fsinfo_mount_info
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_MOUNT_TOPOLOGY).
+ */
+struct fsinfo_mount_topology {
+	__u32	parent_id;		/* Parent mount identifier */
+	__u32	shared_group_id;	/* Shared: mount group ID */
+	__u32	dependent_source_id;	/* Dependent: source mount group ID */
+	__u32	dependent_clone_of_id;	/* Dependent: ID of mount this was cloned from */
+	__u32	propagation_type;	/* MOUNT_PROPAGATION_* type */
+};
+
+#define FSINFO_ATTR_MOUNT_TOPOLOGY__STRUCT struct fsinfo_mount_topology
+
+/*
+ * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_CHILDREN).
+ * - An extra element is placed on the end representing the parent mount.
+ */
+struct fsinfo_mount_child {
+	__u64	mnt_unique_id;		/* Kernel-lifetime unique mount ID */
+	__u32	mnt_id;			/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
+	__u32	parent_id;		/* Parent mount identifier */
+};
+
+#define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct fsinfo_mount_child
+
 /*
  * Information struct for fsinfo(FSINFO_ATTR_STATFS).
  * - This gives extended filesystem information.
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 96a0240f23fe..9ac8bb708843 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -105,7 +105,7 @@ enum fsconfig_command {
 #define FSMOUNT_CLOEXEC		0x00000001
 
 /*
- * Mount attributes.
+ * Mount object attributes (these are separate to filesystem attributes).
  */
 #define MOUNT_ATTR_RDONLY	0x00000001 /* Mount read-only */
 #define MOUNT_ATTR_NOSUID	0x00000002 /* Ignore suid and sgid bits */
@@ -117,4 +117,15 @@ enum fsconfig_command {
 #define MOUNT_ATTR_STRICTATIME	0x00000020 /* - Always perform atime updates */
 #define MOUNT_ATTR_NODIRATIME	0x00000080 /* Do not update directory access times */
 
+/*
+ * Mount object propagation type.
+ */
+enum propagation_type {
+	/* 0 is left unallocated to mean "no change" in mount_setattr()  */
+	MOUNT_PROPAGATION_UNBINDABLE	= 1, /* Make unbindable. */
+	MOUNT_PROPAGATION_PRIVATE	= 2, /* Do not receive or send mount events. */
+	MOUNT_PROPAGATION_DEPENDENT	= 3, /* Only receive mount events. */
+	MOUNT_PROPAGATION_SHARED	= 4, /* Send and receive mount events. */
+};
+
 #endif /* _UAPI_LINUX_MOUNT_H */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index f3bebb7318d9..b7290ea8eb55 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -21,6 +21,7 @@
 #include <sys/syscall.h>
 #include <linux/fsinfo.h>
 #include <linux/socket.h>
+#include <linux/mount.h>
 #include <sys/stat.h>
 #include <arpa/inet.h>
 
@@ -305,6 +306,58 @@ static void dump_fsinfo_generic_mount_info(void *reply, unsigned int size)
 	printf("\tattr    : %x\n", r->attr);
 }
 
+static void dump_fsinfo_generic_mount_topology(void *reply, unsigned int size)
+{
+	struct fsinfo_mount_topology *r = reply;
+
+	printf("\n");
+	printf("\tparent  : %x\n", r->parent_id);
+
+	switch (r->propagation_type) {
+	case MOUNT_PROPAGATION_UNBINDABLE:
+		printf("\tpropag  : unbindable\n");
+		break;
+	case MOUNT_PROPAGATION_PRIVATE:
+		printf("\tpropag  : private\n");
+		break;
+	case MOUNT_PROPAGATION_DEPENDENT:
+		printf("\tpropag  : dependent source=%x clone_of=%x\n",
+		       r->dependent_source_id, r->dependent_clone_of_id);
+		break;
+	case MOUNT_PROPAGATION_SHARED:
+		printf("\tpropag  : shared group=%x\n", r->shared_group_id);
+		break;
+	default:
+		printf("\tpropag  : unknown type %x\n", r->propagation_type);
+		break;
+	}
+
+}
+
+static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
+{
+	struct fsinfo_mount_child *r = reply;
+	ssize_t mplen;
+	char path[32], *mp;
+
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
+		.request	= FSINFO_ATTR_MOUNT_POINT,
+	};
+
+	if (!list_last) {
+		sprintf(path, "%u", r->mnt_id);
+		mplen = get_fsinfo(path, "FSINFO_ATTR_MOUNT_POINT", &params, (void **)&mp);
+		if (mplen < 0)
+			mp = "-";
+	} else {
+		mp = "<this>";
+	}
+
+	printf("%8x %16llx %s\n",
+	       r->mnt_id, (unsigned long long)r->mnt_unique_id, mp);
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -383,9 +436,11 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_meta_attributes),
 
 	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_TOPOLOGY,	fsinfo_generic_mount_topology),
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_PATH,	string),
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	string),
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT_FULL,	string),
+	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
 	{}
 };
 




* [PATCH 09/18] watch_queue: Mount event counters [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (7 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-03 13:37 ` [PATCH 10/18] fsinfo: Provide notification overrun handling support " David Howells
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add three event counters to each mount object:

 (1) mnt_topology_changes.

     Counts the number of changes to the mount tree topology, including
     addition of new mount objects, removal of mount objects and mount
     objects being moved about.

 (2) mnt_attr_changes.

     Counts the number of changes to a mount object's attributes, such as
     whether or not the device files it contains are interpretable as such.

 (3) mnt_subtree_notifications.

     Counts the number of events within the mount subtree at this point.
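
The way the three counters relate to one another can be modelled in plain
userspace C.  This is only a simplified sketch under stated assumptions: the
struct, enum and function names are hypothetical, plain integers stand in
for atomic_long_t, and the walk up the tree ignores the dentry-level cursor
that the real notification code follows.

```c
#include <stddef.h>

enum model_event {		/* hypothetical event classes */
	MODEL_EV_ATTR,		/* e.g. read-only or setattr change */
	MODEL_EV_TOPOLOGY,	/* e.g. new mount, unmount, move */
};

struct model_mount {
	struct model_mount *parent;	/* NULL at the top of the tree */
	unsigned long topology_changes;
	unsigned long attr_changes;
	unsigned long subtree_notifications;
};

/*
 * Model of one notification: bump the per-mount counter that matches the
 * event class, then tell every ancestor that something happened beneath
 * it.  The triggering mount's own subtree counter is left alone.
 */
static void model_notify(struct model_mount *trigger, struct model_mount *aux,
			 enum model_event ev)
{
	struct model_mount *m;

	if (ev == MODEL_EV_ATTR) {
		trigger->attr_changes++;
	} else {
		trigger->topology_changes++;
		if (aux)
			aux->topology_changes++;
	}

	for (m = trigger->parent; m; m = m->parent)
		m->subtree_notifications++;
}
```

In this model a watcher holding only the top-level mount's subtree counter
can tell that something changed below it without visiting every descendant.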

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/mount.h        |    3 +++
 fs/mount_notify.c |    4 ++++
 2 files changed, 7 insertions(+)

diff --git a/fs/mount.h b/fs/mount.h
index 1037781be055..9758a9fa8f69 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -83,6 +83,9 @@ struct mount {
 	u64	mnt_unique_id;		/* ID unique over lifetime of kernel */
 #endif
 #ifdef CONFIG_MOUNT_NOTIFICATIONS
+	atomic_long_t mnt_topology_changes;	/* Number of topology changes applied */
+	atomic_long_t mnt_attr_changes;		/* Number of attribute changes applied */
+	atomic_long_t mnt_subtree_notifications; /* Number of notifications in subtree */
 	struct watch_list *mnt_watchers; /* Watches on dentries within this mount */
 #endif
 } __randomize_layout;
diff --git a/fs/mount_notify.c b/fs/mount_notify.c
index d8ba66ed5f77..57eebae51cb1 100644
--- a/fs/mount_notify.c
+++ b/fs/mount_notify.c
@@ -61,6 +61,7 @@ static void post_mount_notification(struct mount *changed,
 			cursor.dentry = READ_ONCE(mnt->mnt_mountpoint);
 			mnt = parent;
 			cursor.mnt = &mnt->mnt;
+			atomic_long_inc(&mnt->mnt_subtree_notifications);
 		} else {
 			cursor.dentry = cursor.dentry->d_parent;
 		}
@@ -96,6 +97,7 @@ void notify_mount(struct mount *trigger,
 	case NOTIFY_MOUNT_EXPIRY:
 	case NOTIFY_MOUNT_READONLY:
 	case NOTIFY_MOUNT_SETATTR:
+		atomic_long_inc(&trigger->mnt_attr_changes);
 		break;
 
 	case NOTIFY_MOUNT_NEW_MOUNT:
@@ -103,6 +105,8 @@ void notify_mount(struct mount *trigger,
 	case NOTIFY_MOUNT_MOVE_FROM:
 	case NOTIFY_MOUNT_MOVE_TO:
 		n.auxiliary_mount = aux->mnt_unique_id;
+		atomic_long_inc(&trigger->mnt_topology_changes);
+		atomic_long_inc(&aux->mnt_topology_changes);
 		break;
 
 	default:




* [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (8 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 09/18] watch_queue: Mount event counters " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-04 13:56   ` Miklos Szeredi
  2020-08-05 16:06   ` David Howells
  2020-08-03 13:37 ` [PATCH 11/18] fsinfo: sample: Mount listing program " David Howells
                   ` (9 subsequent siblings)
  19 siblings, 2 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Provide support for the handling of an overrun in a watch queue.  In the
event that an overrun occurs, the watcher needs to be able to find out what
it missed.  To this end, previous patches added event counters to struct
mount.

These counters can be retrieved using fsinfo() and the
FSINFO_ATTR_MOUNT_INFO attribute.

	struct fsinfo_mount_info {
		__u64	mnt_unique_id;
		__u64	mnt_attr_changes;
		__u64	mnt_topology_changes;
		__u64	mnt_subtree_notifications;
	...
	};

There's a uniquifier and some event counters:

 (1) mnt_unique_id - This is an effectively non-repeating ID given to each
     mount object on creation.  This allows the caller to check that the
     mount ID didn't get reused (the 32-bit mount ID is more efficient to
     look up).

 (2) mnt_attr_changes - Count of attribute changes on a mount object.

 (3) mnt_topology_changes - Count of alterations to the mount tree that
     affected this node.

 (4) mnt_subtree_notifications - Count of mount object event notifications
     that were generated in the subtree rooted at this node.  This excludes
     events generated on this node itself and does not include superblock
     events.

The counters are also accessible through the FSINFO_ATTR_MOUNT_CHILDREN
attribute, where a list of all the children of a mount can be scanned.  The
record returned for each child includes the sum of the counters for that
child.  An additional record is added at the end for the queried object and
that also includes the sum of its counters.

The mnt_topology_changes counter is also included in
FSINFO_ATTR_MOUNT_TOPOLOGY.
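
One way a watcher might use these fields after an overrun is to compare a
cached copy of the counters against a fresh FSINFO_ATTR_MOUNT_INFO reply and
refetch only what moved.  The sketch below assumes exactly that workflow; the
local struct mirrors just the four fields shown above, and the flag and
function names are hypothetical.

```c
#include <stdint.h>

/* Local mirror of the fields shown above; not a real UAPI header. */
struct mount_counters {
	uint64_t mnt_unique_id;
	uint64_t mnt_attr_changes;
	uint64_t mnt_topology_changes;
	uint64_t mnt_subtree_notifications;
};

/* Hypothetical result flags. */
#define RESCAN_REPLACED	0x1	/* Mount ID was reused: drop cached state */
#define RESCAN_ATTR	0x2	/* Refetch this mount's attributes */
#define RESCAN_TOPOLOGY	0x4	/* Refetch topology and children */
#define RESCAN_SUBTREE	0x8	/* Descend and check the children */

/*
 * Work out what needs refetching by comparing the cached counters with a
 * fresh reading.  A changed unique ID means the 32-bit mount ID now names
 * a different mount, so nothing cached can be trusted.
 */
static unsigned int what_to_rescan(const struct mount_counters *old,
				   const struct mount_counters *cur)
{
	unsigned int todo = 0;

	if (old->mnt_unique_id != cur->mnt_unique_id)
		return RESCAN_REPLACED;
	if (old->mnt_attr_changes != cur->mnt_attr_changes)
		todo |= RESCAN_ATTR;
	if (old->mnt_topology_changes != cur->mnt_topology_changes)
		todo |= RESCAN_TOPOLOGY;
	if (old->mnt_subtree_notifications != cur->mnt_subtree_notifications)
		todo |= RESCAN_SUBTREE;
	return todo;
}
```

Comparing for inequality rather than ordering keeps the check simple and
means the sketch does not depend on how the counters wrap.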

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/mount_notify.c           |    2 ++
 fs/namespace.c              |   21 +++++++++++++++++++++
 include/uapi/linux/fsinfo.h |    7 +++++++
 samples/vfs/test-fsinfo.c   |   10 ++++++++--
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/fs/mount_notify.c b/fs/mount_notify.c
index 57eebae51cb1..57995c27ca88 100644
--- a/fs/mount_notify.c
+++ b/fs/mount_notify.c
@@ -93,6 +93,8 @@ void notify_mount(struct mount *trigger,
 	n.watch.info	= info_flags | watch_sizeof(n);
 	n.triggered_on	= trigger->mnt_unique_id;
 
+	smp_wmb(); /* See fsinfo_generic_mount_info(). */
+
 	switch (subtype) {
 	case NOTIFY_MOUNT_EXPIRY:
 	case NOTIFY_MOUNT_READONLY:
diff --git a/fs/namespace.c b/fs/namespace.c
index b5c2a3b4f96d..122c12f9512b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4282,6 +4282,17 @@ int fsinfo_generic_mount_info(struct path *path, struct fsinfo_context *ctx)
 	p->mnt_unique_id	= m->mnt_unique_id;
 	p->mnt_id		= m->mnt_id;
 
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	p->mnt_subtree_notifications = atomic_long_read(&m->mnt_subtree_notifications);
+	p->mnt_topology_changes	= atomic_long_read(&m->mnt_topology_changes);
+	p->mnt_attr_changes	= atomic_long_read(&m->mnt_attr_changes);
+#endif
+
+	/* Record the counters before reading the attributes as we're not
+	 * holding a lock.  Paired with a write barrier in notify_mount().
+	 */
+	smp_rmb();
+
 	flags = READ_ONCE(m->mnt.mnt_flags);
 	if (flags & MNT_READONLY)
 		p->attr |= MOUNT_ATTR_RDONLY;
@@ -4319,6 +4330,9 @@ int fsinfo_generic_mount_topology(struct path *path, struct fsinfo_context *ctx)
 
 	m = real_mount(path->mnt);
 
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	p->mnt_topology_changes	= atomic_long_read(&m->mnt_topology_changes);
+#endif
 	p->parent_id = m->mnt_parent->mnt_id;
 
 	if (path->mnt == root.mnt) {
@@ -4445,6 +4459,13 @@ static void fsinfo_store_mount(struct fsinfo_context *ctx, const struct mount *p
 	record.mnt_unique_id	= p->mnt_unique_id;
 	record.mnt_id		= p->mnt_id;
 	record.parent_id	= is_root ? p->mnt_id : p->mnt_parent->mnt_id;
+
+#ifdef CONFIG_MOUNT_NOTIFICATIONS
+	record.mnt_notify_sum	= (atomic_long_read(&p->mnt_attr_changes) +
+				   atomic_long_read(&p->mnt_topology_changes) +
+				   atomic_long_read(&p->mnt_subtree_notifications));
+#endif
+
 	memcpy(ctx->buffer + usage, &record, sizeof(record));
 }
 
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index f0a352b7028e..b021466dee0f 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -100,6 +100,9 @@ struct fsinfo_mount_info {
 	__u64	mnt_unique_id;		/* Kernel-lifetime unique mount ID */
 	__u32	mnt_id;			/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
 	__u32	attr;			/* MOUNT_ATTR_* flags */
+	__u64	mnt_attr_changes;	/* Number of attribute changes to this mount. */
+	__u64	mnt_topology_changes;	/* Number of topology changes to this mount. */
+	__u64	mnt_subtree_notifications; /* Number of notifications in mount subtree */
 };
 
 #define FSINFO_ATTR_MOUNT_INFO__STRUCT struct fsinfo_mount_info
@@ -113,6 +116,7 @@ struct fsinfo_mount_topology {
 	__u32	dependent_source_id;	/* Dependent: source mount group ID */
 	__u32	dependent_clone_of_id;	/* Dependent: ID of mount this was cloned from */
 	__u32	propagation_type;	/* MOUNT_PROPAGATION_* type */
+	__u64	mnt_topology_changes;	/* Number of topology changes to this mount. */
 };
 
 #define FSINFO_ATTR_MOUNT_TOPOLOGY__STRUCT struct fsinfo_mount_topology
@@ -125,6 +129,9 @@ struct fsinfo_mount_child {
 	__u64	mnt_unique_id;		/* Kernel-lifetime unique mount ID */
 	__u32	mnt_id;			/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
 	__u32	parent_id;		/* Parent mount identifier */
+	__u64	mnt_notify_sum;		/* Sum of mnt_attr_changes, mnt_topology_changes and
+					 * mnt_subtree_notifications.
+					 */
 };
 
 #define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct fsinfo_mount_child
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index b7290ea8eb55..620a02477aa8 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -304,6 +304,10 @@ static void dump_fsinfo_generic_mount_info(void *reply, unsigned int size)
 	printf("\tmnt_uniq: %llx\n", (unsigned long long)r->mnt_unique_id);
 	printf("\tmnt_id  : %x\n", r->mnt_id);
 	printf("\tattr    : %x\n", r->attr);
+	printf("\tevents  : attr=%llu topology=%llu subtree=%llu\n",
+	       (unsigned long long)r->mnt_attr_changes,
+	       (unsigned long long)r->mnt_topology_changes,
+	       (unsigned long long)r->mnt_subtree_notifications);
 }
 
 static void dump_fsinfo_generic_mount_topology(void *reply, unsigned int size)
@@ -332,6 +336,7 @@ static void dump_fsinfo_generic_mount_topology(void *reply, unsigned int size)
 		break;
 	}
 
+	printf("\tevents  : topology=%llu\n", (unsigned long long)r->mnt_topology_changes);
 }
 
 static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
@@ -354,8 +359,9 @@ static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
 		mp = "<this>";
 	}
 
-	printf("%8x %16llx %s\n",
-	       r->mnt_id, (unsigned long long)r->mnt_unique_id, mp);
+	printf("%8x %16llx %10llu %s\n",
+	       r->mnt_id, (unsigned long long)r->mnt_unique_id,
+	       (unsigned long long)r->mnt_notify_sum, mp);
 }
 
 static void dump_string(void *reply, unsigned int size)




* [PATCH 11/18] fsinfo: sample: Mount listing program [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (9 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 10/18] fsinfo: Provide notification overrun handling support " David Howells
@ 2020-08-03 13:37 ` David Howells
  2020-08-03 13:38 ` [PATCH 12/18] fsinfo: Add API documentation " David Howells
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:37 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Implement a program to demonstrate mount listing using the new fsinfo()
syscall.  For example, to dump the tree from mount 21:

# ./test-mntinfo -m 21
MOUNT                            MOUNT ID   CHANGE#  AT P DEV   TYPE
-------------------------------- ---------- -------- -- - ----- --------
21                                       21        0  e 4  0:14 sysfs
 \_ kernel/security                      24        0  e 4   0:8 securityfs
 \_ fs/cgroup                            28        4 2f 4  0:18 tmpfs
 |   \_ unified                          29        0  e 4  0:19 cgroup2
 |   \_ systemd                          30        0  e 4  0:1a cgroup
 |   \_ blkio                            34        0  e 4  0:1e cgroup
 |   \_ net_cls,net_prio                 35        0  e 4  0:1f cgroup
 |   \_ perf_event                       36        0  e 4  0:20 cgroup
 |   \_ freezer                          37        0  e 4  0:21 cgroup
 |   \_ devices                          38        0  e 4  0:22 cgroup
 |   \_ cpu,cpuacct                      39        0  e 4  0:23 cgroup
 |   \_ rdma                             40        0  e 4  0:24 cgroup
 |   \_ memory                           41        0  e 4  0:25 cgroup
 |   \_ cpuset                           42        0  e 4  0:26 cgroup
 |   \_ hugetlb                          43        0  e 4  0:27 cgroup
 \_ fs/pstore                            31        0  e 4  0:1b pstore
 \_ firmware/efi/efivars                 32        0  e 4  0:1c efivarfs
 \_ fs/bpf                               33        0  e 4  0:1d bpf
 \_ kernel/config                        92        0  0 4  0:28 configfs
 \_ fs/selinux                           44        0  0 4  0:11 selinuxfs
 \_ kernel/debug                         45        1  0 4   0:7 debugfs

Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/vfs/Makefile       |    6 +
 samples/vfs/test-mntinfo.c |  277 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 282 insertions(+), 1 deletion(-)
 create mode 100644 samples/vfs/test-mntinfo.c

diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
index d63af5106fc2..7bcdd7a2829e 100644
--- a/samples/vfs/Makefile
+++ b/samples/vfs/Makefile
@@ -1,5 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0-only
-userprogs := test-fsinfo test-fsmount test-statx
+userprogs := \
+	test-fsinfo \
+	test-fsmount \
+	test-mntinfo \
+	test-statx
 always-y := $(userprogs)
 
 userccflags += -I usr/include
diff --git a/samples/vfs/test-mntinfo.c b/samples/vfs/test-mntinfo.c
new file mode 100644
index 000000000000..a706b5e85997
--- /dev/null
+++ b/samples/vfs/test-mntinfo.c
@@ -0,0 +1,277 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Demonstrate mount listing using the fsinfo() system call
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define _GNU_SOURCE
+#define _ATFILE_SOURCE
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <ctype.h>
+#include <errno.h>
+#include <time.h>
+#include <math.h>
+#include <sys/syscall.h>
+#include <linux/fsinfo.h>
+#include <linux/socket.h>
+#include <linux/fcntl.h>
+#include <sys/stat.h>
+#include <arpa/inet.h>
+
+#ifndef __NR_fsinfo
+#define __NR_fsinfo -1
+#endif
+
+static __attribute__((unused))
+ssize_t fsinfo(int dfd, const char *filename,
+	       struct fsinfo_params *params, size_t params_size,
+	       void *result_buffer, size_t result_buf_size)
+{
+	return syscall(__NR_fsinfo, dfd, filename,
+		       params, params_size,
+		       result_buffer, result_buf_size);
+}
+
+static char tree_buf[4096];
+static char bar_buf[4096];
+static unsigned int children_list_interval;
+
+/*
+ * Get an fsinfo attribute in a statically allocated buffer.
+ */
+static void get_attr(unsigned int mnt_id, unsigned int attr, unsigned int Nth,
+		     void *buf, size_t buf_size)
+{
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
+		.request	= attr,
+		.Nth		= Nth,
+	};
+	char file[32];
+	long ret;
+
+	sprintf(file, "%u", mnt_id);
+
+	memset(buf, 0xbd, buf_size);
+
+	ret = fsinfo(AT_FDCWD, file, &params, sizeof(params), buf, buf_size);
+	if (ret == -1) {
+		fprintf(stderr, "mount-%s: %m\n", file);
+		exit(1);
+	}
+}
+
+/*
+ * Get an fsinfo attribute in a dynamically allocated buffer.
+ */
+static void *get_attr_alloc(unsigned int mnt_id, unsigned int attr,
+			    unsigned int Nth, size_t *_size)
+{
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
+		.request	= attr,
+		.Nth		= Nth,
+	};
+	size_t buf_size = 4096;
+	char file[32];
+	void *r;
+	long ret;
+
+	sprintf(file, "%u", mnt_id);
+
+	for (;;) {
+		r = malloc(buf_size);
+		if (!r) {
+			perror("malloc");
+			exit(1);
+		}
+		memset(r, 0xbd, buf_size);
+
+		ret = fsinfo(AT_FDCWD, file, &params, sizeof(params), r, buf_size);
+		if (ret == -1) {
+			fprintf(stderr, "mount-%s: %x,%x,%x %m\n",
+				file, params.request, params.Nth, params.Mth);
+			exit(1);
+		}
+
+		if (ret <= buf_size) {
+			*_size = ret;
+			break;
+		}
+		buf_size = (ret + 4096 - 1) & ~(4096 - 1);
+	}
+
+	return r;
+}
+
+/*
+ * Display a mount and then recurse through its children.
+ */
+static void display_mount(unsigned int mnt_id, unsigned int depth, char *path)
+{
+	struct fsinfo_mount_topology top;
+	struct fsinfo_mount_child child;
+	struct fsinfo_mount_info info;
+	struct fsinfo_ids ids;
+	void *children;
+	unsigned int d;
+	size_t ch_size, p_size;
+	char dev[64];
+	int i, n, s;
+
+	get_attr(mnt_id, FSINFO_ATTR_MOUNT_TOPOLOGY, 0, &top, sizeof(top));
+	get_attr(mnt_id, FSINFO_ATTR_MOUNT_INFO, 0, &info, sizeof(info));
+	get_attr(mnt_id, FSINFO_ATTR_IDS, 0, &ids, sizeof(ids));
+	if (depth > 0)
+		printf("%s", tree_buf);
+
+	s = strlen(path);
+	printf("%s", !s ? "\"\"" : path);
+	if (!s)
+		s += 2;
+	s += depth;
+	if (s < 38)
+		s = 38 - s;
+	else
+		s = 1;
+	printf("%*.*s", s, s, "");
+
+	sprintf(dev, "%x:%x", ids.f_dev_major, ids.f_dev_minor);
+	printf("%10u %8llx %2x %x %5s %s",
+	       info.mnt_id,
+	       (unsigned long long)(info.mnt_attr_changes +
+				    info.mnt_topology_changes +
+				    info.mnt_subtree_notifications),
+	       info.attr, top.propagation_type,
+	       dev, ids.f_fs_name);
+	putchar('\n');
+
+	children = get_attr_alloc(mnt_id, FSINFO_ATTR_MOUNT_CHILDREN, 0, &ch_size);
+	n = ch_size / children_list_interval - 1;
+
+	bar_buf[depth + 1] = '|';
+	if (depth > 0) {
+		tree_buf[depth - 4 + 1] = bar_buf[depth - 4 + 1];
+		tree_buf[depth - 4 + 2] = ' ';
+	}
+
+	tree_buf[depth + 0] = ' ';
+	tree_buf[depth + 1] = '\\';
+	tree_buf[depth + 2] = '_';
+	tree_buf[depth + 3] = ' ';
+	tree_buf[depth + 4] = 0;
+	d = depth + 4;
+
+	memset(&child, 0, sizeof(child));
+	for (i = 0; i < n; i++) {
+		void *p = children + i * children_list_interval;
+
+		if (sizeof(child) >= children_list_interval)
+			memcpy(&child, p, children_list_interval);
+		else
+			memcpy(&child, p, sizeof(child));
+
+		if (i == n - 1)
+			bar_buf[depth + 1] = ' ';
+		path = get_attr_alloc(child.mnt_id, FSINFO_ATTR_MOUNT_POINT,
+				      0, &p_size);
+		display_mount(child.mnt_id, d, path + 1);
+		free(path);
+	}
+
+	free(children);
+	if (depth > 0) {
+		tree_buf[depth - 4 + 1] = '\\';
+		tree_buf[depth - 4 + 2] = '_';
+	}
+	tree_buf[depth] = 0;
+}
+
+/*
+ * Find the ID of whatever is at the nominated path.
+ */
+static unsigned int lookup_mnt_by_path(const char *path)
+{
+	struct fsinfo_mount_info mnt;
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_PATH,
+		.request	= FSINFO_ATTR_MOUNT_INFO,
+	};
+
+	if (fsinfo(AT_FDCWD, path, &params, sizeof(params), &mnt, sizeof(mnt)) == -1) {
+		perror(path);
+		exit(1);
+	}
+
+	return mnt.mnt_id;
+}
+
+/*
+ * Determine the element size for the mount child list.
+ */
+static unsigned int query_list_element_size(int mnt_id, unsigned int attr)
+{
+	struct fsinfo_attribute_info attr_info;
+
+	get_attr(mnt_id, FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, attr,
+		 &attr_info, sizeof(attr_info));
+	return attr_info.size;
+}
+
+/*
+ * Parse the command line and display the requested mount tree.
+ */
+int main(int argc, char **argv)
+{
+	unsigned int mnt_id;
+	char *path;
+	bool use_mnt_id = false;
+	int opt;
+
+	while ((opt = getopt(argc, argv, "m")) != -1) {
+		switch (opt) {
+		case 'm':
+			use_mnt_id = true;
+			continue;
+		}
+		break;
+	}
+
+	argc -= optind;
+	argv += optind;
+
+	switch (argc) {
+	case 0:
+		mnt_id = lookup_mnt_by_path("/");
+		path = "ROOT";
+		break;
+	case 1:
+		path = argv[0];
+		if (use_mnt_id) {
+			mnt_id = strtoul(argv[0], NULL, 0);
+			break;
+		}
+
+		mnt_id = lookup_mnt_by_path(argv[0]);
+		break;
+	default:
+		printf("Format: test-mntinfo\n");
+		printf("Format: test-mntinfo <path>\n");
+		printf("Format: test-mntinfo -m <mnt_id>\n");
+		exit(2);
+	}
+
+	children_list_interval =
+		query_list_element_size(mnt_id, FSINFO_ATTR_MOUNT_CHILDREN);
+
+	printf("MOUNT                                 MOUNT ID   CHANGE#  AT P DEV   TYPE\n");
+	printf("------------------------------------- ---------- -------- -- - ----- --------\n");
+	display_mount(mnt_id, 0, path);
+	return 0;
+}




* [PATCH 12/18] fsinfo: Add API documentation [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (10 preceding siblings ...)
  2020-08-03 13:37 ` [PATCH 11/18] fsinfo: sample: Mount listing program " David Howells
@ 2020-08-03 13:38 ` David Howells
  2020-08-03 13:38 ` [PATCH 13/18] fsinfo: Add support for AFS " David Howells
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:38 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add API documentation for fsinfo.

Signed-off-by: David Howells <dhowells@redhat.com>
---
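
Reader's note (illustrative only, not part of the patch): the
FSINFO_TYPE_VSTRUCT rules described in the documentation added by this patch
(a short request gets only the fields asked for, a long request gets the
extra fields zero-filled, and the full internal size is reported either way)
can be modelled in userspace roughly as follows.  This is a minimal sketch
under stated assumptions: vstruct_copy() and the example_v1/example_v2
structs are hypothetical names that do not appear anywhere in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy an internal struct of kernel_size bytes into a
 * caller buffer of user_size bytes, following the vstruct rules.
 */
static size_t vstruct_copy(void *user_buf, size_t user_size,
			   const void *kernel_struct, size_t kernel_size)
{
	size_t n = user_size < kernel_size ? user_size : kernel_size;

	assert(user_buf != NULL && kernel_struct != NULL);

	/* Zero the whole caller buffer so unknown trailing fields read as 0. */
	memset(user_buf, 0, user_size);
	/* Copy only the leading part that both sides know about. */
	memcpy(user_buf, kernel_struct, n);
	/* Report the full internal size, however much was copied. */
	return kernel_size;
}

/* Two hypothetical versions of one struct; v2 appends a field to v1. */
struct example_v1 { unsigned int a; unsigned int b; };
struct example_v2 { unsigned int a; unsigned int b; unsigned int c; };
```

A caller built against example_v2 but served an example_v1 would see c == 0
and a return value of sizeof(struct example_v1), which tells it that the
producer is older than it is.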

 Documentation/filesystems/fsinfo.rst |  574 ++++++++++++++++++++++++++++++++++
 1 file changed, 574 insertions(+)
 create mode 100644 Documentation/filesystems/fsinfo.rst

diff --git a/Documentation/filesystems/fsinfo.rst b/Documentation/filesystems/fsinfo.rst
new file mode 100644
index 000000000000..65d88e5a36bc
--- /dev/null
+++ b/Documentation/filesystems/fsinfo.rst
@@ -0,0 +1,574 @@
+============================
+Filesystem Information Query
+============================
+
+The fsinfo() system call allows the querying of filesystem and filesystem
+security information beyond what stat(), statx() and statfs() can obtain.  It
+does not require a file to be opened as does ioctl().
+
+fsinfo() may be called with a path, with an open file descriptor or with a mount
+object identifier.
+
+The fsinfo() system call must be enabled in the kernel configuration with:
+
+	"File systems"/"Enable the fsinfo() system call" (CONFIG_FSINFO)
+
+This document has the following sections:
+
+.. contents:: :local:
+
+
+Overview
+========
+
+The fsinfo() system call retrieves one of a number of attributes, the IDs of
+which can be found in include/uapi/linux/fsinfo.h::
+
+	FSINFO_ATTR_STATFS	- statfs()-style state
+	FSINFO_ATTR_IDS		- Filesystem IDs
+	FSINFO_ATTR_LIMITS	- Filesystem limits
+	...
+	FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO - Information about an attribute
+	FSINFO_ATTR_FSINFO_ATTRIBUTES - List of available attributes
+	...
+	FSINFO_ATTR_MOUNT_INFO	- Information about the mount topology
+	...
+
+Each attribute can have zero or more values, which can be of one of the
+following types:
+
+ * ``FSINFO_TYPE_VSTRUCT``.  This is a structure with a version-dependent
+   length.  New versions of the kernel may append more fields, though they are
+   not permitted to remove or replace old ones.
+
+   Older applications, expecting an older version of the field, can ask for a
+   shorter struct and will only get the fields they requested; newer
+   applications running on an older kernel will get the extra fields they
+   requested filled with zeros.  Either way, the system call returns the size
+   of the internal struct, regardless of how much data it returned.
+
+   This allows for struct-type fields to be extended in future.
+
+ * ``FSINFO_TYPE_STRING``.  This is a variable-length string of up to INT_MAX
+   characters (no NUL character is included).  The returned string will be
+   truncated if the output buffer is too small.  The total size of the string
+   is returned, regardless of any truncation.
+
+ * ``FSINFO_TYPE_OPAQUE``.  This is a variable-length blob of indeterminate
+   structure.  It may be up to INT_MAX bytes in size.
+
+ * ``FSINFO_TYPE_LIST``.  This is a variable-length list of fixed-size
+   structures.  The element size may not vary over time, so the element format
+   must be designed with care.  The maximum length is INT_MAX bytes, though
+   this depends on the kernel being able to allocate an internal buffer large
+   enough.
+
+Value type is an inherent property of an attribute and all the values of an
+attribute must be of that type.  Each attribute can have a single value, a
+sequence of values or a sequence-of-sequences of values.
+
+
+Filesystem API
+==============
+
+If the filesystem wishes to override the generic queryable attributes or
+provide queryable attributes of its own, it should define a handler function
+and point the appropriate superblock op to it::
+
+	int (*fsinfo)(struct path *path, struct fsinfo_context *ctx);
+
+The core calls this function to see if it wants to handle the attribute.  For
+each table of attributes it has (and it can have more than one), it should
+call::
+
+	int fsinfo_get_attribute(struct path *path, struct fsinfo_context *ctx,
+				 const struct fsinfo_attribute *attrs);
+
+to scan the table to see if the requested one is in there.  This function also
+handles determining the size of struct attributes, enumerating attributes for
+the FSINFO_ATTR_FSINFO_ATTRIBUTES and querying information about an attribute
+for FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO.
+
+If it doesn't want to handle the attribute, -EOPNOTSUPP should be returned.  The
+core will then examine the generic attribute table.
+
+
+Attribute Table
+---------------
+
+An attribute table is a sequence of ``struct fsinfo_attribute`` terminated with
+a blank entry.  Entries can be created with a set of helper macros::
+
+	FSINFO_VSTRUCT(A,G)
+	FSINFO_VSTRUCT_N(A,G)
+	FSINFO_VSTRUCT_NM(A,G)
+	FSINFO_STRING(A,G)
+	FSINFO_STRING_N(A,G)
+	FSINFO_STRING_NM(A,G)
+	FSINFO_OPAQUE(A,G)
+	FSINFO_LIST(A,G)
+	FSINFO_LIST_N(A,G)
+
+The names of the macros are a combination of type (vstruct, string, opaque and
+list) and an optional qualifier, if the attribute has N values or N lots of M
+values.  ``A`` is the name of the attribute and ``G`` is a function to get a
+value for that attribute.
+
+For vstruct- and list-type attributes, it is expected that there is a macro
+defined with the name ``A##__STRUCT`` that indicates the structure type.
+
+The get function needs to match the following type::
+
+	int (*get)(struct path *path, struct fsinfo_context *ctx);
+
+where "path" indicates the object to be queried and ctx is a context describing
+the parameters and the output buffer.  The function should return the total
+size of the data it would like to produce or an error.
+
+
+Context Structure
+-----------------
+
+The context struct looks like::
+
+	struct fsinfo_context {
+		__u32		requested_attr;
+		__u32		Nth;
+		__u32		Mth;
+		bool		want_size_only;
+		unsigned int	skip;
+		unsigned int	usage;
+		unsigned int	buf_size;
+		void		*buffer;
+		...
+	};
+
+The fields relevant to the filesystem are as follows:
+
+ * ``requested_attr``
+
+   Which attribute is being requested.  EOPNOTSUPP should be returned if the
+   attribute is not supported by the filesystem or the LSM.
+
+ * ``Nth`` and ``Mth``
+
+   Which value of an attribute is being requested.
+
+   For a single-value attribute Nth and Mth will both be 0.
+
+   For a "1D" attribute, Nth will indicate which value and Mth will always
+   be 0.  Take, for example, FSINFO_ATTR_AFS_SERVER_NAME - for a network
+   filesystem, the superblock will be backed by a number of servers.  This will
+   return the name of the Nth server.  ENODATA will be returned if Nth goes
+   beyond the end of the array.
+
+   For a "2D" attribute, Mth will indicate the index in the Nth set of values.
+   Take, for example, an attribute for a network filesystem that returns
+   server addresses - each server may have one or more addresses.  This could
+   return the Mth address of the Nth server.  ENODATA should be returned if the
+   Nth set doesn't exist or the Mth element of the Nth set doesn't exist.
+
+ * ``want_size_only``
+
+   Is set to true if the caller only wants the size of the value so that the
+   get function doesn't have to make expensive calculations or calls to
+   retrieve the value.
+
+ * ``skip``
+
+   This indicates how far into the buffer the data to be returned starts.  This
+   can be used to trim the front off the buffer or to handle backward-filling.
+
+ * ``usage``
+
+   This indicates how much of the buffer has been used so far for a list or
+   opaque type attribute.  This is updated by the fsinfo_note_param*()
+   functions.
+
+ * ``buf_size``
+
+   This indicates the current size of the buffer.  For the list type and the
+   opaque type this will be increased if the current buffer won't hold the
+   value and the filesystem will be called again.
+
+ * ``buffer``
+
+   This points to the output buffer.  It will be buf_size in size and will be
+   resized if the returned size is larger than this.
+
+To simplify filesystem code, there will always be at least a minimal buffer
+available if a ->get() method gets called.
+
+
+Helper Functions
+================
+
+The API includes a number of helper functions:
+
+ * ``int fsinfo_string(const char *s, struct fsinfo_context *ctx);``
+
+   This places the specified string into the buffer set in the context.  If the
+   string is NULL, the buffer will be left empty.
+
+ * ``int fsinfo_generic_timestamp_info(struct path *, struct fsinfo_context *);``
+ * ``int fsinfo_generic_supports(struct path *, struct fsinfo_context *);``
+ * ``int fsinfo_generic_limits(struct path *, struct fsinfo_context *);``
+
+   These set the generic information for timestamp resolution and range
+   information, supported features and number limits and are called for the
+   corresponding attributes if the filesystem doesn't override them.
+
+   If the filesystem does override them, it can call the above functions and
+   then amend the results.
+
+ * ``void fsinfo_set_feature(struct fsinfo_features *ft,
+			     enum fsinfo_feature feature);``
+
+   This function sets a feature flag.
+
+ * ``void fsinfo_clear_feature(struct fsinfo_features *ft,
+			       enum fsinfo_feature feature);``
+
+   This function clears a feature flag.
+
+ * ``void fsinfo_set_unix_features(struct fsinfo_features *ft);``
+
+   Set feature flags appropriate to the features of a standard UNIX filesystem,
+   such as having numeric UIDS and GIDS; allowing the creation of directories,
+   symbolic links, hard links, device files, FIFO and socket files; permitting
+   sparse files; and having access, change and modification times.
+
+
+Attribute Summary
+=================
+
+To summarise the attributes that are defined::
+
+  Symbolic name				Type
+  =====================================	===============
+  FSINFO_ATTR_STATFS			vstruct
+  FSINFO_ATTR_IDS			vstruct
+  FSINFO_ATTR_LIMITS			vstruct
+  FSINFO_ATTR_SUPPORTS			vstruct
+  FSINFO_ATTR_TIMESTAMP_INFO		vstruct
+  FSINFO_ATTR_VOLUME_ID			string
+  FSINFO_ATTR_VOLUME_UUID		vstruct
+  FSINFO_ATTR_VOLUME_NAME		string
+  FSINFO_ATTR_FEATURES			vstruct
+  FSINFO_ATTR_SOURCE			string
+  FSINFO_ATTR_CONFIGURATION		string
+  FSINFO_ATTR_FS_STATISTICS		string
+  FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO	N × vstruct
+  FSINFO_ATTR_FSINFO_ATTRIBUTES		list
+  FSINFO_ATTR_MOUNT_INFO		vstruct
+  FSINFO_ATTR_MOUNT_PATH		string
+  FSINFO_ATTR_MOUNT_POINT		string
+  FSINFO_ATTR_MOUNT_CHILDREN		list
+  FSINFO_ATTR_AFS_CELL_NAME		string
+  FSINFO_ATTR_AFS_SERVER_NAME		N × string
+  FSINFO_ATTR_AFS_SERVER_ADDRESSES	N × list
+
+
+Attribute Catalogue
+===================
+
+A number of the attributes convey information about a filesystem superblock:
+
+ *  ``FSINFO_ATTR_STATFS``
+
+    This struct-type attribute gives most of the equivalent data to statfs(),
+    but with all the fields as unconditional 64-bit or 128-bit integers.  Note
+    that static data like IDs that don't change are retrieved with
+    FSINFO_ATTR_IDS instead.
+
+    Further, superblock flags (such as MS_RDONLY) are not exposed by this
+    attribute; rather the parameters must be listed and the attributes picked
+    out from that.
+
+ *  ``FSINFO_ATTR_IDS``
+
+    This struct-type attribute conveys various identifiers used by the target
+    filesystem.  This includes the filesystem name, the NFS filesystem ID, the
+    superblock ID used in notifications, the filesystem magic type number and
+    the primary device ID.
+
+ *  ``FSINFO_ATTR_LIMITS``
+
+    This struct-type attribute conveys the limits on various aspects of a
+    filesystem, such as maximum file, symlink and xattr sizes, maximum filename
+    and xattr name length, maximum number of symlinks, maximum device major and
+    minor numbers and maximum UID, GID and project ID numbers.
+
+ *  ``FSINFO_ATTR_SUPPORTS``
+
+    This struct-type attribute conveys information about the support the
+    filesystem has for various UAPI features of a filesystem.  This includes
+    information about which bits are supported in various masks employed by the
+    statx system call, what FS_IOC_* flags are supported by ioctls and what
+    DOS/Windows file attribute flags are supported.
+
+ *  ``FSINFO_ATTR_TIMESTAMP_INFO``
+
+    This struct-type attribute conveys information about the resolution and
+    range of the timestamps available in a filesystem.  The resolutions are
+    given as a mantissa and exponent (resolution = mantissa * 10^exponent
+    seconds), where the exponent can be negative to indicate a sub-second
+    resolution (-9 being nanoseconds, for example).
+
+ *  ``FSINFO_ATTR_VOLUME_ID``
+
+    This is a string-type attribute that conveys the superblock identifier for
+    the volume.  By default it will be filled in from the contents of s_id from
+    the superblock.  For a block-based filesystem, for example, this might be
+    the name of the primary block device.
+
+ *  ``FSINFO_ATTR_VOLUME_UUID``
+
+    This is a struct-type attribute that conveys the UUID identifier for the
+    volume.  By default it will be filled in from the contents of s_uuid from
+    the superblock.  If this doesn't exist, it will be entirely zeros.
+
+ *  ``FSINFO_ATTR_VOLUME_NAME``
+
+    This is a string-type attribute that conveys the name of the volume.  By
+    default it will return EOPNOTSUPP.  For a disk-based filesystem, it might
+    convey the partition label; for a network-based filesystem, it might convey
+    the name of the remote volume.
+
+ *  ``FSINFO_ATTR_FEATURES``
+
+    This is a special attribute, being a set of single-bit feature flags,
+    formatted as a struct-type attribute.  The meanings of the feature bits are
+    listed below - see the "Feature Bit Catalogue" section.  The feature bits
+    are grouped numerically into bytes, such that features 0-7 are in byte 0,
+    8-15 are in byte 1, 16-23 in byte 2 and so on.
+
+    Any feature bit that's not supported by the kernel will be set to false if
+    asked for.  The highest supported feature is set at the beginning of the
+    structure.
+
+ *  ``FSINFO_ATTR_SOURCE``
+ *  ``FSINFO_ATTR_CONFIGURATION``
+ *  ``FSINFO_ATTR_FS_STATISTICS``
+
+    These attributes return the mountpoint device name (as processed by the
+    filesystem), the superblock configuration (mount) options and the
+    superblock statistics in string form, as presented through a variety
+    of /proc files.
+
+
+Some attributes give information about fsinfo itself:
+
+ *  ``FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO``
+
+    This struct-type attribute gives metadata about the attribute with the ID
+    specified by the Nth parameter, including its type, default size and
+    element size.
+
+ *  ``FSINFO_ATTR_FSINFO_ATTRIBUTES``
+
+    This list-type attribute gives a list of the attribute IDs available at the
+    point of reference.  FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO can then be used to
+    query each attribute.
+
+
+Some attributes give information about mount objects:
+
+ *  ``FSINFO_ATTR_MOUNT_INFO``
+
+    This gives information about a particular mount object, including its IDs,
+    its attributes and its event counters.
+
+ *  ``FSINFO_ATTR_MOUNT_TOPOLOGY``
+
+    This gives information about a mount object's topological relationships and
+    propagation attributes.  This is more expensive inside the kernel than
+    MOUNT_INFO due to the locking requirements, but the mount object's topology
+    change counter can be used to work out if it has changed.
+
+    This does not give a list of the children; use FSINFO_ATTR_MOUNT_CHILDREN
+    for that.
+
+ *  ``FSINFO_ATTR_MOUNT_PATH``
+
+    This gives information about the path set by binding a mount, though it may
+    be overridden by the filesystem.
+
+ *  ``FSINFO_ATTR_MOUNT_POINT``
+ *  ``FSINFO_ATTR_MOUNT_POINT_FULL``
+
+    These give the path to the mount point for a mount object, in the former
+    relative to its parent mount's mount point (limited to chroot) and in the
+    latter as a full path from the chroot.
+
+ *  ``FSINFO_ATTR_MOUNT_CHILDREN``
+
+    This gives a list of all the child mounts of the queried mount.  This is
+    presented as tuples of { mount ID, mount uniquifier, event counter sum }
+    and includes at the end a tuple representing the queried mount.
+
+
+Finally there are filesystem-specific attributes, e.g.:
+
+ *  ``FSINFO_ATTR_AFS_CELL_NAME``
+
+    This is a string-type attribute that retrieves the AFS cell name of the
+    target object.
+
+ *  ``FSINFO_ATTR_AFS_SERVER_NAME``
+
+    This is a string-type attribute that conveys the name of the Nth server
+    backing a network-filesystem superblock.
+
+ *  ``FSINFO_ATTR_AFS_SERVER_ADDRESSES``
+
+    This is a list-type attribute that conveys the addresses of the Nth server,
+    corresponding to the Nth server returned by FSINFO_ATTR_AFS_SERVER_NAME.
+
+
+Feature Bit Catalogue
+=====================
+
+The feature bits convey single true/false assertions about a specific instance
+of a filesystem (i.e. a specific superblock).  They are accessed using the
+"FSINFO_ATTR_FEATURES" attribute:
+
+ *  ``FSINFO_FEAT_IS_KERNEL_FS``
+ *  ``FSINFO_FEAT_IS_BLOCK_FS``
+ *  ``FSINFO_FEAT_IS_FLASH_FS``
+ *  ``FSINFO_FEAT_IS_NETWORK_FS``
+ *  ``FSINFO_FEAT_IS_AUTOMOUNTER_FS``
+ *  ``FSINFO_FEAT_IS_MEMORY_FS``
+
+    These indicate what kind of filesystem the target is: kernel API (proc),
+    block-based (ext4), flash/nvm-based (jffs2), remote over the network (NFS),
+    local quasi-filesystem that acts as a tray of mountpoints (autofs), plain
+    in-memory filesystem (shmem).
+
+ *  ``FSINFO_FEAT_AUTOMOUNTS``
+
+    This indicates if a filesystem may have objects that are automount points.
+
+ *  ``FSINFO_FEAT_ADV_LOCKS``
+ *  ``FSINFO_FEAT_MAND_LOCKS``
+ *  ``FSINFO_FEAT_LEASES``
+
+    These indicate if a filesystem supports advisory locks, mandatory locks or
+    leases.
+
+ *  ``FSINFO_FEAT_UIDS``
+ *  ``FSINFO_FEAT_GIDS``
+ *  ``FSINFO_FEAT_PROJIDS``
+
+    These indicate if a filesystem supports/stores/transports numeric user IDs,
+    group IDs or project IDs.  The "FSINFO_ATTR_LIMITS" attribute can be used
+    to find out the upper limits on the ID values.
+
+ *  ``FSINFO_FEAT_STRING_USER_IDS``
+
+    This indicates if a filesystem supports/stores/transports string user
+    identifiers.
+
+ *  ``FSINFO_FEAT_GUID_USER_IDS``
+
+    This indicates if a filesystem supports/stores/transports Windows GUIDs as
+    user identifiers (eg. ntfs).
+
+ *  ``FSINFO_FEAT_WINDOWS_ATTRS``
+
+    This indicates if a filesystem supports Windows FILE_* attribute bits
+    (eg. cifs, jfs).  The "FSINFO_ATTR_SUPPORTS" attribute can be used to find
+    out which Windows file attributes are supported by the filesystem.
+
+ *  ``FSINFO_FEAT_USER_QUOTAS``
+ *  ``FSINFO_FEAT_GROUP_QUOTAS``
+ *  ``FSINFO_FEAT_PROJECT_QUOTAS``
+
+    These indicate if a filesystem supports quotas for users, groups or
+    projects.
+
+ *  ``FSINFO_FEAT_XATTRS``
+
+    This indicates if a filesystem supports extended attributes.  The
+    "FSINFO_ATTR_LIMITS" attribute can be used to find out the upper limits on
+    the supported name and body lengths.
+
+ *  ``FSINFO_FEAT_JOURNAL``
+ *  ``FSINFO_FEAT_DATA_IS_JOURNALLED``
+
+    These indicate whether the filesystem has a journal and whether data
+    changes are logged to it.
+
+ *  ``FSINFO_FEAT_O_SYNC``
+ *  ``FSINFO_FEAT_O_DIRECT``
+
+    These indicate whether the filesystem supports the O_SYNC and O_DIRECT
+    flags.
+
+ *  ``FSINFO_FEAT_VOLUME_ID``
+ *  ``FSINFO_FEAT_VOLUME_UUID``
+ *  ``FSINFO_FEAT_VOLUME_NAME``
+ *  ``FSINFO_FEAT_VOLUME_FSID``
+
+    These indicate whether ID, UUID, name and FSID identifiers actually exist
+    in the filesystem and thus might be considered persistent.
+
+ *  ``FSINFO_FEAT_IVER_ALL_CHANGE``
+ *  ``FSINFO_FEAT_IVER_DATA_CHANGE``
+ *  ``FSINFO_FEAT_IVER_MONO_INCR``
+
+    These indicate whether i_version in the inode is supported and, if so, what
+    mode it operates in.  The first two indicate if it's changed for any data
+    or metadata change, or whether it's only changed for any data changes; the
+    last indicates whether or not it's monotonically increasing for each such
+    change.
+
+ *  ``FSINFO_FEAT_HARD_LINKS``
+ *  ``FSINFO_FEAT_HARD_LINKS_1DIR``
+
+    These indicate whether the filesystem can have hard links made in it, and
+    whether they can be made between directories or only within the same
+    directory.
+
+ *  ``FSINFO_FEAT_DIRECTORIES``
+ *  ``FSINFO_FEAT_SYMLINKS``
+ *  ``FSINFO_FEAT_DEVICE_FILES``
+ *  ``FSINFO_FEAT_UNIX_SPECIALS``
+
+    These indicate whether directories; symbolic links; device files; or pipes
+    and sockets can be made within the filesystem.
+
+ *  ``FSINFO_FEAT_RESOURCE_FORKS``
+
+    This indicates if the filesystem supports resource forks.
+
+ *  ``FSINFO_FEAT_NAME_CASE_INDEP``
+ *  ``FSINFO_FEAT_NAME_NON_UTF8``
+ *  ``FSINFO_FEAT_NAME_HAS_CODEPAGE``
+
+    These indicate if the filesystem supports case-independent file names,
+    whether the filenames are non-utf8 (see the "FSINFO_ATTR_NAME_ENCODING"
+    attribute) and whether a codepage is in use to transliterate them (see
+    the "FSINFO_ATTR_NAME_CODEPAGE" attribute).
+
+ *  ``FSINFO_FEAT_SPARSE``
+
+    This indicates if a filesystem supports sparse files.
+
+ *  ``FSINFO_FEAT_NOT_PERSISTENT``
+
+    This indicates if a filesystem is not persistent.
+
+ *  ``FSINFO_FEAT_NO_UNIX_MODE``
+
+    This indicates if a filesystem doesn't support UNIX mode bits (though they
+    may be manufactured from other bits, such as Windows file attribute flags).
+
+ *  ``FSINFO_FEAT_HAS_ATIME``
+ *  ``FSINFO_FEAT_HAS_BTIME``
+ *  ``FSINFO_FEAT_HAS_CTIME``
+ *  ``FSINFO_FEAT_HAS_MTIME``
+
+    These indicate which timestamps a filesystem supports (access, birth,
+    change, modify).  The range and resolutions can be queried with the
+    "FSINFO_ATTR_TIMESTAMP_INFO" attribute.

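Reader's note (illustrative only, not part of the patch): the "Attribute
Table" section above describes a table of entries terminated by a blank
entry, scanned for the requested attribute, with -EOPNOTSUPP returned when
nothing matches so that the core can fall back to the generic table.  The
lookup might be pictured like this.  It is a simplified sketch: the
example_* types and functions are hypothetical stand-ins, not the real
struct fsinfo_attribute or fsinfo_get_attribute(), and the attribute ID used
is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down context: only the requested attribute ID. */
struct example_ctx {
	unsigned int requested_attr;
};

/* Hypothetical table entry; a NULL ->get marks the blank terminator. */
struct example_attribute {
	unsigned int attr_id;
	int (*get)(struct example_ctx *ctx);
};

/* Scan one table; return the getter's result or -EOPNOTSUPP if absent. */
static int example_get_attribute(struct example_ctx *ctx,
				 const struct example_attribute *attrs)
{
	const struct example_attribute *a;

	assert(ctx != NULL && attrs != NULL);
	for (a = attrs; a->get; a++)
		if (a->attr_id == ctx->requested_attr)
			return a->get(ctx);
	return -EOPNOTSUPP;
}

/* Hypothetical getter: pretends the value occupies 16 bytes. */
static int example_get_limits(struct example_ctx *ctx)
{
	(void)ctx;
	return 16;
}

static const struct example_attribute example_table[] = {
	{ 0x02, example_get_limits },	/* made-up attribute ID */
	{ 0 }				/* blank terminator */
};
```

A filesystem with several tables would call the scanner once per table and
stop at the first result other than -EOPNOTSUPP.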

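Reader's note (illustrative only, not part of the patch): the
FSINFO_ATTR_FEATURES description above says feature bits are grouped into
bytes, with features 0-7 in byte 0, 8-15 in byte 1 and so on, and that
unsupported bits read as false.  A sketch of that layout follows.  The
example_* names and the 64-bit capacity are hypothetical, and the bit order
within each byte (least significant bit first) is an assumption that the
documentation does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_NR_FEATURES 64	/* hypothetical capacity */

struct example_features {
	uint8_t bits[EXAMPLE_NR_FEATURES / 8];
};

/* Set one feature bit: byte index is feature / 8, bit is feature % 8. */
static void example_set_feature(struct example_features *ft,
				unsigned int feature)
{
	assert(feature < EXAMPLE_NR_FEATURES);
	ft->bits[feature / 8] |= (uint8_t)(1u << (feature % 8));
}

/* Test one feature bit; anything beyond the capacity reads as false. */
static bool example_test_feature(const struct example_features *ft,
				 unsigned int feature)
{
	if (feature >= EXAMPLE_NR_FEATURES)
		return false;
	return (ft->bits[feature / 8] & (1u << (feature % 8))) != 0;
}
```

Under these assumptions, setting feature 9 touches only bit 1 of byte 1.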

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 13/18] fsinfo: Add support for AFS [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (11 preceding siblings ...)
  2020-08-03 13:38 ` [PATCH 12/18] fsinfo: Add API documentation " David Howells
@ 2020-08-03 13:38 ` David Howells
  2020-08-03 13:38 ` [PATCH 14/18] fsinfo: Add support to ext4 " David Howells
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:38 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add fsinfo support to the AFS filesystem.  This allows the export of server
lists, amongst other things, which is necessary to implement some of the
AFS 'fs' command set, such as "checkservers", "getserverprefs" and
"whereis".

Signed-off-by: David Howells <dhowells@redhat.com>
---
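
Reader's note (illustrative only, not part of the patch): the per-server
attributes added here are "N x" attributes, so a caller would walk Nth
upwards until -ENODATA comes back.  The sketch below models that contract in
plain userspace C, together with the documented string rule that the total
length is reported even when the output is truncated.
example_get_nth_name() is a hypothetical function, not the real
afs_fsinfo_get_server_name(); NUL-terminating the caller's buffer is a
choice made for this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of an "N x string" attribute: copy the Nth name into
 * buf (truncating if needed) and return its full length, or -ENODATA if
 * nth is past the end of the array.
 */
static int example_get_nth_name(const char *const *names, unsigned int nr,
				unsigned int nth, char *buf, size_t buf_size)
{
	size_t len;

	assert(names != NULL && buf != NULL);
	if (nth >= nr)
		return -ENODATA;

	len = strlen(names[nth]);
	if (buf_size > 0) {
		size_t n = len < buf_size - 1 ? len : buf_size - 1;

		memcpy(buf, names[nth], n);
		buf[n] = '\0';
	}
	/* The full length is reported regardless of truncation. */
	return (int)len;
}
```

A return value larger than the buffer tells the caller to retry with more
space; -ENODATA ends the enumeration.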

 fs/afs/internal.h           |    1 
 fs/afs/super.c              |  216 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fsinfo.h |   15 +++
 samples/vfs/test-fsinfo.c   |   49 ++++++++++
 4 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 792ac711985e..e775340c23c1 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -225,6 +225,7 @@ struct afs_super_info {
 	struct afs_volume	*volume;	/* volume record */
 	enum afs_flock_mode	flock_mode:8;	/* File locking emulation mode */
 	bool			dyn_root;	/* True if dynamic root */
+	bool			autocell;	/* True if autocell */
 };
 
 static inline struct afs_super_info *AFS_FS_S(struct super_block *sb)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index b552357b1d13..6fe7b8a57869 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -26,9 +26,13 @@
 #include <linux/sched.h>
 #include <linux/nsproxy.h>
 #include <linux/magic.h>
+#include <linux/fsinfo.h>
 #include <net/net_namespace.h>
 #include "internal.h"
 
+#ifdef CONFIG_FSINFO
+static int afs_fsinfo(struct path *path, struct fsinfo_context *ctx);
+#endif
 static void afs_i_init_once(void *foo);
 static void afs_kill_super(struct super_block *sb);
 static struct inode *afs_alloc_inode(struct super_block *sb);
@@ -54,6 +58,9 @@ int afs_net_id;
 
 static const struct super_operations afs_super_ops = {
 	.statfs		= afs_statfs,
+#ifdef CONFIG_FSINFO
+	.fsinfo		= afs_fsinfo,
+#endif
 	.alloc_inode	= afs_alloc_inode,
 	.drop_inode	= afs_drop_inode,
 	.destroy_inode	= afs_destroy_inode,
@@ -193,7 +200,7 @@ static int afs_show_options(struct seq_file *m, struct dentry *root)
 
 	if (as->dyn_root)
 		seq_puts(m, ",dyn");
-	if (test_bit(AFS_VNODE_AUTOCELL, &AFS_FS_I(d_inode(root))->flags))
+	if (as->autocell)
 		seq_puts(m, ",autocell");
 	switch (as->flock_mode) {
 	case afs_flock_mode_unset:	break;
@@ -470,7 +477,7 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
-	if (ctx->autocell || as->dyn_root)
+	if (as->autocell || as->dyn_root)
 		set_bit(AFS_VNODE_AUTOCELL, &AFS_FS_I(inode)->flags);
 
 	ret = -ENOMEM;
@@ -512,6 +519,8 @@ static struct afs_super_info *afs_alloc_sbi(struct fs_context *fc)
 			as->volume = afs_get_volume(ctx->volume,
 						    afs_volume_trace_get_alloc_sbi);
 		}
+		if (ctx->autocell)
+			as->autocell = true;
 	}
 	return as;
 }
@@ -771,3 +780,206 @@ static int afs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	op->ops			= &afs_get_volume_status_operation;
 	return afs_do_sync_operation(op);
 }
+
+#ifdef CONFIG_FSINFO
+static const struct fsinfo_timestamp_info afs_timestamp_info = {
+	.atime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.mtime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.ctime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+	.btime = {
+		.minimum	= 0,
+		.maximum	= UINT_MAX,
+		.gran_mantissa	= 1,
+		.gran_exponent	= 0,
+	},
+};
+
+static int afs_fsinfo_get_timestamp(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_timestamp_info *tsinfo = ctx->buffer;
+	*tsinfo = afs_timestamp_info;
+	return sizeof(*tsinfo);
+}
+
+static int afs_fsinfo_get_limits(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_limits *lim = ctx->buffer;
+
+	lim->max_file_size.hi	= 0;
+	lim->max_file_size.lo	= MAX_LFS_FILESIZE;
+	/* Inode numbers can be 96-bit on YFS, but that's hard to determine. */
+	lim->max_ino.hi		= 0;
+	lim->max_ino.lo		= UINT_MAX;
+	lim->max_hard_links	= UINT_MAX;
+	lim->max_uid		= UINT_MAX;
+	lim->max_gid		= UINT_MAX;
+	lim->max_filename_len	= AFSNAMEMAX - 1;
+	lim->max_symlink_len	= AFSPATHMAX - 1;
+	return sizeof(*lim);
+}
+
+static int afs_fsinfo_get_supports(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_supports *p = ctx->buffer;
+
+	p->stx_mask = (STATX_TYPE | STATX_MODE |
+		       STATX_NLINK |
+		       STATX_UID | STATX_GID |
+		       STATX_MTIME | STATX_INO |
+		       STATX_SIZE);
+	p->stx_attributes = STATX_ATTR_AUTOMOUNT;
+	return sizeof(*p);
+}
+
+static int afs_fsinfo_get_features(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_features *p = ctx->buffer;
+
+	fsinfo_set_feature(p, FSINFO_FEAT_IS_NETWORK_FS);
+	fsinfo_set_feature(p, FSINFO_FEAT_AUTOMOUNTS);
+	fsinfo_set_feature(p, FSINFO_FEAT_ADV_LOCKS);
+	fsinfo_set_feature(p, FSINFO_FEAT_UIDS);
+	fsinfo_set_feature(p, FSINFO_FEAT_GIDS);
+	fsinfo_set_feature(p, FSINFO_FEAT_VOLUME_ID);
+	fsinfo_set_feature(p, FSINFO_FEAT_VOLUME_NAME);
+	fsinfo_set_feature(p, FSINFO_FEAT_IVER_MONO_INCR);
+	fsinfo_set_feature(p, FSINFO_FEAT_SYMLINKS);
+	fsinfo_set_feature(p, FSINFO_FEAT_HARD_LINKS_1DIR);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_MTIME);
+	fsinfo_set_feature(p, FSINFO_FEAT_HAS_INODE_NUMBERS);
+	return sizeof(*p);
+}
+
+static int afs_dyn_fsinfo_get_features(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_features *p = ctx->buffer;
+
+	fsinfo_set_feature(p, FSINFO_FEAT_IS_AUTOMOUNTER_FS);
+	fsinfo_set_feature(p, FSINFO_FEAT_AUTOMOUNTS);
+	return sizeof(*p);
+}
+
+static int afs_fsinfo_get_volume_name(struct path *path, struct fsinfo_context *ctx)
+{
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_volume *volume = as->volume;
+
+	return fsinfo_opaque(volume->name, ctx, volume->name_len + 1);
+}
+
+static int afs_fsinfo_get_cell_name(struct path *path, struct fsinfo_context *ctx)
+{
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_cell *cell = as->cell;
+
+	return fsinfo_opaque(cell->name, ctx, cell->name_len + 1);
+}
+
+static int afs_fsinfo_get_server_name(struct path *path, struct fsinfo_context *ctx)
+{
+	struct afs_server_list *slist;
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_volume *volume = as->volume;
+	struct afs_server *server;
+	int ret = -ENODATA;
+
+	read_lock(&volume->servers_lock);
+	slist = volume->servers;
+	if (slist) {
+		if (ctx->Nth < slist->nr_servers) {
+			server = slist->servers[ctx->Nth].server;
+			ret = sprintf(ctx->buffer, "%pU", &server->uuid) + 1;
+		}
+	}
+
+	read_unlock(&volume->servers_lock);
+	return ret;
+}
+
+static int afs_fsinfo_get_server_address(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_afs_server_address *p = ctx->buffer;
+	struct afs_server_list *slist;
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	struct afs_addr_list *alist;
+	struct afs_volume *volume = as->volume;
+	struct afs_server *server;
+	struct afs_net *net = afs_d2net(path->dentry);
+	unsigned int i;
+	int ret = -ENODATA;
+
+	read_lock(&volume->servers_lock);
+	slist = afs_get_serverlist(volume->servers);
+	read_unlock(&volume->servers_lock);
+
+	if (ctx->Nth >= slist->nr_servers)
+		goto put_slist;
+	server = slist->servers[ctx->Nth].server;
+
+	read_lock(&server->fs_lock);
+	alist = afs_get_addrlist(rcu_dereference_protected(
+					 server->addresses,
+					 lockdep_is_held(&server->fs_lock)));
+	read_unlock(&server->fs_lock);
+	if (!alist)
+		goto put_slist;
+
+	ret = alist->nr_addrs * sizeof(*p);
+	if (ret <= ctx->buf_size) {
+		for (i = 0; i < alist->nr_addrs; i++)
+			memcpy(&p[i].address, &alist->addrs[i],
+			       sizeof(struct sockaddr_rxrpc));
+	}
+
+	afs_put_addrlist(alist);
+put_slist:
+	afs_put_serverlist(net, slist);
+	return ret;
+}
+
+static const struct fsinfo_attribute afs_fsinfo_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	afs_fsinfo_get_timestamp),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		afs_fsinfo_get_limits),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		afs_fsinfo_get_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		afs_fsinfo_get_features),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	afs_fsinfo_get_volume_name),
+	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	afs_fsinfo_get_cell_name),
+	FSINFO_STRING_N	(FSINFO_ATTR_AFS_SERVER_NAME,	afs_fsinfo_get_server_name),
+	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_get_server_address),
+	{}
+};
+
+static const struct fsinfo_attribute afs_dyn_fsinfo_attributes[] = {
+	FSINFO_VSTRUCT(FSINFO_ATTR_TIMESTAMP_INFO,	afs_fsinfo_get_timestamp),
+	FSINFO_VSTRUCT(FSINFO_ATTR_FEATURES,		afs_dyn_fsinfo_get_features),
+	{}
+};
+
+static int afs_fsinfo(struct path *path, struct fsinfo_context *ctx)
+{
+	struct afs_super_info *as = AFS_FS_S(path->dentry->d_sb);
+	int ret;
+
+	if (as->dyn_root)
+		ret = fsinfo_get_attribute(path, ctx, afs_dyn_fsinfo_attributes);
+	else
+		ret = fsinfo_get_attribute(path, ctx, afs_fsinfo_attributes);
+	return ret;
+}
+
+#endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index b021466dee0f..81329de6905e 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -38,6 +38,10 @@
 #define FSINFO_ATTR_MOUNT_TOPOLOGY	0x204	/* Mount object topology */
 #define FSINFO_ATTR_MOUNT_CHILDREN	0x205	/* Children of this mount (list) */
 
+#define FSINFO_ATTR_AFS_CELL_NAME	0x300	/* AFS cell name (string) */
+#define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of the Nth server (string) */
+#define FSINFO_ATTR_AFS_SERVER_ADDRESSES 0x302	/* List of addresses of the Nth server */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -309,4 +313,15 @@ struct fsinfo_volume_uuid {
 
 #define FSINFO_ATTR_VOLUME_UUID__STRUCT struct fsinfo_volume_uuid
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_AFS_SERVER_ADDRESSES).
+ *
+ * Get the addresses of the Nth server for a network filesystem.
+ */
+struct fsinfo_afs_server_address {
+	struct __kernel_sockaddr_storage address;
+};
+
+#define FSINFO_ATTR_AFS_SERVER_ADDRESSES__STRUCT struct fsinfo_afs_server_address
+
 #endif /* _UAPI_LINUX_FSINFO_H */
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 620a02477aa8..374825ab85b0 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -24,6 +24,7 @@
 #include <linux/mount.h>
 #include <sys/stat.h>
 #include <arpa/inet.h>
+#include <linux/rxrpc.h>
 
 #ifndef __NR_fsinfo
 #define __NR_fsinfo -1
@@ -364,6 +365,50 @@ static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
 	       (unsigned long long)r->mnt_notify_sum, mp);
 }
 
+static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
+{
+	struct fsinfo_afs_server_address *f = reply;
+	struct sockaddr_storage *ss = (struct sockaddr_storage *)&f->address;
+	struct sockaddr_rxrpc *srx;
+	struct sockaddr_in6 *sin6;
+	struct sockaddr_in *sin;
+	char proto[32] = "", buf[1024];
+
+	if (ss->ss_family == AF_RXRPC) {
+		srx = (struct sockaddr_rxrpc *)ss;
+		printf("%5u ", srx->srx_service);
+		switch (srx->transport_type) {
+		case SOCK_DGRAM:
+			sprintf(proto, "udp");
+			break;
+		case SOCK_STREAM:
+			sprintf(proto, "tcp");
+			break;
+		default:
+			sprintf(proto, "%3u", srx->transport_type);
+			break;
+		}
+		ss = (struct sockaddr_storage *)&srx->transport;
+	}
+
+	switch (ss->ss_family) {
+	case AF_INET:
+		sin = (struct sockaddr_in *)ss;
+		if (!inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u/%s %s\n", ntohs(sin->sin_port), proto, buf);
+		return;
+	case AF_INET6:
+		sin6 = (struct sockaddr_in6 *)ss;
+		if (!inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf)))
+			break;
+		printf("%5u/%s %s\n", ntohs(sin6->sin6_port), proto, buf);
+		return;
+	}
+
+	printf("family=%u\n", ss->ss_family);
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -447,6 +492,10 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	string),
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT_FULL,	string),
 	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
+
+	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	string),
+	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	string),
+	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),
 	{}
 };
 




* [PATCH 14/18] fsinfo: Add support to ext4 [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (12 preceding siblings ...)
  2020-08-03 13:38 ` [PATCH 13/18] fsinfo: Add support for AFS " David Howells
@ 2020-08-03 13:38 ` David Howells
  2020-08-03 13:38 ` [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace " David Howells
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:38 UTC (permalink / raw)
  To: viro
  Cc: Darrick J. Wong, Theodore Ts'o, Andreas Dilger, Eric Biggers,
	linux-ext4, dhowells, torvalds, raven, mszeredi, christian,
	jannh, darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add support to ext4, including the following:

 (1) FSINFO_ATTR_SUPPORTS: Information about supported STATX attributes and
     support for ioctls like FS_IOC_[GS]ETFLAGS and FS_IOC_FS[GS]ETXATTR.

 (2) FSINFO_ATTR_FEATURES: Information about features supported by an ext4
     filesystem, such as whether version counting, birth time and name case
     folding are in operation.

 (3) FSINFO_ATTR_VOLUME_NAME: The volume name from the superblock.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
cc: "Theodore Ts'o" <tytso@mit.edu>
cc: Andreas Dilger <adilger.kernel@dilger.ca>
cc: Eric Biggers <ebiggers@kernel.org>
cc: linux-ext4@vger.kernel.org
---

 fs/ext4/Makefile |    1 +
 fs/ext4/ext4.h   |    6 +++
 fs/ext4/fsinfo.c |   97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/super.c  |    3 ++
 4 files changed, 107 insertions(+)
 create mode 100644 fs/ext4/fsinfo.c

diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 2e42f47a7f98..ad67812bf7d0 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -17,3 +17,4 @@ ext4-$(CONFIG_EXT4_FS_SECURITY)		+= xattr_security.o
 ext4-inode-test-objs			+= inode-test.o
 obj-$(CONFIG_EXT4_KUNIT_TESTS)		+= ext4-inode-test.o
 ext4-$(CONFIG_FS_VERITY)		+= verity.o
+ext4-$(CONFIG_FSINFO)			+= fsinfo.o
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 42f5060f3cdf..99a737cf6308 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -43,6 +43,7 @@
 
 #include <linux/fscrypt.h>
 #include <linux/fsverity.h>
+#include <linux/fsinfo.h>
 
 #include <linux/compiler.h>
 
@@ -3233,6 +3234,11 @@ extern const struct inode_operations ext4_file_inode_operations;
 extern const struct file_operations ext4_file_operations;
 extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
 
+/* fsinfo.c */
+#ifdef CONFIG_FSINFO
+extern int ext4_fsinfo(struct path *path, struct fsinfo_context *ctx);
+#endif
+
 /* inline.c */
 extern int ext4_get_max_inline_size(struct inode *inode);
 extern int ext4_find_inline_data_nolock(struct inode *inode);
diff --git a/fs/ext4/fsinfo.c b/fs/ext4/fsinfo.c
new file mode 100644
index 000000000000..1d4093ef32e7
--- /dev/null
+++ b/fs/ext4/fsinfo.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information for ext4
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include "ext4.h"
+
+static int ext4_fsinfo_supports(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_supports *p = ctx->buffer;
+	struct inode *inode = d_inode(path->dentry);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct ext4_inode *raw_inode;
+	u32 flags;
+
+	fsinfo_generic_supports(path, ctx);
+	p->stx_attributes |= (STATX_ATTR_APPEND |
+			      STATX_ATTR_COMPRESSED |
+			      STATX_ATTR_ENCRYPTED |
+			      STATX_ATTR_IMMUTABLE |
+			      STATX_ATTR_NODUMP |
+			      STATX_ATTR_VERITY);
+	if (EXT4_FITS_IN_INODE(raw_inode, ei, i_crtime))
+		p->stx_mask |= STATX_BTIME;
+
+	flags = EXT4_FL_USER_VISIBLE;
+	if (S_ISREG(inode->i_mode))
+		flags &= ~EXT4_PROJINHERIT_FL;
+	p->fs_ioc_getflags = flags;
+	flags &= EXT4_FL_USER_MODIFIABLE;
+	p->fs_ioc_setflags_set = flags;
+	p->fs_ioc_setflags_clear = flags;
+
+	p->fs_ioc_fsgetxattr_xflags = EXT4_FL_XFLAG_VISIBLE;
+	p->fs_ioc_fssetxattr_xflags_set = EXT4_FL_XFLAG_VISIBLE;
+	p->fs_ioc_fssetxattr_xflags_clear = EXT4_FL_XFLAG_VISIBLE;
+	return sizeof(*p);
+}
+
+static int ext4_fsinfo_features(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_features *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+	struct inode *inode = d_inode(path->dentry);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct ext4_inode *raw_inode;
+
+	fsinfo_generic_features(path, ctx);
+	fsinfo_set_unix_features(p);
+	fsinfo_set_feature(p, FSINFO_FEAT_VOLUME_UUID);
+	fsinfo_set_feature(p, FSINFO_FEAT_VOLUME_NAME);
+	fsinfo_set_feature(p, FSINFO_FEAT_O_SYNC);
+	fsinfo_set_feature(p, FSINFO_FEAT_O_DIRECT);
+	fsinfo_set_feature(p, FSINFO_FEAT_ADV_LOCKS);
+
+	if (test_opt(sb, XATTR_USER))
+		fsinfo_set_feature(p, FSINFO_FEAT_XATTRS);
+	if (ext4_has_feature_journal(sb))
+		fsinfo_set_feature(p, FSINFO_FEAT_JOURNAL);
+	if (ext4_has_feature_casefold(sb))
+		fsinfo_set_feature(p, FSINFO_FEAT_NAME_CASE_INDEP);
+
+	if (sb->s_flags & SB_I_VERSION &&
+	    !test_opt2(sb, HURD_COMPAT) &&
+	    EXT4_INODE_SIZE(sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+		fsinfo_set_feature(p, FSINFO_FEAT_IVER_DATA_CHANGE);
+		fsinfo_set_feature(p, FSINFO_FEAT_IVER_MONO_INCR);
+	}
+
+	if (EXT4_FITS_IN_INODE(raw_inode, ei, i_crtime))
+		fsinfo_set_feature(p, FSINFO_FEAT_HAS_BTIME);
+	return sizeof(*p);
+}
+
+static int ext4_fsinfo_get_volume_name(struct path *path, struct fsinfo_context *ctx)
+{
+	const struct ext4_sb_info *sbi = EXT4_SB(path->mnt->mnt_sb);
+	const struct ext4_super_block *es = sbi->s_es;
+
+	memcpy(ctx->buffer, es->s_volume_name, sizeof(es->s_volume_name));
+	return strlen(ctx->buffer) + 1;
+}
+
+static const struct fsinfo_attribute ext4_fsinfo_attributes[] = {
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		ext4_fsinfo_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		ext4_fsinfo_features),
+	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	ext4_fsinfo_get_volume_name),
+	{}
+};
+
+int ext4_fsinfo(struct path *path, struct fsinfo_context *ctx)
+{
+	return fsinfo_get_attribute(path, ctx, ext4_fsinfo_attributes);
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 330957ed1f05..47f349620176 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1481,6 +1481,9 @@ static const struct super_operations ext4_sops = {
 	.freeze_fs	= ext4_freeze,
 	.unfreeze_fs	= ext4_unfreeze,
 	.statfs		= ext4_statfs,
+#ifdef CONFIG_FSINFO
+	.fsinfo		= ext4_fsinfo,
+#endif
 	.remount_fs	= ext4_remount,
 	.show_options	= ext4_show_options,
 #ifdef CONFIG_QUOTA




* [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (13 preceding siblings ...)
  2020-08-03 13:38 ` [PATCH 14/18] fsinfo: Add support to ext4 " David Howells
@ 2020-08-03 13:38 ` David Howells
  2020-08-04 14:05   ` Miklos Szeredi
  2020-08-05 16:44   ` David Howells
  2020-08-03 13:38 ` [PATCH 16/18] errseq: add a new errseq_scrape function " David Howells
                   ` (4 subsequent siblings)
  19 siblings, 2 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:38 UTC (permalink / raw)
  To: viro
  Cc: dhowells, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Add a filesystem attribute that exports a list of all the visible mounts in
a namespace, given the caller's chroot setting.  The returned list is an
array of:

	struct fsinfo_mount_child {
		__u64	mnt_unique_id;
		__u32	mnt_id;
		__u32	parent_id;
		__u32	mnt_notify_sum;
		__u32	sb_notify_sum;
	};

where each element contains a once-in-a-system-lifetime unique ID, the
mount ID (which may get reused), the parent mount ID and sums of the
notification/change counters for the mount and its superblock.

This currently takes a read lock on namespace_sem; ideally it would run
under the RCU read lock only.
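
As a rough illustration of how userspace might consume such a list, the
sketch below counts the direct children of a given mount in a reply buffer.
It is only a sketch: the struct is a local copy of the layout quoted in this
message (the uapi header in the patch is authoritative), and count_children()
is a hypothetical helper, not part of the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the element layout quoted in the commit message.  A real
 * program would use struct fsinfo_mount_child from <linux/fsinfo.h>. */
struct mount_child {
	uint64_t mnt_unique_id;		/* never reused during system lifetime */
	uint32_t mnt_id;		/* may be reused after unmount */
	uint32_t parent_id;
	uint32_t mnt_notify_sum;
	uint32_t sb_notify_sum;
};

/* Hypothetical helper: count the elements of the reply whose parent is
 * parent_id.  reply_size is the number of bytes of reply data. */
static size_t count_children(const void *reply, size_t reply_size,
			     uint32_t parent_id)
{
	const struct mount_child *m = reply;
	size_t i, n = reply_size / sizeof(*m), count = 0;

	/* The reply is expected to be a whole number of elements. */
	assert(reply_size % sizeof(*m) == 0);

	for (i = 0; i < n; i++)
		if (m[i].parent_id == parent_id)
			count++;
	return count;
}
```

Matching on mnt_id is fine within a single snapshot; to correlate entries
across two snapshots, mnt_unique_id is the safer key since mount IDs can be
reused.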

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                 |    1 +
 fs/internal.h               |    1 +
 fs/namespace.c              |   37 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fsinfo.h |    4 ++++
 samples/vfs/test-fsinfo.c   |   22 ++++++++++++++++++++++
 5 files changed, 65 insertions(+)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index 0540cce89555..f230124ffdf5 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -296,6 +296,7 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
 	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT_FULL,	fsinfo_generic_mount_point_full),
 	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
+	FSINFO_LIST	(FSINFO_ATTR_MOUNT_ALL,		fsinfo_generic_mount_all),
 	{}
 };
 
diff --git a/fs/internal.h b/fs/internal.h
index cb5edcc7125a..267b4aaf0271 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -102,6 +102,7 @@ extern int fsinfo_generic_mount_topology(struct path *, struct fsinfo_context *)
 extern int fsinfo_generic_mount_point(struct path *, struct fsinfo_context *);
 extern int fsinfo_generic_mount_point_full(struct path *, struct fsinfo_context *);
 extern int fsinfo_generic_mount_children(struct path *, struct fsinfo_context *);
+extern int fsinfo_generic_mount_all(struct path *, struct fsinfo_context *);
 
 /*
  * fs_struct.c
diff --git a/fs/namespace.c b/fs/namespace.c
index 122c12f9512b..1f2e06507244 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4494,4 +4494,41 @@ int fsinfo_generic_mount_children(struct path *path, struct fsinfo_context *ctx)
 	return ctx->usage;
 }
 
+/*
+ * Return information about all the mounts in the namespace referenced by the
+ * path.
+ */
+int fsinfo_generic_mount_all(struct path *path, struct fsinfo_context *ctx)
+{
+	struct mnt_namespace *ns;
+	struct mount *m, *p;
+	struct path chroot;
+	bool allow;
+
+	m = real_mount(path->mnt);
+	ns = m->mnt_ns;
+
+	get_fs_root(current->fs, &chroot);
+	rcu_read_lock();
+	allow = are_paths_connected(&chroot, path) || capable(CAP_SYS_ADMIN);
+	rcu_read_unlock();
+	path_put(&chroot);
+	if (!allow)
+		return -EPERM;
+
+	down_read(&namespace_sem);
+
+	list_for_each_entry(p, &ns->list, mnt_list) {
+		struct path mnt_root;
+
+		mnt_root.mnt	= &p->mnt;
+		mnt_root.dentry	= p->mnt.mnt_root;
+		if (are_paths_connected(path, &mnt_root))
+			fsinfo_store_mount(ctx, p, p == m);
+	}
+
+	up_read(&namespace_sem);
+	return ctx->usage;
+}
+
 #endif /* CONFIG_FSINFO */
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index 81329de6905e..e40192d98648 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -37,6 +37,7 @@
 #define FSINFO_ATTR_MOUNT_POINT_FULL	0x203	/* Absolute path of mount (string) */
 #define FSINFO_ATTR_MOUNT_TOPOLOGY	0x204	/* Mount object topology */
 #define FSINFO_ATTR_MOUNT_CHILDREN	0x205	/* Children of this mount (list) */
+#define FSINFO_ATTR_MOUNT_ALL		0x206	/* List all mounts in a namespace (list) */
 
 #define FSINFO_ATTR_AFS_CELL_NAME	0x300	/* AFS cell name (string) */
 #define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of the Nth server (string) */
@@ -128,6 +129,8 @@ struct fsinfo_mount_topology {
 /*
  * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_CHILDREN).
  * - An extra element is placed on the end representing the parent mount.
+ *
+ * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_ALL).
  */
 struct fsinfo_mount_child {
 	__u64	mnt_unique_id;		/* Kernel-lifetime unique mount ID */
@@ -139,6 +142,7 @@ struct fsinfo_mount_child {
 };
 
 #define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct fsinfo_mount_child
+#define FSINFO_ATTR_MOUNT_ALL__STRUCT struct fsinfo_mount_child
 
 /*
  * Information struct for fsinfo(FSINFO_ATTR_STATFS).
diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 374825ab85b0..596fa5e71762 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -365,6 +365,27 @@ static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
 	       (unsigned long long)r->mnt_notify_sum, mp);
 }
 
+static void dump_fsinfo_generic_mount_all(void *reply, unsigned int size)
+{
+	struct fsinfo_mount_child *r = reply;
+	ssize_t mplen;
+	char path[32], *mp;
+
+	struct fsinfo_params params = {
+		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
+		.request	= FSINFO_ATTR_MOUNT_POINT_FULL,
+	};
+
+	sprintf(path, "%u", r->mnt_id);
+	mplen = get_fsinfo(path, "FSINFO_ATTR_MOUNT_POINT_FULL", &params, (void **)&mp);
+	if (mplen < 0)
+		mp = "-";
+
+	printf("%5x %5x %12llx %10llu %s\n",
+	       r->mnt_id, r->parent_id, (unsigned long long)r->mnt_unique_id,
+	       (unsigned long long)r->mnt_notify_sum, mp);
+}
+
 static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
 {
 	struct fsinfo_afs_server_address *f = reply;
@@ -492,6 +513,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	string),
 	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT_FULL,	string),
 	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
+	FSINFO_LIST	(FSINFO_ATTR_MOUNT_ALL,		fsinfo_generic_mount_all),
 
 	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	string),
 	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	string),




* [PATCH 16/18] errseq: add a new errseq_scrape function [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (14 preceding siblings ...)
  2020-08-03 13:38 ` [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace " David Howells
@ 2020-08-03 13:38 ` David Howells
  2020-08-03 13:38 ` [PATCH 17/18] vfs: allow fsinfo to fetch the current state of s_wb_err " David Howells
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:38 UTC (permalink / raw)
  To: viro
  Cc: Jeff Layton, Carlos Maiolino, dhowells, torvalds, raven,
	mszeredi, christian, jannh, darrick.wong, kzak, jlayton,
	linux-api, linux-fsdevel, linux-security-module, linux-kernel

From: Jeff Layton <jlayton@kernel.org>

Add errseq_scrape(), which grabs the current value of an errseq_t, marks
it as seen and then returns the value with the seen bit masked off.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/errseq.h |    1 +
 lib/errseq.c           |   33 +++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/errseq.h b/include/linux/errseq.h
index fc2777770768..de165623fa86 100644
--- a/include/linux/errseq.h
+++ b/include/linux/errseq.h
@@ -9,6 +9,7 @@ typedef u32	errseq_t;
 
 errseq_t errseq_set(errseq_t *eseq, int err);
 errseq_t errseq_sample(errseq_t *eseq);
+errseq_t errseq_scrape(errseq_t *eseq);
 int errseq_check(errseq_t *eseq, errseq_t since);
 int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
 #endif
diff --git a/lib/errseq.c b/lib/errseq.c
index 81f9e33aa7e7..8ded0920eed3 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -108,7 +108,7 @@ errseq_t errseq_set(errseq_t *eseq, int err)
 EXPORT_SYMBOL(errseq_set);
 
 /**
- * errseq_sample() - Grab current errseq_t value.
+ * errseq_sample() - Grab current errseq_t value (or 0 if it hasn't been seen)
  * @eseq: Pointer to errseq_t to be sampled.
  *
  * This function allows callers to initialise their errseq_t variable.
@@ -117,7 +117,7 @@ EXPORT_SYMBOL(errseq_set);
  * see it the next time it checks for an error.
  *
  * Context: Any context.
- * Return: The current errseq value.
+ * Return: The current errseq value or 0 if it wasn't previously seen
  */
 errseq_t errseq_sample(errseq_t *eseq)
 {
@@ -130,6 +130,35 @@ errseq_t errseq_sample(errseq_t *eseq)
 }
 EXPORT_SYMBOL(errseq_sample);
 
+/**
+ * errseq_scrape() - Grab current errseq_t value
+ * @eseq: Pointer to errseq_t to be sampled.
+ *
+ * This function allows callers to scrape the current value of an errseq_t.
+ * Unlike errseq_sample, this will always return the current value with
+ * the SEEN flag unset, even when the value has not yet been seen.
+ *
+ * Context: Any context.
+ * Return: The current errseq value with ERRSEQ_SEEN masked off
+ */
+errseq_t errseq_scrape(errseq_t *eseq)
+{
+	errseq_t old = READ_ONCE(*eseq);
+
+	/*
+	 * For the common case of no errors ever having been set, we can skip
+	 * marking the SEEN bit. Once an error has been set, the value will
+	 * never go back to zero.
+	 */
+	if (old != 0) {
+		errseq_t new = old | ERRSEQ_SEEN;
+		if (old != new)
+			cmpxchg(eseq, old, new);
+	}
+	return old & ~ERRSEQ_SEEN;
+}
+EXPORT_SYMBOL(errseq_scrape);
+
 /**
  * errseq_check() - Has an error occurred since a particular sample point?
  * @eseq: Pointer to errseq_t value to be checked.




* [PATCH 17/18] vfs: allow fsinfo to fetch the current state of s_wb_err [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (15 preceding siblings ...)
  2020-08-03 13:38 ` [PATCH 16/18] errseq: add a new errseq_scrape function " David Howells
@ 2020-08-03 13:38 ` David Howells
  2020-08-03 13:39 ` [PATCH 18/18] samples: add error state information to test-fsinfo.c " David Howells
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:38 UTC (permalink / raw)
  To: viro
  Cc: Jeff Layton, dhowells, torvalds, raven, mszeredi, christian,
	jannh, darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

From: Jeff Layton <jlayton@kernel.org>

Add a new "error_state" struct to fsinfo, and teach the kernel to fill
that out from sb->s_wb_err. There are two fields:

wb_error_last: the most recently recorded errno for the filesystem

wb_error_cookie: this value will change vs. the previously fetched
                 value if a new error was recorded since it was last
                 checked. Callers should treat this as an opaque value
                 that can be compared to earlier fetched values.
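
A caller would typically keep the last fetched state and compare cookies on
the next fetch.  The following is a minimal sketch of that comparison; the
struct is a local copy of the two fields described above and
wb_error_changed() is a hypothetical helper, not something this patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the two fields described above; a real program would use
 * struct fsinfo_error_state from the uapi header. */
struct error_state {
	uint32_t wb_error_cookie;	/* opaque, only compare for equality */
	uint32_t wb_error_last;		/* most recently recorded errno */
};

/* Hypothetical helper: report whether a new error was recorded since the
 * state in *saved was fetched, then remember the new state. */
static bool wb_error_changed(struct error_state *saved,
			     const struct error_state *now)
{
	bool changed;

	assert(saved && now);

	/* The cookie is opaque: no ordering is implied, only (in)equality. */
	changed = saved->wb_error_cookie != now->wb_error_cookie;
	*saved = *now;
	return changed;
}
```

Only when the cookie differs does wb_error_last tell the caller something
new; an unchanged cookie means the same error as before is being reported.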

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/fsinfo.c                 |   11 +++++++++++
 include/uapi/linux/fsinfo.h |   13 +++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/fs/fsinfo.c b/fs/fsinfo.c
index f230124ffdf5..ea9d9821d76b 100644
--- a/fs/fsinfo.c
+++ b/fs/fsinfo.c
@@ -274,6 +274,16 @@ static int fsinfo_generic_seq_read(struct path *path, struct fsinfo_context *ctx
 	return m.count + 1;
 }
 
+static int fsinfo_generic_error_state(struct path *path,
+				      struct fsinfo_context *ctx)
+{
+	struct fsinfo_error_state *es = ctx->buffer;
+
+	es->wb_error_cookie = errseq_scrape(&path->dentry->d_sb->s_wb_err);
+	es->wb_error_last = es->wb_error_cookie & MAX_ERRNO;
+	return sizeof(*es);
+}
+
 static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
 	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
@@ -286,6 +296,7 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_SOURCE,		fsinfo_generic_mount_source),
 	FSINFO_STRING	(FSINFO_ATTR_CONFIGURATION,	fsinfo_generic_seq_read),
 	FSINFO_STRING	(FSINFO_ATTR_FS_STATISTICS,	fsinfo_generic_seq_read),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_ERROR_STATE,	fsinfo_generic_error_state),
 
 	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	(void *)123UL),
 	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index e40192d98648..dcd764771a7d 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -27,6 +27,7 @@
 #define FSINFO_ATTR_SOURCE		0x09	/* Superblock source/device name (string) */
 #define FSINFO_ATTR_CONFIGURATION	0x0a	/* Superblock configuration/options (string) */
 #define FSINFO_ATTR_FS_STATISTICS	0x0b	/* Superblock filesystem statistics (string) */
+#define FSINFO_ATTR_ERROR_STATE		0x0c	/* Superblock writeback error state */
 
 #define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
 #define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
@@ -328,4 +329,16 @@ struct fsinfo_afs_server_address {
 
 #define FSINFO_ATTR_AFS_SERVER_ADDRESSES__STRUCT struct fsinfo_afs_server_address
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_ERROR_STATE).
+ *
+ * Retrieve the error state for a filesystem.
+ */
+struct fsinfo_error_state {
+	__u32		wb_error_cookie;	/* writeback error cookie */
+	__u32		wb_error_last;		/* latest writeback error */
+};
+
+#define FSINFO_ATTR_ERROR_STATE__STRUCT struct fsinfo_error_state
+
 #endif /* _UAPI_LINUX_FSINFO_H */




* [PATCH 18/18] samples: add error state information to test-fsinfo.c [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (16 preceding siblings ...)
  2020-08-03 13:38 ` [PATCH 17/18] vfs: allow fsinfo to fetch the current state of s_wb_err " David Howells
@ 2020-08-03 13:39 ` David Howells
  2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
  2020-08-05 17:13 ` David Howells
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-03 13:39 UTC (permalink / raw)
  To: viro
  Cc: Jeff Layton, dhowells, torvalds, raven, mszeredi, christian,
	jannh, darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

From: Jeff Layton <jlayton@kernel.org>

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/vfs/test-fsinfo.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
index 596fa5e71762..c359c3f52871 100644
--- a/samples/vfs/test-fsinfo.c
+++ b/samples/vfs/test-fsinfo.c
@@ -430,6 +430,15 @@ static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
 	printf("family=%u\n", ss->ss_family);
 }
 
+static void dump_fsinfo_generic_error_state(void *reply, unsigned int size)
+{
+	struct fsinfo_error_state *es = reply;
+
+	printf("\n");
+	printf("\tlatest error : %d (%s)\n", es->wb_error_last, strerror(es->wb_error_last));
+	printf("\tcookie       : 0x%x\n", es->wb_error_cookie);
+}
+
 static void dump_string(void *reply, unsigned int size)
 {
 	char *s = reply, *p;
@@ -518,6 +527,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
 	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	string),
 	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	string),
 	FSINFO_LIST_N	(FSINFO_ATTR_AFS_SERVER_ADDRESSES, afs_fsinfo_server_address),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_ERROR_STATE,	fsinfo_generic_error_state),
 	{}
 };
 




* Re: [PATCH 01/18] fsinfo: Introduce a non-repeating system-unique superblock ID [ver #21]
  2020-08-03 13:36 ` [PATCH 01/18] fsinfo: Introduce a non-repeating system-unique superblock ID " David Howells
@ 2020-08-04  9:34   ` Miklos Szeredi
  0 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04  9:34 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Linus Torvalds, Ian Kent, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Mon, Aug 3, 2020 at 3:37 PM David Howells <dhowells@redhat.com> wrote:
>
> Introduce an (effectively) non-repeating system-unique superblock ID that
> can be used to determine that two objects are in the same superblock
> without needing to worry about the ID changing in the meantime (as is
> possible with device IDs).
>
> The counter could also be used to tag other features, such as mount
> objects.
>
> Signed-off-by: David Howells <dhowells@redhat.com>

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>


* Re: [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information [ver #21]
  2020-08-03 13:36 ` [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information " David Howells
@ 2020-08-04 10:16   ` Miklos Szeredi
  2020-08-04 11:34   ` David Howells
  2020-08-27 11:27   ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04 10:16 UTC (permalink / raw)
  To: David Howells
  Cc: viro, linux-api, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-fsdevel,
	linux-security-module, linux-kernel

On Mon, Aug 03, 2020 at 02:36:42PM +0100, David Howells wrote:
> Add a system call to allow filesystem information to be queried.  A request
> value can be given to indicate the desired attribute.  Support is provided
> for enumerating multi-value attributes.
> 
> ===============
> NEW SYSTEM CALL
> ===============
> 
> The new system call looks like:
> 
> 	int ret = fsinfo(int dfd,
> 			 const char *pathname,
> 			 const struct fsinfo_params *params,
> 			 size_t params_size,
> 			 void *result_buffer,
> 			 size_t result_buf_size);
> 
> The params parameter optionally points to a block of parameters:
> 
> 	struct fsinfo_params {
> 		__u64	resolve_flags;
> 		__u32	at_flags;
> 		__u32	flags;
> 		__u32	request;
> 		__u32	Nth;
> 		__u32	Mth;

The Mth field seems to be unused in this patchset.  Since the struct is
extensible, I guess there's no point in adding it now.

> 	};
> 
> If params is NULL, the default is that params->request is
> FSINFO_ATTR_STATFS and all the other fields are 0.  params_size indicates
> the size of the parameter struct.  If the parameter block is short compared
> to what the kernel expects, the missing length will be set to 0; if the
> parameter block is longer, an error will be given if the excess is not all
> zeros.
> 
> The object to be queried is specified as follows - part of params->flags
> indicates the type of reference:
> 
>  (1) FSINFO_FLAGS_QUERY_PATH - dfd, pathname and at_flags indicate a
>      filesystem object to query.
> 
>      There is no separate system call providing an analogue of lstat() -
>      AT_SYMLINK_NOFOLLOW should be set in at_flags instead.
>      AT_NO_AUTOMOUNT can also be used to allow an automount point to be
>      queried without triggering it.
> 
>      RESOLVE_* flags can also be set in resolve_flags to further restrict
>      the pathwalk.
> 
>  (2) FSINFO_FLAGS_QUERY_FD - dfd indicates a file descriptor pointing to
>      the filesystem object to query.  pathname should be NULL.

This is at_flags = AT_EMPTY_PATH by convention.


> 
>  (3) FSINFO_FLAGS_QUERY_MOUNT - pathname indicates the numeric ID of the
>      mountpoint to query as a string.  dfd is used to constrain which
>      mounts can be accessed.  If dfd is AT_FDCWD, the mount must be within
>      the subtree rooted at chroot, otherwise the mount must be within the
>      subtree rooted at the directory specified by dfd.
> 
>  (4) In the future FSINFO_FLAGS_QUERY_FSCONTEXT will be added - dfd will
>      indicate a context handle fd obtained from fsopen() or fspick(),
>      allowing that to be queried before the target superblock is attached
>      to the filesystem or even created.

Can you describe features that are added by *this* patch?  It's complex enough
as is.

> 
> params->request indicates the attribute/attributes to be queried.  This can
> be one of:
> 
> 	FSINFO_ATTR_STATFS		- statfs-style info
> 	FSINFO_ATTR_IDS			- Filesystem IDs
> 	FSINFO_ATTR_LIMITS		- Filesystem limits
> 	FSINFO_ATTR_SUPPORTS		- Support for statx, ioctl, etc.
> 	FSINFO_ATTR_TIMESTAMP_INFO	- Inode timestamp info
> 	FSINFO_ATTR_VOLUME_ID		- Volume ID (string)
> 	FSINFO_ATTR_VOLUME_UUID		- Volume UUID
> 	FSINFO_ATTR_VOLUME_NAME		- Volume name (string)
> 	FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO - Information about attr Nth
> 	FSINFO_ATTR_FSINFO_ATTRIBUTES	- List of supported attrs
> 
> Some attributes (such as the servers backing a network filesystem) can have
> multiple values.  These can be enumerated by setting params->Nth and
> params->Mth to 0, 1, ... until ENODATA is returned.
> 
> result_buffer and result_buf_size point to the reply buffer.  The buffer is
> filled up to the specified size, even if this means truncating the reply.
> The size of the full reply is returned, irrespective of the amount of data
> that was copied.  In future versions, this will allow extra fields to be
> tacked on to the end of the reply, but anyone not expecting them will only
> get the subset they're expecting.  If either result_buffer or result_buf_size is
> 0, no copy will take place and the data size will be returned.
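
The described semantics (copy up to the buffer size, always return the full
size) lend themselves to a probe-then-fetch pattern.  A rough sketch, where
mock_get() and mock_value are hypothetical stand-ins for the call and its
reply rather than the real interface:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute value, NUL included in its size. */
static const char mock_value[] = "example-volume-name";

/*
 * Hypothetical stand-in for the call: copies at most buf_size bytes
 * and returns the full size of the value regardless of truncation.
 * A zero buf_size means "size query only".
 */
static size_t mock_get(void *buf, size_t buf_size)
{
	size_t full = sizeof(mock_value);

	assert(buf != NULL || buf_size == 0);
	if (buf_size > 0)
		memcpy(buf, mock_value, full < buf_size ? full : buf_size);
	return full;
}

/*
 * Probe for the size, then fetch into an exactly sized buffer.
 * Returns a malloc()'d copy (caller frees) or NULL on failure.
 */
static char *fetch_whole_value(size_t *size_out)
{
	size_t need = mock_get(NULL, 0);	/* size-only query */
	char *buf = malloc(need);

	if (!buf)
		return NULL;
	if (mock_get(buf, need) != need) {	/* value changed size */
		free(buf);
		return NULL;
	}
	*size_out = need;
	return buf;
}
```

A caller passing a short buffer gets a truncated copy but can still tell
from the return value how much it missed.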
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: linux-api@vger.kernel.org
> ---
> 
>  arch/alpha/kernel/syscalls/syscall.tbl      |    1 
>  arch/arm/tools/syscall.tbl                  |    1 
>  arch/arm64/include/asm/unistd.h             |    2 
>  arch/arm64/include/asm/unistd32.h           |    2 
>  arch/ia64/kernel/syscalls/syscall.tbl       |    1 
>  arch/m68k/kernel/syscalls/syscall.tbl       |    1 
>  arch/microblaze/kernel/syscalls/syscall.tbl |    1 
>  arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
>  arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
>  arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
>  arch/parisc/kernel/syscalls/syscall.tbl     |    1 
>  arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
>  arch/s390/kernel/syscalls/syscall.tbl       |    1 
>  arch/sh/kernel/syscalls/syscall.tbl         |    1 
>  arch/sparc/kernel/syscalls/syscall.tbl      |    1 
>  arch/x86/entry/syscalls/syscall_32.tbl      |    1 
>  arch/x86/entry/syscalls/syscall_64.tbl      |    1 
>  arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
>  fs/Kconfig                                  |    7 
>  fs/Makefile                                 |    1 
>  fs/fsinfo.c                                 |  596 +++++++++++++++++++++++++
>  include/linux/fs.h                          |    4 
>  include/linux/fsinfo.h                      |   74 +++
>  include/linux/syscalls.h                    |    4 
>  include/uapi/asm-generic/unistd.h           |    4 
>  include/uapi/linux/fsinfo.h                 |  189 ++++++++
>  kernel/sys_ni.c                             |    1 
>  samples/vfs/Makefile                        |    2 
>  samples/vfs/test-fsinfo.c                   |  646 +++++++++++++++++++++++++++
>  29 files changed, 1545 insertions(+), 3 deletions(-)
>  create mode 100644 fs/fsinfo.c
>  create mode 100644 include/linux/fsinfo.h
>  create mode 100644 include/uapi/linux/fsinfo.h
>  create mode 100644 samples/vfs/test-fsinfo.c
> 
> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
> index b6cf8403da35..984abd1ac058 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -479,3 +479,4 @@
>  548	common	pidfd_getfd			sys_pidfd_getfd
>  549	common	faccessat2			sys_faccessat2
>  550	common	watch_mount			sys_watch_mount
> +551	common	fsinfo				sys_fsinfo
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index 27cc1f53f4a0..bd791f91f5bb 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -453,3 +453,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index b3b2019f8d16..86a9d7b3eabe 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -38,7 +38,7 @@
>  #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
>  #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
>  
> -#define __NR_compat_syscalls		441
> +#define __NR_compat_syscalls		442
>  #endif
>  
>  #define __ARCH_WANT_SYS_CLONE
> diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> index 4f9cf98cdf0f..bd78eb2c487a 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -887,6 +887,8 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
>  __SYSCALL(__NR_faccessat2, sys_faccessat2)
>  #define __NR_watch_mount 440
>  __SYSCALL(__NR_watch_mount, sys_watch_mount)
> +#define __NR_fsinfo 441
> +__SYSCALL(__NR_fsinfo, sys_fsinfo)
>  
>  /*
>   * Please add new compat syscalls above this comment and update
> diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
> index fc6d87903781..09d144487b7d 100644
> --- a/arch/ia64/kernel/syscalls/syscall.tbl
> +++ b/arch/ia64/kernel/syscalls/syscall.tbl
> @@ -360,3 +360,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
> index c671aa0e4d25..1bdc26af3c54 100644
> --- a/arch/m68k/kernel/syscalls/syscall.tbl
> +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> @@ -439,3 +439,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> index 65cc53f129ef..fb8543122904 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -445,3 +445,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index 7f034a239930..b8362bd6bd4a 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -378,3 +378,4 @@
>  438	n32	pidfd_getfd			sys_pidfd_getfd
>  439	n32	faccessat2			sys_faccessat2
>  440	n32	watch_mount			sys_watch_mount
> +441	n32	fsinfo				sys_fsinfo
> diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> index d39b90de3642..60ca4091d378 100644
> --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> @@ -354,3 +354,4 @@
>  438	n64	pidfd_getfd			sys_pidfd_getfd
>  439	n64	faccessat2			sys_faccessat2
>  440	n64	watch_mount			sys_watch_mount
> +441	n64	fsinfo				sys_fsinfo
> diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
> index 09f426cb45b1..07aea9379ca0 100644
> --- a/arch/mips/kernel/syscalls/syscall_o32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
> @@ -427,3 +427,4 @@
>  438	o32	pidfd_getfd			sys_pidfd_getfd
>  439	o32	faccessat2			sys_faccessat2
>  440	o32	watch_mount			sys_watch_mount
> +441	o32	fsinfo				sys_fsinfo
> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
> index 52ff3454baa1..f8060767f11a 100644
> --- a/arch/parisc/kernel/syscalls/syscall.tbl
> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> @@ -437,3 +437,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index 10b7ed3c7a1b..3036bf1336d2 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -529,3 +529,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> index 86f317bf52df..c0a111fdb3ce 100644
> --- a/arch/s390/kernel/syscalls/syscall.tbl
> +++ b/arch/s390/kernel/syscalls/syscall.tbl
> @@ -442,3 +442,4 @@
>  438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
>  439  common	faccessat2		sys_faccessat2			sys_faccessat2
>  440	common	watch_mount		sys_watch_mount			sys_watch_mount
> +441	common	fsinfo			sys_fsinfo			sys_fsinfo
> diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> index 0bb0f0b372c7..03b55c32441f 100644
> --- a/arch/sh/kernel/syscalls/syscall.tbl
> +++ b/arch/sh/kernel/syscalls/syscall.tbl
> @@ -442,3 +442,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
> index 369ab65c1e9a..a0144db9fb8c 100644
> --- a/arch/sparc/kernel/syscalls/syscall.tbl
> +++ b/arch/sparc/kernel/syscalls/syscall.tbl
> @@ -485,3 +485,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index e760ba92c58d..edf90a2be0b9 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -444,3 +444,4 @@
>  438	i386	pidfd_getfd		sys_pidfd_getfd
>  439	i386	faccessat2		sys_faccessat2
>  440	i386	watch_mount		sys_watch_mount
> +441	i386	fsinfo			sys_fsinfo
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 5b58621d4f75..ab0eda639d67 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -361,6 +361,7 @@
>  438	common	pidfd_getfd		sys_pidfd_getfd
>  439	common	faccessat2		sys_faccessat2
>  440	common	watch_mount		sys_watch_mount
> +441	common	fsinfo			sys_fsinfo
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> index 5b28ee39f70f..979013890caf 100644
> --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> @@ -410,3 +410,4 @@
>  438	common	pidfd_getfd			sys_pidfd_getfd
>  439	common	faccessat2			sys_faccessat2
>  440	common	watch_mount			sys_watch_mount
> +441	common	fsinfo				sys_fsinfo
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 1a55e56d5c54..df76451ab49a 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -15,6 +15,13 @@ config VALIDATE_FS_PARSER
>  	  Enable this to perform validation of the parameter description for a
>  	  filesystem when it is registered.
>  
> +config FSINFO
> +	bool "Enable the fsinfo() system call"
> +	help
> +	  Enable the file system information querying system call to allow
> +	  comprehensive information to be retrieved about a filesystem,
> +	  superblock or mount object.
> +
>  if BLOCK
>  
>  config FS_IOMAP
> diff --git a/fs/Makefile b/fs/Makefile
> index dd0d87e2ef19..93a7f8047585 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -55,6 +55,7 @@ obj-$(CONFIG_COREDUMP)		+= coredump.o
>  obj-$(CONFIG_SYSCTL)		+= drop_caches.o
>  
>  obj-$(CONFIG_FHANDLE)		+= fhandle.o
> +obj-$(CONFIG_FSINFO)		+= fsinfo.o
>  obj-y				+= iomap/
>  
>  obj-y				+= quota/
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> new file mode 100644
> index 000000000000..7d9c73e9cbde
> --- /dev/null
> +++ b/fs/fsinfo.c
> @@ -0,0 +1,596 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Filesystem information query.
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +#include <linux/syscalls.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/mount.h>
> +#include <linux/namei.h>
> +#include <linux/statfs.h>
> +#include <linux/security.h>
> +#include <linux/uaccess.h>
> +#include <linux/fsinfo.h>
> +#include <uapi/linux/mount.h>
> +#include "internal.h"
> +
> +/**
> + * fsinfo_opaque - Store opaque blob as an fsinfo attribute value.
> + * @s: The blob to store (may be NULL)
> + * @ctx: The parameter context
> + * @len: The length of the blob
> + */
> +int fsinfo_opaque(const void *s, struct fsinfo_context *ctx, unsigned int len)
> +{
> +	void *p = ctx->buffer;
> +	int ret = 0;
> +
> +	if (s) {
> +		if (!ctx->want_size_only)
> +			memcpy(p, s, len);
> +		ret = len;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(fsinfo_opaque);
> +
> +/**
> + * fsinfo_string - Store a NUL-terminated string as an fsinfo attribute value.
> + * @s: The string to store (may be NULL)
> + * @ctx: The parameter context
> + */
> +int fsinfo_string(const char *s, struct fsinfo_context *ctx)
> +{
> +	if (!s)
> +		return 1;
> +	return fsinfo_opaque(s, ctx, min_t(size_t, strlen(s) + 1, ctx->buf_size));
> +}
> +EXPORT_SYMBOL(fsinfo_string);
> +
> +/*
> + * Get basic filesystem stats from statfs.
> + */
> +static int fsinfo_generic_statfs(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_statfs *p = ctx->buffer;
> +	struct kstatfs buf;
> +	int ret;
> +
> +	ret = vfs_statfs(path, &buf);
> +	if (ret < 0)
> +		return ret;
> +
> +	p->f_blocks.lo	= buf.f_blocks;
> +	p->f_bfree.lo	= buf.f_bfree;
> +	p->f_bavail.lo	= buf.f_bavail;
> +	p->f_files.lo	= buf.f_files;
> +	p->f_ffree.lo	= buf.f_ffree;
> +	p->f_favail.lo	= buf.f_ffree;
> +	p->f_bsize	= buf.f_bsize;
> +	p->f_frsize	= buf.f_frsize;
> +	return sizeof(*p);
> +}
> +
> +static int fsinfo_generic_ids(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_ids *p = ctx->buffer;
> +	struct super_block *sb;
> +	struct kstatfs buf;
> +	int ret;
> +
> +	ret = vfs_statfs(path, &buf);
> +	if (ret < 0 && ret != -ENOSYS)
> +		return ret;
> +	if (ret == 0)
> +		memcpy(&p->f_fsid, &buf.f_fsid, sizeof(p->f_fsid));
> +
> +	sb = path->dentry->d_sb;
> +	p->f_fstype	= sb->s_magic;
> +	p->f_dev_major	= MAJOR(sb->s_dev);
> +	p->f_dev_minor	= MINOR(sb->s_dev);
> +	p->f_sb_id	= sb->s_unique_id;
> +	strlcpy(p->f_fs_name, sb->s_type->name, sizeof(p->f_fs_name));
> +	return sizeof(*p);
> +}
> +
> +int fsinfo_generic_limits(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_limits *p = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	p->max_file_size.hi	= 0;
> +	p->max_file_size.lo	= sb->s_maxbytes;
> +	p->max_ino.hi		= 0;
> +	p->max_ino.lo		= UINT_MAX;
> +	p->max_hard_links	= sb->s_max_links;
> +	p->max_uid		= UINT_MAX;
> +	p->max_gid		= UINT_MAX;
> +	p->max_projid		= UINT_MAX;
> +	p->max_filename_len	= NAME_MAX;
> +	p->max_symlink_len	= PATH_MAX;
> +	p->max_xattr_name_len	= XATTR_NAME_MAX;
> +	p->max_xattr_body_len	= XATTR_SIZE_MAX;
> +	p->max_dev_major	= 0xffffff;
> +	p->max_dev_minor	= 0xff;
> +	return sizeof(*p);
> +}
> +EXPORT_SYMBOL(fsinfo_generic_limits);
> +
> +int fsinfo_generic_supports(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_supports *p = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	p->stx_mask = STATX_BASIC_STATS;
> +	if (sb->s_d_op && sb->s_d_op->d_automount)
> +		p->stx_attributes |= STATX_ATTR_AUTOMOUNT;
> +	return sizeof(*p);
> +}
> +EXPORT_SYMBOL(fsinfo_generic_supports);
> +
> +static const struct fsinfo_timestamp_info fsinfo_default_timestamp_info = {
> +	.atime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +	.mtime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +	.ctime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +	.btime = {
> +		.minimum	= S64_MIN,
> +		.maximum	= S64_MAX,
> +		.gran_mantissa	= 1,
> +		.gran_exponent	= 0,
> +	},
> +};
> +
> +int fsinfo_generic_timestamp_info(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_timestamp_info *p = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +	s8 exponent;
> +
> +	*p = fsinfo_default_timestamp_info;
> +
> +	if (sb->s_time_gran < 1000000000) {
> +		if (sb->s_time_gran < 1000)
> +			exponent = -9;
> +		else if (sb->s_time_gran < 1000000)
> +			exponent = -6;
> +		else
> +			exponent = -3;
> +
> +		p->atime.gran_exponent = exponent;
> +		p->mtime.gran_exponent = exponent;
> +		p->ctime.gran_exponent = exponent;
> +		p->btime.gran_exponent = exponent;
> +	}
> +
> +	return sizeof(*p);
> +}
> +EXPORT_SYMBOL(fsinfo_generic_timestamp_info);
> +
> +static int fsinfo_generic_volume_uuid(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_volume_uuid *p = ctx->buffer;
> +	struct super_block *sb = path->dentry->d_sb;
> +
> +	memcpy(p, &sb->s_uuid, sizeof(*p));
> +	return sizeof(*p);
> +}
> +
> +static int fsinfo_generic_volume_id(struct path *path, struct fsinfo_context *ctx)
> +{
> +	return fsinfo_string(path->dentry->d_sb->s_id, ctx);
> +}
> +
> +static const struct fsinfo_attribute fsinfo_common_attributes[] = {
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
> +	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		fsinfo_generic_volume_id),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
> +
> +	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	(void *)123UL),
> +	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
> +	{}
> +};
> +
> +/*
> + * Determine an attribute's minimum buffer size and, if the buffer is large
> + * enough, get the attribute value.
> + */
> +static int fsinfo_get_this_attribute(struct path *path,
> +				     struct fsinfo_context *ctx,
> +				     const struct fsinfo_attribute *attr)
> +{
> +	int buf_size;
> +
> +	if (ctx->Nth != 0 && !(attr->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)))
> +		return -ENODATA;
> +	if (ctx->Mth != 0 && !(attr->flags & FSINFO_FLAGS_NM))
> +		return -ENODATA;
> +
> +	switch (attr->type) {
> +	case FSINFO_TYPE_VSTRUCT:
> +		ctx->clear_tail = true;
> +		buf_size = attr->size;
> +		break;
> +	case FSINFO_TYPE_STRING:
> +	case FSINFO_TYPE_OPAQUE:
> +	case FSINFO_TYPE_LIST:
> +		buf_size = 4096;
> +		break;
> +	default:
> +		return -ENOPKG;
> +	}
> +
> +	if (ctx->buf_size < buf_size)
> +		return buf_size;
> +
> +	return attr->get(path, ctx);
> +}
> +
> +static void fsinfo_attributes_insert(struct fsinfo_context *ctx,
> +				     const struct fsinfo_attribute *attr)
> +{
> +	__u32 *p = ctx->buffer;
> +	unsigned int i;
> +
> +	if (ctx->usage >= ctx->buf_size ||
> +	    ctx->buf_size - ctx->usage < sizeof(__u32)) {
> +		ctx->usage += sizeof(__u32);
> +		return;
> +	}
> +
> +	for (i = 0; i < ctx->usage / sizeof(__u32); i++)
> +		if (p[i] == attr->attr_id)
> +			return;
> +
> +	p[i] = attr->attr_id;
> +	ctx->usage += sizeof(__u32);
> +}
> +
> +static int fsinfo_list_attributes(struct path *path,
> +				  struct fsinfo_context *ctx,
> +				  const struct fsinfo_attribute *attributes)
> +{
> +	const struct fsinfo_attribute *a;
> +
> +	for (a = attributes; a->get; a++)
> +		fsinfo_attributes_insert(ctx, a);
> +	return -EOPNOTSUPP; /* We want to go through all the lists */
> +}
> +
> +static int fsinfo_get_attribute_info(struct path *path,
> +				     struct fsinfo_context *ctx,
> +				     const struct fsinfo_attribute *attributes)
> +{
> +	const struct fsinfo_attribute *a;
> +	struct fsinfo_attribute_info *p = ctx->buffer;
> +
> +	if (!ctx->buf_size)
> +		return sizeof(*p);
> +
> +	for (a = attributes; a->get; a++) {
> +		if (a->attr_id == ctx->Nth) {
> +			p->attr_id	= a->attr_id;
> +			p->type		= a->type;
> +			p->flags	= a->flags;
> +			p->size		= a->size;
> +			p->size		= a->size;
> +			return sizeof(*p);
> +		}
> +	}
> +	return -EOPNOTSUPP; /* We want to go through all the lists */
> +}
> +
> +/**
> + * fsinfo_get_attribute - Look up and handle an attribute
> + * @path: The object to query
> + * @params: Parameters to define a request and place to store result
> + * @attributes: List of attributes to search.
> + *
> + * Look through a list of attributes for one that matches the requested
> + * attribute then call the handler for it.
> + */
> +int fsinfo_get_attribute(struct path *path, struct fsinfo_context *ctx,
> +			 const struct fsinfo_attribute *attributes)
> +{
> +	const struct fsinfo_attribute *a;
> +
> +	switch (ctx->requested_attr) {
> +	case FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO:
> +		return fsinfo_get_attribute_info(path, ctx, attributes);
> +	case FSINFO_ATTR_FSINFO_ATTRIBUTES:
> +		return fsinfo_list_attributes(path, ctx, attributes);
> +	default:
> +		for (a = attributes; a->get; a++)
> +			if (a->attr_id == ctx->requested_attr)
> +				return fsinfo_get_this_attribute(path, ctx, a);
> +		return -EOPNOTSUPP;
> +	}
> +}
> +EXPORT_SYMBOL(fsinfo_get_attribute);
> +
> +/**
> + * generic_fsinfo - Handle an fsinfo attribute generically
> + * @path: The object to query
> + * @params: Parameters to define a request and place to store result
> + */
> +static int fsinfo_call(struct path *path, struct fsinfo_context *ctx)
> +{
> +	int ret;
> +
> +	if (path->dentry->d_sb->s_op->fsinfo) {
> +		ret = path->dentry->d_sb->s_op->fsinfo(path, ctx);
> +		if (ret != -EOPNOTSUPP)
> +			return ret;
> +	}
> +	ret = fsinfo_get_attribute(path, ctx, fsinfo_common_attributes);
> +	if (ret != -EOPNOTSUPP)
> +		return ret;
> +
> +	switch (ctx->requested_attr) {
> +	case FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO:
> +		return -ENODATA;
> +	case FSINFO_ATTR_FSINFO_ATTRIBUTES:
> +		return ctx->usage;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +/**
> + * vfs_fsinfo - Retrieve filesystem information
> + * @path: The object to query
> + * @params: Parameters to define a request and place to store result
> + *
> + * Get an attribute on a filesystem or an object within a filesystem.  The
> + * filesystem attribute to be queried is indicated by @ctx->requested_attr, and
> + * if it's a multi-valued attribute, the particular value is selected by
> + * @ctx->Nth and then @ctx->Mth.
> + *
> + * For common attributes, a value may be fabricated if it is not supported by
> + * the filesystem.
> + *
> + * On success, the size of the attribute's value is returned (0 is a valid
> + * size).  A buffer will have been allocated and will be pointed to by
> + * @ctx->buffer.  The caller must free this with kvfree().
> + *
> + * Errors can also be returned: -ENOMEM if a buffer cannot be allocated, -EPERM
> + * or -EACCES if permission is denied by the LSM, -EOPNOTSUPP if an attribute
> + * doesn't exist for the specified object or -ENODATA if the attribute exists,
> + * but the Nth,Mth value does not exist.  -EMSGSIZE indicates that the value is
> + * unmanageable internally and -ENOPKG indicates other internal failure.
> + *
> + * Errors such as -EIO may also come from attempts to access media or servers
> + * to obtain the requested information if it's not immediately to hand.
> + *
> + * [*] Note that the caller may set @ctx->want_size_only if it only wants the
> + *     size of the value and not the data.  If this is set, a buffer may not be
> + *     allocated under some circumstances.  This is intended for size query by
> + *     userspace.
> + *
> + * [*] Note that @ctx->clear_tail will be returned set if the data should be
> + *     padded out with zeros when writing it to userspace.
> + */
> +static int vfs_fsinfo(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct dentry *dentry = path->dentry;
> +	int ret;
> +
> +	ret = security_sb_statfs(dentry);
> +	if (ret)
> +		return ret;
> +
> +	/* Call the handler to find out the buffer size required. */
> +	ctx->buf_size = 0;
> +	ret = fsinfo_call(path, ctx);
> +	if (ret < 0 || ctx->want_size_only)
> +		return ret;
> +	ctx->buf_size = ret;
> +
> +	do {
> +		/* Allocate a buffer of the requested size. */
> +		if (ctx->buf_size > INT_MAX)
> +			return -EMSGSIZE;
> +		ctx->buffer = kvzalloc(ctx->buf_size, GFP_KERNEL);
> +		if (!ctx->buffer)
> +			return -ENOMEM;
> +
> +		ctx->usage = 0;
> +		ctx->skip = 0;
> +		ret = fsinfo_call(path, ctx);
> +		if (IS_ERR_VALUE((long)ret))
> +			return ret;
> +		if ((unsigned int)ret <= ctx->buf_size)
> +			return ret; /* It fitted */
> +
> +		/* We need to resize the buffer */
> +		ctx->buf_size = roundup(ret, PAGE_SIZE);
> +		kvfree(ctx->buffer);
> +		ctx->buffer = NULL;
> +	} while (!signal_pending(current));
> +
> +	return -ERESTARTSYS;
> +}
> +
> +static int vfs_fsinfo_path(int dfd, const char __user *pathname,
> +			   const struct fsinfo_params *up,
> +			   struct fsinfo_context *ctx)
> +{
> +	struct path path;
> +	unsigned lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
> +	int ret = -EINVAL;
> +
> +	if (up->resolve_flags & ~VALID_RESOLVE_FLAGS)
> +		return -EINVAL;
> +	if (up->at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
> +			     AT_EMPTY_PATH))
> +		return -EINVAL;
> +
> +	if (up->resolve_flags & RESOLVE_NO_XDEV)
> +		lookup_flags |= LOOKUP_NO_XDEV;
> +	if (up->resolve_flags & RESOLVE_NO_MAGICLINKS)
> +		lookup_flags |= LOOKUP_NO_MAGICLINKS;
> +	if (up->resolve_flags & RESOLVE_NO_SYMLINKS)
> +		lookup_flags |= LOOKUP_NO_SYMLINKS;
> +	if (up->resolve_flags & RESOLVE_BENEATH)
> +		lookup_flags |= LOOKUP_BENEATH;
> +	if (up->resolve_flags & RESOLVE_IN_ROOT)
> +		lookup_flags |= LOOKUP_IN_ROOT;
> +	if (up->at_flags & AT_SYMLINK_NOFOLLOW)
> +		lookup_flags &= ~LOOKUP_FOLLOW;
> +	if (up->at_flags & AT_NO_AUTOMOUNT)
> +		lookup_flags &= ~LOOKUP_AUTOMOUNT;
> +	if (up->at_flags & AT_EMPTY_PATH)
> +		lookup_flags |= LOOKUP_EMPTY;
> +
> +retry:
> +	ret = user_path_at(dfd, pathname, lookup_flags, &path);
> +	if (ret)
> +		goto out;
> +
> +	ret = vfs_fsinfo(&path, ctx);
> +	path_put(&path);
> +	if (retry_estale(ret, lookup_flags)) {
> +		lookup_flags |= LOOKUP_REVAL;
> +		goto retry;
> +	}
> +out:
> +	return ret;
> +}
> +
> +static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_context *ctx)
> +{
> +	struct fd f = fdget_raw(fd);
> +	int ret = -EBADF;
> +
> +	if (f.file) {
> +		ret = vfs_fsinfo(&f.file->f_path, ctx);
> +		fdput(f);
> +	}
> +	return ret;
> +}
> +
> +/**
> + * sys_fsinfo - System call to get filesystem information
> + * @dfd: Base directory to pathwalk from or fd referring to filesystem.
> + * @pathname: Filesystem to query or NULL.
> + * @params: Parameters to define request (NULL: FSINFO_ATTR_STATFS).
> + * @params_size: Size of parameter buffer.
> + * @result_buffer: Result buffer.
> + * @result_buf_size: Size of result buffer.
> + *
> + * Get information on a filesystem.  The filesystem attribute to be queried is
> + * indicated by @_params->request, and some of the attributes can have multiple
> + * values, indexed by @_params->Nth and @_params->Mth.  If @_params is NULL,
> + * then the 0th fsinfo_attr_statfs attribute is queried.  If an attribute does
> + * not exist, EOPNOTSUPP is returned; if the Nth,Mth value does not exist,
> + * ENODATA is returned.
> + *
> + * On success, the size of the attribute's value is returned.  If
> + * @result_buf_size is 0 or @result_buffer is NULL, only the size is returned.
> + * If the size of the value is larger than @result_buf_size, it will be
> + * truncated by the copy.  If the size of the value is smaller than
> + * @result_buf_size then the excess buffer space will be cleared.  The full
> + * size of the value will be returned, irrespective of how much data is
> + * actually placed in the buffer.
> + */
> +SYSCALL_DEFINE6(fsinfo,
> +		int, dfd,
> +		const char __user *, pathname,
> +		const struct fsinfo_params __user *, params,
> +		size_t, params_size,
> +		void __user *, result_buffer,
> +		size_t, result_buf_size)
> +{
> +	struct fsinfo_context ctx;
> +	struct fsinfo_params user_params;
> +	unsigned int result_size;
> +	void *r;
> +	int ret;
> +
> +	if ((!params &&  params_size) ||
> +	    ( params && !params_size) ||
> +	    (!result_buffer &&  result_buf_size) ||
> +	    ( result_buffer && !result_buf_size))
> +		return -EINVAL;
> +	if (result_buf_size > UINT_MAX)
> +		return -EOVERFLOW;
> +
> +	memset(&ctx, 0, sizeof(ctx));
> +	ctx.requested_attr	= FSINFO_ATTR_STATFS;
> +	ctx.flags		= FSINFO_FLAGS_QUERY_PATH;
> +	ctx.want_size_only	= (result_buf_size == 0);
> +
> +	if (params) {
> +		ret = copy_struct_from_user(&user_params, sizeof(user_params),
> +					    params, params_size);
> +		if (ret < 0)
> +			return ret;
> +		if (user_params.flags & ~FSINFO_FLAGS_QUERY_MASK)
> +			return -EINVAL;
> +		ctx.flags = user_params.flags;
> +		ctx.requested_attr = user_params.request;
> +		ctx.Nth = user_params.Nth;
> +		ctx.Mth = user_params.Mth;
> +	}
> +
> +	switch (ctx.flags & FSINFO_FLAGS_QUERY_MASK) {
> +	case FSINFO_FLAGS_QUERY_PATH:
> +		ret = vfs_fsinfo_path(dfd, pathname, &user_params, &ctx);
> +		break;
> +	case FSINFO_FLAGS_QUERY_FD:
> +		if (pathname)
> +			return -EINVAL;
> +		ret = vfs_fsinfo_fd(dfd, &ctx);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (ret < 0)
> +		goto error;
> +
> +	r = ctx.buffer + ctx.skip;
> +	result_size = min_t(size_t, ret, result_buf_size);
> +	if (result_size > 0 &&
> +	    copy_to_user(result_buffer, r, result_size) != 0) {
> +		ret = -EFAULT;
> +		goto error;
> +	}
> +
> +	/* Clear any part of the buffer that we won't fill if we're putting a
> +	 * struct in there.  Strings, opaque objects and arrays are expected to
> +	 * be variable length.
> +	 */
> +	if (ctx.clear_tail &&
> +	    result_buf_size > result_size &&
> +	    clear_user(result_buffer + result_size,
> +		       result_buf_size - result_size) != 0) {
> +		ret = -EFAULT;
> +		goto error;
> +	}
> +
> +error:
> +	kvfree(ctx.buffer);
> +	return ret;
> +}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 28a29356eace..3284f497de0a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -68,6 +68,7 @@ struct fsverity_info;
>  struct fsverity_operations;
>  struct fs_context;
>  struct fs_parameter_spec;
> +struct fsinfo_context;
>  
>  extern void __init inode_init(void);
>  extern void __init inode_init_early(void);
> @@ -1963,6 +1964,9 @@ struct super_operations {
>  	int (*thaw_super) (struct super_block *);
>  	int (*unfreeze_fs) (struct super_block *);
>  	int (*statfs) (struct dentry *, struct kstatfs *);
> +#ifdef CONFIG_FSINFO
> +	int (*fsinfo)(struct path *, struct fsinfo_context *);
> +#endif
>  	int (*remount_fs) (struct super_block *, int *, char *);
>  	void (*umount_begin) (struct super_block *);
>  
> diff --git a/include/linux/fsinfo.h b/include/linux/fsinfo.h
> new file mode 100644
> index 000000000000..a811d69b02ff
> --- /dev/null
> +++ b/include/linux/fsinfo.h
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Filesystem information query
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#ifndef _LINUX_FSINFO_H
> +#define _LINUX_FSINFO_H
> +
> +#ifdef CONFIG_FSINFO
> +
> +#include <uapi/linux/fsinfo.h>
> +
> +struct path;
> +
> +#define FSINFO_NORMAL_ATTR_MAX_SIZE 4096
> +
> +struct fsinfo_context {
> +	__u32		flags;		/* [in] FSINFO_FLAGS_* */
> +	__u32		requested_attr;	/* [in] What is being asking for */
> +	__u32		Nth;		/* [in] Instance of it (some may have multiple) */
> +	__u32		Mth;		/* [in] Subinstance */
> +	bool		want_size_only;	/* [in] Just want to know the size, not the data */
> +	bool		clear_tail;	/* [out] T if tail of buffer should be cleared */
> +	unsigned int	skip;		/* [out] Number of bytes to skip in buffer */
> +	unsigned int	usage;		/* [tmp] Amount of buffer used (if large) */
> +	unsigned int	buf_size;	/* [tmp] Size of ->buffer[] */
> +	void		*buffer;	/* [out] The reply buffer */
> +};
> +
> +/*
> + * A filesystem information attribute definition.
> + */
> +struct fsinfo_attribute {
> +	unsigned int		attr_id;	/* The ID of the attribute */
> +	enum fsinfo_value_type	type:8;		/* The type of the attribute's value(s) */
> +	unsigned int		flags:8;
> +	unsigned int		size:16;	/* - Value size (FSINFO_STRUCT/LIST) */
> +	int (*get)(struct path *path, struct fsinfo_context *params);
> +};
> +
> +#define __FSINFO(A, T, S, G, F) \
> +	{ .attr_id = A, .type = T, .flags = F, .size = S, .get = G }
> +
> +#define _FSINFO(A, T, S, G)	__FSINFO(A, T, S, G, 0)
> +#define _FSINFO_N(A, T, S, G)	__FSINFO(A, T, S, G, FSINFO_FLAGS_N)
> +#define _FSINFO_NM(A, T, S, G)	__FSINFO(A, T, S, G, FSINFO_FLAGS_NM)
> +
> +#define _FSINFO_VSTRUCT(A,S,G)	  _FSINFO   (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G)
> +#define _FSINFO_VSTRUCT_N(A,S,G)  _FSINFO_N (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G)
> +#define _FSINFO_VSTRUCT_NM(A,S,G) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), G)
> +
> +#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G)
> +#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G)
> +#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G)
> +#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, 0, G)
> +#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, 0, G)
> +#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, 0, G)
> +#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, 0, G)


The opaque type seems to be unused in this patchset.  It's definitely not
something we want without a good reason, so if that reason arises, let's
discuss it then.

> +#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G)
> +#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G)
> +
> +extern int fsinfo_opaque(const void *, struct fsinfo_context *, unsigned int);
> +extern int fsinfo_string(const char *, struct fsinfo_context *);
> +extern int fsinfo_generic_timestamp_info(struct path *, struct fsinfo_context *);
> +extern int fsinfo_generic_supports(struct path *, struct fsinfo_context *);
> +extern int fsinfo_generic_limits(struct path *, struct fsinfo_context *);
> +extern int fsinfo_get_attribute(struct path *, struct fsinfo_context *,
> +				const struct fsinfo_attribute *);
> +
> +#endif /* CONFIG_FSINFO */
> +
> +#endif /* _LINUX_FSINFO_H */
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 88d03fd627ab..e31ad49af4c3 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -47,6 +47,7 @@ struct stat64;
>  struct statfs;
>  struct statfs64;
>  struct statx;
> +struct fsinfo_params;
>  struct __sysctl_args;
>  struct sysinfo;
>  struct timespec;
> @@ -1007,6 +1008,9 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
>  asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
>  asmlinkage long sys_watch_mount(int dfd, const char __user *path,
>  				unsigned int at_flags, int watch_fd, int watch_id);
> +asmlinkage long sys_fsinfo(int dfd, const char __user *pathname,
> +			   const struct fsinfo_params __user *params, size_t params_size,
> +			   void __user *result_buffer, size_t result_buf_size);
>  
>  /*
>   * Architecture-specific system calls
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index fcdca8c7d30a..9e38f611ab56 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -859,9 +859,11 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
>  __SYSCALL(__NR_faccessat2, sys_faccessat2)
>  #define __NR_watch_mount 440
>  __SYSCALL(__NR_watch_mount, sys_watch_mount)
> +#define __NR_fsinfo 441
> +__SYSCALL(__NR_fsinfo, sys_fsinfo)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 441
> +#define __NR_syscalls 442
>  
>  /*
>   * 32 bit systems traditionally used different
> diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
> new file mode 100644
> index 000000000000..65892239ba86
> --- /dev/null
> +++ b/include/uapi/linux/fsinfo.h
> @@ -0,0 +1,189 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* fsinfo() definitions.
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +#ifndef _UAPI_LINUX_FSINFO_H
> +#define _UAPI_LINUX_FSINFO_H
> +
> +#include <linux/types.h>
> +#include <linux/socket.h>
> +#include <linux/openat2.h>
> +
> +/*
> + * The filesystem attributes that can be requested.  Note that some attributes
> + * may have multiple instances which can be switched in the parameter block.
> + */
> +#define FSINFO_ATTR_STATFS		0x00	/* statfs()-style state */
> +#define FSINFO_ATTR_IDS			0x01	/* Filesystem IDs */
> +#define FSINFO_ATTR_LIMITS		0x02	/* Filesystem limits */
> +#define FSINFO_ATTR_SUPPORTS		0x03	/* What's supported in statx, iocflags, ... */
> +#define FSINFO_ATTR_TIMESTAMP_INFO	0x04	/* Inode timestamp info */
> +#define FSINFO_ATTR_VOLUME_ID		0x05	/* Volume ID (string) */
> +#define FSINFO_ATTR_VOLUME_UUID		0x06	/* Volume UUID (LE uuid) */
> +#define FSINFO_ATTR_VOLUME_NAME		0x07	/* Volume name (string) */
> +
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */


I think it would make sense to move the actual attributes to a separate patch
and leave this one as just the infrastructure.

> +
> +/*
> + * Optional fsinfo() parameter structure.
> + *
> + * If this is not given, it is assumed that fsinfo_attr_statfs instance 0,0 is
> + * desired.
> + */
> +struct fsinfo_params {
> +	__u64	resolve_flags;	/* RESOLVE_* flags */
> +	__u32	at_flags;	/* AT_* flags */
> +	__u32	flags;		/* Flags controlling fsinfo() specifically */
> +#define FSINFO_FLAGS_QUERY_MASK	0x0007 /* What object should fsinfo() query? */
> +#define FSINFO_FLAGS_QUERY_PATH	0x0000 /* - path, specified by dirfd,pathname,AT_EMPTY_PATH */
> +#define FSINFO_FLAGS_QUERY_FD	0x0001 /* - fd specified by dirfd */
> +	__u32	request;	/* ID of requested attribute */
> +	__u32	Nth;		/* Instance of it (some may have multiple) */
> +	__u32	Mth;		/* Subinstance of Nth instance */
> +};
> +
> +enum fsinfo_value_type {
> +	FSINFO_TYPE_VSTRUCT	= 0,	/* Version-lengthed struct (up to 4096 bytes) */
> +	FSINFO_TYPE_STRING	= 1,	/* NUL-term var-length string (up to 4095 chars) */
> +	FSINFO_TYPE_OPAQUE	= 2,	/* Opaque blob (unlimited size) */
> +	FSINFO_TYPE_LIST	= 3,	/* List of ints/structs (unlimited size) */
> +};
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO).
> + *
> + * This gives information about the attributes supported by fsinfo for the
> + * given path.
> + */
> +struct fsinfo_attribute_info {
> +	unsigned int		attr_id;	/* The ID of the attribute */
> +	enum fsinfo_value_type	type;		/* The type of the attribute's value(s) */
> +	unsigned int		flags;
> +#define FSINFO_FLAGS_N		0x01		/* - Attr has a set of values */
> +#define FSINFO_FLAGS_NM		0x02		/* - Attr has a set of sets of values */
> +	unsigned int		size;		/* - Value size (FSINFO_STRUCT/FSINFO_LIST) */
> +};
> +
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO__STRUCT struct fsinfo_attribute_info
> +#define FSINFO_ATTR_FSINFO_ATTRIBUTES__STRUCT __u32
> +
> +struct fsinfo_u128 {
> +#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
> +	__u64	hi;
> +	__u64	lo;
> +#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
> +	__u64	lo;
> +	__u64	hi;
> +#endif

Shouldn't this belong in <linux/types.h>?

> +};
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_STATFS).
> + * - This gives extended filesystem information.
> + */
> +struct fsinfo_statfs {
> +	struct fsinfo_u128 f_blocks;	/* Total number of blocks in fs */
> +	struct fsinfo_u128 f_bfree;	/* Total number of free blocks */
> +	struct fsinfo_u128 f_bavail;	/* Number of free blocks available to ordinary user */
> +	struct fsinfo_u128 f_files;	/* Total number of file nodes in fs */
> +	struct fsinfo_u128 f_ffree;	/* Number of free file nodes */
> +	struct fsinfo_u128 f_favail;	/* Number of file nodes available to ordinary user */


Is there a reason these are 128-bit-wide fields?  Are we approaching the
limits of 64 bits?
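
For scale, a back-of-the-envelope sketch (not part of the patch; the struct
and helper names below are made up for illustration): a 64-bit block count
with 4096-byte blocks already covers up to 2^76 bytes, i.e. 64 ZiB, so the
hi half would stay zero for anything smaller than that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's struct fsinfo_u128 (LE layout). */
struct u128_sketch {
	uint64_t lo;
	uint64_t hi;
};

/* Return floor(log2(v)); v must be non-zero. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int bits = 0;

	assert(v != 0);
	while (v >>= 1)
		bits++;
	return bits;
}

/*
 * Bits needed for the byte capacity of a filesystem whose block count
 * needs count_bits bits and whose block size is a power of two.
 */
static unsigned int capacity_bits(unsigned int count_bits, uint64_t block_size)
{
	return count_bits + ilog2_u64(block_size);
}

/* True if the 128-bit value would survive truncation to 64 bits. */
static int fits_in_u64(const struct u128_sketch *v)
{
	return v->hi == 0;
}
```

Here capacity_bits(64, 4096) evaluates to 76, and a consumer could use
something like fits_in_u64() to stay on plain 64-bit arithmetic whenever hi
is zero.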

> +	__u64	f_bsize;		/* Optimal block size */
> +	__u64	f_frsize;		/* Fragment size */
> +};
> +
> +#define FSINFO_ATTR_STATFS__STRUCT struct fsinfo_statfs
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_IDS).
> + *
> + * List of basic identifiers as is normally found in statfs().
> + */
> +struct fsinfo_ids {
> +	char	f_fs_name[15 + 1];	/* Filesystem name */
> +	__u64	f_fsid;			/* Short 64-bit Filesystem ID (as statfs) */
> +	__u64	f_sb_id;		/* Internal superblock ID for sbnotify()/mntnotify() */
> +	__u32	f_fstype;		/* Filesystem type from linux/magic.h [uncond] */
> +	__u32	f_dev_major;		/* As st_dev_* from struct statx [uncond] */
> +	__u32	f_dev_minor;
> +	__u32	__padding[1];
> +};
> +
> +#define FSINFO_ATTR_IDS__STRUCT struct fsinfo_ids
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_LIMITS).
> + *
> + * List of supported filesystem limits.
> + */
> +struct fsinfo_limits {
> +	struct fsinfo_u128 max_file_size;	/* Maximum file size */
> +	struct fsinfo_u128 max_ino;		/* Maximum inode number */

Again, what's the reason?  AFAICT we are not yet worried about overflowing 64
bits.  Future-proofing is good, but there have to be some rules and reasons
behind the decisions.

BTW, having all-string attributes (which I have advocated in the past) would
avoid having to worry about field widths.
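
To illustrate what I mean, a minimal userspace sketch, assuming a
hypothetical attribute that is returned as a NUL-terminated decimal string
(no such attribute exists in this patchset, and parse_attr_value() is a
made-up name).  The consumer picks the integer width and detects overflow
itself:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>

/*
 * Parse an unsigned decimal attribute value such as "1048576".
 * Returns 0 on success, -EINVAL for malformed text, or -ERANGE if the
 * value does not fit in uintmax_t.
 */
static int parse_attr_value(const char *text, uintmax_t *out)
{
	char *end;

	/* Reject the empty string, signs and leading whitespace up front. */
	if (*text < '0' || *text > '9')
		return -EINVAL;

	errno = 0;
	*out = strtoumax(text, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end != '\0')
		return -EINVAL;		/* trailing junk after the digits */
	return 0;
}
```

A value too large for the caller's type shows up as -ERANGE rather than as
a silently truncated struct field.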

> +	__u64	max_uid;			/* Maximum UID supported */
> +	__u64	max_gid;			/* Maximum GID supported */
> +	__u64	max_projid;			/* Maximum project ID supported */
> +	__u64	max_hard_links;			/* Maximum number of hard links on a file */
> +	__u64	max_xattr_body_len;		/* Maximum xattr content length */
> +	__u32	max_xattr_name_len;		/* Maximum xattr name length */
> +	__u32	max_filename_len;		/* Maximum filename length */
> +	__u32	max_symlink_len;		/* Maximum symlink content length */
> +	__u32	max_dev_major;			/* Maximum device major representable */
> +	__u32	max_dev_minor;			/* Maximum device minor representable */
> +	__u32	__padding[1];
> +};
> +
> +#define FSINFO_ATTR_LIMITS__STRUCT struct fsinfo_limits
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_SUPPORTS).
> + *
> + * What's supported in various masks, such as statx() attribute and mask bits
> + * and IOC flags.
> + */
> +struct fsinfo_supports {
> +	__u64	stx_attributes;		/* What statx::stx_attributes are supported */
> +	__u32	stx_mask;		/* What statx::stx_mask bits are supported */
> +	__u32	fs_ioc_getflags;	/* What FS_IOC_GETFLAGS may return */
> +	__u32	fs_ioc_setflags_set;	/* What FS_IOC_SETFLAGS may set */
> +	__u32	fs_ioc_setflags_clear;	/* What FS_IOC_SETFLAGS may clear */
> +	__u32	fs_ioc_fsgetxattr_xflags; /* What FS_IOC_FSGETXATTR[A] may return in fsx_xflags */
> +	__u32	fs_ioc_fssetxattr_xflags_set; /* What FS_IOC_FSSETXATTR may set in fsx_xflags */
> +	__u32	fs_ioc_fssetxattr_xflags_clear; /* What FS_IOC_FSSETXATTR may clear in fsx_xflags */
> +	__u32	win_file_attrs;		/* What DOS/Windows FILE_* attributes are supported */
> +};
> +
> +#define FSINFO_ATTR_SUPPORTS__STRUCT struct fsinfo_supports
> +
> +struct fsinfo_timestamp_one {
> +	__s64	minimum;	/* Minimum timestamp value in seconds */
> +	__s64	maximum;	/* Maximum timestamp value in seconds */
> +	__u16	gran_mantissa;	/* Granularity(secs) = mant * 10^exp */
> +	__s8	gran_exponent;
> +	__u8	__padding[5];
> +};
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_TIMESTAMP_INFO).
> + */
> +struct fsinfo_timestamp_info {
> +	struct fsinfo_timestamp_one	atime;	/* Access time */
> +	struct fsinfo_timestamp_one	mtime;	/* Modification time */
> +	struct fsinfo_timestamp_one	ctime;	/* Change time */
> +	struct fsinfo_timestamp_one	btime;	/* Birth/creation time */
> +};
> +
> +#define FSINFO_ATTR_TIMESTAMP_INFO__STRUCT struct fsinfo_timestamp_info
> +
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_VOLUME_UUID).
> + */
> +struct fsinfo_volume_uuid {
> +	__u8	uuid[16];
> +};
> +
> +#define FSINFO_ATTR_VOLUME_UUID__STRUCT struct fsinfo_volume_uuid
> +
> +#endif /* _UAPI_LINUX_FSINFO_H */
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 3e1c5c9d2efe..f72a9e4ddc9a 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents);
>  COND_SYSCALL(io_uring_setup);
>  COND_SYSCALL(io_uring_enter);
>  COND_SYSCALL(io_uring_register);
> +COND_SYSCALL(fsinfo);
>  
>  /* fs/xattr.c */
>  
> diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile
> index 00b6824f9237..d63af5106fc2 100644
> --- a/samples/vfs/Makefile
> +++ b/samples/vfs/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -userprogs := test-fsmount test-statx
> +userprogs := test-fsinfo test-fsmount test-statx
>  always-y := $(userprogs)
>  
>  userccflags += -I usr/include
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> new file mode 100644
> index 000000000000..934b25399ffe
> --- /dev/null
> +++ b/samples/vfs/test-fsinfo.c
> @@ -0,0 +1,646 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Test the fsinfo() system call
> + *
> + * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowells@redhat.com)
> + */
> +
> +#define _GNU_SOURCE
> +#define _ATFILE_SOURCE
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <ctype.h>
> +#include <errno.h>
> +#include <time.h>
> +#include <math.h>
> +#include <fcntl.h>
> +#include <sys/syscall.h>
> +#include <linux/fsinfo.h>
> +#include <linux/socket.h>
> +#include <sys/stat.h>
> +#include <arpa/inet.h>
> +
> +#ifndef __NR_fsinfo
> +#define __NR_fsinfo -1
> +#endif
> +
> +static bool debug = 0;
> +static bool list_last;
> +
> +static __attribute__((unused))
> +ssize_t fsinfo(int dfd, const char *filename,
> +	       struct fsinfo_params *params, size_t params_size,
> +	       void *result_buffer, size_t result_buf_size)
> +{
> +	return syscall(__NR_fsinfo, dfd, filename,
> +		       params, params_size,
> +		       result_buffer, result_buf_size);
> +}
> +
> +struct fsinfo_attribute {
> +	unsigned int		attr_id;
> +	enum fsinfo_value_type	type;
> +	unsigned int		size;
> +	const char		*name;
> +	void (*dump)(void *reply, unsigned int size);
> +};
> +
> +static const struct fsinfo_attribute fsinfo_attributes[];
> +
> +static ssize_t get_fsinfo(const char *, const char *, struct fsinfo_params *, void **);
> +
> +static void dump_hex(FILE *f, unsigned char *data, int from, int to)
> +{
> +	unsigned offset, col = 0;
> +	bool print_offset = true;
> +
> +	for (offset = from; offset < to; offset++) {
> +		if (print_offset) {
> +			fprintf(f, "%04x: ", offset);
> +			print_offset = 0;
> +		}
> +		fprintf(f, "%02x", data[offset]);
> +		col++;
> +		if ((col & 3) == 0) {
> +			if ((col & 15) == 0) {
> +				fprintf(f, "\n");
> +				print_offset = 1;
> +			} else {
> +				fprintf(f, " ");
> +			}
> +		}
> +	}
> +
> +	if (!print_offset)
> +		fprintf(f, "\n");
> +}
> +
> +static void dump_attribute_info(void *reply, unsigned int size)
> +{
> +	struct fsinfo_attribute_info *attr_info = reply;
> +	const struct fsinfo_attribute *attr;
> +	char type[32], val_size[32];
> +
> +	switch (attr_info->type) {
> +	case FSINFO_TYPE_VSTRUCT:	strcpy(type, "V-STRUCT");	break;
> +	case FSINFO_TYPE_STRING:	strcpy(type, "STRING");		break;
> +	case FSINFO_TYPE_OPAQUE:	strcpy(type, "OPAQUE");		break;
> +	case FSINFO_TYPE_LIST:		strcpy(type, "LIST");		break;
> +	default:
> +		sprintf(type, "type-%x", attr_info->type);
> +		break;
> +	}
> +
> +	if (attr_info->flags & FSINFO_FLAGS_N)
> +		strcat(type, " x N");
> +	else if (attr_info->flags & FSINFO_FLAGS_NM)
> +		strcat(type, " x NM");
> +
> +	for (attr = fsinfo_attributes; attr->name; attr++)
> +		if (attr->attr_id == attr_info->attr_id)
> +			break;
> +
> +	if (attr_info->size)
> +		sprintf(val_size, "%u", attr_info->size);
> +	else
> +		strcpy(val_size, "-");
> +
> +	printf("%8x %-12s %08x %5s %s\n",
> +	       attr_info->attr_id,
> +	       type,
> +	       attr_info->flags,
> +	       val_size,
> +	       attr->name ? attr->name : "");
> +}
> +
> +static void dump_fsinfo_generic_statfs(void *reply, unsigned int size)
> +{
> +	struct fsinfo_statfs *f = reply;
> +
> +	printf("\n");
> +	printf("\tblocks       : n=%llu fr=%llu av=%llu\n",
> +	       (unsigned long long)f->f_blocks.lo,
> +	       (unsigned long long)f->f_bfree.lo,
> +	       (unsigned long long)f->f_bavail.lo);
> +
> +	printf("\tfiles        : n=%llu fr=%llu av=%llu\n",
> +	       (unsigned long long)f->f_files.lo,
> +	       (unsigned long long)f->f_ffree.lo,
> +	       (unsigned long long)f->f_favail.lo);
> +	printf("\tbsize        : %llu\n",
> +	       (unsigned long long)f->f_bsize);
> +	printf("\tfrsize       : %llu\n",
> +	       (unsigned long long)f->f_frsize);
> +}
> +
> +static void dump_fsinfo_generic_ids(void *reply, unsigned int size)
> +{
> +	struct fsinfo_ids *f = reply;
> +
> +	printf("\n");
> +	printf("\tdev          : %02x:%02x\n", f->f_dev_major, f->f_dev_minor);
> +	printf("\tfs           : type=%x name=%s\n", f->f_fstype, f->f_fs_name);
> +	printf("\tfsid         : %llx\n", (unsigned long long)f->f_fsid);
> +	printf("\tsbid         : %llx\n", (unsigned long long)f->f_sb_id);
> +}
> +
> +static void dump_fsinfo_generic_limits(void *reply, unsigned int size)
> +{
> +	struct fsinfo_limits *f = reply;
> +
> +	printf("\n");
> +	printf("\tmax file size: %llx%016llx\n",
> +	       (unsigned long long)f->max_file_size.hi,
> +	       (unsigned long long)f->max_file_size.lo);
> +	printf("\tmax ino      : %llx%016llx\n",
> +	       (unsigned long long)f->max_ino.hi,
> +	       (unsigned long long)f->max_ino.lo);
> +	printf("\tmax ids      : u=%llx g=%llx p=%llx\n",
> +	       (unsigned long long)f->max_uid,
> +	       (unsigned long long)f->max_gid,
> +	       (unsigned long long)f->max_projid);
> +	printf("\tmax dev      : maj=%x min=%x\n",
> +	       f->max_dev_major, f->max_dev_minor);
> +	printf("\tmax links    : %llx\n",
> +	       (unsigned long long)f->max_hard_links);
> +	printf("\tmax xattr    : n=%x b=%llx\n",
> +	       f->max_xattr_name_len,
> +	       (unsigned long long)f->max_xattr_body_len);
> +	printf("\tmax len      : file=%x sym=%x\n",
> +	       f->max_filename_len, f->max_symlink_len);
> +}
> +
> +static void dump_fsinfo_generic_supports(void *reply, unsigned int size)
> +{
> +	struct fsinfo_supports *f = reply;
> +
> +	printf("\n");
> +	printf("\tstx_attr     : %llx\n", (unsigned long long)f->stx_attributes);
> +	printf("\tstx_mask     : %x\n", f->stx_mask);
> +	printf("\tfs_ioc_*flags: get=%x set=%x clr=%x\n",
> +	       f->fs_ioc_getflags, f->fs_ioc_setflags_set, f->fs_ioc_setflags_clear);
> +	printf("\tfs_ioc_*xattr: fsx_xflags: get=%x set=%x clr=%x\n",
> +	       f->fs_ioc_fsgetxattr_xflags,
> +	       f->fs_ioc_fssetxattr_xflags_set,
> +	       f->fs_ioc_fssetxattr_xflags_clear);
> +	printf("\twin_fattrs   : %x\n", f->win_file_attrs);
> +}
> +
> +static void print_time(struct fsinfo_timestamp_one *t, char stamp)
> +{
> +	printf("\t%ctime       : gran=%uE%d range=%llx-%llx\n",
> +	       stamp,
> +	       t->gran_mantissa, t->gran_exponent,
> +	       (long long)t->minimum, (long long)t->maximum);
> +}
> +
> +static void dump_fsinfo_generic_timestamp_info(void *reply, unsigned int size)
> +{
> +	struct fsinfo_timestamp_info *f = reply;
> +
> +	printf("\n");
> +	print_time(&f->atime, 'a');
> +	print_time(&f->mtime, 'm');
> +	print_time(&f->ctime, 'c');
> +	print_time(&f->btime, 'b');
> +}
> +
> +static void dump_fsinfo_generic_volume_uuid(void *reply, unsigned int size)
> +{
> +	struct fsinfo_volume_uuid *f = reply;
> +
> +	printf("%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x"
> +	       "-%02x%02x%02x%02x%02x%02x\n",
> +	       f->uuid[ 0], f->uuid[ 1],
> +	       f->uuid[ 2], f->uuid[ 3],
> +	       f->uuid[ 4], f->uuid[ 5],
> +	       f->uuid[ 6], f->uuid[ 7],
> +	       f->uuid[ 8], f->uuid[ 9],
> +	       f->uuid[10], f->uuid[11],
> +	       f->uuid[12], f->uuid[13],
> +	       f->uuid[14], f->uuid[15]);
> +}
> +
> +static void dump_string(void *reply, unsigned int size)
> +{
> +	char *s = reply, *p;
> +	bool nl = false, last_nl = false;
> +
> +	p = s;
> +	if (size >= 4096) {
> +		size = 4096;
> +		p[4092] = '.';
> +		p[4093] = '.';
> +		p[4094] = '.';
> +		p[4095] = 0;
> +	} else {
> +		p[size] = 0;
> +	}
> +
> +	for (p = s; *p; p++) {
> +		if (*p == '\n') {
> +			last_nl = nl = true;
> +			continue;
> +		}
> +		last_nl = false;
> +		if (!isprint(*p) && *p != '\t')
> +			*p = '?';
> +	}
> +
> +	if (nl)
> +		putchar('\n');
> +	printf("%s", s);
> +	if (!last_nl)
> +		putchar('\n');
> +}
> +
> +#define dump_fsinfo_meta_attribute_info		(void *)0x123
> +#define dump_fsinfo_meta_attributes		(void *)0x123
> +
> +/*
> + *
> + */
> +#define __FSINFO(A, T, S, G, F, N)					\
> +	{ .attr_id = A, .type = T, .size = S, .name = N, .dump = dump_##G }
> +
> +#define _FSINFO(A,T,S,G,N)	__FSINFO(A, T, S, G, 0, N)
> +#define _FSINFO_N(A,T,S,G,N)	__FSINFO(A, T, S, G, FSINFO_FLAGS_N, N)
> +#define _FSINFO_NM(A,T,S,G,N)	__FSINFO(A, T, S, G, FSINFO_FLAGS_NM, N)
> +
> +#define _FSINFO_VSTRUCT(A,S,G,N)    _FSINFO   (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G, N)
> +#define _FSINFO_VSTRUCT_N(A,S,G,N)  _FSINFO_N (A, FSINFO_TYPE_VSTRUCT, sizeof(S), G, N)
> +#define _FSINFO_VSTRUCT_NM(A,S,G,N) _FSINFO_NM(A, FSINFO_TYPE_VSTRUCT, sizeof(S), G, N)
> +
> +#define FSINFO_VSTRUCT(A,G)	_FSINFO_VSTRUCT   (A, A##__STRUCT, G, #A)
> +#define FSINFO_VSTRUCT_N(A,G)	_FSINFO_VSTRUCT_N (A, A##__STRUCT, G, #A)
> +#define FSINFO_VSTRUCT_NM(A,G)	_FSINFO_VSTRUCT_NM(A, A##__STRUCT, G, #A)
> +#define FSINFO_STRING(A,G)	_FSINFO   (A, FSINFO_TYPE_STRING, 0, G, #A)
> +#define FSINFO_STRING_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_STRING, 0, G, #A)
> +#define FSINFO_STRING_NM(A,G)	_FSINFO_NM(A, FSINFO_TYPE_STRING, 0, G, #A)
> +#define FSINFO_OPAQUE(A,G)	_FSINFO   (A, FSINFO_TYPE_OPAQUE, 0, G, #A)
> +#define FSINFO_LIST(A,G)	_FSINFO   (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G, #A)
> +#define FSINFO_LIST_N(A,G)	_FSINFO_N (A, FSINFO_TYPE_LIST, sizeof(A##__STRUCT), G, #A)
> +
> +static const struct fsinfo_attribute fsinfo_attributes[] = {
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_STATFS,		fsinfo_generic_statfs),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_IDS,		fsinfo_generic_ids),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_LIMITS,		fsinfo_generic_limits),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		fsinfo_generic_supports),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_TIMESTAMP_INFO,	fsinfo_generic_timestamp_info),
> +	FSINFO_STRING	(FSINFO_ATTR_VOLUME_ID,		string),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_VOLUME_UUID,	fsinfo_generic_volume_uuid),
> +	FSINFO_STRING	(FSINFO_ATTR_VOLUME_NAME,	string),
> +	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, fsinfo_meta_attribute_info),
> +	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_meta_attributes),
> +	{}
> +};
> +
> +static __attribute__((noreturn))
> +void bad_value(const char *what,
> +	       struct fsinfo_params *params,
> +	       const struct fsinfo_attribute *attr,
> +	       const struct fsinfo_attribute_info *attr_info,
> +	       void *reply, unsigned int size)
> +{
> +	printf("\n");
> +	fprintf(stderr, "%s %s{%u}{%u} t=%x f=%x s=%x\n",
> +		what, attr->name, params->Nth, params->Mth,
> +		attr_info->type, attr_info->flags, attr_info->size);
> +	fprintf(stderr, "size=%u\n", size);
> +	dump_hex(stderr, reply, 0, size);
> +	exit(1);
> +}
> +
> +static void dump_value(unsigned int attr_id,
> +		       const struct fsinfo_attribute *attr,
> +		       const struct fsinfo_attribute_info *attr_info,
> +		       void *reply, unsigned int size)
> +{
> +	if (!attr || !attr->dump) {
> +		printf("<no dumper>\n");
> +		return;
> +	}
> +
> +	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
> +		printf("<short data %u/%u>\n", size, attr->size);
> +		return;
> +	}
> +
> +	attr->dump(reply, size);
> +}
> +
> +static void dump_list(unsigned int attr_id,
> +		      const struct fsinfo_attribute *attr,
> +		      const struct fsinfo_attribute_info *attr_info,
> +		      void *reply, unsigned int size)
> +{
> +	size_t elem_size = attr_info->size;
> +	unsigned int ix = 0;
> +
> +	printf("\n");
> +	if (!attr || !attr->dump) {
> +		printf("<no dumper>\n");
> +		return;
> +	}
> +
> +	if (attr->type == FSINFO_TYPE_VSTRUCT && size < attr->size) {
> +		printf("<short data %u/%u>\n", size, attr->size);
> +		return;
> +	}
> +
> +	list_last = false;
> +	while (size >= elem_size) {
> +		printf("\t[%02x] ", ix);
> +		if (size == elem_size)
> +			list_last = true;
> +		attr->dump(reply, size);
> +		reply += elem_size;
> +		size -= elem_size;
> +		ix++;
> +	}
> +}
> +
> +/*
> + * Call fsinfo, expanding the buffer as necessary.
> + */
> +static ssize_t get_fsinfo(const char *file, const char *name,
> +			  struct fsinfo_params *params, void **_r)
> +{
> +	ssize_t ret;
> +	size_t buf_size = 4096;
> +	void *r;
> +
> +	for (;;) {
> +		r = malloc(buf_size);
> +		if (!r) {
> +			perror("malloc");
> +			exit(1);
> +		}
> +		memset(r, 0xbd, buf_size);
> +
> +		errno = 0;
> +		ret = fsinfo(AT_FDCWD, file, params, sizeof(*params), r, buf_size - 1);
> +		if (ret == -1)
> +			goto error;
> +
> +		if (ret <= buf_size - 1)
> +			break;
> +		buf_size = (ret + 4096 - 1) & ~(4096 - 1);
> +	}
> +
> +	if (debug)
> +		printf("fsinfo(%s,%s,%u,%u) = %zd\n",
> +		       file, name, params->Nth, params->Mth, ret);
> +
> +	((char *)r)[ret] = 0;
> +	*_r = r;
> +	return ret;
> +
> +error:
> +	*_r = NULL;
> +	free(r);
> +	if (debug)
> +		printf("fsinfo(%s,%s,%u,%u) = %m\n",
> +		       file, name, params->Nth, params->Mth);
> +	return ret;
> +}
> +
> +/*
> + * Try one subinstance of an attribute.
> + */
> +static int try_one(const char *file, struct fsinfo_params *params,
> +		   const struct fsinfo_attribute_info *attr_info, bool raw)
> +{
> +	const struct fsinfo_attribute *attr;
> +	const char *name;
> +	size_t size = 4096;
> +	char namebuf[32];
> +	void *r;
> +
> +	for (attr = fsinfo_attributes; attr->name; attr++) {
> +		if (attr->attr_id == params->request) {
> +			name = attr->name;
> +			if (strncmp(name, "fsinfo_generic_", 15) == 0)
> +				name += 15;
> +			goto found;
> +		}
> +	}
> +
> +	sprintf(namebuf, "<unknown-%x>", params->request);
> +	name = namebuf;
> +	attr = NULL;
> +
> +found:
> +	size = get_fsinfo(file, name, params, &r);
> +
> +	if (size == -1) {
> +		if (errno == ENODATA) {
> +			if (!(attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) &&
> +			    params->Nth == 0 && params->Mth == 0)
> +				bad_value("Unexpected ENODATA",
> +					  params, attr, attr_info, r, size);
> +			free(r);
> +			return (params->Mth == 0) ? 2 : 1;
> +		}
> +		if (errno == EOPNOTSUPP) {
> +			if (params->Nth > 0 || params->Mth > 0)
> +				bad_value("Should return ENODATA",
> +					  params, attr, attr_info, r, size);
> +			//printf("\e[33m%s\e[m: <not supported>\n",
> +			//       fsinfo_attr_names[attr]);
> +			free(r);
> +			return 2;
> +		}
> +		perror(file);
> +		exit(1);
> +	}
> +
> +	if (raw) {
> +		if (size > 4096)
> +			size = 4096;
> +		dump_hex(stdout, r, 0, size);
> +		free(r);
> +		return 0;
> +	}
> +
> +	switch (attr_info->flags & (FSINFO_FLAGS_N | FSINFO_FLAGS_NM)) {
> +	case 0:
> +		printf("\e[33m%s\e[m: ", name);
> +		break;
> +	case FSINFO_FLAGS_N:
> +		printf("\e[33m%s{%u}\e[m: ", name, params->Nth);
> +		break;
> +	case FSINFO_FLAGS_NM:
> +		printf("\e[33m%s{%u,%u}\e[m: ", name, params->Nth, params->Mth);
> +		break;
> +	}
> +
> +	switch (attr_info->type) {
> +	case FSINFO_TYPE_STRING:
> +		if (size == 0 || ((char *)r)[size - 1] != 0)
> +			bad_value("Unterminated string",
> +				  params, attr, attr_info, r, size);
> +	case FSINFO_TYPE_VSTRUCT:
> +	case FSINFO_TYPE_OPAQUE:
> +		dump_value(params->request, attr, attr_info, r, size);
> +		free(r);
> +		return 0;
> +
> +	case FSINFO_TYPE_LIST:
> +		dump_list(params->request, attr, attr_info, r, size);
> +		free(r);
> +		return 0;
> +
> +	default:
> +		bad_value("Fishy type", params, attr, attr_info, r, size);
> +	}
> +}
> +
> +static int cmp_u32(const void *a, const void *b)
> +{
> +	return *(const int *)a - *(const int *)b;
> +}
> +
> +/*
> + *
> + */
> +int main(int argc, char **argv)
> +{
> +	struct fsinfo_attribute_info attr_info;
> +	struct fsinfo_params params = {
> +		.at_flags	= AT_SYMLINK_NOFOLLOW,
> +		.flags		= FSINFO_FLAGS_QUERY_PATH,
> +	};
> +	unsigned int *attrs, ret, nr, i;
> +	bool meta = false;
> +	int raw = 0, opt, Nth, Mth;
> +
> +	while ((opt = getopt(argc, argv, "Madlr"))) {
> +		switch (opt) {
> +		case 'M':
> +			meta = true;
> +			continue;
> +		case 'a':
> +			params.at_flags |= AT_NO_AUTOMOUNT;
> +			params.flags = FSINFO_FLAGS_QUERY_PATH;
> +			continue;
> +		case 'd':
> +			debug = true;
> +			continue;
> +		case 'l':
> +			params.at_flags &= ~AT_SYMLINK_NOFOLLOW;
> +			params.flags = FSINFO_FLAGS_QUERY_PATH;
> +			continue;
> +		case 'r':
> +			raw = 1;
> +			continue;
> +		}
> +		break;
> +	}
> +
> +	argc -= optind;
> +	argv += optind;
> +
> +	if (argc != 1) {
> +		printf("Format: test-fsinfo [-Madlr] <path>\n");
> +		exit(2);
> +	}
> +
> +	/* Retrieve a list of supported attribute IDs */
> +	params.request = FSINFO_ATTR_FSINFO_ATTRIBUTES;
> +	params.Nth = 0;
> +	params.Mth = 0;
> +	ret = get_fsinfo(argv[0], "attributes", &params, (void **)&attrs);
> +	if (ret == -1) {
> +		fprintf(stderr, "Unable to get attribute list: %m\n");
> +		exit(1);
> +	}
> +
> +	if (ret % sizeof(attrs[0])) {
> +		fprintf(stderr, "Bad length of attribute list (0x%x)\n", ret);
> +		exit(2);
> +	}
> +
> +	nr = ret / sizeof(attrs[0]);
> +	qsort(attrs, nr, sizeof(attrs[0]), cmp_u32);
> +
> +	if (meta) {
> +		printf("ATTR ID  TYPE         FLAGS    SIZE  NAME\n");
> +		printf("======== ============ ======== ===== =========\n");
> +		for (i = 0; i < nr; i++) {
> +			params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
> +			params.Nth = attrs[i];
> +			params.Mth = 0;
> +			ret = fsinfo(AT_FDCWD, argv[0],
> +				     &params, sizeof(params),
> +				     &attr_info, sizeof(attr_info));
> +			if (ret == -1) {
> +				fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
> +				exit(1);
> +			}
> +
> +			dump_attribute_info(&attr_info, ret);
> +		}
> +		exit(0);
> +	}
> +
> +	for (i = 0; i < nr; i++) {
> +		params.request = FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO;
> +		params.Nth = attrs[i];
> +		params.Mth = 0;
> +		ret = fsinfo(AT_FDCWD, argv[0],
> +			     &params, sizeof(params),
> +			     &attr_info, sizeof(attr_info));
> +		if (ret == -1) {
> +			fprintf(stderr, "Can't get info for attribute %x: %m\n", attrs[i]);
> +			exit(1);
> +		}
> +
> +		if (attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO ||
> +		    attrs[i] == FSINFO_ATTR_FSINFO_ATTRIBUTES)
> +			continue;
> +
> +		if (attrs[i] != attr_info.attr_id) {
> +			fprintf(stderr, "ID for %03x returned %03x\n",
> +				attrs[i], attr_info.attr_id);
> +			break;
> +		}
> +		Nth = 0;
> +		do {
> +			Mth = 0;
> +			do {
> +				params.request = attrs[i];
> +				params.Nth = Nth;
> +				params.Mth = Mth;
> +
> +				switch (try_one(argv[0], &params, &attr_info, raw)) {
> +				case 0:
> +					continue;
> +				case 1:
> +					goto done_M;
> +				case 2:
> +					goto done_N;
> +				}
> +			} while (++Mth < 100);
> +
> +		done_M:
> +			if (Mth >= 100) {
> +				fprintf(stderr, "Fishy: Mth %x[%u][%u]\n", attrs[i], Nth, Mth);
> +				break;
> +			}
> +
> +		} while (++Nth < 100);
> +
> +	done_N:
> +		if (Nth >= 100) {
> +			fprintf(stderr, "Fishy: Nth %x[%u]\n", attrs[i], Nth);
> +			break;
> +		}
> +	}
> +
> +	return 0;
> +}
> 
> 
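
As an aside on the gran_mantissa/gran_exponent pair in struct
fsinfo_timestamp_one quoted above: the comment gives the granularity in
seconds as mant * 10^exp, so mantissa 1 with exponent -9 means 1ns and
mantissa 2 with exponent 0 means 2s.  A minimal sketch of decoding that in
userspace follows; the struct and function names are illustrative only and
are not taken from the patch:

```c
#include <stdint.h>

/* Illustrative copy of the two granularity fields; not the UAPI struct. */
struct gran_sketch {
	uint16_t mantissa;
	int8_t   exponent;
};

/* Granularity in seconds = mantissa * 10^exponent. */
static double gran_seconds(const struct gran_sketch *g)
{
	double secs = g->mantissa;
	int e = g->exponent;

	/* Scale up for positive exponents, down for negative ones. */
	for (; e > 0; e--)
		secs *= 10.0;
	for (; e < 0; e++)
		secs /= 10.0;
	return secs;
}
```

Using a double keeps the sketch short; a real consumer might prefer to
convert to integer nanoseconds instead.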

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/18] fsinfo: Allow fsinfo() to look up a mount object by ID [ver #21]
  2020-08-03 13:37 ` [PATCH 05/18] fsinfo: Allow fsinfo() to look up a mount object by ID " David Howells
@ 2020-08-04 10:33   ` Miklos Szeredi
  0 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04 10:33 UTC (permalink / raw)
  To: David Howells
  Cc: viro, torvalds, raven, mszeredi, christian, jannh, darrick.wong,
	kzak, jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Mon, Aug 03, 2020 at 02:37:08PM +0100, David Howells wrote:
> Allow the fsinfo() syscall to look up a mount object by ID rather than by
> pathname.  This is necessary as there can be multiple mounts stacked up at
> the same pathname and there's no way to look through them otherwise.
> 
> This is done by passing FSINFO_FLAGS_QUERY_MOUNT to fsinfo() in the
> parameters and then passing the mount ID as a string to fsinfo() in place
> of the filename:
> 
> 	struct fsinfo_params params = {
> 		.flags	 = FSINFO_FLAGS_QUERY_MOUNT,
> 		.request = FSINFO_ATTR_IDS,
> 	};
> 
> 	ret = fsinfo(AT_FDCWD, "21", &params, buffer, sizeof(buffer));
> 
> The caller is only permitted to query a mount object if the root directory
> of that mount connects directly to the current chroot if dfd == AT_FDCWD[*]
> or the directory specified by dfd otherwise.  Note that this is not
> available to the pathwalk of any other syscall.
> 
> [*] This needs to be something other than AT_FDCWD, perhaps AT_FDROOT.
> 
> [!] This probably needs an LSM hook.
> 
> [!] This might want to check the permissions on all the intervening dirs -
>     but it would have to do that under RCU conditions.
> 
> [!] This might want to check a CAP_* flag.

Was this reviewed by security folks?

> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/fsinfo.c                 |   53 +++++++++++++++++++
>  fs/internal.h               |    1 
>  fs/namespace.c              |  117 ++++++++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/fsinfo.h |    1 
>  samples/vfs/test-fsinfo.c   |    7 ++-
>  5 files changed, 175 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> index aef7a736e8fc..8ccbcddb4f16 100644
> --- a/fs/fsinfo.c
> +++ b/fs/fsinfo.c
> @@ -563,6 +563,56 @@ static int vfs_fsinfo_fd(unsigned int fd, struct fsinfo_context *ctx)
>  	return ret;
>  }
>  
> +/*
> + * Look up the root of a mount object.  This allows access to mount objects
> + * (and their attached superblocks) that can't be retrieved by path because
> + * they're entirely covered.
> + *
> + * We only permit access to a mount that has a direct path between either the
> + * dentry pointed to by dfd or to our chroot (if dfd is AT_FDCWD).
> + */
> +static int vfs_fsinfo_mount(int dfd, const char __user *filename,
> +			    struct fsinfo_context *ctx)
> +{
> +	struct path path;
> +	struct fd f = {};
> +	char *name;
> +	unsigned long mnt_id;
> +	int ret;
> +
> +	if (!filename)
> +		return -EINVAL;
> +
> +	name = strndup_user(filename, 32);
> +	if (IS_ERR(name))
> +		return PTR_ERR(name);
> +	ret = kstrtoul(name, 0, &mnt_id);
> +	if (ret < 0)
> +		goto out_name;
> +	if (mnt_id > INT_MAX)
> +		goto out_name;
> +
> +	if (dfd != AT_FDCWD) {
> +		ret = -EBADF;
> +		f = fdget_raw(dfd);
> +		if (!f.file)
> +			goto out_name;
> +	}
> +
> +	ret = lookup_mount_object(f.file ? &f.file->f_path : NULL,
> +				  mnt_id, &path);
> +	if (ret < 0)
> +		goto out_fd;
> +
> +	ret = vfs_fsinfo(&path, ctx);
> +	path_put(&path);
> +out_fd:
> +	fdput(f);
> +out_name:
> +	kfree(name);
> +	return ret;
> +}
> +
>  /**
>   * sys_fsinfo - System call to get filesystem information
>   * @dfd: Base directory to pathwalk from or fd referring to filesystem.
> @@ -636,6 +686,9 @@ SYSCALL_DEFINE6(fsinfo,
>  			return -EINVAL;
>  		ret = vfs_fsinfo_fd(dfd, &ctx);
>  		break;
> +	case FSINFO_FLAGS_QUERY_MOUNT:
> +		ret = vfs_fsinfo_mount(dfd, pathname, &ctx);
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/fs/internal.h b/fs/internal.h
> index 0b57da498f06..84bbb743a5ac 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -89,6 +89,7 @@ extern int __mnt_want_write_file(struct file *);
>  extern void __mnt_drop_write_file(struct file *);
>  
>  extern void dissolve_on_fput(struct vfsmount *);
> +extern int lookup_mount_object(struct path *, unsigned int, struct path *);
>  extern int fsinfo_generic_mount_source(struct path *, struct fsinfo_context *);
>  
>  /*
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ead8d1a16610..b2b9920ffd3c 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -64,7 +64,7 @@ static int __init set_mphash_entries(char *str)
>  __setup("mphash_entries=", set_mphash_entries);
>  
>  static u64 event;
> -static DEFINE_IDA(mnt_id_ida);
> +static DEFINE_IDR(mnt_id_ida);
>  static DEFINE_IDA(mnt_group_ida);
>  
>  static struct hlist_head *mount_hashtable __read_mostly;
> @@ -105,17 +105,27 @@ static inline struct hlist_head *mp_hash(struct dentry *dentry)
>  
>  static int mnt_alloc_id(struct mount *mnt)
>  {
> -	int res = ida_alloc(&mnt_id_ida, GFP_KERNEL);
> +	int res;
>  
> +	/* Allocate an ID, but don't set the pointer back to the mount until
> +	 * later, as once we do that, we have to follow RCU protocols to get
> +	 * rid of the mount struct.
> +	 */
> +	res = idr_alloc(&mnt_id_ida, NULL, 0, INT_MAX, GFP_KERNEL);

This needs to be a separate patch.

>  	if (res < 0)
>  		return res;
>  	mnt->mnt_id = res;
>  	return 0;
>  }
>  
> +static void mnt_publish_id(struct mount *mnt)
> +{
> +	idr_replace(&mnt_id_ida, mnt, mnt->mnt_id);
> +}
> +
>  static void mnt_free_id(struct mount *mnt)
>  {
> -	ida_free(&mnt_id_ida, mnt->mnt_id);
> +	idr_remove(&mnt_id_ida, mnt->mnt_id);
>  }
>  
>  /*
> @@ -975,6 +985,7 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
>  	lock_mount_hash();
>  	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
>  	unlock_mount_hash();
> +	mnt_publish_id(mnt);
>  	return &mnt->mnt;
>  }
>  EXPORT_SYMBOL(vfs_create_mount);
> @@ -1068,6 +1079,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
>  	lock_mount_hash();
>  	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
>  	unlock_mount_hash();
> +	mnt_publish_id(mnt);
>  
>  	if ((flag & CL_SLAVE) ||
>  	    ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
> @@ -4151,4 +4163,103 @@ int fsinfo_generic_mount_source(struct path *path, struct fsinfo_context *ctx)
>  	return m.count + 1;
>  }
>  
> +/*
> + * See if one path point connects directly to another by ancestral relationship
> + * across mountpoints.  Must call with the RCU read lock held.
> + */
> +static bool are_paths_connected(struct path *ancestor, struct path *to_check)
> +{
> +	struct mount *mnt, *parent;
> +	struct path cursor;
> +	unsigned seq;
> +	bool connected;
> +
> +	seq = 0;
> +restart:
> +	cursor = *to_check;
> +
> +	read_seqbegin_or_lock(&rename_lock, &seq);
> +	while (cursor.mnt != ancestor->mnt) {
> +		mnt = real_mount(cursor.mnt);
> +		parent = READ_ONCE(mnt->mnt_parent);
> +		if (mnt == parent)
> +			goto failed;
> +		cursor.dentry = READ_ONCE(mnt->mnt_mountpoint);
> +		cursor.mnt = &parent->mnt;
> +	}
> +
> +	while (cursor.dentry != ancestor->dentry) {
> +		if (cursor.dentry == cursor.mnt->mnt_root ||
> +		    IS_ROOT(cursor.dentry))
> +			goto failed;
> +		cursor.dentry = READ_ONCE(cursor.dentry->d_parent);
> +	}
> +
> +	connected = true;
> +out:
> +	done_seqretry(&rename_lock, seq);
> +	return connected;
> +
> +failed:
> +	if (need_seqretry(&rename_lock, seq)) {
> +		seq = 1;
> +		goto restart;
> +	}
> +	connected = false;
> +	goto out;
> +}
> +
> +/**
> + * lookup_mount_object - Look up a vfsmount object by ID
> + * @root: The mount root must connect backwards to this point (or chroot if NULL).
> + * @id: The ID of the mountpoint.
> + * @_mntpt: Where to return the resulting mountpoint path.
> + *
> + * Look up the root of the mount with the corresponding ID.  This is only
> + * permitted if that mount connects directly to the specified root/chroot.
> + */
> +int lookup_mount_object(struct path *root, unsigned int mnt_id, struct path *_mntpt)
> +{
> +	struct mount *mnt;
> +	struct path stop, mntpt = {};
> +	int ret = -EPERM;
> +
> +	if (!root)
> +		get_fs_root(current->fs, &stop);
> +	else
> +		stop = *root;
> +
> +	rcu_read_lock();
> +	lock_mount_hash();
> +	mnt = idr_find(&mnt_id_ida, mnt_id);
> +	if (!mnt)
> +		goto out_unlock_mh;
> +	if (mnt->mnt.mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
> +		goto out_unlock_mh;
> +	if (mnt_get_count(mnt) == 0)
> +		goto out_unlock_mh;
> +	mnt_add_count(mnt, 1);
> +	mntpt.mnt = &mnt->mnt;
> +	mntpt.dentry = dget(mnt->mnt.mnt_root);
> +	unlock_mount_hash();
> +
> +	if (are_paths_connected(&stop, &mntpt)) {
> +		*_mntpt = mntpt;
> +		mntpt.mnt = NULL;
> +		mntpt.dentry = NULL;
> +		ret = 0;
> +	}
> +
> +out_unlock:
> +	rcu_read_unlock();
> +	if (!root)
> +		path_put(&stop);
> +	path_put(&mntpt);
> +	return ret;
> +
> +out_unlock_mh:
> +	unlock_mount_hash();
> +	goto out_unlock;
> +}
> +
>  #endif /* CONFIG_FSINFO */
> diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
> index a27e92b68266..d24e47762a07 100644
> --- a/include/uapi/linux/fsinfo.h
> +++ b/include/uapi/linux/fsinfo.h
> @@ -44,6 +44,7 @@ struct fsinfo_params {
>  #define FSINFO_FLAGS_QUERY_MASK	0x0007 /* What object should fsinfo() query? */
>  #define FSINFO_FLAGS_QUERY_PATH	0x0000 /* - path, specified by dirfd,pathname,AT_EMPTY_PATH */
>  #define FSINFO_FLAGS_QUERY_FD	0x0001 /* - fd specified by dirfd */
> +#define FSINFO_FLAGS_QUERY_MOUNT 0x0002	/* - mount object (path=>mount_id, dirfd=>subtree) */
>  	__u32	request;	/* ID of requested attribute */
>  	__u32	Nth;		/* Instance of it (some may have multiple) */
>  	__u32	Mth;		/* Subinstance of Nth instance */
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> index 634f30b7e67f..dfa44bba8bbd 100644
> --- a/samples/vfs/test-fsinfo.c
> +++ b/samples/vfs/test-fsinfo.c
> @@ -593,7 +593,7 @@ int main(int argc, char **argv)
>  	bool meta = false;
>  	int raw = 0, opt, Nth, Mth;
>  
> -	while ((opt = getopt(argc, argv, "Madlr"))) {
> +	while ((opt = getopt(argc, argv, "Madmlr"))) {
>  		switch (opt) {
>  		case 'M':
>  			meta = true;
> @@ -609,6 +609,10 @@ int main(int argc, char **argv)
>  			params.at_flags &= ~AT_SYMLINK_NOFOLLOW;
>  			params.flags = FSINFO_FLAGS_QUERY_PATH;
>  			continue;
> +		case 'm':
> +			params.resolve_flags = 0;
> +			params.flags = FSINFO_FLAGS_QUERY_MOUNT;
> +			continue;
>  		case 'r':
>  			raw = 1;
>  			continue;
> @@ -621,6 +625,7 @@ int main(int argc, char **argv)
>  
>  	if (argc != 1) {
>  		printf("Format: test-fsinfo [-Madlr] <path>\n");
> +		printf("Format: test-fsinfo [-Mdr] -m <mnt_id>\n");
>  		exit(2);
>  	}
>  
> 
> 

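The allocate-then-publish split introduced by the quoted mnt_alloc_id() and
mnt_publish_id() hunks can be illustrated outside the kernel.  What follows
is a minimal userspace sketch, not the kernel IDR API: slots[], id_reserve(),
id_publish(), id_find() and id_release() are hypothetical names made up for
the illustration, and locking and RCU are ignored entirely.  It shows that an
ID can be reserved while lookups still see NULL, and that the object only
becomes findable once it has been published.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ID-to-pointer map; not the kernel IDR. */
#define SLOT_COUNT 8

struct slot {
	int   in_use;	/* the ID has been handed out */
	void *ptr;	/* NULL until the object is published */
};

static struct slot slots[SLOT_COUNT];

/* Reserve the lowest free ID without making any object visible. */
int id_reserve(void)
{
	for (int id = 0; id < SLOT_COUNT; id++) {
		if (!slots[id].in_use) {
			slots[id].in_use = 1;
			slots[id].ptr = NULL;
			return id;
		}
	}
	return -1;	/* table full */
}

/* Make a fully constructed object visible to lookups. */
void id_publish(int id, void *obj)
{
	assert(id >= 0 && id < SLOT_COUNT && slots[id].in_use);
	slots[id].ptr = obj;
}

/* Lookups see NULL for free IDs and for reserved-but-unpublished IDs. */
void *id_find(int id)
{
	if (id < 0 || id >= SLOT_COUNT || !slots[id].in_use)
		return NULL;
	return slots[id].ptr;
}

/* Release the ID so that it can be handed out again. */
void id_release(int id)
{
	assert(id >= 0 && id < SLOT_COUNT && slots[id].in_use);
	slots[id].in_use = 0;
	slots[id].ptr = NULL;
}
```

In the patch the corresponding roles appear to be played by idr_alloc() with
a NULL pointer, idr_replace(), idr_find() and idr_remove().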

* Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-03 13:37 ` [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount " David Howells
@ 2020-08-04 10:41   ` Miklos Szeredi
  2020-08-04 12:32     ` Ian Kent
  2020-08-05 14:13   ` David Howells
  1 sibling, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04 10:41 UTC (permalink / raw)
  To: David Howells
  Cc: viro, torvalds, raven, mszeredi, christian, jannh, darrick.wong,
	kzak, jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Mon, Aug 03, 2020 at 02:37:16PM +0100, David Howells wrote:
> Add a uniquifier ID to struct mount that is effectively unique over the
> kernel lifetime to deal with mnt_id values being reused.  This can then
> be exported through fsinfo() to allow detection of replacement mounts that
> happen to end up with the same mount ID.
> 
> The normal mount handle is still used for referring to a particular mount.
> 
> The mount notification is then changed to convey these unique mount IDs
> rather than the mount handle.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/mount.h        |    3 +++
>  fs/mount_notify.c |    4 ++--
>  fs/namespace.c    |    3 +++
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index 85456a5f5a3a..1037781be055 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -79,6 +79,9 @@ struct mount {
>  	int mnt_expiry_mark;		/* true if marked for expiry */
>  	struct hlist_head mnt_pins;
>  	struct hlist_head mnt_stuck_children;
> +#ifdef CONFIG_FSINFO
> +	u64	mnt_unique_id;		/* ID unique over lifetime of kernel */
> +#endif

Not sure if it's worth making conditional.

>  #ifdef CONFIG_MOUNT_NOTIFICATIONS
>  	struct watch_list *mnt_watchers; /* Watches on dentries within this mount */
>  #endif
> diff --git a/fs/mount_notify.c b/fs/mount_notify.c
> index 44f570e4cebe..d8ba66ed5f77 100644
> --- a/fs/mount_notify.c
> +++ b/fs/mount_notify.c
> @@ -90,7 +90,7 @@ void notify_mount(struct mount *trigger,
>  	n.watch.type	= WATCH_TYPE_MOUNT_NOTIFY;
>  	n.watch.subtype	= subtype;
>  	n.watch.info	= info_flags | watch_sizeof(n);
> -	n.triggered_on	= trigger->mnt_id;
> +	n.triggered_on	= trigger->mnt_unique_id;
>  
>  	switch (subtype) {
>  	case NOTIFY_MOUNT_EXPIRY:
> @@ -102,7 +102,7 @@ void notify_mount(struct mount *trigger,
>  	case NOTIFY_MOUNT_UNMOUNT:
>  	case NOTIFY_MOUNT_MOVE_FROM:
>  	case NOTIFY_MOUNT_MOVE_TO:
> -		n.auxiliary_mount	= aux->mnt_id;
> +		n.auxiliary_mount = aux->mnt_unique_id;

Hmm, so we now have two IDs:

 - one can be used to look up the mount
 - one is guaranteed to be unique

With this change the mount cannot be looked up with FSINFO_FLAGS_QUERY_MOUNT,
right?

Should we be merging the two IDs into a single one which has both properties?

>  		break;
>  
>  	default:
> diff --git a/fs/namespace.c b/fs/namespace.c
> index b2b9920ffd3c..1db8a64cd76f 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -115,6 +115,9 @@ static int mnt_alloc_id(struct mount *mnt)
>  	if (res < 0)
>  		return res;
>  	mnt->mnt_id = res;
> +#ifdef CONFIG_FSINFO
> +	mnt->mnt_unique_id = atomic64_inc_return(&vfs_unique_counter);
> +#endif
>  	return 0;
>  }
>  
> 
> 

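The difference between the two IDs discussed above can be made concrete with
a small userspace sketch.  It is an illustration under stated assumptions,
not kernel code: struct toy_mount, toy_mount_create(), toy_mount_destroy()
and toy_mount_is_same() are hypothetical names.  The small id is handed out
lowest-free-first and so gets reused, while unique_id comes from a 64-bit
counter that only ever increases, which lets a watcher notice that the mount
behind a remembered id has been replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MOUNTS 4

/* Hypothetical model of a mount carrying both kinds of ID. */
struct toy_mount {
	int      in_use;
	int      id;		/* small and reusable, like mnt_id */
	uint64_t unique_id;	/* never reused, like mnt_unique_id */
};

static struct toy_mount table[MAX_MOUNTS];
static uint64_t unique_counter;

/* Take the lowest free slot; its index doubles as the reusable ID. */
struct toy_mount *toy_mount_create(void)
{
	for (int i = 0; i < MAX_MOUNTS; i++) {
		if (!table[i].in_use) {
			table[i].in_use = 1;
			table[i].id = i;
			table[i].unique_id = ++unique_counter;
			return &table[i];
		}
	}
	return NULL;	/* table full */
}

/* Free the slot; the reusable ID becomes available again. */
void toy_mount_destroy(struct toy_mount *m)
{
	assert(m && m->in_use);
	m->in_use = 0;
}

/* Return 1 if the mount now holding 'id' is the one remembered earlier. */
int toy_mount_is_same(int id, uint64_t remembered_unique_id)
{
	if (id < 0 || id >= MAX_MOUNTS || !table[id].in_use)
		return 0;
	return table[id].unique_id == remembered_unique_id;
}
```

A single ID with both properties, as suggested above, would make the second
comparison unnecessary because the lookup key itself would never be reused.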

* Re: [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information [ver #21]
  2020-08-03 13:36 ` [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information " David Howells
  2020-08-04 10:16   ` Miklos Szeredi
@ 2020-08-04 11:34   ` David Howells
  2020-08-27 11:27   ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-04 11:34 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, viro, linux-api, torvalds, raven, mszeredi, christian,
	jannh, darrick.wong, kzak, jlayton, linux-fsdevel,
	linux-security-module, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> > 		__u32	Mth;
> 
> The Mth field seems to be unused in this patchset.  Since the struct is
> extensible, I guess there's no point in adding it now.

Yeah - I was using it to index through the server address lists for network
filesystems (ie. the Mth address of the Nth server), but I've dropped the nfs
patch and made afs return an array of addresses for the Nth server since the
address list can get reordered.

Ordinarily, I'd just take it out, but I don't want to cause the patchset to
get dropped for yet another merge cycle :-/

> > +#define FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO 0x100	/* Information about attr N (for path) */
> > +#define FSINFO_ATTR_FSINFO_ATTRIBUTES	0x101	/* List of supported attrs (for path) */
> 
> I think it would make sense to move the actual attributes to a separate patch
> and leave this just being the infrastructure.

Maybe.  If there are no attributes, then it makes it a bit hard to test.

> > +struct fsinfo_u128 {
> ...
> 
> Shouldn't this belong in <linux/types.h>?

Maybe.  Ideally, I'd use a proper C type rather than a struct.

> Is there a reason these are 128 wide fields?  Are we approaching the limits of
> 64bits?

Dave Chinner was talking at LSF a couple of years ago, IIRC, about looking
beyond the 16 Exa limit in XFS.  I've occasionally talked to people who have
multi-Peta data sets in AFS or whatever they were using, streamed from science
experiments, so the limit isn't necessarily all *that* far off.

> > +struct fsinfo_limits {
> > +	struct fsinfo_u128 max_file_size;	/* Maximum file size */
> > +	struct fsinfo_u128 max_ino;		/* Maximum inode number */
> 
> > Again, what's the reason?  AFAICT we are not yet worried about overflowing 64
> > bits.  Future proofing is good, but there have to be some rules and reasons
> > behind the decisions.

This is cheap to do.  This information is expected to be static for the
lifetime of a superblock and, for most filesystems, of the running kernel, so
simply copying it with memcpy() from rodata is going to suffice most of the
time.

But don't worry - 640K is sufficient for everyone ;-)

David

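For reference, the "16 Exa limit" mentioned above is simply the point at
which a byte count no longer fits in 64 bits: 1 EiB is 2^60 bytes, so 16 EiB
is 2^64 bytes, one more than the largest 64-bit value.  The sketch below
works through that arithmetic.  struct toy_u128 and its helpers are
hypothetical and are not the patch's struct fsinfo_u128, whose layout is not
shown in this message; the lo/hi field order is an assumption made only for
the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
struct toy_u128 {
	uint64_t lo;	/* least significant 64 bits */
	uint64_t hi;	/* most significant 64 bits */
};

/* Add two 128-bit values, carrying from the low half into the high half. */
struct toy_u128 toy_u128_add(struct toy_u128 a, struct toy_u128 b)
{
	struct toy_u128 r;

	r.lo = a.lo + b.lo;			/* wraps modulo 2^64 */
	r.hi = a.hi + b.hi + (r.lo < a.lo);	/* wrapped, so carry one */
	return r;
}

/* Narrow to 64 bits; only valid while the high half is still zero. */
uint64_t toy_u128_to_u64(struct toy_u128 v)
{
	assert(v.hi == 0);
	return v.lo;
}
```

Adding one byte to a maximum 64-bit size gives lo == 0 and hi == 1, which is
exactly 16 EiB and can no longer be narrowed to 64 bits.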


* Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-04 10:41   ` Miklos Szeredi
@ 2020-08-04 12:32     ` Ian Kent
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Kent @ 2020-08-04 12:32 UTC (permalink / raw)
  To: Miklos Szeredi, David Howells
  Cc: viro, torvalds, mszeredi, christian, jannh, darrick.wong, kzak,
	jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Tue, 2020-08-04 at 12:41 +0200, Miklos Szeredi wrote:
> On Mon, Aug 03, 2020 at 02:37:16PM +0100, David Howells wrote:
> > Add a uniquifier ID to struct mount that is effectively unique over the
> > kernel lifetime to deal with mnt_id values being reused.  This can then
> > be exported through fsinfo() to allow detection of replacement mounts that
> > happen to end up with the same mount ID.
> > 
> > The normal mount handle is still used for referring to a particular mount.
> > 
> > The mount notification is then changed to convey these unique mount IDs
> > rather than the mount handle.
> > 
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > ---
> > 
> >  fs/mount.h        |    3 +++
> >  fs/mount_notify.c |    4 ++--
> >  fs/namespace.c    |    3 +++
> >  3 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/mount.h b/fs/mount.h
> > index 85456a5f5a3a..1037781be055 100644
> > --- a/fs/mount.h
> > +++ b/fs/mount.h
> > @@ -79,6 +79,9 @@ struct mount {
> >  	int mnt_expiry_mark;		/* true if marked for
> > expiry */
> >  	struct hlist_head mnt_pins;
> >  	struct hlist_head mnt_stuck_children;
> > +#ifdef CONFIG_FSINFO
> > +	u64	mnt_unique_id;		/* ID unique over lifetime of
> > kernel */
> > +#endif
> 
> Not sure if it's worth making conditional.
> 
> >  #ifdef CONFIG_MOUNT_NOTIFICATIONS
> >  	struct watch_list *mnt_watchers; /* Watches on dentries within
> > this mount */
> >  #endif
> > diff --git a/fs/mount_notify.c b/fs/mount_notify.c
> > index 44f570e4cebe..d8ba66ed5f77 100644
> > --- a/fs/mount_notify.c
> > +++ b/fs/mount_notify.c
> > @@ -90,7 +90,7 @@ void notify_mount(struct mount *trigger,
> >  	n.watch.type	= WATCH_TYPE_MOUNT_NOTIFY;
> >  	n.watch.subtype	= subtype;
> >  	n.watch.info	= info_flags | watch_sizeof(n);
> > -	n.triggered_on	= trigger->mnt_id;
> > +	n.triggered_on	= trigger->mnt_unique_id;
> >  
> >  	switch (subtype) {
> >  	case NOTIFY_MOUNT_EXPIRY:
> > @@ -102,7 +102,7 @@ void notify_mount(struct mount *trigger,
> >  	case NOTIFY_MOUNT_UNMOUNT:
> >  	case NOTIFY_MOUNT_MOVE_FROM:
> >  	case NOTIFY_MOUNT_MOVE_TO:
> > -		n.auxiliary_mount	= aux->mnt_id;
> > +		n.auxiliary_mount = aux->mnt_unique_id;
> 
> > Hmm, so we now have two IDs:
> > 
> >  - one can be used to look up the mount
> >  - one is guaranteed to be unique
> > 
> > With this change the mount cannot be looked up with FSINFO_FLAGS_QUERY_MOUNT,
> > right?
> > 
> > Should we be merging the two IDs into a single one which has both properties?

I'd been thinking for a while now that we would probably need to change
to 64-bit IDs, and I thought that was what was going to happen.

We'll need to change libmount and current code, but better early
on than later.

Ian

> 
> >  		break;
> >  
> >  	default:
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index b2b9920ffd3c..1db8a64cd76f 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -115,6 +115,9 @@ static int mnt_alloc_id(struct mount *mnt)
> >  	if (res < 0)
> >  		return res;
> >  	mnt->mnt_id = res;
> > +#ifdef CONFIG_FSINFO
> > +	mnt->mnt_unique_id = atomic64_inc_return(&vfs_unique_counter);
> > +#endif
> >  	return 0;
> >  }
> >  
> > 
> > 



* Re: [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved [ver #21]
  2020-08-03 13:37 ` [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved " David Howells
@ 2020-08-04 13:38   ` Miklos Szeredi
  2020-08-05 15:37   ` David Howells
  1 sibling, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04 13:38 UTC (permalink / raw)
  To: David Howells
  Cc: viro, torvalds, raven, mszeredi, christian, jannh, darrick.wong,
	kzak, jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Mon, Aug 03, 2020 at 02:37:33PM +0100, David Howells wrote:
> Add a couple of attributes to allow information about the mount topology
> and propagation to be retrieved:
> 
>  (1) FSINFO_ATTR_MOUNT_TOPOLOGY.
> 
>      Information about a mount's parentage in the mount topology tree and
>      its propagation attributes.
> 
>      This has to be collected with the VFS namespace lock held, so it's
>      separate from FSINFO_ATTR_MOUNT_INFO.  The topology change counter
>      that a subsequent patch will export can be used to work out from the
>      cheaper _INFO attribute as to whether the more expensive _TOPOLOGY
>      attribute needs requerying.
> 
>      MOUNT_PROPAGATION_* flags are added to linux/mount.h for UAPI
>      consumption.  At some point a mount_setattr() system call needs to be
>      added.
> 
>  (2) FSINFO_ATTR_MOUNT_CHILDREN.
> 
>      Information about a mount's children in the mount topology tree.
> 
>      This is formatted as an array of structures, one for each child and
>      capped with one for the argument mount (checked after listing all the
>      children).  Each element contains the static IDs of the respective
>      mount object along with a sum of its change attributes.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/fsinfo.c                 |    2 +
>  fs/internal.h               |    2 +
>  fs/namespace.c              |   94 +++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/fsinfo.h |   27 ++++++++++++
>  include/uapi/linux/mount.h  |   13 +++++-
>  samples/vfs/test-fsinfo.c   |   55 +++++++++++++++++++++++++
>  6 files changed, 192 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> index f276857709ee..0540cce89555 100644
> --- a/fs/fsinfo.c
> +++ b/fs/fsinfo.c
> @@ -291,9 +291,11 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
>  	FSINFO_VSTRUCT_N(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO, (void *)123UL),
>  
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_TOPOLOGY,	fsinfo_generic_mount_topology),
>  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_PATH,	fsinfo_generic_seq_read),
>  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
>  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT_FULL,	fsinfo_generic_mount_point_full),
> +	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
>  	{}
>  };
>  
> diff --git a/fs/internal.h b/fs/internal.h
> index a56008b7f3ec..cb5edcc7125a 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -98,8 +98,10 @@ extern void dissolve_on_fput(struct vfsmount *);
>  extern int lookup_mount_object(struct path *, unsigned int, struct path *);
>  extern int fsinfo_generic_mount_source(struct path *, struct fsinfo_context *);
>  extern int fsinfo_generic_mount_info(struct path *, struct fsinfo_context *);
> +extern int fsinfo_generic_mount_topology(struct path *, struct fsinfo_context *);
>  extern int fsinfo_generic_mount_point(struct path *, struct fsinfo_context *);
>  extern int fsinfo_generic_mount_point_full(struct path *, struct fsinfo_context *);
> +extern int fsinfo_generic_mount_children(struct path *, struct fsinfo_context *);
>  
>  /*
>   * fs_struct.c
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c196af35d39d..b5c2a3b4f96d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -4303,6 +4303,54 @@ int fsinfo_generic_mount_info(struct path *path, struct fsinfo_context *ctx)
>  	return sizeof(*p);
>  }
>  
> +/*
> + * Retrieve information about the topology at the nominated mount and
> + * its propagation attributes.
> + */
> +int fsinfo_generic_mount_topology(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct fsinfo_mount_topology *p = ctx->buffer;
> +	struct mount *m;
> +	struct path root;
> +
> +	get_fs_root(current->fs, &root);
> +
> +	namespace_lock();
> +
> +	m = real_mount(path->mnt);
> +
> +	p->parent_id = m->mnt_parent->mnt_id;
> +
> +	if (path->mnt == root.mnt) {
> +		p->parent_id = m->mnt_id;
> +	} else {
> +		rcu_read_lock();
> +		if (!are_paths_connected(&root, path))
> +			p->parent_id = m->mnt_id;
> +		rcu_read_unlock();
> +	}
> +
> +	if (IS_MNT_SHARED(m)) {
> +		p->shared_group_id = m->mnt_group_id;
> +		p->propagation_type |= MOUNT_PROPAGATION_SHARED;
> +	} else if (IS_MNT_SLAVE(m)) {
> +		int source = m->mnt_master->mnt_group_id;
> +		int from = get_dominating_id(m, &root);
> +		p->dependent_source_id = source;
> +		if (from && from != source)
> +			p->dependent_clone_of_id = from;
> +		p->propagation_type |= MOUNT_PROPAGATION_DEPENDENT;
> +	} else if (IS_MNT_UNBINDABLE(m)) {
> +		p->propagation_type |= MOUNT_PROPAGATION_UNBINDABLE;
> +	} else {
> +		p->propagation_type |= MOUNT_PROPAGATION_PRIVATE;
> +	}
> +
> +	namespace_unlock();
> +	path_put(&root);
> +	return sizeof(*p);
> +}
> +
>  /*
>   * Return the path of this mount relative to its parent and clipped to
>   * the current chroot.
> @@ -4379,4 +4427,50 @@ int fsinfo_generic_mount_point_full(struct path *path, struct fsinfo_context *ct
>  	return (ctx->buffer + ctx->buf_size) - p;
>  }
>  
> +/*
> + * Store a mount record into the fsinfo buffer.
> + */
> +static void fsinfo_store_mount(struct fsinfo_context *ctx, const struct mount *p,
> +			       bool is_root)
> +{
> +	struct fsinfo_mount_child record = {};
> +	unsigned int usage = ctx->usage;
> +
> +	if (ctx->usage >= INT_MAX)
> +		return;
> +	ctx->usage = usage + sizeof(record);
> +	if (!ctx->buffer || ctx->usage > ctx->buf_size)
> +		return;
> +
> +	record.mnt_unique_id	= p->mnt_unique_id;
> +	record.mnt_id		= p->mnt_id;
> +	record.parent_id	= is_root ? p->mnt_id : p->mnt_parent->mnt_id;
> +	memcpy(ctx->buffer + usage, &record, sizeof(record));
> +}
> +
> +/*
> + * Return information about the submounts relative to path.
> + */
> +int fsinfo_generic_mount_children(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct mount *m, *child;
> +
> +	m = real_mount(path->mnt);
> +
> +	read_seqlock_excl(&mount_lock);
> +
> +	list_for_each_entry_rcu(child, &m->mnt_mounts, mnt_child) {
> +		if (child->mnt_parent != m)
> +			continue;
> +		fsinfo_store_mount(ctx, child, false);
> +	}
> +
> +	/* End the list with a copy of the parameter mount's details so that
> +	 * userspace can quickly check for changes.
> +	 */
> +	fsinfo_store_mount(ctx, m, true);
> +	read_sequnlock_excl(&mount_lock);
> +	return ctx->usage;
> +}
> +
>  #endif /* CONFIG_FSINFO */
> diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
> index 15ef161905cd..f0a352b7028e 100644
> --- a/include/uapi/linux/fsinfo.h
> +++ b/include/uapi/linux/fsinfo.h
> @@ -35,6 +35,8 @@
>  #define FSINFO_ATTR_MOUNT_PATH		0x201	/* Bind mount/superblock path (string) */
>  #define FSINFO_ATTR_MOUNT_POINT		0x202	/* Relative path of mount in parent (string) */
>  #define FSINFO_ATTR_MOUNT_POINT_FULL	0x203	/* Absolute path of mount (string) */
> +#define FSINFO_ATTR_MOUNT_TOPOLOGY	0x204	/* Mount object topology */
> +#define FSINFO_ATTR_MOUNT_CHILDREN	0x205	/* Children of this mount (list) */
>  
>  /*
>   * Optional fsinfo() parameter structure.
> @@ -102,6 +104,31 @@ struct fsinfo_mount_info {
>  
>  #define FSINFO_ATTR_MOUNT_INFO__STRUCT struct fsinfo_mount_info
>  
> +/*
> + * Information struct for fsinfo(FSINFO_ATTR_MOUNT_TOPOLOGY).
> + */
> +struct fsinfo_mount_topology {
> +	__u32	parent_id;		/* Parent mount identifier */

Again, which mount ID does this refer to?  I think we want this to be *the*
mount ID that's both unique and can be looked up, and that is 64 bits wide.


> +	__u32	shared_group_id;	/* Shared: mount group ID */
> +	__u32	dependent_source_id;	/* Dependent: source mount group ID */
> +	__u32	dependent_clone_of_id;	/* Dependent: ID of mount this was cloned from */

Another set of IDs that are currently 32-bit *internally*, but that doesn't mean
they will always be 32-bit.

And that last one (apart from "slave" being obfuscated) is simply incorrect.  It
has nothing to do with cloning.  It's the "ID of the closest peer group in the
propagation chain that has a representative mount in the current root".

> +	__u32	propagation_type;	/* MOUNT_PROPAGATION_* type */
> +};
> +
> +#define FSINFO_ATTR_MOUNT_TOPOLOGY__STRUCT struct fsinfo_mount_topology
> +
> +/*
> + * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_CHILDREN).
> + * - An extra element is placed on the end representing the parent mount.
> + */
> +struct fsinfo_mount_child {
> +	__u64	mnt_unique_id;		/* Kernel-lifetime unique mount ID */
> +	__u32	mnt_id;			/* Mount identifier (use with AT_FSINFO_MOUNTID_PATH) */
> +	__u32	parent_id;		/* Parent mount identifier */


Again, which ID do we want for this and parent?  Preferably one which is 64-bit.
As it is, we are operating with 96-bit mount IDs, which is excessive.

> +};
> +
> +#define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct fsinfo_mount_child
> +
>  /*
>   * Information struct for fsinfo(FSINFO_ATTR_STATFS).
>   * - This gives extended filesystem information.
> diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
> index 96a0240f23fe..9ac8bb708843 100644
> --- a/include/uapi/linux/mount.h
> +++ b/include/uapi/linux/mount.h
> @@ -105,7 +105,7 @@ enum fsconfig_command {
>  #define FSMOUNT_CLOEXEC		0x00000001
>  
>  /*
> - * Mount attributes.
> + * Mount object attributes (these are separate to filesystem attributes).
>   */
>  #define MOUNT_ATTR_RDONLY	0x00000001 /* Mount read-only */
>  #define MOUNT_ATTR_NOSUID	0x00000002 /* Ignore suid and sgid bits */
> @@ -117,4 +117,15 @@ enum fsconfig_command {
>  #define MOUNT_ATTR_STRICTATIME	0x00000020 /* - Always perform atime updates */
>  #define MOUNT_ATTR_NODIRATIME	0x00000080 /* Do not update directory access times */
>  
> +/*
> + * Mount object propagation type.
> + */
> +enum propagation_type {
> +	/* 0 is left unallocated to mean "no change" in mount_setattr()  */
> +	MOUNT_PROPAGATION_UNBINDABLE	= 1, /* Make unbindable. */
> +	MOUNT_PROPAGATION_PRIVATE	= 2, /* Do not receive or send mount events. */
> +	MOUNT_PROPAGATION_DEPENDENT	= 3, /* Only receive mount events. */
> +	MOUNT_PROPAGATION_SHARED	= 4, /* Send and receive mount events. */
> +};
> +
>  #endif /* _UAPI_LINUX_MOUNT_H */
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> index f3bebb7318d9..b7290ea8eb55 100644
> --- a/samples/vfs/test-fsinfo.c
> +++ b/samples/vfs/test-fsinfo.c
> @@ -21,6 +21,7 @@
>  #include <sys/syscall.h>
>  #include <linux/fsinfo.h>
>  #include <linux/socket.h>
> +#include <linux/mount.h>
>  #include <sys/stat.h>
>  #include <arpa/inet.h>
>  
> @@ -305,6 +306,58 @@ static void dump_fsinfo_generic_mount_info(void *reply, unsigned int size)
>  	printf("\tattr    : %x\n", r->attr);
>  }
>  
> +static void dump_fsinfo_generic_mount_topology(void *reply, unsigned int size)
> +{
> +	struct fsinfo_mount_topology *r = reply;
> +
> +	printf("\n");
> +	printf("\tparent  : %x\n", r->parent_id);
> +
> +	switch (r->propagation_type) {
> +	case MOUNT_PROPAGATION_UNBINDABLE:
> +		printf("\tpropag  : unbindable\n");
> +		break;
> +	case MOUNT_PROPAGATION_PRIVATE:
> +		printf("\tpropag  : private\n");
> +		break;
> +	case MOUNT_PROPAGATION_DEPENDENT:
> +		printf("\tpropag  : dependent source=%x clone_of=%x\n",
> +		       r->dependent_source_id, r->dependent_clone_of_id);
> +		break;
> +	case MOUNT_PROPAGATION_SHARED:
> +		printf("\tpropag  : shared group=%x\n", r->shared_group_id);
> +		break;
> +	default:
> +		printf("\tpropag  : unknown type %x\n", r->propagation_type);
> +		break;
> +	}
> +
> +}
> +
> +static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
> +{
> +	struct fsinfo_mount_child *r = reply;
> +	ssize_t mplen;
> +	char path[32], *mp;
> +
> +	struct fsinfo_params params = {
> +		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
> +		.request	= FSINFO_ATTR_MOUNT_POINT,
> +	};
> +
> +	if (!list_last) {
> +		sprintf(path, "%u", r->mnt_id);
> +		mplen = get_fsinfo(path, "FSINFO_ATTR_MOUNT_POINT", &params, (void **)&mp);
> +		if (mplen < 0)
> +			mp = "-";
> +	} else {
> +		mp = "<this>";
> +	}
> +
> +	printf("%8x %16llx %s\n",
> +	       r->mnt_id, (unsigned long long)r->mnt_unique_id, mp);
> +}
> +
>  static void dump_string(void *reply, unsigned int size)
>  {
>  	char *s = reply, *p;
> @@ -383,9 +436,11 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
>  	FSINFO_LIST	(FSINFO_ATTR_FSINFO_ATTRIBUTES,	fsinfo_meta_attributes),
>  
>  	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_INFO,	fsinfo_generic_mount_info),
> +	FSINFO_VSTRUCT	(FSINFO_ATTR_MOUNT_TOPOLOGY,	fsinfo_generic_mount_topology),
>  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_PATH,	string),
>  	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	string),
>  	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT_FULL,	string),
> +	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
>  	{}
>  };
>  
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-03 13:37 ` [PATCH 10/18] fsinfo: Provide notification overrun handling support " David Howells
@ 2020-08-04 13:56   ` Miklos Szeredi
  2020-08-05  2:05     ` Ian Kent
  2020-08-05 16:06   ` David Howells
  1 sibling, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04 13:56 UTC (permalink / raw)
  To: David Howells
  Cc: viro, torvalds, raven, mszeredi, christian, jannh, darrick.wong,
	kzak, jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Mon, Aug 03, 2020 at 02:37:50PM +0100, David Howells wrote:
> Provide support for the handling of an overrun in a watch queue.  In the
> event that an overrun occurs, the watcher needs to be able to find out what
> it was that they missed.  To this end, previous patches added event
> counters to struct mount.

So this is optimizing the buffer overrun case?

Shouldn't we just make sure that the likelihood of overruns is low and, if one
happens, just reinitialize everything from scratch (shouldn't be *that*
expensive)?

Trying to find out what was missed seems like just adding complexity for no good
reason.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace [ver #21]
  2020-08-03 13:38 ` [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace " David Howells
@ 2020-08-04 14:05   ` Miklos Szeredi
  2020-08-05  0:59     ` Ian Kent
  2020-08-05 16:44   ` David Howells
  1 sibling, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04 14:05 UTC (permalink / raw)
  To: David Howells
  Cc: viro, torvalds, raven, mszeredi, christian, jannh, darrick.wong,
	kzak, jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Mon, Aug 03, 2020 at 02:38:34PM +0100, David Howells wrote:
> Add a filesystem attribute that exports a list of all the visible mounts in
> a namespace, given the caller's chroot setting.  The returned list is an
> array of:
> 
> 	struct fsinfo_mount_child {
> 		__u64	mnt_unique_id;
> 		__u32	mnt_id;
> 		__u32	parent_id;
> 		__u32	mnt_notify_sum;
> 		__u32	sb_notify_sum;
> 	};
> 
> where each element contains a once-in-a-system-lifetime unique ID, the
> mount ID (which may get reused), the parent mount ID and sums of the
> notification/change counters for the mount and its superblock.

The change counters are currently conditional on CONFIG_MOUNT_NOTIFICATIONS.
Is this intentional?

> 
> This works with a read lock on the namespace_sem, but ideally would do it
> under the RCU read lock only.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/fsinfo.c                 |    1 +
>  fs/internal.h               |    1 +
>  fs/namespace.c              |   37 +++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/fsinfo.h |    4 ++++
>  samples/vfs/test-fsinfo.c   |   22 ++++++++++++++++++++++
>  5 files changed, 65 insertions(+)
> 
> diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> index 0540cce89555..f230124ffdf5 100644
> --- a/fs/fsinfo.c
> +++ b/fs/fsinfo.c
> @@ -296,6 +296,7 @@ static const struct fsinfo_attribute fsinfo_common_attributes[] = {
>  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_generic_mount_point),
>  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT_FULL,	fsinfo_generic_mount_point_full),
>  	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
> +	FSINFO_LIST	(FSINFO_ATTR_MOUNT_ALL,		fsinfo_generic_mount_all),
>  	{}
>  };
>  
> diff --git a/fs/internal.h b/fs/internal.h
> index cb5edcc7125a..267b4aaf0271 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -102,6 +102,7 @@ extern int fsinfo_generic_mount_topology(struct path *, struct fsinfo_context *)
>  extern int fsinfo_generic_mount_point(struct path *, struct fsinfo_context *);
>  extern int fsinfo_generic_mount_point_full(struct path *, struct fsinfo_context *);
>  extern int fsinfo_generic_mount_children(struct path *, struct fsinfo_context *);
> +extern int fsinfo_generic_mount_all(struct path *, struct fsinfo_context *);
>  
>  /*
>   * fs_struct.c
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 122c12f9512b..1f2e06507244 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -4494,4 +4494,41 @@ int fsinfo_generic_mount_children(struct path *path, struct fsinfo_context *ctx)
>  	return ctx->usage;
>  }
>  
> +/*
> + * Return information about all the mounts in the namespace referenced by the
> + * path.
> + */
> +int fsinfo_generic_mount_all(struct path *path, struct fsinfo_context *ctx)
> +{
> +	struct mnt_namespace *ns;
> +	struct mount *m, *p;
> +	struct path chroot;
> +	bool allow;
> +
> +	m = real_mount(path->mnt);
> +	ns = m->mnt_ns;
> +
> +	get_fs_root(current->fs, &chroot);
> +	rcu_read_lock();
> +	allow = are_paths_connected(&chroot, path) || capable(CAP_SYS_ADMIN);
> +	rcu_read_unlock();
> +	path_put(&chroot);
> +	if (!allow)
> +		return -EPERM;
> +
> +	down_read(&namespace_sem);
> +
> +	list_for_each_entry(p, &ns->list, mnt_list) {

This is missing the locking and the cursor check added by commit 9f6c61f96f2d
("proc/mounts: add cursor").
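
A minimal user-space model of what such a check amounts to, on the
assumption that the namespace list can contain cursor placeholders that
are not real mounts and that a walker therefore has to skip them.  The
struct and helper names are hypothetical, and the lock the kernel would
take around the walk is deliberately left out:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on the namespace mount list. */
struct mnt_entry {
	int mnt_id;
	bool is_cursor;		/* models a cursor placeholder, not a real mount */
	struct mnt_entry *next;
};

/* Return the first real mount at or after p, skipping cursor entries. */
static struct mnt_entry *next_real_mount(struct mnt_entry *p)
{
	while (p && p->is_cursor)
		p = p->next;
	return p;
}

/* Copy the IDs of real mounts into ids[], returning how many were stored. */
static size_t collect_mount_ids(struct mnt_entry *head, int *ids, size_t max)
{
	struct mnt_entry *p;
	size_t n = 0;

	for (p = next_real_mount(head); p && n < max;
	     p = next_real_mount(p->next))
		ids[n++] = p->mnt_id;
	return n;
}
```

A plain list_for_each_entry() over such a list would hand the cursor
entry to the store function as though it were a mount, which is the
problem being pointed out.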

> +		struct path mnt_root;
> +
> +		mnt_root.mnt	= &p->mnt;
> +		mnt_root.dentry	= p->mnt.mnt_root;
> +		if (are_paths_connected(path, &mnt_root))
> +			fsinfo_store_mount(ctx, p, p == m);
> +	}
> +
> +	up_read(&namespace_sem);
> +	return ctx->usage;
> +}
> +
>  #endif /* CONFIG_FSINFO */
> diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
> index 81329de6905e..e40192d98648 100644
> --- a/include/uapi/linux/fsinfo.h
> +++ b/include/uapi/linux/fsinfo.h
> @@ -37,6 +37,7 @@
>  #define FSINFO_ATTR_MOUNT_POINT_FULL	0x203	/* Absolute path of mount (string) */
>  #define FSINFO_ATTR_MOUNT_TOPOLOGY	0x204	/* Mount object topology */
>  #define FSINFO_ATTR_MOUNT_CHILDREN	0x205	/* Children of this mount (list) */
> +#define FSINFO_ATTR_MOUNT_ALL		0x206	/* List all mounts in a namespace (list) */
>  
>  #define FSINFO_ATTR_AFS_CELL_NAME	0x300	/* AFS cell name (string) */
>  #define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of the Nth server (string) */
> @@ -128,6 +129,8 @@ struct fsinfo_mount_topology {
>  /*
>   * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_CHILDREN).
>   * - An extra element is placed on the end representing the parent mount.
> + *
> + * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_ALL).
>   */
>  struct fsinfo_mount_child {
>  	__u64	mnt_unique_id;		/* Kernel-lifetime unique mount ID */
> @@ -139,6 +142,7 @@ struct fsinfo_mount_child {
>  };
>  
>  #define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct fsinfo_mount_child
> +#define FSINFO_ATTR_MOUNT_ALL__STRUCT struct fsinfo_mount_child
>  
>  /*
>   * Information struct for fsinfo(FSINFO_ATTR_STATFS).
> diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> index 374825ab85b0..596fa5e71762 100644
> --- a/samples/vfs/test-fsinfo.c
> +++ b/samples/vfs/test-fsinfo.c
> @@ -365,6 +365,27 @@ static void dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
>  	       (unsigned long long)r->mnt_notify_sum, mp);
>  }
>  
> +static void dump_fsinfo_generic_mount_all(void *reply, unsigned int size)
> +{
> +	struct fsinfo_mount_child *r = reply;
> +	ssize_t mplen;
> +	char path[32], *mp;
> +
> +	struct fsinfo_params params = {
> +		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
> +		.request	= FSINFO_ATTR_MOUNT_POINT_FULL,
> +	};
> +
> +	sprintf(path, "%u", r->mnt_id);
> +	mplen = get_fsinfo(path, "FSINFO_ATTR_MOUNT_POINT_FULL", &params, (void **)&mp);
> +	if (mplen < 0)
> +		mp = "-";
> +
> +	printf("%5x %5x %12llx %10llu %s\n",
> +	       r->mnt_id, r->parent_id, (unsigned long long)r->mnt_unique_id,
> +	       (unsigned long long)r->mnt_notify_sum, mp);
> +}
> +
>  static void dump_afs_fsinfo_server_address(void *reply, unsigned int size)
>  {
>  	struct fsinfo_afs_server_address *f = reply;
> @@ -492,6 +513,7 @@ static const struct fsinfo_attribute fsinfo_attributes[] = {
>  	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	string),
>  	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT_FULL,	string),
>  	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_mount_children),
> +	FSINFO_LIST	(FSINFO_ATTR_MOUNT_ALL,		fsinfo_generic_mount_all),
>  
>  	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	string),
>  	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	string),
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/18] VFS: Filesystem information [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (17 preceding siblings ...)
  2020-08-03 13:39 ` [PATCH 18/18] samples: add error state information to test-fsinfo.c " David Howells
@ 2020-08-04 15:39 ` James Bottomley
  2020-08-04 19:18   ` Miklos Szeredi
  2020-08-05 17:13 ` David Howells
  19 siblings, 1 reply; 49+ messages in thread
From: James Bottomley @ 2020-08-04 15:39 UTC (permalink / raw)
  To: David Howells, viro
  Cc: Theodore Ts'o, Andreas Dilger, Eric Biggers, Jeff Layton,
	linux-ext4, Carlos Maiolino, Darrick J. Wong, linux-api,
	torvalds, raven, mszeredi, christian, jannh, kzak, jlayton,
	linux-fsdevel, linux-security-module, linux-kernel

On Mon, 2020-08-03 at 14:36 +0100, David Howells wrote:
> Here's a set of patches that adds a system call, fsinfo(), that
> allows information about the VFS, mount topology, superblock and
> files to be retrieved.
> 
> The patchset is based on top of the notifications patchset and allows
> event counters implemented in the latter to be retrieved to allow
> overruns to be efficiently managed.

Could I repeat the question I asked about six months back that never
got answered:

https://lore.kernel.org/linux-api/1582316494.3376.45.camel@HansenPartnership.com/

It sort of petered out into a long winding thread about why not use
sysfs instead, which really doesn't look like a good idea to me.

I'll repeat the information for those who want to quote it easily on
reply without having to use a web interface:

---
Could I make a suggestion about how this should be done in a way that
doesn't actually require the fsinfo syscall at all: it could just be
done with fsconfig.  The idea is based on something I've wanted to do
for configfd but couldn't because otherwise it wouldn't substitute for
fsconfig, but Christian made me think it was actually essential to the
ability of the seccomp and other verifier tools in the critique of
configfd and I believe the same critique applies here.

Instead of making fsconfig functionally configure ... as in you pass
the attribute name, type and parameters down into the fs specific
handler and the handler does a string match and then verifies the
parameters and then acts on them, make it table configured, so what
each fstype does is register a table of attributes which can be got and
optionally set (with each attribute having a get and optional set
function).  We'd have multiple tables per fstype, so the generic VFS
can register a table of attributes it understands for every fstype
(things like name, uuid and the like) and then each fs type would
register a table of fs specific attributes following the same pattern. 
The system would examine the fs specific table before the generic one,
allowing overrides.  fsconfig would have the ability to both get and
set attributes, permitting retrieval as well as setting (which is how I
get rid of the fsinfo syscall), we'd have a global parameter, which
would retrieve the entire table by name and type so the whole thing is
introspectable because the upper layer knows a-priori all the
attributes which can be set for a given fs type and what type they are
(so we can make more of the parsing generic).  Any attribute which
doesn't have a set routine would be read only and all attributes would
have to have a get routine meaning everything is queryable.
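
To make the lookup order concrete, here is a rough user-space sketch of
the table idea, not kernel code.  Every name in it (struct fs_attr,
fs_attr_lookup(), the demo tables and the error codes chosen) is
hypothetical and only illustrates "fs specific table first, generic
table second, no set routine means read only":

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical attribute entry: get is mandatory, set is optional. */
struct fs_attr {
	const char *name;
	int (*get)(char *buf, size_t len);
	int (*set)(const char *value);	/* NULL means read-only */
};

static char demo_label[32] = "none";

static int generic_name_get(char *b, size_t n) { snprintf(b, n, "generic"); return 0; }
static int generic_uuid_get(char *b, size_t n) { snprintf(b, n, "0000"); return 0; }
static int demo_name_get(char *b, size_t n) { snprintf(b, n, "demofs"); return 0; }
static int demo_label_get(char *b, size_t n) { snprintf(b, n, "%s", demo_label); return 0; }
static int demo_label_set(const char *v)
{
	snprintf(demo_label, sizeof(demo_label), "%s", v);
	return 0;
}

/* Table the generic layer would register for every fs type. */
static const struct fs_attr demo_generic_table[] = {
	{ "name", generic_name_get, NULL },
	{ "uuid", generic_uuid_get, NULL },
	{ NULL, NULL, NULL }
};

/* Table one fs type would register; "name" overrides the generic entry. */
static const struct fs_attr demo_fs_table[] = {
	{ "name", demo_name_get, NULL },
	{ "label", demo_label_get, demo_label_set },
	{ NULL, NULL, NULL }
};

static const struct fs_attr *find_in(const struct fs_attr *tbl, const char *name)
{
	for (; tbl && tbl->name; tbl++)
		if (strcmp(tbl->name, name) == 0)
			return tbl;
	return NULL;
}

/* Examine the fs-specific table before the generic one. */
static const struct fs_attr *fs_attr_lookup(const struct fs_attr *fs_tbl,
					    const struct fs_attr *generic_tbl,
					    const char *name)
{
	const struct fs_attr *a = find_in(fs_tbl, name);

	return a ? a : find_in(generic_tbl, name);
}

static int fs_attr_get(const struct fs_attr *fs_tbl, const struct fs_attr *generic_tbl,
		       const char *name, char *buf, size_t len)
{
	const struct fs_attr *a = fs_attr_lookup(fs_tbl, generic_tbl, name);

	return a ? a->get(buf, len) : -ENOENT;
}

static int fs_attr_set(const struct fs_attr *fs_tbl, const struct fs_attr *generic_tbl,
		       const char *name, const char *value)
{
	const struct fs_attr *a = fs_attr_lookup(fs_tbl, generic_tbl, name);

	if (!a)
		return -ENOENT;
	return a->set ? a->set(value) : -EPERM;	/* no set routine: read-only */
}
```

Because the tables are plain data, walking them is also what would make
the whole thing introspectable.
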

I think I know how to code this up in a way that would be fully
transparent to the existing syscalls.
---

James




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/18] VFS: Filesystem information [ver #21]
  2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
@ 2020-08-04 19:18   ` Miklos Szeredi
  0 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-04 19:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Howells, Al Viro, Theodore Ts'o, Andreas Dilger,
	Eric Biggers, Jeff Layton, linux-ext4, Carlos Maiolino,
	Darrick J. Wong, Linux API, Linus Torvalds, Ian Kent,
	Miklos Szeredi, Christian Brauner, Jann Horn, Karel Zak,
	Jeff Layton, linux-fsdevel, LSM, linux-kernel

On Tue, Aug 4, 2020 at 5:40 PM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Mon, 2020-08-03 at 14:36 +0100, David Howells wrote:
> > Here's a set of patches that adds a system call, fsinfo(), that
> > allows information about the VFS, mount topology, superblock and
> > files to be retrieved.
> >
> > The patchset is based on top of the notifications patchset and allows
> > event counters implemented in the latter to be retrieved to allow
> > overruns to be efficiently managed.
>
> Could I repeat the question I asked about six months back that never
> got answered:
>
> https://lore.kernel.org/linux-api/1582316494.3376.45.camel@HansenPartnership.com/
>
> It sort of petered out into a long winding thread about why not use
> sysfs instead, which really doesn't look like a good idea to me.
>
> I'll repeat the information for those who want to quote it easily on
> reply without having to use a web interface:
>
> ---
> Could I make a suggestion about how this should be done in a way that
> doesn't actually require the fsinfo syscall at all: it could just be
> done with fsconfig.  The idea is based on something I've wanted to do
> for configfd but couldn't because otherwise it wouldn't substitute for
> fsconfig, but Christian made me think it was actually essential to the
> ability of the seccomp and other verifier tools in the critique of
> configfd and I believe the same critique applies here.
>
> Instead of making fsconfig functionally configure ... as in you pass
> the attribute name, type and parameters down into the fs specific
> handler and the handler does a string match and then verifies the
> parameters and then acts on them, make it table configured, so what
> each fstype does is register a table of attributes which can be got and
> optionally set (with each attribute having a get and optional set
> function).  We'd have multiple tables per fstype, so the generic VFS
> can register a table of attributes it understands for every fstype
> (things like name, uuid and the like) and then each fs type would
> register a table of fs specific attributes following the same pattern.
> The system would examine the fs specific table before the generic one,
> allowing overrides.  fsconfig would have the ability to both get and
> set attributes, permitting retrieval as well as setting (which is how I
> get rid of the fsinfo syscall), we'd have a global parameter, which
> would retrieve the entire table by name and type so the whole thing is
> introspectable because the upper layer knows a-priori all the
> attributes which can be set for a given fs type and what type they are
> (so we can make more of the parsing generic).  Any attribute which
> doesn't have a set routine would be read only and all attributes would
> have to have a get routine meaning everything is queryable.

fsconfig(2) takes an fd referring to an fs_context, that in turn
refers to a super_block.

So using fsconfig() for retrieving super_block attributes would be
fine (modulo value being const, and lack of buffer size).

But what about mount attributes?

I don't buy the argument that an API needs to be designed around the
requirements of seccomp and the like.  It should be the other way
round.  In that, I think your configfd idea was fine, and would answer
the above question.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace [ver #21]
  2020-08-04 14:05   ` Miklos Szeredi
@ 2020-08-05  0:59     ` Ian Kent
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Kent @ 2020-08-05  0:59 UTC (permalink / raw)
  To: Miklos Szeredi, David Howells
  Cc: viro, torvalds, mszeredi, christian, jannh, darrick.wong, kzak,
	jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Tue, 2020-08-04 at 16:05 +0200, Miklos Szeredi wrote:
> On Mon, Aug 03, 2020 at 02:38:34PM +0100, David Howells wrote:
> > Add a filesystem attribute that exports a list of all the visible
> > mounts in
> > a namespace, given the caller's chroot setting.  The returned list
> > is an
> > array of:
> > 
> > 	struct fsinfo_mount_child {
> > 		__u64	mnt_unique_id;
> > 		__u32	mnt_id;
> > 		__u32	parent_id;
> > 		__u32	mnt_notify_sum;
> > 		__u32	sb_notify_sum;
> > 	};
> > 
> > where each element contains a once-in-a-system-lifetime unique ID,
> > the
> > mount ID (which may get reused), the parent mount ID and sums of
> > the
> > notification/change counters for the mount and its superblock.
> 
> The change counters are currently conditional on
> CONFIG_MOUNT_NOTIFICATIONS.
> Is this intentional?
> 
> > This works with a read lock on the namespace_sem, but ideally would
> > do it
> > under the RCU read lock only.
> > 
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > ---
> > 
> >  fs/fsinfo.c                 |    1 +
> >  fs/internal.h               |    1 +
> > >  fs/namespace.c              |   37 +++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/fsinfo.h |    4 ++++
> >  samples/vfs/test-fsinfo.c   |   22 ++++++++++++++++++++++
> >  5 files changed, 65 insertions(+)
> > 
> > diff --git a/fs/fsinfo.c b/fs/fsinfo.c
> > index 0540cce89555..f230124ffdf5 100644
> > --- a/fs/fsinfo.c
> > +++ b/fs/fsinfo.c
> > @@ -296,6 +296,7 @@ static const struct fsinfo_attribute
> > fsinfo_common_attributes[] = {
> >  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT,	fsinfo_gene
> > ric_mount_point),
> >  	FSINFO_STRING	(FSINFO_ATTR_MOUNT_POINT_FULL,	fsinfo_gene
> > ric_mount_point_full),
> >  	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_moun
> > t_children),
> > +	FSINFO_LIST	(FSINFO_ATTR_MOUNT_ALL,		fsinfo_generic_moun
> > t_all),
> >  	{}
> >  };
> >  
> > diff --git a/fs/internal.h b/fs/internal.h
> > index cb5edcc7125a..267b4aaf0271 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -102,6 +102,7 @@ extern int fsinfo_generic_mount_topology(struct
> > path *, struct fsinfo_context *)
> >  extern int fsinfo_generic_mount_point(struct path *, struct
> > fsinfo_context *);
> >  extern int fsinfo_generic_mount_point_full(struct path *, struct
> > fsinfo_context *);
> >  extern int fsinfo_generic_mount_children(struct path *, struct
> > fsinfo_context *);
> > +extern int fsinfo_generic_mount_all(struct path *, struct
> > fsinfo_context *);
> >  
> >  /*
> >   * fs_struct.c
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 122c12f9512b..1f2e06507244 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -4494,4 +4494,41 @@ int fsinfo_generic_mount_children(struct
> > path *path, struct fsinfo_context *ctx)
> >  	return ctx->usage;
> >  }
> >  
> > +/*
> > + * Return information about all the mounts in the namespace
> > referenced by the
> > + * path.
> > + */
> > +int fsinfo_generic_mount_all(struct path *path, struct
> > fsinfo_context *ctx)
> > +{
> > +	struct mnt_namespace *ns;
> > +	struct mount *m, *p;
> > +	struct path chroot;
> > +	bool allow;
> > +
> > +	m = real_mount(path->mnt);
> > +	ns = m->mnt_ns;
> > +
> > +	get_fs_root(current->fs, &chroot);
> > +	rcu_read_lock();
> > +	allow = are_paths_connected(&chroot, path) ||
> > capable(CAP_SYS_ADMIN);
> > +	rcu_read_unlock();
> > +	path_put(&chroot);
> > +	if (!allow)
> > +		return -EPERM;
> > +
> > +	down_read(&namespace_sem);
> > +
> > +	list_for_each_entry(p, &ns->list, mnt_list) {
> 
> This is missing the locking and the cursor check added by commit 9f6c61f96f2d
> ("proc/mounts: add cursor").

That's a good catch Miklos.

Yes, the extra lock and the cursor check are now needed.

> 
> > +		struct path mnt_root;
> > +
> > +		mnt_root.mnt	= &p->mnt;
> > +		mnt_root.dentry	= p->mnt.mnt_root;
> > +		if (are_paths_connected(path, &mnt_root))
> > +			fsinfo_store_mount(ctx, p, p == m);
> > +	}
> > +
> > +	up_read(&namespace_sem);
> > +	return ctx->usage;
> > +}
> > +
> >  #endif /* CONFIG_FSINFO */
> > diff --git a/include/uapi/linux/fsinfo.h
> > b/include/uapi/linux/fsinfo.h
> > index 81329de6905e..e40192d98648 100644
> > --- a/include/uapi/linux/fsinfo.h
> > +++ b/include/uapi/linux/fsinfo.h
> > @@ -37,6 +37,7 @@
> >  #define FSINFO_ATTR_MOUNT_POINT_FULL	0x203	/* Absolute
> > path of mount (string) */
> >  #define FSINFO_ATTR_MOUNT_TOPOLOGY	0x204	/* Mount object
> > topology */
> >  #define FSINFO_ATTR_MOUNT_CHILDREN	0x205	/* Children of this
> > mount (list) */
> > +#define FSINFO_ATTR_MOUNT_ALL		0x206	/* List all
> > mounts in a namespace (list) */
> >  
> >  #define FSINFO_ATTR_AFS_CELL_NAME	0x300	/* AFS cell name
> > (string) */
> >  #define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of
> > the Nth server (string) */
> > @@ -128,6 +129,8 @@ struct fsinfo_mount_topology {
> >  /*
> >   * Information struct element for
> > fsinfo(FSINFO_ATTR_MOUNT_CHILDREN).
> >   * - An extra element is placed on the end representing the parent
> > mount.
> > + *
> > + * Information struct element for fsinfo(FSINFO_ATTR_MOUNT_ALL).
> >   */
> >  struct fsinfo_mount_child {
> >  	__u64	mnt_unique_id;		/* Kernel-lifetime unique
> > mount ID */
> > @@ -139,6 +142,7 @@ struct fsinfo_mount_child {
> >  };
> >  
> >  #define FSINFO_ATTR_MOUNT_CHILDREN__STRUCT struct
> > fsinfo_mount_child
> > +#define FSINFO_ATTR_MOUNT_ALL__STRUCT struct fsinfo_mount_child
> >  
> >  /*
> >   * Information struct for fsinfo(FSINFO_ATTR_STATFS).
> > diff --git a/samples/vfs/test-fsinfo.c b/samples/vfs/test-fsinfo.c
> > index 374825ab85b0..596fa5e71762 100644
> > --- a/samples/vfs/test-fsinfo.c
> > +++ b/samples/vfs/test-fsinfo.c
> > @@ -365,6 +365,27 @@ static void
> > dump_fsinfo_generic_mount_children(void *reply, unsigned int size)
> >  	       (unsigned long long)r->mnt_notify_sum, mp);
> >  }
> >  
> > +static void dump_fsinfo_generic_mount_all(void *reply, unsigned
> > int size)
> > +{
> > +	struct fsinfo_mount_child *r = reply;
> > +	ssize_t mplen;
> > +	char path[32], *mp;
> > +
> > +	struct fsinfo_params params = {
> > +		.flags		= FSINFO_FLAGS_QUERY_MOUNT,
> > +		.request	= FSINFO_ATTR_MOUNT_POINT_FULL,
> > +	};
> > +
> > +	sprintf(path, "%u", r->mnt_id);
> > +	mplen = get_fsinfo(path, "FSINFO_ATTR_MOUNT_POINT_FULL",
> > &params, (void **)&mp);
> > +	if (mplen < 0)
> > +		mp = "-";
> > +
> > +	printf("%5x %5x %12llx %10llu %s\n",
> > +	       r->mnt_id, r->parent_id, (unsigned long long)r-
> > >mnt_unique_id,
> > +	       r->mnt_notify_sum, mp);
> > +}
> > +
> >  static void dump_afs_fsinfo_server_address(void *reply, unsigned
> > int size)
> >  {
> >  	struct fsinfo_afs_server_address *f = reply;
> > @@ -492,6 +513,7 @@ static const struct fsinfo_attribute
> > fsinfo_attributes[] = {
> >  	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT,	string),
> >  	FSINFO_STRING_N	(FSINFO_ATTR_MOUNT_POINT_FULL,	string),
> >  	FSINFO_LIST	(FSINFO_ATTR_MOUNT_CHILDREN,	fsinfo_generic_moun
> > t_children),
> > +	FSINFO_LIST	(FSINFO_ATTR_MOUNT_ALL,		fsinfo_generic_moun
> > t_all),
> >  
> >  	FSINFO_STRING	(FSINFO_ATTR_AFS_CELL_NAME,	string),
> >  	FSINFO_STRING	(FSINFO_ATTR_AFS_SERVER_NAME,	string),
> > 
> > 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-04 13:56   ` Miklos Szeredi
@ 2020-08-05  2:05     ` Ian Kent
  2020-08-05  2:46       ` Ian Kent
  0 siblings, 1 reply; 49+ messages in thread
From: Ian Kent @ 2020-08-05  2:05 UTC (permalink / raw)
  To: Miklos Szeredi, David Howells
  Cc: viro, torvalds, mszeredi, christian, jannh, darrick.wong, kzak,
	jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Tue, 2020-08-04 at 15:56 +0200, Miklos Szeredi wrote:
> On Mon, Aug 03, 2020 at 02:37:50PM +0100, David Howells wrote:
> > Provide support for the handling of an overrun in a watch
> > queue.  In the
> > event that an overrun occurs, the watcher needs to be able to find
> > out what
> > it was that they missed.  To this end, previous patches added event
> > counters to struct mount.
> 
> So this is optimizing the buffer overrun case?
> 
> Shouldn't we just make sure that the likelihood of overruns is low and,
> if one happens, just reinitialize everything from scratch (shouldn't be
> *that* expensive)?

But that may not be possible if you are using notifications to track
state in user space: you need to know when the thing you have needs
to be synced because you missed something, and it's during
notification processing that you actually have the object that may
need to be refreshed.

> 
> Trying to find out what was missed seems like just adding complexity
> for no good
> reason.
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-05  2:05     ` Ian Kent
@ 2020-08-05  2:46       ` Ian Kent
  2020-08-05  7:45         ` Miklos Szeredi
  0 siblings, 1 reply; 49+ messages in thread
From: Ian Kent @ 2020-08-05  2:46 UTC (permalink / raw)
  To: Miklos Szeredi, David Howells
  Cc: viro, torvalds, mszeredi, christian, jannh, darrick.wong, kzak,
	jlayton, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

On Wed, 2020-08-05 at 10:05 +0800, Ian Kent wrote:
> On Tue, 2020-08-04 at 15:56 +0200, Miklos Szeredi wrote:
> > On Mon, Aug 03, 2020 at 02:37:50PM +0100, David Howells wrote:
> > > Provide support for the handling of an overrun in a watch
> > > queue.  In the
> > > event that an overrun occurs, the watcher needs to be able to
> > > find
> > > out what
> > > it was that they missed.  To this end, previous patches added
> > > event
> > > counters to struct mount.
> > 
> > So this is optimizing the buffer overrun case?
> > 
> > Shouldn't we just make sure that the likelihood of overruns is low
> > and, if one happens, just reinitialize everything from scratch
> > (shouldn't be *that* expensive)?
> 
> But that may not be possible if you are using notifications to track
> state in user space: you need to know when the thing you have needs
> to be synced because you missed something, and it's during
> notification processing that you actually have the object that may
> need to be refreshed.
> 
> > Trying to find out what was missed seems like just adding
> > complexity
> > for no good
> > reason.

Coming back to an actual use case.

What I said above is one aspect but, since I'm looking at this right
now with systemd, and I do have the legacy code to fall back to, the
"just reset everything" suggestion does make sense.

But I'm struggling to see how I can identify notification buffer
overrun in libmount, and overrun is just one possibility for lost
notifications, so I like the idea that, as a library user, I can
work out that I need to take action based on what I have in the
notifications themselves.
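
A minimal sketch of that kind of check, assuming the library keeps a
cached copy of the per-mount records and can fetch a fresh list to
compare against.  The struct below only mirrors the fields listed for
struct fsinfo_mount_child earlier in this thread; it is not the UAPI
header, and the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the fields quoted for struct fsinfo_mount_child. */
struct mount_rec {
	uint64_t mnt_unique_id;
	uint32_t mnt_id;
	uint32_t parent_id;
	uint32_t mnt_notify_sum;
	uint32_t sb_notify_sum;
};

/* Find a cached record by its never-reused unique ID, or NULL. */
static const struct mount_rec *find_by_unique_id(const struct mount_rec *tbl,
						 size_t n, uint64_t id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].mnt_unique_id == id)
			return &tbl[i];
	return NULL;
}

/*
 * Count fresh records that are new or whose change counters differ from
 * the cached copy, i.e. mounts for which an event was missed.
 */
static size_t count_needing_refresh(const struct mount_rec *cached, size_t ncached,
				    const struct mount_rec *fresh, size_t nfresh)
{
	size_t stale = 0;

	for (size_t i = 0; i < nfresh; i++) {
		const struct mount_rec *old =
			find_by_unique_id(cached, ncached, fresh[i].mnt_unique_id);

		if (!old ||
		    old->mnt_notify_sum != fresh[i].mnt_notify_sum ||
		    old->sb_notify_sum != fresh[i].sb_notify_sum)
			stale++;
	}
	return stale;
}
```

This only covers new and changed mounts; mounts that disappeared would
need the reverse walk over the cached table.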

Ian


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-05  2:46       ` Ian Kent
@ 2020-08-05  7:45         ` Miklos Szeredi
  2020-08-05 11:23           ` Ian Kent
  0 siblings, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-05  7:45 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Al Viro, Linus Torvalds, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, Aug 5, 2020 at 4:46 AM Ian Kent <raven@themaw.net> wrote:
>

> Coming back to an actual use case.
>
> What I said above is one aspect but, since I'm looking at this right
> now with systemd, and I do have the legacy code to fall back to, the
> "just reset everything" suggestion does make sense.
>
> But I'm struggling to see how I can identify notification buffer
> overrun in libmount, and overrun is just one possibility for lost
> notifications, so I like the idea that, as a library user, I can
> work out that I need to take action based on what I have in the
> notifications themselves.

Hmm, what's the other possibility for lost notifications?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-05  7:45         ` Miklos Szeredi
@ 2020-08-05 11:23           ` Ian Kent
  2020-08-05 11:27             ` Miklos Szeredi
  0 siblings, 1 reply; 49+ messages in thread
From: Ian Kent @ 2020-08-05 11:23 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, Al Viro, Linus Torvalds, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, 2020-08-05 at 09:45 +0200, Miklos Szeredi wrote:
> On Wed, Aug 5, 2020 at 4:46 AM Ian Kent <raven@themaw.net> wrote:
> > Coming back to an actual use case.
> > 
> > What I said above is one aspect but, since I'm looking at this
> > right
> > now with systemd, and I do have the legacy code to fall back to,
> > the
> > "just reset everything" suggestion does make sense.
> > 
> > But I'm struggling to see how I can identify notification buffer
> > overrun in libmount, and overrun is just one possibility for lost
> > notifications, so I like the idea that, as a library user, I can
> > work out that I need to take action based on what I have in the
> > notifications themselves.
> 
> Hmm, what's the other possibility for lost notifications?

In user space that is:

Multi-threaded application races, single threaded applications and
signal processing races, other bugs ...

For example, systemd has its own event handling sub-system and handles
half a dozen or so event types, of which mount changes are one. It's
fairly complex so I find myself wondering if I can trust it and
wondering if there are undiscovered bugs in it. The answer to the
former is probably yes but the answer to the latter is also probably
yes.

Maybe I'm just paranoid!
Ian



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-05 11:23           ` Ian Kent
@ 2020-08-05 11:27             ` Miklos Szeredi
  2020-08-06  1:47               ` Ian Kent
  0 siblings, 1 reply; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-05 11:27 UTC (permalink / raw)
  To: Ian Kent
  Cc: David Howells, Al Viro, Linus Torvalds, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, Aug 5, 2020 at 1:23 PM Ian Kent <raven@themaw.net> wrote:
>
> On Wed, 2020-08-05 at 09:45 +0200, Miklos Szeredi wrote:

> > Hmm, what's the other possibility for lost notifications?
>
> In user space that is:
>
> Multi-threaded application races, single threaded applications and
> signal processing races, other bugs ...

Okay, let's fix the bugs then.

Thanks,
Miklos


* Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-03 13:37 ` [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount " David Howells
  2020-08-04 10:41   ` Miklos Szeredi
@ 2020-08-05 14:13   ` David Howells
  2020-08-05 14:46     ` Miklos Szeredi
  2020-08-05 15:30     ` David Howells
  1 sibling, 2 replies; 49+ messages in thread
From: David Howells @ 2020-08-05 14:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, viro, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> > +#ifdef CONFIG_FSINFO
> > +	u64	mnt_unique_id;		/* ID unique over lifetime of kernel */
> > +#endif
>
> Not sure if it's worth making conditional.

You can't get at it without CONFIG_FSINFO=y as it stands, but making it
unconditional might be reasonable.

> > -		n.auxiliary_mount	= aux->mnt_id;
> > +		n.auxiliary_mount = aux->mnt_unique_id;
>
> Hmm, so we now have two ID's:
>
>  - one can be used to look up the mount
>  - one is guaranteed to be unique
>
> With this change the mount cannot be looked up with FSINFO_FLAGS_QUERY_MOUNT,
> right?
>
> Should we be merging the two ID's into a single one which has both properties?

Ideally, yes... but...  The 31-bit mnt_id is currently exposed to userspace in
various places, e.g. /proc, sys_name_to_handle_at().  So we have to keep that
as is and we can't expand it.

For fsinfo(), however, it might make sense to only use the 64-bit uniquifier
as the identifier to use for direct look up.

However, looking up that identifier requires some sort of structure for doing
this and it's kind of worst case for the IDR tree as the keys are gradually
going to spread out, causing it to eat more memory.  It may be a tradeoff
worth making, and the memory consumption might not be that bad - or we could
use some other data structure such as an rbtree.

That's why I was going for the 31-bit identifier + uniquifier so that you can at
least tell if the identifier got recycled reasonably quickly.
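
As a rough userspace illustration of that check (a sketch only; the struct
and function names below are hypothetical and are not part of the patchset),
a watcher could remember both IDs and treat a changed uniquifier as "this
mount ID was recycled":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a mount as remembered by a userspace watcher. */
struct mount_ref {
	int32_t  mnt_id;	/* reusable 31-bit mount ID */
	uint64_t unique_id;	/* assumed never reused while the kernel runs */
};

/* Hypothetical snapshot entry, standing in for freshly queried data. */
struct mount_now {
	int32_t  mnt_id;
	uint64_t unique_id;
};

/*
 * Return true if 'ref' still names the same mount: the reusable ID must be
 * present in the snapshot and its uniquifier must be unchanged.
 */
bool mount_ref_still_valid(const struct mount_ref *ref,
			   const struct mount_now *snap, size_t n)
{
	assert(ref != NULL && (snap != NULL || n == 0));
	for (size_t i = 0; i < n; i++) {
		if (snap[i].mnt_id == ref->mnt_id)
			return snap[i].unique_id == ref->unique_id;
	}
	return false;	/* the mount ID is no longer present at all */
}
```

A false return covers both cases the watcher cares about: the mount has gone
away, or its mount ID now belongs to a different mount.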

David



* Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-05 14:13   ` David Howells
@ 2020-08-05 14:46     ` Miklos Szeredi
  2020-08-05 15:30     ` David Howells
  1 sibling, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-05 14:46 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Linus Torvalds, Ian Kent, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, Aug 5, 2020 at 4:14 PM David Howells <dhowells@redhat.com> wrote:

> However, looking up that identifier requires some sort of structure for doing
> this and it's kind of worst case for the IDR tree as the keys are gradually
> going to spread out, causing it to eat more memory.  It may be a tradeoff
> worth making, and the memory consumption might not be that bad - or we could
> use some other data structure such as an rbtree.

idr_alloc_cyclic() seems to be a good template for doing the lower
32bit allocation, and we can add code to increment the high 32bit on
wraparound.

Lots of code uses idr_alloc_cyclic() so I guess it shouldn't be too
bad in terms of memory use or performance.
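
A minimal userspace sketch of the wraparound idea, assuming a plain counter
in place of the IDR (the struct and function names are made up for
illustration and are not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator state; illustrative only. */
struct id_alloc {
	uint32_t next_low;	/* next candidate for the low 32 bits */
	uint32_t high;		/* bumped each time the low part wraps */
};

/*
 * Hand out the next 64-bit ID.  A real allocator would also have to skip
 * low values that are still in use, as idr_alloc_cyclic() does; that part
 * is omitted here.
 */
uint64_t id_alloc_next(struct id_alloc *a)
{
	assert(a != NULL);
	uint64_t id = ((uint64_t)a->high << 32) | a->next_low;

	if (++a->next_low == 0)		/* low 32 bits wrapped around */
		a->high++;
	return id;
}
```

The low half stays usable as a compact lookup key, while the high half makes
the full 64-bit value differ across wraps.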

Thanks,
Miklos


* Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-05 14:13   ` David Howells
  2020-08-05 14:46     ` Miklos Szeredi
@ 2020-08-05 15:30     ` David Howells
  2020-08-05 19:33       ` Matthew Wilcox
  1 sibling, 1 reply; 49+ messages in thread
From: David Howells @ 2020-08-05 15:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Al Viro, Linus Torvalds, Ian Kent, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> idr_alloc_cyclic() seems to be a good template for doing the lower
> 32bit allocation, and we can add code to increment the high 32bit on
> wraparound.
> 
> Lots of code uses idr_alloc_cyclic() so I guess it shouldn't be too
> bad in terms of memory use or performance.

It's optimised for shortness of path and trades memory for performance.  It's
currently implemented using an xarray, so memory usage is dependent on the
sparseness of the tree.  Each node in the tree is 576 bytes and in the worst
case, each one node will contain one mount - and then you have to backfill the
ancestry, though for lower memory costs.
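
To put a rough number on that worst case, here is a back-of-the-envelope
sketch.  It assumes exactly one mount per 576-byte leaf node and ignores the
interior nodes; the function name is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Node size quoted above; treated here as a fixed assumption. */
#define XA_NODE_BYTES 576u

/*
 * Worst-case leaf memory if every mount ends up alone in its own node.
 * Interior ("ancestry") nodes are deliberately not counted.
 */
uint64_t worst_case_leaf_bytes(uint64_t nr_mounts)
{
	assert(nr_mounts <= UINT64_MAX / XA_NODE_BYTES);	/* no overflow */
	return nr_mounts * XA_NODE_BYTES;
}
```

For example, 30000 maximally sparse mounts would come to 30000 * 576 =
17280000 bytes of leaf nodes, i.e. a little over 16 MiB.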

Systemd makes life more interesting since it sets up a whole load of
propagations.  Each mount you make may cause several others to be created, but
that would likely make the tree more efficient.

David



* Re: [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved [ver #21]
  2020-08-03 13:37 ` [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved " David Howells
  2020-08-04 13:38   ` Miklos Szeredi
@ 2020-08-05 15:37   ` David Howells
  2020-08-05 17:19     ` Miklos Szeredi
  1 sibling, 1 reply; 49+ messages in thread
From: David Howells @ 2020-08-05 15:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, viro, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> > +	__u32	shared_group_id;	/* Shared: mount group ID */
> > +	__u32	dependent_source_id;	/* Dependent: source mount group ID */
> > +	__u32	dependent_clone_of_id;	/* Dependent: ID of mount this was cloned from */
> 
> Another set of ID's that are currently 32bit *internally* but that doesn't
> mean they will always be 32 bit.
> 
> And that last one (apart from "slave" being obfuscated)

I had "slave" in there.  It got objected to.  See
Documentation/process/coding-style.rst section 4.

> is simply incorrect.  It has nothing to do with cloning.  It's the "ID of
> the closest peer group in the propagation chain that has a representative
> mount in the current root".

You appear to be in disagreement with others that I've asked.

David



* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-03 13:37 ` [PATCH 10/18] fsinfo: Provide notification overrun handling support " David Howells
  2020-08-04 13:56   ` Miklos Szeredi
@ 2020-08-05 16:06   ` David Howells
  2020-08-05 17:26     ` Miklos Szeredi
  1 sibling, 1 reply; 49+ messages in thread
From: David Howells @ 2020-08-05 16:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, viro, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Shouldn't we just make sure that the likelihood of overruns is low

That's not necessarily easy.  To avoid overruns you need a bigger buffer.  The
buffer is preallocated from unswappable kernel space.  Yes, you can increase
the size of the buffer, but it eats out of your pipe bufferage limit.

Further, it's a *general* notifications queue, not just for a specific
purpose, but that means it might get connected to multiple sources, and doing
something like tearing down a container might generate enough notifications to
overrun the queue.

> and if it happens, just reinitialize everything from scratch (shouldn't be
> *that* expensive).

If you then spend time reinitialising everything, you're increasing the
likelihood of racing with further events.  Further, there are multiple expenses:
firstly, you have to tear down and discard all the data that you've spent time
setting up; secondly, it takes time doing all this; thirdly, it takes cpu
cycles away from applications.

The reason I put the event counters in there and made it so that fsinfo()
could read all the mounts in a subtree and their event counters in one go is
to make it faster for the user to find out what changed in the event that a
notification is lost.
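
As a sketch of how a listener might use such counters after an overrun
(userspace only; the record layout and function name are hypothetical and do
not reflect the actual fsinfo() structures), compare the cached counters with
a fresh read and rescan only the mounts that differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mount record: unique ID plus a change counter sum. */
struct mount_counter {
	uint64_t unique_id;
	uint64_t change_counter;
};

/*
 * Compare a cached snapshot with a fresh one and store the unique IDs of
 * mounts that are new or whose counter moved.  Returns how many IDs were
 * stored (at most 'max').  Mounts that vanished are not reported here.
 */
size_t find_changed_mounts(const struct mount_counter *old, size_t n_old,
			   const struct mount_counter *cur, size_t n_cur,
			   uint64_t *changed, size_t max)
{
	size_t found = 0;

	assert(changed != NULL || max == 0);
	for (size_t i = 0; i < n_cur && found < max; i++) {
		const struct mount_counter *prev = NULL;

		for (size_t j = 0; j < n_old; j++) {
			if (old[j].unique_id == cur[i].unique_id) {
				prev = &old[j];
				break;
			}
		}
		if (!prev || prev->change_counter != cur[i].change_counter)
			changed[found++] = cur[i].unique_id;
	}
	return found;
}
```

Only the mounts reported this way need to be queried again; the rest of the
cached state can be kept rather than torn down and rebuilt.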

I have a patch (not included here as it occasionally induces oopses) that
attempts to make this doable under the RCU read lock so that it doesn't
prevent mounts from taking place during the scan.

David



* Re: [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace [ver #21]
  2020-08-03 13:38 ` [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace " David Howells
  2020-08-04 14:05   ` Miklos Szeredi
@ 2020-08-05 16:44   ` David Howells
  1 sibling, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-05 16:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, viro, torvalds, raven, mszeredi, christian, jannh,
	darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> > where each element contains a once-in-a-system-lifetime unique ID, the
> > mount ID (which may get reused), the parent mount ID and sums of the
> > notification/change counters for the mount and its superblock.
> 
> The change counters are currently conditional on CONFIG_MOUNT_NOTIFICATIONS.
> Is this intentional?

Yeah - the counters aren't driven unless CONFIG_MOUNT_NOTIFICATIONS=y.

I could perhaps make it so they're driven in both cases, but driving the
in-subtree counter is somewhat tied up in the notification posting.

This is something that can be fixed after this patchset is taken - if it is
taken - since that doesn't change the UAPI.

David



* Re: [PATCH 00/18] VFS: Filesystem information [ver #21]
  2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
                   ` (18 preceding siblings ...)
  2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
@ 2020-08-05 17:13 ` David Howells
  19 siblings, 0 replies; 49+ messages in thread
From: David Howells @ 2020-08-05 17:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: dhowells, viro, Theodore Ts'o, Andreas Dilger, Eric Biggers,
	Jeff Layton, linux-ext4, Carlos Maiolino, Darrick J. Wong,
	linux-api, torvalds, raven, mszeredi, christian, jannh, kzak,
	jlayton, linux-fsdevel, linux-security-module, linux-kernel

James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> It sort of petered out into a long winding thread about why not use
> sysfs instead, which really doesn't look like a good idea to me.

It seemed to turn into a set of procfs symlinks that pointed at a bunch of
sysfs stuff - or possibly some special filesystem.

> Could I make a suggestion about how this should be done in a way that
> doesn't actually require the fsinfo syscall at all: it could just be
> done with fsconfig.

I'd prefer to keep it separate.  The interface for fsconfig() is intended to
move stuff into the kernel, not out of it.  Better to add a parallel syscall
to go the other way (kind of like we have setxattr/getxattr, sendmsg/recvmsg).

Further, fsinfo() can refer directly to a file/fd/mount/whatever, but
fsconfig() doesn't do that.  You have to use fspick() to get a context before
you can use fsconfig().  Now, that's fine if you want to gather several pieces
of information from a particular object, but it's not so good if you want to
get one piece of information from each of several objects.

> ... make it table configured...

I did, kind of (though I didn't call it that).  Al rewrote the code to get rid
of it.

David



* Re: [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved [ver #21]
  2020-08-05 15:37   ` David Howells
@ 2020-08-05 17:19     ` Miklos Szeredi
  0 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-05 17:19 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Linus Torvalds, Ian Kent, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, Aug 5, 2020 at 5:37 PM David Howells <dhowells@redhat.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > > +   __u32   shared_group_id;        /* Shared: mount group ID */
> > > +   __u32   dependent_source_id;    /* Dependent: source mount group ID */
> > > +   __u32   dependent_clone_of_id;  /* Dependent: ID of mount this was cloned from */
> >
> > Another set of ID's that are currently 32bit *internally* but that doesn't
> > mean they will always be 32 bit.
> >
> > And that last one (apart from "slave" being obfuscated)
>
> I had "slave" in there.  It got objected to.  See
> Documentation/process/coding-style.rst section 4.
>
> > is simply incorrect.  It has nothing to do with cloning.  It's the "ID of
> > the closest peer group in the propagation chain that has a representative
> > mount in the current root".
>
> You appear to be in disagreement with others that I've asked.

Read the code.

Thanks,
Miklos


* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-05 16:06   ` David Howells
@ 2020-08-05 17:26     ` Miklos Szeredi
  0 siblings, 0 replies; 49+ messages in thread
From: Miklos Szeredi @ 2020-08-05 17:26 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Linus Torvalds, Ian Kent, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, Aug 5, 2020 at 6:07 PM David Howells <dhowells@redhat.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > Shouldn't we just make sure that the likelihood of overruns is low
>
> That's not necessarily easy.  To avoid overruns you need a bigger buffer.  The
> buffer is preallocated from unswappable kernel space.  Yes, you can increase
> the size of the buffer, but it eats out of your pipe bufferage limit.
>
> Further, it's a *general* notifications queue, not just for a specific
> purpose, but that means it might get connected to multiple sources, and doing
> something like tearing down a container might generate enough notifications to
> overrun the queue.
>
> > and if it happens, just reinitialize everything from scratch (shouldn't be
> > *that* expensive).
>
> If you then spend time reinitialising everything, you're increasing the
> likelihood of racing with further events.  Further, there are multiple expenses:
> firstly, you have to tear down and discard all the data that you've spent time
> setting up; secondly, it takes time doing all this; thirdly, it takes cpu
> cycles away from applications.
>
> The reason I put the event counters in there and made it so that fsinfo()
> could read all the mounts in a subtree and their event counters in one go is
> to make it faster for the user to find out what changed in the event that a
> notification is lost.

That's just overdesigning it, IMO.

If the protocol is extensible (as you state) then the counters can be
added as needed.  And unless the above CPU cycle wastage is actually
observed in practice, the whole thing is unnecessary.

Thanks,
Miklos


* Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-05 15:30     ` David Howells
@ 2020-08-05 19:33       ` Matthew Wilcox
  2020-08-06  5:43         ` Ian Kent
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2020-08-05 19:33 UTC (permalink / raw)
  To: David Howells
  Cc: Miklos Szeredi, Al Viro, Linus Torvalds, Ian Kent,
	Miklos Szeredi, Christian Brauner, Jann Horn, Darrick J. Wong,
	Karel Zak, Jeff Layton, Linux API, linux-fsdevel, LSM,
	linux-kernel

On Wed, Aug 05, 2020 at 04:30:10PM +0100, David Howells wrote:
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > idr_alloc_cyclic() seems to be a good template for doing the lower
> > 32bit allocation, and we can add code to increment the high 32bit on
> > wraparound.
> > 
> > Lots of code uses idr_alloc_cyclic() so I guess it shouldn't be too
> > bad in terms of memory use or performance.
> 
> It's optimised for shortness of path and trades memory for performance.  It's
> currently implemented using an xarray, so memory usage is dependent on the
> sparseness of the tree.  Each node in the tree is 576 bytes and in the worst
> case, each one node will contain one mount - and then you have to backfill the
> ancestry, though for lower memory costs.
> 
> Systemd makes life more interesting since it sets up a whole load of
> propagations.  Each mount you make may cause several others to be created, but
> that would likely make the tree more efficient.

I would recommend using xa_alloc and ignoring the ID assigned from
xa_alloc.  Looking up by unique ID is then a matter of iterating every
mount (xa_for_each()) looking for a matching unique ID in the mount
struct.  That's O(n) search, but it's faster than a linked list, and we
don't have that many mounts in a system.

The maple tree will handle this case more effectively, but I can't
recommend waiting for that to be ready.
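
A userspace analogue of that linear search, as a sketch only: a plain array
of pointers with NULL holes stands in for the xarray, and the names are made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the mount struct. */
struct mount_entry {
	uint64_t unique_id;
};

/*
 * Walk every slot, skipping empty ones, and return the entry whose unique
 * ID matches, or NULL if there is none.  This is O(n) in the slot count.
 */
const struct mount_entry *
find_by_unique_id(const struct mount_entry *const *slots, size_t n,
		  uint64_t id)
{
	assert(slots != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (slots[i] && slots[i]->unique_id == id)
			return slots[i];
	}
	return NULL;
}
```

The slot index is deliberately ignored by the lookup, mirroring the
suggestion to ignore the ID that the allocator assigns.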


* Re: [PATCH 10/18] fsinfo: Provide notification overrun handling support [ver #21]
  2020-08-05 11:27             ` Miklos Szeredi
@ 2020-08-06  1:47               ` Ian Kent
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Kent @ 2020-08-06  1:47 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, Al Viro, Linus Torvalds, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, 2020-08-05 at 13:27 +0200, Miklos Szeredi wrote:
> On Wed, Aug 5, 2020 at 1:23 PM Ian Kent <raven@themaw.net> wrote:
> > On Wed, 2020-08-05 at 09:45 +0200, Miklos Szeredi wrote:
> > > Hmm, what's the other possibility for lost notifications?
> > 
> > In user space that is:
> > 
> > Multi-threaded application races, single threaded applications and
> > signal processing races, other bugs ...
> 
> Okay, let's fix the bugs then.

It's the bugs you don't know about that get you; in this case
the world "is" actually out to get you ;)

Ian



* Re: [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount [ver #21]
  2020-08-05 19:33       ` Matthew Wilcox
@ 2020-08-06  5:43         ` Ian Kent
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Kent @ 2020-08-06  5:43 UTC (permalink / raw)
  To: Matthew Wilcox, David Howells
  Cc: Miklos Szeredi, Al Viro, Linus Torvalds, Miklos Szeredi,
	Christian Brauner, Jann Horn, Darrick J. Wong, Karel Zak,
	Jeff Layton, Linux API, linux-fsdevel, LSM, linux-kernel

On Wed, 2020-08-05 at 20:33 +0100, Matthew Wilcox wrote:
> On Wed, Aug 05, 2020 at 04:30:10PM +0100, David Howells wrote:
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > 
> > > idr_alloc_cyclic() seems to be a good template for doing the
> > > lower
> > > 32bit allocation, and we can add code to increment the high 32bit
> > > on
> > > wraparound.
> > > 
> > > Lots of code uses idr_alloc_cyclic() so I guess it shouldn't be
> > > too
> > > bad in terms of memory use or performance.
> > 
> > It's optimised for shortness of path and trades memory for
> > performance.  It's
> > currently implemented using an xarray, so memory usage is dependent
> > on the
> > sparseness of the tree.  Each node in the tree is 576 bytes and in
> > the worst
> > case, each one node will contain one mount - and then you have to
> > backfill the
> > ancestry, though for lower memory costs.
> > 
> > Systemd makes life more interesting since it sets up a whole load
> > of
> > propagations.  Each mount you make may cause several others to be
> > created, but
> > that would likely make the tree more efficient.
> 
> I would recommend using xa_alloc and ignoring the ID assigned from
> xa_alloc.  Looking up by unique ID is then a matter of iterating
> every
> mount (xa_for_each()) looking for a matching unique ID in the mount
> struct.  That's O(n) search, but it's faster than a linked list, and
> we
> don't have that many mounts in a system.

How many is not many?  5000?  10000?  I agree that 30000 plus is fairly
rare, even for the autofs direct mount case that I hope the
implementation here will help to fix.

Ian



* Re: [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information [ver #21]
  2020-08-03 13:36 ` [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information " David Howells
  2020-08-04 10:16   ` Miklos Szeredi
  2020-08-04 11:34   ` David Howells
@ 2020-08-27 11:27   ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 49+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-08-27 11:27 UTC (permalink / raw)
  To: David Howells
  Cc: Alexander Viro, Linux API, Linus Torvalds, Ian Kent,
	Miklos Szeredi, Christian Brauner, Jann Horn, Darrick J. Wong,
	Karel Zak, Jeff Layton, linux-fsdevel, linux-security-module,
	lkml, linux-man

Hello David,

On Mon, 3 Aug 2020 at 15:37, David Howells <dhowells@redhat.com> wrote:
>
> Add a system call to allow filesystem information to be queried.  A request
> value can be given to indicate the desired attribute.  Support is provided
> for enumerating multi-value attributes.

Do we have an up to date manual page for this system call?

Could you please (re)post to the same CC as this mail, plus linux-man@?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


end of thread, other threads:[~2020-08-27 15:23 UTC | newest]

Thread overview: 49+ messages
2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
2020-08-03 13:36 ` [PATCH 01/18] fsinfo: Introduce a non-repeating system-unique superblock ID " David Howells
2020-08-04  9:34   ` Miklos Szeredi
2020-08-03 13:36 ` [PATCH 02/18] fsinfo: Add fsinfo() syscall to query filesystem information " David Howells
2020-08-04 10:16   ` Miklos Szeredi
2020-08-04 11:34   ` David Howells
2020-08-27 11:27   ` Michael Kerrisk (man-pages)
2020-08-03 13:36 ` [PATCH 03/18] fsinfo: Provide a bitmap of the features a filesystem supports " David Howells
2020-08-03 13:37 ` [PATCH 04/18] fsinfo: Allow retrieval of superblock devname, options and stats " David Howells
2020-08-03 13:37 ` [PATCH 05/18] fsinfo: Allow fsinfo() to look up a mount object by ID " David Howells
2020-08-04 10:33   ` Miklos Szeredi
2020-08-03 13:37 ` [PATCH 06/18] fsinfo: Add a uniquifier ID to struct mount " David Howells
2020-08-04 10:41   ` Miklos Szeredi
2020-08-04 12:32     ` Ian Kent
2020-08-05 14:13   ` David Howells
2020-08-05 14:46     ` Miklos Szeredi
2020-08-05 15:30     ` David Howells
2020-08-05 19:33       ` Matthew Wilcox
2020-08-06  5:43         ` Ian Kent
2020-08-03 13:37 ` [PATCH 07/18] fsinfo: Allow mount information to be queried " David Howells
2020-08-03 13:37 ` [PATCH 08/18] fsinfo: Allow mount topology and propagation info to be retrieved " David Howells
2020-08-04 13:38   ` Miklos Szeredi
2020-08-05 15:37   ` David Howells
2020-08-05 17:19     ` Miklos Szeredi
2020-08-03 13:37 ` [PATCH 09/18] watch_queue: Mount event counters " David Howells
2020-08-03 13:37 ` [PATCH 10/18] fsinfo: Provide notification overrun handling support " David Howells
2020-08-04 13:56   ` Miklos Szeredi
2020-08-05  2:05     ` Ian Kent
2020-08-05  2:46       ` Ian Kent
2020-08-05  7:45         ` Miklos Szeredi
2020-08-05 11:23           ` Ian Kent
2020-08-05 11:27             ` Miklos Szeredi
2020-08-06  1:47               ` Ian Kent
2020-08-05 16:06   ` David Howells
2020-08-05 17:26     ` Miklos Szeredi
2020-08-03 13:37 ` [PATCH 11/18] fsinfo: sample: Mount listing program " David Howells
2020-08-03 13:38 ` [PATCH 12/18] fsinfo: Add API documentation " David Howells
2020-08-03 13:38 ` [PATCH 13/18] fsinfo: Add support for AFS " David Howells
2020-08-03 13:38 ` [PATCH 14/18] fsinfo: Add support to ext4 " David Howells
2020-08-03 13:38 ` [PATCH 15/18] fsinfo: Add an attribute that lists all the visible mounts in a namespace " David Howells
2020-08-04 14:05   ` Miklos Szeredi
2020-08-05  0:59     ` Ian Kent
2020-08-05 16:44   ` David Howells
2020-08-03 13:38 ` [PATCH 16/18] errseq: add a new errseq_scrape function " David Howells
2020-08-03 13:38 ` [PATCH 17/18] vfs: allow fsinfo to fetch the current state of s_wb_err " David Howells
2020-08-03 13:39 ` [PATCH 18/18] samples: add error state information to test-fsinfo.c " David Howells
2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
2020-08-04 19:18   ` Miklos Szeredi
2020-08-05 17:13 ` David Howells
