All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
@ 2021-07-27 22:37 NeilBrown
  2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
                   ` (14 more replies)
  0 siblings, 15 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

There are long-standing problems with btrfs subvols, particularly in
relation to whether and how they are exposed in the mount table.

 - /proc/self/mountinfo reports the major:minor device number for each
    filesystem and when a btrfs subvol is explicitly mounted, the number
    reported is wrong - it does not match what stat() reports for the
    mountpoint.

 - when subvol are not explicitly mounted, they don't appear in
   mountinfo at all.

Consequences include that a tool which uses stat() to find the dev of the
filesystem, then searches mountinfo for that filesystem, will not find
it.

Some tools (e.g. findmnt) appear to have been enhanced to cope with this
strangeness, but it would be best to make btrfs behave more normally.

  - nfsd cannot currently see the transition to subvol, so reports the
    main volume and all subvols to the client as being in the same
    filesystem.  As inode numbers are not unique across all subvols,
    this can confuse clients.  In particular, 'find' is likely to report a
    loop.

subvols can be made to appear in mountinfo using automounts.  However
nfsd does not cope well with automounts.  It assumes all filesystems to
be exported are already mounted.  So adding automounts to btrfs would
break nfsd.

We can enhance nfsd to understand that some automounts can be managed.
"internal mounts" where a filesystem provides an automount point and
mounts its own directories, can be handled differently by nfsd.

This series addresses all these issues.  After a few enhancements to the
VFS to provide needed support, they enhance exportfs and nfsd to cope
with the concept of internal mounts, and then enhance btrfs to provide
them.

The NFSv3 support is incomplete.  I'm not sure we can make it work
"perfectly".  A normal nfsv3 mount seem to work well enough, but if
mounted with '-o noac', it loses track of the mounted-on inode number
and complains about inode numbers changing.

My basic test for these is to mount a btrfs filesystem which contains
subvols, nfs-export it and mount it with nfsv3 and nfsv4, then run
'find' in each of the filesystem and check the contents of
/proc/self/mountinfo.

The first patch simply fixes the dev number in mountinfo and could
possibly be tagged for -stable.

NeilBrown

---

NeilBrown (11):
      VFS: show correct dev num in mountinfo
      VFS: allow d_automount to create in-place bind-mount.
      VFS: pass lookup_flags into follow_down()
      VFS: export lookup_mnt()
      VFS: new function: mount_is_internal()
      nfsd: include a vfsmount in struct svc_fh
      exportfs: Allow filehandle lookup to cross internal mount points.
      nfsd: change get_parent_attributes() to nfsd_get_mounted_on()
      nfsd: Allow filehandle lookup to cross internal mount points.
      btrfs: introduce mapping function from location to inum
      btrfs: use automount to bind-mount all subvol roots.


 fs/btrfs/btrfs_inode.h   |  12 +++
 fs/btrfs/inode.c         | 111 ++++++++++++++++++++++++++-
 fs/btrfs/super.c         |   1 +
 fs/exportfs/expfs.c      | 100 ++++++++++++++++++++----
 fs/fhandle.c             |   2 +-
 fs/internal.h            |   1 -
 fs/namei.c               |   6 +-
 fs/namespace.c           |  32 +++++++-
 fs/nfsd/export.c         |   4 +-
 fs/nfsd/nfs3xdr.c        |  40 +++++++---
 fs/nfsd/nfs4proc.c       |   9 ++-
 fs/nfsd/nfs4xdr.c        | 106 ++++++++++++-------------
 fs/nfsd/nfsfh.c          |  44 +++++++----
 fs/nfsd/nfsfh.h          |   3 +-
 fs/nfsd/nfsproc.c        |   5 +-
 fs/nfsd/vfs.c            | 162 +++++++++++++++++++++++----------------
 fs/nfsd/vfs.h            |  12 +--
 fs/nfsd/xdr4.h           |   2 +-
 fs/overlayfs/namei.c     |   5 +-
 fs/xfs/xfs_ioctl.c       |  12 ++-
 include/linux/exportfs.h |   4 +-
 include/linux/mount.h    |   4 +
 include/linux/namei.h    |   2 +-
 23 files changed, 490 insertions(+), 189 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
  2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
  2021-07-27 22:37 ` [PATCH 04/11] VFS: export lookup_mnt() NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-30  0:25   ` Al Viro
  2021-07-27 22:37 ` [PATCH 03/11] VFS: pass lookup_flags into follow_down() NeilBrown
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

/proc/$PID/mountinfo contains a field for the device number of the
filesystem at each mount.

This is taken from the superblock ->s_dev field, which is correct for
every filesystem except btrfs.  A btrfs filesystem can contain multiple
subvols which each have a different device number.  If (a directory
within) one of these subvols is mounted, the device number reported in
mountinfo will be different from the device number reported by stat().

This confuses some libraries and tools such as, historically, findmnt.
Current findmnt seems to cope with the strangeness.

So instead of using ->s_dev, call vfs_getattr_nosec() and use the ->dev
provided.  As there is no STATX flag to ask for the device number, we
pass a request mask for zero, and also ask the filesystem to avoid
syncing with any remote service.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/proc_namespace.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 392ef5162655..f342a0231e9e 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -138,10 +138,16 @@ static int show_mountinfo(struct seq_file *m, struct vfsmount *mnt)
 	struct mount *r = real_mount(mnt);
 	struct super_block *sb = mnt->mnt_sb;
 	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
+	struct kstat stat;
 	int err;
 
+	/* We only want ->dev, and there is no STATX flag for that,
+	 * so ask for nothing and assume we get ->dev
+	 */
+	vfs_getattr_nosec(&mnt_path, &stat, 0, AT_STATX_DONT_SYNC);
+
 	seq_printf(m, "%i %i %u:%u ", r->mnt_id, r->mnt_parent->mnt_id,
-		   MAJOR(sb->s_dev), MINOR(sb->s_dev));
+		   MAJOR(stat.dev), MINOR(stat.dev));
 	if (sb->s_op->show_path) {
 		err = sb->s_op->show_path(m, mnt->mnt_root);
 		if (err)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 02/11] VFS: allow d_automount to create in-place bind-mount.
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (6 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 10/11] btrfs: introduce mapping function from location to inum NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

finish_automount() prevents a mount trap from mounting a dentry onto
itself, as this could cause a loop - repeatedly automounting.  There is
nothing intrinsically wrong with this arrangement, and the d_automount
function can easily avoid the loop.  btrfs will use it to expose subvols
in the mount table.

It may well be a problem to mount a dentry onto itself when it is
already the root of the vfsmount, so narrow the test to only check that
case.

The test on mnt_sb is redundant and has been removed.  path->mnt and
path->dentry must have the same sb, so if m->mnt_root == dentry, then
m->mnt_sb must be the same as path->mnt->mnt_sb.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ab4174a3c802..81b0f2b2e701 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2928,7 +2928,7 @@ int finish_automount(struct vfsmount *m, struct path *path)
 	 */
 	BUG_ON(mnt_get_count(mnt) < 2);
 
-	if (m->mnt_sb == path->mnt->mnt_sb &&
+	if (m->mnt_root == path->mnt->mnt_root &&
 	    m->mnt_root == dentry) {
 		err = -ELOOP;
 		goto discard;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 03/11] VFS: pass lookup_flags into follow_down()
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (2 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 01/11] VFS: show correct dev num in mountinfo NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

A future patch will want to trigger automount (LOOKUP_AUTOMOUNT) on some
follow_down calls, so allow a flag to be passed.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c            |    6 +++---
 fs/nfsd/vfs.c         |    2 +-
 include/linux/namei.h |    2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bf6d8a738c59..cea0e9b2f162 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1395,11 +1395,11 @@ EXPORT_SYMBOL(follow_down_one);
  * point, the filesystem owning that dentry may be queried as to whether the
  * caller is permitted to proceed or not.
  */
-int follow_down(struct path *path)
+int follow_down(struct path *path, unsigned int lookup_flags)
 {
 	struct vfsmount *mnt = path->mnt;
 	bool jumped;
-	int ret = traverse_mounts(path, &jumped, NULL, 0);
+	int ret = traverse_mounts(path, &jumped, NULL, lookup_flags);
 
 	if (path->mnt != mnt)
 		mntput(mnt);
@@ -2736,7 +2736,7 @@ int path_pts(struct path *path)
 
 	path->dentry = child;
 	dput(parent);
-	follow_down(path);
+	follow_down(path, 0);
 	return 0;
 }
 #endif
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index a224a5e23cc1..7c32edcfd2e9 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -65,7 +65,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 			    .dentry = dget(dentry)};
 	int err = 0;
 
-	err = follow_down(&path);
+	err = follow_down(&path, 0);
 	if (err < 0)
 		goto out;
 	if (path.mnt == exp->ex_path.mnt && path.dentry == dentry &&
diff --git a/include/linux/namei.h b/include/linux/namei.h
index be9a2b349ca7..8d47433def3c 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -70,7 +70,7 @@ extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int
 extern struct dentry *lookup_positive_unlocked(const char *, struct dentry *, int);
 
 extern int follow_down_one(struct path *);
-extern int follow_down(struct path *);
+extern int follow_down(struct path *, unsigned int);
 extern int follow_up(struct path *);
 
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 04/11] VFS: export lookup_mnt()
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
  2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-30  0:31   ` Al Viro
  2021-07-27 22:37 ` [PATCH 01/11] VFS: show correct dev num in mountinfo NeilBrown
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

In order to support filehandle lookup in filesystems with internal
mounts (multiple subvols in the one filesystem) reconnect_path() in
exportfs will need to find out if a given dentry is already mounted.
This can be done with the function lookup_mnt(), so export that to make
it available.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/internal.h         |    1 -
 fs/namespace.c        |    1 +
 include/linux/mount.h |    2 ++
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/internal.h b/fs/internal.h
index 3ce8edbaa3ca..0feb2722d2e5 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -81,7 +81,6 @@ int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
 /*
  * namespace.c
  */
-extern struct vfsmount *lookup_mnt(const struct path *);
 extern int finish_automount(struct vfsmount *, struct path *);
 
 extern int sb_prepare_remount_readonly(struct super_block *);
diff --git a/fs/namespace.c b/fs/namespace.c
index 81b0f2b2e701..73bbdb921e24 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -662,6 +662,7 @@ struct vfsmount *lookup_mnt(const struct path *path)
 	rcu_read_unlock();
 	return m;
 }
+EXPORT_SYMBOL(lookup_mnt);
 
 static inline void lock_ns_list(struct mnt_namespace *ns)
 {
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 5d92a7e1a742..1d3daed88f83 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -118,6 +118,8 @@ extern unsigned int sysctl_mount_max;
 
 extern bool path_is_mountpoint(const struct path *path);
 
+extern struct vfsmount *lookup_mnt(const struct path *);
+
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 #endif /* _LINUX_MOUNT_H */



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 05/11] VFS: new function: mount_is_internal()
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (9 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on() NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-28  2:16   ` Al Viro
  2021-07-28  2:19 ` [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly Al Viro
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

This patch introduces the concept of an "internal" mount which is a
mount where a filesystem has create the mount itself.

Both the mounted-on-dentry and the mount's root dentry must refer to the
same superblock (they may be the same dentry), and the mounted-on dentry
must be an automount.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namespace.c        |   29 +++++++++++++++++++++++++++++
 include/linux/mount.h |    2 ++
 2 files changed, 31 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 73bbdb921e24..a14efbccfb03 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1273,6 +1273,35 @@ bool path_is_mountpoint(const struct path *path)
 }
 EXPORT_SYMBOL(path_is_mountpoint);
 
+/**
+ * mount_is_internal() - Check if path is a mount internal to a single filesystem
+ * @mnt: vfsmount to check
+ *
+ * Some filesystems present multiple file-sets using a single
+ * superblock, such as btrfs with multiple subvolumes.  Names within a
+ * parent filesystem which lead to a subordinate filesystem are
+ * implemented as automounts so that the structure is visible in the
+ * mount table.  nfsd needs visibility into this arrangement so that it
+ * can determine if a mountpoint requires a new export, or is completely
+ * covered by an existing mount.
+ *
+ * An "internal" mount is one where the parent and child have the same
+ * superblock, and the mounted-on dentry is "managed" as an automount.  A
+ * filehandle found for an inode in the child can be looked-up using either
+ * vfsmount.
+ */
+bool mount_is_internal(struct vfsmount *mnt)
+{
+	struct mount *m = real_mount(mnt);
+
+	if (!mnt_has_parent(m))
+		return false;
+	if (m->mnt_parent->mnt.mnt_sb != m->mnt.mnt_sb)
+		return false;
+	return m->mnt_mountpoint->d_flags & DCACHE_NEED_AUTOMOUNT;
+}
+EXPORT_SYMBOL(mount_is_internal);
+
 struct vfsmount *mnt_clone_internal(const struct path *path)
 {
 	struct mount *p;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 1d3daed88f83..ab58087728ba 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -118,6 +118,8 @@ extern unsigned int sysctl_mount_max;
 
 extern bool path_is_mountpoint(const struct path *path);
 
+extern bool mount_is_internal(struct vfsmount *mnt);
+
 extern struct vfsmount *lookup_mnt(const struct path *);
 
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 06/11] nfsd: include a vfsmount in struct svc_fh
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (4 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-27 22:37 ` [PATCH 10/11] btrfs: introduce mapping function from location to inum NeilBrown
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

A future patch will allow exportfs_decode_fh{,_raw} to return a
different vfsmount than the one passed.  This is specifically for btrfs,
but would be useful for any filesystem that presents as multiple volumes
(i.e. different st_dev, each with their own st_ino number-space).

For nfsd, this means that the mnt in the svc_export may not apply to all
filehandles reached from that export.  So svc_fh needs to store a
distinct vfsmount as well.

For now, fs->fh_mnt == fh->fh_export->ex_path.mnt, but that will change.

Changes include:
  fh_compose()
  nfsd_lookup_dentry()
     now take a *path instead of a *dentry

  nfsd4_encode_fattr()
  nfsd4_encode_fattr_to_buf()
     now take a *vfsmount as well as a *dentry

  nfsd_cross_mnt() now takes a *path instead of a **dentry
     to pass in, and get back, the mnt and dentry.

  nfsd_lookup_parent() used to take a *dentry and a **dentry.
     now it just takes a *path.  This is the *path that as passed
     to nfsd_lookup_dentry().

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/export.c   |    4 +-
 fs/nfsd/nfs3xdr.c  |   22 +++++----
 fs/nfsd/nfs4proc.c |    9 ++--
 fs/nfsd/nfs4xdr.c  |   55 +++++++++++-----------
 fs/nfsd/nfsfh.c    |   30 +++++++-----
 fs/nfsd/nfsfh.h    |    3 +
 fs/nfsd/nfsproc.c  |    5 ++
 fs/nfsd/vfs.c      |  133 ++++++++++++++++++++++++++++------------------------
 fs/nfsd/vfs.h      |   10 ++--
 fs/nfsd/xdr4.h     |    2 -
 10 files changed, 150 insertions(+), 123 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 9421dae22737..e506cbe78b4f 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -1003,7 +1003,7 @@ exp_rootfh(struct net *net, struct auth_domain *clp, char *name,
 	 * fh must be initialized before calling fh_compose
 	 */
 	fh_init(&fh, maxsize);
-	if (fh_compose(&fh, exp, path.dentry, NULL))
+	if (fh_compose(&fh, exp, &path, NULL))
 		err = -EINVAL;
 	else
 		err = 0;
@@ -1178,7 +1178,7 @@ exp_pseudoroot(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	exp = rqst_find_fsidzero_export(rqstp);
 	if (IS_ERR(exp))
 		return nfserrno(PTR_ERR(exp));
-	rv = fh_compose(fhp, exp, exp->ex_path.dentry, NULL);
+	rv = fh_compose(fhp, exp, &exp->ex_path, NULL);
 	exp_put(exp);
 	return rv;
 }
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 0a5ebc52e6a9..67af0c5c1543 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -1089,36 +1089,38 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 		 const char *name, int namlen, u64 ino)
 {
 	struct svc_export	*exp;
-	struct dentry		*dparent, *dchild;
+	struct dentry		*dparent;
+	struct path		child;
 	__be32 rv = nfserr_noent;
 
 	dparent = cd->fh.fh_dentry;
 	exp  = cd->fh.fh_export;
+	child.mnt = cd->fh.fh_mnt;
 
 	if (isdotent(name, namlen)) {
 		if (namlen == 2) {
-			dchild = dget_parent(dparent);
+			child.dentry = dget_parent(dparent);
 			/*
 			 * Don't return filehandle for ".." if we're at
 			 * the filesystem or export root:
 			 */
-			if (dchild == dparent)
+			if (child.dentry == dparent)
 				goto out;
 			if (dparent == exp->ex_path.dentry)
 				goto out;
 		} else
-			dchild = dget(dparent);
+			child.dentry = dget(dparent);
 	} else
-		dchild = lookup_positive_unlocked(name, dparent, namlen);
-	if (IS_ERR(dchild))
+		child.dentry = lookup_positive_unlocked(name, dparent, namlen);
+	if (IS_ERR(child.dentry))
 		return rv;
-	if (d_mountpoint(dchild))
+	if (d_mountpoint(child.dentry))
 		goto out;
-	if (dchild->d_inode->i_ino != ino)
+	if (child.dentry->d_inode->i_ino != ino)
 		goto out;
-	rv = fh_compose(fhp, exp, dchild, &cd->fh);
+	rv = fh_compose(fhp, exp, &child, &cd->fh);
 out:
-	dput(dchild);
+	dput(child.dentry);
 	return rv;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 486c5dba4b65..743b9315cd3e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -902,7 +902,7 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	struct nfsd4_secinfo *secinfo = &u->secinfo;
 	struct svc_export *exp;
-	struct dentry *dentry;
+	struct path path;
 	__be32 err;
 
 	err = fh_verify(rqstp, &cstate->current_fh, S_IFDIR, NFSD_MAY_EXEC);
@@ -910,16 +910,16 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		return err;
 	err = nfsd_lookup_dentry(rqstp, &cstate->current_fh,
 				    secinfo->si_name, secinfo->si_namelen,
-				    &exp, &dentry);
+				    &exp, &path);
 	if (err)
 		return err;
 	fh_unlock(&cstate->current_fh);
-	if (d_really_is_negative(dentry)) {
+	if (d_really_is_negative(path.dentry)) {
 		exp_put(exp);
 		err = nfserr_noent;
 	} else
 		secinfo->si_exp = exp;
-	dput(dentry);
+	path_put(&path);
 	if (cstate->minorversion)
 		/* See rfc 5661 section 2.6.3.1.1.8 */
 		fh_put(&cstate->current_fh);
@@ -1930,6 +1930,7 @@ _nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	p = buf;
 	status = nfsd4_encode_fattr_to_buf(&p, count, &cstate->current_fh,
 				    cstate->current_fh.fh_export,
+				    cstate->current_fh.fh_mnt,
 				    cstate->current_fh.fh_dentry,
 				    verify->ve_bmval,
 				    rqstp, 0);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 7abeccb975b2..21c277fa28ae 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2823,9 +2823,9 @@ nfsd4_encode_bitmap(struct xdr_stream *xdr, u32 bmval0, u32 bmval1, u32 bmval2)
  */
 static __be32
 nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
-		struct svc_export *exp,
-		struct dentry *dentry, u32 *bmval,
-		struct svc_rqst *rqstp, int ignore_crossmnt)
+		   struct svc_export *exp,
+		   struct vfsmount *mnt, struct dentry *dentry,
+		   u32 *bmval, struct svc_rqst *rqstp, int ignore_crossmnt)
 {
 	u32 bmval0 = bmval[0];
 	u32 bmval1 = bmval[1];
@@ -2851,7 +2851,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 	struct nfsd4_compoundres *resp = rqstp->rq_resp;
 	u32 minorversion = resp->cstate.minorversion;
 	struct path path = {
-		.mnt	= exp->ex_path.mnt,
+		.mnt	= mnt,
 		.dentry	= dentry,
 	};
 	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
@@ -2882,7 +2882,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		if (!tempfh)
 			goto out;
 		fh_init(tempfh, NFS4_FHSIZE);
-		status = fh_compose(tempfh, exp, dentry, NULL);
+		status = fh_compose(tempfh, exp, &path, NULL);
 		if (status)
 			goto out;
 		fhp = tempfh;
@@ -3274,13 +3274,12 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
-                	goto out_resource;
+			goto out_resource;
 		/*
 		 * Get parent's attributes if not ignoring crossmount
 		 * and this is the root of a cross-mounted filesystem.
 		 */
-		if (ignore_crossmnt == 0 &&
-		    dentry == exp->ex_path.mnt->mnt_root) {
+		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root) {
 			err = get_parent_attributes(exp, &parent_stat);
 			if (err)
 				goto out_nfserr;
@@ -3380,17 +3379,18 @@ static void svcxdr_init_encode_from_buffer(struct xdr_stream *xdr,
 }
 
 __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
-			struct svc_fh *fhp, struct svc_export *exp,
-			struct dentry *dentry, u32 *bmval,
-			struct svc_rqst *rqstp, int ignore_crossmnt)
+				 struct svc_fh *fhp, struct svc_export *exp,
+				 struct vfsmount *mnt, struct dentry *dentry,
+				 u32 *bmval, struct svc_rqst *rqstp,
+				 int ignore_crossmnt)
 {
 	struct xdr_buf dummy;
 	struct xdr_stream xdr;
 	__be32 ret;
 
 	svcxdr_init_encode_from_buffer(&xdr, &dummy, *p, words << 2);
-	ret = nfsd4_encode_fattr(&xdr, fhp, exp, dentry, bmval, rqstp,
-							ignore_crossmnt);
+	ret = nfsd4_encode_fattr(&xdr, fhp, exp, mnt, dentry, bmval, rqstp,
+				 ignore_crossmnt);
 	*p = xdr.p;
 	return ret;
 }
@@ -3409,14 +3409,16 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 			const char *name, int namlen)
 {
 	struct svc_export *exp = cd->rd_fhp->fh_export;
-	struct dentry *dentry;
+	struct path path;
 	__be32 nfserr;
 	int ignore_crossmnt = 0;
 
-	dentry = lookup_positive_unlocked(name, cd->rd_fhp->fh_dentry, namlen);
-	if (IS_ERR(dentry))
-		return nfserrno(PTR_ERR(dentry));
+	path.dentry = lookup_positive_unlocked(name, cd->rd_fhp->fh_dentry,
+					      namlen);
+	if (IS_ERR(path.dentry))
+		return nfserrno(PTR_ERR(path.dentry));
 
+	path.mnt = mntget(cd->rd_fhp->fh_mnt);
 	exp_get(exp);
 	/*
 	 * In the case of a mountpoint, the client may be asking for
@@ -3425,7 +3427,7 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 	 * we will not follow the cross mount and will fill the attribtutes
 	 * directly from the mountpoint dentry.
 	 */
-	if (nfsd_mountpoint(dentry, exp)) {
+	if (nfsd_mountpoint(path.dentry, exp)) {
 		int err;
 
 		if (!(exp->ex_flags & NFSEXP_V4ROOT)
@@ -3434,11 +3436,11 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 			goto out_encode;
 		}
 		/*
-		 * Why the heck aren't we just using nfsd_lookup??
+		 * Why the heck aren't we just using nfsd_lookup_dentry??
 		 * Different "."/".." handling?  Something else?
 		 * At least, add a comment here to explain....
 		 */
-		err = nfsd_cross_mnt(cd->rd_rqstp, &dentry, &exp);
+		err = nfsd_cross_mnt(cd->rd_rqstp, &path, &exp);
 		if (err) {
 			nfserr = nfserrno(err);
 			goto out_put;
@@ -3446,13 +3448,13 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 		nfserr = check_nfsd_access(exp, cd->rd_rqstp);
 		if (nfserr)
 			goto out_put;
-
 	}
 out_encode:
-	nfserr = nfsd4_encode_fattr(xdr, NULL, exp, dentry, cd->rd_bmval,
-					cd->rd_rqstp, ignore_crossmnt);
+	nfserr = nfsd4_encode_fattr(xdr, NULL, exp, path.mnt, path.dentry,
+				    cd->rd_bmval, cd->rd_rqstp,
+				    ignore_crossmnt);
 out_put:
-	dput(dentry);
+	path_put(&path);
 	exp_put(exp);
 	return nfserr;
 }
@@ -3651,8 +3653,9 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
 	struct svc_fh *fhp = getattr->ga_fhp;
 	struct xdr_stream *xdr = resp->xdr;
 
-	return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry,
-				    getattr->ga_bmval, resp->rqstp, 0);
+	return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export,
+				  fhp->fh_mnt, fhp->fh_dentry,
+				  getattr->ga_bmval, resp->rqstp, 0);
 }
 
 static __be32
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c475d2271f9c..0bf7ac13ae50 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -299,6 +299,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	}
 
 	fhp->fh_dentry = dentry;
+	fhp->fh_mnt = mntget(exp->ex_path.mnt);
 	fhp->fh_export = exp;
 
 	switch (rqstp->rq_vers) {
@@ -556,7 +557,7 @@ static void set_version_and_fsid_type(struct svc_fh *fhp, struct svc_export *exp
 }
 
 __be32
-fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
+fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct path *path,
 	   struct svc_fh *ref_fh)
 {
 	/* ref_fh is a reference file handle.
@@ -567,13 +568,13 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	 *
 	 */
 
-	struct inode * inode = d_inode(dentry);
+	struct inode * inode = d_inode(path->dentry);
 	dev_t ex_dev = exp_sb(exp)->s_dev;
 
 	dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
 		MAJOR(ex_dev), MINOR(ex_dev),
 		(long) d_inode(exp->ex_path.dentry)->i_ino,
-		dentry,
+		path->dentry,
 		(inode ? inode->i_ino : 0));
 
 	/* Choose filehandle version and fsid type based on
@@ -590,14 +591,15 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 
 	if (fhp->fh_locked || fhp->fh_dentry) {
 		printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n",
-		       dentry);
+		       path->dentry);
 	}
 	if (fhp->fh_maxsize < NFS_FHSIZE)
 		printk(KERN_ERR "fh_compose: called with maxsize %d! %pd2\n",
 		       fhp->fh_maxsize,
-		       dentry);
+		       path->dentry);
 
-	fhp->fh_dentry = dget(dentry); /* our internal copy */
+	fhp->fh_dentry = dget(path->dentry); /* our internal copy */
+	fhp->fh_mnt = mntget(path->mnt);
 	fhp->fh_export = exp_get(exp);
 
 	if (fhp->fh_handle.fh_version == 0xca) {
@@ -609,9 +611,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 		fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev;
 		fhp->fh_handle.ofh_xino =
 			ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
-		fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry));
+		fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(path->dentry));
 		if (inode)
-			_fh_update_old(dentry, exp, &fhp->fh_handle);
+			_fh_update_old(path->dentry, exp, &fhp->fh_handle);
 	} else {
 		fhp->fh_handle.fh_size =
 			key_len(fhp->fh_handle.fh_fsid_type) + 4;
@@ -624,7 +626,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 			exp->ex_fsid, exp->ex_uuid);
 
 		if (inode)
-			_fh_update(fhp, exp, dentry);
+			_fh_update(fhp, exp, path->dentry);
 		if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
 			fh_put(fhp);
 			return nfserr_opnotsupp;
@@ -675,8 +677,10 @@ fh_update(struct svc_fh *fhp)
 void
 fh_put(struct svc_fh *fhp)
 {
-	struct dentry * dentry = fhp->fh_dentry;
-	struct svc_export * exp = fhp->fh_export;
+	struct dentry *dentry = fhp->fh_dentry;
+	struct svc_export *exp = fhp->fh_export;
+	struct vfsmount *mnt = fhp->fh_mnt;
+
 	if (dentry) {
 		fh_unlock(fhp);
 		fhp->fh_dentry = NULL;
@@ -684,6 +688,10 @@ fh_put(struct svc_fh *fhp)
 		fh_clear_wcc(fhp);
 	}
 	fh_drop_write(fhp);
+	if (mnt) {
+		mntput(mnt);
+		fhp->fh_mnt = NULL;
+	}
 	if (exp) {
 		exp_put(exp);
 		fhp->fh_export = NULL;
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 6106697adc04..26c02209babd 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -31,6 +31,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
 typedef struct svc_fh {
 	struct knfsd_fh		fh_handle;	/* FH data */
 	int			fh_maxsize;	/* max size for fh_handle */
+	struct vfsmount	*	fh_mnt;		/* mnt, possibly of subvol */
 	struct dentry *		fh_dentry;	/* validated dentry */
 	struct svc_export *	fh_export;	/* export pointer */
 
@@ -171,7 +172,7 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
  * Function prototypes
  */
 __be32	fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
-__be32	fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
+__be32	fh_compose(struct svc_fh *, struct svc_export *, struct path *, struct svc_fh *);
 __be32	fh_update(struct svc_fh *);
 void	fh_put(struct svc_fh *);
 
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 60d7c59e7935..245199b0e630 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -268,6 +268,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 	struct iattr	*attr = &argp->attrs;
 	struct inode	*inode;
 	struct dentry	*dchild;
+	struct path	path;
 	int		type, mode;
 	int		hosterr;
 	dev_t		rdev = 0, wanted = new_decode_dev(attr->ia_size);
@@ -298,7 +299,9 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 		goto out_unlock;
 	}
 	fh_init(newfhp, NFS_FHSIZE);
-	resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
+	path.mnt = dirfhp->fh_mnt;
+	path.dentry = dchild;
+	resp->status = fh_compose(newfhp, dirfhp->fh_export, &path, dirfhp);
 	if (!resp->status && d_really_is_negative(dchild))
 		resp->status = nfserr_noent;
 	dput(dchild);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 7c32edcfd2e9..c0c6920f25a4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -49,27 +49,26 @@
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
-/* 
- * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
+/*
+ * Called from nfsd_lookup and encode_dirent. Check if we have crossed
  * a mount point.
- * Returns -EAGAIN or -ETIMEDOUT leaving *dpp and *expp unchanged,
- *  or nfs_ok having possibly changed *dpp and *expp
+ * Returns -EAGAIN or -ETIMEDOUT leaving *path and *expp unchanged,
+ *  or nfs_ok having possibly changed *path and *expp
  */
 int
-nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, 
-		        struct svc_export **expp)
+nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
+	       struct svc_export **expp)
 {
 	struct svc_export *exp = *expp, *exp2 = NULL;
-	struct dentry *dentry = *dpp;
-	struct path path = {.mnt = mntget(exp->ex_path.mnt),
-			    .dentry = dget(dentry)};
+	struct path path = {.mnt = mntget(path_parent->mnt),
+			    .dentry = dget(path_parent->dentry)};
 	int err = 0;
 
 	err = follow_down(&path, 0);
 	if (err < 0)
 		goto out;
-	if (path.mnt == exp->ex_path.mnt && path.dentry == dentry &&
-	    nfsd_mountpoint(dentry, exp) == 2) {
+	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
+	    nfsd_mountpoint(path.dentry, exp) == 2) {
 		/* This is only a mountpoint in some other namespace */
 		path_put(&path);
 		goto out;
@@ -93,19 +92,14 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 	if (nfsd_v4client(rqstp) ||
 		(exp->ex_flags & NFSEXP_CROSSMOUNT) || EX_NOHIDE(exp2)) {
 		/* successfully crossed mount point */
-		/*
-		 * This is subtle: path.dentry is *not* on path.mnt
-		 * at this point.  The only reason we are safe is that
-		 * original mnt is pinned down by exp, so we should
-		 * put path *before* putting exp
-		 */
-		*dpp = path.dentry;
-		path.dentry = dentry;
+		path_put(path_parent);
+		*path_parent = path;
+		exp_put(exp);
 		*expp = exp2;
-		exp2 = exp;
+	} else {
+		path_put(&path);
+		exp_put(exp2);
 	}
-	path_put(&path);
-	exp_put(exp2);
 out:
 	return err;
 }
@@ -121,27 +115,30 @@ static void follow_to_parent(struct path *path)
 	path->dentry = dp;
 }
 
-static int nfsd_lookup_parent(struct svc_rqst *rqstp, struct dentry *dparent, struct svc_export **exp, struct dentry **dentryp)
+static int nfsd_lookup_parent(struct svc_rqst *rqstp, struct svc_export **exp,
+			      struct path *path)
 {
+	struct path path2;
 	struct svc_export *exp2;
-	struct path path = {.mnt = mntget((*exp)->ex_path.mnt),
-			    .dentry = dget(dparent)};
 
-	follow_to_parent(&path);
-
-	exp2 = rqst_exp_parent(rqstp, &path);
+	path2 = *path;
+	path_get(&path2);
+	follow_to_parent(&path2);
+	exp2 = rqst_exp_parent(rqstp, path);
 	if (PTR_ERR(exp2) == -ENOENT) {
-		*dentryp = dget(dparent);
+		/* leave path unchanged */
+		path_put(&path2);
+		return 0;
 	} else if (IS_ERR(exp2)) {
-		path_put(&path);
+		path_put(&path2);
 		return PTR_ERR(exp2);
 	} else {
-		*dentryp = dget(path.dentry);
+		path_put(path);
+		*path = path2;
 		exp_put(*exp);
 		*exp = exp2;
+		return 0;
 	}
-	path_put(&path);
-	return 0;
 }
 
 /*
@@ -172,29 +169,32 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 __be32
 nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		   const char *name, unsigned int len,
-		   struct svc_export **exp_ret, struct dentry **dentry_ret)
+		   struct svc_export **exp_ret, struct path *ret)
 {
 	struct svc_export	*exp;
 	struct dentry		*dparent;
-	struct dentry		*dentry;
 	int			host_err;
 
 	dprintk("nfsd: nfsd_lookup(fh %s, %.*s)\n", SVCFH_fmt(fhp), len,name);
 
 	dparent = fhp->fh_dentry;
+	ret->mnt = mntget(fhp->fh_mnt);
 	exp = exp_get(fhp->fh_export);
 
 	/* Lookup the name, but don't follow links */
 	if (isdotent(name, len)) {
 		if (len==1)
-			dentry = dget(dparent);
+			ret->dentry = dget(dparent);
 		else if (dparent != exp->ex_path.dentry)
-			dentry = dget_parent(dparent);
+			ret->dentry = dget_parent(dparent);
 		else if (!EX_NOHIDE(exp) && !nfsd_v4client(rqstp))
-			dentry = dget(dparent); /* .. == . just like at / */
+			ret->dentry = dget(dparent); /* .. == . just like at / */
 		else {
-			/* checking mountpoint crossing is very different when stepping up */
-			host_err = nfsd_lookup_parent(rqstp, dparent, &exp, &dentry);
+			/* checking mountpoint crossing is very different when
+			 * stepping up
+			 */
+			ret->dentry = dget(dparent);
+			host_err = nfsd_lookup_parent(rqstp, &exp, ret);
 			if (host_err)
 				goto out_nfserr;
 		}
@@ -205,11 +205,13 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		 * need to take the child's i_mutex:
 		 */
 		fh_lock_nested(fhp, I_MUTEX_PARENT);
-		dentry = lookup_one_len(name, dparent, len);
-		host_err = PTR_ERR(dentry);
-		if (IS_ERR(dentry))
+		ret->dentry = lookup_one_len(name, dparent, len);
+		host_err = PTR_ERR(ret->dentry);
+		if (IS_ERR(ret->dentry)) {
+			ret->dentry = NULL;
 			goto out_nfserr;
-		if (nfsd_mountpoint(dentry, exp)) {
+		}
+		if (nfsd_mountpoint(ret->dentry, exp)) {
 			/*
 			 * We don't need the i_mutex after all.  It's
 			 * still possible we could open this (regular
@@ -219,18 +221,16 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			 * and a mountpoint won't be renamed:
 			 */
 			fh_unlock(fhp);
-			if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) {
-				dput(dentry);
+			if ((host_err = nfsd_cross_mnt(rqstp, ret, &exp)))
 				goto out_nfserr;
-			}
 		}
 	}
-	*dentry_ret = dentry;
 	*exp_ret = exp;
 	return 0;
 
 out_nfserr:
 	exp_put(exp);
+	path_put(ret);
 	return nfserrno(host_err);
 }
 
@@ -251,13 +251,13 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 				unsigned int len, struct svc_fh *resfh)
 {
 	struct svc_export	*exp;
-	struct dentry		*dentry;
+	struct path		path;
 	__be32 err;
 
 	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
 	if (err)
 		return err;
-	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry);
+	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &path);
 	if (err)
 		return err;
 	err = check_nfsd_access(exp, rqstp);
@@ -267,11 +267,11 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 	 * Note: we compose the file handle now, but as the
 	 * dentry may be negative, it may need to be updated.
 	 */
-	err = fh_compose(resfh, exp, dentry, fhp);
-	if (!err && d_really_is_negative(dentry))
+	err = fh_compose(resfh, exp, &path, fhp);
+	if (!err && d_really_is_negative(path.dentry))
 		err = nfserr_noent;
 out:
-	dput(dentry);
+	path_put(&path);
 	exp_put(exp);
 	return err;
 }
@@ -740,7 +740,7 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	__be32		err;
 	int		host_err = 0;
 
-	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.mnt = fhp->fh_mnt;
 	path.dentry = fhp->fh_dentry;
 	inode = d_inode(path.dentry);
 
@@ -1350,6 +1350,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		int type, dev_t rdev, struct svc_fh *resfhp)
 {
 	struct dentry	*dentry, *dchild = NULL;
+	struct path	path;
 	__be32		err;
 	int		host_err;
 
@@ -1371,7 +1372,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	host_err = PTR_ERR(dchild);
 	if (IS_ERR(dchild))
 		return nfserrno(host_err);
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+	path.mnt = fhp->fh_mnt;
+	path.dentry = dchild;
+	err = fh_compose(resfhp, fhp->fh_export, &path, fhp);
 	/*
 	 * We unconditionally drop our ref to dchild as fh_compose will have
 	 * already grabbed its own ref for it.
@@ -1390,11 +1393,12 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 __be32
 do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		char *fname, int flen, struct iattr *iap,
-		struct svc_fh *resfhp, int createmode, u32 *verifier,
-	        bool *truncp, bool *created)
+	       char *fname, int flen, struct iattr *iap,
+	       struct svc_fh *resfhp, int createmode, u32 *verifier,
+	       bool *truncp, bool *created)
 {
 	struct dentry	*dentry, *dchild = NULL;
+	struct path	path;
 	struct inode	*dirp;
 	__be32		err;
 	int		host_err;
@@ -1436,7 +1440,9 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			goto out;
 	}
 
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+	path.mnt = fhp->fh_mnt;
+	path.dentry = dchild;
+	err = fh_compose(resfhp, fhp->fh_export, &path, fhp);
 	if (err)
 		goto out;
 
@@ -1569,7 +1575,7 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
 	if (unlikely(err))
 		return err;
 
-	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.mnt = fhp->fh_mnt;
 	path.dentry = fhp->fh_dentry;
 
 	if (unlikely(!d_is_symlink(path.dentry)))
@@ -1600,6 +1606,7 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct svc_fh *resfhp)
 {
 	struct dentry	*dentry, *dnew;
+	struct path	pathnew;
 	__be32		err, cerr;
 	int		host_err;
 
@@ -1633,7 +1640,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	fh_drop_write(fhp);
 
-	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
+	pathnew.mnt = fhp->fh_mnt;
+	pathnew.dentry = dnew;
+	cerr = fh_compose(resfhp, fhp->fh_export, &pathnew, fhp);
 	dput(dnew);
 	if (err==0) err = cerr;
 out:
@@ -2107,7 +2116,7 @@ nfsd_statfs(struct svc_rqst *rqstp, struct svc_fh *fhp, struct kstatfs *stat, in
 	err = fh_verify(rqstp, fhp, 0, NFSD_MAY_NOP | access);
 	if (!err) {
 		struct path path = {
-			.mnt	= fhp->fh_export->ex_path.mnt,
+			.mnt	= fhp->fh_mnt,
 			.dentry	= fhp->fh_dentry,
 		};
 		if (vfs_statfs(&path, stat))
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index b21b76e6b9a8..52f587716208 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -42,13 +42,13 @@ struct nfsd_file;
 typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
 
 /* nfsd/vfs.c */
-int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
+int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *,
 		                struct svc_export **expp);
 __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
 				const char *, unsigned int, struct svc_fh *);
 __be32		 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *,
 				const char *, unsigned int,
-				struct svc_export **, struct dentry **);
+				struct svc_export **, struct path *);
 __be32		nfsd_setattr(struct svc_rqst *, struct svc_fh *,
 				struct iattr *, int, time64_t);
 int nfsd_mountpoint(struct dentry *, struct svc_export *);
@@ -138,7 +138,7 @@ static inline int fh_want_write(struct svc_fh *fh)
 
 	if (fh->fh_want_write)
 		return 0;
-	ret = mnt_want_write(fh->fh_export->ex_path.mnt);
+	ret = mnt_want_write(fh->fh_mnt);
 	if (!ret)
 		fh->fh_want_write = true;
 	return ret;
@@ -148,13 +148,13 @@ static inline void fh_drop_write(struct svc_fh *fh)
 {
 	if (fh->fh_want_write) {
 		fh->fh_want_write = false;
-		mnt_drop_write(fh->fh_export->ex_path.mnt);
+		mnt_drop_write(fh->fh_mnt);
 	}
 }
 
 static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat)
 {
-	struct path p = {.mnt = fh->fh_export->ex_path.mnt,
+	struct path p = {.mnt = fh->fh_mnt,
 			 .dentry = fh->fh_dentry};
 	return nfserrno(vfs_getattr(&p, stat, STATX_BASIC_STATS,
 				    AT_STATX_SYNC_AS_STAT));
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 3e4052e3bd50..8934db5113ac 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -763,7 +763,7 @@ void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *);
 void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op);
 __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
 		struct svc_fh *fhp, struct svc_export *exp,
-		struct dentry *dentry,
+		struct vfsmount *mnt, struct dentry *dentry,
 		u32 *bmval, struct svc_rqst *, int ignore_crossmnt);
 extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *, union nfsd4_op_u *u);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-28 10:13   ` Amir Goldstein
  2021-07-28 19:17   ` J. Bruce Fields
  2021-07-27 22:37 ` [PATCH 04/11] VFS: export lookup_mnt() NeilBrown
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

When a filesystem has internal mounts, it controls the filehandles
across all those mounts (subvols) in the filesystem.  So it is useful to
be able to look up a filehandle again one mount, and get a result which
is in a different mount (part of the same overall file system).

This patch makes that possible by changing export_decode_fh() and
export_decode_fh_raw() to take a vfsmount pointer by reference, and
possibly change the vfsmount pointed to before returning.

The core of the change is in reconnect_path() which now not only checks
that the dentry is fully connected, but also that the vfsmnt reported
has the same 'dev' (reported by vfs_getattr) as the dentry.
If it doesn't, we walk up the dparent() chain to find the highest place
where the dev changes without there being a mount point, and trigger an
automount there.

As no filesystems yet provide local-mounts, this does not yet change any
behaviour.

In exportfs_decode_fh_raw() we previously tested for DCACHE_DISCONNECT
before calling reconnect_path().  That test is dropped.  It was only a
minor optimisation and is now inconvenient.

The change in overlayfs needs more careful thought than I have yet given
it.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/exportfs/expfs.c      |  100 +++++++++++++++++++++++++++++++++++++++-------
 fs/fhandle.c             |    2 -
 fs/nfsd/nfsfh.c          |    9 +++-
 fs/overlayfs/namei.c     |    5 ++
 fs/xfs/xfs_ioctl.c       |   12 ++++--
 include/linux/exportfs.h |    4 +-
 6 files changed, 106 insertions(+), 26 deletions(-)

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..2d7c42137b49 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -207,11 +207,18 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
  * that case reconnect_path may still succeed with target_dir fully
  * connected, but further operations using the filehandle will fail when
  * necessary (due to S_DEAD being set on the directory).
+ *
+ * If the filesystem supports multiple subvols, then *mntp may be updated
+ * to a subordinate mount point on the same filesystem.
  */
 static int
-reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
+reconnect_path(struct vfsmount **mntp, struct dentry *target_dir, char *nbuf)
 {
+	struct vfsmount *mnt = *mntp;
+	struct path path;
 	struct dentry *dentry, *parent;
+	struct kstat stat;
+	dev_t target_dev;
 
 	dentry = dget(target_dir);
 
@@ -232,6 +239,68 @@ reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
 	}
 	dput(dentry);
 	clear_disconnected(target_dir);
+
+	/* Need to find appropriate vfsmount, which might not exist yet.
+	 * We may need to trigger automount points.
+	 */
+	path.mnt = mnt;
+	path.dentry = target_dir;
+	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+	target_dev = stat.dev;
+
+	path.dentry = mnt->mnt_root;
+	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+
+	while (stat.dev != target_dev) {
+		/* walk up the dcache tree from target_dir, recording the
+		 * location of the most recent change in dev number,
+		 * until we find a mountpoint.
+		 * If there was no change in show_dev result before the
+		 * mountpount, the vfsmount at the mountpoint is what we want.
+		 * If there was, we need to trigger an automount where the
+		 * show_dev() result changed.
+		 */
+		struct dentry *last_change = NULL;
+		dev_t last_dev = target_dev;
+
+		dentry = dget(target_dir);
+		while ((parent = dget_parent(dentry)) != dentry) {
+			path.dentry = parent;
+			vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+			if (stat.dev != last_dev) {
+				path.dentry = dentry;
+				mnt = lookup_mnt(&path);
+				if (mnt) {
+					mntput(path.mnt);
+					path.mnt = mnt;
+					break;
+				}
+				dput(last_change);
+				last_change = dget(dentry);
+				last_dev = stat.dev;
+			}
+			dput(dentry);
+			dentry = parent;
+		}
+		dput(dentry); dput(parent);
+
+		if (!last_change)
+			break;
+
+		mnt = path.mnt;
+		path.dentry = last_change;
+		follow_down(&path, LOOKUP_AUTOMOUNT);
+		dput(path.dentry);
+		if (path.mnt == mnt)
+			/* There should have been a mount-trap there,
+			 * but there wasn't.  Just give up.
+			 */
+			break;
+
+		path.dentry = mnt->mnt_root;
+		vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+	}
+	*mntp = path.mnt;
 	return 0;
 }
 
@@ -418,12 +487,13 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
 EXPORT_SYMBOL_GPL(exportfs_encode_fh);
 
 struct dentry *
-exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
+exportfs_decode_fh_raw(struct vfsmount **mntp, struct fid *fid, int fh_len,
 		       int fileid_type,
 		       int (*acceptable)(void *, struct dentry *),
 		       void *context)
 {
-	const struct export_operations *nop = mnt->mnt_sb->s_export_op;
+	struct super_block *sb = (*mntp)->mnt_sb;
+	const struct export_operations *nop = sb->s_export_op;
 	struct dentry *result, *alias;
 	char nbuf[NAME_MAX+1];
 	int err;
@@ -433,7 +503,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 	 */
 	if (!nop || !nop->fh_to_dentry)
 		return ERR_PTR(-ESTALE);
-	result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
+	result = nop->fh_to_dentry(sb, fid, fh_len, fileid_type);
 	if (IS_ERR_OR_NULL(result))
 		return result;
 
@@ -452,14 +522,12 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 		 *
 		 * On the positive side there is only one dentry for each
 		 * directory inode.  On the negative side this implies that we
-		 * to ensure our dentry is connected all the way up to the
+		 * need to ensure our dentry is connected all the way up to the
 		 * filesystem root.
 		 */
-		if (result->d_flags & DCACHE_DISCONNECTED) {
-			err = reconnect_path(mnt, result, nbuf);
-			if (err)
-				goto err_result;
-		}
+		err = reconnect_path(mntp, result, nbuf);
+		if (err)
+			goto err_result;
 
 		if (!acceptable(context, result)) {
 			err = -EACCES;
@@ -494,7 +562,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 		if (!nop->fh_to_parent)
 			goto err_result;
 
-		target_dir = nop->fh_to_parent(mnt->mnt_sb, fid,
+		target_dir = nop->fh_to_parent(sb, fid,
 				fh_len, fileid_type);
 		if (!target_dir)
 			goto err_result;
@@ -507,7 +575,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 		 * connected to the filesystem root.  The VFS really doesn't
 		 * like disconnected directories..
 		 */
-		err = reconnect_path(mnt, target_dir, nbuf);
+		err = reconnect_path(mntp, target_dir, nbuf);
 		if (err) {
 			dput(target_dir);
 			goto err_result;
@@ -518,7 +586,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 		 * dentry for the inode we're after, make sure that our
 		 * inode is actually connected to the parent.
 		 */
-		err = exportfs_get_name(mnt, target_dir, nbuf, result);
+		err = exportfs_get_name(*mntp, target_dir, nbuf, result);
 		if (err) {
 			dput(target_dir);
 			goto err_result;
@@ -556,7 +624,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 			goto err_result;
 		}
 
-		return alias;
+		return result;
 	}
 
  err_result:
@@ -565,14 +633,14 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
 }
 EXPORT_SYMBOL_GPL(exportfs_decode_fh_raw);
 
-struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
+struct dentry *exportfs_decode_fh(struct vfsmount **mntp, struct fid *fid,
 				  int fh_len, int fileid_type,
 				  int (*acceptable)(void *, struct dentry *),
 				  void *context)
 {
 	struct dentry *ret;
 
-	ret = exportfs_decode_fh_raw(mnt, fid, fh_len, fileid_type,
+	ret = exportfs_decode_fh_raw(mntp, fid, fh_len, fileid_type,
 				     acceptable, context);
 	if (IS_ERR_OR_NULL(ret)) {
 		if (ret == ERR_PTR(-ENOMEM))
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 6630c69c23a2..b47c7696469f 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -149,7 +149,7 @@ static int do_handle_to_path(int mountdirfd, struct file_handle *handle,
 	}
 	/* change the handle size to multiple of sizeof(u32) */
 	handle_dwords = handle->handle_bytes >> 2;
-	path->dentry = exportfs_decode_fh(path->mnt,
+	path->dentry = exportfs_decode_fh(&path->mnt,
 					  (struct fid *)handle->f_handle,
 					  handle_dwords, handle->handle_type,
 					  vfs_dentry_acceptable, NULL);
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 0bf7ac13ae50..4023046f63e2 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -157,6 +157,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	struct fid *fid = NULL, sfid;
 	struct svc_export *exp;
 	struct dentry *dentry;
+	struct vfsmount *mnt = NULL;
 	int fileid_type;
 	int data_left = fh->fh_size/4;
 	__be32 error;
@@ -253,6 +254,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	if (rqstp->rq_vers > 2)
 		error = nfserr_badhandle;
 
+	mnt = mntget(exp->ex_path.mnt);
+
 	if (fh->fh_version != 1) {
 		sfid.i32.ino = fh->ofh_ino;
 		sfid.i32.gen = fh->ofh_generation;
@@ -269,7 +272,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	if (fileid_type == FILEID_ROOT)
 		dentry = dget(exp->ex_path.dentry);
 	else {
-		dentry = exportfs_decode_fh_raw(exp->ex_path.mnt, fid,
+		dentry = exportfs_decode_fh_raw(&mnt, fid,
 						data_left, fileid_type,
 						nfsd_acceptable, exp);
 		if (IS_ERR_OR_NULL(dentry)) {
@@ -299,7 +302,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	}
 
 	fhp->fh_dentry = dentry;
-	fhp->fh_mnt = mntget(exp->ex_path.mnt);
+	fhp->fh_mnt = mnt;
 	fhp->fh_export = exp;
 
 	switch (rqstp->rq_vers) {
@@ -317,6 +320,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 
 	return 0;
 out:
+	mntput(mnt);
 	exp_put(exp);
 	return error;
 }
@@ -428,7 +432,6 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 	return error;
 }
 
-
 /*
  * Compose a file handle for an NFS reply.
  *
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 210cd6f66e28..0bca19f6df54 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -155,6 +155,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
 {
 	struct dentry *real;
 	int bytes;
+	struct vfsmount *mnt2;
 
 	if (!capable(CAP_DAC_READ_SEARCH))
 		return NULL;
@@ -169,9 +170,11 @@ struct dentry *ovl_decode_real_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
 		return NULL;
 
 	bytes = (fh->fb.len - offsetof(struct ovl_fb, fid));
-	real = exportfs_decode_fh(mnt, (struct fid *)fh->fb.fid,
+	mnt2 = mntget(mnt);
+	real = exportfs_decode_fh(&mnt2, (struct fid *)fh->fb.fid,
 				  bytes >> 2, (int)fh->fb.type,
 				  connected ? ovl_acceptable : NULL, mnt);
+	mntput(mnt2);
 	if (IS_ERR(real)) {
 		/*
 		 * Treat stale file handle to lower file as "origin unknown".
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 16039ea10ac9..76eb7d540811 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -149,6 +149,8 @@ xfs_handle_to_dentry(
 {
 	xfs_handle_t		handle;
 	struct xfs_fid64	fid;
+	struct dentry		*ret;
+	struct vfsmount		*mnt;
 
 	/*
 	 * Only allow handle opens under a directory.
@@ -168,9 +170,13 @@ xfs_handle_to_dentry(
 	fid.ino = handle.ha_fid.fid_ino;
 	fid.gen = handle.ha_fid.fid_gen;
 
-	return exportfs_decode_fh(parfilp->f_path.mnt, (struct fid *)&fid, 3,
-			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
-			xfs_handle_acceptable, NULL);
+	mnt = mntget(parfilp->f_path.mnt);
+	ret = exportfs_decode_fh(&mnt, (struct fid *)&fid, 3,
+				 FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
+				 xfs_handle_acceptable, NULL);
+	WARN_ON(mnt != parfilp->f_path.mnt);
+	mntput(mnt);
+	return ret;
 }
 
 STATIC struct dentry *
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index fe848901fcc3..9a8c5434a5cf 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -228,12 +228,12 @@ extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
 				    int *max_len, struct inode *parent);
 extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
 	int *max_len, int connectable);
-extern struct dentry *exportfs_decode_fh_raw(struct vfsmount *mnt,
+extern struct dentry *exportfs_decode_fh_raw(struct vfsmount **mntp,
 					     struct fid *fid, int fh_len,
 					     int fileid_type,
 					     int (*acceptable)(void *, struct dentry *),
 					     void *context);
-extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
+extern struct dentry *exportfs_decode_fh(struct vfsmount **mnt, struct fid *fid,
 	int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *),
 	void *context);
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on()
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (8 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-27 22:37 ` [PATCH 05/11] VFS: new function: mount_is_internal() NeilBrown
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

get_parent_attributes() is only used to get the inode number of the
mounted-on directory.  So change it to only do that and call it
nfsd_get_mounted_on().

It will eventually be use by nfs3 as well as nfs4, so move it to vfs.c.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4xdr.c |   29 +++++------------------------
 fs/nfsd/vfs.c     |   18 ++++++++++++++++++
 fs/nfsd/vfs.h     |    2 ++
 3 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 21c277fa28ae..d5683b6a74b2 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2768,22 +2768,6 @@ static __be32 fattr_handle_absent_fs(u32 *bmval0, u32 *bmval1, u32 *bmval2, u32
 	return 0;
 }
 
-
-static int get_parent_attributes(struct svc_export *exp, struct kstat *stat)
-{
-	struct path path = exp->ex_path;
-	int err;
-
-	path_get(&path);
-	while (follow_up(&path)) {
-		if (path.dentry != path.mnt->mnt_root)
-			break;
-	}
-	err = vfs_getattr(&path, stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
-	path_put(&path);
-	return err;
-}
-
 static __be32
 nfsd4_encode_bitmap(struct xdr_stream *xdr, u32 bmval0, u32 bmval1, u32 bmval2)
 {
@@ -3269,8 +3253,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		*p++ = cpu_to_be32(stat.mtime.tv_nsec);
 	}
 	if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
-		struct kstat parent_stat;
-		u64 ino = stat.ino;
+		u64 ino;
 
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
@@ -3279,12 +3262,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		 * Get parent's attributes if not ignoring crossmount
 		 * and this is the root of a cross-mounted filesystem.
 		 */
-		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root) {
-			err = get_parent_attributes(exp, &parent_stat);
-			if (err)
-				goto out_nfserr;
-			ino = parent_stat.ino;
-		}
+		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root)
+			ino = nfsd_get_mounted_on(mnt);
+		if (!ino)
+			ino = stat.ino;
 		p = xdr_encode_hyper(p, ino);
 	}
 #ifdef CONFIG_NFSD_PNFS
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c0c6920f25a4..baa12ac36ece 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -2445,3 +2445,21 @@ nfsd_permission(struct svc_rqst *rqstp, struct svc_export *exp,
 
 	return err? nfserrno(err) : 0;
 }
+
+unsigned long nfsd_get_mounted_on(struct vfsmount *mnt)
+{
+	struct kstat stat;
+	struct path path = { .mnt = mnt, .dentry = mnt->mnt_root };
+	int err;
+
+	path_get(&path);
+	while (follow_up(&path)) {
+		if (path.dentry != path.mnt->mnt_root)
+			break;
+	}
+	err = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_DONT_SYNC);
+	path_put(&path);
+	if (err)
+		return 0;
+	return stat.ino;
+}
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 52f587716208..11ac36b21b4c 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -132,6 +132,8 @@ __be32		nfsd_statfs(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_permission(struct svc_rqst *, struct svc_export *,
 				struct dentry *, int);
 
+unsigned long	nfsd_get_mounted_on(struct vfsmount *mnt);
+
 static inline int fh_want_write(struct svc_fh *fh)
 {
 	int ret;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points.
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (7 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 02/11] VFS: allow d_automount to create in-place bind-mount NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-28 19:15   ` J. Bruce Fields
  2021-07-30  0:42   ` Al Viro
  2021-07-27 22:37 ` [PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on() NeilBrown
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

Enhance nfsd to detect internal mounts and to cross them without
requiring a new export.

Also ensure the fsid reported is different for different submounts.  We
do this by xoring in the ino of the mounted-on directory.  This makes
sense for btrfs at least.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3xdr.c |   28 +++++++++++++++++++++-------
 fs/nfsd/nfs4xdr.c |   34 +++++++++++++++++++++++-----------
 fs/nfsd/nfsfh.c   |    7 ++++++-
 fs/nfsd/vfs.c     |   11 +++++++++--
 4 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 67af0c5c1543..80b1cc0334fa 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -370,6 +370,8 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 	case FSIDSOURCE_UUID:
 		fsid = ((u64 *)fhp->fh_export->ex_uuid)[0];
 		fsid ^= ((u64 *)fhp->fh_export->ex_uuid)[1];
+		if (fhp->fh_mnt != fhp->fh_export->ex_path.mnt)
+			fsid ^= nfsd_get_mounted_on(fhp->fh_mnt);
 		break;
 	default:
 		fsid = (u64)huge_encode_dev(fhp->fh_dentry->d_sb->s_dev);
@@ -1094,8 +1096,8 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 	__be32 rv = nfserr_noent;
 
 	dparent = cd->fh.fh_dentry;
-	exp  = cd->fh.fh_export;
-	child.mnt = cd->fh.fh_mnt;
+	exp  = exp_get(cd->fh.fh_export);
+	child.mnt = mntget(cd->fh.fh_mnt);
 
 	if (isdotent(name, namlen)) {
 		if (namlen == 2) {
@@ -1112,15 +1114,27 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 			child.dentry = dget(dparent);
 	} else
 		child.dentry = lookup_positive_unlocked(name, dparent, namlen);
-	if (IS_ERR(child.dentry))
+	if (IS_ERR(child.dentry)) {
+		mntput(child.mnt);
+		exp_put(exp);
 		return rv;
-	if (d_mountpoint(child.dentry))
-		goto out;
-	if (child.dentry->d_inode->i_ino != ino)
+	}
+	/* If child is a mountpoint, then we want to expose the fact
+	 * so client can create a mountpoint.  If not, then a different
+	 * ino number probably means a race with rename, so avoid providing
+	 * too much detail.
+	 */
+	if (nfsd_mountpoint(child.dentry, exp)) {
+		int err;
+		err = nfsd_cross_mnt(cd->rqstp, &child, &exp);
+		if (err)
+			goto out;
+	} else if (child.dentry->d_inode->i_ino != ino)
 		goto out;
 	rv = fh_compose(fhp, exp, &child, &cd->fh);
 out:
-	dput(child.dentry);
+	path_put(&child);
+	exp_put(exp);
 	return rv;
 }
 
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d5683b6a74b2..4dbc99ed2c8b 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2817,6 +2817,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 	struct kstat stat;
 	struct svc_fh *tempfh = NULL;
 	struct kstatfs statfs;
+	u64 mounted_on_ino;
+	u64 sub_fsid;
 	__be32 *p;
 	int starting_len = xdr->buf->len;
 	int attrlen_offset;
@@ -2871,6 +2873,24 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 			goto out;
 		fhp = tempfh;
 	}
+	if ((bmval0 & FATTR4_WORD0_FSID) ||
+	    (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID)) {
+		mounted_on_ino = stat.ino;
+		sub_fsid = 0;
+		/*
+		 * The inode number that the current mnt is mounted on is
+		 * used for MOUNTED_ON_FILED if we are at the root,
+		 * and for sub_fsid if mnt is not the export mnt.
+		 */
+		if (ignore_crossmnt == 0) {
+			u64 moi = nfsd_get_mounted_on(mnt);
+
+			if (dentry == mnt->mnt_root && moi)
+				mounted_on_ino = moi;
+			if (mnt != exp->ex_path.mnt)
+				sub_fsid = moi;
+		}
+	}
 	if (bmval0 & FATTR4_WORD0_ACL) {
 		err = nfsd4_get_nfs4_acl(rqstp, dentry, &acl);
 		if (err == -EOPNOTSUPP)
@@ -3008,6 +3028,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		case FSIDSOURCE_UUID:
 			p = xdr_encode_opaque_fixed(p, exp->ex_uuid,
 								EX_UUID_LEN);
+			if (mnt != exp->ex_path.mnt)
+				*(u64*)(p-2) ^= sub_fsid;
 			break;
 		}
 	}
@@ -3253,20 +3275,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		*p++ = cpu_to_be32(stat.mtime.tv_nsec);
 	}
 	if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
-		u64 ino;
-
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
 			goto out_resource;
-		/*
-		 * Get parent's attributes if not ignoring crossmount
-		 * and this is the root of a cross-mounted filesystem.
-		 */
-		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root)
-			ino = nfsd_get_mounted_on(mnt);
-		if (!ino)
-			ino = stat.ino;
-		p = xdr_encode_hyper(p, ino);
+		p = xdr_encode_hyper(p, mounted_on_ino);
 	}
 #ifdef CONFIG_NFSD_PNFS
 	if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 4023046f63e2..4b53838bca89 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -9,7 +9,7 @@
  */
 
 #include <linux/exportfs.h>
-
+#include <linux/namei.h>
 #include <linux/sunrpc/svcauth_gss.h>
 #include "nfsd.h"
 #include "vfs.h"
@@ -285,6 +285,11 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 			default:
 				dentry = ERR_PTR(-ESTALE);
 			}
+		} else if (nfsd_mountpoint(dentry, exp)) {
+			struct path path = { .mnt = mnt, .dentry = dentry };
+			follow_down(&path, LOOKUP_AUTOMOUNT);
+			mnt = path.mnt;
+			dentry = path.dentry;
 		}
 	}
 	if (dentry == NULL)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index baa12ac36ece..22523e1cd478 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -64,7 +64,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
 			    .dentry = dget(path_parent->dentry)};
 	int err = 0;
 
-	err = follow_down(&path, 0);
+	err = follow_down(&path, LOOKUP_AUTOMOUNT);
 	if (err < 0)
 		goto out;
 	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
@@ -73,6 +73,13 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
 		path_put(&path);
 		goto out;
 	}
+	if (mount_is_internal(path.mnt)) {
+		/* Use the new path, but don't look for a new export */
+		/* FIXME should I check NOHIDE in this case?? */
+		path_put(path_parent);
+		*path_parent = path;
+		goto out;
+	}
 
 	exp2 = rqst_exp_get_by_name(rqstp, &path);
 	if (IS_ERR(exp2)) {
@@ -157,7 +164,7 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 		return 1;
 	if (nfsd4_is_junction(dentry))
 		return 1;
-	if (d_mountpoint(dentry))
+	if (d_managed(dentry))
 		/*
 		 * Might only be a mountpoint in a different namespace,
 		 * but we need to check.



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 10/11] btrfs: introduce mapping function from location to inum
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (5 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 06/11] nfsd: include a vfsmount in struct svc_fh NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-27 22:37 ` [PATCH 02/11] VFS: allow d_automount to create in-place bind-mount NeilBrown
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

A btrfs directory entry can refer to two different sorts of objects

 BTRFS_INODE_ITEM_KEY - a regular fs object (file, dir, etc)
 BTRFS_ROOT_ITEM_KEY  - a reference to a subvol.

The 'objectid' numbers for these two are independent, so it is possible
(and common) for an INODE and a ROOT to have the same objectid.

As readdir reports the objectid as the inode number, if two such are in
the same directory, a tool which examines the inode numbers in getdents
results could think they are links.

As the BTRFS_ROOT_ITEM_KEY objectid is not visible via stat() (only
getdents), this is rarely if ever a problem.  However a future patch
will expose this number as the i_ino of an automount point.  This will
cause problems if the objectid is used as-is.

So: create a simple mapping function to reduce (or eliminate?) the
possibility of conflict.  The objectid of BTRFS_ROOT_ITEM_KEY is
subtracted from ULONG_MAX to make an inode number.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/btrfs/btrfs_inode.h |   10 ++++++++++
 fs/btrfs/inode.c       |    3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index c652e19ad74e..a4b5f38196e6 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -328,6 +328,16 @@ static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
 	return ret;
 }
 
+static inline unsigned long btrfs_location_to_ino(struct btrfs_key *location)
+{
+	if (location->type == BTRFS_INODE_ITEM_KEY)
+		return location->objectid;
+	/* Probably BTRFS_ROOT_ITEM_KEY, try to keep the inode
+	 * numbers separate.
+	 */
+	return ULONG_MAX - location->objectid;
+}
+
 struct btrfs_dio_private {
 	struct inode *inode;
 	u64 logical_offset;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8f60314c36c5..02537c1a9763 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6136,7 +6136,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
 		put_unaligned(fs_ftype_to_dtype(btrfs_dir_type(leaf, di)),
 				&entry->type);
 		btrfs_dir_item_key_to_cpu(leaf, di, &location);
-		put_unaligned(location.objectid, &entry->ino);
+		put_unaligned(btrfs_location_to_ino(&location),
+			      &entry->ino);
 		put_unaligned(found_key.offset, &entry->offset);
 		entries++;
 		addr += sizeof(struct dir_entry) + name_len;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots.
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (3 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 03/11] VFS: pass lookup_flags into follow_down() NeilBrown
@ 2021-07-27 22:37 ` NeilBrown
  2021-07-28  8:37   ` kernel test robot
                     ` (3 more replies)
  2021-07-27 22:37 ` [PATCH 06/11] nfsd: include a vfsmount in struct svc_fh NeilBrown
                   ` (9 subsequent siblings)
  14 siblings, 4 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-27 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

All subvol roots are now marked as automounts.  If the d_automount()
function determines that the dentry is not the root of the vfsmount, it
creates a simple loop-back mount of the dentry onto itself.  If it
determines that it IS the root of the vfsmount, it returns -EISDIR so
that no further automounting is attempted.

btrfs_getattr pays special attention to these automount dentries.
If it is NOT the root of the vfsmount:
 - the ->dev is reported as that for the rest of the vfsmount
 - the ->ino is reported as the subvol objectid, suitable transformed
   to avoid collision.

This way the same inode appear to be different depending on which mount
it is in.

automounted vfsmounts are kept on a list and timeout after 500 to 1000
seconds of last use.  This is configurable via a module parameter.
The tracking and timeout of automounts is copied from NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/btrfs/btrfs_inode.h |    2 +
 fs/btrfs/inode.c       |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/super.c       |    1 
 3 files changed, 111 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index a4b5f38196e6..f03056cacc4a 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -387,4 +387,6 @@ static inline void btrfs_print_data_csum_error(struct btrfs_inode *inode,
 			mirror_num);
 }
 
+void btrfs_release_automount_timer(void);
+
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 02537c1a9763..a5f46545fb38 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -31,6 +31,7 @@
 #include <linux/migrate.h>
 #include <linux/sched/mm.h>
 #include <linux/iomap.h>
+#include <linux/fs_context.h>
 #include <asm/unaligned.h>
 #include "misc.h"
 #include "ctree.h"
@@ -5782,6 +5783,8 @@ static int btrfs_init_locked_inode(struct inode *inode, void *p)
 	struct btrfs_iget_args *args = p;
 
 	inode->i_ino = args->ino;
+	if (args->ino == BTRFS_FIRST_FREE_OBJECTID)
+		inode->i_flags |= S_AUTOMOUNT;
 	BTRFS_I(inode)->location.objectid = args->ino;
 	BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY;
 	BTRFS_I(inode)->location.offset = 0;
@@ -5985,6 +5988,101 @@ static int btrfs_dentry_delete(const struct dentry *dentry)
 	return 0;
 }
 
+static void btrfs_expire_automounts(struct work_struct *work);
+static LIST_HEAD(btrfs_automount_list);
+static DECLARE_DELAYED_WORK(btrfs_automount_task, btrfs_expire_automounts);
+int btrfs_mountpoint_expiry_timeout = 500 * HZ;
+static void btrfs_expire_automounts(struct work_struct *work)
+{
+	struct list_head *list = &btrfs_automount_list;
+	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
+
+	mark_mounts_for_expiry(list);
+	if (!list_empty(list) && timeout > 0)
+		schedule_delayed_work(&btrfs_automount_task, timeout);
+}
+
+void btrfs_release_automount_timer(void)
+{
+	if (list_empty(&btrfs_automount_list))
+		cancel_delayed_work(&btrfs_automount_task);
+}
+
+static struct vfsmount *btrfs_automount(struct path *path)
+{
+	struct fs_context fc;
+	struct vfsmount *mnt;
+	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
+
+	if (path->dentry == path->mnt->mnt_root)
+		/* dentry is root of the vfsmount,
+		 * so skip automount processing
+		 */
+		return ERR_PTR(-EISDIR);
+	/* Create a bind-mount to expose the subvol in the mount table */
+	fc.root = path->dentry;
+	fc.sb_flags = 0;
+	fc.source = "btrfs-automount";
+	mnt = vfs_create_mount(&fc);
+	if (IS_ERR(mnt))
+		return mnt;
+	mntget(mnt);
+	mnt_set_expiry(mnt, &btrfs_automount_list);
+	if (timeout > 0)
+		schedule_delayed_work(&btrfs_automount_task, timeout);
+	return mnt;
+}
+
+static int param_set_btrfs_timeout(const char *val, const struct kernel_param *kp)
+{
+	long num;
+	int ret;
+
+	if (!val)
+		return -EINVAL;
+	ret = kstrtol(val, 0, &num);
+	if (ret)
+		return -EINVAL;
+	if (num > 0) {
+		if (num >= INT_MAX / HZ)
+			num = INT_MAX;
+		else
+			num *= HZ;
+		*((int *)kp->arg) = num;
+		if (!list_empty(&btrfs_automount_list))
+			mod_delayed_work(system_wq, &btrfs_automount_task, num);
+	} else {
+		*((int *)kp->arg) = -1*HZ;
+		cancel_delayed_work(&btrfs_automount_task);
+	}
+	return 0;
+}
+
+static int param_get_btrfs_timeout(char *buffer, const struct kernel_param *kp)
+{
+	long num = *((int *)kp->arg);
+
+	if (num > 0) {
+		if (num >= INT_MAX - (HZ - 1))
+			num = INT_MAX / HZ;
+		else
+			num = (num + (HZ - 1)) / HZ;
+	} else
+		num = -1;
+	return scnprintf(buffer, PAGE_SIZE, "%li\n", num);
+}
+
+static const struct kernel_param_ops param_ops_btrfs_timeout = {
+	.set = param_set_btrfs_timeout,
+	.get = param_get_btrfs_timeout,
+};
+#define param_check_btrfs_timeout(name, p) __param_check(name, p, int)
+
+module_param(btrfs_mountpoint_expiry_timeout, btrfs_timeout, 0644);
+MODULE_PARM_DESC(btrfs_mountpoint_expiry_timeout,
+		"Set the btrfs automounted mountpoint timeout value (seconds). "
+		"Values <= 0 turn expiration off.");
+
 static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
 				   unsigned int flags)
 {
@@ -9195,6 +9293,15 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
 
 	generic_fillattr(&init_user_ns, inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
+	if ((inode->i_flags & S_AUTOMOUNT) &&
+	    path->dentry != path->mnt->mnt_root) {
+		/* This is the mounted-on side of the automount,
+		 * so we show the inode number from the ROOT_ITEM key
+		 * and the dev of the mountpoint.
+		 */
+		stat->ino = btrfs_location_to_ino(&BTRFS_I(inode)->root->root_key);
+		stat->dev = BTRFS_I(d_inode(path->mnt->mnt_root))->root->anon_dev;
+	}
 
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
@@ -10844,4 +10951,5 @@ static const struct inode_operations btrfs_symlink_inode_operations = {
 
 const struct dentry_operations btrfs_dentry_operations = {
 	.d_delete	= btrfs_dentry_delete,
+	.d_automount	= btrfs_automount,
 };
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d07b18b2b250..33008e432a15 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -338,6 +338,7 @@ void __btrfs_panic(struct btrfs_fs_info *fs_info, const char *function,
 static void btrfs_put_super(struct super_block *sb)
 {
 	close_ctree(btrfs_sb(sb));
+	btrfs_release_automount_timer();
 }
 
 enum {



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 05/11] VFS: new function: mount_is_internal()
  2021-07-27 22:37 ` [PATCH 05/11] VFS: new function: mount_is_internal() NeilBrown
@ 2021-07-28  2:16   ` Al Viro
  2021-07-28  3:32     ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Al Viro @ 2021-07-28  2:16 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> This patch introduces the concept of an "internal" mount which is a
> mount where a filesystem has create the mount itself.
> 
> Both the mounted-on-dentry and the mount's root dentry must refer to the
> same superblock (they may be the same dentry), and the mounted-on dentry
> must be an automount.

And what happens if you mount --move it?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (10 preceding siblings ...)
  2021-07-27 22:37 ` [PATCH 05/11] VFS: new function: mount_is_internal() NeilBrown
@ 2021-07-28  2:19 ` Al Viro
  2021-07-28  4:58 ` Wang Yugui
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 123+ messages in thread
From: Al Viro @ 2021-07-28  2:19 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:

> We can enhance nfsd to understand that some automounts can be managed.
> "internal mounts" where a filesystem provides an automount point and
> mounts its own directories, can be handled differently by nfsd.
> 
> This series addresses all these issues.  After a few enhancements to the
> VFS to provide needed support, they enhance exportfs and nfsd to cope
> with the concept of internal mounts, and then enhance btrfs to provide
> them.

I'm half asleep right now; will review and reply in detail tomorrow.
The first impression is that it's going to be brittle; properties of
mount really should not depend upon the nature of mountpoint - too many
ways for that to go wrong.

Anyway, tomorrow...

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 05/11] VFS: new function: mount_is_internal()
  2021-07-28  2:16   ` Al Viro
@ 2021-07-28  3:32     ` NeilBrown
  2021-07-30  0:34       ` Al Viro
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-28  3:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, 28 Jul 2021, Al Viro wrote:
> On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > This patch introduces the concept of an "internal" mount which is a
> > mount where a filesystem has create the mount itself.
> > 
> > Both the mounted-on-dentry and the mount's root dentry must refer to the
> > same superblock (they may be the same dentry), and the mounted-on dentry
> > must be an automount.
> 
> And what happens if you mount --move it?
> 
> 
If you move the mount, then the mounted-on dentry would not longer be an
automount (....  I assume???...) so it would not longer be
mount_is_internal().

I think that is reasonable.  Whoever moved the mount has now taken over
responsibility for it - it no longer is controlled by the filesystem.
The moving will have removed the mount from the list of auto-expire
mounts, and the mount-trap will now be exposed and can be mounted-on
again.

It would be just like unmounting the automount, and bind-mounting the
same dentry elsewhere.

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (11 preceding siblings ...)
  2021-07-28  2:19 ` [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly Al Viro
@ 2021-07-28  4:58 ` Wang Yugui
  2021-07-28  6:04   ` Wang Yugui
  2021-07-28  7:06   ` NeilBrown
  2021-07-28 19:35 ` J. Bruce Fields
  2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
  14 siblings, 2 replies; 123+ messages in thread
From: Wang Yugui @ 2021-07-28  4:58 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4396 bytes --]

Hi,

We no longer need the dummy inode(BTRFS_FIRST_FREE_OBJECTID - 1) in this
patch serials?

I tried to backport it to 5.10.x, but it failed to work.
No big modification in this 5.10.x backporting, and all modified pathes
are attached.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/07/28

> There are long-standing problems with btrfs subvols, particularly in
> relation to whether and how they are exposed in the mount table.
> 
>  - /proc/self/mountinfo reports the major:minor device number for each
>     filesystem and when a btrfs subvol is explicitly mounted, the number
>     reported is wrong - it does not match what stat() reports for the
>     mountpoint.
> 
>  - when subvol are not explicitly mounted, they don't appear in
>    mountinfo at all.
> 
> Consequences include that a tool which uses stat() to find the dev of the
> filesystem, then searches mountinfo for that filesystem, will not find
> it.
> 
> Some tools (e.g. findmnt) appear to have been enhanced to cope with this
> strangeness, but it would be best to make btrfs behave more normally.
> 
>   - nfsd cannot currently see the transition to subvol, so reports the
>     main volume and all subvols to the client as being in the same
>     filesystem.  As inode numbers are not unique across all subvols,
>     this can confuse clients.  In particular, 'find' is likely to report a
>     loop.
> 
> subvols can be made to appear in mountinfo using automounts.  However
> nfsd does not cope well with automounts.  It assumes all filesystems to
> be exported are already mounted.  So adding automounts to btrfs would
> break nfsd.
> 
> We can enhance nfsd to understand that some automounts can be managed.
> "internal mounts" where a filesystem provides an automount point and
> mounts its own directories, can be handled differently by nfsd.
> 
> This series addresses all these issues.  After a few enhancements to the
> VFS to provide needed support, they enhance exportfs and nfsd to cope
> with the concept of internal mounts, and then enhance btrfs to provide
> them.
> 
> The NFSv3 support is incomplete.  I'm not sure we can make it work
> "perfectly".  A normal nfsv3 mount seem to work well enough, but if
> mounted with '-o noac', it loses track of the mounted-on inode number
> and complains about inode numbers changing.
> 
> My basic test for these is to mount a btrfs filesystem which contains
> subvols, nfs-export it and mount it with nfsv3 and nfsv4, then run
> 'find' in each of the filesystem and check the contents of
> /proc/self/mountinfo.
> 
> The first patch simply fixes the dev number in mountinfo and could
> possibly be tagged for -stable.
> 
> NeilBrown
> 
> ---
> 
> NeilBrown (11):
>       VFS: show correct dev num in mountinfo
>       VFS: allow d_automount to create in-place bind-mount.
>       VFS: pass lookup_flags into follow_down()
>       VFS: export lookup_mnt()
>       VFS: new function: mount_is_internal()
>       nfsd: include a vfsmount in struct svc_fh
>       exportfs: Allow filehandle lookup to cross internal mount points.
>       nfsd: change get_parent_attributes() to nfsd_get_mounted_on()
>       nfsd: Allow filehandle lookup to cross internal mount points.
>       btrfs: introduce mapping function from location to inum
>       btrfs: use automount to bind-mount all subvol roots.
> 
> 
>  fs/btrfs/btrfs_inode.h   |  12 +++
>  fs/btrfs/inode.c         | 111 ++++++++++++++++++++++++++-
>  fs/btrfs/super.c         |   1 +
>  fs/exportfs/expfs.c      | 100 ++++++++++++++++++++----
>  fs/fhandle.c             |   2 +-
>  fs/internal.h            |   1 -
>  fs/namei.c               |   6 +-
>  fs/namespace.c           |  32 +++++++-
>  fs/nfsd/export.c         |   4 +-
>  fs/nfsd/nfs3xdr.c        |  40 +++++++---
>  fs/nfsd/nfs4proc.c       |   9 ++-
>  fs/nfsd/nfs4xdr.c        | 106 ++++++++++++-------------
>  fs/nfsd/nfsfh.c          |  44 +++++++----
>  fs/nfsd/nfsfh.h          |   3 +-
>  fs/nfsd/nfsproc.c        |   5 +-
>  fs/nfsd/vfs.c            | 162 +++++++++++++++++++++++----------------
>  fs/nfsd/vfs.h            |  12 +--
>  fs/nfsd/xdr4.h           |   2 +-
>  fs/overlayfs/namei.c     |   5 +-
>  fs/xfs/xfs_ioctl.c       |  12 ++-
>  include/linux/exportfs.h |   4 +-
>  include/linux/mount.h    |   4 +
>  include/linux/namei.h    |   2 +-
>  23 files changed, 490 insertions(+), 189 deletions(-)
> 
> --
> Signature


[-- Attachment #2: 0009-nfsd-Allow-filehandle-lookup-to-cross-internal-mount.patch --]
[-- Type: application/octet-stream, Size: 6507 bytes --]

From af944383d835860cf9ebd7859aadbfbc4a5c295c Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] nfsd: Allow filehandle lookup to cross internal mount points.

Enhance nfsd to detect internal mounts and to cross them without
requiring a new export.

Also ensure the fsid reported is different for different submounts.  We
do this by xoring in the ino of the mounted-on directory.  This makes
sense for btrfs at least.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3xdr.c | 28 +++++++++++++++++++++-------
 fs/nfsd/nfs4xdr.c | 34 +++++++++++++++++++++++-----------
 fs/nfsd/nfsfh.c   |  8 +++++++-
 fs/nfsd/vfs.c     | 11 +++++++++--
 4 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 583a228..5e77442 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -156,6 +156,8 @@ static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp)
 	case FSIDSOURCE_UUID:
 		f = ((u64*)fhp->fh_export->ex_uuid)[0];
 		f ^= ((u64*)fhp->fh_export->ex_uuid)[1];
+		if (fhp->fh_mnt != fhp->fh_export->ex_path.mnt)
+			f ^= nfsd_get_mounted_on(fhp->fh_mnt);
 		p = xdr_encode_hyper(p, f);
 		break;
 	}
@@ -859,8 +861,8 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 	__be32 rv = nfserr_noent;
 
 	dparent = cd->fh.fh_dentry;
-	exp  = cd->fh.fh_export;
-	child.mnt = cd->fh.fh_mnt;
+	exp  = exp_get(cd->fh.fh_export);
+	child.mnt = mntget(cd->fh.fh_mnt);
 
 	if (isdotent(name, namlen)) {
 		if (namlen == 2) {
@@ -1112,15 +1114,27 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 			child.dentry = dget(dparent);
 	} else
 		child.dentry = lookup_positive_unlocked(name, dparent, namlen);
-	if (IS_ERR(child.dentry))
+	if (IS_ERR(child.dentry)) {
+		mntput(child.mnt);
+		exp_put(exp);
 		return rv;
-	if (d_mountpoint(child.dentry))
-		goto out;
-	if (child.dentry->d_inode->i_ino != ino)
+	}
+	/* If child is a mountpoint, then we want to expose the fact
+	 * so client can create a mountpoint.  If not, then a different
+	 * ino number probably means a race with rename, so avoid providing
+	 * too much detail.
+	 */
+	if (nfsd_mountpoint(child.dentry, exp)) {
+		int err;
+		err = nfsd_cross_mnt(cd->rqstp, &child, &exp);
+		if (err)
+			goto out;
+	} else if (child.dentry->d_inode->i_ino != ino)
 		goto out;
 	rv = fh_compose(fhp, exp, &child, &cd->fh);
 out:
-	dput(child.dentry);
+	path_put(&child);
+	exp_put(exp);
 	return rv;
 }
 
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d5683b6a74b2..4dbc99ed2c8b 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2817,6 +2817,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 	struct kstat stat;
 	struct svc_fh *tempfh = NULL;
 	struct kstatfs statfs;
+	u64 mounted_on_ino;
+	u64 sub_fsid;
 	__be32 *p;
 	int starting_len = xdr->buf->len;
 	int attrlen_offset;
@@ -2871,6 +2873,24 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 			goto out;
 		fhp = tempfh;
 	}
+	if ((bmval0 & FATTR4_WORD0_FSID) ||
+	    (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID)) {
+		mounted_on_ino = stat.ino;
+		sub_fsid = 0;
+		/*
+		 * The inode number that the current mnt is mounted on is
+		 * used for MOUNTED_ON_FILED if we are at the root,
+		 * and for sub_fsid if mnt is not the export mnt.
+		 */
+		if (ignore_crossmnt == 0) {
+			u64 moi = nfsd_get_mounted_on(mnt);
+
+			if (dentry == mnt->mnt_root && moi)
+				mounted_on_ino = moi;
+			if (mnt != exp->ex_path.mnt)
+				sub_fsid = moi;
+		}
+	}
 	if (bmval0 & FATTR4_WORD0_ACL) {
 		err = nfsd4_get_nfs4_acl(rqstp, dentry, &acl);
 		if (err == -EOPNOTSUPP)
@@ -3008,6 +3028,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		case FSIDSOURCE_UUID:
 			p = xdr_encode_opaque_fixed(p, exp->ex_uuid,
 								EX_UUID_LEN);
+			if (mnt != exp->ex_path.mnt)
+				*(u64*)(p-2) ^= sub_fsid;
 			break;
 		}
 	}
@@ -3253,20 +3275,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		*p++ = cpu_to_be32(stat.mtime.tv_nsec);
 	}
 	if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
-		u64 ino;
-
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
 			goto out_resource;
-		/*
-		 * Get parent's attributes if not ignoring crossmount
-		 * and this is the root of a cross-mounted filesystem.
-		 */
-		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root)
-			ino = nfsd_get_mounted_on(mnt);
-		if (!ino)
-			ino = stat.ino;
-		p = xdr_encode_hyper(p, ino);
+		p = xdr_encode_hyper(p, mounted_on_ino);
 	}
 #ifdef CONFIG_NFSD_PNFS
 	if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 4023046f63e2..4b53838bca89 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -9,7 +9,7 @@
  */
 
 #include <linux/exportfs.h>
-
+#include <linux/namei.h>
 #include <linux/sunrpc/svcauth_gss.h>
 #include "nfsd.h"
 #include "vfs.h"
@@ -277,6 +277,12 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 		if (IS_ERR_OR_NULL(dentry))
 			trace_nfsd_set_fh_dentry_badhandle(rqstp, fhp,
 					dentry ?  PTR_ERR(dentry) : -ESTALE);
+		else if (nfsd_mountpoint(dentry, exp)) {
+			struct path path = { .mnt = mnt, .dentry = dentry };
+			follow_down(&path, LOOKUP_AUTOMOUNT);
+			mnt = path.mnt;
+			dentry = path.dentry;
+		}
 	}
 	if (dentry == NULL)
 		goto out;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index baa12ac36ece..22523e1cd478 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -64,7 +64,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
 			    .dentry = dget(path_parent->dentry)};
 	int err = 0;
 
-	err = follow_down(&path, 0);
+	err = follow_down(&path, LOOKUP_AUTOMOUNT);
 	if (err < 0)
 		goto out;
 	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
@@ -73,6 +73,13 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
 		path_put(&path);
 		goto out;
 	}
+	if (mount_is_internal(path.mnt)) {
+		/* Use the new path, but don't look for a new export */
+		/* FIXME should I check NOHIDE in this case?? */
+		path_put(path_parent);
+		*path_parent = path;
+		goto out;
+	}
 
 	exp2 = rqst_exp_get_by_name(rqstp, &path);
 	if (IS_ERR(exp2)) {
@@ -157,7 +164,7 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 		return 1;
 	if (nfsd4_is_junction(dentry))
 		return 1;
-	if (d_mountpoint(dentry))
+	if (d_managed(dentry))
 		/*
 		 * Might only be a mountpoint in a different namespace,
 		 * but we need to check.
-- 
2.32.0


[-- Attachment #3: 0011-btrfs-use-automount-to-bind-mount-all-subvol-roots.patch --]
[-- Type: application/octet-stream, Size: 6484 bytes --]

From e818a147155d2d9b66b986e4617455fd6a1454aa Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] btrfs: use automount to bind-mount all subvol roots.

All subvol roots are now marked as automounts.  If the d_automount()
function determines that the dentry is not the root of the vfsmount, it
creates a simple loop-back mount of the dentry onto itself.  If it
determines that it IS the root of the vfsmount, it returns -EISDIR so
that no further automounting is attempted.

btrfs_getattr pays special attention to these automount dentries.
If it is NOT the root of the vfsmount:
 - the ->dev is reported as that for the rest of the vfsmount
 - the ->ino is reported as the subvol objectid, suitable transformed
   to avoid collision.

This way the same inode appear to be different depending on which mount
it is in.

automounted vfsmounts are kept on a list and timeout after 500 to 1000
seconds of last use.  This is configurable via a module parameter.
The tracking and timeout of automounts is copied from NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/btrfs/btrfs_inode.h |   2 +
 fs/btrfs/inode.c       | 108 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/super.c       |   1 +
 3 files changed, 111 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index a4b5f38196e6..f03056cacc4a 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -387,4 +387,6 @@ static inline void btrfs_print_data_csum_error(struct btrfs_inode *inode,
 			mirror_num);
 }
 
+void btrfs_release_automount_timer(void);
+
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 25db806ca68a..809b97defafe 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -31,6 +31,8 @@
 #include <linux/migrate.h>
 #include <linux/sched/mm.h>
 #include <linux/iomap.h>
+#include <linux/fs_context.h>
+#include <linux/mount.h>
 #include <asm/unaligned.h>
 #include "misc.h"
 #include "ctree.h"
@@ -5782,6 +5783,8 @@ static int btrfs_init_locked_inode(struct inode *inode, void *p)
 	struct btrfs_iget_args *args = p;
 
 	inode->i_ino = args->ino;
+	if (args->ino == BTRFS_FIRST_FREE_OBJECTID)
+		inode->i_flags |= S_AUTOMOUNT;
 	BTRFS_I(inode)->location.objectid = args->ino;
 	BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY;
 	BTRFS_I(inode)->location.offset = 0;
@@ -5985,6 +5988,101 @@ static int btrfs_dentry_delete(const struct dentry *dentry)
 	return 0;
 }
 
+static void btrfs_expire_automounts(struct work_struct *work);
+static LIST_HEAD(btrfs_automount_list);
+static DECLARE_DELAYED_WORK(btrfs_automount_task, btrfs_expire_automounts);
+int btrfs_mountpoint_expiry_timeout = 500 * HZ;
+static void btrfs_expire_automounts(struct work_struct *work)
+{
+	struct list_head *list = &btrfs_automount_list;
+	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
+
+	mark_mounts_for_expiry(list);
+	if (!list_empty(list) && timeout > 0)
+		schedule_delayed_work(&btrfs_automount_task, timeout);
+}
+
+void btrfs_release_automount_timer(void)
+{
+	if (list_empty(&btrfs_automount_list))
+		cancel_delayed_work(&btrfs_automount_task);
+}
+
+static struct vfsmount *btrfs_automount(struct path *path)
+{
+	struct fs_context fc;
+	struct vfsmount *mnt;
+	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
+
+	if (path->dentry == path->mnt->mnt_root)
+		/* dentry is root of the vfsmount,
+		 * so skip automount processing
+		 */
+		return ERR_PTR(-EISDIR);
+	/* Create a bind-mount to expose the subvol in the mount table */
+	fc.root = path->dentry;
+	fc.sb_flags = 0;
+	fc.source = "btrfs-automount";
+	mnt = vfs_create_mount(&fc);
+	if (IS_ERR(mnt))
+		return mnt;
+	mntget(mnt);
+	mnt_set_expiry(mnt, &btrfs_automount_list);
+	if (timeout > 0)
+		schedule_delayed_work(&btrfs_automount_task, timeout);
+	return mnt;
+}
+
+static int param_set_btrfs_timeout(const char *val, const struct kernel_param *kp)
+{
+	long num;
+	int ret;
+
+	if (!val)
+		return -EINVAL;
+	ret = kstrtol(val, 0, &num);
+	if (ret)
+		return -EINVAL;
+	if (num > 0) {
+		if (num >= INT_MAX / HZ)
+			num = INT_MAX;
+		else
+			num *= HZ;
+		*((int *)kp->arg) = num;
+		if (!list_empty(&btrfs_automount_list))
+			mod_delayed_work(system_wq, &btrfs_automount_task, num);
+	} else {
+		*((int *)kp->arg) = -1*HZ;
+		cancel_delayed_work(&btrfs_automount_task);
+	}
+	return 0;
+}
+
+static int param_get_btrfs_timeout(char *buffer, const struct kernel_param *kp)
+{
+	long num = *((int *)kp->arg);
+
+	if (num > 0) {
+		if (num >= INT_MAX - (HZ - 1))
+			num = INT_MAX / HZ;
+		else
+			num = (num + (HZ - 1)) / HZ;
+	} else
+		num = -1;
+	return scnprintf(buffer, PAGE_SIZE, "%li\n", num);
+}
+
+static const struct kernel_param_ops param_ops_btrfs_timeout = {
+	.set = param_set_btrfs_timeout,
+	.get = param_get_btrfs_timeout,
+};
+#define param_check_btrfs_timeout(name, p) __param_check(name, p, int)
+
+module_param(btrfs_mountpoint_expiry_timeout, btrfs_timeout, 0644);
+MODULE_PARM_DESC(btrfs_mountpoint_expiry_timeout,
+		"Set the btrfs automounted mountpoint timeout value (seconds). "
+		"Values <= 0 turn expiration off.");
+
 static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
 				   unsigned int flags)
 {
@@ -8874,6 +8972,15 @@ static int btrfs_getattr(const struct path *path, struct kstat *stat,
 
 	generic_fillattr(inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
+	if ((inode->i_flags & S_AUTOMOUNT) &&
+	    path->dentry != path->mnt->mnt_root) {
+		/* This is the mounted-on side of the automount,
+		 * so we show the inode number from the ROOT_ITEM key
+		 * and the dev of the mountpoint.
+		 */
+		stat->ino = btrfs_location_to_ino(&BTRFS_I(inode)->root->root_key);
+		stat->dev = BTRFS_I(d_inode(path->mnt->mnt_root))->root->anon_dev;
+	}
 
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
@@ -10844,4 +10951,5 @@ static const struct inode_operations btrfs_symlink_inode_operations = {
 
 const struct dentry_operations btrfs_dentry_operations = {
 	.d_delete	= btrfs_dentry_delete,
+	.d_automount	= btrfs_automount,
 };
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d07b18b2b250..33008e432a15 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -338,6 +338,7 @@ void __btrfs_panic(struct btrfs_fs_info *fs_info, const char *function,
 static void btrfs_put_super(struct super_block *sb)
 {
 	close_ctree(btrfs_sb(sb));
+	btrfs_release_automount_timer();
 }
 
 enum {
-- 
2.32.0


[-- Attachment #4: 0006-nfsd-include-a-vfsmount-in-struct-svc_fh.patch --]
[-- Type: application/octet-stream, Size: 27410 bytes --]

From b7e1488f5f44806c5ccff692adac907a7c57e545 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] nfsd: include a vfsmount in struct svc_fh

A future patch will allow exportfs_decode_fh{,_raw} to return a
different vfsmount than the one passed.  This is specifically for btrfs,
but would be useful for any filesystem that presents as multiple volumes
(i.e. different st_dev, each with their own st_ino number-space).

For nfsd, this means that the mnt in the svc_export may not apply to all
filehandles reached from that export.  So svc_fh needs to store a
distinct vfsmount as well.

For now, fs->fh_mnt == fh->fh_export->ex_path.mnt, but that will change.

Changes include:
  fh_compose()
  nfsd_lookup_dentry()
     now take a *path instead of a *dentry

  nfsd4_encode_fattr()
  nfsd4_encode_fattr_to_buf()
     now take a *vfsmount as well as a *dentry

  nfsd_cross_mnt() now takes a *path instead of a **dentry
     to pass in, and get back, the mnt and dentry.

  nfsd_lookup_parent() used to take a *dentry and a **dentry.
     now it just takes a *path.  This is the *path that as passed
     to nfsd_lookup_dentry().

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/export.c   |   4 +-
 fs/nfsd/nfs3xdr.c  |  22 ++++----
 fs/nfsd/nfs4proc.c |   9 +--
 fs/nfsd/nfs4xdr.c  |  55 ++++++++++---------
 fs/nfsd/nfsfh.c    |  30 ++++++----
 fs/nfsd/nfsfh.h    |   3 +-
 fs/nfsd/nfsproc.c  |   5 +-
 fs/nfsd/vfs.c      | 133 ++++++++++++++++++++++++---------------------
 fs/nfsd/vfs.h      |  10 ++--
 fs/nfsd/xdr4.h     |   2 +-
 10 files changed, 150 insertions(+), 123 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 9421dae22737..e506cbe78b4f 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -1003,7 +1003,7 @@ exp_rootfh(struct net *net, struct auth_domain *clp, char *name,
 	 * fh must be initialized before calling fh_compose
 	 */
 	fh_init(&fh, maxsize);
-	if (fh_compose(&fh, exp, path.dentry, NULL))
+	if (fh_compose(&fh, exp, &path, NULL))
 		err = -EINVAL;
 	else
 		err = 0;
@@ -1178,7 +1178,7 @@ exp_pseudoroot(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	exp = rqst_find_fsidzero_export(rqstp);
 	if (IS_ERR(exp))
 		return nfserrno(PTR_ERR(exp));
-	rv = fh_compose(fhp, exp, exp->ex_path.dentry, NULL);
+	rv = fh_compose(fhp, exp, &exp->ex_path, NULL);
 	exp_put(exp);
 	return rv;
 }
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 0a5ebc52e6a9..67af0c5c1543 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -1089,36 +1089,38 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 		 const char *name, int namlen, u64 ino)
 {
 	struct svc_export	*exp;
-	struct dentry		*dparent, *dchild;
+	struct dentry		*dparent;
+	struct path		child;
 	__be32 rv = nfserr_noent;
 
 	dparent = cd->fh.fh_dentry;
 	exp  = cd->fh.fh_export;
+	child.mnt = cd->fh.fh_mnt;
 
 	if (isdotent(name, namlen)) {
 		if (namlen == 2) {
-			dchild = dget_parent(dparent);
+			child.dentry = dget_parent(dparent);
 			/*
 			 * Don't return filehandle for ".." if we're at
 			 * the filesystem or export root:
 			 */
-			if (dchild == dparent)
+			if (child.dentry == dparent)
 				goto out;
 			if (dparent == exp->ex_path.dentry)
 				goto out;
 		} else
-			dchild = dget(dparent);
+			child.dentry = dget(dparent);
 	} else
-		dchild = lookup_positive_unlocked(name, dparent, namlen);
-	if (IS_ERR(dchild))
+		child.dentry = lookup_positive_unlocked(name, dparent, namlen);
+	if (IS_ERR(child.dentry))
 		return rv;
-	if (d_mountpoint(dchild))
+	if (d_mountpoint(child.dentry))
 		goto out;
-	if (dchild->d_inode->i_ino != ino)
+	if (child.dentry->d_inode->i_ino != ino)
 		goto out;
-	rv = fh_compose(fhp, exp, dchild, &cd->fh);
+	rv = fh_compose(fhp, exp, &child, &cd->fh);
 out:
-	dput(dchild);
+	dput(child.dentry);
 	return rv;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 486c5dba4b65..743b9315cd3e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -902,7 +902,7 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	struct nfsd4_secinfo *secinfo = &u->secinfo;
 	struct svc_export *exp;
-	struct dentry *dentry;
+	struct path path;
 	__be32 err;
 
 	err = fh_verify(rqstp, &cstate->current_fh, S_IFDIR, NFSD_MAY_EXEC);
@@ -910,16 +910,16 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		return err;
 	err = nfsd_lookup_dentry(rqstp, &cstate->current_fh,
 				    secinfo->si_name, secinfo->si_namelen,
-				    &exp, &dentry);
+				    &exp, &path);
 	if (err)
 		return err;
 	fh_unlock(&cstate->current_fh);
-	if (d_really_is_negative(dentry)) {
+	if (d_really_is_negative(path.dentry)) {
 		exp_put(exp);
 		err = nfserr_noent;
 	} else
 		secinfo->si_exp = exp;
-	dput(dentry);
+	path_put(&path);
 	if (cstate->minorversion)
 		/* See rfc 5661 section 2.6.3.1.1.8 */
 		fh_put(&cstate->current_fh);
@@ -1930,6 +1930,7 @@ _nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	p = buf;
 	status = nfsd4_encode_fattr_to_buf(&p, count, &cstate->current_fh,
 				    cstate->current_fh.fh_export,
+				    cstate->current_fh.fh_mnt,
 				    cstate->current_fh.fh_dentry,
 				    verify->ve_bmval,
 				    rqstp, 0);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 7abeccb975b2..21c277fa28ae 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2823,9 +2823,9 @@ nfsd4_encode_bitmap(struct xdr_stream *xdr, u32 bmval0, u32 bmval1, u32 bmval2)
  */
 static __be32
 nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
-		struct svc_export *exp,
-		struct dentry *dentry, u32 *bmval,
-		struct svc_rqst *rqstp, int ignore_crossmnt)
+		   struct svc_export *exp,
+		   struct vfsmount *mnt, struct dentry *dentry,
+		   u32 *bmval, struct svc_rqst *rqstp, int ignore_crossmnt)
 {
 	u32 bmval0 = bmval[0];
 	u32 bmval1 = bmval[1];
@@ -2851,7 +2851,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 	struct nfsd4_compoundres *resp = rqstp->rq_resp;
 	u32 minorversion = resp->cstate.minorversion;
 	struct path path = {
-		.mnt	= exp->ex_path.mnt,
+		.mnt	= mnt,
 		.dentry	= dentry,
 	};
 	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
@@ -2882,7 +2882,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		if (!tempfh)
 			goto out;
 		fh_init(tempfh, NFS4_FHSIZE);
-		status = fh_compose(tempfh, exp, dentry, NULL);
+		status = fh_compose(tempfh, exp, &path, NULL);
 		if (status)
 			goto out;
 		fhp = tempfh;
@@ -3274,13 +3274,12 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
-                	goto out_resource;
+			goto out_resource;
 		/*
 		 * Get parent's attributes if not ignoring crossmount
 		 * and this is the root of a cross-mounted filesystem.
 		 */
-		if (ignore_crossmnt == 0 &&
-		    dentry == exp->ex_path.mnt->mnt_root) {
+		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root) {
 			err = get_parent_attributes(exp, &parent_stat);
 			if (err)
 				goto out_nfserr;
@@ -3380,17 +3379,18 @@ static void svcxdr_init_encode_from_buffer(struct xdr_stream *xdr,
 }
 
 __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
-			struct svc_fh *fhp, struct svc_export *exp,
-			struct dentry *dentry, u32 *bmval,
-			struct svc_rqst *rqstp, int ignore_crossmnt)
+				 struct svc_fh *fhp, struct svc_export *exp,
+				 struct vfsmount *mnt, struct dentry *dentry,
+				 u32 *bmval, struct svc_rqst *rqstp,
+				 int ignore_crossmnt)
 {
 	struct xdr_buf dummy;
 	struct xdr_stream xdr;
 	__be32 ret;
 
 	svcxdr_init_encode_from_buffer(&xdr, &dummy, *p, words << 2);
-	ret = nfsd4_encode_fattr(&xdr, fhp, exp, dentry, bmval, rqstp,
-							ignore_crossmnt);
+	ret = nfsd4_encode_fattr(&xdr, fhp, exp, mnt, dentry, bmval, rqstp,
+				 ignore_crossmnt);
 	*p = xdr.p;
 	return ret;
 }
@@ -3409,14 +3409,16 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 			const char *name, int namlen)
 {
 	struct svc_export *exp = cd->rd_fhp->fh_export;
-	struct dentry *dentry;
+	struct path path;
 	__be32 nfserr;
 	int ignore_crossmnt = 0;
 
-	dentry = lookup_positive_unlocked(name, cd->rd_fhp->fh_dentry, namlen);
-	if (IS_ERR(dentry))
-		return nfserrno(PTR_ERR(dentry));
+	path.dentry = lookup_positive_unlocked(name, cd->rd_fhp->fh_dentry,
+					      namlen);
+	if (IS_ERR(path.dentry))
+		return nfserrno(PTR_ERR(path.dentry));
 
+	path.mnt = mntget(cd->rd_fhp->fh_mnt);
 	exp_get(exp);
 	/*
 	 * In the case of a mountpoint, the client may be asking for
@@ -3425,7 +3427,7 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 	 * we will not follow the cross mount and will fill the attribtutes
 	 * directly from the mountpoint dentry.
 	 */
-	if (nfsd_mountpoint(dentry, exp)) {
+	if (nfsd_mountpoint(path.dentry, exp)) {
 		int err;
 
 		if (!(exp->ex_flags & NFSEXP_V4ROOT)
@@ -3434,11 +3436,11 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 			goto out_encode;
 		}
 		/*
-		 * Why the heck aren't we just using nfsd_lookup??
+		 * Why the heck aren't we just using nfsd_lookup_dentry??
 		 * Different "."/".." handling?  Something else?
 		 * At least, add a comment here to explain....
 		 */
-		err = nfsd_cross_mnt(cd->rd_rqstp, &dentry, &exp);
+		err = nfsd_cross_mnt(cd->rd_rqstp, &path, &exp);
 		if (err) {
 			nfserr = nfserrno(err);
 			goto out_put;
@@ -3446,13 +3448,13 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 		nfserr = check_nfsd_access(exp, cd->rd_rqstp);
 		if (nfserr)
 			goto out_put;
-
 	}
 out_encode:
-	nfserr = nfsd4_encode_fattr(xdr, NULL, exp, dentry, cd->rd_bmval,
-					cd->rd_rqstp, ignore_crossmnt);
+	nfserr = nfsd4_encode_fattr(xdr, NULL, exp, path.mnt, path.dentry,
+				    cd->rd_bmval, cd->rd_rqstp,
+				    ignore_crossmnt);
 out_put:
-	dput(dentry);
+	path_put(&path);
 	exp_put(exp);
 	return nfserr;
 }
@@ -3651,8 +3653,9 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
 	struct svc_fh *fhp = getattr->ga_fhp;
 	struct xdr_stream *xdr = &resp->xdr;
 
-	return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry,
-				    getattr->ga_bmval, resp->rqstp, 0);
+	return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export,
+				  fhp->fh_mnt, fhp->fh_dentry,
+				  getattr->ga_bmval, resp->rqstp, 0);
 }
 
 static __be32
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c475d2271f9c..0bf7ac13ae50 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -299,6 +299,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	}
 
 	fhp->fh_dentry = dentry;
+	fhp->fh_mnt = mntget(exp->ex_path.mnt);
 	fhp->fh_export = exp;
 	return 0;
 out:
@@ -556,7 +557,7 @@ static void set_version_and_fsid_type(struct svc_fh *fhp, struct svc_export *exp
 }
 
 __be32
-fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
+fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct path *path,
 	   struct svc_fh *ref_fh)
 {
 	/* ref_fh is a reference file handle.
@@ -567,13 +568,13 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	 *
 	 */
 
-	struct inode * inode = d_inode(dentry);
+	struct inode * inode = d_inode(path->dentry);
 	dev_t ex_dev = exp_sb(exp)->s_dev;
 
 	dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
 		MAJOR(ex_dev), MINOR(ex_dev),
 		(long) d_inode(exp->ex_path.dentry)->i_ino,
-		dentry,
+		path->dentry,
 		(inode ? inode->i_ino : 0));
 
 	/* Choose filehandle version and fsid type based on
@@ -590,14 +591,15 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 
 	if (fhp->fh_locked || fhp->fh_dentry) {
 		printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n",
-		       dentry);
+		       path->dentry);
 	}
 	if (fhp->fh_maxsize < NFS_FHSIZE)
 		printk(KERN_ERR "fh_compose: called with maxsize %d! %pd2\n",
 		       fhp->fh_maxsize,
-		       dentry);
+		       path->dentry);
 
-	fhp->fh_dentry = dget(dentry); /* our internal copy */
+	fhp->fh_dentry = dget(path->dentry); /* our internal copy */
+	fhp->fh_mnt = mntget(path->mnt);
 	fhp->fh_export = exp_get(exp);
 
 	if (fhp->fh_handle.fh_version == 0xca) {
@@ -609,9 +611,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 		fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev;
 		fhp->fh_handle.ofh_xino =
 			ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
-		fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry));
+		fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(path->dentry));
 		if (inode)
-			_fh_update_old(dentry, exp, &fhp->fh_handle);
+			_fh_update_old(path->dentry, exp, &fhp->fh_handle);
 	} else {
 		fhp->fh_handle.fh_size =
 			key_len(fhp->fh_handle.fh_fsid_type) + 4;
@@ -624,7 +626,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 			exp->ex_fsid, exp->ex_uuid);
 
 		if (inode)
-			_fh_update(fhp, exp, dentry);
+			_fh_update(fhp, exp, path->dentry);
 		if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
 			fh_put(fhp);
 			return nfserr_opnotsupp;
@@ -675,8 +677,10 @@ fh_update(struct svc_fh *fhp)
 void
 fh_put(struct svc_fh *fhp)
 {
-	struct dentry * dentry = fhp->fh_dentry;
-	struct svc_export * exp = fhp->fh_export;
+	struct dentry *dentry = fhp->fh_dentry;
+	struct svc_export *exp = fhp->fh_export;
+	struct vfsmount *mnt = fhp->fh_mnt;
+
 	if (dentry) {
 		fh_unlock(fhp);
 		fhp->fh_dentry = NULL;
@@ -684,6 +688,10 @@ fh_put(struct svc_fh *fhp)
 		fh_clear_wcc(fhp);
 	}
 	fh_drop_write(fhp);
+	if (mnt) {
+		mntput(mnt);
+		fhp->fh_mnt = NULL;
+	}
 	if (exp) {
 		exp_put(exp);
 		fhp->fh_export = NULL;
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 6106697adc04..26c02209babd 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -31,6 +31,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
 typedef struct svc_fh {
 	struct knfsd_fh		fh_handle;	/* FH data */
 	int			fh_maxsize;	/* max size for fh_handle */
+	struct vfsmount	*	fh_mnt;		/* mnt, possibly of subvol */
 	struct dentry *		fh_dentry;	/* validated dentry */
 	struct svc_export *	fh_export;	/* export pointer */
 
@@ -171,7 +172,7 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
  * Function prototypes
  */
 __be32	fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
-__be32	fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
+__be32	fh_compose(struct svc_fh *, struct svc_export *, struct path *, struct svc_fh *);
 __be32	fh_update(struct svc_fh *);
 void	fh_put(struct svc_fh *);
 
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 60d7c59e7935..245199b0e630 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -268,6 +268,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 	struct iattr	*attr = &argp->attrs;
 	struct inode	*inode;
 	struct dentry	*dchild;
+	struct path	path;
 	int		type, mode;
 	int		hosterr;
 	dev_t		rdev = 0, wanted = new_decode_dev(attr->ia_size);
@@ -298,7 +299,9 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 		goto out_unlock;
 	}
 	fh_init(newfhp, NFS_FHSIZE);
-	resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
+	path.mnt = dirfhp->fh_mnt;
+	path.dentry = dchild;
+	resp->status = fh_compose(newfhp, dirfhp->fh_export, &path, dirfhp);
 	if (!resp->status && d_really_is_negative(dchild))
 		resp->status = nfserr_noent;
 	dput(dchild);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 7c32edcfd2e9..c0c6920f25a4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -49,27 +49,26 @@
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
-/* 
- * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
+/*
+ * Called from nfsd_lookup and encode_dirent. Check if we have crossed
  * a mount point.
- * Returns -EAGAIN or -ETIMEDOUT leaving *dpp and *expp unchanged,
- *  or nfs_ok having possibly changed *dpp and *expp
+ * Returns -EAGAIN or -ETIMEDOUT leaving *path and *expp unchanged,
+ *  or nfs_ok having possibly changed *path and *expp
  */
 int
-nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, 
-		        struct svc_export **expp)
+nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
+	       struct svc_export **expp)
 {
 	struct svc_export *exp = *expp, *exp2 = NULL;
-	struct dentry *dentry = *dpp;
-	struct path path = {.mnt = mntget(exp->ex_path.mnt),
-			    .dentry = dget(dentry)};
+	struct path path = {.mnt = mntget(path_parent->mnt),
+			    .dentry = dget(path_parent->dentry)};
 	int err = 0;
 
 	err = follow_down(&path, 0);
 	if (err < 0)
 		goto out;
-	if (path.mnt == exp->ex_path.mnt && path.dentry == dentry &&
-	    nfsd_mountpoint(dentry, exp) == 2) {
+	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
+	    nfsd_mountpoint(path.dentry, exp) == 2) {
 		/* This is only a mountpoint in some other namespace */
 		path_put(&path);
 		goto out;
@@ -93,19 +92,14 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 	if (nfsd_v4client(rqstp) ||
 		(exp->ex_flags & NFSEXP_CROSSMOUNT) || EX_NOHIDE(exp2)) {
 		/* successfully crossed mount point */
-		/*
-		 * This is subtle: path.dentry is *not* on path.mnt
-		 * at this point.  The only reason we are safe is that
-		 * original mnt is pinned down by exp, so we should
-		 * put path *before* putting exp
-		 */
-		*dpp = path.dentry;
-		path.dentry = dentry;
+		path_put(path_parent);
+		*path_parent = path;
+		exp_put(exp);
 		*expp = exp2;
-		exp2 = exp;
+	} else {
+		path_put(&path);
+		exp_put(exp2);
 	}
-	path_put(&path);
-	exp_put(exp2);
 out:
 	return err;
 }
@@ -121,27 +115,30 @@ static void follow_to_parent(struct path *path)
 	path->dentry = dp;
 }
 
-static int nfsd_lookup_parent(struct svc_rqst *rqstp, struct dentry *dparent, struct svc_export **exp, struct dentry **dentryp)
+static int nfsd_lookup_parent(struct svc_rqst *rqstp, struct svc_export **exp,
+			      struct path *path)
 {
+	struct path path2;
 	struct svc_export *exp2;
-	struct path path = {.mnt = mntget((*exp)->ex_path.mnt),
-			    .dentry = dget(dparent)};
 
-	follow_to_parent(&path);
-
-	exp2 = rqst_exp_parent(rqstp, &path);
+	path2 = *path;
+	path_get(&path2);
+	follow_to_parent(&path2);
+	exp2 = rqst_exp_parent(rqstp, path);
 	if (PTR_ERR(exp2) == -ENOENT) {
-		*dentryp = dget(dparent);
+		/* leave path unchanged */
+		path_put(&path2);
+		return 0;
 	} else if (IS_ERR(exp2)) {
-		path_put(&path);
+		path_put(&path2);
 		return PTR_ERR(exp2);
 	} else {
-		*dentryp = dget(path.dentry);
+		path_put(path);
+		*path = path2;
 		exp_put(*exp);
 		*exp = exp2;
+		return 0;
 	}
-	path_put(&path);
-	return 0;
 }
 
 /*
@@ -172,29 +169,32 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 __be32
 nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		   const char *name, unsigned int len,
-		   struct svc_export **exp_ret, struct dentry **dentry_ret)
+		   struct svc_export **exp_ret, struct path *ret)
 {
 	struct svc_export	*exp;
 	struct dentry		*dparent;
-	struct dentry		*dentry;
 	int			host_err;
 
 	dprintk("nfsd: nfsd_lookup(fh %s, %.*s)\n", SVCFH_fmt(fhp), len,name);
 
 	dparent = fhp->fh_dentry;
+	ret->mnt = mntget(fhp->fh_mnt);
 	exp = exp_get(fhp->fh_export);
 
 	/* Lookup the name, but don't follow links */
 	if (isdotent(name, len)) {
 		if (len==1)
-			dentry = dget(dparent);
+			ret->dentry = dget(dparent);
 		else if (dparent != exp->ex_path.dentry)
-			dentry = dget_parent(dparent);
+			ret->dentry = dget_parent(dparent);
 		else if (!EX_NOHIDE(exp) && !nfsd_v4client(rqstp))
-			dentry = dget(dparent); /* .. == . just like at / */
+			ret->dentry = dget(dparent); /* .. == . just like at / */
 		else {
-			/* checking mountpoint crossing is very different when stepping up */
-			host_err = nfsd_lookup_parent(rqstp, dparent, &exp, &dentry);
+			/* checking mountpoint crossing is very different when
+			 * stepping up
+			 */
+			ret->dentry = dget(dparent);
+			host_err = nfsd_lookup_parent(rqstp, &exp, ret);
 			if (host_err)
 				goto out_nfserr;
 		}
@@ -205,11 +205,13 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		 * need to take the child's i_mutex:
 		 */
 		fh_lock_nested(fhp, I_MUTEX_PARENT);
-		dentry = lookup_one_len(name, dparent, len);
-		host_err = PTR_ERR(dentry);
-		if (IS_ERR(dentry))
+		ret->dentry = lookup_one_len(name, dparent, len);
+		host_err = PTR_ERR(ret->dentry);
+		if (IS_ERR(ret->dentry)) {
+			ret->dentry = NULL;
 			goto out_nfserr;
-		if (nfsd_mountpoint(dentry, exp)) {
+		}
+		if (nfsd_mountpoint(ret->dentry, exp)) {
 			/*
 			 * We don't need the i_mutex after all.  It's
 			 * still possible we could open this (regular
@@ -219,18 +221,16 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			 * and a mountpoint won't be renamed:
 			 */
 			fh_unlock(fhp);
-			if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) {
-				dput(dentry);
+			if ((host_err = nfsd_cross_mnt(rqstp, ret, &exp)))
 				goto out_nfserr;
-			}
 		}
 	}
-	*dentry_ret = dentry;
 	*exp_ret = exp;
 	return 0;
 
 out_nfserr:
 	exp_put(exp);
+	path_put(ret);
 	return nfserrno(host_err);
 }
 
@@ -251,13 +251,13 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 				unsigned int len, struct svc_fh *resfh)
 {
 	struct svc_export	*exp;
-	struct dentry		*dentry;
+	struct path		path;
 	__be32 err;
 
 	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
 	if (err)
 		return err;
-	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry);
+	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &path);
 	if (err)
 		return err;
 	err = check_nfsd_access(exp, rqstp);
@@ -267,11 +267,11 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 	 * Note: we compose the file handle now, but as the
 	 * dentry may be negative, it may need to be updated.
 	 */
-	err = fh_compose(resfh, exp, dentry, fhp);
-	if (!err && d_really_is_negative(dentry))
+	err = fh_compose(resfh, exp, &path, fhp);
+	if (!err && d_really_is_negative(path.dentry))
 		err = nfserr_noent;
 out:
-	dput(dentry);
+	path_put(&path);
 	exp_put(exp);
 	return err;
 }
@@ -740,7 +740,7 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	__be32		err;
 	int		host_err = 0;
 
-	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.mnt = fhp->fh_mnt;
 	path.dentry = fhp->fh_dentry;
 	inode = d_inode(path.dentry);
 
@@ -1350,6 +1350,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		int type, dev_t rdev, struct svc_fh *resfhp)
 {
 	struct dentry	*dentry, *dchild = NULL;
+	struct path	path;
 	__be32		err;
 	int		host_err;
 
@@ -1371,7 +1372,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	host_err = PTR_ERR(dchild);
 	if (IS_ERR(dchild))
 		return nfserrno(host_err);
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+	path.mnt = fhp->fh_mnt;
+	path.dentry = dchild;
+	err = fh_compose(resfhp, fhp->fh_export, &path, fhp);
 	/*
 	 * We unconditionally drop our ref to dchild as fh_compose will have
 	 * already grabbed its own ref for it.
@@ -1390,11 +1393,12 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 __be32
 do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		char *fname, int flen, struct iattr *iap,
-		struct svc_fh *resfhp, int createmode, u32 *verifier,
-	        bool *truncp, bool *created)
+	       char *fname, int flen, struct iattr *iap,
+	       struct svc_fh *resfhp, int createmode, u32 *verifier,
+	       bool *truncp, bool *created)
 {
 	struct dentry	*dentry, *dchild = NULL;
+	struct path	path;
 	struct inode	*dirp;
 	__be32		err;
 	int		host_err;
@@ -1436,7 +1440,9 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			goto out;
 	}
 
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+	path.mnt = fhp->fh_mnt;
+	path.dentry = dchild;
+	err = fh_compose(resfhp, fhp->fh_export, &path, fhp);
 	if (err)
 		goto out;
 
@@ -1569,7 +1575,7 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
 	if (unlikely(err))
 		return err;
 
-	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.mnt = fhp->fh_mnt;
 	path.dentry = fhp->fh_dentry;
 
 	if (unlikely(!d_is_symlink(path.dentry)))
@@ -1600,6 +1606,7 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct svc_fh *resfhp)
 {
 	struct dentry	*dentry, *dnew;
+	struct path	pathnew;
 	__be32		err, cerr;
 	int		host_err;
 
@@ -1633,7 +1640,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	fh_drop_write(fhp);
 
-	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
+	pathnew.mnt = fhp->fh_mnt;
+	pathnew.dentry = dnew;
+	cerr = fh_compose(resfhp, fhp->fh_export, &pathnew, fhp);
 	dput(dnew);
 	if (err==0) err = cerr;
 out:
@@ -2107,7 +2116,7 @@ nfsd_statfs(struct svc_rqst *rqstp, struct svc_fh *fhp, struct kstatfs *stat, in
 	err = fh_verify(rqstp, fhp, 0, NFSD_MAY_NOP | access);
 	if (!err) {
 		struct path path = {
-			.mnt	= fhp->fh_export->ex_path.mnt,
+			.mnt	= fhp->fh_mnt,
 			.dentry	= fhp->fh_dentry,
 		};
 		if (vfs_statfs(&path, stat))
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index b21b76e6b9a8..52f587716208 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -42,13 +42,13 @@ struct nfsd_file;
 typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
 
 /* nfsd/vfs.c */
-int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
+int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *,
 		                struct svc_export **expp);
 __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
 				const char *, unsigned int, struct svc_fh *);
 __be32		 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *,
 				const char *, unsigned int,
-				struct svc_export **, struct dentry **);
+				struct svc_export **, struct path *);
 __be32		nfsd_setattr(struct svc_rqst *, struct svc_fh *,
 				struct iattr *, int, time64_t);
 int nfsd_mountpoint(struct dentry *, struct svc_export *);
@@ -138,7 +138,7 @@ static inline int fh_want_write(struct svc_fh *fh)
 
 	if (fh->fh_want_write)
 		return 0;
-	ret = mnt_want_write(fh->fh_export->ex_path.mnt);
+	ret = mnt_want_write(fh->fh_mnt);
 	if (!ret)
 		fh->fh_want_write = true;
 	return ret;
@@ -148,13 +148,13 @@ static inline void fh_drop_write(struct svc_fh *fh)
 {
 	if (fh->fh_want_write) {
 		fh->fh_want_write = false;
-		mnt_drop_write(fh->fh_export->ex_path.mnt);
+		mnt_drop_write(fh->fh_mnt);
 	}
 }
 
 static inline __be32 fh_getattr(struct svc_fh *fh, struct kstat *stat)
 {
-	struct path p = {.mnt = fh->fh_export->ex_path.mnt,
+	struct path p = {.mnt = fh->fh_mnt,
 			 .dentry = fh->fh_dentry};
 	return nfserrno(vfs_getattr(&p, stat, STATX_BASIC_STATS,
 				    AT_STATX_SYNC_AS_STAT));
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 3e4052e3bd50..8934db5113ac 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -763,7 +763,7 @@ void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *);
 void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op);
 __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
 		struct svc_fh *fhp, struct svc_export *exp,
-		struct dentry *dentry,
+		struct vfsmount *mnt, struct dentry *dentry,
 		u32 *bmval, struct svc_rqst *, int ignore_crossmnt);
 extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *, union nfsd4_op_u *u);
-- 
2.32.0


[-- Attachment #5: 0007-exportfs-Allow-filehandle-lookup-to-cross-internal-m.patch --]
[-- Type: application/octet-stream, Size: 11957 bytes --]

From 637b97c587df703d9348e4075051834e25666441 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] exportfs: Allow filehandle lookup to cross internal mount
 points.

When a filesystem has internal mounts, it controls the filehandles
across all those mounts (subvols) in the filesystem.  So it is useful to
be able to look up a filehandle again one mount, and get a result which
is in a different mount (part of the same overall file system).

This patch makes that possible by changing export_decode_fh() and
export_decode_fh_raw() to take a vfsmount pointer by reference, and
possibly change the vfsmount pointed to before returning.

The core of the change is in reconnect_path() which now not only checks
that the dentry is fully connected, but also that the vfsmnt reported
has the same 'dev' (reported by vfs_getattr) as the dentry.
If it doesn't, we walk up the dparent() chain to find the highest place
where the dev changes without there being a mount point, and trigger an
automount there.

As no filesystems yet provide local-mounts, this does not yet change any
behaviour.

In exportfs_decode_fh_raw() we previously tested for DCACHE_DISCONNECT
before calling reconnect_path().  That test is dropped.  It was only a
minor optimisation and is now inconvenient.

The change in overlayfs needs more careful thought than I have yet given
it.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/exportfs/expfs.c      | 96 ++++++++++++++++++++++++++++++++++------
 fs/fhandle.c             |  2 +-
 fs/nfsd/nfsfh.c          |  9 ++--
 fs/overlayfs/namei.c     |  5 ++-
 fs/xfs/xfs_ioctl.c       | 12 +++--
 include/linux/exportfs.h |  2 +-
 6 files changed, 103 insertions(+), 23 deletions(-)

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..2d7c42137b49 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -207,11 +207,18 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
  * that case reconnect_path may still succeed with target_dir fully
  * connected, but further operations using the filehandle will fail when
  * necessary (due to S_DEAD being set on the directory).
+ *
+ * If the filesystem supports multiple subvols, then *mntp may be updated
+ * to a subordinate mount point on the same filesystem.
  */
 static int
-reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
+reconnect_path(struct vfsmount **mntp, struct dentry *target_dir, char *nbuf)
 {
+	struct vfsmount *mnt = *mntp;
+	struct path path;
 	struct dentry *dentry, *parent;
+	struct kstat stat;
+	dev_t target_dev;
 
 	dentry = dget(target_dir);
 
@@ -232,6 +239,68 @@ reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
 	}
 	dput(dentry);
 	clear_disconnected(target_dir);
+
+	/* Need to find appropriate vfsmount, which might not exist yet.
+	 * We may need to trigger automount points.
+	 */
+	path.mnt = mnt;
+	path.dentry = target_dir;
+	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+	target_dev = stat.dev;
+
+	path.dentry = mnt->mnt_root;
+	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+
+	while (stat.dev != target_dev) {
+		/* walk up the dcache tree from target_dir, recording the
+		 * location of the most recent change in dev number,
+		 * until we find a mountpoint.
+		 * If there was no change in show_dev result before the
+		 * mountpount, the vfsmount at the mountpoint is what we want.
+		 * If there was, we need to trigger an automount where the
+		 * show_dev() result changed.
+		 */
+		struct dentry *last_change = NULL;
+		dev_t last_dev = target_dev;
+
+		dentry = dget(target_dir);
+		while ((parent = dget_parent(dentry)) != dentry) {
+			path.dentry = parent;
+			vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+			if (stat.dev != last_dev) {
+				path.dentry = dentry;
+				mnt = lookup_mnt(&path);
+				if (mnt) {
+					mntput(path.mnt);
+					path.mnt = mnt;
+					break;
+				}
+				dput(last_change);
+				last_change = dget(dentry);
+				last_dev = stat.dev;
+			}
+			dput(dentry);
+			dentry = parent;
+		}
+		dput(dentry); dput(parent);
+
+		if (!last_change)
+			break;
+
+		mnt = path.mnt;
+		path.dentry = last_change;
+		follow_down(&path, LOOKUP_AUTOMOUNT);
+		dput(path.dentry);
+		if (path.mnt == mnt)
+			/* There should have been a mount-trap there,
+			 * but there wasn't.  Just give up.
+			 */
+			break;
+
+		path.dentry = mnt->mnt_root;
+		vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+	}
+	*mntp = path.mnt;
 	return 0;
 }
 
@@ -417,11 +486,12 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
 }
 EXPORT_SYMBOL_GPL(exportfs_encode_fh);
 
-struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
+struct dentry *exportfs_decode_fh(struct vfsmount **mntp, struct fid *fid,
 		int fh_len, int fileid_type,
 		int (*acceptable)(void *, struct dentry *), void *context)
 {
-	const struct export_operations *nop = mnt->mnt_sb->s_export_op;
+	struct super_block *sb = (*mntp)->mnt_sb;
+	const struct export_operations *nop = sb->s_export_op;
 	struct dentry *result, *alias;
 	char nbuf[NAME_MAX+1];
 	int err;
@@ -431,7 +501,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 	 */
 	if (!nop || !nop->fh_to_dentry)
 		return ERR_PTR(-ESTALE);
-	result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
+	result = nop->fh_to_dentry(sb, fid, fh_len, fileid_type);
 	if (PTR_ERR(result) == -ENOMEM)
 		return ERR_CAST(result);
 	if (IS_ERR_OR_NULL(result))
@@ -452,14 +522,12 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		 *
 		 * On the positive side there is only one dentry for each
 		 * directory inode.  On the negative side this implies that we
-		 * to ensure our dentry is connected all the way up to the
+		 * need to ensure our dentry is connected all the way up to the
 		 * filesystem root.
 		 */
-		if (result->d_flags & DCACHE_DISCONNECTED) {
-			err = reconnect_path(mnt, result, nbuf);
-			if (err)
-				goto err_result;
-		}
+		err = reconnect_path(mntp, result, nbuf);
+		if (err)
+			goto err_result;
 
 		if (!acceptable(context, result)) {
 			err = -EACCES;
@@ -494,7 +562,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		if (!nop->fh_to_parent)
 			goto err_result;
 
-		target_dir = nop->fh_to_parent(mnt->mnt_sb, fid,
+		target_dir = nop->fh_to_parent(sb, fid,
 				fh_len, fileid_type);
 		if (!target_dir)
 			goto err_result;
@@ -507,7 +575,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		 * connected to the filesystem root.  The VFS really doesn't
 		 * like disconnected directories..
 		 */
-		err = reconnect_path(mnt, target_dir, nbuf);
+		err = reconnect_path(mntp, target_dir, nbuf);
 		if (err) {
 			dput(target_dir);
 			goto err_result;
@@ -518,7 +586,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		 * dentry for the inode we're after, make sure that our
 		 * inode is actually connected to the parent.
 		 */
-		err = exportfs_get_name(mnt, target_dir, nbuf, result);
+		err = exportfs_get_name(*mntp, target_dir, nbuf, result);
 		if (err) {
 			dput(target_dir);
 			goto err_result;
@@ -556,7 +624,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 			goto err_result;
 		}
 
-		return alias;
+		return result;
 	}
 
  err_result:
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 6630c69c23a2..b47c7696469f 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -149,7 +149,7 @@ static int do_handle_to_path(int mountdirfd, struct file_handle *handle,
 	}
 	/* change the handle size to multiple of sizeof(u32) */
 	handle_dwords = handle->handle_bytes >> 2;
-	path->dentry = exportfs_decode_fh(path->mnt,
+	path->dentry = exportfs_decode_fh(&path->mnt,
 					  (struct fid *)handle->f_handle,
 					  handle_dwords, handle->handle_type,
 					  vfs_dentry_acceptable, NULL);
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 0bf7ac13ae50..4023046f63e2 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -157,6 +157,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	struct fid *fid = NULL, sfid;
 	struct svc_export *exp;
 	struct dentry *dentry;
+	struct vfsmount *mnt = NULL;
 	int fileid_type;
 	int data_left = fh->fh_size/4;
 	__be32 error;
@@ -253,6 +254,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	if (rqstp->rq_vers > 2)
 		error = nfserr_badhandle;
 
+	mnt = mntget(exp->ex_path.mnt);
+
 	if (fh->fh_version != 1) {
 		sfid.i32.ino = fh->ofh_ino;
 		sfid.i32.gen = fh->ofh_generation;
@@ -269,7 +272,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	if (fileid_type == FILEID_ROOT)
 		dentry = dget(exp->ex_path.dentry);
 	else {
-		dentry = exportfs_decode_fh(exp->ex_path.mnt, fid,
+		dentry = exportfs_decode_fh(&mnt, fid,
 				data_left, fileid_type,
 				nfsd_acceptable, exp);
 		if (IS_ERR_OR_NULL(dentry))
@@ -290,10 +293,11 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	}
 
 	fhp->fh_dentry = dentry;
-	fhp->fh_mnt = mntget(exp->ex_path.mnt);
+	fhp->fh_mnt = mnt;
 	fhp->fh_export = exp;
 	return 0;
 out:
+	mntput(mnt);
 	exp_put(exp);
 	return error;
 }
@@ -428,7 +432,6 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 	return error;
 }
 
-
 /*
  * Compose a file handle for an NFS reply.
  *
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 210cd6f66e28..0bca19f6df54 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -155,6 +155,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 {
 	struct dentry *real;
 	int bytes;
+	struct vfsmount *mnt2;
 
 	/*
 	 * Make sure that the stored uuid matches the uuid of the lower
@@ -164,9 +165,11 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 		return NULL;
 
 	bytes = (fh->fb.len - offsetof(struct ovl_fb, fid));
-	real = exportfs_decode_fh(mnt, (struct fid *)fh->fb.fid,
+	mnt2 = mntget(mnt);
+	real = exportfs_decode_fh(&mnt2, (struct fid *)fh->fb.fid,
 				  bytes >> 2, (int)fh->fb.type,
 				  connected ? ovl_acceptable : NULL, mnt);
+	mntput(mnt2);
 	if (IS_ERR(real)) {
 		/*
 		 * Treat stale file handle to lower file as "origin unknown".
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 16039ea10ac9..76eb7d540811 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -149,6 +149,8 @@ xfs_handle_to_dentry(
 {
 	xfs_handle_t		handle;
 	struct xfs_fid64	fid;
+	struct dentry		*ret;
+	struct vfsmount		*mnt;
 
 	/*
 	 * Only allow handle opens under a directory.
@@ -168,9 +170,13 @@ xfs_handle_to_dentry(
 	fid.ino = handle.ha_fid.fid_ino;
 	fid.gen = handle.ha_fid.fid_gen;
 
-	return exportfs_decode_fh(parfilp->f_path.mnt, (struct fid *)&fid, 3,
-			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
-			xfs_handle_acceptable, NULL);
+	mnt = mntget(parfilp->f_path.mnt);
+	ret = exportfs_decode_fh(&mnt, (struct fid *)&fid, 3,
+				 FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
+				 xfs_handle_acceptable, NULL);
+	WARN_ON(mnt != parfilp->f_path.mnt);
+	mntput(mnt);
+	return ret;
 }
 
 STATIC struct dentry *
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index fe848901fcc3..9a8c5434a5cf 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -219,7 +219,7 @@ extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
 				    int *max_len, struct inode *parent);
 extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
 	int *max_len, int connectable);
-extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
+extern struct dentry *exportfs_decode_fh(struct vfsmount **mnt, struct fid *fid,
 	int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *),
 	void *context);
 
-- 
2.32.0


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28  4:58 ` Wang Yugui
@ 2021-07-28  6:04   ` Wang Yugui
  2021-07-28  7:01     ` NeilBrown
  2021-07-28  7:06   ` NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: Wang Yugui @ 2021-07-28  6:04 UTC (permalink / raw)
  To: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

Hi,

This patchset works well in 5.14-rc3.

1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)

2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
not used.
/dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
/dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
/dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2

This is a visiual feature change for btrfs user.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/07/28

> Hi,
> 
> We no longer need the dummy inode(BTRFS_FIRST_FREE_OBJECTID - 1) in this
> patch serials?
> 
> I tried to backport it to 5.10.x, but it failed to work.
> No big modification in this 5.10.x backporting, and all modified pathes
> are attached.
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2021/07/28
> 
> > There are long-standing problems with btrfs subvols, particularly in
> > relation to whether and how they are exposed in the mount table.
> > 
> >  - /proc/self/mountinfo reports the major:minor device number for each
> >     filesystem and when a btrfs subvol is explicitly mounted, the number
> >     reported is wrong - it does not match what stat() reports for the
> >     mountpoint.
> > 
> >  - when subvol are not explicitly mounted, they don't appear in
> >    mountinfo at all.
> > 
> > Consequences include that a tool which uses stat() to find the dev of the
> > filesystem, then searches mountinfo for that filesystem, will not find
> > it.
> > 
> > Some tools (e.g. findmnt) appear to have been enhanced to cope with this
> > strangeness, but it would be best to make btrfs behave more normally.
> > 
> >   - nfsd cannot currently see the transition to subvol, so reports the
> >     main volume and all subvols to the client as being in the same
> >     filesystem.  As inode numbers are not unique across all subvols,
> >     this can confuse clients.  In particular, 'find' is likely to report a
> >     loop.
> > 
> > subvols can be made to appear in mountinfo using automounts.  However
> > nfsd does not cope well with automounts.  It assumes all filesystems to
> > be exported are already mounted.  So adding automounts to btrfs would
> > break nfsd.
> > 
> > We can enhance nfsd to understand that some automounts can be managed.
> > "internal mounts" where a filesystem provides an automount point and
> > mounts its own directories, can be handled differently by nfsd.
> > 
> > This series addresses all these issues.  After a few enhancements to the
> > VFS to provide needed support, they enhance exportfs and nfsd to cope
> > with the concept of internal mounts, and then enhance btrfs to provide
> > them.
> > 
> > The NFSv3 support is incomplete.  I'm not sure we can make it work
> > "perfectly".  A normal nfsv3 mount seem to work well enough, but if
> > mounted with '-o noac', it loses track of the mounted-on inode number
> > and complains about inode numbers changing.
> > 
> > My basic test for these is to mount a btrfs filesystem which contains
> > subvols, nfs-export it and mount it with nfsv3 and nfsv4, then run
> > 'find' in each of the filesystem and check the contents of
> > /proc/self/mountinfo.
> > 
> > The first patch simply fixes the dev number in mountinfo and could
> > possibly be tagged for -stable.
> > 
> > NeilBrown
> > 
> > ---
> > 
> > NeilBrown (11):
> >       VFS: show correct dev num in mountinfo
> >       VFS: allow d_automount to create in-place bind-mount.
> >       VFS: pass lookup_flags into follow_down()
> >       VFS: export lookup_mnt()
> >       VFS: new function: mount_is_internal()
> >       nfsd: include a vfsmount in struct svc_fh
> >       exportfs: Allow filehandle lookup to cross internal mount points.
> >       nfsd: change get_parent_attributes() to nfsd_get_mounted_on()
> >       nfsd: Allow filehandle lookup to cross internal mount points.
> >       btrfs: introduce mapping function from location to inum
> >       btrfs: use automount to bind-mount all subvol roots.
> > 
> > 
> >  fs/btrfs/btrfs_inode.h   |  12 +++
> >  fs/btrfs/inode.c         | 111 ++++++++++++++++++++++++++-
> >  fs/btrfs/super.c         |   1 +
> >  fs/exportfs/expfs.c      | 100 ++++++++++++++++++++----
> >  fs/fhandle.c             |   2 +-
> >  fs/internal.h            |   1 -
> >  fs/namei.c               |   6 +-
> >  fs/namespace.c           |  32 +++++++-
> >  fs/nfsd/export.c         |   4 +-
> >  fs/nfsd/nfs3xdr.c        |  40 +++++++---
> >  fs/nfsd/nfs4proc.c       |   9 ++-
> >  fs/nfsd/nfs4xdr.c        | 106 ++++++++++++-------------
> >  fs/nfsd/nfsfh.c          |  44 +++++++----
> >  fs/nfsd/nfsfh.h          |   3 +-
> >  fs/nfsd/nfsproc.c        |   5 +-
> >  fs/nfsd/vfs.c            | 162 +++++++++++++++++++++++----------------
> >  fs/nfsd/vfs.h            |  12 +--
> >  fs/nfsd/xdr4.h           |   2 +-
> >  fs/overlayfs/namei.c     |   5 +-
> >  fs/xfs/xfs_ioctl.c       |  12 ++-
> >  include/linux/exportfs.h |   4 +-
> >  include/linux/mount.h    |   4 +
> >  include/linux/namei.h    |   2 +-
> >  23 files changed, 490 insertions(+), 189 deletions(-)
> > 
> > --
> > Signature
> 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28  6:04   ` Wang Yugui
@ 2021-07-28  7:01     ` NeilBrown
  2021-07-28 12:26       ` Neal Gompa
  2021-07-28 13:43       ` g.btrfs
  0 siblings, 2 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-28  7:01 UTC (permalink / raw)
  To: Wang Yugui
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On Wed, 28 Jul 2021, Wang Yugui wrote:
> Hi,
> 
> This patchset works well in 5.14-rc3.

Thanks for testing.

> 
> 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
> dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)

The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
be permanent.
The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
think).
This is a bit less of a hack.  It is an easily available number that is
fairly unique.

> 
> 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
> not used.
> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
> 
> This is a visiual feature change for btrfs user.

Hopefully it is an improvement.  But it is certainly a change that needs
to be carefully considered.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28  4:58 ` Wang Yugui
  2021-07-28  6:04   ` Wang Yugui
@ 2021-07-28  7:06   ` NeilBrown
  2021-07-28  9:36     ` Wang Yugui
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-28  7:06 UTC (permalink / raw)
  To: Wang Yugui
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On Wed, 28 Jul 2021, Wang Yugui wrote:
> Hi,
> 
> We no longer need the dummy inode(BTRFS_FIRST_FREE_OBJECTID - 1) in this
> patch serials?

No.

> 
> I tried to backport it to 5.10.x, but it failed to work.
> No big modification in this 5.10.x backporting, and all modified pathes
> are attached.

I'm not surprised, but I doubt there would be a major barrier to making
it work.  I'm unlikely to try until I have more positive reviews.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots.
  2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
@ 2021-07-28  8:37   ` kernel test robot
  2021-07-28  8:37   ` [RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 123+ messages in thread
From: kernel test robot @ 2021-07-28  8:37 UTC (permalink / raw)
  To: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro
  Cc: kbuild-all, linux-fsdevel, linux-nfs, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1745 bytes --]

Hi NeilBrown,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on nfsd/nfsd-next]
[also build test WARNING on kdave/for-next hch-configfs/for-next linus/master v5.14-rc3 next-20210727]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/expose-btrfs-subvols-in-mount-table-correctly/20210728-064502
base:   git://linux-nfs.org/~bfields/linux.git nfsd-next
config: i386-randconfig-s002-20210728 (attached as .config)
compiler: gcc-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/58749022685aea90dfddfb9f8b2fcdc74dee6ec0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/expose-btrfs-subvols-in-mount-table-correctly/20210728-064502
        git checkout 58749022685aea90dfddfb9f8b2fcdc74dee6ec0
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash fs/btrfs/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> fs/btrfs/inode.c:5868:5: sparse: sparse: symbol 'btrfs_mountpoint_expiry_timeout' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33647 bytes --]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static
  2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
  2021-07-28  8:37   ` kernel test robot
@ 2021-07-28  8:37   ` kernel test robot
  2021-07-28 13:12   ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots Christian Brauner
  2021-07-31  6:25   ` [btrfs] 5874902268: xfstests.btrfs.202.fail kernel test robot
  3 siblings, 0 replies; 123+ messages in thread
From: kernel test robot @ 2021-07-28  8:37 UTC (permalink / raw)
  To: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro
  Cc: kbuild-all, linux-fsdevel, linux-nfs, linux-btrfs

fs/btrfs/inode.c:5868:5: warning: symbol 'btrfs_mountpoint_expiry_timeout' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---
 inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8667a26d684d4..4f9472e41074a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5865,7 +5865,7 @@ static int btrfs_dentry_delete(const struct dentry *dentry)
 static void btrfs_expire_automounts(struct work_struct *work);
 static LIST_HEAD(btrfs_automount_list);
 static DECLARE_DELAYED_WORK(btrfs_automount_task, btrfs_expire_automounts);
-int btrfs_mountpoint_expiry_timeout = 500 * HZ;
+static int btrfs_mountpoint_expiry_timeout = 500 * HZ;
 static void btrfs_expire_automounts(struct work_struct *work)
 {
 	struct list_head *list = &btrfs_automount_list;

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28  7:06   ` NeilBrown
@ 2021-07-28  9:36     ` Wang Yugui
  0 siblings, 0 replies; 123+ messages in thread
From: Wang Yugui @ 2021-07-28  9:36 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 679 bytes --]

Hi,

> > I tried to backport it to 5.10.x, but it failed to work.
> > No big modification in this 5.10.x backporting, and all modified pathes
> > are attached.
> 
> I'm not surprised, but I doubt there would be a major barrier to making
> it work.  I'm unlikely to try until I have more positive reviews.

With two patches as dependency, 
d045465fc6cbfa4acfb5a7d817a7c1a57a078109
	0001-exportfs-Add-a-function-to-return-the-raw-output-fro.patch
2e19d10c1438241de32467637a2a411971547991
	0002-nfsd-Fix-up-nfsd-to-ensure-that-timeout-errors-don-t.patch

the 5.10.x backporting become more simple and it works well now.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/07/28


[-- Attachment #2: 5.10.x.zip --]
[-- Type: application/x-zip-compressed, Size: 20102 bytes --]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
@ 2021-07-28 10:13   ` Amir Goldstein
  2021-07-29  0:28     ` NeilBrown
  2021-07-28 19:17   ` J. Bruce Fields
  1 sibling, 1 reply; 123+ messages in thread
From: Amir Goldstein @ 2021-07-28 10:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Linux Btrfs, Miklos Szeredi

On Wed, Jul 28, 2021 at 1:44 AM NeilBrown <neilb@suse.de> wrote:
>
> When a filesystem has internal mounts, it controls the filehandles
> across all those mounts (subvols) in the filesystem.  So it is useful to
> be able to look up a filehandle again one mount, and get a result which
> is in a different mount (part of the same overall file system).
>
> This patch makes that possible by changing export_decode_fh() and
> export_decode_fh_raw() to take a vfsmount pointer by reference, and
> possibly change the vfsmount pointed to before returning.
>
> The core of the change is in reconnect_path() which now not only checks
> that the dentry is fully connected, but also that the vfsmnt reported
> has the same 'dev' (reported by vfs_getattr) as the dentry.
> If it doesn't, we walk up the dparent() chain to find the highest place
> where the dev changes without there being a mount point, and trigger an
> automount there.
>
> As no filesystems yet provide local-mounts, this does not yet change any
> behaviour.
>
> In exportfs_decode_fh_raw() we previously tested for DCACHE_DISCONNECT
> before calling reconnect_path().  That test is dropped.  It was only a
> minor optimisation and is now inconvenient.
>
> The change in overlayfs needs more careful thought than I have yet given
> it.

Just note that overlayfs does not support following auto mounts in layers.
See ovl_dentry_weird(). ovl_lookup() fails if it finds such a dentry.
So I think you need to make sure that the vfsmount was not crossed
when decoding an overlayfs real fh.

Apart from that, I think that your new feature should be opt-in w.r.t
the exportfs_decode_fh() vfs api and that overlayfs should not opt-in
for the cross mount decode.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28  7:01     ` NeilBrown
@ 2021-07-28 12:26       ` Neal Gompa
  2021-07-28 19:14         ` J. Bruce Fields
  2021-07-28 22:50         ` NeilBrown
  2021-07-28 13:43       ` g.btrfs
  1 sibling, 2 replies; 123+ messages in thread
From: Neal Gompa @ 2021-07-28 12:26 UTC (permalink / raw)
  To: NeilBrown
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, Btrfs BTRFS

On Wed, Jul 28, 2021 at 3:02 AM NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 28 Jul 2021, Wang Yugui wrote:
> > Hi,
> >
> > This patchset works well in 5.14-rc3.
>
> Thanks for testing.
>
> >
> > 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
> > dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
>
> The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
> be permanent.
> The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
> think).
> This is a bit less of a hack.  It is an easily available number that is
> fairly unique.
>
> >
> > 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
> > not used.
> > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
> > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
> > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
> >
> > This is a visiual feature change for btrfs user.
>
> Hopefully it is an improvement.  But it is certainly a change that needs
> to be carefully considered.

I think this is behavior people generally expect, but I wonder what
the consequences of this would be with huge numbers of subvolumes. If
there are hundreds or thousands of them (which is quite possible on
SUSE systems, for example, with its auto-snapshotting regime), this
would be a mess, wouldn't it?

Or can we add a way to mark these things to not show up there or is
there some kind of behavioral change we can make to snapper or other
tools to make them not show up here?



-- 
真実はいつも一つ!/ Always, there's only one truth!

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots.
  2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
  2021-07-28  8:37   ` kernel test robot
  2021-07-28  8:37   ` [RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static kernel test robot
@ 2021-07-28 13:12   ` Christian Brauner
  2021-07-29  0:43     ` NeilBrown
  2021-07-31  6:25   ` [btrfs] 5874902268: xfstests.btrfs.202.fail kernel test robot
  3 siblings, 1 reply; 123+ messages in thread
From: Christian Brauner @ 2021-07-28 13:12 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> All subvol roots are now marked as automounts.  If the d_automount()
> function determines that the dentry is not the root of the vfsmount, it
> creates a simple loop-back mount of the dentry onto itself.  If it
> determines that it IS the root of the vfsmount, it returns -EISDIR so
> that no further automounting is attempted.
> 
> btrfs_getattr pays special attention to these automount dentries.
> If it is NOT the root of the vfsmount:
>  - the ->dev is reported as that for the rest of the vfsmount
>  - the ->ino is reported as the subvol objectid, suitable transformed
>    to avoid collision.
> 
> This way the same inode appear to be different depending on which mount
> it is in.
> 
> automounted vfsmounts are kept on a list and timeout after 500 to 1000
> seconds of last use.  This is configurable via a module parameter.
> The tracking and timeout of automounts is copied from NFS.
> 
> Link: https://lore.kernel.org/r/162742546558.32498.1901201501617899416.stgit@noble.brown
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/btrfs/btrfs_inode.h |    2 +
>  fs/btrfs/inode.c       |  108 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/super.c       |    1 
>  3 files changed, 111 insertions(+)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index a4b5f38196e6..f03056cacc4a 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -387,4 +387,6 @@ static inline void btrfs_print_data_csum_error(struct btrfs_inode *inode,
>  			mirror_num);
>  }
>  
> +void btrfs_release_automount_timer(void);
> +
>  #endif
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 02537c1a9763..a5f46545fb38 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -31,6 +31,7 @@
>  #include <linux/migrate.h>
>  #include <linux/sched/mm.h>
>  #include <linux/iomap.h>
> +#include <linux/fs_context.h>
>  #include <asm/unaligned.h>
>  #include "misc.h"
>  #include "ctree.h"
> @@ -5782,6 +5783,8 @@ static int btrfs_init_locked_inode(struct inode *inode, void *p)
>  	struct btrfs_iget_args *args = p;
>  
>  	inode->i_ino = args->ino;
> +	if (args->ino == BTRFS_FIRST_FREE_OBJECTID)
> +		inode->i_flags |= S_AUTOMOUNT;
>  	BTRFS_I(inode)->location.objectid = args->ino;
>  	BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY;
>  	BTRFS_I(inode)->location.offset = 0;
> @@ -5985,6 +5988,101 @@ static int btrfs_dentry_delete(const struct dentry *dentry)
>  	return 0;
>  }
>  
> +static void btrfs_expire_automounts(struct work_struct *work);
> +static LIST_HEAD(btrfs_automount_list);
> +static DECLARE_DELAYED_WORK(btrfs_automount_task, btrfs_expire_automounts);
> +int btrfs_mountpoint_expiry_timeout = 500 * HZ;
> +static void btrfs_expire_automounts(struct work_struct *work)
> +{
> +	struct list_head *list = &btrfs_automount_list;
> +	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
> +
> +	mark_mounts_for_expiry(list);
> +	if (!list_empty(list) && timeout > 0)
> +		schedule_delayed_work(&btrfs_automount_task, timeout);
> +}
> +
> +void btrfs_release_automount_timer(void)
> +{
> +	if (list_empty(&btrfs_automount_list))
> +		cancel_delayed_work(&btrfs_automount_task);
> +}
> +
> +static struct vfsmount *btrfs_automount(struct path *path)
> +{
> +	struct fs_context fc;
> +	struct vfsmount *mnt;
> +	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
> +
> +	if (path->dentry == path->mnt->mnt_root)
> +		/* dentry is root of the vfsmount,
> +		 * so skip automount processing
> +		 */
> +		return ERR_PTR(-EISDIR);
> +	/* Create a bind-mount to expose the subvol in the mount table */
> +	fc.root = path->dentry;
> +	fc.sb_flags = 0;
> +	fc.source = "btrfs-automount";
> +	mnt = vfs_create_mount(&fc);
> +	if (IS_ERR(mnt))
> +		return mnt;

Hey Neil,

Sorry if this is a stupid question but wouldn't you want to copy the
mount properties from path->mnt here? Couldn't you otherwise use this to
e.g. suddenly expose a dentry on a read-only mount as read-write?

Christian

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28  7:01     ` NeilBrown
  2021-07-28 12:26       ` Neal Gompa
@ 2021-07-28 13:43       ` g.btrfs
  2021-07-29  1:39         ` NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: g.btrfs @ 2021-07-28 13:43 UTC (permalink / raw)
  To: NeilBrown, Wang Yugui
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On 28/07/2021 08:01, NeilBrown wrote:
> On Wed, 28 Jul 2021, Wang Yugui wrote:
>> Hi,
>>
>> This patchset works well in 5.14-rc3.
> 
> Thanks for testing.
> 
>>
>> 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
>> dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
> 
> The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
> be permanent.
> The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
> think).
> This is a bit less of a hack.  It is an easily available number that is
> fairly unique.
> 
>>
>> 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
>> not used.
>> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
>> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
>> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
>>
>> This is a visiual feature change for btrfs user.
> 
> Hopefully it is an improvement.  But it is certainly a change that needs
> to be carefully considered.

Would this change the behaviour of findmnt? I have several scripts that
depend on findmnt to select btrfs filesystems. Just to take a couple of
examples (using the example shown above): my scripts would depend on
'findmnt --target /mnt/test/sub1 -o target' providing /mnt/test, not the
subvolume; and another script would depend on 'findmnt -t btrfs
--mountpoint /mnt/test/sub1' providing no output as the directory is not
an /etc/fstab mount point for a btrfs filesystem.

Maybe findmnt isn't affected? Or maybe the change is worth making
anyway? But it needs to be carefully considered if it breaks existing
user interfaces.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28 12:26       ` Neal Gompa
@ 2021-07-28 19:14         ` J. Bruce Fields
  2021-07-29  1:29           ` Zygo Blaxell
  2021-07-28 22:50         ` NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: J. Bruce Fields @ 2021-07-28 19:14 UTC (permalink / raw)
  To: Neal Gompa
  Cc: NeilBrown, Wang Yugui, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, Btrfs BTRFS

On Wed, Jul 28, 2021 at 08:26:12AM -0400, Neal Gompa wrote:
> I think this is behavior people generally expect, but I wonder what
> the consequences of this would be with huge numbers of subvolumes. If
> there are hundreds or thousands of them (which is quite possible on
> SUSE systems, for example, with its auto-snapshotting regime), this
> would be a mess, wouldn't it?

I'm surprised that btrfs is special here.  Doesn't anyone have thousands
of lvm snapshots?  Or is it that they do but they're not normally
mounted?

--b.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points.
  2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
@ 2021-07-28 19:15   ` J. Bruce Fields
  2021-07-28 22:29     ` NeilBrown
  2021-07-30  0:42   ` Al Viro
  1 sibling, 1 reply; 123+ messages in thread
From: J. Bruce Fields @ 2021-07-28 19:15 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> Enhance nfsd to detect internal mounts and to cross them without
> requiring a new export.

Why don't we want a new export?

(Honest question, it's not obvious to me what the best behavior is.)

--b.

> 
> Also ensure the fsid reported is different for different submounts.  We
> do this by xoring in the ino of the mounted-on directory.  This makes
> sense for btrfs at least.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs3xdr.c |   28 +++++++++++++++++++++-------
>  fs/nfsd/nfs4xdr.c |   34 +++++++++++++++++++++++-----------
>  fs/nfsd/nfsfh.c   |    7 ++++++-
>  fs/nfsd/vfs.c     |   11 +++++++++--
>  4 files changed, 59 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 67af0c5c1543..80b1cc0334fa 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -370,6 +370,8 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
>  	case FSIDSOURCE_UUID:
>  		fsid = ((u64 *)fhp->fh_export->ex_uuid)[0];
>  		fsid ^= ((u64 *)fhp->fh_export->ex_uuid)[1];
> +		if (fhp->fh_mnt != fhp->fh_export->ex_path.mnt)
> +			fsid ^= nfsd_get_mounted_on(fhp->fh_mnt);
>  		break;
>  	default:
>  		fsid = (u64)huge_encode_dev(fhp->fh_dentry->d_sb->s_dev);
> @@ -1094,8 +1096,8 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
>  	__be32 rv = nfserr_noent;
>  
>  	dparent = cd->fh.fh_dentry;
> -	exp  = cd->fh.fh_export;
> -	child.mnt = cd->fh.fh_mnt;
> +	exp  = exp_get(cd->fh.fh_export);
> +	child.mnt = mntget(cd->fh.fh_mnt);
>  
>  	if (isdotent(name, namlen)) {
>  		if (namlen == 2) {
> @@ -1112,15 +1114,27 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
>  			child.dentry = dget(dparent);
>  	} else
>  		child.dentry = lookup_positive_unlocked(name, dparent, namlen);
> -	if (IS_ERR(child.dentry))
> +	if (IS_ERR(child.dentry)) {
> +		mntput(child.mnt);
> +		exp_put(exp);
>  		return rv;
> -	if (d_mountpoint(child.dentry))
> -		goto out;
> -	if (child.dentry->d_inode->i_ino != ino)
> +	}
> +	/* If child is a mountpoint, then we want to expose the fact
> +	 * so client can create a mountpoint.  If not, then a different
> +	 * ino number probably means a race with rename, so avoid providing
> +	 * too much detail.
> +	 */
> +	if (nfsd_mountpoint(child.dentry, exp)) {
> +		int err;
> +		err = nfsd_cross_mnt(cd->rqstp, &child, &exp);
> +		if (err)
> +			goto out;
> +	} else if (child.dentry->d_inode->i_ino != ino)
>  		goto out;
>  	rv = fh_compose(fhp, exp, &child, &cd->fh);
>  out:
> -	dput(child.dentry);
> +	path_put(&child);
> +	exp_put(exp);
>  	return rv;
>  }
>  
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index d5683b6a74b2..4dbc99ed2c8b 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -2817,6 +2817,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>  	struct kstat stat;
>  	struct svc_fh *tempfh = NULL;
>  	struct kstatfs statfs;
> +	u64 mounted_on_ino;
> +	u64 sub_fsid;
>  	__be32 *p;
>  	int starting_len = xdr->buf->len;
>  	int attrlen_offset;
> @@ -2871,6 +2873,24 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>  			goto out;
>  		fhp = tempfh;
>  	}
> +	if ((bmval0 & FATTR4_WORD0_FSID) ||
> +	    (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID)) {
> +		mounted_on_ino = stat.ino;
> +		sub_fsid = 0;
> +		/*
> +		 * The inode number that the current mnt is mounted on is
> +		 * used for MOUNTED_ON_FILED if we are at the root,
> +		 * and for sub_fsid if mnt is not the export mnt.
> +		 */
> +		if (ignore_crossmnt == 0) {
> +			u64 moi = nfsd_get_mounted_on(mnt);
> +
> +			if (dentry == mnt->mnt_root && moi)
> +				mounted_on_ino = moi;
> +			if (mnt != exp->ex_path.mnt)
> +				sub_fsid = moi;
> +		}
> +	}
>  	if (bmval0 & FATTR4_WORD0_ACL) {
>  		err = nfsd4_get_nfs4_acl(rqstp, dentry, &acl);
>  		if (err == -EOPNOTSUPP)
> @@ -3008,6 +3028,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>  		case FSIDSOURCE_UUID:
>  			p = xdr_encode_opaque_fixed(p, exp->ex_uuid,
>  								EX_UUID_LEN);
> +			if (mnt != exp->ex_path.mnt)
> +				*(u64*)(p-2) ^= sub_fsid;
>  			break;
>  		}
>  	}
> @@ -3253,20 +3275,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>  		*p++ = cpu_to_be32(stat.mtime.tv_nsec);
>  	}
>  	if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
> -		u64 ino;
> -
>  		p = xdr_reserve_space(xdr, 8);
>  		if (!p)
>  			goto out_resource;
> -		/*
> -		 * Get parent's attributes if not ignoring crossmount
> -		 * and this is the root of a cross-mounted filesystem.
> -		 */
> -		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root)
> -			ino = nfsd_get_mounted_on(mnt);
> -		if (!ino)
> -			ino = stat.ino;
> -		p = xdr_encode_hyper(p, ino);
> +		p = xdr_encode_hyper(p, mounted_on_ino);
>  	}
>  #ifdef CONFIG_NFSD_PNFS
>  	if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 4023046f63e2..4b53838bca89 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -9,7 +9,7 @@
>   */
>  
>  #include <linux/exportfs.h>
> -
> +#include <linux/namei.h>
>  #include <linux/sunrpc/svcauth_gss.h>
>  #include "nfsd.h"
>  #include "vfs.h"
> @@ -285,6 +285,11 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  			default:
>  				dentry = ERR_PTR(-ESTALE);
>  			}
> +		} else if (nfsd_mountpoint(dentry, exp)) {
> +			struct path path = { .mnt = mnt, .dentry = dentry };
> +			follow_down(&path, LOOKUP_AUTOMOUNT);
> +			mnt = path.mnt;
> +			dentry = path.dentry;
>  		}
>  	}
>  	if (dentry == NULL)
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index baa12ac36ece..22523e1cd478 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -64,7 +64,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
>  			    .dentry = dget(path_parent->dentry)};
>  	int err = 0;
>  
> -	err = follow_down(&path, 0);
> +	err = follow_down(&path, LOOKUP_AUTOMOUNT);
>  	if (err < 0)
>  		goto out;
>  	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
> @@ -73,6 +73,13 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
>  		path_put(&path);
>  		goto out;
>  	}
> +	if (mount_is_internal(path.mnt)) {
> +		/* Use the new path, but don't look for a new export */
> +		/* FIXME should I check NOHIDE in this case?? */
> +		path_put(path_parent);
> +		*path_parent = path;
> +		goto out;
> +	}
>  
>  	exp2 = rqst_exp_get_by_name(rqstp, &path);
>  	if (IS_ERR(exp2)) {
> @@ -157,7 +164,7 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
>  		return 1;
>  	if (nfsd4_is_junction(dentry))
>  		return 1;
> -	if (d_mountpoint(dentry))
> +	if (d_managed(dentry))
>  		/*
>  		 * Might only be a mountpoint in a different namespace,
>  		 * but we need to check.
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
  2021-07-28 10:13   ` Amir Goldstein
@ 2021-07-28 19:17   ` J. Bruce Fields
  2021-07-28 22:25     ` NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: J. Bruce Fields @ 2021-07-28 19:17 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> When a filesystem has internal mounts, it controls the filehandles
> across all those mounts (subvols) in the filesystem.  So it is useful to
> be able to look up a filehandle again one mount, and get a result which
> is in a different mount (part of the same overall file system).
> 
> This patch makes that possible by changing export_decode_fh() and
> export_decode_fh_raw() to take a vfsmount pointer by reference, and
> possibly change the vfsmount pointed to before returning.
> 
> The core of the change is in reconnect_path() which now not only checks
> that the dentry is fully connected, but also that the vfsmnt reported
> has the same 'dev' (reported by vfs_getattr) as the dentry.
> If it doesn't, we walk up the dparent() chain to find the highest place
> where the dev changes without there being a mount point, and trigger an
> automount there.
> 
> As no filesystems yet provide local-mounts, this does not yet change any
> behaviour.
> 
> In exportfs_decode_fh_raw() we previously tested for DCACHE_DISCONNECT
> before calling reconnect_path().  That test is dropped.  It was only a
> minor optimisation and is now inconvenient.
> 
> The change in overlayfs needs more careful thought than I have yet given
> it.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/exportfs/expfs.c      |  100 +++++++++++++++++++++++++++++++++++++++-------
>  fs/fhandle.c             |    2 -
>  fs/nfsd/nfsfh.c          |    9 +++-
>  fs/overlayfs/namei.c     |    5 ++
>  fs/xfs/xfs_ioctl.c       |   12 ++++--
>  include/linux/exportfs.h |    4 +-
>  6 files changed, 106 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index 0106eba46d5a..2d7c42137b49 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -207,11 +207,18 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
>   * that case reconnect_path may still succeed with target_dir fully
>   * connected, but further operations using the filehandle will fail when
>   * necessary (due to S_DEAD being set on the directory).
> + *
> + * If the filesystem supports multiple subvols, then *mntp may be updated
> + * to a subordinate mount point on the same filesystem.
>   */
>  static int
> -reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
> +reconnect_path(struct vfsmount **mntp, struct dentry *target_dir, char *nbuf)
>  {
> +	struct vfsmount *mnt = *mntp;
> +	struct path path;
>  	struct dentry *dentry, *parent;
> +	struct kstat stat;
> +	dev_t target_dev;
>  
>  	dentry = dget(target_dir);
>  
> @@ -232,6 +239,68 @@ reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
>  	}
>  	dput(dentry);
>  	clear_disconnected(target_dir);

Minor nit--I'd prefer the following in a separate function.

--b.

> +
> +	/* Need to find appropriate vfsmount, which might not exist yet.
> +	 * We may need to trigger automount points.
> +	 */
> +	path.mnt = mnt;
> +	path.dentry = target_dir;
> +	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
> +	target_dev = stat.dev;
> +
> +	path.dentry = mnt->mnt_root;
> +	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
> +
> +	while (stat.dev != target_dev) {
> +		/* walk up the dcache tree from target_dir, recording the
> +		 * location of the most recent change in dev number,
> +		 * until we find a mountpoint.
> +		 * If there was no change in show_dev result before the
> +		 * mountpount, the vfsmount at the mountpoint is what we want.
> +		 * If there was, we need to trigger an automount where the
> +		 * show_dev() result changed.
> +		 */
> +		struct dentry *last_change = NULL;
> +		dev_t last_dev = target_dev;
> +
> +		dentry = dget(target_dir);
> +		while ((parent = dget_parent(dentry)) != dentry) {
> +			path.dentry = parent;
> +			vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
> +			if (stat.dev != last_dev) {
> +				path.dentry = dentry;
> +				mnt = lookup_mnt(&path);
> +				if (mnt) {
> +					mntput(path.mnt);
> +					path.mnt = mnt;
> +					break;
> +				}
> +				dput(last_change);
> +				last_change = dget(dentry);
> +				last_dev = stat.dev;
> +			}
> +			dput(dentry);
> +			dentry = parent;
> +		}
> +		dput(dentry); dput(parent);
> +
> +		if (!last_change)
> +			break;
> +
> +		mnt = path.mnt;
> +		path.dentry = last_change;
> +		follow_down(&path, LOOKUP_AUTOMOUNT);
> +		dput(path.dentry);
> +		if (path.mnt == mnt)
> +			/* There should have been a mount-trap there,
> +			 * but there wasn't.  Just give up.
> +			 */
> +			break;
> +
> +		path.dentry = mnt->mnt_root;
> +		vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
> +	}
> +	*mntp = path.mnt;
>  	return 0;
>  }
>  
> @@ -418,12 +487,13 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
>  EXPORT_SYMBOL_GPL(exportfs_encode_fh);
>  
>  struct dentry *
> -exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
> +exportfs_decode_fh_raw(struct vfsmount **mntp, struct fid *fid, int fh_len,
>  		       int fileid_type,
>  		       int (*acceptable)(void *, struct dentry *),
>  		       void *context)
>  {
> -	const struct export_operations *nop = mnt->mnt_sb->s_export_op;
> +	struct super_block *sb = (*mntp)->mnt_sb;
> +	const struct export_operations *nop = sb->s_export_op;
>  	struct dentry *result, *alias;
>  	char nbuf[NAME_MAX+1];
>  	int err;
> @@ -433,7 +503,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  	 */
>  	if (!nop || !nop->fh_to_dentry)
>  		return ERR_PTR(-ESTALE);
> -	result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
> +	result = nop->fh_to_dentry(sb, fid, fh_len, fileid_type);
>  	if (IS_ERR_OR_NULL(result))
>  		return result;
>  
> @@ -452,14 +522,12 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  		 *
>  		 * On the positive side there is only one dentry for each
>  		 * directory inode.  On the negative side this implies that we
> -		 * to ensure our dentry is connected all the way up to the
> +		 * need to ensure our dentry is connected all the way up to the
>  		 * filesystem root.
>  		 */
> -		if (result->d_flags & DCACHE_DISCONNECTED) {
> -			err = reconnect_path(mnt, result, nbuf);
> -			if (err)
> -				goto err_result;
> -		}
> +		err = reconnect_path(mntp, result, nbuf);
> +		if (err)
> +			goto err_result;
>  
>  		if (!acceptable(context, result)) {
>  			err = -EACCES;
> @@ -494,7 +562,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  		if (!nop->fh_to_parent)
>  			goto err_result;
>  
> -		target_dir = nop->fh_to_parent(mnt->mnt_sb, fid,
> +		target_dir = nop->fh_to_parent(sb, fid,
>  				fh_len, fileid_type);
>  		if (!target_dir)
>  			goto err_result;
> @@ -507,7 +575,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  		 * connected to the filesystem root.  The VFS really doesn't
>  		 * like disconnected directories..
>  		 */
> -		err = reconnect_path(mnt, target_dir, nbuf);
> +		err = reconnect_path(mntp, target_dir, nbuf);
>  		if (err) {
>  			dput(target_dir);
>  			goto err_result;
> @@ -518,7 +586,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  		 * dentry for the inode we're after, make sure that our
>  		 * inode is actually connected to the parent.
>  		 */
> -		err = exportfs_get_name(mnt, target_dir, nbuf, result);
> +		err = exportfs_get_name(*mntp, target_dir, nbuf, result);
>  		if (err) {
>  			dput(target_dir);
>  			goto err_result;
> @@ -556,7 +624,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  			goto err_result;
>  		}
>  
> -		return alias;
> +		return result;
>  	}
>  
>   err_result:
> @@ -565,14 +633,14 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
>  }
>  EXPORT_SYMBOL_GPL(exportfs_decode_fh_raw);
>  
> -struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
> +struct dentry *exportfs_decode_fh(struct vfsmount **mntp, struct fid *fid,
>  				  int fh_len, int fileid_type,
>  				  int (*acceptable)(void *, struct dentry *),
>  				  void *context)
>  {
>  	struct dentry *ret;
>  
> -	ret = exportfs_decode_fh_raw(mnt, fid, fh_len, fileid_type,
> +	ret = exportfs_decode_fh_raw(mntp, fid, fh_len, fileid_type,
>  				     acceptable, context);
>  	if (IS_ERR_OR_NULL(ret)) {
>  		if (ret == ERR_PTR(-ENOMEM))
> diff --git a/fs/fhandle.c b/fs/fhandle.c
> index 6630c69c23a2..b47c7696469f 100644
> --- a/fs/fhandle.c
> +++ b/fs/fhandle.c
> @@ -149,7 +149,7 @@ static int do_handle_to_path(int mountdirfd, struct file_handle *handle,
>  	}
>  	/* change the handle size to multiple of sizeof(u32) */
>  	handle_dwords = handle->handle_bytes >> 2;
> -	path->dentry = exportfs_decode_fh(path->mnt,
> +	path->dentry = exportfs_decode_fh(&path->mnt,
>  					  (struct fid *)handle->f_handle,
>  					  handle_dwords, handle->handle_type,
>  					  vfs_dentry_acceptable, NULL);
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 0bf7ac13ae50..4023046f63e2 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -157,6 +157,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	struct fid *fid = NULL, sfid;
>  	struct svc_export *exp;
>  	struct dentry *dentry;
> +	struct vfsmount *mnt = NULL;
>  	int fileid_type;
>  	int data_left = fh->fh_size/4;
>  	__be32 error;
> @@ -253,6 +254,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	if (rqstp->rq_vers > 2)
>  		error = nfserr_badhandle;
>  
> +	mnt = mntget(exp->ex_path.mnt);
> +
>  	if (fh->fh_version != 1) {
>  		sfid.i32.ino = fh->ofh_ino;
>  		sfid.i32.gen = fh->ofh_generation;
> @@ -269,7 +272,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	if (fileid_type == FILEID_ROOT)
>  		dentry = dget(exp->ex_path.dentry);
>  	else {
> -		dentry = exportfs_decode_fh_raw(exp->ex_path.mnt, fid,
> +		dentry = exportfs_decode_fh_raw(&mnt, fid,
>  						data_left, fileid_type,
>  						nfsd_acceptable, exp);
>  		if (IS_ERR_OR_NULL(dentry)) {
> @@ -299,7 +302,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	}
>  
>  	fhp->fh_dentry = dentry;
> -	fhp->fh_mnt = mntget(exp->ex_path.mnt);
> +	fhp->fh_mnt = mnt;
>  	fhp->fh_export = exp;
>  
>  	switch (rqstp->rq_vers) {
> @@ -317,6 +320,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  
>  	return 0;
>  out:
> +	mntput(mnt);
>  	exp_put(exp);
>  	return error;
>  }
> @@ -428,7 +432,6 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
>  	return error;
>  }
>  
> -
>  /*
>   * Compose a file handle for an NFS reply.
>   *
> diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
> index 210cd6f66e28..0bca19f6df54 100644
> --- a/fs/overlayfs/namei.c
> +++ b/fs/overlayfs/namei.c
> @@ -155,6 +155,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
>  {
>  	struct dentry *real;
>  	int bytes;
> +	struct vfsmount *mnt2;
>  
>  	if (!capable(CAP_DAC_READ_SEARCH))
>  		return NULL;
> @@ -169,9 +170,11 @@ struct dentry *ovl_decode_real_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
>  		return NULL;
>  
>  	bytes = (fh->fb.len - offsetof(struct ovl_fb, fid));
> -	real = exportfs_decode_fh(mnt, (struct fid *)fh->fb.fid,
> +	mnt2 = mntget(mnt);
> +	real = exportfs_decode_fh(&mnt2, (struct fid *)fh->fb.fid,
>  				  bytes >> 2, (int)fh->fb.type,
>  				  connected ? ovl_acceptable : NULL, mnt);
> +	mntput(mnt2);
>  	if (IS_ERR(real)) {
>  		/*
>  		 * Treat stale file handle to lower file as "origin unknown".
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 16039ea10ac9..76eb7d540811 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -149,6 +149,8 @@ xfs_handle_to_dentry(
>  {
>  	xfs_handle_t		handle;
>  	struct xfs_fid64	fid;
> +	struct dentry		*ret;
> +	struct vfsmount		*mnt;
>  
>  	/*
>  	 * Only allow handle opens under a directory.
> @@ -168,9 +170,13 @@ xfs_handle_to_dentry(
>  	fid.ino = handle.ha_fid.fid_ino;
>  	fid.gen = handle.ha_fid.fid_gen;
>  
> -	return exportfs_decode_fh(parfilp->f_path.mnt, (struct fid *)&fid, 3,
> -			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
> -			xfs_handle_acceptable, NULL);
> +	mnt = mntget(parfilp->f_path.mnt);
> +	ret = exportfs_decode_fh(&mnt, (struct fid *)&fid, 3,
> +				 FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
> +				 xfs_handle_acceptable, NULL);
> +	WARN_ON(mnt != parfilp->f_path.mnt);
> +	mntput(mnt);
> +	return ret;
>  }
>  
>  STATIC struct dentry *
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index fe848901fcc3..9a8c5434a5cf 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -228,12 +228,12 @@ extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
>  				    int *max_len, struct inode *parent);
>  extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
>  	int *max_len, int connectable);
> -extern struct dentry *exportfs_decode_fh_raw(struct vfsmount *mnt,
> +extern struct dentry *exportfs_decode_fh_raw(struct vfsmount **mntp,
>  					     struct fid *fid, int fh_len,
>  					     int fileid_type,
>  					     int (*acceptable)(void *, struct dentry *),
>  					     void *context);
> -extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
> +extern struct dentry *exportfs_decode_fh(struct vfsmount **mnt, struct fid *fid,
>  	int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *),
>  	void *context);
>  
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (12 preceding siblings ...)
  2021-07-28  4:58 ` Wang Yugui
@ 2021-07-28 19:35 ` J. Bruce Fields
  2021-07-28 21:30   ` Josef Bacik
  2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
  14 siblings, 1 reply; 123+ messages in thread
From: J. Bruce Fields @ 2021-07-28 19:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	linux-btrfs

I'm still stuck trying to understand why subvolumes can't get their own
superblocks:

	- Why are the performance issues Josef raises unsurmountable?
	  And why are they unique to btrfs?  (Surely there other cases
	  where people need hundreds or thousands of superblocks?)

	- If filehandle decoding can return a different vfs mount than
	  it's passed, why can't it return a different superblock?

--b.

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> There are long-standing problems with btrfs subvols, particularly in
> relation to whether and how they are exposed in the mount table.
> 
>  - /proc/self/mountinfo reports the major:minor device number for each
>     filesystem and when a btrfs subvol is explicitly mounted, the number
>     reported is wrong - it does not match what stat() reports for the
>     mountpoint.
> 
>  - when subvol are not explicitly mounted, they don't appear in
>    mountinfo at all.
> 
> Consequences include that a tool which uses stat() to find the dev of the
> filesystem, then searches mountinfo for that filesystem, will not find
> it.
> 
> Some tools (e.g. findmnt) appear to have been enhanced to cope with this
> strangeness, but it would be best to make btrfs behave more normally.
> 
>   - nfsd cannot currently see the transition to subvol, so reports the
>     main volume and all subvols to the client as being in the same
>     filesystem.  As inode numbers are not unique across all subvols,
>     this can confuse clients.  In particular, 'find' is likely to report a
>     loop.
> 
> subvols can be made to appear in mountinfo using automounts.  However
> nfsd does not cope well with automounts.  It assumes all filesystems to
> be exported are already mounted.  So adding automounts to btrfs would
> break nfsd.
> 
> We can enhance nfsd to understand that some automounts can be managed.
> "internal mounts" where a filesystem provides an automount point and
> mounts its own directories, can be handled differently by nfsd.
> 
> This series addresses all these issues.  After a few enhancements to the
> VFS to provide needed support, they enhance exportfs and nfsd to cope
> with the concept of internal mounts, and then enhance btrfs to provide
> them.
> 
> The NFSv3 support is incomplete.  I'm not sure we can make it work
> "perfectly".  A normal nfsv3 mount seem to work well enough, but if
> mounted with '-o noac', it loses track of the mounted-on inode number
> and complains about inode numbers changing.
> 
> My basic test for these is to mount a btrfs filesystem which contains
> subvols, nfs-export it and mount it with nfsv3 and nfsv4, then run
> 'find' in each of the filesystem and check the contents of
> /proc/self/mountinfo.
> 
> The first patch simply fixes the dev number in mountinfo and could
> possibly be tagged for -stable.
> 
> NeilBrown
> 
> ---
> 
> NeilBrown (11):
>       VFS: show correct dev num in mountinfo
>       VFS: allow d_automount to create in-place bind-mount.
>       VFS: pass lookup_flags into follow_down()
>       VFS: export lookup_mnt()
>       VFS: new function: mount_is_internal()
>       nfsd: include a vfsmount in struct svc_fh
>       exportfs: Allow filehandle lookup to cross internal mount points.
>       nfsd: change get_parent_attributes() to nfsd_get_mounted_on()
>       nfsd: Allow filehandle lookup to cross internal mount points.
>       btrfs: introduce mapping function from location to inum
>       btrfs: use automount to bind-mount all subvol roots.
> 
> 
>  fs/btrfs/btrfs_inode.h   |  12 +++
>  fs/btrfs/inode.c         | 111 ++++++++++++++++++++++++++-
>  fs/btrfs/super.c         |   1 +
>  fs/exportfs/expfs.c      | 100 ++++++++++++++++++++----
>  fs/fhandle.c             |   2 +-
>  fs/internal.h            |   1 -
>  fs/namei.c               |   6 +-
>  fs/namespace.c           |  32 +++++++-
>  fs/nfsd/export.c         |   4 +-
>  fs/nfsd/nfs3xdr.c        |  40 +++++++---
>  fs/nfsd/nfs4proc.c       |   9 ++-
>  fs/nfsd/nfs4xdr.c        | 106 ++++++++++++-------------
>  fs/nfsd/nfsfh.c          |  44 +++++++----
>  fs/nfsd/nfsfh.h          |   3 +-
>  fs/nfsd/nfsproc.c        |   5 +-
>  fs/nfsd/vfs.c            | 162 +++++++++++++++++++++++----------------
>  fs/nfsd/vfs.h            |  12 +--
>  fs/nfsd/xdr4.h           |   2 +-
>  fs/overlayfs/namei.c     |   5 +-
>  fs/xfs/xfs_ioctl.c       |  12 ++-
>  include/linux/exportfs.h |   4 +-
>  include/linux/mount.h    |   4 +
>  include/linux/namei.h    |   2 +-
>  23 files changed, 490 insertions(+), 189 deletions(-)
> 
> --
> Signature

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28 19:35 ` J. Bruce Fields
@ 2021-07-28 21:30   ` Josef Bacik
  2021-07-30  0:13     ` Al Viro
  0 siblings, 1 reply; 123+ messages in thread
From: Josef Bacik @ 2021-07-28 21:30 UTC (permalink / raw)
  To: J. Bruce Fields, NeilBrown
  Cc: Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, linux-btrfs

On 7/28/21 3:35 PM, J. Bruce Fields wrote:
> I'm still stuck trying to understand why subvolumes can't get their own
> superblocks:
> 
> 	- Why are the performance issues Josef raises unsurmountable?
> 	  And why are they unique to btrfs?  (Surely there other cases
> 	  where people need hundreds or thousands of superblocks?)
> 

I don't think anybody has that many file systems.  For btrfs it's a single file 
system.  Think of syncfs, it's going to walk through all of the super blocks on 
the system calling ->sync_fs on each subvol superblock.  Now this isn't a huge 
deal, we could just have some flag that says "I'm not real" or even just have 
anonymous superblocks that don't get added to the global super_blocks list, and 
that would address my main pain points.

The second part is inode reclaim.  Again this particular problem could be 
avoided if we had an anonymous superblock that wasn't actually used, but the 
inode lru is per superblock.  Now with reclaim instead of walking all the 
inodes, you're walking a bunch of super blocks and then walking the list of 
inodes within those super blocks.  You're burning CPU cycles because now instead 
of getting big chunks of inodes to dispose, it's spread out across many super 
blocks.

The other weird thing is the way we apply pressure to shrinker systems.  We 
essentially say "try to evict X objects from your list", which means in this 
case with lots of subvolumes we'd be evicting waaaaay more inodes than you were 
before, likely impacting performance where you have workloads that have lots of 
files open across many subvolumes (which is what FB does with it's containers).

If we want a anonymous superblock per subvolume then the only way it'll work is 
if it's not actually tied into anything, and we still use the primary super 
block for the whole file system.  And if that's what we're going to do what's 
the point of the super block exactly?  This approach that Neil's come up with 
seems like a reasonable solution to me.  Christoph gets his separation and 
/proc/self/mountinfo, and we avoid the scalability headache of a billion super 
blocks.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-07-28 19:17   ` J. Bruce Fields
@ 2021-07-28 22:25     ` NeilBrown
  0 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-28 22:25 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Josef Bacik, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	linux-btrfs

On Thu, 29 Jul 2021, J. Bruce Fields wrote:
> On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > @@ -232,6 +239,68 @@ reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
> >  	}
> >  	dput(dentry);
> >  	clear_disconnected(target_dir);
> 
> Minor nit--I'd prefer the following in a separate function.

Fair.  Are you thinking "a separate function that is called here" or "a
separate function that needs to be called by whoever called
exportfs_decode_fh_raw()" if they happen to want the vfsmnt to be
updated?

NeilBrown

> 
> --b.
> 
> > +
> > +	/* Need to find appropriate vfsmount, which might not exist yet.
> > +	 * We may need to trigger automount points.
> > +	 */
> > +	path.mnt = mnt;
> > +	path.dentry = target_dir;
> > +	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
> > +	target_dev = stat.dev;
> > +
> > +	path.dentry = mnt->mnt_root;
> > +	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
> > +
> > +	while (stat.dev != target_dev) {
> > +		/* walk up the dcache tree from target_dir, recording the
> > +		 * location of the most recent change in dev number,
> > +		 * until we find a mountpoint.
> > +		 * If there was no change in show_dev result before the
> > +		 * mountpount, the vfsmount at the mountpoint is what we want.
> > +		 * If there was, we need to trigger an automount where the
> > +		 * show_dev() result changed.
> > +		 */

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points.
  2021-07-28 19:15   ` J. Bruce Fields
@ 2021-07-28 22:29     ` NeilBrown
  0 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-28 22:29 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Josef Bacik, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	linux-btrfs

On Thu, 29 Jul 2021, J. Bruce Fields wrote:
> On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > Enhance nfsd to detect internal mounts and to cross them without
> > requiring a new export.
> 
> Why don't we want a new export?
> 
> (Honest question, it's not obvious to me what the best behavior is.)

Because a new export means asking user-space to determine if the mount
is exported and to provide a filehandle-prefix for it.  A large part of
the point of this it to avoid using a different filehandle-prefix.

I haven't yet thought deeply about how the 'crossmnt' flag (for v3)
should affect crossing these internal mounts.  My current feeling is
that it shouldn't as it really is just one big filesystem being
exported, which happens to internally have different inode-number
spaces. 
Unfortuantely this technically violates the RFC as the fsid is not meant
to change when you do a LOOKUP ...

NeilBrown


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28 12:26       ` Neal Gompa
  2021-07-28 19:14         ` J. Bruce Fields
@ 2021-07-28 22:50         ` NeilBrown
  2021-07-29  2:37           ` Zygo Blaxell
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-28 22:50 UTC (permalink / raw)
  To: Neal Gompa
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, Btrfs BTRFS

On Wed, 28 Jul 2021, Neal Gompa wrote:
> On Wed, Jul 28, 2021 at 3:02 AM NeilBrown <neilb@suse.de> wrote:
> >
> > On Wed, 28 Jul 2021, Wang Yugui wrote:
> > > Hi,
> > >
> > > This patchset works well in 5.14-rc3.
> >
> > Thanks for testing.
> >
> > >
> > > 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
> > > dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
> >
> > The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
> > be permanent.
> > The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
> > think).
> > This is a bit less of a hack.  It is an easily available number that is
> > fairly unique.
> >
> > >
> > > 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
> > > not used.
> > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
> > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
> > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
> > >
> > > This is a visiual feature change for btrfs user.
> >
> > Hopefully it is an improvement.  But it is certainly a change that needs
> > to be carefully considered.
> 
> I think this is behavior people generally expect, but I wonder what
> the consequences of this would be with huge numbers of subvolumes. If
> there are hundreds or thousands of them (which is quite possible on
> SUSE systems, for example, with its auto-snapshotting regime), this
> would be a mess, wouldn't it?

Would there be hundreds or thousands of subvols concurrently being
accessed? The auto-mounted subvols only appear in the mount table while
that are being accessed, and for about 15 minutes after the last access.
I suspect that most subvols are "backup" snapshots which are not being
accessed and so would not appear.

> 
> Or can we add a way to mark these things to not show up there or is
> there some kind of behavioral change we can make to snapper or other
> tools to make them not show up here?

Certainly it might make sense to flag these in some way so that tools
can choose the ignore them or handle them specially, just as nfsd needs
to handle them specially.  I was considering a "local" mount flag.

NeilBrown

> 
> 
> 
> -- 
> 真実はいつも一つ!/ Always, there's only one truth!
> 
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-07-28 10:13   ` Amir Goldstein
@ 2021-07-29  0:28     ` NeilBrown
  2021-07-29  5:27       ` Amir Goldstein
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-29  0:28 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Linux Btrfs, Miklos Szeredi

On Wed, 28 Jul 2021, Amir Goldstein wrote:
> On Wed, Jul 28, 2021 at 1:44 AM NeilBrown <neilb@suse.de> wrote:
> >
> > When a filesystem has internal mounts, it controls the filehandles
> > across all those mounts (subvols) in the filesystem.  So it is useful to
> > be able to look up a filehandle again one mount, and get a result which
> > is in a different mount (part of the same overall file system).
> >
> > This patch makes that possible by changing export_decode_fh() and
> > export_decode_fh_raw() to take a vfsmount pointer by reference, and
> > possibly change the vfsmount pointed to before returning.
> >
> > The core of the change is in reconnect_path() which now not only checks
> > that the dentry is fully connected, but also that the vfsmnt reported
> > has the same 'dev' (reported by vfs_getattr) as the dentry.
> > If it doesn't, we walk up the dparent() chain to find the highest place
> > where the dev changes without there being a mount point, and trigger an
> > automount there.
> >
> > As no filesystems yet provide local-mounts, this does not yet change any
> > behaviour.
> >
> > In exportfs_decode_fh_raw() we previously tested for DCACHE_DISCONNECT
> > before calling reconnect_path().  That test is dropped.  It was only a
> > minor optimisation and is now inconvenient.
> >
> > The change in overlayfs needs more careful thought than I have yet given
> > it.
> 
> Just note that overlayfs does not support following auto mounts in layers.
> See ovl_dentry_weird(). ovl_lookup() fails if it finds such a dentry.
> So I think you need to make sure that the vfsmount was not crossed
> when decoding an overlayfs real fh.

Sounds sensible - thanks.
Does this mean that my change would cause problems for people using
overlayfs with a btrfs lower layer?

> 
> Apart from that, I think that your new feature should be opt-in w.r.t
> the exportfs_decode_fh() vfs api and that overlayfs should not opt-in
> for the cross mount decode.

I did consider making it opt-in, but it is easy enough for the caller
to ignore the changed vfsmount, and only one (of 4) callers that it
really makes a difference for.

I will review the overlayfs in light of these comments.
Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots.
  2021-07-28 13:12   ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots Christian Brauner
@ 2021-07-29  0:43     ` NeilBrown
  2021-07-29 14:38       ` Christian Brauner
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-29  0:43 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On Wed, 28 Jul 2021, Christian Brauner wrote:
> 
> Hey Neil,
> 
> Sorry if this is a stupid question but wouldn't you want to copy the
> mount properties from path->mnt here? Couldn't you otherwise use this to
> e.g. suddenly expose a dentry on a read-only mount as read-write?

There are no stupid questions, and this is a particularly non-stupid
one!

I hadn't considered that, but having examined the code I see that it
is already handled.
The vfsmount that d_automount returns is passed to finish_automount(),
which hands it to do_add_mount() together with the mnt_flags for the
parent vfsmount (plus MNT_SHRINKABLE).
do_add_mount() sets up the mnt_flags of the new vfsmount.
In fact, the d_automount interface has no control of these flags at all.
Whatever it sets will be over-written by do_add_mount.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28 19:14         ` J. Bruce Fields
@ 2021-07-29  1:29           ` Zygo Blaxell
  2021-07-29  1:43             ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Zygo Blaxell @ 2021-07-29  1:29 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Neal Gompa, NeilBrown, Wang Yugui, Christoph Hellwig,
	Josef Bacik, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On Wed, Jul 28, 2021 at 03:14:31PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 28, 2021 at 08:26:12AM -0400, Neal Gompa wrote:
> > I think this is behavior people generally expect, but I wonder what
> > the consequences of this would be with huge numbers of subvolumes. If
> > there are hundreds or thousands of them (which is quite possible on
> > SUSE systems, for example, with its auto-snapshotting regime), this
> > would be a mess, wouldn't it?
> 
> I'm surprised that btrfs is special here.  Doesn't anyone have thousands
> of lvm snapshots?  Or is it that they do but they're not normally
> mounted?

Unprivileged users can't create lvm snapshots as easily or quickly as
using mkdir (well, ok, mkdir and fssync).  lvm doesn't scale very well
past more than a few dozen snapshots of the same original volume, and
performance degrades linearly in the number of snapshots if the original
LV is modified.  btrfs is the opposite:  users can create and delete
as many snapshots as they like, at a cost more expensive than mkdir but
less expensive than 'cp -a', and users only pay IO costs for writes to
the subvols they modify.  So some btrfs users use snapshots in places
where more traditional tools like 'cp -a' or 'git checkout' are used on
other filesystems.

e.g. a build system might make a snapshot of a git working tree containing
a checked out and built baseline revision, and then it might do a loop
where it makes a snapshot, applies one patch from an integration branch
in the snapshot directory, and incrementally builds there.  The next
revision makes a snapshot of its parent revision's subvol and builds
the next patch.  If there are merges in the integration branch, then
the builder can go back to parent revisions, create a new snapshot,
apply the patch, and build in a snapshot on both sides of the merge.
After testing picks a winner, the builder can simply delete all the
snapshots except the one for the version that won testing (there is no
requirement to commit the snapshot to the origin LV as in lvm, either
can be destroyed without requiring action to preserve the other).

You can do a similar thing with overlayfs, but it runs into problems
with all the mount points.  In btrfs, the mount points are persistent
because they're built into the filesystem.  With overlayfs, you have
to save and restore them so they persist across reboots (unless that
feature has been added since I last looked).

I'm looking at a few machines here, and if all the subvols are visible to
'df', its output would be somewhere around 3-5 MB.  That's too much--we'd
have to hack up df to not show the same btrfs twice...as well as every
monitoring tool that reports free space...which sounds similar to the
problems we're trying to avoid.

Ideally there would be a way to turn this on or off.  It is creating a
set of new problems that is the complement of the set we're trying to
fix in this change.

> --b.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28 13:43       ` g.btrfs
@ 2021-07-29  1:39         ` NeilBrown
  2021-07-29  9:28           ` Graham Cobb
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-29  1:39 UTC (permalink / raw)
  To: g.btrfs
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Wed, 28 Jul 2021, g.btrfs@cobb.uk.net wrote:
> On 28/07/2021 08:01, NeilBrown wrote:
> > On Wed, 28 Jul 2021, Wang Yugui wrote:
> >> Hi,
> >>
> >> This patchset works well in 5.14-rc3.
> > 
> > Thanks for testing.
> > 
> >>
> >> 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
> >> dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
> > 
> > The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
> > be permanent.
> > The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
> > think).
> > This is a bit less of a hack.  It is an easily available number that is
> > fairly unique.
> > 
> >>
> >> 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
> >> not used.
> >> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
> >> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
> >> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
> >>
> >> This is a visiual feature change for btrfs user.
> > 
> > Hopefully it is an improvement.  But it is certainly a change that needs
> > to be carefully considered.
> 
> Would this change the behaviour of findmnt? I have several scripts that
> depend on findmnt to select btrfs filesystems. Just to take a couple of
> examples (using the example shown above): my scripts would depend on
> 'findmnt --target /mnt/test/sub1 -o target' providing /mnt/test, not the
> subvolume; and another script would depend on 'findmnt -t btrfs
> --mountpoint /mnt/test/sub1' providing no output as the directory is not
> an /etc/fstab mount point for a btrfs filesystem.

Yes, I think it does change the behaviour of findmnt.
If the sub1 automount has not been triggered,
  findmnt --target /mnt/test/sub1 -o target
will report "/mnt/test".
After it has been triggered, it will report "/mnt/test/sub1"

Similarly "findmnt -t btrfs --mountpoint /mnt/test/sub1" will report
nothing if the automount hasn't been triggered, and will report full
details of /mnt/test/sub1 if it has.

> 
> Maybe findmnt isn't affected? Or maybe the change is worth making
> anyway? But it needs to be carefully considered if it breaks existing
> user interfaces.
> 
I hope the change is worth making anyway, but breaking findmnt would not
be a popular move.
This is unfortunate....  btrfs is "broken" and people/code have adjusted
to that breakage so that "fixing" it will be traumatic.

The only way I can find to get findmnt to ignore the new entries in
/proc/self/mountinfo is to trigger a parse error such as by replacing the 
" - " with " -- "
but that causes a parse error message to be generated, and will likely
break other tools.
(...  or I could check if current->comm is "findmnt", and suppress the
extra entries, but that is even more horrible!!)

A possible option is to change findmnt to explicitly ignore the new
"internal" mounts (unless some option is given) and then delay the
kernel update until that can be rolled out.

Or we could make the new internal mounts invisible in /proc without some
kernel setting enabled.  Then distros can enable it once all important
tools can cope, and they can easily test correctness without rebooting.

I wonder if correcting the device-number reported for explicit subvol
mounts will break anything....  findmnt seems happy with it in my
limited testing.  There seems to be a testsuite with util-linux.  Maybe
I should try that.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-29  1:29           ` Zygo Blaxell
@ 2021-07-29  1:43             ` NeilBrown
  2021-07-29 23:20               ` Zygo Blaxell
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-29  1:43 UTC (permalink / raw)
  To: Zygo Blaxell
  Cc: J. Bruce Fields, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On Thu, 29 Jul 2021, Zygo Blaxell wrote:
> 
> I'm looking at a few machines here, and if all the subvols are visible to
> 'df', its output would be somewhere around 3-5 MB.  That's too much--we'd
> have to hack up df to not show the same btrfs twice...as well as every
> monitoring tool that reports free space...which sounds similar to the
> problems we're trying to avoid.

Thanks for providing hard data!! How many of these subvols are actively
used (have a file open) a the same time? Most? About half? Just a few??

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28 22:50         ` NeilBrown
@ 2021-07-29  2:37           ` Zygo Blaxell
  2021-07-29  3:36             ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Zygo Blaxell @ 2021-07-29  2:37 UTC (permalink / raw)
  To: NeilBrown
  Cc: Neal Gompa, Wang Yugui, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On Thu, Jul 29, 2021 at 08:50:50AM +1000, NeilBrown wrote:
> On Wed, 28 Jul 2021, Neal Gompa wrote:
> > On Wed, Jul 28, 2021 at 3:02 AM NeilBrown <neilb@suse.de> wrote:
> > >
> > > On Wed, 28 Jul 2021, Wang Yugui wrote:
> > > > Hi,
> > > >
> > > > This patchset works well in 5.14-rc3.
> > >
> > > Thanks for testing.
> > >
> > > >
> > > > 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
> > > > dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
> > >
> > > The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
> > > be permanent.
> > > The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
> > > think).
> > > This is a bit less of a hack.  It is an easily available number that is
> > > fairly unique.
> > >
> > > >
> > > > 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
> > > > not used.
> > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
> > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
> > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
> > > >
> > > > This is a visiual feature change for btrfs user.
> > >
> > > Hopefully it is an improvement.  But it is certainly a change that needs
> > > to be carefully considered.
> > 
> > I think this is behavior people generally expect, but I wonder what
> > the consequences of this would be with huge numbers of subvolumes. If
> > there are hundreds or thousands of them (which is quite possible on
> > SUSE systems, for example, with its auto-snapshotting regime), this
> > would be a mess, wouldn't it?
> 
> Would there be hundreds or thousands of subvols concurrently being
> accessed? The auto-mounted subvols only appear in the mount table while
> that are being accessed, and for about 15 minutes after the last access.
> I suspect that most subvols are "backup" snapshots which are not being
> accessed and so would not appear.

bees dedupes across subvols and polls every few minutes for new data
to dedupe.  bees doesn't particularly care where the "src" in the dedupe
call comes from, so it will pick a subvol that has a reference to the
data at random (whichever one comes up first in backref search) for each
dedupe call.  There is a cache of open fds on each subvol root so that it
can access files within that subvol using openat().  The cache quickly
populates fully, i.e. it holds a fd to every subvol on the filesystem.
The cache has a 15 minute timeout too, so bees would likely keep the
mount table fully populated at all times.

plocate also uses openat() and it can also be active on many subvols
simultaneously, though it only runs once a day, and it's reasonable to
exclude all snapshots from plocate for performance reasons.

My bigger concern here is that users on btrfs can currently have private
subvols with secret names.  e.g.

	user$ mkdir -m 700 private
	user$ btrfs sub create private/secret
	user$ cd private/secret
	user$ ...do stuff...

Would "secret" now be visible in the very public /proc/mounts every time
the user is doing stuff?

> > Or can we add a way to mark these things to not show up there or is
> > there some kind of behavioral change we can make to snapper or other
> > tools to make them not show up here?
> 
> Certainly it might make sense to flag these in some way so that tools
> can choose the ignore them or handle them specially, just as nfsd needs
> to handle them specially.  I was considering a "local" mount flag.

I would definitely want an 'off' switch for this thing until the impact
is better understood.

> NeilBrown
> 
> > 
> > 
> > 
> > -- 
> > 真実はいつも一つ!/ Always, there's only one truth!
> > 
> > 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-29  2:37           ` Zygo Blaxell
@ 2021-07-29  3:36             ` NeilBrown
  2021-07-29 23:20               ` Zygo Blaxell
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-29  3:36 UTC (permalink / raw)
  To: Zygo Blaxell
  Cc: Neal Gompa, Wang Yugui, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On Thu, 29 Jul 2021, Zygo Blaxell wrote:
> On Thu, Jul 29, 2021 at 08:50:50AM +1000, NeilBrown wrote:
> > On Wed, 28 Jul 2021, Neal Gompa wrote:
> > > On Wed, Jul 28, 2021 at 3:02 AM NeilBrown <neilb@suse.de> wrote:
> > > >
> > > > On Wed, 28 Jul 2021, Wang Yugui wrote:
> > > > > Hi,
> > > > >
> > > > > This patchset works well in 5.14-rc3.
> > > >
> > > > Thanks for testing.
> > > >
> > > > >
> > > > > 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
> > > > > dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
> > > >
> > > > The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
> > > > be permanent.
> > > > The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
> > > > think).
> > > > This is a bit less of a hack.  It is an easily available number that is
> > > > fairly unique.
> > > >
> > > > >
> > > > > 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
> > > > > not used.
> > > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
> > > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
> > > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
> > > > >
> > > > > This is a visiual feature change for btrfs user.
> > > >
> > > > Hopefully it is an improvement.  But it is certainly a change that needs
> > > > to be carefully considered.
> > > 
> > > I think this is behavior people generally expect, but I wonder what
> > > the consequences of this would be with huge numbers of subvolumes. If
> > > there are hundreds or thousands of them (which is quite possible on
> > > SUSE systems, for example, with its auto-snapshotting regime), this
> > > would be a mess, wouldn't it?
> > 
> > Would there be hundreds or thousands of subvols concurrently being
> > accessed? The auto-mounted subvols only appear in the mount table while
> > that are being accessed, and for about 15 minutes after the last access.
> > I suspect that most subvols are "backup" snapshots which are not being
> > accessed and so would not appear.
> 
> bees dedupes across subvols and polls every few minutes for new data
> to dedupe.  bees doesn't particularly care where the "src" in the dedupe
> call comes from, so it will pick a subvol that has a reference to the
> data at random (whichever one comes up first in backref search) for each
> dedupe call.  There is a cache of open fds on each subvol root so that it
> can access files within that subvol using openat().  The cache quickly
> populates fully, i.e. it holds a fd to every subvol on the filesystem.
> The cache has a 15 minute timeout too, so bees would likely keep the
> mount table fully populated at all times.

OK ... that is very interesting and potentially helpful - thanks.

Localizing these daemons in a separate namespace would stop them from
polluting the public namespace, but I don't know how easy that would
be..

Do you know how bees opens these files?  Does it use path-names from the
root, or some special btrfs ioctl, or ???
If path-names are not used, it might be possible to suppress the
automount. 

> 
> plocate also uses openat() and it can also be active on many subvols
> simultaneously, though it only runs once a day, and it's reasonable to
> exclude all snapshots from plocate for performance reasons.
> 
> My bigger concern here is that users on btrfs can currently have private
> subvols with secret names.  e.g.
> 
> 	user$ mkdir -m 700 private
> 	user$ btrfs sub create private/secret
> 	user$ cd private/secret
> 	user$ ...do stuff...
> 
> Would "secret" now be visible in the very public /proc/mounts every time
> the user is doing stuff?

Yes, the secret would be publicly visible.  Unless we hid it.

It is conceivable that the content of /proc/mounts could be limited to
mountpoints where the process reading had 'x' access to the mountpoint. 
However to be really safe we would want to require 'x' access to all
ancestors too, and possibly some 'r' access.  That would get
prohibitively expensive.

We could go with "owned by root, or owned by user" maybe.

Thanks,
NeilBrown


> 
> > > Or can we add a way to mark these things to not show up there or is
> > > there some kind of behavioral change we can make to snapper or other
> > > tools to make them not show up here?
> > 
> > Certainly it might make sense to flag these in some way so that tools
> > can choose the ignore them or handle them specially, just as nfsd needs
> > to handle them specially.  I was considering a "local" mount flag.
> 
> I would definitely want an 'off' switch for this thing until the impact
> is better understood.
> 
> > NeilBrown
> > 
> > > 
> > > 
> > > 
> > > -- 
> > > 真実はいつも一つ!/ Always, there's only one truth!
> > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-07-29  0:28     ` NeilBrown
@ 2021-07-29  5:27       ` Amir Goldstein
  2021-08-06  7:52         ` Miklos Szeredi
  0 siblings, 1 reply; 123+ messages in thread
From: Amir Goldstein @ 2021-07-29  5:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Linux Btrfs, Miklos Szeredi

On Thu, Jul 29, 2021 at 3:28 AM NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 28 Jul 2021, Amir Goldstein wrote:
> > On Wed, Jul 28, 2021 at 1:44 AM NeilBrown <neilb@suse.de> wrote:
> > >
> > > When a filesystem has internal mounts, it controls the filehandles
> > > across all those mounts (subvols) in the filesystem.  So it is useful to
> > > be able to look up a filehandle again one mount, and get a result which
> > > is in a different mount (part of the same overall file system).
> > >
> > > This patch makes that possible by changing export_decode_fh() and
> > > export_decode_fh_raw() to take a vfsmount pointer by reference, and
> > > possibly change the vfsmount pointed to before returning.
> > >
> > > The core of the change is in reconnect_path() which now not only checks
> > > that the dentry is fully connected, but also that the vfsmnt reported
> > > has the same 'dev' (reported by vfs_getattr) as the dentry.
> > > If it doesn't, we walk up the dparent() chain to find the highest place
> > > where the dev changes without there being a mount point, and trigger an
> > > automount there.
> > >
> > > As no filesystems yet provide local-mounts, this does not yet change any
> > > behaviour.
> > >
> > > In exportfs_decode_fh_raw() we previously tested for DCACHE_DISCONNECT
> > > before calling reconnect_path().  That test is dropped.  It was only a
> > > minor optimisation and is now inconvenient.
> > >
> > > The change in overlayfs needs more careful thought than I have yet given
> > > it.
> >
> > Just note that overlayfs does not support following auto mounts in layers.
> > See ovl_dentry_weird(). ovl_lookup() fails if it finds such a dentry.
> > So I think you need to make sure that the vfsmount was not crossed
> > when decoding an overlayfs real fh.
>
> Sounds sensible - thanks.
> Does this mean that my change would cause problems for people using
> overlayfs with a btrfs lower layer?
>

It sounds like it might :-/
I assume that enabling automount in btrfs in opt-in?
Otherwise you will be changing behavior for users of existing systems.

I am not sure, but I think it may be possible to remove the AUTOMOUNT
check from the ovl_dentry_weird() condition with an explicit overlayfs
config/module/mount option so that we won't change behavior by
default, but distro may change the default for overlayfs.

Then, when admin changes the btrfs options on the system to perform
automounts, it will also need to change the overlayfs options to not
error on automounts.

Given that today, subvolume mounts (or any mounts) on the lower layer
are not followed by overlayfs, I don't really see the difference
if mounts are created manually or automatically.
Miklos?

> >
> > Apart from that, I think that your new feature should be opt-in w.r.t
> > the exportfs_decode_fh() vfs api and that overlayfs should not opt-in
> > for the cross mount decode.
>
> I did consider making it opt-in, but it is easy enough for the caller
> to ignore the changed vfsmount, and only one (of 4) callers that it
> really makes a difference for.
>

Which reminds me. Please ignore the changed vfsmount in
do_handle_to_path() (or do not opt-in to changed vfsmount).

I have an application that uses a bind mount to filter file handles
of directories by subtree. It opens by the file handles that were
acquired from fanotify DFID info record using a mountfd in the
bind mount and readlink /proc/self/fd to determine the path
relative to that subtree bind mount.

Your change, IIUC, is going to change the semantics of
open_by_handle_at(2) and is going to break my application.

If you need this change for nfsd, please keep it as an internal api
used only by nfsd.

TBH, I think it would also be nice to have an internal api to limit
reconnect_path() walk up to mnt->mnt_root, which is what overlayfs
really wants, so here is another excuse for you to introduce
"reconnect flags" to exportfs_decode_fh_raw() ;-)

Note that I had already added support for one implicit "reconnect
flag" (i.e. "don't reconnect") in commit 8a22efa15b46
("ovl: do not try to reconnect a disconnected origin dentry").

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-29  1:39         ` NeilBrown
@ 2021-07-29  9:28           ` Graham Cobb
  0 siblings, 0 replies; 123+ messages in thread
From: Graham Cobb @ 2021-07-29  9:28 UTC (permalink / raw)
  To: NeilBrown
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On 29/07/2021 02:39, NeilBrown wrote:
> On Wed, 28 Jul 2021, g.btrfs@cobb.uk.net wrote:
>> On 28/07/2021 08:01, NeilBrown wrote:
>>> On Wed, 28 Jul 2021, Wang Yugui wrote:
>>>> Hi,
>>>>
>>>> This patchset works well in 5.14-rc3.
>>>
>>> Thanks for testing.
>>>
>>>>
>>>> 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
>>>> dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
>>>
>>> The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
>>> be permanent.
>>> The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
>>> think).
>>> This is a bit less of a hack.  It is an easily available number that is
>>> fairly unique.
>>>
>>>>
>>>> 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
>>>> not used.
>>>> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
>>>> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
>>>> /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
>>>>
>>>> This is a visiual feature change for btrfs user.
>>>
>>> Hopefully it is an improvement.  But it is certainly a change that needs
>>> to be carefully considered.
>>
>> Would this change the behaviour of findmnt? I have several scripts that
>> depend on findmnt to select btrfs filesystems. Just to take a couple of
>> examples (using the example shown above): my scripts would depend on
>> 'findmnt --target /mnt/test/sub1 -o target' providing /mnt/test, not the
>> subvolume; and another script would depend on 'findmnt -t btrfs
>> --mountpoint /mnt/test/sub1' providing no output as the directory is not
>> an /etc/fstab mount point for a btrfs filesystem.
> 
> Yes, I think it does change the behaviour of findmnt.
> If the sub1 automount has not been triggered,
>   findmnt --target /mnt/test/sub1 -o target
> will report "/mnt/test".
> After it has been triggered, it will report "/mnt/test/sub1"
> 
> Similarly "findmnt -t btrfs --mountpoint /mnt/test/sub1" will report
> nothing if the automount hasn't been triggered, and will report full
> details of /mnt/test/sub1 if it has.
> 
>>
>> Maybe findmnt isn't affected? Or maybe the change is worth making
>> anyway? But it needs to be carefully considered if it breaks existing
>> user interfaces.
>>
> I hope the change is worth making anyway, but breaking findmnt would not
> be a popular move.

I agree. I use findmnt, but I also use NFS mounted btrfs disks so I am
keen to see this deployed. But people who don't maintain their own
scripts and need a third party to change them might disagree!

> This is unfortunate....  btrfs is "broken" and people/code have adjusted
> to that breakage so that "fixing" it will be traumatic.
> 
> The only way I can find to get findmnt to ignore the new entries in
> /proc/self/mountinfo is to trigger a parse error such as by replacing the 
> " - " with " -- "
> but that causes a parse error message to be generated, and will likely
> break other tools.
> (...  or I could check if current->comm is "findmnt", and suppress the
> extra entries, but that is even more horrible!!)
> 
> A possible option is to change findmnt to explicitly ignore the new
> "internal" mounts (unless some option is given) and then delay the
> kernel update until that can be rolled out.

That sounds good as a permanent fix for findmnt. Some sort of
'--include-subvols' option. Particularly if it were possible to default
it using an environment variable so a script can be written to work with
both the old and the new versions of findmnt.

Unfortunately it won't help any other program which does similar
searches through /proc/self/mountinfo.

How about creating two different files? Say, /proc/self/mountinfo and
/proc/self/mountinfo.internal (better filenames may be available!). The
.internal file could be just the additional internal mounts, or it could
be the complete list. Or something like
/proc/self/mountinfo.without-subvols and
/proc/self/mountinfo.with-subvols and a sysctl setting to choose which
is made visible as /proc/self/mountinfo.

Graham



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots.
  2021-07-29  0:43     ` NeilBrown
@ 2021-07-29 14:38       ` Christian Brauner
  0 siblings, 0 replies; 123+ messages in thread
From: Christian Brauner @ 2021-07-29 14:38 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On Thu, Jul 29, 2021 at 10:43:23AM +1000, NeilBrown wrote:
> On Wed, 28 Jul 2021, Christian Brauner wrote:
> > 
> > Hey Neil,
> > 
> > Sorry if this is a stupid question but wouldn't you want to copy the
> > mount properties from path->mnt here? Couldn't you otherwise use this to
> > e.g. suddenly expose a dentry on a read-only mount as read-write?
> 
> There are no stupid questions, and this is a particularly non-stupid
> one!
> 
> I hadn't considered that, but having examined the code I see that it
> is already handled.
> The vfsmount that d_automount returns is passed to finish_automount(),
> which hands it to do_add_mount() together with the mnt_flags for the
> parent vfsmount (plus MNT_SHRINKABLE).
> do_add_mount() sets up the mnt_flags of the new vfsmount.
> In fact, the d_automount interface has no control of these flags at all.
> Whatever it sets will be over-written by do_add_mount.

Ah, interesting thank you very much, Neil. I seemed to have overlooked
this yesterday.

If btrfs makes use of automounts the way you envisioned to expose
subvolumes and also will support idmapped mounts (see [1]) we need to
teach do_add_mount() to also take the idmapped mount into account. So
you'd need something like (entirely untested):

diff --git a/fs/namespace.c b/fs/namespace.c
index ab4174a3c802..921f6396c36d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2811,6 +2811,11 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
                return -EINVAL;

        newmnt->mnt.mnt_flags = mnt_flags;
+
+       newmnt->mnt.mnt_userns = path->mnt;
+       if (newmnt->mnt.mnt_userns != &init_user_ns)
+               newmnt->mnt.mnt_userns = get_user_ns(newmnt->mnt.mnt_userns);
+
        return graft_tree(newmnt, parent, mp);
 }

[1]: https://lore.kernel.org/linux-btrfs/20210727104900.829215-1-brauner@kernel.org/T/#mca601363b435e81c89d8ca4f09134faa5c227e6d

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-29  3:36             ` NeilBrown
@ 2021-07-29 23:20               ` Zygo Blaxell
  2021-07-30  2:36                 ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Zygo Blaxell @ 2021-07-29 23:20 UTC (permalink / raw)
  To: NeilBrown
  Cc: Neal Gompa, Wang Yugui, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On Thu, Jul 29, 2021 at 01:36:06PM +1000, NeilBrown wrote:
> On Thu, 29 Jul 2021, Zygo Blaxell wrote:
> > On Thu, Jul 29, 2021 at 08:50:50AM +1000, NeilBrown wrote:
> > > On Wed, 28 Jul 2021, Neal Gompa wrote:
> > > > On Wed, Jul 28, 2021 at 3:02 AM NeilBrown <neilb@suse.de> wrote:
> > > > >
> > > > > On Wed, 28 Jul 2021, Wang Yugui wrote:
> > > > > > Hi,
> > > > > >
> > > > > > This patchset works well in 5.14-rc3.
> > > > >
> > > > > Thanks for testing.
> > > > >
> > > > > >
> > > > > > 1, fixed dummy inode(255, BTRFS_FIRST_FREE_OBJECTID - 1 )  is changed to
> > > > > > dynamic dummy inode(18446744073709551358, or 18446744073709551359, ...)
> > > > >
> > > > > The BTRFS_FIRST_FREE_OBJECTID-1 was a just a hack, I never wanted it to
> > > > > be permanent.
> > > > > The new number is ULONG_MAX - subvol_id (where subvol_id starts at 257 I
> > > > > think).
> > > > > This is a bit less of a hack.  It is an easily available number that is
> > > > > fairly unique.
> > > > >
> > > > > >
> > > > > > 2, btrfs subvol mount info is shown in /proc/mounts, even if nfsd/nfs is
> > > > > > not used.
> > > > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test
> > > > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub1
> > > > > > /dev/sdc                btrfs   94G  3.5M   93G   1% /mnt/test/sub2
> > > > > >
> > > > > > This is a visiual feature change for btrfs user.
> > > > >
> > > > > Hopefully it is an improvement.  But it is certainly a change that needs
> > > > > to be carefully considered.
> > > > 
> > > > I think this is behavior people generally expect, but I wonder what
> > > > the consequences of this would be with huge numbers of subvolumes. If
> > > > there are hundreds or thousands of them (which is quite possible on
> > > > SUSE systems, for example, with its auto-snapshotting regime), this
> > > > would be a mess, wouldn't it?
> > > 
> > > Would there be hundreds or thousands of subvols concurrently being
> > > accessed? The auto-mounted subvols only appear in the mount table while
> > > that are being accessed, and for about 15 minutes after the last access.
> > > I suspect that most subvols are "backup" snapshots which are not being
> > > accessed and so would not appear.
> > 
> > bees dedupes across subvols and polls every few minutes for new data
> > to dedupe.  bees doesn't particularly care where the "src" in the dedupe
> > call comes from, so it will pick a subvol that has a reference to the
> > data at random (whichever one comes up first in backref search) for each
> > dedupe call.  There is a cache of open fds on each subvol root so that it
> > can access files within that subvol using openat().  The cache quickly
> > populates fully, i.e. it holds a fd to every subvol on the filesystem.
> > The cache has a 15 minute timeout too, so bees would likely keep the
> > mount table fully populated at all times.
> 
> OK ... that is very interesting and potentially helpful - thanks.
> 
> Localizing these daemons in a separate namespace would stop them from
> polluting the public namespace, but I don't know how easy that would
> be..
> 
> Do you know how bees opens these files?  Does it use path-names from the
> root, or some special btrfs ioctl, or ???

There's a function in bees that opens a subvol root directory by subvol
id.  It walks up the btrfs subvol tree with btrfs ioctls to construct a
path to the root, then down the filesystem tree with other btrfs ioctls
to get filenames for each subvol.  The filenames are fed to openat()
with the parent subvol's fd to get a fd for each child subvol's root
directory along the path.  This is recursive and expensive (the fd
has to be checked to see if it matches the subvol, in case some other
process renamed it) and called every time bees wants to open a file,
so the fd goes into a cache for future open-subvol-by-id calls.

For files, bees calls subvol root open to get a fd for the subvol's root,
then calls the btrfs inode-to-path ioctl on that fd to get a list of
names for the inode, then openat(subvol_fd, inode_to_path(inum), ...) on
each name until a fd matching the target subvol and inode is obtained.
File access is driven by data content, so bees cannot easily predict
which files will need to be accessed again in the near future and which
can be closed.  The fd cache is a brute-force way to reduce the number
of inode-to-path and open calls.

Upper layers of bees use (subvol, inode) pairs to identify files
and request file descriptors.  The lower layers use filenames as an
implementation detail for compatibility with kernel API.

> If path-names are not used, it might be possible to suppress the
> automount. 

A userspace interface to read and dedupe that doesn't use pathnames or
file descriptors (other than one fd to bind the interface to a filesystem)
would be nice!  About half of the bees code is devoted to emulating
that interface using the existing kernel API.

Ideally a dedupe agent would be able to pass two physical offsets and
a length of identical data to the filesystem without ever opening a file.

While I'm in favor of making bees smaller, this seems like an expensive
way to suppress automounts.

> > plocate also uses openat() and it can also be active on many subvols
> > simultaneously, though it only runs once a day, and it's reasonable to
> > exclude all snapshots from plocate for performance reasons.
> > 
> > My bigger concern here is that users on btrfs can currently have private
> > subvols with secret names.  e.g.
> > 
> > 	user$ mkdir -m 700 private
> > 	user$ btrfs sub create private/secret
> > 	user$ cd private/secret
> > 	user$ ...do stuff...
> > 
> > Would "secret" now be visible in the very public /proc/mounts every time
> > the user is doing stuff?
> 
> Yes, the secret would be publicly visible.  Unless we hid it.
> 
> It is conceivable that the content of /proc/mounts could be limited to
> mountpoints where the process reading had 'x' access to the mountpoint. 
> However to be really safe we would want to require 'x' access to all
> ancestors too, and possibly some 'r' access.  That would get
> prohibitively expensive.

And inconsistent, since we don't do that with other mount points, i.e.
outside of btrfs.  But we do hide parts of the path names when the
process's filesystem root or namespace changes, so maybe this is more
of that.

> We could go with "owned by root, or owned by user" maybe.
> 
> Thanks,
> NeilBrown
> 
> 
> > 
> > > > Or can we add a way to mark these things to not show up there or is
> > > > there some kind of behavioral change we can make to snapper or other
> > > > tools to make them not show up here?
> > > 
> > > Certainly it might make sense to flag these in some way so that tools
> > > can choose the ignore them or handle them specially, just as nfsd needs
> > > to handle them specially.  I was considering a "local" mount flag.
> > 
> > I would definitely want an 'off' switch for this thing until the impact
> > is better understood.
> > 
> > > NeilBrown
> > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > 真実はいつも一つ!/ Always, there's only one truth!
> > > > 
> > > > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-29  1:43             ` NeilBrown
@ 2021-07-29 23:20               ` Zygo Blaxell
  0 siblings, 0 replies; 123+ messages in thread
From: Zygo Blaxell @ 2021-07-29 23:20 UTC (permalink / raw)
  To: NeilBrown
  Cc: J. Bruce Fields, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On Thu, Jul 29, 2021 at 11:43:21AM +1000, NeilBrown wrote:
> On Thu, 29 Jul 2021, Zygo Blaxell wrote:
> > 
> > I'm looking at a few machines here, and if all the subvols are visible to
> > 'df', its output would be somewhere around 3-5 MB.  That's too much--we'd
> > have to hack up df to not show the same btrfs twice...as well as every
> > monitoring tool that reports free space...which sounds similar to the
> > problems we're trying to avoid.
> 
> Thanks for providing hard data!! How many of these subvols are actively
> used (have a file open) a the same time? Most? About half? Just a few??

Between 1% and 10% of the subvols have open fds at any time (not counting
bees, which holds an open fd on every subvol most of the time).  The ratio
is higher (more active) when the machine has less storage or more CPU/RAM:
we keep idle subvols around longer if we have lots of free space, or we
make more subvols active at the same time if we have lots of RAM and CPU.

I don't recall if stat() triggers automount, but most of the subvols are
located in a handful of directories.  Could a single 'ls -l' command
activate all of their automounts?  If so, then we'd be hitting those
at least once every 15 minutes--these directories are browseable, and
an entry point for users.  Certainly anything that looks inside those
directories (like certain file-browsing user agents that look for icons
one level down) will trigger automount as they search children of the
subvol root.

Some of this might be fixable, like I could probably make bees be
more parsimonious with subvol root fds, and I could probably rework
reporting scripts to avoid touching anything inside subdirectories,
and I could probably rework our subvolume directory layout to avoid
accidentally triggering thousands of automounts.  But I'd rather not.
I'd rather have working NFS and a 15-20 line df output with no new
privacy or scalability concerns.

> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-28 21:30   ` Josef Bacik
@ 2021-07-30  0:13     ` Al Viro
  2021-07-30  6:08       ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Al Viro @ 2021-07-30  0:13 UTC (permalink / raw)
  To: Josef Bacik
  Cc: J. Bruce Fields, NeilBrown, Christoph Hellwig, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 05:30:04PM -0400, Josef Bacik wrote:

> I don't think anybody has that many file systems.  For btrfs it's a single
> file system.  Think of syncfs, it's going to walk through all of the super
> blocks on the system calling ->sync_fs on each subvol superblock.  Now this
> isn't a huge deal, we could just have some flag that says "I'm not real" or
> even just have anonymous superblocks that don't get added to the global
> super_blocks list, and that would address my main pain points.

Umm...  Aren't the snapshots read-only by definition?

> The second part is inode reclaim.  Again this particular problem could be
> avoided if we had an anonymous superblock that wasn't actually used, but the
> inode lru is per superblock.  Now with reclaim instead of walking all the
> inodes, you're walking a bunch of super blocks and then walking the list of
> inodes within those super blocks.  You're burning CPU cycles because now
> instead of getting big chunks of inodes to dispose, it's spread out across
> many super blocks.
> 
> The other weird thing is the way we apply pressure to shrinker systems.  We
> essentially say "try to evict X objects from your list", which means in this
> case with lots of subvolumes we'd be evicting waaaaay more inodes than you
> were before, likely impacting performance where you have workloads that have
> lots of files open across many subvolumes (which is what FB does with it's
> containers).
> 
> If we want a anonymous superblock per subvolume then the only way it'll work
> is if it's not actually tied into anything, and we still use the primary
> super block for the whole file system.  And if that's what we're going to do
> what's the point of the super block exactly?  This approach that Neil's come
> up with seems like a reasonable solution to me.  Christoph gets his
> separation and /proc/self/mountinfo, and we avoid the scalability headache
> of a billion super blocks.  Thanks,

AFAICS, we also get arseloads of weird corner cases - in particular, Neil's
suggestions re visibility in /proc/mounts look rather arbitrary.

Al, really disliking the entire series...

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-27 22:37 ` [PATCH 01/11] VFS: show correct dev num in mountinfo NeilBrown
@ 2021-07-30  0:25   ` Al Viro
  2021-07-30  5:28     ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Al Viro @ 2021-07-30  0:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> /proc/$PID/mountinfo contains a field for the device number of the
> filesystem at each mount.
> 
> This is taken from the superblock ->s_dev field, which is correct for
> every filesystem except btrfs.  A btrfs filesystem can contain multiple
> subvols which each have a different device number.  If (a directory
> within) one of these subvols is mounted, the device number reported in
> mountinfo will be different from the device number reported by stat().
> 
> This confuses some libraries and tools such as, historically, findmnt.
> Current findmnt seems to cope with the strangeness.
> 
> So instead of using ->s_dev, call vfs_getattr_nosec() and use the ->dev
> provided.  As there is no STATX flag to ask for the device number, we
> pass a request mask for zero, and also ask the filesystem to avoid
> syncing with any remote service.

Hard NAK.  You are putting IO (potentially - network IO, with no upper
limit on the completion time) under namespace_sem.

This is an instant DoS - have a hung NFS mount anywhere in the system,
try to cat /proc/self/mountinfo and watch a system-wide rwsem held shared.
From that point on any attempt to take it exclusive will hang *AND* after
that all attempts to take it shared will do the same.

Please, fix BTRFS shite in BTRFS.  Without turning a moderately unpleasant
problem (say, unplugged hub on the way to NFS server) into something that
escalates into buggered clients.  Note that you have taken out any possibility
to e.g. umount -l /path/to/stuck/mount, along with any chance of clear shutdown
of the client.  Not going to happen.

NAKed-by: Al Viro <viro@zeniv.linux.org.uk>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 04/11] VFS: export lookup_mnt()
  2021-07-27 22:37 ` [PATCH 04/11] VFS: export lookup_mnt() NeilBrown
@ 2021-07-30  0:31   ` Al Viro
  2021-07-30  5:33     ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Al Viro @ 2021-07-30  0:31 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> In order to support filehandle lookup in filesystems with internal
> mounts (multiple subvols in the one filesystem) reconnect_path() in
> exportfs will need to find out if a given dentry is already mounted.
> This can be done with the function lookup_mnt(), so export that to make
> it available.

IMO having exportfs modular is wrong - note that fs/fhandle.c is
	* calling functions in exportfs
	* non-modular
	* ... and not going to be modular, no matter what - there
are syscalls in it.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 05/11] VFS: new function: mount_is_internal()
  2021-07-28  3:32     ` NeilBrown
@ 2021-07-30  0:34       ` Al Viro
  0 siblings, 0 replies; 123+ messages in thread
From: Al Viro @ 2021-07-30  0:34 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 01:32:45PM +1000, NeilBrown wrote:
> On Wed, 28 Jul 2021, Al Viro wrote:
> > On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > > This patch introduces the concept of an "internal" mount which is a
> > > mount where a filesystem has create the mount itself.
> > > 
> > > Both the mounted-on-dentry and the mount's root dentry must refer to the
> > > same superblock (they may be the same dentry), and the mounted-on dentry
> > > must be an automount.
> > 
> > And what happens if you mount --move it?
> > 
> > 
> If you move the mount, then the mounted-on dentry would not longer be an
> automount (....  I assume???...) so it would not longer be
> mount_is_internal().
> 
> I think that is reasonable.  Whoever moved the mount has now taken over
> responsibility for it - it no longer is controlled by the filesystem.
> The moving will have removed the mount from the list of auto-expire
> mounts, and the mount-trap will now be exposed and can be mounted-on
> again.
> 
> It would be just like unmounting the automount, and bind-mounting the
> same dentry elsewhere.

Once more, with feeling: what happens to your function if it gets called during
mount --move?

What locking environment is going to be provided for it?  And what is going
to provide said environment?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points.
  2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
  2021-07-28 19:15   ` J. Bruce Fields
@ 2021-07-30  0:42   ` Al Viro
  2021-07-30  5:43     ` NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: Al Viro @ 2021-07-30  0:42 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:

> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index baa12ac36ece..22523e1cd478 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -64,7 +64,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
>  			    .dentry = dget(path_parent->dentry)};
>  	int err = 0;
>  
> -	err = follow_down(&path, 0);
> +	err = follow_down(&path, LOOKUP_AUTOMOUNT);
>  	if (err < 0)
>  		goto out;
>  	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
> @@ -73,6 +73,13 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
>  		path_put(&path);
>  		goto out;
>  	}
> +	if (mount_is_internal(path.mnt)) {
> +		/* Use the new path, but don't look for a new export */
> +		/* FIXME should I check NOHIDE in this case?? */
> +		path_put(path_parent);
> +		*path_parent = path;
> +		goto out;
> +	}

... IOW, mount_is_internal() is called with no exclusion whatsoever.  What's there
to
	* keep its return value valid?
	* prevent fetching ->mnt_mountpoint, getting preempted away, having
the mount moved *and* what used to be ->mnt_mountpoint evicted from dcache,
now that it's no longer pinned, then mount_is_internal() regaining CPU and
dereferencing ->mnt_mountpoint, which now points to hell knows what?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-29 23:20               ` Zygo Blaxell
@ 2021-07-30  2:36                 ` NeilBrown
  2021-07-30  5:25                   ` Qu Wenruo
  2021-07-30  5:28                   ` Amir Goldstein
  0 siblings, 2 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-30  2:36 UTC (permalink / raw)
  To: Zygo Blaxell
  Cc: Neal Gompa, Wang Yugui, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS


I've been pondering all the excellent feedback, and what I have learnt
from examining the code in btrfs, and I have developed a different
perspective.

Maybe "subvol" is a poor choice of name because it conjures up
connections with the Volumes in LVM, and btrfs subvols are very different
things.  Btrfs subvols are really just subtrees that can be treated as a
unit for operations like "clone" or "destroy".

As such, they don't really deserve separate st_dev numbers.

Maybe the different st_dev numbers were introduced as a "cheap" way to
extend to size of the inode-number space.  Like many "cheap" things, it
has hidden costs.

Maybe objects in different subvols should still be given different inode
numbers.  This would be problematic on 32bit systems, but much less so on
64bit systems.

The patch below, which is just a proof-of-concept, changes btrfs to
report a uniform st_dev, and different (64bit) st_ino in different subvols.

It has problems:
 - it will break any 32bit readdir and 32bit stat.  I don't know how big
   a problem that is these days (ino_t in the kernel is "unsigned long",
   not "unsigned long long). That surprised me).
 - It might break some user-space expectations.  One thing I have learnt
   is not to make any assumption about what other people might expect.

However, it would be quite easy to make this opt-in (or opt-out) with a
mount option, so that people who need the current inode numbers and will
accept the current breakage can keep working.

I think this approach would be a net-win for NFS export, whether BTRFS
supports it directly or not.  I might post a patch which modifies NFS to
intuit improved inode numbers for btrfs exports....

So: how would this break your use-case??

Thanks,
NeilBrown

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0117d867ecf8..8dc58c848502 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6020,6 +6020,37 @@ static int btrfs_opendir(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static u64 btrfs_make_inum(struct btrfs_key *root, struct btrfs_key *ino)
+{
+	u64 rootid = root->objectid;
+	u64 inoid = ino->objectid;
+	int shift = 64-8;
+
+	if (ino->type == BTRFS_ROOT_ITEM_KEY) {
+		/* This is a subvol root found during readdir. */
+		rootid = inoid;
+		inoid = BTRFS_FIRST_FREE_OBJECTID;
+	}
+	if (rootid == BTRFS_FS_TREE_OBJECTID)
+		/* this is main vol, not subvol (I think) */
+		return inoid;
+	/* store the rootid in the high bits of the inum.  This
+	 * will break if 32bit inums are required - we cannot know
+	 */
+	while (rootid) {
+		inoid ^= (rootid & 0xff) << shift;
+		rootid >>= 8;
+		shift -= 8;
+	}
+	return inoid;
+}
+
+static inline u64 btrfs_ino_to_inum(struct inode *inode)
+{
+	return btrfs_make_inum(&BTRFS_I(inode)->root->root_key,
+			       &BTRFS_I(inode)->location);
+}
+
 struct dir_entry {
 	u64 ino;
 	u64 offset;
@@ -6045,6 +6076,49 @@ static int btrfs_filldir(void *addr, int entries, struct dir_context *ctx)
 	return 0;
 }
 
+static inline bool btrfs_dir_emit_dot(struct file *file,
+				      struct dir_context *ctx)
+{
+	return ctx->actor(ctx, ".", 1, ctx->pos,
+			  btrfs_ino_to_inum(file->f_path.dentry->d_inode),
+			  DT_DIR) == 0;
+}
+
+static inline ino_t btrfs_parent_ino(struct dentry *dentry)
+{
+	ino_t res;
+
+	/*
+	 * Don't strictly need d_lock here? If the parent ino could change
+	 * then surely we'd have a deeper race in the caller?
+	 */
+	spin_lock(&dentry->d_lock);
+	res = btrfs_ino_to_inum(dentry->d_parent->d_inode);
+	spin_unlock(&dentry->d_lock);
+	return res;
+}
+
+static inline bool btrfs_dir_emit_dotdot(struct file *file,
+					 struct dir_context *ctx)
+{
+	return ctx->actor(ctx, "..", 2, ctx->pos,
+			  btrfs_parent_ino(file->f_path.dentry), DT_DIR) == 0;
+}
+static inline bool btrfs_dir_emit_dots(struct file *file,
+				       struct dir_context *ctx)
+{
+	if (ctx->pos == 0) {
+		if (!btrfs_dir_emit_dot(file, ctx))
+			return false;
+		ctx->pos = 1;
+	}
+	if (ctx->pos == 1) {
+		if (!btrfs_dir_emit_dotdot(file, ctx))
+			return false;
+		ctx->pos = 2;
+	}
+	return true;
+}
 static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct inode *inode = file_inode(file);
@@ -6067,7 +6141,7 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
 	bool put = false;
 	struct btrfs_key location;
 
-	if (!dir_emit_dots(file, ctx))
+	if (!btrfs_dir_emit_dots(file, ctx))
 		return 0;
 
 	path = btrfs_alloc_path();
@@ -6136,7 +6210,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
 		put_unaligned(fs_ftype_to_dtype(btrfs_dir_type(leaf, di)),
 				&entry->type);
 		btrfs_dir_item_key_to_cpu(leaf, di, &location);
-		put_unaligned(location.objectid, &entry->ino);
+		put_unaligned(btrfs_make_inum(&root->root_key, &location),
+			      &entry->ino);
 		put_unaligned(found_key.offset, &entry->offset);
 		entries++;
 		addr += sizeof(struct dir_entry) + name_len;
@@ -9193,7 +9268,7 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
 				  STATX_ATTR_NODUMP);
 
 	generic_fillattr(&init_user_ns, inode, stat);
-	stat->dev = BTRFS_I(inode)->root->anon_dev;
+	stat->ino = btrfs_ino_to_inum(inode);
 
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  2:36                 ` NeilBrown
@ 2021-07-30  5:25                   ` Qu Wenruo
  2021-07-30  5:31                     ` Qu Wenruo
  2021-07-30  5:58                     ` NeilBrown
  2021-07-30  5:28                   ` Amir Goldstein
  1 sibling, 2 replies; 123+ messages in thread
From: Qu Wenruo @ 2021-07-30  5:25 UTC (permalink / raw)
  To: NeilBrown, Zygo Blaxell
  Cc: Neal Gompa, Wang Yugui, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS



On 2021/7/30 上午10:36, NeilBrown wrote:
>
> I've been pondering all the excellent feedback, and what I have learnt
> from examining the code in btrfs, and I have developed a different
> perspective.

Great! Some new developers into the btrfs realm!

>
> Maybe "subvol" is a poor choice of name because it conjures up
> connections with the Volumes in LVM, and btrfs subvols are very different
> things.  Btrfs subvols are really just subtrees that can be treated as a
> unit for operations like "clone" or "destroy".
>
> As such, they don't really deserve separate st_dev numbers.
>
> Maybe the different st_dev numbers were introduced as a "cheap" way to
> extend to size of the inode-number space.  Like many "cheap" things, it
> has hidden costs.
>
> Maybe objects in different subvols should still be given different inode
> numbers.  This would be problematic on 32bit systems, but much less so on
> 64bit systems.
>
> The patch below, which is just a proof-of-concept, changes btrfs to
> report a uniform st_dev, and different (64bit) st_ino in different subvols.
>
> It has problems:
>   - it will break any 32bit readdir and 32bit stat.  I don't know how big
>     a problem that is these days (ino_t in the kernel is "unsigned long",
>     not "unsigned long long). That surprised me).
>   - It might break some user-space expectations.  One thing I have learnt
>     is not to make any assumption about what other people might expect.

Wouldn't any filesystem boundary check fail to stop at subvolume boundary?

Then it will go through the full btrfs subvolumes/snapshots, which can
be super slow.

>
> However, it would be quite easy to make this opt-in (or opt-out) with a
> mount option, so that people who need the current inode numbers and will
> accept the current breakage can keep working.
>
> I think this approach would be a net-win for NFS export, whether BTRFS
> supports it directly or not.  I might post a patch which modifies NFS to
> intuit improved inode numbers for btrfs exports....

Some extra ideas, but not familiar with VFS enough to be sure.

Can we generate "fake" superblock for each subvolume?
Like using the subolume UUID to replace the FSID of each subvolume.
Could that migrate the problem?

Thanks,
Qu

>
> So: how would this break your use-case??
>
> Thanks,
> NeilBrown
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0117d867ecf8..8dc58c848502 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6020,6 +6020,37 @@ static int btrfs_opendir(struct inode *inode, struct file *file)
>   	return 0;
>   }
>
> +static u64 btrfs_make_inum(struct btrfs_key *root, struct btrfs_key *ino)
> +{
> +	u64 rootid = root->objectid;
> +	u64 inoid = ino->objectid;
> +	int shift = 64-8;
> +
> +	if (ino->type == BTRFS_ROOT_ITEM_KEY) {
> +		/* This is a subvol root found during readdir. */
> +		rootid = inoid;
> +		inoid = BTRFS_FIRST_FREE_OBJECTID;
> +	}
> +	if (rootid == BTRFS_FS_TREE_OBJECTID)
> +		/* this is main vol, not subvol (I think) */
> +		return inoid;
> +	/* store the rootid in the high bits of the inum.  This
> +	 * will break if 32bit inums are required - we cannot know
> +	 */
> +	while (rootid) {
> +		inoid ^= (rootid & 0xff) << shift;
> +		rootid >>= 8;
> +		shift -= 8;
> +	}
> +	return inoid;
> +}
> +
> +static inline u64 btrfs_ino_to_inum(struct inode *inode)
> +{
> +	return btrfs_make_inum(&BTRFS_I(inode)->root->root_key,
> +			       &BTRFS_I(inode)->location);
> +}
> +
>   struct dir_entry {
>   	u64 ino;
>   	u64 offset;
> @@ -6045,6 +6076,49 @@ static int btrfs_filldir(void *addr, int entries, struct dir_context *ctx)
>   	return 0;
>   }
>
> +static inline bool btrfs_dir_emit_dot(struct file *file,
> +				      struct dir_context *ctx)
> +{
> +	return ctx->actor(ctx, ".", 1, ctx->pos,
> +			  btrfs_ino_to_inum(file->f_path.dentry->d_inode),
> +			  DT_DIR) == 0;
> +}
> +
> +static inline ino_t btrfs_parent_ino(struct dentry *dentry)
> +{
> +	ino_t res;
> +
> +	/*
> +	 * Don't strictly need d_lock here? If the parent ino could change
> +	 * then surely we'd have a deeper race in the caller?
> +	 */
> +	spin_lock(&dentry->d_lock);
> +	res = btrfs_ino_to_inum(dentry->d_parent->d_inode);
> +	spin_unlock(&dentry->d_lock);
> +	return res;
> +}
> +
> +static inline bool btrfs_dir_emit_dotdot(struct file *file,
> +					 struct dir_context *ctx)
> +{
> +	return ctx->actor(ctx, "..", 2, ctx->pos,
> +			  btrfs_parent_ino(file->f_path.dentry), DT_DIR) == 0;
> +}
> +static inline bool btrfs_dir_emit_dots(struct file *file,
> +				       struct dir_context *ctx)
> +{
> +	if (ctx->pos == 0) {
> +		if (!btrfs_dir_emit_dot(file, ctx))
> +			return false;
> +		ctx->pos = 1;
> +	}
> +	if (ctx->pos == 1) {
> +		if (!btrfs_dir_emit_dotdot(file, ctx))
> +			return false;
> +		ctx->pos = 2;
> +	}
> +	return true;
> +}
>   static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>   {
>   	struct inode *inode = file_inode(file);
> @@ -6067,7 +6141,7 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>   	bool put = false;
>   	struct btrfs_key location;
>
> -	if (!dir_emit_dots(file, ctx))
> +	if (!btrfs_dir_emit_dots(file, ctx))
>   		return 0;
>
>   	path = btrfs_alloc_path();
> @@ -6136,7 +6210,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>   		put_unaligned(fs_ftype_to_dtype(btrfs_dir_type(leaf, di)),
>   				&entry->type);
>   		btrfs_dir_item_key_to_cpu(leaf, di, &location);
> -		put_unaligned(location.objectid, &entry->ino);
> +		put_unaligned(btrfs_make_inum(&root->root_key, &location),
> +			      &entry->ino);
>   		put_unaligned(found_key.offset, &entry->offset);
>   		entries++;
>   		addr += sizeof(struct dir_entry) + name_len;
> @@ -9193,7 +9268,7 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
>   				  STATX_ATTR_NODUMP);
>
>   	generic_fillattr(&init_user_ns, inode, stat);
> -	stat->dev = BTRFS_I(inode)->root->anon_dev;
> +	stat->ino = btrfs_ino_to_inum(inode);
>
>   	spin_lock(&BTRFS_I(inode)->lock);
>   	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  2:36                 ` NeilBrown
  2021-07-30  5:25                   ` Qu Wenruo
@ 2021-07-30  5:28                   ` Amir Goldstein
  1 sibling, 0 replies; 123+ messages in thread
From: Amir Goldstein @ 2021-07-30  5:28 UTC (permalink / raw)
  To: NeilBrown
  Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Btrfs BTRFS

On Fri, Jul 30, 2021 at 5:41 AM NeilBrown <neilb@suse.de> wrote:
>
>
> I've been pondering all the excellent feedback, and what I have learnt
> from examining the code in btrfs, and I have developed a different
> perspective.
>
> Maybe "subvol" is a poor choice of name because it conjures up
> connections with the Volumes in LVM, and btrfs subvols are very different
> things.  Btrfs subvols are really just subtrees that can be treated as a
> unit for operations like "clone" or "destroy".
>
> As such, they don't really deserve separate st_dev numbers.
>
> Maybe the different st_dev numbers were introduced as a "cheap" way to
> extend to size of the inode-number space.  Like many "cheap" things, it
> has hidden costs.
>
> Maybe objects in different subvols should still be given different inode
> numbers.  This would be problematic on 32bit systems, but much less so on
> 64bit systems.
>
> The patch below, which is just a proof-of-concept, changes btrfs to
> report a uniform st_dev, and different (64bit) st_ino in different subvols.
>
> It has problems:
>  - it will break any 32bit readdir and 32bit stat.  I don't know how big
>    a problem that is these days (ino_t in the kernel is "unsigned long",
>    not "unsigned long long). That surprised me).
>  - It might break some user-space expectations.  One thing I have learnt
>    is not to make any assumption about what other people might expect.
>
> However, it would be quite easy to make this opt-in (or opt-out) with a
> mount option, so that people who need the current inode numbers and will
> accept the current breakage can keep working.
>
> I think this approach would be a net-win for NFS export, whether BTRFS
> supports it directly or not.  I might post a patch which modifies NFS to
> intuit improved inode numbers for btrfsdemostrates exports....
>
> So: how would this break your use-case??

The simple cases are find -xdev and du -x which expect the st_dev
change, but that can be excused if opting in to a unified st_dev namespace.

The harder problem is <st_dev;st_ino> collisions which are not even
that hard to hit with unlimited number of snapshots.
The 'diff' tool demonstrates the implications of <st_dev;st_ino>
collisions for different objects on userspace.
See xfstest overlay/049 for a demonstration.

The overlayfs xino feature made a similar change to overlayfs
<st_dev;st_ino> with one big difference - applications expect that
all objects in overlayfs mount will have the same st_dev.

Also, overlayfs has prior knowledge on the number of layers
so it is easier to parcel the ino namespace and avoid collisions.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-30  0:25   ` Al Viro
@ 2021-07-30  5:28     ` NeilBrown
  2021-07-30  5:54       ` Miklos Szeredi
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-30  5:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Fri, 30 Jul 2021, Al Viro wrote:
> On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > /proc/$PID/mountinfo contains a field for the device number of the
> > filesystem at each mount.
> > 
> > This is taken from the superblock ->s_dev field, which is correct for
> > every filesystem except btrfs.  A btrfs filesystem can contain multiple
> > subvols which each have a different device number.  If (a directory
> > within) one of these subvols is mounted, the device number reported in
> > mountinfo will be different from the device number reported by stat().
> > 
> > This confuses some libraries and tools such as, historically, findmnt.
> > Current findmnt seems to cope with the strangeness.
> > 
> > So instead of using ->s_dev, call vfs_getattr_nosec() and use the ->dev
> > provided.  As there is no STATX flag to ask for the device number, we
> > pass a request mask for zero, and also ask the filesystem to avoid
> > syncing with any remote service.
> 
> Hard NAK.  You are putting IO (potentially - network IO, with no upper
> limit on the completion time) under namespace_sem.

Why would IO be generated? The inode must already be in cache because it
is mounted, and STATX_DONT_SYNC is passed.  If a filesystem did IO in
those circumstances, it would be broken.

Thanks for the review,
NeilBrown


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  5:25                   ` Qu Wenruo
@ 2021-07-30  5:31                     ` Qu Wenruo
  2021-07-30  5:53                       ` Amir Goldstein
  2021-07-30  6:00                       ` NeilBrown
  2021-07-30  5:58                     ` NeilBrown
  1 sibling, 2 replies; 123+ messages in thread
From: Qu Wenruo @ 2021-07-30  5:31 UTC (permalink / raw)
  To: NeilBrown, Zygo Blaxell
  Cc: Neal Gompa, Wang Yugui, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS



On 2021/7/30 下午1:25, Qu Wenruo wrote:
>
>
> On 2021/7/30 上午10:36, NeilBrown wrote:
>>
>> I've been pondering all the excellent feedback, and what I have learnt
>> from examining the code in btrfs, and I have developed a different
>> perspective.
>
> Great! Some new developers into the btrfs realm!
>
>>
>> Maybe "subvol" is a poor choice of name because it conjures up
>> connections with the Volumes in LVM, and btrfs subvols are very different
>> things.  Btrfs subvols are really just subtrees that can be treated as a
>> unit for operations like "clone" or "destroy".
>>
>> As such, they don't really deserve separate st_dev numbers.
>>
>> Maybe the different st_dev numbers were introduced as a "cheap" way to
>> extend to size of the inode-number space.  Like many "cheap" things, it
>> has hidden costs.

Forgot another problem already caused by this st_dev method.

Since btrfs uses st_dev to distinguish them its inode name space, and
st_dev is allocated using anonymous bdev, and the anonymous bdev poor
has limited size (much smaller than btrfs subvolume id name space), it's
already causing problems like we can't allocate enough anonymous bdev
for each subvolume, and failed to create subvolume/snapshot.

Thus it's really a time to re-consider how we should export this info to
user space.

Thanks,
Qu

>>
>> Maybe objects in different subvols should still be given different inode
>> numbers.  This would be problematic on 32bit systems, but much less so on
>> 64bit systems.
>>
>> The patch below, which is just a proof-of-concept, changes btrfs to
>> report a uniform st_dev, and different (64bit) st_ino in different
>> subvols.
>>
>> It has problems:
>>   - it will break any 32bit readdir and 32bit stat.  I don't know how big
>>     a problem that is these days (ino_t in the kernel is "unsigned long",
>>     not "unsigned long long). That surprised me).
>>   - It might break some user-space expectations.  One thing I have learnt
>>     is not to make any assumption about what other people might expect.
>
> Wouldn't any filesystem boundary check fail to stop at subvolume boundary?
>
> Then it will go through the full btrfs subvolumes/snapshots, which can
> be super slow.
>
>>
>> However, it would be quite easy to make this opt-in (or opt-out) with a
>> mount option, so that people who need the current inode numbers and will
>> accept the current breakage can keep working.
>>
>> I think this approach would be a net-win for NFS export, whether BTRFS
>> supports it directly or not.  I might post a patch which modifies NFS to
>> intuit improved inode numbers for btrfs exports....
>
> Some extra ideas, but not familiar with VFS enough to be sure.
>
> Can we generate "fake" superblock for each subvolume?
> Like using the subolume UUID to replace the FSID of each subvolume.
> Could that migrate the problem?
>
> Thanks,
> Qu
>
>>
>> So: how would this break your use-case??
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 0117d867ecf8..8dc58c848502 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -6020,6 +6020,37 @@ static int btrfs_opendir(struct inode *inode,
>> struct file *file)
>>       return 0;
>>   }
>>
>> +static u64 btrfs_make_inum(struct btrfs_key *root, struct btrfs_key
>> *ino)
>> +{
>> +    u64 rootid = root->objectid;
>> +    u64 inoid = ino->objectid;
>> +    int shift = 64-8;
>> +
>> +    if (ino->type == BTRFS_ROOT_ITEM_KEY) {
>> +        /* This is a subvol root found during readdir. */
>> +        rootid = inoid;
>> +        inoid = BTRFS_FIRST_FREE_OBJECTID;
>> +    }
>> +    if (rootid == BTRFS_FS_TREE_OBJECTID)
>> +        /* this is main vol, not subvol (I think) */
>> +        return inoid;
>> +    /* store the rootid in the high bits of the inum.  This
>> +     * will break if 32bit inums are required - we cannot know
>> +     */
>> +    while (rootid) {
>> +        inoid ^= (rootid & 0xff) << shift;
>> +        rootid >>= 8;
>> +        shift -= 8;
>> +    }
>> +    return inoid;
>> +}
>> +
>> +static inline u64 btrfs_ino_to_inum(struct inode *inode)
>> +{
>> +    return btrfs_make_inum(&BTRFS_I(inode)->root->root_key,
>> +                   &BTRFS_I(inode)->location);
>> +}
>> +
>>   struct dir_entry {
>>       u64 ino;
>>       u64 offset;
>> @@ -6045,6 +6076,49 @@ static int btrfs_filldir(void *addr, int
>> entries, struct dir_context *ctx)
>>       return 0;
>>   }
>>
>> +static inline bool btrfs_dir_emit_dot(struct file *file,
>> +                      struct dir_context *ctx)
>> +{
>> +    return ctx->actor(ctx, ".", 1, ctx->pos,
>> +              btrfs_ino_to_inum(file->f_path.dentry->d_inode),
>> +              DT_DIR) == 0;
>> +}
>> +
>> +static inline ino_t btrfs_parent_ino(struct dentry *dentry)
>> +{
>> +    ino_t res;
>> +
>> +    /*
>> +     * Don't strictly need d_lock here? If the parent ino could change
>> +     * then surely we'd have a deeper race in the caller?
>> +     */
>> +    spin_lock(&dentry->d_lock);
>> +    res = btrfs_ino_to_inum(dentry->d_parent->d_inode);
>> +    spin_unlock(&dentry->d_lock);
>> +    return res;
>> +}
>> +
>> +static inline bool btrfs_dir_emit_dotdot(struct file *file,
>> +                     struct dir_context *ctx)
>> +{
>> +    return ctx->actor(ctx, "..", 2, ctx->pos,
>> +              btrfs_parent_ino(file->f_path.dentry), DT_DIR) == 0;
>> +}
>> +static inline bool btrfs_dir_emit_dots(struct file *file,
>> +                       struct dir_context *ctx)
>> +{
>> +    if (ctx->pos == 0) {
>> +        if (!btrfs_dir_emit_dot(file, ctx))
>> +            return false;
>> +        ctx->pos = 1;
>> +    }
>> +    if (ctx->pos == 1) {
>> +        if (!btrfs_dir_emit_dotdot(file, ctx))
>> +            return false;
>> +        ctx->pos = 2;
>> +    }
>> +    return true;
>> +}
>>   static int btrfs_real_readdir(struct file *file, struct dir_context
>> *ctx)
>>   {
>>       struct inode *inode = file_inode(file);
>> @@ -6067,7 +6141,7 @@ static int btrfs_real_readdir(struct file *file,
>> struct dir_context *ctx)
>>       bool put = false;
>>       struct btrfs_key location;
>>
>> -    if (!dir_emit_dots(file, ctx))
>> +    if (!btrfs_dir_emit_dots(file, ctx))
>>           return 0;
>>
>>       path = btrfs_alloc_path();
>> @@ -6136,7 +6210,8 @@ static int btrfs_real_readdir(struct file *file,
>> struct dir_context *ctx)
>>           put_unaligned(fs_ftype_to_dtype(btrfs_dir_type(leaf, di)),
>>                   &entry->type);
>>           btrfs_dir_item_key_to_cpu(leaf, di, &location);
>> -        put_unaligned(location.objectid, &entry->ino);
>> +        put_unaligned(btrfs_make_inum(&root->root_key, &location),
>> +                  &entry->ino);
>>           put_unaligned(found_key.offset, &entry->offset);
>>           entries++;
>>           addr += sizeof(struct dir_entry) + name_len;
>> @@ -9193,7 +9268,7 @@ static int btrfs_getattr(struct user_namespace
>> *mnt_userns,
>>                     STATX_ATTR_NODUMP);
>>
>>       generic_fillattr(&init_user_ns, inode, stat);
>> -    stat->dev = BTRFS_I(inode)->root->anon_dev;
>> +    stat->ino = btrfs_ino_to_inum(inode);
>>
>>       spin_lock(&BTRFS_I(inode)->lock);
>>       delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
>>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 04/11] VFS: export lookup_mnt()
  2021-07-30  0:31   ` Al Viro
@ 2021-07-30  5:33     ` NeilBrown
  0 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-30  5:33 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Fri, 30 Jul 2021, Al Viro wrote:
> On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > In order to support filehandle lookup in filesystems with internal
> > mounts (multiple subvols in the one filesystem) reconnect_path() in
> > exportfs will need to find out if a given dentry is already mounted.
> > This can be done with the function lookup_mnt(), so export that to make
> > it available.
> 
> IMO having exportfs modular is wrong - note that fs/fhandle.c is
> 	* calling functions in exportfs
> 	* non-modular
> 	* ... and not going to be modular, no matter what - there
> are syscalls in it.
> 
> 

I agree - it makes sense for exportfs to be non-module.  It cannot be
module if FHANDLE is enabled, and if you don't want FHANDLE you probably
don't want EXPORTFS either.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points.
  2021-07-30  0:42   ` Al Viro
@ 2021-07-30  5:43     ` NeilBrown
  0 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-30  5:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Fri, 30 Jul 2021, Al Viro wrote:
> On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> 
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index baa12ac36ece..22523e1cd478 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -64,7 +64,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
> >  			    .dentry = dget(path_parent->dentry)};
> >  	int err = 0;
> >  
> > -	err = follow_down(&path, 0);
> > +	err = follow_down(&path, LOOKUP_AUTOMOUNT);
> >  	if (err < 0)
> >  		goto out;
> >  	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
> > @@ -73,6 +73,13 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
> >  		path_put(&path);
> >  		goto out;
> >  	}
> > +	if (mount_is_internal(path.mnt)) {
> > +		/* Use the new path, but don't look for a new export */
> > +		/* FIXME should I check NOHIDE in this case?? */
> > +		path_put(path_parent);
> > +		*path_parent = path;
> > +		goto out;
> > +	}
> 
> ... IOW, mount_is_internal() is called with no exclusion whatsoever.  What's there
> to
> 	* keep its return value valid?
> 	* prevent fetching ->mnt_mountpoint, getting preempted away, having
> the mount moved *and* what used to be ->mnt_mountpoint evicted from dcache,
> now that it's no longer pinned, then mount_is_internal() regaining CPU and
> dereferencing ->mnt_mountpoint, which now points to hell knows what?
> 

Yes, mount_is_internal needs to same mount_lock protection that
lookup_mnt() has.  Thanks.

I don't think it matter how long the result remains valid.  The only
realistic transtion is from True to False, but the fact that it *was*
True means that it is acceptable for the lookup to have succeeded.
i.e.  If the mountpoint was moved which a request was being processed it
will either cause the same result as if it happened before the request
started, or after it finished.  Either seems OK.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  5:31                     ` Qu Wenruo
@ 2021-07-30  5:53                       ` Amir Goldstein
  2021-07-30  6:00                       ` NeilBrown
  1 sibling, 0 replies; 123+ messages in thread
From: Amir Goldstein @ 2021-07-30  5:53 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: NeilBrown, Zygo Blaxell, Neal Gompa, Wang Yugui,
	Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Btrfs BTRFS

On Fri, Jul 30, 2021 at 8:33 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2021/7/30 下午1:25, Qu Wenruo wrote:
> >
> >
> > On 2021/7/30 上午10:36, NeilBrown wrote:
> >>
> >> I've been pondering all the excellent feedback, and what I have learnt
> >> from examining the code in btrfs, and I have developed a different
> >> perspective.
> >
> > Great! Some new developers into the btrfs realm!
> >
> >>
> >> Maybe "subvol" is a poor choice of name because it conjures up
> >> connections with the Volumes in LVM, and btrfs subvols are very different
> >> things.  Btrfs subvols are really just subtrees that can be treated as a
> >> unit for operations like "clone" or "destroy".
> >>
> >> As such, they don't really deserve separate st_dev numbers.
> >>
> >> Maybe the different st_dev numbers were introduced as a "cheap" way to
> >> extend to size of the inode-number space.  Like many "cheap" things, it
> >> has hidden costs.
>
> Forgot another problem already caused by this st_dev method.
>
> Since btrfs uses st_dev to distinguish them its inode name space, and
> st_dev is allocated using anonymous bdev, and the anonymous bdev poor
> has limited size (much smaller than btrfs subvolume id name space), it's
> already causing problems like we can't allocate enough anonymous bdev
> for each subvolume, and failed to create subvolume/snapshot.
>

How about creating a major dev for btrfs subvolumes to start with.
Then at least there is a possibility for administrative reservation of
st_dev values for subvols that need persistent <st_dev;st_ino>

By default subvols get assigned a minor dynamically as today
and with opt-in (e.g. for small short lived btrfs filesystems), the
unified st_dev approach can be used, possibly by providing
an upper limit to the inode numbers on the filesystem, similar to
xfs -o inode32 mount option.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-30  5:28     ` NeilBrown
@ 2021-07-30  5:54       ` Miklos Szeredi
  2021-07-30  6:13         ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Miklos Szeredi @ 2021-07-30  5:54 UTC (permalink / raw)
  To: NeilBrown
  Cc: Al Viro, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Fri, 30 Jul 2021 at 07:28, NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 30 Jul 2021, Al Viro wrote:
> > On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > > /proc/$PID/mountinfo contains a field for the device number of the
> > > filesystem at each mount.
> > >
> > > This is taken from the superblock ->s_dev field, which is correct for
> > > every filesystem except btrfs.  A btrfs filesystem can contain multiple
> > > subvols which each have a different device number.  If (a directory
> > > within) one of these subvols is mounted, the device number reported in
> > > mountinfo will be different from the device number reported by stat().
> > >
> > > This confuses some libraries and tools such as, historically, findmnt.
> > > Current findmnt seems to cope with the strangeness.
> > >
> > > So instead of using ->s_dev, call vfs_getattr_nosec() and use the ->dev
> > > provided.  As there is no STATX flag to ask for the device number, we
> > > pass a request mask for zero, and also ask the filesystem to avoid
> > > syncing with any remote service.
> >
> > Hard NAK.  You are putting IO (potentially - network IO, with no upper
> > limit on the completion time) under namespace_sem.
>
> Why would IO be generated? The inode must already be in cache because it
> is mounted, and STATX_DONT_SYNC is passed.  If a filesystem did IO in
> those circumstances, it would be broken.

STATX_DONT_SYNC is a hint, and while some network fs do honor it, not all do.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  5:25                   ` Qu Wenruo
  2021-07-30  5:31                     ` Qu Wenruo
@ 2021-07-30  5:58                     ` NeilBrown
  2021-07-30  6:23                       ` Qu Wenruo
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-30  5:58 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS

On Fri, 30 Jul 2021, Qu Wenruo wrote:
> 
> On 2021/7/30 上午10:36, NeilBrown wrote:
> >
> > I've been pondering all the excellent feedback, and what I have learnt
> > from examining the code in btrfs, and I have developed a different
> > perspective.
> 
> Great! Some new developers into the btrfs realm!

:-)

> 
> >
> > Maybe "subvol" is a poor choice of name because it conjures up
> > connections with the Volumes in LVM, and btrfs subvols are very different
> > things.  Btrfs subvols are really just subtrees that can be treated as a
> > unit for operations like "clone" or "destroy".
> >
> > As such, they don't really deserve separate st_dev numbers.
> >
> > Maybe the different st_dev numbers were introduced as a "cheap" way to
> > extend to size of the inode-number space.  Like many "cheap" things, it
> > has hidden costs.
> >
> > Maybe objects in different subvols should still be given different inode
> > numbers.  This would be problematic on 32bit systems, but much less so on
> > 64bit systems.
> >
> > The patch below, which is just a proof-of-concept, changes btrfs to
> > report a uniform st_dev, and different (64bit) st_ino in different subvols.
> >
> > It has problems:
> >   - it will break any 32bit readdir and 32bit stat.  I don't know how big
> >     a problem that is these days (ino_t in the kernel is "unsigned long",
> >     not "unsigned long long). That surprised me).
> >   - It might break some user-space expectations.  One thing I have learnt
> >     is not to make any assumption about what other people might expect.
> 
> Wouldn't any filesystem boundary check fail to stop at subvolume boundary?

You mean like "du -x"?? Yes.  You would lose the misleading illusion
that there are multiple filesystems.  That is one user-expectation that
would need to be addressed before people opt-in

> 
> Then it will go through the full btrfs subvolumes/snapshots, which can
> be super slow.
> 
> >
> > However, it would be quite easy to make this opt-in (or opt-out) with a
> > mount option, so that people who need the current inode numbers and will
> > accept the current breakage can keep working.
> >
> > I think this approach would be a net-win for NFS export, whether BTRFS
> > supports it directly or not.  I might post a patch which modifies NFS to
> > intuit improved inode numbers for btrfs exports....
> 
> Some extra ideas, but not familiar with VFS enough to be sure.
> 
> Can we generate "fake" superblock for each subvolume?

I don't see how that would help.  Either subvols are like filesystems
and appear in /proc/mounts, or they aren't like filesystems and don't
get different st_dev.  Either of these outcomes can be achieved without
fake superblocks.  If you really need BTRFS subvols to have some
properties of filesystems but not all, then you are in for a whole world
of pain.

Maybe btrfs subvols should be treated more like XFS "managed trees".  At
least there you have precedent and someone else to share the pain.
Maybe we should train people to use "quota" to check the usage of a
subvol, rather than "du" (which will stop working with my patch if it
contains refs to other subvols) or "df" (which already doesn't work), or
"btrs df"

> Like using the subolume UUID to replace the FSID of each subvolume.
> Could that migrate the problem?

Which problem, exactly?  My first approach to making subvols work on NFS
took essentially that approach.  It was seen (quite reasonably) as a
hack to work around poor behaviour in btrfs.

Given that NFS has always seen all of a btrfs filesystem as have a
uniform fsid, I'm now of the opinion that we don't want to change that,
but should just fix the duplicate-inode-number problem.

If I could think of some way for NFSD to see different inode numbers
than VFS, I would push hard for fixs nfsd by giving it more sane inode
numbers.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  5:31                     ` Qu Wenruo
  2021-07-30  5:53                       ` Amir Goldstein
@ 2021-07-30  6:00                       ` NeilBrown
  2021-07-30  6:09                         ` Qu Wenruo
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-30  6:00 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS

On Fri, 30 Jul 2021, Qu Wenruo wrote:
> 
> On 2021/7/30 下午1:25, Qu Wenruo wrote:
> >
> >
> > On 2021/7/30 上午10:36, NeilBrown wrote:
> >>
> >> I've been pondering all the excellent feedback, and what I have learnt
> >> from examining the code in btrfs, and I have developed a different
> >> perspective.
> >
> > Great! Some new developers into the btrfs realm!
> >
> >>
> >> Maybe "subvol" is a poor choice of name because it conjures up
> >> connections with the Volumes in LVM, and btrfs subvols are very different
> >> things.  Btrfs subvols are really just subtrees that can be treated as a
> >> unit for operations like "clone" or "destroy".
> >>
> >> As such, they don't really deserve separate st_dev numbers.
> >>
> >> Maybe the different st_dev numbers were introduced as a "cheap" way to
> >> extend to size of the inode-number space.  Like many "cheap" things, it
> >> has hidden costs.
> 
> Forgot another problem already caused by this st_dev method.
> 
> Since btrfs uses st_dev to distinguish them its inode name space, and
> st_dev is allocated using anonymous bdev, and the anonymous bdev poor
> has limited size (much smaller than btrfs subvolume id name space), it's
> already causing problems like we can't allocate enough anonymous bdev
> for each subvolume, and failed to create subvolume/snapshot.

What sort of numbers do you see in practice? How many subvolumes and how
many inodes per subvolume?
If we allocated some number of bits to each, with over-allocation to
allow for growth, could we fit both into 64 bits?

NeilBrown


> 
> Thus it's really a time to re-consider how we should export this info to
> user space.
> 
> Thanks,
> Qu
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  0:13     ` Al Viro
@ 2021-07-30  6:08       ` NeilBrown
  0 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-07-30  6:08 UTC (permalink / raw)
  To: Al Viro
  Cc: Josef Bacik, J. Bruce Fields, Christoph Hellwig, Chuck Lever,
	Chris Mason, David Sterba, linux-fsdevel, linux-nfs, linux-btrfs

On Fri, 30 Jul 2021, Al Viro wrote:
> On Wed, Jul 28, 2021 at 05:30:04PM -0400, Josef Bacik wrote:
> 
> > I don't think anybody has that many file systems.  For btrfs it's a single
> > file system.  Think of syncfs, it's going to walk through all of the super
> > blocks on the system calling ->sync_fs on each subvol superblock.  Now this
> > isn't a huge deal, we could just have some flag that says "I'm not real" or
> > even just have anonymous superblocks that don't get added to the global
> > super_blocks list, and that would address my main pain points.
> 
> Umm...  Aren't the snapshots read-only by definition?

No, though they can be.
subvols can be created empty, or duplicated from an existing subvol.
Any subvol can be written, using copy-on-write of course.

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  6:00                       ` NeilBrown
@ 2021-07-30  6:09                         ` Qu Wenruo
  0 siblings, 0 replies; 123+ messages in thread
From: Qu Wenruo @ 2021-07-30  6:09 UTC (permalink / raw)
  To: NeilBrown
  Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS



On 2021/7/30 下午2:00, NeilBrown wrote:
> On Fri, 30 Jul 2021, Qu Wenruo wrote:
>>
>> On 2021/7/30 下午1:25, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/7/30 上午10:36, NeilBrown wrote:
>>>>
>>>> I've been pondering all the excellent feedback, and what I have learnt
>>>> from examining the code in btrfs, and I have developed a different
>>>> perspective.
>>>
>>> Great! Some new developers into the btrfs realm!
>>>
>>>>
>>>> Maybe "subvol" is a poor choice of name because it conjures up
>>>> connections with the Volumes in LVM, and btrfs subvols are very different
>>>> things.  Btrfs subvols are really just subtrees that can be treated as a
>>>> unit for operations like "clone" or "destroy".
>>>>
>>>> As such, they don't really deserve separate st_dev numbers.
>>>>
>>>> Maybe the different st_dev numbers were introduced as a "cheap" way to
>>>> extend to size of the inode-number space.  Like many "cheap" things, it
>>>> has hidden costs.
>>
>> Forgot another problem already caused by this st_dev method.
>>
>> Since btrfs uses st_dev to distinguish them its inode name space, and
>> st_dev is allocated using anonymous bdev, and the anonymous bdev poor
>> has limited size (much smaller than btrfs subvolume id name space), it's
>> already causing problems like we can't allocate enough anonymous bdev
>> for each subvolume, and failed to create subvolume/snapshot.
>
> What sort of numbers do you see in practice? How many subvolumes and how
> many inodes per subvolume?

Normally the "live"(*) subvolume numbers are below the minor dev number
range (1<<20), thus not a big deal.

*: Live here means the subvolume is at least accessed once. Subvolume
exists but never accessed doesn't get its anonymous bdev number allocated.

But (1<<20) is really small compared some real-world users.
Thus we had some reports of such problem, and changed the timing to
allocate such bdev number.

> If we allocated some number of bits to each, with over-allocation to
> allow for growth, could we fit both into 64 bits?

I don't think it's even possible, as currently we use u32 for dev_t,
which is already way below the theoretical limit (U64_MAX - 512).

Thus AFAIK there is no real way to solve it right now.

Thanks,
Qu
>
> NeilBrown
>
>
>>
>> Thus it's really a time to re-consider how we should export this info to
>> user space.
>>
>> Thanks,
>> Qu
>>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-30  5:54       ` Miklos Szeredi
@ 2021-07-30  6:13         ` NeilBrown
  2021-07-30  7:18           ` Miklos Szeredi
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-30  6:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Fri, 30 Jul 2021, Miklos Szeredi wrote:
> On Fri, 30 Jul 2021 at 07:28, NeilBrown <neilb@suse.de> wrote:
> >
> > On Fri, 30 Jul 2021, Al Viro wrote:
> > > On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > > > /proc/$PID/mountinfo contains a field for the device number of the
> > > > filesystem at each mount.
> > > >
> > > > This is taken from the superblock ->s_dev field, which is correct for
> > > > every filesystem except btrfs.  A btrfs filesystem can contain multiple
> > > > subvols which each have a different device number.  If (a directory
> > > > within) one of these subvols is mounted, the device number reported in
> > > > mountinfo will be different from the device number reported by stat().
> > > >
> > > > This confuses some libraries and tools such as, historically, findmnt.
> > > > Current findmnt seems to cope with the strangeness.
> > > >
> > > > So instead of using ->s_dev, call vfs_getattr_nosec() and use the ->dev
> > > > provided.  As there is no STATX flag to ask for the device number, we
> > > > pass a request mask for zero, and also ask the filesystem to avoid
> > > > syncing with any remote service.
> > >
> > > Hard NAK.  You are putting IO (potentially - network IO, with no upper
> > > limit on the completion time) under namespace_sem.
> >
> > Why would IO be generated? The inode must already be in cache because it
> > is mounted, and STATX_DONT_SYNC is passed.  If a filesystem did IO in
> > those circumstances, it would be broken.
> 
> STATX_DONT_SYNC is a hint, and while some network fs do honor it, not all do.
> 

That's ... unfortunate.  Rather seems to spoil the whole point of having
a flag like that.  Maybe it should have been called 
   "STATX_SYNC_OR_SYNC_NOT_THERE_IS_NO_GUARANTEE"

Thanks.
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  5:58                     ` NeilBrown
@ 2021-07-30  6:23                       ` Qu Wenruo
  2021-07-30  6:53                         ` NeilBrown
  2021-07-30 15:17                         ` J. Bruce Fields
  0 siblings, 2 replies; 123+ messages in thread
From: Qu Wenruo @ 2021-07-30  6:23 UTC (permalink / raw)
  To: NeilBrown
  Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS



On 2021/7/30 下午1:58, NeilBrown wrote:
> On Fri, 30 Jul 2021, Qu Wenruo wrote:
>>
>> On 2021/7/30 上午10:36, NeilBrown wrote:
>>>
>>> I've been pondering all the excellent feedback, and what I have learnt
>>> from examining the code in btrfs, and I have developed a different
>>> perspective.
>>
>> Great! Some new developers into the btrfs realm!
>
> :-)
>
>>
>>>
>>> Maybe "subvol" is a poor choice of name because it conjures up
>>> connections with the Volumes in LVM, and btrfs subvols are very different
>>> things.  Btrfs subvols are really just subtrees that can be treated as a
>>> unit for operations like "clone" or "destroy".
>>>
>>> As such, they don't really deserve separate st_dev numbers.
>>>
>>> Maybe the different st_dev numbers were introduced as a "cheap" way to
>>> extend to size of the inode-number space.  Like many "cheap" things, it
>>> has hidden costs.
>>>
>>> Maybe objects in different subvols should still be given different inode
>>> numbers.  This would be problematic on 32bit systems, but much less so on
>>> 64bit systems.
>>>
>>> The patch below, which is just a proof-of-concept, changes btrfs to
>>> report a uniform st_dev, and different (64bit) st_ino in different subvols.
>>>
>>> It has problems:
>>>    - it will break any 32bit readdir and 32bit stat.  I don't know how big
>>>      a problem that is these days (ino_t in the kernel is "unsigned long",
>>>      not "unsigned long long). That surprised me).
>>>    - It might break some user-space expectations.  One thing I have learnt
>>>      is not to make any assumption about what other people might expect.
>>
>> Wouldn't any filesystem boundary check fail to stop at subvolume boundary?
>
> You mean like "du -x"?? Yes.  You would lose the misleading illusion
> that there are multiple filesystems.  That is one user-expectation that
> would need to be addressed before people opt-in

OK, forgot it's an opt-in feature, then it's less an impact.

But it can still sometimes be problematic.

E.g. if the user want to put some git code into one subvolume, while
export another subvolume through NFS.

Then the user has to opt-in, affecting the git subvolume to lose the
ability to determine subvolume boundary, right?

This is more concerning as most btrfs users won't want to explicitly
prepare another different btrfs.

>
>>
>> Then it will go through the full btrfs subvolumes/snapshots, which can
>> be super slow.
>>
>>>
>>> However, it would be quite easy to make this opt-in (or opt-out) with a
>>> mount option, so that people who need the current inode numbers and will
>>> accept the current breakage can keep working.
>>>
>>> I think this approach would be a net-win for NFS export, whether BTRFS
>>> supports it directly or not.  I might post a patch which modifies NFS to
>>> intuit improved inode numbers for btrfs exports....
>>
>> Some extra ideas, but not familiar with VFS enough to be sure.
>>
>> Can we generate "fake" superblock for each subvolume?
>
> I don't see how that would help.  Either subvols are like filesystems
> and appear in /proc/mounts, or they aren't like filesystems and don't
> get different st_dev.  Either of these outcomes can be achieved without
> fake superblocks.  If you really need BTRFS subvols to have some
> properties of filesystems but not all, then you are in for a whole world
> of pain.

I guess it's time we pay for the hacks...

>
> Maybe btrfs subvols should be treated more like XFS "managed trees".  At
> least there you have precedent and someone else to share the pain.
> Maybe we should train people to use "quota" to check the usage of a
> subvol,

Well, btrfs quota has its own pain...

> rather than "du" (which will stop working with my patch if it
> contains refs to other subvols) or "df" (which already doesn't work), or
> "btrs df"


BTW, since XFS has a similar feature (not sure with XFS though), I guess
in the long run, it may be worthy to make the VFS to have some way to
treat such concept that is not full volume but still different trees.

>
>> Like using the subolume UUID to replace the FSID of each subvolume.
>> Could that migrate the problem?
>
> Which problem, exactly?  My first approach to making subvols work on NFS
> took essentially that approach.  It was seen (quite reasonably) as a
> hack to work around poor behaviour in btrfs.
>
> Given that NFS has always seen all of a btrfs filesystem as have a
> uniform fsid, I'm now of the opinion that we don't want to change that,
> but should just fix the duplicate-inode-number problem.
>
> If I could think of some way for NFSD to see different inode numbers
> than VFS, I would push hard for fixs nfsd by giving it more sane inode
> numbers.

Really not familiar with NFS/VFS, thus some ideas from me may sounds
super crazy.

Is it possible that, for nfsd to detect such "subvolume" concept by its
own, like checking st_dev and the fsid returned from statfs().

Then if nfsd find some boundary which has different st_dev, but the same
fsid as its parent, then it knows it's a "subvolume"-like concept.

Then do some local inode number mapping inside nfsd?
Like use the highest 20 bits for different subvolumes, while the
remaining 44 bits for real inode numbers.

Of-course, this is still a workaround...

Thanks,
Qu

>
> Thanks,
> NeilBrown
>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  6:23                       ` Qu Wenruo
@ 2021-07-30  6:53                         ` NeilBrown
  2021-07-30  7:09                           ` Qu Wenruo
  2021-07-30 15:17                         ` J. Bruce Fields
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-30  6:53 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS

On Fri, 30 Jul 2021, Qu Wenruo wrote:
> >
> > You mean like "du -x"?? Yes.  You would lose the misleading illusion
> > that there are multiple filesystems.  That is one user-expectation that
> > would need to be addressed before people opt-in
> 
> OK, forgot it's an opt-in feature, then it's less an impact.

The hope would have to be that everyone would eventually opt-in once all
issues were understood.

> 
> Really not familiar with NFS/VFS, thus some ideas from me may sounds
> super crazy.
> 
> Is it possible that, for nfsd to detect such "subvolume" concept by its
> own, like checking st_dev and the fsid returned from statfs().
> 
> Then if nfsd find some boundary which has different st_dev, but the same
> fsid as its parent, then it knows it's a "subvolume"-like concept.
> 
> Then do some local inode number mapping inside nfsd?
> Like use the highest 20 bits for different subvolumes, while the
> remaining 44 bits for real inode numbers.
> 
> Of-course, this is still a workaround...

Yes, it would certainly be possible to add some hacks to nfsd to fix the
immediate problem, and we could probably even created some well-defined
interfaces into btrfs to extract the required information so that it
wasn't too hackish.

Maybe that is what we will have to do.  But I'd rather not hack NFSD
while there is any chance that a more complete solution will be found.

I'm not quite ready to give up on the idea of squeezing all btrfs inodes
into a 64bit number space.  24bits of subvol and 40 bits of inode?
Make the split a mkfs or mount option?
Maybe hand out inode numbers to subvols in 2^32 chunks so each subvol
(which has ever been accessed) has a mapping from the top 32 bits of the
objectid to the top 32 bits of the inode number.

We don't need something that is theoretically perfect (that's not
possible anyway as we don't have 64bits of device numbers).  We just
need something that is practical and scales adequately.  If you have
petabytes of storage, it is reasonable to spend a gigabyte of memory on
a lookup table(?).

If we can make inode numbers unique, we can possibly leave the st_dev
changing at subvols so that "du -x" works as currently expected.

One thought I had was to use a strong hash to combine the subvol object
id and the inode object id into a 64bit number.  What is the chance of
a collision in practice :-)

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  6:53                         ` NeilBrown
@ 2021-07-30  7:09                           ` Qu Wenruo
  2021-07-30 18:15                             ` Zygo Blaxell
  0 siblings, 1 reply; 123+ messages in thread
From: Qu Wenruo @ 2021-07-30  7:09 UTC (permalink / raw)
  To: NeilBrown
  Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS



On 2021/7/30 下午2:53, NeilBrown wrote:
> On Fri, 30 Jul 2021, Qu Wenruo wrote:
>>>
>>> You mean like "du -x"?? Yes.  You would lose the misleading illusion
>>> that there are multiple filesystems.  That is one user-expectation that
>>> would need to be addressed before people opt-in
>>
>> OK, forgot it's an opt-in feature, then it's less an impact.
>
> The hope would have to be that everyone would eventually opt-in once all
> issues were understood.
>
>>
>> Really not familiar with NFS/VFS, thus some ideas from me may sounds
>> super crazy.
>>
>> Is it possible that, for nfsd to detect such "subvolume" concept by its
>> own, like checking st_dev and the fsid returned from statfs().
>>
>> Then if nfsd find some boundary which has different st_dev, but the same
>> fsid as its parent, then it knows it's a "subvolume"-like concept.
>>
>> Then do some local inode number mapping inside nfsd?
>> Like use the highest 20 bits for different subvolumes, while the
>> remaining 44 bits for real inode numbers.
>>
>> Of-course, this is still a workaround...
>
> Yes, it would certainly be possible to add some hacks to nfsd to fix the
> immediate problem, and we could probably even created some well-defined
> interfaces into btrfs to extract the required information so that it
> wasn't too hackish.
>
> Maybe that is what we will have to do.  But I'd rather not hack NFSD
> while there is any chance that a more complete solution will be found.
>
> I'm not quite ready to give up on the idea of squeezing all btrfs inodes
> into a 64bit number space.  24bits of subvol and 40 bits of inode?
> Make the split a mkfs or mount option?

Btrfs used to have a subvolume number limit in the past, for different
reasons.

In that case, subvolume number is limited to 48 bits, which is still too
large to avoid conflicts.

For inode number there is really no limit except the 256 ~ (U64)-256 limit.

Considering all these numbers are almost U64, conflicts would be
unavoidable AFAIK.

> Maybe hand out inode numbers to subvols in 2^32 chunks so each subvol
> (which has ever been accessed) has a mapping from the top 32 bits of the
> objectid to the top 32 bits of the inode number.
>
> We don't need something that is theoretically perfect (that's not
> possible anyway as we don't have 64bits of device numbers).  We just
> need something that is practical and scales adequately.  If you have
> petabytes of storage, it is reasonable to spend a gigabyte of memory on
> a lookup table(?).

Can such squishing-all-inodes-into-one-namespace work to be done in a
more generic way? e.g, let each fs with "subvolume"-like feature to
provide the interface to do that.


Despite that I still hope to have a way to distinguish the "subvolume"
boundary.

If completely inside btrfs, it's pretty simple to locate a subvolume
boundary.
All subvolume have the same inode number 256.

Maybe we could reserve some special "squished" inode number to indicate
boundary inside a filesystem.

E.g. reserve (u64)-1 as a special indicator for subvolume boundaries.
As most fs would have reserved super high inode numbers anyway.


>
> If we can make inode numbers unique, we can possibly leave the st_dev
> changing at subvols so that "du -x" works as currently expected.
>
> One thought I had was to use a strong hash to combine the subvol object
> id and the inode object id into a 64bit number.  What is the chance of
> a collision in practice :-)

But with just 64bits, conflicts will happen anyway...

Thanks,
Qu
>
> Thanks,
> NeilBrown
>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-30  6:13         ` NeilBrown
@ 2021-07-30  7:18           ` Miklos Szeredi
  2021-07-30  7:33             ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Miklos Szeredi @ 2021-07-30  7:18 UTC (permalink / raw)
  To: NeilBrown
  Cc: Al Viro, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Fri, 30 Jul 2021 at 08:13, NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 30 Jul 2021, Miklos Szeredi wrote:
> > On Fri, 30 Jul 2021 at 07:28, NeilBrown <neilb@suse.de> wrote:
> > >
> > > On Fri, 30 Jul 2021, Al Viro wrote:
> > > > On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > > > > /proc/$PID/mountinfo contains a field for the device number of the
> > > > > filesystem at each mount.
> > > > >
> > > > > This is taken from the superblock ->s_dev field, which is correct for
> > > > > every filesystem except btrfs.  A btrfs filesystem can contain multiple
> > > > > subvols which each have a different device number.  If (a directory
> > > > > within) one of these subvols is mounted, the device number reported in
> > > > > mountinfo will be different from the device number reported by stat().
> > > > >
> > > > > This confuses some libraries and tools such as, historically, findmnt.
> > > > > Current findmnt seems to cope with the strangeness.
> > > > >
> > > > > So instead of using ->s_dev, call vfs_getattr_nosec() and use the ->dev
> > > > > provided.  As there is no STATX flag to ask for the device number, we
> > > > > pass a request mask for zero, and also ask the filesystem to avoid
> > > > > syncing with any remote service.
> > > >
> > > > Hard NAK.  You are putting IO (potentially - network IO, with no upper
> > > > limit on the completion time) under namespace_sem.
> > >
> > > Why would IO be generated? The inode must already be in cache because it
> > > is mounted, and STATX_DONT_SYNC is passed.  If a filesystem did IO in
> > > those circumstances, it would be broken.
> >
> > STATX_DONT_SYNC is a hint, and while some network fs do honor it, not all do.
> >
>
> That's ... unfortunate.  Rather seems to spoil the whole point of having
> a flag like that.  Maybe it should have been called
>    "STATX_SYNC_OR_SYNC_NOT_THERE_IS_NO_GUARANTEE"

And I guess just about every filesystem would need to be fixed to
prevent starting I/O on STATX_DONT_SYNC, as block I/O could just as
well generate network traffic.

Probably much easier fix btrfs to use some sort of subvolume structure
that the VFS knows about.  I think there's been talk about that for a
long time, not sure where it got stalled.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-30  7:18           ` Miklos Szeredi
@ 2021-07-30  7:33             ` NeilBrown
  2021-07-30  7:59               ` Miklos Szeredi
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-07-30  7:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

>On Fri, 30 Jul 2021, Miklos Szeredi wrote:
> On Fri, 30 Jul 2021 at 08:13, NeilBrown <neilb@suse.de> wrote:
> >
> > On Fri, 30 Jul 2021, Miklos Szeredi wrote:
> > > On Fri, 30 Jul 2021 at 07:28, NeilBrown <neilb@suse.de> wrote:
> > > >
> > > > On Fri, 30 Jul 2021, Al Viro wrote:
> > > > > On Wed, Jul 28, 2021 at 08:37:45AM +1000, NeilBrown wrote:
> > > > > > /proc/$PID/mountinfo contains a field for the device number of the
> > > > > > filesystem at each mount.
> > > > > >
> > > > > > This is taken from the superblock ->s_dev field, which is correct for
> > > > > > every filesystem except btrfs.  A btrfs filesystem can contain multiple
> > > > > > subvols which each have a different device number.  If (a directory
> > > > > > within) one of these subvols is mounted, the device number reported in
> > > > > > mountinfo will be different from the device number reported by stat().
> > > > > >
> > > > > > This confuses some libraries and tools such as, historically, findmnt.
> > > > > > Current findmnt seems to cope with the strangeness.
> > > > > >
> > > > > > So instead of using ->s_dev, call vfs_getattr_nosec() and use the ->dev
> > > > > > provided.  As there is no STATX flag to ask for the device number, we
> > > > > > pass a request mask for zero, and also ask the filesystem to avoid
> > > > > > syncing with any remote service.
> > > > >
> > > > > Hard NAK.  You are putting IO (potentially - network IO, with no upper
> > > > > limit on the completion time) under namespace_sem.
> > > >
> > > > Why would IO be generated? The inode must already be in cache because it
> > > > is mounted, and STATX_DONT_SYNC is passed.  If a filesystem did IO in
> > > > those circumstances, it would be broken.
> > >
> > > STATX_DONT_SYNC is a hint, and while some network fs do honor it, not all do.
> > >
> >
> > That's ... unfortunate.  Rather seems to spoil the whole point of having
> > a flag like that.  Maybe it should have been called
> >    "STATX_SYNC_OR_SYNC_NOT_THERE_IS_NO_GUARANTEE"
> 
> And I guess just about every filesystem would need to be fixed to
> prevent starting I/O on STATX_DONT_SYNC, as block I/O could just as
> well generate network traffic.

Certainly I think that would be appropriate.  If the information simply
isn't available EWOULDBLOCK could be returned.

> 
> Probably much easier fix btrfs to use some sort of subvolume structure
> that the VFS knows about.  I think there's been talk about that for a
> long time, not sure where it got stalled.

An easy fix for this particular patch is to add a super_operation which
provides the devno to show in /proc/self/mountinfo.  There are already a
number of show_foo super_operations to show other content.

But I'm curious about your reference to "some sort of subvolume
structure that the VFS knows about".  Do you have any references, or can
you suggest a search term I could try?

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 01/11] VFS: show correct dev num in mountinfo
  2021-07-30  7:33             ` NeilBrown
@ 2021-07-30  7:59               ` Miklos Szeredi
  2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Miklos Szeredi @ 2021-07-30  7:59 UTC (permalink / raw)
  To: NeilBrown
  Cc: Al Viro, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Fri, 30 Jul 2021 at 09:34, NeilBrown <neilb@suse.de> wrote:

> But I'm curious about your reference to "some sort of subvolume
> structure that the VFS knows about".  Do you have any references, or can
> you suggest a search term I could try?

Found this:
https://lore.kernel.org/linux-fsdevel/20180508180436.716-1-mfasheh@suse.de/

I also remember discussing it at some LSF/MM with the btrfs devs, but
no specific conclusion.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  6:23                       ` Qu Wenruo
  2021-07-30  6:53                         ` NeilBrown
@ 2021-07-30 15:17                         ` J. Bruce Fields
  2021-07-30 15:48                           ` Josef Bacik
  1 sibling, 1 reply; 123+ messages in thread
From: J. Bruce Fields @ 2021-07-30 15:17 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: NeilBrown, Zygo Blaxell, Neal Gompa, Wang Yugui,
	Christoph Hellwig, Josef Bacik, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS

On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
> OK, forgot it's an opt-in feature, then it's less an impact.
> 
> But it can still sometimes be problematic.
> 
> E.g. if the user want to put some git code into one subvolume, while
> export another subvolume through NFS.
> 
> Then the user has to opt-in, affecting the git subvolume to lose the
> ability to determine subvolume boundary, right?

Totally naive question: is it be possible to treat different subvolumes
differently, and give the user some choice at subvolume creation time
how this new boundary should behave?

It seems like there are some conflicting priorities that can only be
resolved by someone who knows the intended use case.

--b.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30 15:17                         ` J. Bruce Fields
@ 2021-07-30 15:48                           ` Josef Bacik
  2021-07-30 16:25                             ` Forza
  2021-07-30 17:43                             ` Zygo Blaxell
  0 siblings, 2 replies; 123+ messages in thread
From: Josef Bacik @ 2021-07-30 15:48 UTC (permalink / raw)
  To: J. Bruce Fields, Qu Wenruo
  Cc: NeilBrown, Zygo Blaxell, Neal Gompa, Wang Yugui,
	Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On 7/30/21 11:17 AM, J. Bruce Fields wrote:
> On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
>> OK, forgot it's an opt-in feature, then it's less an impact.
>>
>> But it can still sometimes be problematic.
>>
>> E.g. if the user want to put some git code into one subvolume, while
>> export another subvolume through NFS.
>>
>> Then the user has to opt-in, affecting the git subvolume to lose the
>> ability to determine subvolume boundary, right?
> 
> Totally naive question: is it be possible to treat different subvolumes
> differently, and give the user some choice at subvolume creation time
> how this new boundary should behave?
> 
> It seems like there are some conflicting priorities that can only be
> resolved by someone who knows the intended use case.
> 

This is the crux of the problem.  We have no real interfaces or anything to deal 
with this sort of paradigm.  We do the st_dev thing because that's the most 
common way that tools like find or rsync use to determine they've wandered into 
a "different" volume.  This exists specifically because of usescases like 
Zygo's, where he's taking thousands of snapshots and manually excluding them 
from find/rsync is just not reasonable.

We have no good way to give the user information about what's going on, we just 
have these old shitty interfaces.  I asked our guys about filling up 
/proc/self/mountinfo with our subvolumes and they had a heart attack because we 
have around 2-4k subvolumes on machines, and with monitoring stuff in place we 
regularly read /proc/self/mountinfo to determine what's mounted and such.

And then there's NFS which needs to know that it's walked into a new inode space.

This is all super shitty, and mostly exists because we don't have a good way to 
expose to the user wtf is going on.

Personally I would be ok with simply disallowing NFS to wander into subvolumes 
from an exported fs.  If you want to export subvolumes then export them 
individually, otherwise if you walk into a subvolume from NFS you simply get an 
empty directory.

This doesn't solve the mountinfo problem where a user may want to figure out 
which subvol they're in, but this is where I think we could address the issue 
with better interfaces.  Or perhaps Neil's idea to have a common major number 
with a different minor number for every subvol.

Either way this isn't as simple as shoehorning it into automount and being done 
with it, we need to take a step back and think about how should this actually 
look, taking into account we've got 12 years of having Btrfs deployed with 
existing usecases that expect a certain behavior.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30 15:48                           ` Josef Bacik
@ 2021-07-30 16:25                             ` Forza
  2021-07-30 17:43                             ` Zygo Blaxell
  1 sibling, 0 replies; 123+ messages in thread
From: Forza @ 2021-07-30 16:25 UTC (permalink / raw)
  To: Josef Bacik, J. Bruce Fields, Qu Wenruo
  Cc: NeilBrown, Zygo Blaxell, Neal Gompa, Wang Yugui,
	Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS



On 2021-07-30 17:48, Josef Bacik wrote:
> On 7/30/21 11:17 AM, J. Bruce Fields wrote:
>> On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
>>> OK, forgot it's an opt-in feature, then it's less an impact.
>>>
>>> But it can still sometimes be problematic.
>>>
>>> E.g. if the user want to put some git code into one subvolume, while
>>> export another subvolume through NFS.
>>>
>>> Then the user has to opt-in, affecting the git subvolume to lose the
>>> ability to determine subvolume boundary, right?
>>
>> Totally naive question: is it be possible to treat different subvolumes
>> differently, and give the user some choice at subvolume creation time
>> how this new boundary should behave?
>>
>> It seems like there are some conflicting priorities that can only be
>> resolved by someone who knows the intended use case.
>>
> 
> This is the crux of the problem.  We have no real interfaces or anything 
> to deal with this sort of paradigm.  We do the st_dev thing because 
> that's the most common way that tools like find or rsync use to 
> determine they've wandered into a "different" volume.  This exists 
> specifically because of usescases like Zygo's, where he's taking 
> thousands of snapshots and manually excluding them from find/rsync is 
> just not reasonable.
> 
> We have no good way to give the user information about what's going on, 
> we just have these old shitty interfaces.  I asked our guys about 
> filling up /proc/self/mountinfo with our subvolumes and they had a heart 
> attack because we have around 2-4k subvolumes on machines, and with 
> monitoring stuff in place we regularly read /proc/self/mountinfo to 
> determine what's mounted and such.
> 
> And then there's NFS which needs to know that it's walked into a new 
> inode space.
> 
> This is all super shitty, and mostly exists because we don't have a good 
> way to expose to the user wtf is going on.
> 
> Personally I would be ok with simply disallowing NFS to wander into 
> subvolumes from an exported fs.  If you want to export subvolumes then 
> export them individually, otherwise if you walk into a subvolume from 
> NFS you simply get an empty directory.
> 
> This doesn't solve the mountinfo problem where a user may want to figure 
> out which subvol they're in, but this is where I think we could address 
> the issue with better interfaces.  Or perhaps Neil's idea to have a 
> common major number with a different minor number for every subvol.
> 
> Either way this isn't as simple as shoehorning it into automount and 
> being done with it, we need to take a step back and think about how 
> should this actually look, taking into account we've got 12 years of 
> having Btrfs deployed with existing usecases that expect a certain 
> behavior.  Thanks,
> 
> Josef


As a user and sysadmin I really appreciate the way Btrfs currently works.

We use hourly snapshots which are exposed over Samba as "Previous 
Versions" to Windows users. This amounts to thousands of snapshots, all 
user serviceable. A great feature!

In Samba world we have a mount option[1] called "noserverino" which lets 
the client generate unique inode numbers, rather than using the server 
provided inode numbers. This allows Linux clients to work well against 
servers exposing subvolumes and snapshots.

NFS has really old roots and had to make choices that we don't really 
have to make today. Can we not provide something similar to mount.cifs 
that generate unique inode numbers for the clients. This could be either 
an nfsd export option (such as /mnt/foo *(rw,uniq_inodes)) or a mount 
option on the clients.

One worry I have with making subvolumes automountpoints is that it might 
affect the possibility to cp --reflink across that boundary.



[1] https://www.samba.org/~ab/output/htmldocs/manpages-3/mount.cifs.8.html




^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30 15:48                           ` Josef Bacik
  2021-07-30 16:25                             ` Forza
@ 2021-07-30 17:43                             ` Zygo Blaxell
  1 sibling, 0 replies; 123+ messages in thread
From: Zygo Blaxell @ 2021-07-30 17:43 UTC (permalink / raw)
  To: Josef Bacik
  Cc: J. Bruce Fields, Qu Wenruo, NeilBrown, Neal Gompa, Wang Yugui,
	Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, Btrfs BTRFS

On Fri, Jul 30, 2021 at 11:48:15AM -0400, Josef Bacik wrote:
> On 7/30/21 11:17 AM, J. Bruce Fields wrote:
> > On Fri, Jul 30, 2021 at 02:23:44PM +0800, Qu Wenruo wrote:
> > > OK, forgot it's an opt-in feature, then it's less an impact.
> > > 
> > > But it can still sometimes be problematic.
> > > 
> > > E.g. if the user want to put some git code into one subvolume, while
> > > export another subvolume through NFS.
> > > 
> > > Then the user has to opt-in, affecting the git subvolume to lose the
> > > ability to determine subvolume boundary, right?
> > 
> > Totally naive question: is it be possible to treat different subvolumes
> > differently, and give the user some choice at subvolume creation time
> > how this new boundary should behave?
> > 
> > It seems like there are some conflicting priorities that can only be
> > resolved by someone who knows the intended use case.
> > 
> 
> This is the crux of the problem.  We have no real interfaces or anything to
> deal with this sort of paradigm.  We do the st_dev thing because that's the
> most common way that tools like find or rsync use to determine they've
> wandered into a "different" volume.  This exists specifically because of
> usescases like Zygo's, where he's taking thousands of snapshots and manually
> excluding them from find/rsync is just not reasonable.
> 
> We have no good way to give the user information about what's going on, we
> just have these old shitty interfaces.  I asked our guys about filling up
> /proc/self/mountinfo with our subvolumes and they had a heart attack because
> we have around 2-4k subvolumes on machines, and with monitoring stuff in
> place we regularly read /proc/self/mountinfo to determine what's mounted and
> such.
> 
> And then there's NFS which needs to know that it's walked into a new inode space.

NFS somehow works surprisingly well without knowing that.  I didn't know
there was a problem with NFS, despite exporting thousands of btrfs subvols
from a single export point for 7 years.  Maybe I have some non-default
setting in /etc/exports which works around the problems, or maybe I got
lucky, and all my use cases are weirdly specific and evade all the bugs
by accident?

> This is all super shitty, and mostly exists because we don't have a good way
> to expose to the user wtf is going on.
> 
> Personally I would be ok with simply disallowing NFS to wander into
> subvolumes from an exported fs.  If you want to export subvolumes then
> export them individually, otherwise if you walk into a subvolume from NFS
> you simply get an empty directory.

As a present exporter of thousands of btrfs subvols over NFS from single
export points, I'm not a fan of this idea.

> This doesn't solve the mountinfo problem where a user may want to figure out
> which subvol they're in, but this is where I think we could address the
> issue with better interfaces.  Or perhaps Neil's idea to have a common major
> number with a different minor number for every subvol.

It's not hard to figure out what subvol you're in.  There's an ioctl
which tells the subvol ID, and another that tells the name.  The problem
is that it's btrfs-specific, and no existing software knows how and when
to use it (and also it's privileged, but that's easy to fix compared to
the other issues).

> Either way this isn't as simple as shoehorning it into automount and being
> done with it, we need to take a step back and think about how should this
> actually look, taking into account we've got 12 years of having Btrfs
> deployed with existing usecases that expect a certain behavior.  Thanks,

I think if we got into a time machine, went back 12 years, changed
the btrfs behavior, and then returned to the present, in the alternate
history, we would all be here today talking about how mountinfo doesn't
scale up to what btrfs throws at it, and can btrfs opt out of it somehow.

Maybe we could have a system call for mount point discovery?  Right now,
the kernel throws a trail of breadcrumbs into /proc/self/mountinfo,
and users use userspace libraries to translate that text blob into
actionable information.  We could solve problems with scalability and
visibility in mountinfo if we only had to provide the information in
the context of a single inode (i.e. the inode's parent or child mount
points accessible to the caller).

So you'd have a call for "get paths for all the mount points below inode
X" and another for "get paths for all mount points above inode X", and
calls that tell you details about mount points (like what they're mounted
on, which filesystem they are part of, what the mount flags are, etc).

> Josef

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
  2021-07-30  7:09                           ` Qu Wenruo
@ 2021-07-30 18:15                             ` Zygo Blaxell
  0 siblings, 0 replies; 123+ messages in thread
From: Zygo Blaxell @ 2021-07-30 18:15 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: NeilBrown, Neal Gompa, Wang Yugui, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel, linux-nfs,
	Btrfs BTRFS

On Fri, Jul 30, 2021 at 03:09:12PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/7/30 下午2:53, NeilBrown wrote:
> > On Fri, 30 Jul 2021, Qu Wenruo wrote:
> > > > 
> > > > You mean like "du -x"?? Yes.  You would lose the misleading illusion
> > > > that there are multiple filesystems.  That is one user-expectation that
> > > > would need to be addressed before people opt-in
> > > 
> > > OK, forgot it's an opt-in feature, then it's less an impact.
> > 
> > The hope would have to be that everyone would eventually opt-in once all
> > issues were understood.
> > 
> > > 
> > > Really not familiar with NFS/VFS, thus some ideas from me may sounds
> > > super crazy.
> > > 
> > > Is it possible that, for nfsd to detect such "subvolume" concept by its
> > > own, like checking st_dev and the fsid returned from statfs().
> > > 
> > > Then if nfsd find some boundary which has different st_dev, but the same
> > > fsid as its parent, then it knows it's a "subvolume"-like concept.
> > > 
> > > Then do some local inode number mapping inside nfsd?
> > > Like use the highest 20 bits for different subvolumes, while the
> > > remaining 44 bits for real inode numbers.
> > > 
> > > Of-course, this is still a workaround...
> > 
> > Yes, it would certainly be possible to add some hacks to nfsd to fix the
> > immediate problem, and we could probably even created some well-defined
> > interfaces into btrfs to extract the required information so that it
> > wasn't too hackish.
> > 
> > Maybe that is what we will have to do.  But I'd rather not hack NFSD
> > while there is any chance that a more complete solution will be found.
> > 
> > I'm not quite ready to give up on the idea of squeezing all btrfs inodes
> > into a 64bit number space.  24bits of subvol and 40 bits of inode?
> > Make the split a mkfs or mount option?
> 
> Btrfs used to have a subvolume number limit in the past, for different
> reasons.
> 
> In that case, subvolume number is limited to 48 bits, which is still too
> large to avoid conflicts.
> 
> For inode number there is really no limit except the 256 ~ (U64)-256 limit.
> 
> Considering all these numbers are almost U64, conflicts would be
> unavoidable AFAIK.
> 
> > Maybe hand out inode numbers to subvols in 2^32 chunks so each subvol
> > (which has ever been accessed) has a mapping from the top 32 bits of the
> > objectid to the top 32 bits of the inode number.
> > 
> > We don't need something that is theoretically perfect (that's not
> > possible anyway as we don't have 64bits of device numbers).  We just
> > need something that is practical and scales adequately.  If you have
> > petabytes of storage, it is reasonable to spend a gigabyte of memory on
> > a lookup table(?).
> 
> Can such squishing-all-inodes-into-one-namespace work to be done in a
> more generic way? e.g, let each fs with "subvolume"-like feature to
> provide the interface to do that.

If you know the highest subvol ID number, you can pack two integers into
one larger integer by reversing the bits of the subvol number and ORing
them with the inode number, i.e. 0x0080000000000300 is subvol 256
inode 768.

The subvol ID's grow left to right while the inode numbers grow right
to left.  You can have billions of inodes in a few subvols, or billions of
subvols with a few inodes each, and neither will collide with the other
until there are billions of both.

If the filesystem tracks the number of bits in the highest subvol ID
and the highest inode number, then the inode numbers can be decoded,
and collisions can be detected.  e.g. if the maximum subvol ID on the
filesystem is below 131072, it will fit in 17 bits, then we know bits
63-47 are the subvol ID and bits 46-0 are the inode..  When subvol 131072
is created, the number of subvol bits increases to 18, but if every inode
fits in less than 46 bits, we know that every existing inode has a 0 in
the 18th subvol ID bit of the inode number, so there is no ambiguity.

If you don't know the maximum subvol ID, you can guess based on the
position of the large run of zero bits in the middle of the integer--not
reliable, but good enough for a guess if you were looking at 'ls -li'
output (and wrote the inode numbers in hex).

In the pathological case (the maximum subvol ID and maximum inode number
require more than 64 total bits) we return ENOSPC.

This can all be done when btrfs fills in an inode struct.  There's no need
to change the on-disk format, other than to track the highest inode and
subvol number.  btrfs can compute the maxima in reasonable but non-zero
time by searching trees on mount, so an incompatible disk format change
would only be needed to avoid making mount slower.

> Despite that I still hope to have a way to distinguish the "subvolume"
> boundary.

Packing the bits into a single uint64 doesn't help with this--it does
the opposite.  Subvol boundaries become harder to see without deliberate
checking (i.e. not the traditional parent.st_dev != child.st_dev test).

Judging from previous btrfs-related complaints, some users do want
"stealth" subvols whose boundaries are not accidentally visible, so the
new behavior could be a feature for someone.

> If completely inside btrfs, it's pretty simple to locate a subvolume
> boundary.
> All subvolume have the same inode number 256.
> 
> Maybe we could reserve some special "squished" inode number to indicate
> boundary inside a filesystem.
> 
> E.g. reserve (u64)-1 as a special indicator for subvolume boundaries.
> As most fs would have reserved super high inode numbers anyway.
> 
> 
> > 
> > If we can make inode numbers unique, we can possibly leave the st_dev
> > changing at subvols so that "du -x" works as currently expected.
> > 
> > One thought I had was to use a strong hash to combine the subvol object
> > id and the inode object id into a 64bit number.  What is the chance of
> > a collision in practice :-)
> 
> But with just 64bits, conflicts will happen anyway...

The collision rate might be low enough that we could just skip over the
colliding numbers, but we'd have to have some kind of in-memory collision
map to avoid slowing down inode creation (currently the next inode number
is more or less "++last_inode_number", and looking up inodes to see if
they exist first would slow down new file creation a lot).

> Thanks,
> Qu
> > 
> > Thanks,
> > NeilBrown
> > 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [btrfs]  5874902268: xfstests.btrfs.202.fail
  2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
                     ` (2 preceding siblings ...)
  2021-07-28 13:12   ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots Christian Brauner
@ 2021-07-31  6:25   ` kernel test robot
  3 siblings, 0 replies; 123+ messages in thread
From: kernel test robot @ 2021-07-31  6:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: 0day robot, LKML, lkp, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4629 bytes --]



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 58749022685aea90dfddfb9f8b2fcdc74dee6ec0 ("[PATCH 11/11] btrfs: use automount to bind-mount all subvol roots.")
url: https://github.com/0day-ci/linux/commits/NeilBrown/expose-btrfs-subvols-in-mount-table-correctly/20210728-064502
base: git://linux-nfs.org/~bfields/linux.git nfsd-next

in testcase: xfstests
version: xfstests-x86_64-76d2a91-1_20210728
with following parameters:

	disk: 6HDD
	fs: btrfs
	test: btrfs-group-20
	ucode: 0x28

test-description: xfstests is a regression test suite for xfs and other files ystems.
test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git


on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 8G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):




If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@intel.com>

2021-07-29 14:27:50 export TEST_DIR=/fs/sdb1
2021-07-29 14:27:50 export TEST_DEV=/dev/sdb1
2021-07-29 14:27:50 export FSTYP=btrfs
2021-07-29 14:27:50 export SCRATCH_MNT=/fs/scratch
2021-07-29 14:27:50 mkdir /fs/scratch -p
2021-07-29 14:27:50 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5 /dev/sdb6"
2021-07-29 14:27:50 sed "s:^:btrfs/:" //lkp/benchmarks/xfstests/tests/btrfs-group-20
2021-07-29 14:27:50 ./check btrfs/200 btrfs/201 btrfs/202 btrfs/203 btrfs/204 btrfs/205 btrfs/207 btrfs/208 btrfs/209
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 lkp-hsw-d01 5.13.0-rc2-00080-g58749022685a #1 SMP Wed Jul 28 21:12:27 CST 2021
MKFS_OPTIONS  -- /dev/sdb2
MOUNT_OPTIONS -- /dev/sdb2 /fs/scratch

btrfs/200	- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/200.out.bad)
    --- tests/btrfs/200.out	2021-07-28 06:20:35.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//btrfs/200.out.bad	2021-07-29 14:27:52.359748374 +0000
    @@ -10,8 +10,16 @@
     linked 65536/65536 bytes at offset 65536
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/incr'
    -At subvol SCRATCH_MNT/incr
    +ERROR: not on mount point: SCRATCH_MNT/incr
     At subvol base
    -At snapshot incr
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/200.out /lkp/benchmarks/xfstests/results//btrfs/200.out.bad'  to see the entire diff)
btrfs/201	 18s
btrfs/202	- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/202.out.bad)
    --- tests/btrfs/202.out	2021-07-28 06:20:35.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//btrfs/202.out.bad	2021-07-29 14:28:10.873746534 +0000
    @@ -2,3 +2,4 @@
     Create subvolume 'SCRATCH_MNT/a'
     Create subvolume 'SCRATCH_MNT/a/b'
     Create a snapshot of 'SCRATCH_MNT/a' in 'SCRATCH_MNT/c'
    +rm: cannot remove '/fs/scratch/c': Device or resource busy
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/202.out /lkp/benchmarks/xfstests/results//btrfs/202.out.bad'  to see the entire diff)
btrfs/203	- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/203.out.bad)
    --- tests/btrfs/203.out	2021-07-28 06:20:35.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//btrfs/203.out.bad	2021-07-29 14:28:12.189746403 +0000
    @@ -16,10 +16,10 @@
     linked 196608/196608 bytes at offset 196608
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/incr'
    -At subvol SCRATCH_MNT/incr
    +ERROR: not on mount point: SCRATCH_MNT/incr
     File foobar digest in the original filesystem:
     2b76b23b62fdbbbcae1ee37eec84fd7d
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/203.out /lkp/benchmarks/xfstests/results//btrfs/203.out.bad'  to see the entire diff)
btrfs/204	 2s
btrfs/205	 2s
btrfs/207	 49s
btrfs/208	[not run] /bin/btrfs too old (must support subvolume delete --subvolid)
btrfs/209	 1s
Ran: btrfs/200 btrfs/201 btrfs/202 btrfs/203 btrfs/204 btrfs/205 btrfs/207 btrfs/208 btrfs/209
Not run: btrfs/208
Failures: btrfs/200 btrfs/202 btrfs/203
Failed 3 of 9 tests




To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml  # generate the yaml file for lkp run
        bin/lkp run                    generated-yaml-file



---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


[-- Attachment #2: config-5.13.0-rc2-00080-g58749022685a --]
[-- Type: text/plain, Size: 173972 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.13.0-rc2 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-9 (Debian 9.3.0-22) 9.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=90300
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23502
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23502
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_MISC is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
# CONFIG_BPF_LSM is not set
CONFIG_BPF_SYSCALL=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
# CONFIG_BPF_PRELOAD is not set
CONFIG_USERFAULTFD=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_X2APIC=y
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
CONFIG_RETPOLINE=y
# CONFIG_X86_CPU_RESCTRL is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_NUMACHIP is not set
# CONFIG_X86_VSMP is not set
CONFIG_X86_UV=y
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
CONFIG_X86_INTEL_LPSS=y
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_X86_HV_CALLBACK_VECTOR=y
CONFIG_XEN=y
# CONFIG_XEN_PV is not set
CONFIG_XEN_PVHVM=y
CONFIG_XEN_PVHVM_SMP=y
CONFIG_XEN_PVHVM_GUEST=y
CONFIG_XEN_SAVE_RESTORE=y
# CONFIG_XEN_DEBUG_FS is not set
# CONFIG_XEN_PVH is not set
CONFIG_KVM_GUEST=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
# CONFIG_PVH is not set
CONFIG_PARAVIRT_TIME_ACCOUNTING=y
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_JAILHOUSE_GUEST is not set
# CONFIG_ACRN_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
CONFIG_MAXSMP=y
CONFIG_NR_CPUS_RANGE_BEGIN=8192
CONFIG_NR_CPUS_RANGE_END=8192
CONFIG_NR_CPUS_DEFAULT=8192
CONFIG_NR_CPUS=8192
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCELOG_LEGACY=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=m

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=m
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
# CONFIG_PERF_EVENTS_AMD_POWER is not set
# end of Performance monitoring

CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_X86_IOPL_IOPERM=y
CONFIG_I8K=m
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_5LEVEL=y
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_X86_CPA_STATISTICS is not set
# CONFIG_AMD_MEM_ENCRYPT is not set
CONFIG_NUMA=y
# CONFIG_AMD_NUMA is not set
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=10
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
# CONFIG_ARCH_MEMORY_PROBE is not set
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_X86_PMEM_LEGACY_DEVICE=y
CONFIG_X86_PMEM_LEGACY=m
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_X86_UMIP=y
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
# CONFIG_X86_SGX is not set
CONFIG_EFI=y
CONFIG_EFI_STUB=y
CONFIG_EFI_MIXED=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
# CONFIG_KEXEC_SIG is not set
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_DYNAMIC_MEMORY_LAYOUT=y
CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
CONFIG_HOTPLUG_CPU=y
CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
# CONFIG_COMPAT_VDSO is not set
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_XONLY is not set
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y
# end of Processor type and features

CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_HIBERNATION_SNAPSHOT_DEV=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_PM_SLEEP_DEBUG=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_PM_CLK=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
# CONFIG_ENERGY_MODEL is not set
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SPCR_TABLE=y
# CONFIG_ACPI_FPDT is not set
CONFIG_ACPI_LPIT=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=m
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_TAD=m
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_CPPC_LIB=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_IPMI=m
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_PLATFORM_PROFILE=m
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_SBS=m
CONFIG_ACPI_HED=y
# CONFIG_ACPI_CUSTOM_METHOD is not set
CONFIG_ACPI_BGRT=y
CONFIG_ACPI_NFIT=m
# CONFIG_NFIT_SECURITY_DEBUG is not set
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_HMAT is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
# CONFIG_ACPI_APEI_ERST_DEBUG is not set
# CONFIG_ACPI_DPTF is not set
CONFIG_ACPI_WATCHDOG=y
CONFIG_ACPI_EXTLOG=m
CONFIG_ACPI_ADXL=y
# CONFIG_ACPI_CONFIGFS is not set
CONFIG_PMIC_OPREGION=y
CONFIG_X86_PM_TIMER=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

#
# CPU frequency scaling drivers
#
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_ACPI_CPUFREQ_CPB=y
CONFIG_X86_POWERNOW_K8=m
# CONFIG_X86_AMD_FREQ_SENSITIVITY is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_P4_CLOCKMOD=m

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=m
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
# CONFIG_CPU_IDLE_GOV_HALTPOLL is not set
CONFIG_HALTPOLL_CPUIDLE=y
# end of CPU Idle

CONFIG_INTEL_IDLE=y
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_XEN=y
CONFIG_MMCONF_FAM10H=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# CONFIG_X86_SYSFB is not set
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_IA32_EMULATION=y
# CONFIG_X86_X32 is not set
CONFIG_COMPAT_32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
# end of Binary Emulations

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DMIID=y
CONFIG_DMI_SYSFS=y
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_ISCSI_IBFT is not set
CONFIG_FW_CFG_SYSFS=y
# CONFIG_FW_CFG_SYSFS_CMDLINE is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# EFI (Extensible Firmware Interface) Support
#
CONFIG_EFI_VARS=y
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
CONFIG_EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
CONFIG_APPLE_PROPERTIES=y
# CONFIG_RESET_ATTACK_MITIGATION is not set
# CONFIG_EFI_RCI2_TABLE is not set
# CONFIG_EFI_DISABLE_PCI_DMA is not set
# end of EFI (Extensible Firmware Interface) Support

CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y
CONFIG_EFI_DEV_PATH_PARSER=y
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_HAVE_KVM_IRQ_BYPASS=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_KVM_XFER_TO_GUEST_WORK=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
# CONFIG_KVM_AMD is not set
# CONFIG_KVM_XEN is not set
CONFIG_KVM_MMU_AUDIT=y
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_HOTPLUG_SMT=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_USE_MEMREMAP_PROT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_HAS_ELFCORE_COMPAT=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_SCSI_REQUEST=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=m
CONFIG_BLK_DEV_ZONED=y
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
# CONFIG_BLK_CMDLINE_PARSER is not set
CONFIG_BLK_WBT=y
# CONFIG_BLK_CGROUP_IOLATENCY is not set
# CONFIG_BLK_CGROUP_IOCOST is not set
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_DEBUG_FS=y
CONFIG_BLK_DEBUG_FS_ZONED=y
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_MQ_RDMA=y
CONFIG_BLK_PM=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_MHP_MEMMAP_ON_MEMORY=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_ARCH_ENABLE_THP_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=m
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_ARCH_WANTS_THP_SWAP=y
CONFIG_THP_SWAP=y
CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
# CONFIG_CMA_SYSFS is not set
CONFIG_CMA_AREAS=19
# CONFIG_MEM_SOFT_DIRTY is not set
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set
CONFIG_ZPOOL=y
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
CONFIG_ZSMALLOC_STAT=y
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ZONE_DEVICE=y
CONFIG_DEV_PAGEMAP_OPS=y
CONFIG_HMM_MIRROR=y
CONFIG_DEVICE_PRIVATE=y
CONFIG_VMAP_PFN=y
CONFIG_ARCH_USES_HIGH_VMA_FLAGS=y
CONFIG_ARCH_HAS_PKEYS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_TEST is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_IO_MAPPING=y
# end of Memory Management options

CONFIG_NET=y
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_UNIX_DIAG=m
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
# CONFIG_TLS_TOE is not set
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_USER_COMPAT is not set
# CONFIG_XFRM_INTERFACE is not set
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
# CONFIG_SMC is not set
CONFIG_XDP_SOCKETS=y
# CONFIG_XDP_SOCKETS_DIAG is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
# CONFIG_INET_ESPINTCP is not set
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_INET_RAW_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_NV=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_TCP_CONG_DCTCP=m
# CONFIG_TCP_CONG_CDG is not set
CONFIG_TCP_CONG_BBR=m
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_ESP_OFFLOAD=m
# CONFIG_INET6_ESPINTCP is not set
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
CONFIG_NETLABEL=y
# CONFIG_MPTCP is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
# CONFIG_NETFILTER_NETLINK_ACCT is not set
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_SYSLOG=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=y
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_COUNTER=m
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
CONFIG_NFT_SOCKET=m
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
CONFIG_NF_DUP_NETDEV=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
# CONFIG_NFT_REJECT_NETDEV is not set
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XTABLES_COMPAT=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_IPMAC=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
# CONFIG_IP_VS_TWOS is not set

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_SOCKET_IPV4=m
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_SOCKET_IPV6=m
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
CONFIG_NFT_DUP_IPV6=m
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
# CONFIG_IP6_NF_TARGET_HL is not set
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_TARGET_SYNPROXY=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
CONFIG_NFT_BRIDGE_REJECT=m
# CONFIG_NF_CONNTRACK_BRIDGE is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_IP6=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_NFLOG=m
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_SCTP_COOKIE_HMAC_SHA1=y
CONFIG_INET_SCTP_DIAG=m
# CONFIG_RDS is not set
CONFIG_TIPC=m
# CONFIG_TIPC_MEDIA_IB is not set
CONFIG_TIPC_MEDIA_UDP=y
CONFIG_TIPC_CRYPTO=y
CONFIG_TIPC_DIAG=m
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_MRP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_BRIDGE_VLAN_FILTERING=y
# CONFIG_BRIDGE_MRP is not set
# CONFIG_BRIDGE_CFM is not set
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_6LOWPAN=m
# CONFIG_6LOWPAN_DEBUGFS is not set
# CONFIG_6LOWPAN_NHC is not set
CONFIG_IEEE802154=m
# CONFIG_IEEE802154_NL802154_EXPERIMENTAL is not set
CONFIG_IEEE802154_SOCKET=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=y
# CONFIG_NET_SCH_CAKE is not set
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_HHF=m
CONFIG_NET_SCH_PIE=m
# CONFIG_NET_SCH_FQ_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
# CONFIG_NET_SCH_ETS is not set
CONFIG_NET_SCH_DEFAULT=y
# CONFIG_DEFAULT_FQ is not set
# CONFIG_DEFAULT_CODEL is not set
CONFIG_DEFAULT_FQ_CODEL=y
# CONFIG_DEFAULT_SFQ is not set
# CONFIG_DEFAULT_PFIFO_FAST is not set
CONFIG_DEFAULT_NET_SCH="fq_codel"

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_CLS_MATCHALL=m
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
# CONFIG_NET_EMATCH_CANID is not set
CONFIG_NET_EMATCH_IPSET=m
# CONFIG_NET_EMATCH_IPT is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_SAMPLE=m
# CONFIG_NET_ACT_IPT is not set
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
CONFIG_NET_ACT_VLAN=m
CONFIG_NET_ACT_BPF=m
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
CONFIG_NET_ACT_SKBMOD=m
# CONFIG_NET_ACT_IFE is not set
CONFIG_NET_ACT_TUNNEL_KEY=m
# CONFIG_NET_ACT_GATE is not set
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
CONFIG_DNS_RESOLVER=m
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VMWARE_VMCI_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_HYPERV_VSOCKETS=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=y
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
CONFIG_NET_NSH=y
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
CONFIG_NET_L3_MASTER_DEV=y
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
CONFIG_CAN=m
CONFIG_CAN_RAW=m
CONFIG_CAN_BCM=m
CONFIG_CAN_GW=m
# CONFIG_CAN_J1939 is not set
# CONFIG_CAN_ISOTP is not set

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
# CONFIG_CAN_VXCAN is not set
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=m
CONFIG_CAN_CALC_BITTIMING=y
# CONFIG_CAN_KVASER_PCIEFD is not set
CONFIG_CAN_C_CAN=m
CONFIG_CAN_C_CAN_PLATFORM=m
CONFIG_CAN_C_CAN_PCI=m
CONFIG_CAN_CC770=m
# CONFIG_CAN_CC770_ISA is not set
CONFIG_CAN_CC770_PLATFORM=m
# CONFIG_CAN_IFI_CANFD is not set
# CONFIG_CAN_M_CAN is not set
# CONFIG_CAN_PEAK_PCIEFD is not set
CONFIG_CAN_SJA1000=m
CONFIG_CAN_EMS_PCI=m
# CONFIG_CAN_F81601 is not set
CONFIG_CAN_KVASER_PCI=m
CONFIG_CAN_PEAK_PCI=m
CONFIG_CAN_PEAK_PCIEC=y
CONFIG_CAN_PLX_PCI=m
# CONFIG_CAN_SJA1000_ISA is not set
CONFIG_CAN_SJA1000_PLATFORM=m
CONFIG_CAN_SOFTING=m

#
# CAN SPI interfaces
#
# CONFIG_CAN_HI311X is not set
# CONFIG_CAN_MCP251X is not set
# CONFIG_CAN_MCP251XFD is not set
# end of CAN SPI interfaces

#
# CAN USB interfaces
#
# CONFIG_CAN_8DEV_USB is not set
# CONFIG_CAN_EMS_USB is not set
# CONFIG_CAN_ESD_USB2 is not set
# CONFIG_CAN_ETAS_ES58X is not set
# CONFIG_CAN_GS_USB is not set
# CONFIG_CAN_KVASER_USB is not set
# CONFIG_CAN_MCBA_USB is not set
# CONFIG_CAN_PEAK_USB is not set
# CONFIG_CAN_UCAN is not set
# end of CAN USB interfaces

# CONFIG_CAN_DEBUG_DEVICES is not set
# end of CAN Device Drivers

CONFIG_BT=m
CONFIG_BT_BREDR=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HIDP=m
CONFIG_BT_HS=y
CONFIG_BT_LE=y
# CONFIG_BT_6LOWPAN is not set
# CONFIG_BT_LEDS is not set
# CONFIG_BT_MSFTEXT is not set
# CONFIG_BT_AOSPEXT is not set
CONFIG_BT_DEBUGFS=y
# CONFIG_BT_SELFTEST is not set

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIBTUSB is not set
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
CONFIG_BT_HCIUART_ATH3K=y
# CONFIG_BT_HCIUART_INTEL is not set
# CONFIG_BT_HCIUART_AG6XX is not set
# CONFIG_BT_HCIBCM203X is not set
# CONFIG_BT_HCIBPA10X is not set
# CONFIG_BT_HCIBFUSB is not set
CONFIG_BT_HCIVHCI=m
CONFIG_BT_MRVL=m
# CONFIG_BT_MRVL_SDIO is not set
# CONFIG_BT_MTKSDIO is not set
# CONFIG_BT_VIRTIO is not set
# end of Bluetooth device drivers

# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_WEXT_CORE=y
CONFIG_WEXT_PROC=y
CONFIG_CFG80211=m
# CONFIG_NL80211_TESTMODE is not set
# CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
CONFIG_CFG80211_DEFAULT_PS=y
# CONFIG_CFG80211_DEBUGFS is not set
CONFIG_CFG80211_CRDA_SUPPORT=y
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
CONFIG_MAC80211_MESH=y
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
# CONFIG_MAC80211_MESSAGE_TRACING is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
CONFIG_RFKILL=m
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
# CONFIG_RFKILL_GPIO is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_XEN is not set
# CONFIG_NET_9P_RDMA is not set
# CONFIG_NET_9P_DEBUG is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
# CONFIG_NFC is not set
CONFIG_PSAMPLE=m
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_SOCK_VALIDATE_XMIT=y
CONFIG_NET_SELFTESTS=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NET_DEVLINK=y
CONFIG_PAGE_POOL=y
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y
CONFIG_HAVE_EBPF_JIT=y

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIEAER_INJECT=m
CONFIG_PCIE_ECRC=y
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
CONFIG_PCIE_PME=y
CONFIG_PCIE_DPC=y
# CONFIG_PCIE_PTM is not set
# CONFIG_PCIE_EDR is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_PCI_PF_STUB=m
# CONFIG_XEN_PCIDEV_FRONTEND is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_LOCKLESS_CONFIG=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y
# CONFIG_PCI_P2PDMA is not set
CONFIG_PCI_LABEL=y
CONFIG_PCI_HYPERV=m
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y

#
# PCI controller drivers
#
CONFIG_VMD=y
CONFIG_PCI_HYPERV_INTERFACE=m

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_CACHE=y
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
CONFIG_PM_QOS_KUNIT_TEST=y
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_DRIVER_PE_KUNIT_TEST=y
CONFIG_SYS_HYPERVISOR=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=m
CONFIG_REGMAP_SPI=m
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_NULL_BLK=m
CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION=y
# CONFIG_BLK_DEV_FD is not set
CONFIG_CDROM=m
# CONFIG_PARIDE is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_ZRAM is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_XEN_BLKDEV_FRONTEND=m
CONFIG_VIRTIO_BLK=m
CONFIG_BLK_DEV_RBD=m
# CONFIG_BLK_DEV_RSXX is not set

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
CONFIG_NVME_MULTIPATH=y
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=m
# CONFIG_NVME_RDMA is not set
CONFIG_NVME_FC=m
# CONFIG_NVME_TCP is not set
CONFIG_NVME_TARGET=m
# CONFIG_NVME_TARGET_PASSTHRU is not set
CONFIG_NVME_TARGET_LOOP=m
# CONFIG_NVME_TARGET_RDMA is not set
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
# CONFIG_NVME_TARGET_TCP is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=m
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
CONFIG_TIFM_CORE=m
CONFIG_TIFM_7XX1=m
# CONFIG_ICS932S401 is not set
CONFIG_ENCLOSURE_SERVICES=m
CONFIG_SGI_XP=m
CONFIG_HP_ILO=m
CONFIG_SGI_GRU=m
# CONFIG_SGI_GRU_DEBUG is not set
CONFIG_APDS9802ALS=m
CONFIG_ISL29003=m
CONFIG_ISL29020=m
CONFIG_SENSORS_TSL2550=m
CONFIG_SENSORS_BH1770=m
CONFIG_SENSORS_APDS990X=m
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
CONFIG_VMWARE_BALLOON=m
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
CONFIG_MISC_RTSX=m
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_AT25 is not set
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_EEPROM_93XX46 is not set
# CONFIG_EEPROM_IDT_89HPESX is not set
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

CONFIG_CB710_CORE=m
# CONFIG_CB710_DEBUG is not set
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# end of Texas Instruments shared transport line discipline

CONFIG_SENSORS_LIS3_I2C=m
CONFIG_ALTERA_STAPL=m
CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_INTEL_MEI_HDCP is not set
CONFIG_VMWARE_VMCI=m
# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
CONFIG_MISC_RTSX_PCI=m
# CONFIG_MISC_RTSX_USB is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
CONFIG_PVPANIC=y
# CONFIG_PVPANIC_MMIO is not set
# CONFIG_PVPANIC_PCI is not set
# end of Misc devices

CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT3SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_SMARTPQI is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_SCSI_MYRS is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_XEN_SCSI_FRONTEND is not set
CONFIG_HYPERV_STORAGE=m
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
CONFIG_SCSI_ISCI=m
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_VIRTIO is not set
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=y
CONFIG_SCSI_DH_HP_SW=y
CONFIG_SCSI_DH_EMC=y
CONFIG_SCSI_DH_ALUA=y
# end of SCSI device support

CONFIG_ATA=m
CONFIG_SATA_HOST=y
CONFIG_PATA_TIMINGS=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_ACPI=y
# CONFIG_SATA_ZPODD is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=m
CONFIG_SATA_MOBILE_LPM_POLICY=0
CONFIG_SATA_AHCI_PLATFORM=m
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=m
# CONFIG_SATA_DWC is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
# CONFIG_DM_UNSTRIPED is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_SMQ=m
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
CONFIG_DM_ERA=m
# CONFIG_DM_CLONE is not set
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
# CONFIG_DM_MULTIPATH_HST is not set
# CONFIG_DM_MULTIPATH_IOA is not set
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
# CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG is not set
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=m
# CONFIG_DM_ZONED is not set
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
CONFIG_TCM_PSCSI=m
CONFIG_TCM_USER2=m
CONFIG_LOOPBACK_TARGET=m
CONFIG_ISCSI_TARGET=m
# CONFIG_SBP_TARGET is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_SBP2=m
CONFIG_FIREWIRE_NET=m
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_FC is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_MACSEC is not set
CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_TUN=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
# CONFIG_NLMON is not set
# CONFIG_NET_VRF is not set
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
# CONFIG_ATM_TCP is not set
# CONFIG_ATM_LANAI is not set
# CONFIG_ATM_ENI is not set
# CONFIG_ATM_FIRESTREAM is not set
# CONFIG_ATM_ZATM is not set
# CONFIG_ATM_NICSTAR is not set
# CONFIG_ATM_IDT77252 is not set
# CONFIG_ATM_AMBASSADOR is not set
# CONFIG_ATM_HORIZON is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
# CONFIG_ATM_HE is not set
# CONFIG_ATM_SOLOS is not set
CONFIG_ETHERNET=y
CONFIG_MDIO=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_ENA_ETHERNET is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_AMD_XGBE is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_ALX is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
# CONFIG_TIGON3 is not set
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
# CONFIG_THUNDER_NIC_PF is not set
# CONFIG_THUNDER_NIC_VF is not set
# CONFIG_THUNDER_NIC_BGX is not set
# CONFIG_THUNDER_NIC_RGX is not set
CONFIG_CAVIUM_PTP=y
# CONFIG_LIQUIDIO is not set
# CONFIG_LIQUIDIO_VF is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
# CONFIG_CX_ECAT is not set
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_GOOGLE=y
# CONFIG_GVE is not set
CONFIG_NET_VENDOR_HUAWEI=y
# CONFIG_HINIC is not set
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_E1000E_HWTS=y
CONFIG_IGB=y
CONFIG_IGB_HWMON=y
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
CONFIG_IXGBE=y
CONFIG_IXGBE_HWMON=y
# CONFIG_IXGBE_DCB is not set
CONFIG_IXGBE_IPSEC=y
# CONFIG_IXGBEVF is not set
CONFIG_I40E=y
# CONFIG_I40E_DCB is not set
# CONFIG_I40EVF is not set
# CONFIG_ICE is not set
# CONFIG_FM10K is not set
CONFIG_IGC=y
CONFIG_NET_VENDOR_MICROSOFT=y
# CONFIG_MICROSOFT_MANA is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_PRESTERA is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8842 is not set
# CONFIG_KS8851 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_ENC28J60 is not set
# CONFIG_ENCX24J600 is not set
# CONFIG_LAN743X is not set
CONFIG_NET_VENDOR_MICROSEMI=y
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_NE2K_PCI is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_PENSANDO=y
# CONFIG_IONIC is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
CONFIG_R8169=y
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
# CONFIG_ROCKER is not set
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_EMACLITE is not set
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y
# CONFIG_LED_TRIGGER_PHY is not set
# CONFIG_FIXED_PHY is not set

#
# MII PHY device drivers
#
# CONFIG_AMD_PHY is not set
# CONFIG_ADIN_PHY is not set
# CONFIG_AQUANTIA_PHY is not set
# CONFIG_AX88796B_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM54140_PHY is not set
# CONFIG_BCM7XXX_PHY is not set
# CONFIG_BCM84881_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_CORTINA_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_INTEL_XWAY_PHY is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_MARVELL_10G_PHY is not set
# CONFIG_MARVELL_88X2222_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
# CONFIG_MICROSEMI_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_C45_TJA11XX_PHY is not set
# CONFIG_NXP_TJA11XX_PHY is not set
# CONFIG_QSEMI_PHY is not set
CONFIG_REALTEK_PHY=y
# CONFIG_RENESAS_PHY is not set
# CONFIG_ROCKCHIP_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_TERANETICS_PHY is not set
# CONFIG_DP83822_PHY is not set
# CONFIG_DP83TC811_PHY is not set
# CONFIG_DP83848_PHY is not set
# CONFIG_DP83867_PHY is not set
# CONFIG_DP83869_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_XILINX_GMII2RGMII is not set
# CONFIG_MICREL_KS8995MA is not set
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_MDIO_DEVRES=y
# CONFIG_MDIO_BITBANG is not set
# CONFIG_MDIO_BCM_UNIMAC is not set
# CONFIG_MDIO_MVUSB is not set
# CONFIG_MDIO_MSCC_MIIM is not set
# CONFIG_MDIO_THUNDER is not set

#
# MDIO Multiplexers
#

#
# PCS device drivers
#
# CONFIG_PCS_XPCS is not set
# end of PCS device drivers

# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
CONFIG_USB_NET_DRIVERS=y
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
CONFIG_USB_RTL8152=y
# CONFIG_USB_LAN78XX is not set
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_AX88179_178A=y
# CONFIG_USB_NET_CDCETHER is not set
# CONFIG_USB_NET_CDC_EEM is not set
# CONFIG_USB_NET_CDC_NCM is not set
# CONFIG_USB_NET_HUAWEI_CDC_NCM is not set
# CONFIG_USB_NET_CDC_MBIM is not set
# CONFIG_USB_NET_DM9601 is not set
# CONFIG_USB_NET_SR9700 is not set
# CONFIG_USB_NET_SR9800 is not set
# CONFIG_USB_NET_SMSC75XX is not set
# CONFIG_USB_NET_SMSC95XX is not set
# CONFIG_USB_NET_GL620A is not set
# CONFIG_USB_NET_NET1080 is not set
# CONFIG_USB_NET_PLUSB is not set
# CONFIG_USB_NET_MCS7830 is not set
# CONFIG_USB_NET_RNDIS_HOST is not set
# CONFIG_USB_NET_CDC_SUBSET is not set
# CONFIG_USB_NET_ZAURUS is not set
# CONFIG_USB_NET_CX82310_ETH is not set
# CONFIG_USB_NET_KALMIA is not set
# CONFIG_USB_NET_QMI_WWAN is not set
# CONFIG_USB_HSO is not set
# CONFIG_USB_NET_INT51X1 is not set
# CONFIG_USB_IPHETH is not set
# CONFIG_USB_SIERRA_NET is not set
# CONFIG_USB_NET_CH9200 is not set
# CONFIG_USB_NET_AQC111 is not set
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
# CONFIG_ADM8211 is not set
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K is not set
# CONFIG_ATH5K_PCI is not set
# CONFIG_ATH9K is not set
# CONFIG_ATH9K_HTC is not set
# CONFIG_CARL9170 is not set
# CONFIG_ATH6KL is not set
# CONFIG_AR5523 is not set
# CONFIG_WIL6210 is not set
# CONFIG_ATH10K is not set
# CONFIG_WCN36XX is not set
# CONFIG_ATH11K is not set
CONFIG_WLAN_VENDOR_ATMEL=y
# CONFIG_ATMEL is not set
# CONFIG_AT76C50X_USB is not set
CONFIG_WLAN_VENDOR_BROADCOM=y
# CONFIG_B43 is not set
# CONFIG_B43LEGACY is not set
# CONFIG_BRCMSMAC is not set
# CONFIG_BRCMFMAC is not set
CONFIG_WLAN_VENDOR_CISCO=y
# CONFIG_AIRO is not set
CONFIG_WLAN_VENDOR_INTEL=y
# CONFIG_IPW2100 is not set
# CONFIG_IPW2200 is not set
# CONFIG_IWL4965 is not set
# CONFIG_IWL3945 is not set
# CONFIG_IWLWIFI is not set
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
# CONFIG_HERMES is not set
# CONFIG_P54_COMMON is not set
# CONFIG_PRISM54 is not set
CONFIG_WLAN_VENDOR_MARVELL=y
# CONFIG_LIBERTAS is not set
# CONFIG_LIBERTAS_THINFIRM is not set
# CONFIG_MWIFIEX is not set
# CONFIG_MWL8K is not set
CONFIG_WLAN_VENDOR_MEDIATEK=y
# CONFIG_MT7601U is not set
# CONFIG_MT76x0U is not set
# CONFIG_MT76x0E is not set
# CONFIG_MT76x2E is not set
# CONFIG_MT76x2U is not set
# CONFIG_MT7603E is not set
# CONFIG_MT7615E is not set
# CONFIG_MT7663U is not set
# CONFIG_MT7663S is not set
# CONFIG_MT7915E is not set
# CONFIG_MT7921E is not set
CONFIG_WLAN_VENDOR_MICROCHIP=y
# CONFIG_WILC1000_SDIO is not set
# CONFIG_WILC1000_SPI is not set
CONFIG_WLAN_VENDOR_RALINK=y
# CONFIG_RT2X00 is not set
CONFIG_WLAN_VENDOR_REALTEK=y
# CONFIG_RTL8180 is not set
# CONFIG_RTL8187 is not set
CONFIG_RTL_CARDS=m
# CONFIG_RTL8192CE is not set
# CONFIG_RTL8192SE is not set
# CONFIG_RTL8192DE is not set
# CONFIG_RTL8723AE is not set
# CONFIG_RTL8723BE is not set
# CONFIG_RTL8188EE is not set
# CONFIG_RTL8192EE is not set
# CONFIG_RTL8821AE is not set
# CONFIG_RTL8192CU is not set
# CONFIG_RTL8XXXU is not set
# CONFIG_RTW88 is not set
CONFIG_WLAN_VENDOR_RSI=y
# CONFIG_RSI_91X is not set
CONFIG_WLAN_VENDOR_ST=y
# CONFIG_CW1200 is not set
CONFIG_WLAN_VENDOR_TI=y
# CONFIG_WL1251 is not set
# CONFIG_WL12XX is not set
# CONFIG_WL18XX is not set
# CONFIG_WLCORE is not set
CONFIG_WLAN_VENDOR_ZYDAS=y
# CONFIG_USB_ZD1201 is not set
# CONFIG_ZD1211RW is not set
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_QTNFMAC_PCIE is not set
CONFIG_MAC80211_HWSIM=m
# CONFIG_USB_NET_RNDIS_WLAN is not set
# CONFIG_VIRT_WIFI is not set
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_IEEE802154_FAKELB is not set
# CONFIG_IEEE802154_AT86RF230 is not set
# CONFIG_IEEE802154_MRF24J40 is not set
# CONFIG_IEEE802154_CC2520 is not set
# CONFIG_IEEE802154_ATUSB is not set
# CONFIG_IEEE802154_ADF7242 is not set
# CONFIG_IEEE802154_CA8210 is not set
# CONFIG_IEEE802154_MCR20A is not set
# CONFIG_IEEE802154_HWSIM is not set
# CONFIG_WWAN is not set
CONFIG_XEN_NETDEV_FRONTEND=y
# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_HYPERV_NET is not set
CONFIG_NETDEVSIM=m
CONFIG_NET_FAILOVER=m
# CONFIG_ISDN is not set
# CONFIG_NVM is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=m
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
# CONFIG_KEYBOARD_APPLESPI is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_ELANTECH_SMBUS=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_VMMOUSE=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
CONFIG_MOUSE_CYAPA=m
CONFIG_MOUSE_ELAN_I2C=m
CONFIG_MOUSE_ELAN_I2C_I2C=y
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
CONFIG_MOUSE_VSXXXAA=m
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=m
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SPI=m
CONFIG_RMI4_SMB=m
CONFIG_RMI4_F03=y
CONFIG_RMI4_F03_SERIO=m
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
CONFIG_RMI4_F12=y
CONFIG_RMI4_F30=y
CONFIG_RMI4_F34=y
# CONFIG_RMI4_F3A is not set
# CONFIG_RMI4_F54 is not set
CONFIG_RMI4_F55=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_SERIO_ALTERA_PS2=m
# CONFIG_SERIO_PS2MULT is not set
CONFIG_SERIO_ARC_PS2=m
CONFIG_HYPERV_KEYBOARD=m
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_16550A_VARIANTS is not set
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_EXAR=y
CONFIG_SERIAL_8250_NR_UARTS=64
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
CONFIG_SERIAL_8250_RSA=y
CONFIG_SERIAL_8250_DWLIB=y
CONFIG_SERIAL_8250_DW=y
# CONFIG_SERIAL_8250_RT288X is not set
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
# CONFIG_SERIAL_MAX310X is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_BCM63XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
CONFIG_SERIAL_ARC=m
CONFIG_SERIAL_ARC_NR_PORTS=1
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

CONFIG_SERIAL_MCTRL_GPIO=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
CONFIG_SYNCLINK_GT=m
CONFIG_N_HDLC=m
CONFIG_N_GSM=m
CONFIG_NOZOMI=m
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IRQ=y
CONFIG_HVC_XEN=y
CONFIG_HVC_XEN_FRONTEND=y
# CONFIG_SERIAL_DEV_BUS is not set
CONFIG_PRINTER=m
# CONFIG_LP_CONSOLE is not set
CONFIG_PPDEV=m
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_HW_RANDOM_XIPHERA is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
CONFIG_DEVMEM=y
CONFIG_NVRAM=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_DEVPORT=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_MMAP_DEFAULT is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_UV_MMTIMER=m
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_SPI is not set
# CONFIG_TCG_TIS_I2C_CR50 is not set
CONFIG_TCG_TIS_I2C_ATMEL=m
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
# CONFIG_TCG_XEN is not set
CONFIG_TCG_CRB=y
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
# CONFIG_TCG_TIS_ST33ZP24_SPI is not set
CONFIG_TELCLOCK=m
# CONFIG_XILLYBUS is not set
# end of Character devices

# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_MUX=m

#
# Multiplexer I2C Chip support
#
# CONFIG_I2C_MUX_GPIO is not set
# CONFIG_I2C_MUX_LTC4306 is not set
# CONFIG_I2C_MUX_PCA9541 is not set
# CONFIG_I2C_MUX_PCA954x is not set
# CONFIG_I2C_MUX_REG is not set
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
# CONFIG_I2C_AMD_MP2 is not set
CONFIG_I2C_I801=y
CONFIG_I2C_ISCH=m
CONFIG_I2C_ISMT=m
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
# CONFIG_I2C_NVIDIA_GPU is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# ACPI drivers
#
CONFIG_I2C_SCMI=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_CBUS_GPIO is not set
CONFIG_I2C_DESIGNWARE_CORE=m
# CONFIG_I2C_DESIGNWARE_SLAVE is not set
CONFIG_I2C_DESIGNWARE_PLATFORM=m
CONFIG_I2C_DESIGNWARE_BAYTRAIL=y
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EMEV2 is not set
# CONFIG_I2C_GPIO is not set
# CONFIG_I2C_OCORES is not set
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_SIMTEC=m
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
# CONFIG_I2C_CP2615 is not set
CONFIG_I2C_PARPORT=m
# CONFIG_I2C_ROBOTFUZZ_OSIF is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_MLXCPLD=m
# end of I2C Hardware Bus support

CONFIG_I2C_STUB=m
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

# CONFIG_I3C is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y
# CONFIG_SPI_MEM is not set

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
# CONFIG_SPI_ALTERA_CORE is not set
# CONFIG_SPI_AXI_SPI_ENGINE is not set
# CONFIG_SPI_BITBANG is not set
# CONFIG_SPI_BUTTERFLY is not set
# CONFIG_SPI_CADENCE is not set
# CONFIG_SPI_DESIGNWARE is not set
# CONFIG_SPI_NXP_FLEXSPI is not set
# CONFIG_SPI_GPIO is not set
# CONFIG_SPI_LM70_LLP is not set
# CONFIG_SPI_LANTIQ_SSC is not set
# CONFIG_SPI_OC_TINY is not set
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_ROCKCHIP is not set
# CONFIG_SPI_SC18IS602 is not set
# CONFIG_SPI_SIFIVE is not set
# CONFIG_SPI_MXIC is not set
# CONFIG_SPI_XCOMM is not set
# CONFIG_SPI_XILINX is not set
# CONFIG_SPI_ZYNQMP_GQSPI is not set
# CONFIG_SPI_AMD is not set

#
# SPI Multiplexer support
#
# CONFIG_SPI_MUX is not set

#
# SPI Protocol Masters
#
# CONFIG_SPI_SPIDEV is not set
# CONFIG_SPI_LOOPBACK_TEST is not set
# CONFIG_SPI_TLE62X0 is not set
# CONFIG_SPI_SLAVE is not set
CONFIG_SPI_DYNAMIC=y
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
CONFIG_PPS_CLIENT_LDISC=m
CONFIG_PPS_CLIENT_PARPORT=m
CONFIG_PPS_CLIENT_GPIO=m

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y
# CONFIG_DP83640_PHY is not set
# CONFIG_PTP_1588_CLOCK_INES is not set
CONFIG_PTP_1588_CLOCK_KVM=m
# CONFIG_PTP_1588_CLOCK_IDT82P33 is not set
# CONFIG_PTP_1588_CLOCK_IDTCM is not set
# CONFIG_PTP_1588_CLOCK_VMW is not set
# CONFIG_PTP_1588_CLOCK_OCP is not set
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINMUX=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
# CONFIG_DEBUG_PINCTRL is not set
CONFIG_PINCTRL_AMD=m
# CONFIG_PINCTRL_MCP23S08 is not set
# CONFIG_PINCTRL_SX150X is not set
CONFIG_PINCTRL_BAYTRAIL=y
# CONFIG_PINCTRL_CHERRYVIEW is not set
# CONFIG_PINCTRL_LYNXPOINT is not set
CONFIG_PINCTRL_INTEL=y
# CONFIG_PINCTRL_ALDERLAKE is not set
CONFIG_PINCTRL_BROXTON=m
CONFIG_PINCTRL_CANNONLAKE=m
CONFIG_PINCTRL_CEDARFORK=m
CONFIG_PINCTRL_DENVERTON=m
# CONFIG_PINCTRL_ELKHARTLAKE is not set
# CONFIG_PINCTRL_EMMITSBURG is not set
CONFIG_PINCTRL_GEMINILAKE=m
# CONFIG_PINCTRL_ICELAKE is not set
# CONFIG_PINCTRL_JASPERLAKE is not set
# CONFIG_PINCTRL_LAKEFIELD is not set
CONFIG_PINCTRL_LEWISBURG=m
CONFIG_PINCTRL_SUNRISEPOINT=m
# CONFIG_PINCTRL_TIGERLAKE is not set

#
# Renesas pinctrl drivers
#
# end of Renesas pinctrl drivers

CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_CDEV=y
CONFIG_GPIO_CDEV_V1=y
CONFIG_GPIO_GENERIC=m

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_AMDPT=m
# CONFIG_GPIO_DWAPB is not set
# CONFIG_GPIO_EXAR is not set
# CONFIG_GPIO_GENERIC_PLATFORM is not set
CONFIG_GPIO_ICH=m
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_VX855 is not set
# CONFIG_GPIO_AMD_FCH is not set
# end of Memory mapped GPIO drivers

#
# Port-mapped I/O GPIO drivers
#
# CONFIG_GPIO_F7188X is not set
# CONFIG_GPIO_IT87 is not set
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_SCH311X is not set
# CONFIG_GPIO_WINBOND is not set
# CONFIG_GPIO_WS16C48 is not set
# end of Port-mapped I/O GPIO drivers

#
# I2C GPIO expanders
#
# CONFIG_GPIO_ADP5588 is not set
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCA9570 is not set
# CONFIG_GPIO_PCF857X is not set
# CONFIG_GPIO_TPIC2810 is not set
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# end of MFD GPIO expanders

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_PCI_IDIO_16 is not set
# CONFIG_GPIO_PCIE_IDIO_24 is not set
# CONFIG_GPIO_RDC321X is not set
# end of PCI GPIO expanders

#
# SPI GPIO expanders
#
# CONFIG_GPIO_MAX3191X is not set
# CONFIG_GPIO_MAX7301 is not set
# CONFIG_GPIO_MC33880 is not set
# CONFIG_GPIO_PISOSR is not set
# CONFIG_GPIO_XRA1403 is not set
# end of SPI GPIO expanders

#
# USB GPIO expanders
#
# end of USB GPIO expanders

#
# Virtual GPIO drivers
#
# CONFIG_GPIO_AGGREGATOR is not set
# CONFIG_GPIO_MOCKUP is not set
# end of Virtual GPIO drivers

# CONFIG_W1 is not set
CONFIG_POWER_RESET=y
# CONFIG_POWER_RESET_RESTART is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_CHARGER_ADP5061 is not set
# CONFIG_BATTERY_CW2015 is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
# CONFIG_MANAGER_SBS is not set
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_LT3651 is not set
# CONFIG_CHARGER_LTC4162L is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_BQ24257 is not set
# CONFIG_CHARGER_BQ24735 is not set
# CONFIG_CHARGER_BQ2515X is not set
# CONFIG_CHARGER_BQ25890 is not set
# CONFIG_CHARGER_BQ25980 is not set
# CONFIG_CHARGER_BQ256XX is not set
CONFIG_CHARGER_SMB347=m
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
# CONFIG_BATTERY_GOLDFISH is not set
# CONFIG_CHARGER_RT9455 is not set
# CONFIG_CHARGER_BD99954 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
CONFIG_SENSORS_ABITUGURU=m
CONFIG_SENSORS_ABITUGURU3=m
# CONFIG_SENSORS_AD7314 is not set
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
CONFIG_SENSORS_ADM1029=m
CONFIG_SENSORS_ADM1031=m
# CONFIG_SENSORS_ADM1177 is not set
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7X10=m
# CONFIG_SENSORS_ADT7310 is not set
CONFIG_SENSORS_ADT7410=m
CONFIG_SENSORS_ADT7411=m
CONFIG_SENSORS_ADT7462=m
CONFIG_SENSORS_ADT7470=m
CONFIG_SENSORS_ADT7475=m
# CONFIG_SENSORS_AHT10 is not set
# CONFIG_SENSORS_AS370 is not set
CONFIG_SENSORS_ASC7621=m
# CONFIG_SENSORS_AXI_FAN_CONTROL is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_K10TEMP=m
CONFIG_SENSORS_FAM15H_POWER=m
CONFIG_SENSORS_APPLESMC=m
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ASPEED is not set
CONFIG_SENSORS_ATXP1=m
# CONFIG_SENSORS_CORSAIR_CPRO is not set
# CONFIG_SENSORS_CORSAIR_PSU is not set
# CONFIG_SENSORS_DRIVETEMP is not set
CONFIG_SENSORS_DS620=m
CONFIG_SENSORS_DS1621=m
CONFIG_SENSORS_DELL_SMM=m
CONFIG_SENSORS_I5K_AMB=m
CONFIG_SENSORS_F71805F=m
CONFIG_SENSORS_F71882FG=m
CONFIG_SENSORS_F75375S=m
CONFIG_SENSORS_FSCHMD=m
# CONFIG_SENSORS_FTSTEUTATES is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
CONFIG_SENSORS_G760A=m
# CONFIG_SENSORS_G762 is not set
# CONFIG_SENSORS_HIH6130 is not set
CONFIG_SENSORS_IBMAEM=m
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_I5500=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
# CONFIG_SENSORS_POWR1220 is not set
CONFIG_SENSORS_LINEAGE=m
# CONFIG_SENSORS_LTC2945 is not set
# CONFIG_SENSORS_LTC2947_I2C is not set
# CONFIG_SENSORS_LTC2947_SPI is not set
# CONFIG_SENSORS_LTC2990 is not set
# CONFIG_SENSORS_LTC2992 is not set
CONFIG_SENSORS_LTC4151=m
CONFIG_SENSORS_LTC4215=m
# CONFIG_SENSORS_LTC4222 is not set
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
# CONFIG_SENSORS_MAX1111 is not set
# CONFIG_SENSORS_MAX127 is not set
CONFIG_SENSORS_MAX16065=m
CONFIG_SENSORS_MAX1619=m
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
# CONFIG_SENSORS_MAX31722 is not set
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6621 is not set
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6642=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_MAX6697=m
# CONFIG_SENSORS_MAX31790 is not set
CONFIG_SENSORS_MCP3021=m
# CONFIG_SENSORS_MLXREG_FAN is not set
# CONFIG_SENSORS_TC654 is not set
# CONFIG_SENSORS_TPS23861 is not set
# CONFIG_SENSORS_MR75203 is not set
# CONFIG_SENSORS_ADCXX is not set
CONFIG_SENSORS_LM63=m
# CONFIG_SENSORS_LM70 is not set
CONFIG_SENSORS_LM73=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
CONFIG_SENSORS_LM95241=m
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
CONFIG_SENSORS_NTC_THERMISTOR=m
# CONFIG_SENSORS_NCT6683 is not set
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT7802 is not set
# CONFIG_SENSORS_NCT7904 is not set
# CONFIG_SENSORS_NPCM7XX is not set
# CONFIG_SENSORS_NZXT_KRAKEN2 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_PMBUS=m
CONFIG_SENSORS_PMBUS=m
# CONFIG_SENSORS_ADM1266 is not set
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_BEL_PFE is not set
# CONFIG_SENSORS_BPA_RS600 is not set
# CONFIG_SENSORS_FSP_3Y is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
# CONFIG_SENSORS_INSPUR_IPSPS is not set
# CONFIG_SENSORS_IR35221 is not set
# CONFIG_SENSORS_IR36021 is not set
# CONFIG_SENSORS_IR38064 is not set
# CONFIG_SENSORS_IRPS5401 is not set
# CONFIG_SENSORS_ISL68137 is not set
CONFIG_SENSORS_LM25066=m
CONFIG_SENSORS_LTC2978=m
# CONFIG_SENSORS_LTC3815 is not set
# CONFIG_SENSORS_MAX15301 is not set
CONFIG_SENSORS_MAX16064=m
# CONFIG_SENSORS_MAX16601 is not set
# CONFIG_SENSORS_MAX20730 is not set
# CONFIG_SENSORS_MAX20751 is not set
# CONFIG_SENSORS_MAX31785 is not set
CONFIG_SENSORS_MAX34440=m
CONFIG_SENSORS_MAX8688=m
# CONFIG_SENSORS_MP2975 is not set
# CONFIG_SENSORS_PM6764TR is not set
# CONFIG_SENSORS_PXE1610 is not set
# CONFIG_SENSORS_Q54SJ108A2 is not set
# CONFIG_SENSORS_STPDDC60 is not set
# CONFIG_SENSORS_TPS40422 is not set
# CONFIG_SENSORS_TPS53679 is not set
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_XDPE122 is not set
CONFIG_SENSORS_ZL6100=m
# CONFIG_SENSORS_SBTSI is not set
CONFIG_SENSORS_SHT15=m
CONFIG_SENSORS_SHT21=m
# CONFIG_SENSORS_SHT3x is not set
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=m
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
# CONFIG_SENSORS_EMC2103 is not set
CONFIG_SENSORS_EMC6W201=m
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
CONFIG_SENSORS_SCH56XX_COMMON=m
CONFIG_SENSORS_SCH5627=m
CONFIG_SENSORS_SCH5636=m
# CONFIG_SENSORS_STTS751 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_ADC128D818 is not set
CONFIG_SENSORS_ADS7828=m
# CONFIG_SENSORS_ADS7871 is not set
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
# CONFIG_SENSORS_INA3221 is not set
# CONFIG_SENSORS_TC74 is not set
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=m
# CONFIG_SENSORS_TMP513 is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83773G is not set
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83795=m
# CONFIG_SENSORS_W83795_FANCTRL is not set
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
# CONFIG_SENSORS_XGENE is not set

#
# ACPI drivers
#
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_SENSORS_ATK0110=m
CONFIG_THERMAL=y
# CONFIG_THERMAL_NETLINK is not set
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set

#
# Intel thermal drivers
#
CONFIG_INTEL_POWERCLAMP=m
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_X86_PKG_TEMP_THERMAL=m
CONFIG_INTEL_SOC_DTS_IOSF_CORE=m
# CONFIG_INTEL_SOC_DTS_THERMAL is not set

#
# ACPI INT340X thermal drivers
#
CONFIG_INT340X_THERMAL=m
CONFIG_ACPI_THERMAL_REL=m
# CONFIG_INT3406_THERMAL is not set
CONFIG_PROC_THERMAL_MMIO_RAPL=m
# end of ACPI INT340X thermal drivers

CONFIG_INTEL_PCH_THERMAL=m
# CONFIG_INTEL_TCC_COOLING is not set
# end of Intel thermal drivers

CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
CONFIG_WATCHDOG_SYSFS=y

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
CONFIG_WDAT_WDT=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_ZIIRAVE_WATCHDOG is not set
# CONFIG_MLX_WDT is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_EBC_C384_WDT is not set
CONFIG_F71808E_WDT=m
CONFIG_SP5100_TCO=m
CONFIG_SBC_FITPC2_WATCHDOG=m
# CONFIG_EUROTECH_WDT is not set
CONFIG_IB700_WDT=m
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=y
CONFIG_IE6XX_WDT=m
CONFIG_ITCO_WDT=y
CONFIG_ITCO_VENDOR_SUPPORT=y
CONFIG_IT8712F_WDT=m
CONFIG_IT87_WDT=m
CONFIG_HP_WATCHDOG=m
CONFIG_HPWDT_NMI_DECODING=y
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
CONFIG_NV_TCO=m
# CONFIG_60XX_WDT is not set
# CONFIG_CPU5_WDT is not set
CONFIG_SMSC_SCH311X_WDT=m
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_TQMX86_WDT is not set
CONFIG_VIA_WDT=m
CONFIG_W83627HF_WDT=m
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
CONFIG_INTEL_MEI_WDT=m
# CONFIG_NI903X_WDT is not set
# CONFIG_NIC7018_WDT is not set
# CONFIG_MEN_A21_WDT is not set
CONFIG_XEN_WDT=m

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=m
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
# CONFIG_BCMA_HOST_SOC is not set
CONFIG_BCMA_DRIVER_PCI=y
CONFIG_BCMA_DRIVER_GMAC_CMN=y
CONFIG_BCMA_DRIVER_GPIO=y
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_AS3711 is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_AAT2870_CORE is not set
# CONFIG_MFD_BCM590XX is not set
# CONFIG_MFD_BD9571MWV is not set
# CONFIG_MFD_AXP20X_I2C is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_DA9052_SPI is not set
# CONFIG_MFD_DA9052_I2C is not set
# CONFIG_MFD_DA9055 is not set
# CONFIG_MFD_DA9062 is not set
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
# CONFIG_MFD_DLN2 is not set
# CONFIG_MFD_MC13XXX_SPI is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_MFD_MP2629 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_HTC_I2CPLD is not set
# CONFIG_MFD_INTEL_QUARK_I2C_GPIO is not set
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=m
# CONFIG_INTEL_SOC_PMIC_CHTDC_TI is not set
CONFIG_MFD_INTEL_LPSS=y
CONFIG_MFD_INTEL_LPSS_ACPI=y
CONFIG_MFD_INTEL_LPSS_PCI=y
# CONFIG_MFD_INTEL_PMC_BXT is not set
# CONFIG_MFD_INTEL_PMT is not set
# CONFIG_MFD_IQS62X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_MAX14577 is not set
# CONFIG_MFD_MAX77693 is not set
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_MENF21BMC is not set
# CONFIG_EZX_PCAP is not set
# CONFIG_MFD_VIPERBOARD is not set
# CONFIG_MFD_RETU is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
# CONFIG_MFD_SEC_CORE is not set
# CONFIG_MFD_SI476X_CORE is not set
CONFIG_MFD_SM501=m
CONFIG_MFD_SM501_GPIO=y
# CONFIG_MFD_SKY81452 is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65086 is not set
# CONFIG_MFD_TPS65090 is not set
# CONFIG_MFD_TI_LP873X is not set
# CONFIG_MFD_TPS6586X is not set
# CONFIG_MFD_TPS65910 is not set
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_MFD_TPS65912_SPI is not set
# CONFIG_MFD_TPS80031 is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TQMX86 is not set
CONFIG_MFD_VX855=m
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_ARIZONA_SPI is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_ATC260X_I2C is not set
# CONFIG_MFD_INTEL_M10_BMC is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
CONFIG_RC_CORE=m
CONFIG_RC_MAP=m
CONFIG_LIRC=y
CONFIG_RC_DECODERS=y
CONFIG_IR_NEC_DECODER=m
CONFIG_IR_RC5_DECODER=m
CONFIG_IR_RC6_DECODER=m
CONFIG_IR_JVC_DECODER=m
CONFIG_IR_SONY_DECODER=m
CONFIG_IR_SANYO_DECODER=m
# CONFIG_IR_SHARP_DECODER is not set
CONFIG_IR_MCE_KBD_DECODER=m
# CONFIG_IR_XMP_DECODER is not set
CONFIG_IR_IMON_DECODER=m
# CONFIG_IR_RCMM_DECODER is not set
CONFIG_RC_DEVICES=y
# CONFIG_RC_ATI_REMOTE is not set
CONFIG_IR_ENE=m
# CONFIG_IR_IMON is not set
# CONFIG_IR_IMON_RAW is not set
# CONFIG_IR_MCEUSB is not set
CONFIG_IR_ITE_CIR=m
CONFIG_IR_FINTEK=m
CONFIG_IR_NUVOTON=m
# CONFIG_IR_REDRAT3 is not set
# CONFIG_IR_STREAMZAP is not set
CONFIG_IR_WINBOND_CIR=m
# CONFIG_IR_IGORPLUGUSB is not set
# CONFIG_IR_IGUANA is not set
# CONFIG_IR_TTUSBIR is not set
# CONFIG_RC_LOOPBACK is not set
CONFIG_IR_SERIAL=m
CONFIG_IR_SERIAL_TRANSMITTER=y
CONFIG_IR_SIR=m
# CONFIG_RC_XBOX_DVD is not set
# CONFIG_IR_TOY is not set
CONFIG_MEDIA_CEC_SUPPORT=y
# CONFIG_CEC_CH7322 is not set
# CONFIG_CEC_SECO is not set
# CONFIG_USB_PULSE8_CEC is not set
# CONFIG_USB_RAINSHADOW_CEC is not set
CONFIG_MEDIA_SUPPORT=m
# CONFIG_MEDIA_SUPPORT_FILTER is not set
# CONFIG_MEDIA_SUBDRV_AUTOSELECT is not set

#
# Media device types
#
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
CONFIG_MEDIA_RADIO_SUPPORT=y
CONFIG_MEDIA_SDR_SUPPORT=y
CONFIG_MEDIA_PLATFORM_SUPPORT=y
CONFIG_MEDIA_TEST_SUPPORT=y
# end of Media device types

#
# Media core support
#
CONFIG_VIDEO_DEV=m
CONFIG_MEDIA_CONTROLLER=y
CONFIG_DVB_CORE=m
# end of Media core support

#
# Video4Linux options
#
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L2_I2C=y
CONFIG_VIDEO_V4L2_SUBDEV_API=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
# end of Video4Linux options

#
# Media controller options
#
# CONFIG_MEDIA_CONTROLLER_DVB is not set
# end of Media controller options

#
# Digital TV options
#
# CONFIG_DVB_MMAP is not set
CONFIG_DVB_NET=y
CONFIG_DVB_MAX_ADAPTERS=16
CONFIG_DVB_DYNAMIC_MINORS=y
# CONFIG_DVB_DEMUX_SECTION_LOSS_LOG is not set
# CONFIG_DVB_ULE_DEBUG is not set
# end of Digital TV options

#
# Media drivers
#
# CONFIG_MEDIA_USB_SUPPORT is not set
# CONFIG_MEDIA_PCI_SUPPORT is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_SI470X is not set
# CONFIG_RADIO_SI4713 is not set
# CONFIG_USB_MR800 is not set
# CONFIG_USB_DSBR is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_SHARK is not set
# CONFIG_RADIO_SHARK2 is not set
# CONFIG_USB_KEENE is not set
# CONFIG_USB_RAREMONO is not set
# CONFIG_USB_MA901 is not set
# CONFIG_RADIO_TEA5764 is not set
# CONFIG_RADIO_SAA7706H is not set
# CONFIG_RADIO_TEF6862 is not set
# CONFIG_RADIO_WL1273 is not set
CONFIG_VIDEOBUF2_CORE=m
CONFIG_VIDEOBUF2_V4L2=m
CONFIG_VIDEOBUF2_MEMOPS=m
CONFIG_VIDEOBUF2_VMALLOC=m
# CONFIG_V4L_PLATFORM_DRIVERS is not set
# CONFIG_V4L_MEM2MEM_DRIVERS is not set
# CONFIG_DVB_PLATFORM_DRIVERS is not set
# CONFIG_SDR_PLATFORM_DRIVERS is not set

#
# MMC/SDIO DVB adapters
#
# CONFIG_SMS_SDIO_DRV is not set
# CONFIG_V4L_TEST_DRIVERS is not set
# CONFIG_DVB_TEST_DRIVERS is not set

#
# FireWire (IEEE 1394) Adapters
#
# CONFIG_DVB_FIREDTV is not set
# end of Media drivers

#
# Media ancillary drivers
#
CONFIG_MEDIA_ATTACH=y
CONFIG_VIDEO_IR_I2C=m

#
# Audio decoders, processors and mixers
#
# CONFIG_VIDEO_TVAUDIO is not set
# CONFIG_VIDEO_TDA7432 is not set
# CONFIG_VIDEO_TDA9840 is not set
# CONFIG_VIDEO_TEA6415C is not set
# CONFIG_VIDEO_TEA6420 is not set
# CONFIG_VIDEO_MSP3400 is not set
# CONFIG_VIDEO_CS3308 is not set
# CONFIG_VIDEO_CS5345 is not set
# CONFIG_VIDEO_CS53L32A is not set
# CONFIG_VIDEO_TLV320AIC23B is not set
# CONFIG_VIDEO_UDA1342 is not set
# CONFIG_VIDEO_WM8775 is not set
# CONFIG_VIDEO_WM8739 is not set
# CONFIG_VIDEO_VP27SMPX is not set
# CONFIG_VIDEO_SONY_BTF_MPX is not set
# end of Audio decoders, processors and mixers

#
# RDS decoders
#
# CONFIG_VIDEO_SAA6588 is not set
# end of RDS decoders

#
# Video decoders
#
# CONFIG_VIDEO_ADV7180 is not set
# CONFIG_VIDEO_ADV7183 is not set
# CONFIG_VIDEO_ADV7604 is not set
# CONFIG_VIDEO_ADV7842 is not set
# CONFIG_VIDEO_BT819 is not set
# CONFIG_VIDEO_BT856 is not set
# CONFIG_VIDEO_BT866 is not set
# CONFIG_VIDEO_KS0127 is not set
# CONFIG_VIDEO_ML86V7667 is not set
# CONFIG_VIDEO_SAA7110 is not set
# CONFIG_VIDEO_SAA711X is not set
# CONFIG_VIDEO_TC358743 is not set
# CONFIG_VIDEO_TVP514X is not set
# CONFIG_VIDEO_TVP5150 is not set
# CONFIG_VIDEO_TVP7002 is not set
# CONFIG_VIDEO_TW2804 is not set
# CONFIG_VIDEO_TW9903 is not set
# CONFIG_VIDEO_TW9906 is not set
# CONFIG_VIDEO_TW9910 is not set
# CONFIG_VIDEO_VPX3220 is not set

#
# Video and audio decoders
#
# CONFIG_VIDEO_SAA717X is not set
# CONFIG_VIDEO_CX25840 is not set
# end of Video decoders

#
# Video encoders
#
# CONFIG_VIDEO_SAA7127 is not set
# CONFIG_VIDEO_SAA7185 is not set
# CONFIG_VIDEO_ADV7170 is not set
# CONFIG_VIDEO_ADV7175 is not set
# CONFIG_VIDEO_ADV7343 is not set
# CONFIG_VIDEO_ADV7393 is not set
# CONFIG_VIDEO_ADV7511 is not set
# CONFIG_VIDEO_AD9389B is not set
# CONFIG_VIDEO_AK881X is not set
# CONFIG_VIDEO_THS8200 is not set
# end of Video encoders

#
# Video improvement chips
#
# CONFIG_VIDEO_UPD64031A is not set
# CONFIG_VIDEO_UPD64083 is not set
# end of Video improvement chips

#
# Audio/Video compression chips
#
# CONFIG_VIDEO_SAA6752HS is not set
# end of Audio/Video compression chips

#
# SDR tuner chips
#
# CONFIG_SDR_MAX2175 is not set
# end of SDR tuner chips

#
# Miscellaneous helper chips
#
# CONFIG_VIDEO_THS7303 is not set
# CONFIG_VIDEO_M52790 is not set
# CONFIG_VIDEO_I2C is not set
# CONFIG_VIDEO_ST_MIPID02 is not set
# end of Miscellaneous helper chips

#
# Camera sensor devices
#
# CONFIG_VIDEO_HI556 is not set
# CONFIG_VIDEO_IMX214 is not set
# CONFIG_VIDEO_IMX219 is not set
# CONFIG_VIDEO_IMX258 is not set
# CONFIG_VIDEO_IMX274 is not set
# CONFIG_VIDEO_IMX290 is not set
# CONFIG_VIDEO_IMX319 is not set
# CONFIG_VIDEO_IMX355 is not set
# CONFIG_VIDEO_OV02A10 is not set
# CONFIG_VIDEO_OV2640 is not set
# CONFIG_VIDEO_OV2659 is not set
# CONFIG_VIDEO_OV2680 is not set
# CONFIG_VIDEO_OV2685 is not set
# CONFIG_VIDEO_OV2740 is not set
# CONFIG_VIDEO_OV5647 is not set
# CONFIG_VIDEO_OV5648 is not set
# CONFIG_VIDEO_OV6650 is not set
# CONFIG_VIDEO_OV5670 is not set
# CONFIG_VIDEO_OV5675 is not set
# CONFIG_VIDEO_OV5695 is not set
# CONFIG_VIDEO_OV7251 is not set
# CONFIG_VIDEO_OV772X is not set
# CONFIG_VIDEO_OV7640 is not set
# CONFIG_VIDEO_OV7670 is not set
# CONFIG_VIDEO_OV7740 is not set
# CONFIG_VIDEO_OV8856 is not set
# CONFIG_VIDEO_OV8865 is not set
# CONFIG_VIDEO_OV9640 is not set
# CONFIG_VIDEO_OV9650 is not set
# CONFIG_VIDEO_OV9734 is not set
# CONFIG_VIDEO_OV13858 is not set
# CONFIG_VIDEO_VS6624 is not set
# CONFIG_VIDEO_MT9M001 is not set
# CONFIG_VIDEO_MT9M032 is not set
# CONFIG_VIDEO_MT9M111 is not set
# CONFIG_VIDEO_MT9P031 is not set
# CONFIG_VIDEO_MT9T001 is not set
# CONFIG_VIDEO_MT9T112 is not set
# CONFIG_VIDEO_MT9V011 is not set
# CONFIG_VIDEO_MT9V032 is not set
# CONFIG_VIDEO_MT9V111 is not set
# CONFIG_VIDEO_SR030PC30 is not set
# CONFIG_VIDEO_NOON010PC30 is not set
# CONFIG_VIDEO_M5MOLS is not set
# CONFIG_VIDEO_RDACM20 is not set
# CONFIG_VIDEO_RDACM21 is not set
# CONFIG_VIDEO_RJ54N1 is not set
# CONFIG_VIDEO_S5K6AA is not set
# CONFIG_VIDEO_S5K6A3 is not set
# CONFIG_VIDEO_S5K4ECGX is not set
# CONFIG_VIDEO_S5K5BAF is not set
# CONFIG_VIDEO_CCS is not set
# CONFIG_VIDEO_ET8EK8 is not set
# CONFIG_VIDEO_S5C73M3 is not set
# end of Camera sensor devices

#
# Lens drivers
#
# CONFIG_VIDEO_AD5820 is not set
# CONFIG_VIDEO_AK7375 is not set
# CONFIG_VIDEO_DW9714 is not set
# CONFIG_VIDEO_DW9768 is not set
# CONFIG_VIDEO_DW9807_VCM is not set
# end of Lens drivers

#
# Flash devices
#
# CONFIG_VIDEO_ADP1653 is not set
# CONFIG_VIDEO_LM3560 is not set
# CONFIG_VIDEO_LM3646 is not set
# end of Flash devices

#
# SPI helper chips
#
# CONFIG_VIDEO_GS1662 is not set
# end of SPI helper chips

#
# Media SPI Adapters
#
CONFIG_CXD2880_SPI_DRV=m
# end of Media SPI Adapters

CONFIG_MEDIA_TUNER=m

#
# Customize TV tuners
#
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA18250=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA827X=m
CONFIG_MEDIA_TUNER_TDA18271=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MSI001=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_MT2060=m
CONFIG_MEDIA_TUNER_MT2063=m
CONFIG_MEDIA_TUNER_MT2266=m
CONFIG_MEDIA_TUNER_MT2131=m
CONFIG_MEDIA_TUNER_QT1010=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_XC4000=m
CONFIG_MEDIA_TUNER_MXL5005S=m
CONFIG_MEDIA_TUNER_MXL5007T=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_MEDIA_TUNER_MAX2165=m
CONFIG_MEDIA_TUNER_TDA18218=m
CONFIG_MEDIA_TUNER_FC0011=m
CONFIG_MEDIA_TUNER_FC0012=m
CONFIG_MEDIA_TUNER_FC0013=m
CONFIG_MEDIA_TUNER_TDA18212=m
CONFIG_MEDIA_TUNER_E4000=m
CONFIG_MEDIA_TUNER_FC2580=m
CONFIG_MEDIA_TUNER_M88RS6000T=m
CONFIG_MEDIA_TUNER_TUA9001=m
CONFIG_MEDIA_TUNER_SI2157=m
CONFIG_MEDIA_TUNER_IT913X=m
CONFIG_MEDIA_TUNER_R820T=m
CONFIG_MEDIA_TUNER_MXL301RF=m
CONFIG_MEDIA_TUNER_QM1D1C0042=m
CONFIG_MEDIA_TUNER_QM1D1B0004=m
# end of Customize TV tuners

#
# Customise DVB Frontends
#

#
# Multistandard (satellite) frontends
#
CONFIG_DVB_STB0899=m
CONFIG_DVB_STB6100=m
CONFIG_DVB_STV090x=m
CONFIG_DVB_STV0910=m
CONFIG_DVB_STV6110x=m
CONFIG_DVB_STV6111=m
CONFIG_DVB_MXL5XX=m
CONFIG_DVB_M88DS3103=m

#
# Multistandard (cable + terrestrial) frontends
#
CONFIG_DVB_DRXK=m
CONFIG_DVB_TDA18271C2DD=m
CONFIG_DVB_SI2165=m
CONFIG_DVB_MN88472=m
CONFIG_DVB_MN88473=m

#
# DVB-S (satellite) frontends
#
CONFIG_DVB_CX24110=m
CONFIG_DVB_CX24123=m
CONFIG_DVB_MT312=m
CONFIG_DVB_ZL10036=m
CONFIG_DVB_ZL10039=m
CONFIG_DVB_S5H1420=m
CONFIG_DVB_STV0288=m
CONFIG_DVB_STB6000=m
CONFIG_DVB_STV0299=m
CONFIG_DVB_STV6110=m
CONFIG_DVB_STV0900=m
CONFIG_DVB_TDA8083=m
CONFIG_DVB_TDA10086=m
CONFIG_DVB_TDA8261=m
CONFIG_DVB_VES1X93=m
CONFIG_DVB_TUNER_ITD1000=m
CONFIG_DVB_TUNER_CX24113=m
CONFIG_DVB_TDA826X=m
CONFIG_DVB_TUA6100=m
CONFIG_DVB_CX24116=m
CONFIG_DVB_CX24117=m
CONFIG_DVB_CX24120=m
CONFIG_DVB_SI21XX=m
CONFIG_DVB_TS2020=m
CONFIG_DVB_DS3000=m
CONFIG_DVB_MB86A16=m
CONFIG_DVB_TDA10071=m

#
# DVB-T (terrestrial) frontends
#
CONFIG_DVB_SP8870=m
CONFIG_DVB_SP887X=m
CONFIG_DVB_CX22700=m
CONFIG_DVB_CX22702=m
CONFIG_DVB_S5H1432=m
CONFIG_DVB_DRXD=m
CONFIG_DVB_L64781=m
CONFIG_DVB_TDA1004X=m
CONFIG_DVB_NXT6000=m
CONFIG_DVB_MT352=m
CONFIG_DVB_ZL10353=m
CONFIG_DVB_DIB3000MB=m
CONFIG_DVB_DIB3000MC=m
CONFIG_DVB_DIB7000M=m
CONFIG_DVB_DIB7000P=m
CONFIG_DVB_DIB9000=m
CONFIG_DVB_TDA10048=m
CONFIG_DVB_AF9013=m
CONFIG_DVB_EC100=m
CONFIG_DVB_STV0367=m
CONFIG_DVB_CXD2820R=m
CONFIG_DVB_CXD2841ER=m
CONFIG_DVB_RTL2830=m
CONFIG_DVB_RTL2832=m
CONFIG_DVB_RTL2832_SDR=m
CONFIG_DVB_SI2168=m
CONFIG_DVB_ZD1301_DEMOD=m
CONFIG_DVB_CXD2880=m

#
# DVB-C (cable) frontends
#
CONFIG_DVB_VES1820=m
CONFIG_DVB_TDA10021=m
CONFIG_DVB_TDA10023=m
CONFIG_DVB_STV0297=m

#
# ATSC (North American/Korean Terrestrial/Cable DTV) frontends
#
CONFIG_DVB_NXT200X=m
CONFIG_DVB_OR51211=m
CONFIG_DVB_OR51132=m
CONFIG_DVB_BCM3510=m
CONFIG_DVB_LGDT330X=m
CONFIG_DVB_LGDT3305=m
CONFIG_DVB_LGDT3306A=m
CONFIG_DVB_LG2160=m
CONFIG_DVB_S5H1409=m
CONFIG_DVB_AU8522=m
CONFIG_DVB_AU8522_DTV=m
CONFIG_DVB_AU8522_V4L=m
CONFIG_DVB_S5H1411=m
CONFIG_DVB_MXL692=m

#
# ISDB-T (terrestrial) frontends
#
CONFIG_DVB_S921=m
CONFIG_DVB_DIB8000=m
CONFIG_DVB_MB86A20S=m

#
# ISDB-S (satellite) & ISDB-T (terrestrial) frontends
#
CONFIG_DVB_TC90522=m
CONFIG_DVB_MN88443X=m

#
# Digital terrestrial only tuners/PLL
#
CONFIG_DVB_PLL=m
CONFIG_DVB_TUNER_DIB0070=m
CONFIG_DVB_TUNER_DIB0090=m

#
# SEC control devices for DVB-S
#
CONFIG_DVB_DRX39XYJ=m
CONFIG_DVB_LNBH25=m
CONFIG_DVB_LNBH29=m
CONFIG_DVB_LNBP21=m
CONFIG_DVB_LNBP22=m
CONFIG_DVB_ISL6405=m
CONFIG_DVB_ISL6421=m
CONFIG_DVB_ISL6423=m
CONFIG_DVB_A8293=m
CONFIG_DVB_LGS8GL5=m
CONFIG_DVB_LGS8GXX=m
CONFIG_DVB_ATBM8830=m
CONFIG_DVB_TDA665x=m
CONFIG_DVB_IX2505V=m
CONFIG_DVB_M88RS2000=m
CONFIG_DVB_AF9033=m
CONFIG_DVB_HORUS3A=m
CONFIG_DVB_ASCOT2E=m
CONFIG_DVB_HELENE=m

#
# Common Interface (EN50221) controller drivers
#
CONFIG_DVB_CXD2099=m
CONFIG_DVB_SP2=m
# end of Customise DVB Frontends

#
# Tools to develop new frontends
#
# CONFIG_DVB_DUMMY_FE is not set
# end of Media ancillary drivers

#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_INTEL_GTT=m
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=64
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_DP_AUX_CHARDEV=y
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_KMS_FB_HELPER=y
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=m
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_I2C_NXP_TDA9950 is not set
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=m
CONFIG_DRM_I915_FORCE_PROBE=""
CONFIG_DRM_I915_CAPTURE_ERROR=y
CONFIG_DRM_I915_COMPRESS_ERROR=y
CONFIG_DRM_I915_USERPTR=y
CONFIG_DRM_I915_GVT=y
CONFIG_DRM_I915_GVT_KVMGT=m
CONFIG_DRM_I915_REQUEST_TIMEOUT=20000
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
# CONFIG_DRM_VGEM is not set
# CONFIG_DRM_VKMS is not set
# CONFIG_DRM_VMWGFX is not set
CONFIG_DRM_GMA500=m
# CONFIG_DRM_UDL is not set
CONFIG_DRM_AST=m
CONFIG_DRM_MGAG200=m
CONFIG_DRM_QXL=m
CONFIG_DRM_BOCHS=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_DRM_PANEL=y

#
# Display Panels
#
# CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN is not set
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
# CONFIG_DRM_ANALOGIX_ANX78XX is not set
# end of Display Interface Bridges

# CONFIG_DRM_ETNAVIV is not set
CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_GM12U320 is not set
# CONFIG_TINYDRM_HX8357D is not set
# CONFIG_TINYDRM_ILI9225 is not set
# CONFIG_TINYDRM_ILI9341 is not set
# CONFIG_TINYDRM_ILI9486 is not set
# CONFIG_TINYDRM_MI0283QT is not set
# CONFIG_TINYDRM_REPAPER is not set
# CONFIG_TINYDRM_ST7586 is not set
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_XEN_FRONTEND is not set
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_GUD is not set
# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_MODE_HELPERS is not set
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_SM501 is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_XEN_FBDEV_FRONTEND is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
CONFIG_FB_HYPERV=m
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_L4F00242T03 is not set
# CONFIG_LCD_LMS283GF05 is not set
# CONFIG_LCD_LTV350QV is not set
# CONFIG_LCD_ILI922X is not set
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_TDO24M is not set
# CONFIG_LCD_VGG2432A4 is not set
CONFIG_LCD_PLATFORM=m
# CONFIG_LCD_AMS369FG06 is not set
# CONFIG_LCD_LMS501KF03 is not set
# CONFIG_LCD_HX8357 is not set
# CONFIG_LCD_OTM3225A is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_KTD253 is not set
# CONFIG_BACKLIGHT_PWM is not set
CONFIG_BACKLIGHT_APPLE=m
# CONFIG_BACKLIGHT_QCOM_WLED is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630A is not set
# CONFIG_BACKLIGHT_LM3639 is not set
CONFIG_BACKLIGHT_LP855X=m
# CONFIG_BACKLIGHT_GPIO is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
# CONFIG_BACKLIGHT_BD6107 is not set
# CONFIG_BACKLIGHT_ARCXCNN is not set
# end of Backlight & LCD device support

CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
# CONFIG_HID_ACCUTOUCH is not set
CONFIG_HID_ACRUX=m
# CONFIG_HID_ACRUX_FF is not set
CONFIG_HID_APPLE=m
# CONFIG_HID_APPLEIR is not set
CONFIG_HID_ASUS=m
CONFIG_HID_AUREAL=m
CONFIG_HID_BELKIN=m
# CONFIG_HID_BETOP_FF is not set
# CONFIG_HID_BIGBEN_FF is not set
CONFIG_HID_CHERRY=m
CONFIG_HID_CHICONY=m
# CONFIG_HID_CORSAIR is not set
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
CONFIG_HID_CMEDIA=m
# CONFIG_HID_CP2112 is not set
# CONFIG_HID_CREATIVE_SB0540 is not set
CONFIG_HID_CYPRESS=m
CONFIG_HID_DRAGONRISE=m
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELAN is not set
CONFIG_HID_ELECOM=m
# CONFIG_HID_ELO is not set
CONFIG_HID_EZKEY=m
# CONFIG_HID_FT260 is not set
CONFIG_HID_GEMBIRD=m
CONFIG_HID_GFRM=m
# CONFIG_HID_GLORIOUS is not set
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_VIVALDI is not set
# CONFIG_HID_GT683R is not set
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
# CONFIG_HID_UCLOGIC is not set
CONFIG_HID_WALTOP=m
# CONFIG_HID_VIEWSONIC is not set
CONFIG_HID_GYRATION=m
CONFIG_HID_ICADE=m
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_LOGITECH=m
CONFIG_HID_LOGITECH_DJ=m
CONFIG_HID_LOGITECH_HIDPP=m
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
CONFIG_HID_MAGICMOUSE=y
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=m
CONFIG_HID_MULTITOUCH=m
CONFIG_HID_NTI=m
# CONFIG_HID_NTRIG is not set
CONFIG_HID_ORTEK=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
# CONFIG_HID_PENMOUNT is not set
CONFIG_HID_PETALYNX=m
CONFIG_HID_PICOLCD=m
CONFIG_HID_PICOLCD_FB=y
CONFIG_HID_PICOLCD_BACKLIGHT=y
CONFIG_HID_PICOLCD_LCD=y
CONFIG_HID_PICOLCD_LEDS=y
CONFIG_HID_PICOLCD_CIR=y
CONFIG_HID_PLANTRONICS=m
# CONFIG_HID_PLAYSTATION is not set
CONFIG_HID_PRIMAX=m
# CONFIG_HID_RETRODE is not set
# CONFIG_HID_ROCCAT is not set
CONFIG_HID_SAITEK=m
CONFIG_HID_SAMSUNG=m
# CONFIG_HID_SONY is not set
CONFIG_HID_SPEEDLINK=m
# CONFIG_HID_STEAM is not set
CONFIG_HID_STEELSERIES=m
CONFIG_HID_SUNPLUS=m
CONFIG_HID_RMI=m
CONFIG_HID_GREENASIA=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_HYPERV_MOUSE=m
CONFIG_HID_SMARTJOYPLUS=m
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
CONFIG_HID_THINGM=m
CONFIG_HID_THRUSTMASTER=m
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_HID_UDRAW_PS3 is not set
# CONFIG_HID_U2FZERO is not set
# CONFIG_HID_WACOM is not set
CONFIG_HID_WIIMOTE=m
CONFIG_HID_XINMO=m
CONFIG_HID_ZEROPLUS=m
# CONFIG_ZEROPLUS_FF is not set
CONFIG_HID_ZYDACRON=m
CONFIG_HID_SENSOR_HUB=y
CONFIG_HID_SENSOR_CUSTOM_SENSOR=m
CONFIG_HID_ALPS=m
# CONFIG_HID_MCP2221 is not set
# end of Special HID drivers

#
# USB HID support
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set
# end of USB HID support

#
# I2C HID support
#
# CONFIG_I2C_HID_ACPI is not set
# end of I2C HID support

#
# Intel ISH HID support
#
CONFIG_INTEL_ISH_HID=m
# CONFIG_INTEL_ISH_FIRMWARE_DOWNLOADER is not set
# end of Intel ISH HID support

#
# AMD SFH HID Support
#
# CONFIG_AMD_SFH_HID is not set
# end of AMD SFH HID Support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
# CONFIG_USB_LED_TRIG is not set
# CONFIG_USB_ULPI_BUS is not set
# CONFIG_USB_CONN_GPIO is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_PCI=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEFAULT_PERSIST=y
# CONFIG_USB_FEW_INIT_RETRIES is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_OTG_PRODUCTLIST is not set
CONFIG_USB_LEDS_TRIGGER_USBPORT=y
CONFIG_USB_AUTOSUSPEND_DELAY=2
CONFIG_USB_MON=y

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_XHCI_HCD=y
# CONFIG_USB_XHCI_DBGCAP is not set
CONFIG_USB_XHCI_PCI=y
# CONFIG_USB_XHCI_PCI_RENESAS is not set
# CONFIG_USB_XHCI_PLATFORM is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_EHCI_PCI=y
# CONFIG_USB_EHCI_FSL is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_FOTG210_HCD is not set
# CONFIG_USB_MAX3421_HCD is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PCI=y
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HCD_BCMA is not set
# CONFIG_USB_HCD_TEST_MODE is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_USB_CDNS_SUPPORT is not set
# CONFIG_USB_MUSB_HDRC is not set
# CONFIG_USB_DWC3 is not set
# CONFIG_USB_DWC2 is not set
# CONFIG_USB_CHIPIDEA is not set
# CONFIG_USB_ISP1760 is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_SIMPLE is not set
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_F81232 is not set
# CONFIG_USB_SERIAL_F8153X is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_METRO is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MXUPORT is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QCAUX is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_XSENS_MT is not set
# CONFIG_USB_SERIAL_WISHBONE is not set
# CONFIG_USB_SERIAL_SSU100 is not set
# CONFIG_USB_SERIAL_QT2 is not set
# CONFIG_USB_SERIAL_UPD78F0730 is not set
# CONFIG_USB_SERIAL_XR is not set
CONFIG_USB_SERIAL_DEBUG=m

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_APPLE_MFI_FASTCHARGE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_EHSET_TEST_FIXTURE is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set
# CONFIG_USB_HUB_USB251XB is not set
# CONFIG_USB_HSIC_USB3503 is not set
# CONFIG_USB_HSIC_USB4604 is not set
# CONFIG_USB_LINK_LAYER_TEST is not set
# CONFIG_USB_CHAOSKEY is not set
# CONFIG_USB_ATM is not set

#
# USB Physical Layer drivers
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_USB_ISP1301 is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
CONFIG_TYPEC=y
# CONFIG_TYPEC_TCPM is not set
CONFIG_TYPEC_UCSI=y
# CONFIG_UCSI_CCG is not set
CONFIG_UCSI_ACPI=y
# CONFIG_TYPEC_TPS6598X is not set
# CONFIG_TYPEC_STUSB160X is not set

#
# USB Type-C Multiplexer/DeMultiplexer Switch support
#
# CONFIG_TYPEC_MUX_PI3USB30532 is not set
# end of USB Type-C Multiplexer/DeMultiplexer Switch support

#
# USB Type-C Alternate Mode drivers
#
# CONFIG_TYPEC_DP_ALTMODE is not set
# end of USB Type-C Alternate Mode drivers

# CONFIG_USB_ROLE_SWITCH is not set
CONFIG_MMC=m
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_SDIO_UART=m
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_SDHCI=m
CONFIG_MMC_SDHCI_IO_ACCESSORS=y
CONFIG_MMC_SDHCI_PCI=m
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=m
CONFIG_MMC_SDHCI_PLTFM=m
# CONFIG_MMC_SDHCI_F_SDH30 is not set
# CONFIG_MMC_WBSD is not set
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SPI is not set
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
# CONFIG_MMC_VUB300 is not set
# CONFIG_MMC_USHC is not set
# CONFIG_MMC_USDHI6ROL0 is not set
# CONFIG_MMC_REALTEK_PCI is not set
CONFIG_MMC_CQHCI=m
# CONFIG_MMC_HSQ is not set
# CONFIG_MMC_TOSHIBA_PCI is not set
# CONFIG_MMC_MTK is not set
# CONFIG_MMC_SDHCI_XENON is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set
# CONFIG_LEDS_CLASS_MULTICOLOR is not set
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_APU is not set
CONFIG_LEDS_LM3530=m
# CONFIG_LEDS_LM3532 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_GPIO is not set
CONFIG_LEDS_LP3944=m
# CONFIG_LEDS_LP3952 is not set
# CONFIG_LEDS_LP50XX is not set
CONFIG_LEDS_CLEVO_MAIL=m
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA963X is not set
# CONFIG_LEDS_DAC124S085 is not set
# CONFIG_LEDS_PWM is not set
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_INTEL_SS4200=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=m
CONFIG_LEDS_MLXCPLD=m
# CONFIG_LEDS_MLXREG is not set
# CONFIG_LEDS_USER is not set
# CONFIG_LEDS_NIC78BX is not set
# CONFIG_LEDS_TI_LMU_COMMON is not set

#
# Flash and Torch LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_ONESHOT=m
# CONFIG_LEDS_TRIGGER_DISK is not set
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_ACTIVITY is not set
CONFIG_LEDS_TRIGGER_GPIO=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
CONFIG_LEDS_TRIGGER_CAMERA=m
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=m
# CONFIG_LEDS_TRIGGER_TTY is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
CONFIG_INFINIBAND_VIRT_DMA=y
# CONFIG_INFINIBAND_MTHCA is not set
# CONFIG_INFINIBAND_EFA is not set
# CONFIG_INFINIBAND_I40IW is not set
# CONFIG_MLX4_INFINIBAND is not set
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_INFINIBAND_USNIC is not set
# CONFIG_INFINIBAND_RDMAVT is not set
CONFIG_RDMA_RXE=m
CONFIG_RDMA_SIW=m
CONFIG_INFINIBAND_IPOIB=m
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_SRPT=m
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_INFINIBAND_ISERT is not set
# CONFIG_INFINIBAND_RTRS_CLIENT is not set
# CONFIG_INFINIBAND_RTRS_SERVER is not set
# CONFIG_INFINIBAND_OPA_VNIC is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
# CONFIG_EDAC_I10NM is not set
CONFIG_EDAC_PND2=m
# CONFIG_EDAC_IGEN6 is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_SYSTOHC is not set
# CONFIG_RTC_DEBUG is not set
CONFIG_RTC_NVMEM=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_ABB5ZES3 is not set
# CONFIG_RTC_DRV_ABEOZ9 is not set
# CONFIG_RTC_DRV_ABX80X is not set
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1307_CENTURY is not set
CONFIG_RTC_DRV_DS1374=m
# CONFIG_RTC_DRV_DS1374_WDT is not set
CONFIG_RTC_DRV_DS1672=m
CONFIG_RTC_DRV_MAX6900=m
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_ISL12022=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8523=m
# CONFIG_RTC_DRV_PCF85063 is not set
# CONFIG_RTC_DRV_PCF85363 is not set
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
CONFIG_RTC_DRV_M41T80=m
CONFIG_RTC_DRV_M41T80_WDT=y
CONFIG_RTC_DRV_BQ32K=m
# CONFIG_RTC_DRV_S35390A is not set
CONFIG_RTC_DRV_FM3130=m
# CONFIG_RTC_DRV_RX8010 is not set
CONFIG_RTC_DRV_RX8581=m
CONFIG_RTC_DRV_RX8025=m
CONFIG_RTC_DRV_EM3027=m
# CONFIG_RTC_DRV_RV3028 is not set
# CONFIG_RTC_DRV_RV3032 is not set
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
# CONFIG_RTC_DRV_M41T93 is not set
# CONFIG_RTC_DRV_M41T94 is not set
# CONFIG_RTC_DRV_DS1302 is not set
# CONFIG_RTC_DRV_DS1305 is not set
# CONFIG_RTC_DRV_DS1343 is not set
# CONFIG_RTC_DRV_DS1347 is not set
# CONFIG_RTC_DRV_DS1390 is not set
# CONFIG_RTC_DRV_MAX6916 is not set
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RX4581=m
# CONFIG_RTC_DRV_RS5C348 is not set
# CONFIG_RTC_DRV_MAX6902 is not set
# CONFIG_RTC_DRV_PCF2123 is not set
# CONFIG_RTC_DRV_MCP795 is not set
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=m
CONFIG_RTC_DRV_DS3232_HWMON=y
# CONFIG_RTC_DRV_PCF2127 is not set
CONFIG_RTC_DRV_RV3029C2=m
# CONFIG_RTC_DRV_RV3029_HWMON is not set
# CONFIG_RTC_DRV_RX6110 is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=m
CONFIG_RTC_DRV_DS1511=m
CONFIG_RTC_DRV_DS1553=m
# CONFIG_RTC_DRV_DS1685_FAMILY is not set
CONFIG_RTC_DRV_DS1742=m
CONFIG_RTC_DRV_DS2404=m
CONFIG_RTC_DRV_STK17TA8=m
# CONFIG_RTC_DRV_M48T86 is not set
CONFIG_RTC_DRV_M48T35=m
CONFIG_RTC_DRV_M48T59=m
CONFIG_RTC_DRV_MSM6242=m
CONFIG_RTC_DRV_BQ4802=m
CONFIG_RTC_DRV_RP5C01=m
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_RTC_DRV_FTRTC010 is not set

#
# HID Sensor RTC drivers
#
# CONFIG_RTC_DRV_GOLDFISH is not set
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
# CONFIG_ALTERA_MSGDMA is not set
CONFIG_INTEL_IDMA64=m
# CONFIG_INTEL_IDXD is not set
CONFIG_INTEL_IOATDMA=m
# CONFIG_PLX_DMA is not set
# CONFIG_XILINX_ZYNQMP_DPDMA is not set
# CONFIG_QCOM_HIDMA_MGMT is not set
# CONFIG_QCOM_HIDMA is not set
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=m
CONFIG_DW_DMAC_PCI=y
# CONFIG_DW_EDMA is not set
# CONFIG_DW_EDMA_PCIE is not set
CONFIG_HSU_DMA=y
# CONFIG_SF_PDMA is not set
# CONFIG_INTEL_LDMA is not set

#
# DMA Clients
#
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=m
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_DEBUG is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# end of DMABUF options

CONFIG_DCA=m
# CONFIG_AUXDISPLAY is not set
# CONFIG_PANEL is not set
CONFIG_UIO=m
CONFIG_UIO_CIF=m
CONFIG_UIO_PDRV_GENIRQ=m
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=m
CONFIG_UIO_SERCOS3=m
CONFIG_UIO_PCI_GENERIC=m
# CONFIG_UIO_NETX is not set
# CONFIG_UIO_PRUSS is not set
# CONFIG_UIO_MF624 is not set
CONFIG_UIO_HV_GENERIC=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO=m
CONFIG_VFIO_NOIOMMU=y
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_PCI_VGA is not set
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
# CONFIG_VFIO_PCI_IGD is not set
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_PCI_LEGACY=y
# CONFIG_VIRTIO_PMEM is not set
CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_MEM=m
CONFIG_VIRTIO_INPUT=m
# CONFIG_VIRTIO_MMIO is not set
CONFIG_VIRTIO_DMA_SHARED_BUFFER=m
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
# CONFIG_VHOST_SCSI is not set
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
CONFIG_HYPERV=m
CONFIG_HYPERV_TIMER=y
CONFIG_HYPERV_UTILS=m
CONFIG_HYPERV_BALLOON=m
# end of Microsoft Hyper-V guest support

#
# Xen driver support
#
# CONFIG_XEN_BALLOON is not set
CONFIG_XEN_DEV_EVTCHN=m
# CONFIG_XEN_BACKEND is not set
CONFIG_XENFS=m
CONFIG_XEN_COMPAT_XENFS=y
CONFIG_XEN_SYS_HYPERVISOR=y
CONFIG_XEN_XENBUS_FRONTEND=y
# CONFIG_XEN_GNTDEV is not set
# CONFIG_XEN_GRANT_DEV_ALLOC is not set
# CONFIG_XEN_GRANT_DMA_ALLOC is not set
CONFIG_SWIOTLB_XEN=y
# CONFIG_XEN_PVCALLS_FRONTEND is not set
CONFIG_XEN_PRIVCMD=m
CONFIG_XEN_EFI=y
CONFIG_XEN_AUTO_XLATE=y
CONFIG_XEN_ACPI=y
# CONFIG_XEN_UNPOPULATED_ALLOC is not set
# end of Xen driver support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACPI_WMI=m
CONFIG_WMI_BMOF=m
# CONFIG_HUAWEI_WMI is not set
# CONFIG_UV_SYSFS is not set
# CONFIG_INTEL_WMI_SBL_FW_UPDATE is not set
CONFIG_INTEL_WMI_THUNDERBOLT=m
CONFIG_MXM_WMI=m
# CONFIG_PEAQ_WMI is not set
# CONFIG_XIAOMI_WMI is not set
# CONFIG_GIGABYTE_WMI is not set
CONFIG_ACERHDF=m
# CONFIG_ACER_WIRELESS is not set
CONFIG_ACER_WMI=m
# CONFIG_AMD_PMC is not set
# CONFIG_ADV_SWBUTTON is not set
CONFIG_APPLE_GMUX=m
CONFIG_ASUS_LAPTOP=m
# CONFIG_ASUS_WIRELESS is not set
CONFIG_ASUS_WMI=m
CONFIG_ASUS_NB_WMI=m
CONFIG_EEEPC_LAPTOP=m
CONFIG_EEEPC_WMI=m
# CONFIG_X86_PLATFORM_DRIVERS_DELL is not set
CONFIG_AMILO_RFKILL=m
CONFIG_FUJITSU_LAPTOP=m
CONFIG_FUJITSU_TABLET=m
# CONFIG_GPD_POCKET_FAN is not set
CONFIG_HP_ACCEL=m
CONFIG_HP_WIRELESS=m
CONFIG_HP_WMI=m
# CONFIG_IBM_RTL is not set
CONFIG_IDEAPAD_LAPTOP=m
CONFIG_SENSORS_HDAPS=m
CONFIG_THINKPAD_ACPI=m
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_INTEL_ATOMISP2_PM is not set
CONFIG_INTEL_HID_EVENT=m
# CONFIG_INTEL_INT0002_VGPIO is not set
# CONFIG_INTEL_MENLOW is not set
CONFIG_INTEL_OAKTRAIL=m
CONFIG_INTEL_VBTN=m
CONFIG_MSI_LAPTOP=m
CONFIG_MSI_WMI=m
# CONFIG_PCENGINES_APU2 is not set
CONFIG_SAMSUNG_LAPTOP=m
CONFIG_SAMSUNG_Q10=m
CONFIG_TOSHIBA_BT_RFKILL=m
# CONFIG_TOSHIBA_HAPS is not set
# CONFIG_TOSHIBA_WMI is not set
CONFIG_ACPI_CMPC=m
CONFIG_COMPAL_LAPTOP=m
# CONFIG_LG_LAPTOP is not set
CONFIG_PANASONIC_LAPTOP=m
CONFIG_SONY_LAPTOP=m
CONFIG_SONYPI_COMPAT=y
# CONFIG_SYSTEM76_ACPI is not set
CONFIG_TOPSTAR_LAPTOP=m
# CONFIG_I2C_MULTI_INSTANTIATE is not set
CONFIG_MLX_PLATFORM=m
CONFIG_INTEL_IPS=m
CONFIG_INTEL_RST=m
# CONFIG_INTEL_SMARTCONNECT is not set

#
# Intel Speed Select Technology interface support
#
# CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set
# end of Intel Speed Select Technology interface support

CONFIG_INTEL_TURBO_MAX_3=y
# CONFIG_INTEL_UNCORE_FREQ_CONTROL is not set
CONFIG_INTEL_PMC_CORE=m
# CONFIG_INTEL_PUNIT_IPC is not set
# CONFIG_INTEL_SCU_PCI is not set
# CONFIG_INTEL_SCU_PLATFORM is not set
CONFIG_PMC_ATOM=y
# CONFIG_CHROME_PLATFORMS is not set
CONFIG_MELLANOX_PLATFORM=y
CONFIG_MLXREG_HOTPLUG=m
# CONFIG_MLXREG_IO is not set
CONFIG_SURFACE_PLATFORMS=y
# CONFIG_SURFACE3_WMI is not set
# CONFIG_SURFACE_3_POWER_OPREGION is not set
# CONFIG_SURFACE_GPE is not set
# CONFIG_SURFACE_HOTPLUG is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
CONFIG_HAVE_CLK=y
CONFIG_CLKDEV_LOOKUP=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
# CONFIG_COMMON_CLK_MAX9485 is not set
# CONFIG_COMMON_CLK_SI5341 is not set
# CONFIG_COMMON_CLK_SI5351 is not set
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_CDCE706 is not set
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_COMMON_CLK_PWM is not set
# CONFIG_XILINX_VCU is not set
CONFIG_HWSPINLOCK=y

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_PCC=y
# CONFIG_ALTERA_MBOX is not set
CONFIG_IOMMU_IOVA=y
CONFIG_IOASID=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_IOMMU_DMA=y
# CONFIG_AMD_IOMMU is not set
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_SVM is not set
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set
CONFIG_IRQ_REMAP=y
CONFIG_HYPERV_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_QCOM_GLINK_RPM is not set
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
CONFIG_NTB=m
# CONFIG_NTB_MSI is not set
# CONFIG_NTB_AMD is not set
# CONFIG_NTB_IDT is not set
# CONFIG_NTB_INTEL is not set
# CONFIG_NTB_EPF is not set
# CONFIG_NTB_SWITCHTEC is not set
# CONFIG_NTB_PINGPONG is not set
# CONFIG_NTB_TOOL is not set
# CONFIG_NTB_PERF is not set
# CONFIG_NTB_TRANSPORT is not set
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
# CONFIG_PWM_DEBUG is not set
# CONFIG_PWM_DWC is not set
CONFIG_PWM_LPSS=m
CONFIG_PWM_LPSS_PCI=m
CONFIG_PWM_LPSS_PLATFORM=m
# CONFIG_PWM_PCA9685 is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_USB_LGM_PHY is not set
# CONFIG_BCM_KONA_USB2_PHY is not set
# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# end of PHY Subsystem

CONFIG_POWERCAP=y
CONFIG_INTEL_RAPL_CORE=m
CONFIG_INTEL_RAPL=m
# CONFIG_IDLE_INJECT is not set
# CONFIG_DTPM is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
# CONFIG_RAS_CEC is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

CONFIG_LIBNVDIMM=m
CONFIG_BLK_DEV_PMEM=m
CONFIG_ND_BLK=m
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=m
CONFIG_BTT=y
CONFIG_ND_PFN=m
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_DEV_DAX=m
CONFIG_DEV_DAX_PMEM=m
CONFIG_DEV_DAX_KMEM=m
CONFIG_DEV_DAX_PMEM_COMPAT=m
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
# CONFIG_NVMEM_RMEM is not set

#
# HW tracing support
#
CONFIG_STM=m
# CONFIG_STM_PROTO_BASIC is not set
# CONFIG_STM_PROTO_SYS_T is not set
CONFIG_STM_DUMMY=m
CONFIG_STM_SOURCE_CONSOLE=m
CONFIG_STM_SOURCE_HEARTBEAT=m
CONFIG_STM_SOURCE_FTRACE=m
CONFIG_INTEL_TH=m
CONFIG_INTEL_TH_PCI=m
CONFIG_INTEL_TH_ACPI=m
CONFIG_INTEL_TH_GTH=m
CONFIG_INTEL_TH_STH=m
CONFIG_INTEL_TH_MSU=m
CONFIG_INTEL_TH_PTI=m
# CONFIG_INTEL_TH_DEBUG is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_UNISYS_VISORBUS is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
CONFIG_EXT4_KUNIT_TESTS=m
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=m
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
# CONFIG_NILFS2_FS is not set
CONFIG_F2FS_FS=m
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_FS_SECURITY=y
# CONFIG_F2FS_CHECK_FS is not set
# CONFIG_F2FS_FAULT_INJECTION is not set
# CONFIG_F2FS_FS_COMPRESSION is not set
# CONFIG_ZONEFS_FS is not set
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_MANDATORY_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_VERITY is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
# CONFIG_VIRTIO_FS is not set
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
# CONFIG_NETFS_STATS is not set
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
# CONFIG_FSCACHE_OBJECT_LIST is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_VMCORE_DEVICE_DUMP=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
# CONFIG_TMPFS_INODE64 is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_EFIVAR_FS=y
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
# CONFIG_SQUASHFS_FILE_CACHE is not set
CONFIG_SQUASHFS_FILE_DIRECT=y
# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
# CONFIG_SQUASHFS_LZ4 is not set
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
# CONFIG_SQUASHFS_ZSTD is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
CONFIG_MINIX_FS=m
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
# CONFIG_PSTORE_CONSOLE is not set
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=m
# CONFIG_PSTORE_BLK is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
# CONFIG_NFS_SWAP is not set
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
# CONFIG_NFS_V4_1_MIGRATION is not set
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_ROOT_NFS=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
# CONFIG_NFS_V4_2_READ_PLUS is not set
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
CONFIG_NFSD_SCSILAYOUT=y
# CONFIG_NFSD_FLEXFILELAYOUT is not set
# CONFIG_NFSD_V4_2_INTER_SSC is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_NFS_V4_2_SSC_HELPER=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_BACKCHANNEL=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
CONFIG_SUNRPC_DEBUG=y
CONFIG_SUNRPC_XPRT_RDMA=m
CONFIG_CEPH_FS=m
# CONFIG_CEPH_FSCACHE is not set
CONFIG_CEPH_FS_POSIX_ACL=y
# CONFIG_CEPH_FS_SECURITY_LABEL is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS2 is not set
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_CIFS_SWN_UPCALL is not set
# CONFIG_CIFS_SMB_DIRECT is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=m
CONFIG_NLS_MAC_CYRILLIC=m
CONFIG_NLS_MAC_GAELIC=m
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=m
CONFIG_NLS_MAC_INUIT=m
CONFIG_NLS_MAC_ROMANIAN=m
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
CONFIG_TRUSTED_KEYS=y
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_KEY_DH_OPERATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_PAGE_TABLE_ISOLATION=y
# CONFIG_SECURITY_INFINIBAND is not set
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_PATH=y
CONFIG_INTEL_TXT=y
CONFIG_LSM_MMAP_MIN_ADDR=65535
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_HARDENED_USERCOPY_FALLBACK=y
CONFIG_FORTIFY_SOURCE=y
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_HASH=y
CONFIG_SECURITY_APPARMOR_HASH_DEFAULT=y
# CONFIG_SECURITY_APPARMOR_DEBUG is not set
CONFIG_SECURITY_APPARMOR_KUNIT_TEST=y
# CONFIG_SECURITY_LOADPIN is not set
CONFIG_SECURITY_YAMA=y
# CONFIG_SECURITY_SAFESETID is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
# CONFIG_SECURITY_LANDLOCK is not set
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
# CONFIG_INTEGRITY_PLATFORM_KEYRING is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
CONFIG_EVM=y
CONFIG_EVM_ATTR_FSUUID=y
# CONFIG_EVM_ADD_XATTRS is not set
# CONFIG_EVM_LOAD_X509 is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_APPARMOR is not set
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
# end of Memory initialization
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_SIMD=y

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
# CONFIG_CRYPTO_ECDSA is not set
# CONFIG_CRYPTO_ECRDSA is not set
# CONFIG_CRYPTO_SM2 is not set
# CONFIG_CRYPTO_CURVE25519 is not set
# CONFIG_CRYPTO_CURVE25519_X86 is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
# CONFIG_CRYPTO_AEGIS128 is not set
# CONFIG_CRYPTO_AEGIS128_AESNI_SSE2 is not set
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
# CONFIG_CRYPTO_OFB is not set
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=m
# CONFIG_CRYPTO_KEYWRAP is not set
# CONFIG_CRYPTO_NHPOLY1305_SSE2 is not set
# CONFIG_CRYPTO_NHPOLY1305_AVX2 is not set
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_BLAKE2B=m
# CONFIG_CRYPTO_BLAKE2S is not set
# CONFIG_CRYPTO_BLAKE2S_X86 is not set
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_POLY1305_X86_64=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_SSSE3=y
CONFIG_CRYPTO_SHA256_SSSE3=y
CONFIG_CRYPTO_SHA512_SSSE3=m
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_SHA3=m
# CONFIG_CRYPTO_SM3 is not set
# CONFIG_CRYPTO_STREEBOG is not set
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_NI_INTEL=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_BLOWFISH_X86_64=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_DES3_EDE_X86_64=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_CHACHA20_X86_64=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
# CONFIG_CRYPTO_SM4 is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_X86_64=m
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=m
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_842 is not set
# CONFIG_CRYPTO_LZ4 is not set
# CONFIG_CRYPTO_LZ4HC is not set
# CONFIG_CRYPTO_ZSTD is not set

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_USER_API_RNG=y
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
CONFIG_CRYPTO_USER_API_AEAD=y
CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE=y
# CONFIG_CRYPTO_STATS is not set
CONFIG_CRYPTO_HASH_INFO=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
# CONFIG_CRYPTO_LIB_BLAKE2S is not set
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
CONFIG_CRYPTO_ARCH_HAVE_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
# CONFIG_CRYPTO_DEV_ATMEL_ECC is not set
# CONFIG_CRYPTO_DEV_ATMEL_SHA204A is not set
CONFIG_CRYPTO_DEV_CCP=y
CONFIG_CRYPTO_DEV_CCP_DD=m
CONFIG_CRYPTO_DEV_SP_CCP=y
CONFIG_CRYPTO_DEV_CCP_CRYPTO=m
CONFIG_CRYPTO_DEV_SP_PSP=y
# CONFIG_CRYPTO_DEV_CCP_DEBUGFS is not set
CONFIG_CRYPTO_DEV_QAT=m
CONFIG_CRYPTO_DEV_QAT_DH895xCC=m
CONFIG_CRYPTO_DEV_QAT_C3XXX=m
CONFIG_CRYPTO_DEV_QAT_C62X=m
# CONFIG_CRYPTO_DEV_QAT_4XXX is not set
CONFIG_CRYPTO_DEV_QAT_DH895xCCVF=m
CONFIG_CRYPTO_DEV_QAT_C3XXXVF=m
CONFIG_CRYPTO_DEV_QAT_C62XVF=m
CONFIG_CRYPTO_DEV_NITROX=m
CONFIG_CRYPTO_DEV_NITROX_CNN55XX=m
# CONFIG_CRYPTO_DEV_VIRTIO is not set
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
# CONFIG_ASYMMETRIC_TPM_KEY_SUBTYPE is not set
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
CONFIG_SIGNED_PE_FILE_VERIFICATION=y

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# CONFIG_SYSTEM_REVOCATION_LIST is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_RAID6_PQ_BENCHMARK=y
CONFIG_LINEAR_RANGES=m
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_CORDIC=m
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=m
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_SWIOTLB=y
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=200
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_DIMLIB=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_MEMREGION=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_COPY_MC=y
CONFIG_ARCH_STACKWALK=y
CONFIG_SBITMAP=y
# CONFIG_STRING_SELFTEST is not set
# end of Library routines

CONFIG_ASN1_ENCODER=y

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_PRINTK_CALLER=y
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_BOOT_PRINTK_DELAY=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
# CONFIG_GDB_SCRIPTS is not set
CONFIG_FRAME_WARN=2048
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_STACK_VALIDATION=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
# end of Generic Kernel Debugging Instruments

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=480
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_WQ_WATCHDOG=y
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
# CONFIG_MMIOTRACE is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_BPF_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_RING_BUFFER_BENCHMARK=m
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_RECORD_RECURSION is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
# CONFIG_SAMPLES is not set
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set

#
# x86 Debugging
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_EARLY_PRINTK_USB=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_USB_XDBC=y
# CONFIG_EFI_PGT_DUMP is not set
# CONFIG_DEBUG_TLBFLUSH is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_X86_DECODER_SELFTEST=y
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_DEBUG_NMI_SELFTEST is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_PUNIT_ATOM_DEBUG is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=y
CONFIG_KUNIT_DEBUGFS=y
# CONFIG_KUNIT_TEST is not set
# CONFIG_KUNIT_EXAMPLE_TEST is not set
CONFIG_KUNIT_ALL_TESTS=m
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
CONFIG_FAULT_INJECTION=y
# CONFIG_FAILSLAB is not set
# CONFIG_FAIL_PAGE_ALLOC is not set
# CONFIG_FAULT_INJECTION_USERCOPY is not set
CONFIG_FAIL_MAKE_REQUEST=y
# CONFIG_FAIL_IO_TIMEOUT is not set
# CONFIG_FAIL_FUTEX is not set
CONFIG_FAULT_INJECTION_DEBUG_FS=y
# CONFIG_FAIL_FUNCTION is not set
# CONFIG_FAIL_MMC_REQUEST is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
# CONFIG_LKDTM is not set
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_SORT is not set
# CONFIG_TEST_DIV64 is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_PERCPU_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_OVERFLOW is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_HASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
CONFIG_BITFIELD_KUNIT=m
CONFIG_RESOURCE_KUNIT_TEST=m
CONFIG_SYSCTL_KUNIT_TEST=m
CONFIG_LIST_KUNIT_TEST=m
CONFIG_LINEAR_RANGES_TEST=m
CONFIG_CMDLINE_KUNIT_TEST=m
CONFIG_BITS_TEST=m
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
# CONFIG_TEST_LIVEPATCH is not set
# CONFIG_TEST_STACKINIT is not set
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_HMM is not set
# CONFIG_TEST_FREE_PAGES is not set
# CONFIG_TEST_FPU is not set
CONFIG_ARCH_USE_MEMTEST=y
# CONFIG_MEMTEST is not set
# CONFIG_HYPERV_TESTING is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking

[-- Attachment #3: job-script --]
[-- Type: text/plain, Size: 5820 bytes --]

#!/bin/sh

export_top_env()
{
	export suite='xfstests'
	export testcase='xfstests'
	export category='functional'
	export need_memory='3G'
	export job_origin='xfstests-btrfs.yaml'
	export queue_cmdline_keys='branch
commit
queue_at_least_once'
	export queue='validate'
	export testbox='lkp-hsw-d01'
	export tbox_group='lkp-hsw-d01'
	export kconfig='x86_64-rhel-8.3'
	export submit_id='6102b94dcc8c84025e3e7591'
	export job_file='/lkp/jobs/scheduled/lkp-hsw-d01/xfstests-6HDD-btrfs-btrfs-group-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-58749022685aea90dfddfb9f8b2fcdc74dee6ec0-20210729-66142-ecdxp4-3.yaml'
	export id='7ea1c79e43fece4caa9f698267d33984e70c34c8'
	export queuer_version='/lkp-src'
	export model='Haswell'
	export nr_node=1
	export nr_cpu=8
	export memory='8G'
	export nr_ssd_partitions=1
	export nr_hdd_partitions=6
	export hdd_partitions='/dev/disk/by-id/ata-ST4000NM0035-1V4107_ZC12HHHW-part*'
	export ssd_partitions='/dev/disk/by-id/ata-INTEL_SSDSC2BB800G4_PHWL42040015800RGN-part2'
	export swap_partitions=
	export rootfs_partition='/dev/disk/by-id/ata-INTEL_SSDSC2BB800G4_PHWL42040015800RGN-part1'
	export brand='Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz'
	export need_kconfig='BLK_DEV_SD
SCSI
{"BLOCK"=>"y"}
SATA_AHCI
SATA_AHCI_PLATFORM
ATA
{"PCI"=>"y"}
BTRFS_FS'
	export commit='58749022685aea90dfddfb9f8b2fcdc74dee6ec0'
	export ucode='0x28'
	export need_kconfig_hw='{"E1000E"=>"y"}
{"ATA_SFF"=>"y"}
{"ATA_BMDMA"=>"y"}
ATA_PIIX
DRM_I915'
	export bisect_dmesg=true
	export enqueue_time='2021-07-29 22:21:01 +0800'
	export _id='6102b95ecc8c84025e3e7593'
	export _rt='/result/xfstests/6HDD-btrfs-btrfs-group-20-ucode=0x28/lkp-hsw-d01/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0'
	export user='lkp'
	export compiler='gcc-9'
	export LKP_SERVER='internal-lkp-server'
	export head_commit='299b946d27c0a7c2618281d90ed875f3aa59f942'
	export base_commit='ff1176468d368232b684f75e82563369208bc371'
	export branch='linux-review/NeilBrown/expose-btrfs-subvols-in-mount-table-correctly/20210728-064502'
	export rootfs='debian-10.4-x86_64-20200603.cgz'
	export result_root='/result/xfstests/6HDD-btrfs-btrfs-group-20-ucode=0x28/lkp-hsw-d01/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/3'
	export scheduler_version='/lkp/lkp/.src-20210728-191903'
	export arch='x86_64'
	export max_uptime=2100
	export initrd='/osimage/debian/debian-10.4-x86_64-20200603.cgz'
	export bootloader_append='root=/dev/ram0
user=lkp
job=/lkp/jobs/scheduled/lkp-hsw-d01/xfstests-6HDD-btrfs-btrfs-group-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-58749022685aea90dfddfb9f8b2fcdc74dee6ec0-20210729-66142-ecdxp4-3.yaml
ARCH=x86_64
kconfig=x86_64-rhel-8.3
branch=linux-review/NeilBrown/expose-btrfs-subvols-in-mount-table-correctly/20210728-064502
commit=58749022685aea90dfddfb9f8b2fcdc74dee6ec0
BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/vmlinuz-5.13.0-rc2-00080-g58749022685a
max_uptime=2100
RESULT_ROOT=/result/xfstests/6HDD-btrfs-btrfs-group-20-ucode=0x28/lkp-hsw-d01/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/3
LKP_SERVER=internal-lkp-server
nokaslr
selinux=0
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_loglevel
console=tty0
earlyprintk=ttyS0,115200
console=ttyS0,115200
vga=normal
rw'
	export modules_initrd='/pkg/linux/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/modules.cgz'
	export bm_initrd='/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20210707.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/fs_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/xfstests_20210728.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/xfstests-x86_64-76d2a91-1_20210728.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz'
	export ucode_initrd='/osimage/ucode/intel-ucode-20210222.cgz'
	export lkp_initrd='/osimage/user/lkp/lkp-x86_64.cgz'
	export site='inn'
	export LKP_CGI_PORT=80
	export LKP_CIFS_PORT=139
	export last_kernel='5.14.0-rc3-00014-g7d549995d4e0'
	export repeat_to=6
	export queue_at_least_once=1
	export kernel='/pkg/linux/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/vmlinuz-5.13.0-rc2-00080-g58749022685a'
	export dequeue_time='2021-07-29 22:27:00 +0800'
	export job_initrd='/lkp/jobs/scheduled/lkp-hsw-d01/xfstests-6HDD-btrfs-btrfs-group-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-58749022685aea90dfddfb9f8b2fcdc74dee6ec0-20210729-66142-ecdxp4-3.cgz'

	[ -n "$LKP_SRC" ] ||
	export LKP_SRC=/lkp/${user:-lkp}/src
}

run_job()
{
	echo $$ > $TMP/run-job.pid

	. $LKP_SRC/lib/http.sh
	. $LKP_SRC/lib/job.sh
	. $LKP_SRC/lib/env.sh

	export_top_env

	run_setup nr_hdd=6 $LKP_SRC/setup/disk

	run_setup fs='btrfs' $LKP_SRC/setup/fs

	run_monitor $LKP_SRC/monitors/wrapper kmsg
	run_monitor $LKP_SRC/monitors/wrapper heartbeat
	run_monitor $LKP_SRC/monitors/wrapper meminfo
	run_monitor $LKP_SRC/monitors/wrapper oom-killer
	run_monitor $LKP_SRC/monitors/plain/watchdog

	run_test test='btrfs-group-20' $LKP_SRC/tests/wrapper xfstests
}

extract_stats()
{
	export stats_part_begin=
	export stats_part_end=

	env test='btrfs-group-20' $LKP_SRC/stats/wrapper xfstests
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper meminfo

	$LKP_SRC/stats/wrapper time xfstests.time
	$LKP_SRC/stats/wrapper dmesg
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper last_state
	$LKP_SRC/stats/wrapper stderr
	$LKP_SRC/stats/wrapper time
}

"$@"

[-- Attachment #4: dmesg.xz --]
[-- Type: application/x-xz, Size: 29496 bytes --]

[-- Attachment #5: xfstests --]
[-- Type: text/plain, Size: 3163 bytes --]

2021-07-29 14:27:50 export TEST_DIR=/fs/sdb1
2021-07-29 14:27:50 export TEST_DEV=/dev/sdb1
2021-07-29 14:27:50 export FSTYP=btrfs
2021-07-29 14:27:50 export SCRATCH_MNT=/fs/scratch
2021-07-29 14:27:50 mkdir /fs/scratch -p
2021-07-29 14:27:50 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5 /dev/sdb6"
2021-07-29 14:27:50 sed "s:^:btrfs/:" //lkp/benchmarks/xfstests/tests/btrfs-group-20
2021-07-29 14:27:50 ./check btrfs/200 btrfs/201 btrfs/202 btrfs/203 btrfs/204 btrfs/205 btrfs/207 btrfs/208 btrfs/209
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 lkp-hsw-d01 5.13.0-rc2-00080-g58749022685a #1 SMP Wed Jul 28 21:12:27 CST 2021
MKFS_OPTIONS  -- /dev/sdb2
MOUNT_OPTIONS -- /dev/sdb2 /fs/scratch

btrfs/200	- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/200.out.bad)
    --- tests/btrfs/200.out	2021-07-28 06:20:35.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//btrfs/200.out.bad	2021-07-29 14:27:52.359748374 +0000
    @@ -10,8 +10,16 @@
     linked 65536/65536 bytes at offset 65536
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/incr'
    -At subvol SCRATCH_MNT/incr
    +ERROR: not on mount point: SCRATCH_MNT/incr
     At subvol base
    -At snapshot incr
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/200.out /lkp/benchmarks/xfstests/results//btrfs/200.out.bad'  to see the entire diff)
btrfs/201	 18s
btrfs/202	- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/202.out.bad)
    --- tests/btrfs/202.out	2021-07-28 06:20:35.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//btrfs/202.out.bad	2021-07-29 14:28:10.873746534 +0000
    @@ -2,3 +2,4 @@
     Create subvolume 'SCRATCH_MNT/a'
     Create subvolume 'SCRATCH_MNT/a/b'
     Create a snapshot of 'SCRATCH_MNT/a' in 'SCRATCH_MNT/c'
    +rm: cannot remove '/fs/scratch/c': Device or resource busy
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/202.out /lkp/benchmarks/xfstests/results//btrfs/202.out.bad'  to see the entire diff)
btrfs/203	- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/203.out.bad)
    --- tests/btrfs/203.out	2021-07-28 06:20:35.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//btrfs/203.out.bad	2021-07-29 14:28:12.189746403 +0000
    @@ -16,10 +16,10 @@
     linked 196608/196608 bytes at offset 196608
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/incr'
    -At subvol SCRATCH_MNT/incr
    +ERROR: not on mount point: SCRATCH_MNT/incr
     File foobar digest in the original filesystem:
     2b76b23b62fdbbbcae1ee37eec84fd7d
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/203.out /lkp/benchmarks/xfstests/results//btrfs/203.out.bad'  to see the entire diff)
btrfs/204	 2s
btrfs/205	 2s
btrfs/207	 49s
btrfs/208	[not run] /bin/btrfs too old (must support subvolume delete --subvolid)
btrfs/209	 1s
Ran: btrfs/200 btrfs/201 btrfs/202 btrfs/203 btrfs/204 btrfs/205 btrfs/207 btrfs/208 btrfs/209
Not run: btrfs/208
Failures: btrfs/200 btrfs/202 btrfs/203
Failed 3 of 9 tests


[-- Attachment #6: job.yaml --]
[-- Type: text/plain, Size: 4659 bytes --]

---
:#! jobs/xfstests-btrfs.yaml:
suite: xfstests
testcase: xfstests
category: functional
need_memory: 3G
disk: 6HDD
fs: btrfs
xfstests:
  test: btrfs-group-20
job_origin: xfstests-btrfs.yaml
:#! queue options:
queue_cmdline_keys:
- branch
- commit
queue: bisect
testbox: lkp-hsw-d01
tbox_group: lkp-hsw-d01
kconfig: x86_64-rhel-8.3
submit_id: 6101dd9f45ecd1168c4f4344
job_file: "/lkp/jobs/scheduled/lkp-hsw-d01/xfstests-6HDD-btrfs-btrfs-group-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-58749022685aea90dfddfb9f8b2fcdc74dee6ec0-20210729-5772-1wgfnu3-0.yaml"
id: 63f9dc3434d6977a9ecd4d77e01096ba64f782bb
queuer_version: "/lkp-src"
:#! hosts/lkp-hsw-d01:
model: Haswell
nr_node: 1
nr_cpu: 8
memory: 8G
nr_ssd_partitions: 1
nr_hdd_partitions: 6
hdd_partitions: "/dev/disk/by-id/ata-ST4000NM0035-1V4107_ZC12HHHW-part*"
ssd_partitions: "/dev/disk/by-id/ata-INTEL_SSDSC2BB800G4_PHWL42040015800RGN-part2"
swap_partitions:
rootfs_partition: "/dev/disk/by-id/ata-INTEL_SSDSC2BB800G4_PHWL42040015800RGN-part1"
brand: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
:#! include/category/functional:
kmsg:
heartbeat:
meminfo:
:#! include/disk/nr_hdd:
need_kconfig:
- BLK_DEV_SD
- SCSI
- BLOCK: y
- SATA_AHCI
- SATA_AHCI_PLATFORM
- ATA
- PCI: y
- BTRFS_FS
:#! include/queue/cyclic:
commit: 58749022685aea90dfddfb9f8b2fcdc74dee6ec0
:#! include/testbox/lkp-hsw-d01:
ucode: '0x28'
need_kconfig_hw:
- E1000E: y
- ATA_SFF: y
- ATA_BMDMA: y
- ATA_PIIX
- DRM_I915
bisect_dmesg: true
:#! include/fs/OTHERS:
enqueue_time: 2021-07-29 06:43:43.576479988 +08:00
_id: 6101dd9f45ecd1168c4f4344
_rt: "/result/xfstests/6HDD-btrfs-btrfs-group-20-ucode=0x28/lkp-hsw-d01/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0"
:#! schedule options:
user: lkp
compiler: gcc-9
LKP_SERVER: internal-lkp-server
head_commit: 299b946d27c0a7c2618281d90ed875f3aa59f942
base_commit: ff1176468d368232b684f75e82563369208bc371
branch: linux-devel/devel-hourly-20210728-182605
rootfs: debian-10.4-x86_64-20200603.cgz
result_root: "/result/xfstests/6HDD-btrfs-btrfs-group-20-ucode=0x28/lkp-hsw-d01/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/0"
scheduler_version: "/lkp/lkp/.src-20210728-191903"
arch: x86_64
max_uptime: 2100
initrd: "/osimage/debian/debian-10.4-x86_64-20200603.cgz"
bootloader_append:
- root=/dev/ram0
- user=lkp
- job=/lkp/jobs/scheduled/lkp-hsw-d01/xfstests-6HDD-btrfs-btrfs-group-20-ucode=0x28-debian-10.4-x86_64-20200603.cgz-58749022685aea90dfddfb9f8b2fcdc74dee6ec0-20210729-5772-1wgfnu3-0.yaml
- ARCH=x86_64
- kconfig=x86_64-rhel-8.3
- branch=linux-devel/devel-hourly-20210728-182605
- commit=58749022685aea90dfddfb9f8b2fcdc74dee6ec0
- BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/vmlinuz-5.13.0-rc2-00080-g58749022685a
- max_uptime=2100
- RESULT_ROOT=/result/xfstests/6HDD-btrfs-btrfs-group-20-ucode=0x28/lkp-hsw-d01/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/0
- LKP_SERVER=internal-lkp-server
- nokaslr
- selinux=0
- debug
- apic=debug
- sysrq_always_enabled
- rcupdate.rcu_cpu_stall_timeout=100
- net.ifnames=0
- printk.devkmsg=on
- panic=-1
- softlockup_panic=1
- nmi_watchdog=panic
- oops=panic
- load_ramdisk=2
- prompt_ramdisk=0
- drbd.minor_count=8
- systemd.log_level=err
- ignore_loglevel
- console=tty0
- earlyprintk=ttyS0,115200
- console=ttyS0,115200
- vga=normal
- rw
modules_initrd: "/pkg/linux/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/modules.cgz"
bm_initrd: "/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20210707.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/fs_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/xfstests_20210728.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/xfstests-x86_64-76d2a91-1_20210728.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz"
ucode_initrd: "/osimage/ucode/intel-ucode-20210222.cgz"
lkp_initrd: "/osimage/user/lkp/lkp-x86_64.cgz"
site: inn
:#! /lkp/lkp/.src-20210728-191903/include/site/inn:
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
oom-killer:
watchdog:
:#! runtime status:
last_kernel: 4.20.0
:#! user overrides:
kernel: "/pkg/linux/x86_64-rhel-8.3/gcc-9/58749022685aea90dfddfb9f8b2fcdc74dee6ec0/vmlinuz-5.13.0-rc2-00080-g58749022685a"
dequeue_time: 2021-07-29 06:45:38.715432586 +08:00
job_state: finished
loadavg: 1.03 0.51 0.20 1/179 18970
start_time: '1627512391'
end_time: '1627512477'
version: "/lkp/lkp/.src-20210728-191934:cceb8212:b57eb1679"

[-- Attachment #7: reproduce --]
[-- Type: text/plain, Size: 1004 bytes --]

dmsetup remove_all
wipefs -a --force /dev/sdb1
wipefs -a --force /dev/sdb2
wipefs -a --force /dev/sdb3
wipefs -a --force /dev/sdb4
wipefs -a --force /dev/sdb5
wipefs -a --force /dev/sdb6
mkfs -t btrfs /dev/sdb2
mkfs -t btrfs /dev/sdb1
mkfs -t btrfs /dev/sdb3
mkfs -t btrfs /dev/sdb4
mkfs -t btrfs /dev/sdb6
mkfs -t btrfs /dev/sdb5
mkdir -p /fs/sdb1
mount -t btrfs /dev/sdb1 /fs/sdb1
mkdir -p /fs/sdb2
mount -t btrfs /dev/sdb2 /fs/sdb2
mkdir -p /fs/sdb3
mount -t btrfs /dev/sdb3 /fs/sdb3
mkdir -p /fs/sdb4
mount -t btrfs /dev/sdb4 /fs/sdb4
mkdir -p /fs/sdb5
mount -t btrfs /dev/sdb5 /fs/sdb5
mkdir -p /fs/sdb6
mount -t btrfs /dev/sdb6 /fs/sdb6
export TEST_DIR=/fs/sdb1
export TEST_DEV=/dev/sdb1
export FSTYP=btrfs
export SCRATCH_MNT=/fs/scratch
mkdir /fs/scratch -p
export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5 /dev/sdb6"
sed "s:^:btrfs/:" //lkp/benchmarks/xfstests/tests/btrfs-group-20
./check btrfs/200 btrfs/201 btrfs/202 btrfs/203 btrfs/204 btrfs/205 btrfs/207 btrfs/208 btrfs/209

^ permalink raw reply	[flat|nested] 123+ messages in thread

* A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-07-30  7:59               ` Miklos Szeredi
@ 2021-08-02  4:18                 ` NeilBrown
  2021-08-02  5:25                   ` Al Viro
                                     ` (2 more replies)
  0 siblings, 3 replies; 123+ messages in thread
From: NeilBrown @ 2021-08-02  4:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Fri, 30 Jul 2021, Miklos Szeredi wrote:
> On Fri, 30 Jul 2021 at 09:34, NeilBrown <neilb@suse.de> wrote:
> 
> > But I'm curious about your reference to "some sort of subvolume
> > structure that the VFS knows about".  Do you have any references, or can
> > you suggest a search term I could try?
> 
> Found this:
> https://lore.kernel.org/linux-fsdevel/20180508180436.716-1-mfasheh@suse.de/
> 

Excellent, thanks.  Very useful.

OK.  Time for a third perspective.

With its current framing the problem is unsolvable.  So it needs to be
reframed.

By "current framing", I mean that we are trying to get btrfs to behave
in a way that meets current user-space expectations.  Specially, the
expectation that each object in any filesystem can be uniquely
identified by a 64bit inode number.  btrfs provides functionality which
needs more than 64bits.  So it simple does not fit.  btrfs currently
fudges with device numbers to hide the problem.  This is at best an
incomplete solution, and is already being shown to be insufficient.

Therefore we need to change user-space expectations.  This has been done
before multiple times - often by breaking things and leaving it up to
user-space to fix it.  My favourite example is that NFSv2 broke the
creation of lock files with O_CREAT|O_EXCL.  USER-space starting using
hard-links to achieve the same result.  When NFSv3 added reliable
O_CREAT|O_EXCL support, it hardly mattered.... but I digress.

It think we need to bite-the-bullet and decide that 64bits is not
enough, and in fact no number of bits will ever be enough.  overlayfs
makes this clear.  overlayfs merges multiple filesystems, and so needs
strictly more bits to uniquely identify all inodes than any of the
filesystems use.  Currently it over-loads the high bits and hopes the
filesystem doesn't use them.

The "obvious" choice for a replacement is the file handle provided by
name_to_handle_at() (falling back to st_ino if name_to_handle_at isn't
supported by the filesystem).  This returns an extensible opaque
byte-array.  It is *already* more reliable than st_ino.  Comparing
st_ino is only a reliable way to check if two files are the same if you
have both of them open.  If you don't, then one of the files might have
been deleted and the inode number reused for the other.  A filehandle
contains a generation number which protects against this.

So I think we need to strongly encourage user-space to start using
name_to_handle_at() whenever there is a need to test if two things are
the same.

This frees us to be a little less precise about assuring st_ino is
always unique, but only a little.  We still want to minimize conflicts
and avoid them in common situations.

A filehandle typically has some bytes used to locate the inode -
"location" - and some to validate it - "generation".  In general, st_ino
must now be seen as a hash of the "location".  It could be a generic hash
(xxhash? jhash?) or it could be a careful xor of the bits.

For btrfs, the "location" is root.objectid ++ file.objectid.  I think
the inode should become (file.objectid ^ swab64(root.objectid)).  This
will provide numbers that are unique until you get very large subvols,
and very many subvols.  It also ensures that two inodes in the same
subvol will be guaranteed to have different st_ino.

This will quickly cause problems for overlayfs as it means that if btrfs
is used with overlayfs, the top few bits won't be zero.  Possibly btrfs
could be polite and shift the swab64(root.objectid) down 8 bits to make
room.  Possible overlayfs should handle this case (top N-bits not all
zero), and switch to a generic hash of the inode number (or preferably
the filehandle) to (64-N bits).

If we convince user-space to use filehandles to compare objects, the NFS
problems I initially was trying to address go away.  Even before that,
if btrfs switches to a hashed (i.e. xor) inode number, then the problems
also go away.  but they aren't the only problems here.

Accessing the fhandle isn't always possible.  For example reading
/proc/locks reports major:minor:inode-number for each file (This is the
major:minor from the superblock, so btrfs internal dev numbers aren't
used).  The filehandle is simply not available.  I think the only way
to address this is to create a new file. "/proc/locks2" :-)
Similarly the "lock:" lines in /proc/$PID/fdinfo/$FD need to be duplicated
as "lock2:" lines with filehandle instead of inode number.  Ditto for
'inotify:' lines and possibly others.

Similarly /proc/$PID/maps contains the inode number with no fhandle.
The situation isn't so bad there as there is a path name, and you can
even call name_to_handle_at("/proc/$PID/map_files/$RANGE") to get the
fhandle.  It might be better to provide a new file though.

Next we come to the use of different device numbers in the one btrfs
filesystem.  I'm of the opinion that this was an unfortunately choice
that we need to deprecate.  Tools that use fhandle won't need it to
differentiate inodes, but there is more to the story than just that
need.

As has been mentioned, people depend on "du -x" and "find -mount" (aka
"-xdev") to stay within a "subvol".  We need to provide a clean
alternate before discouraging that usage.

xfs, ext4, fuse, and f2fs each (can) maintain a "project id" for each
inode, which effectively groups inodes into a tree.  This is used for
project quotas.  At one level this is conceptually very similar to the
btrfs subtree.root.objectid.  It is different in that it is only 32 bits
(:-[) and is mapped between user name-spaces like uids and gids.  It is
similar in that it identifies a group of inodes that are accounted
together and are (generally) contiguous in a tree.

If we encouraged "du" to have a "--proj" option (-j) which stays within
a project, and gave a similar option to find, that could be broadly
useful.  Then if btrfs provided the subvol objectid as fsx_projid
(available in FS_IOC_FSGETXATTR ioctl), then "du --proj" on btrfs would
stay in a subvol.  Longer term it might make sense to add a 64bit
project-id to statx.  I don't think it would make sense for btrfs to
have a 'project' concept that is different from the "subvolume".

It would be cool if "df" could have a "--proj" (or similar) flag so that
it would report the usage of a "subtree" (given a path).  Unfortunately
there isn't really an interface for this.  Going through the quota
system might be nice, I don't think it would work.

Another thought about btrfs device numbers is that, providing inode
numbers are (nearly) unique, we don't really need more than 2.  A btrfs
filesystem could allocate 2 anon device numbers.  One would be assigned
to the root, and each subvolume would get whichever device number its
parent doesn't have.  This would stop "du -x" and "find -mount" and
similar from crossing into subvols.  There could be a mount option to
select between "1", "2", and "many" device numbers for a filesystem.

- I note that cephfs place games with st_dev too....  I wonder if we can
  learn anything from that. 
- audit uses sb->s_dev without asking the filesystem.  So it won't
  handle  btrfs correctly.  I wonder if there is room for it to use
  file handles.

I accept that I'm proposing some BIG changes here, and they might break
things.  But btrfs is already broken in various ways.  I think we need a
goal to work towards which will eventually remove all breakage and still
have room for expansion.  I think that must include:

- providing as-unique-as-practical inode numbers across the whole
  filesystem, and deprecating the internal use of different device
  numbers.  Make it possible to mount without them ASAP, and aim to
  make that the default eventually.
- working with user-space tool/library developers to use
  name_to_handle_at() to identify inodes, only using st_ino
  as a fall-back
- adding filehandles to various /proc etc files as needed, either
  duplicating lines or duplicating files.  And helping application which
  use these files to migrate (I would *NOT* change the dev numbers in
  the current file to report the internal btrfs dev numbers the way that
  SUSE does.  I would prefer that current breakage could be used to
  motivate developers towards depending instead on fhandles).
- exporting subtree (aka subvol) id to user-space, possibly paralleling
  proj_id in some way, and extending various tools to understand
  subtrees

Who's with me??

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
@ 2021-08-02  5:25                   ` Al Viro
  2021-08-02  5:40                     ` NeilBrown
  2021-08-02  7:15                   ` Martin Steigerwald
  2021-08-02 12:39                   ` J. Bruce Fields
  2 siblings, 1 reply; 123+ messages in thread
From: Al Viro @ 2021-08-02  5:25 UTC (permalink / raw)
  To: NeilBrown
  Cc: Miklos Szeredi, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:

> It think we need to bite-the-bullet and decide that 64bits is not
> enough, and in fact no number of bits will ever be enough.  overlayfs
> makes this clear.

Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
early...

> So I think we need to strongly encourage user-space to start using
> name_to_handle_at() whenever there is a need to test if two things are
> the same.

... and forgetting the inconvenient facts, such as that two different
fhandles may correspond to the same object.

> I accept that I'm proposing some BIG changes here, and they might break
> things.  But btrfs is already broken in various ways.  I think we need a
> goal to work towards which will eventually remove all breakage and still
> have room for expansion.  I think that must include:
> 
> - providing as-unique-as-practical inode numbers across the whole
>   filesystem, and deprecating the internal use of different device
>   numbers.  Make it possible to mount without them ASAP, and aim to
>   make that the default eventually.
> - working with user-space tool/library developers to use
>   name_to_handle_at() to identify inodes, only using st_ino
>   as a fall-back
> - adding filehandles to various /proc etc files as needed, either
>   duplicating lines or duplicating files.  And helping application which
>   use these files to migrate (I would *NOT* change the dev numbers in
>   the current file to report the internal btrfs dev numbers the way that
>   SUSE does.  I would prefer that current breakage could be used to
>   motivate developers towards depending instead on fhandles).
> - exporting subtree (aka subvol) id to user-space, possibly paralleling
>   proj_id in some way, and extending various tools to understand
>   subtrees
> 
> Who's with me??

Cf. "Poe Law"...

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  5:25                   ` Al Viro
@ 2021-08-02  5:40                     ` NeilBrown
  2021-08-02  7:54                       ` Amir Goldstein
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-02  5:40 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Mon, 02 Aug 2021, Al Viro wrote:
> On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> 
> > It think we need to bite-the-bullet and decide that 64bits is not
> > enough, and in fact no number of bits will ever be enough.  overlayfs
> > makes this clear.
> 
> Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
> early...
> 
> > So I think we need to strongly encourage user-space to start using
> > name_to_handle_at() whenever there is a need to test if two things are
> > the same.
> 
> ... and forgetting the inconvenient facts, such as that two different
> fhandles may correspond to the same object.

Can they?  They certainly can if the "connectable" flag is passed.
name_to_handle_at() cannot set that flag.
nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
perfect.  However it is the best that can be done over NFS.

Or is there some other situation where two different filehandles can be
reported for the same inode?

Do you have a better suggestion?

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
  2021-08-02  5:25                   ` Al Viro
@ 2021-08-02  7:15                   ` Martin Steigerwald
  2021-08-02 21:40                     ` NeilBrown
  2021-08-02 12:39                   ` J. Bruce Fields
  2 siblings, 1 reply; 123+ messages in thread
From: Martin Steigerwald @ 2021-08-02  7:15 UTC (permalink / raw)
  To: Miklos Szeredi, NeilBrown
  Cc: Al Viro, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

Hi Neil!

Wow, this is a bit overwhelming for me. However, I got a very specific 
question for userspace developers in order to probably provide valuable 
input to the KDE Baloo desktop search developers:

NeilBrown - 02.08.21, 06:18:29 CEST:
> The "obvious" choice for a replacement is the file handle provided by
> name_to_handle_at() (falling back to st_ino if name_to_handle_at isn't
> supported by the filesystem).  This returns an extensible opaque
> byte-array.  It is *already* more reliable than st_ino.  Comparing
> st_ino is only a reliable way to check if two files are the same if
> you have both of them open.  If you don't, then one of the files
> might have been deleted and the inode number reused for the other.  A
> filehandle contains a generation number which protects against this.
> 
> So I think we need to strongly encourage user-space to start using
> name_to_handle_at() whenever there is a need to test if two things are
> the same.

How could that work for Baloo's use case to see whether a file it 
encounters is already in its database or whether it is a new file.

Would Baloo compare the whole file handle or just certain fields or make a 
hash of the filehandle or what ever? Could you, in pseudo code or 
something, describe the approach you'd suggest. I'd then share it on:

Bug 438434 - Baloo appears to be indexing twice the number of files than 
are actually in my home directory

https://bugs.kde.org/438434

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  5:40                     ` NeilBrown
@ 2021-08-02  7:54                       ` Amir Goldstein
  2021-08-02 13:53                         ` Josef Bacik
                                           ` (2 more replies)
  0 siblings, 3 replies; 123+ messages in thread
From: Amir Goldstein @ 2021-08-02  7:54 UTC (permalink / raw)
  To: NeilBrown
  Cc: Al Viro, Miklos Szeredi, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	linux-fsdevel, Linux NFS list, Btrfs BTRFS

On Mon, Aug 2, 2021 at 8:41 AM NeilBrown <neilb@suse.de> wrote:
>
> On Mon, 02 Aug 2021, Al Viro wrote:
> > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> >
> > > It think we need to bite-the-bullet and decide that 64bits is not
> > > enough, and in fact no number of bits will ever be enough.  overlayfs
> > > makes this clear.
> >
> > Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
> > early...
> >
> > > So I think we need to strongly encourage user-space to start using
> > > name_to_handle_at() whenever there is a need to test if two things are
> > > the same.
> >
> > ... and forgetting the inconvenient facts, such as that two different
> > fhandles may correspond to the same object.
>
> Can they?  They certainly can if the "connectable" flag is passed.
> name_to_handle_at() cannot set that flag.
> nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
> perfect.  However it is the best that can be done over NFS.
>
> Or is there some other situation where two different filehandles can be
> reported for the same inode?
>
> Do you have a better suggestion?
>

Neil,

I think the plan of "changing the world" is not very realistic.
Sure, *some* tools can be changed, but all of them?

I went back to read your initial cover letter to understand the
problem and what I mostly found there was that the view of
/proc/x/mountinfo was hiding information that is important for
some tools to understand what is going on with btrfs subvols.

Well I am not a UNIX history expert, but I suppose that
/proc/PID/mountinfo was created because /proc/mounts and
/proc/PID/mounts no longer provided tool with all the information
about Linux mounts.

Maybe it's time for a new interface to query the more advanced
sb/mount topology? fsinfo() maybe? With mount2 compatible API for
traversing mounts that is not limited to reporting all entries inside
a single page. I suppose we could go for some hierarchical view
under /proc/PID/mounttree. I don't know - new API is hard.

In any case, instead of changing st_dev and st_ino or changing the
world to work with file handles, why not add inode generation (and
maybe subvol id) to statx().
filesystem that care enough will provide this information and tools that
care enough will use it.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
  2021-08-02  5:25                   ` Al Viro
  2021-08-02  7:15                   ` Martin Steigerwald
@ 2021-08-02 12:39                   ` J. Bruce Fields
  2021-08-02 20:32                     ` Patrick Goetz
  2021-08-02 21:10                     ` NeilBrown
  2 siblings, 2 replies; 123+ messages in thread
From: J. Bruce Fields @ 2021-08-02 12:39 UTC (permalink / raw)
  To: NeilBrown
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> the inode should become (file.objectid ^ swab64(root.objectid)).  This
> will provide numbers that are unique until you get very large subvols,
> and very many subvols.

If you snapshot a filesystem, I'd expect, at least by default, that
inodes in the snapshot to stay the same as in the snapshotted
filesystem.

--b.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  7:54                       ` Amir Goldstein
@ 2021-08-02 13:53                         ` Josef Bacik
  2021-08-03 22:29                           ` Qu Wenruo
  2021-08-02 14:47                         ` Frank Filz
  2021-08-02 21:24                         ` NeilBrown
  2 siblings, 1 reply; 123+ messages in thread
From: Josef Bacik @ 2021-08-02 13:53 UTC (permalink / raw)
  To: Amir Goldstein, NeilBrown
  Cc: Al Viro, Miklos Szeredi, Christoph Hellwig, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On 8/2/21 3:54 AM, Amir Goldstein wrote:
> On Mon, Aug 2, 2021 at 8:41 AM NeilBrown <neilb@suse.de> wrote:
>>
>> On Mon, 02 Aug 2021, Al Viro wrote:
>>> On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
>>>
>>>> It think we need to bite-the-bullet and decide that 64bits is not
>>>> enough, and in fact no number of bits will ever be enough.  overlayfs
>>>> makes this clear.
>>>
>>> Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
>>> early...
>>>
>>>> So I think we need to strongly encourage user-space to start using
>>>> name_to_handle_at() whenever there is a need to test if two things are
>>>> the same.
>>>
>>> ... and forgetting the inconvenient facts, such as that two different
>>> fhandles may correspond to the same object.
>>
>> Can they?  They certainly can if the "connectable" flag is passed.
>> name_to_handle_at() cannot set that flag.
>> nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
>> perfect.  However it is the best that can be done over NFS.
>>
>> Or is there some other situation where two different filehandles can be
>> reported for the same inode?
>>
>> Do you have a better suggestion?
>>
> 
> Neil,
> 
> I think the plan of "changing the world" is not very realistic.
> Sure, *some* tools can be changed, but all of them?
> 
> I went back to read your initial cover letter to understand the
> problem and what I mostly found there was that the view of
> /proc/x/mountinfo was hiding information that is important for
> some tools to understand what is going on with btrfs subvols.
> 
> Well I am not a UNIX history expert, but I suppose that
> /proc/PID/mountinfo was created because /proc/mounts and
> /proc/PID/mounts no longer provided tool with all the information
> about Linux mounts.
> 
> Maybe it's time for a new interface to query the more advanced
> sb/mount topology? fsinfo() maybe? With mount2 compatible API for
> traversing mounts that is not limited to reporting all entries inside
> a single page. I suppose we could go for some hierarchical view
> under /proc/PID/mounttree. I don't know - new API is hard.
> 
> In any case, instead of changing st_dev and st_ino or changing the
> world to work with file handles, why not add inode generation (and
> maybe subvol id) to statx().
> filesystem that care enough will provide this information and tools that
> care enough will use it.
> 

Can y'all wait till I'm back from vacation, goddamn ;)

This is what I'm aiming for, I spent some time looking at how many 
places we string parse /proc/<whatever>/mounts and my head hurts.

Btrfs already has a reasonable solution for this, we have UUID's for 
everything.  UUID's aren't a strictly btrfs thing either, all the file 
systems have some sort of UUID identifier, hell its built into blkid.  I 
would love if we could do a better job about letting applications query 
information about where they are.  And we could expose this with the 
relatively common UUID format.  You ask what fs you're in, you get the 
FS UUID, and then if you're on Btrfs you get the specific subvolume UUID 
you're in.  That way you could do more fancy things like know if you've 
wandered into a new file system completely or just a different subvolume.

We have to keep the st_ino/st_dev thing for backwards compatibility, but 
make it easier to get more info out of the file system.

We could in theory expose just the subvolid also, since that's a nice 
simple u64, but it limits our ability to do new fancy shit in the 
future.  It's not a bad solution, but like I said I think we need to 
take a step back and figure out what problem we're specifically trying 
to solve, and work from there.  Starting from automounts and working our 
way back is not going very well.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  7:54                       ` Amir Goldstein
  2021-08-02 13:53                         ` Josef Bacik
@ 2021-08-02 14:47                         ` Frank Filz
  2021-08-02 21:24                         ` NeilBrown
  2 siblings, 0 replies; 123+ messages in thread
From: Frank Filz @ 2021-08-02 14:47 UTC (permalink / raw)
  To: 'Amir Goldstein', 'NeilBrown'
  Cc: 'Al Viro', 'Miklos Szeredi',
	'Christoph Hellwig', 'Josef Bacik',
	'J. Bruce Fields', 'Chuck Lever',
	'Chris Mason', 'David Sterba',
	'linux-fsdevel', 'Linux NFS list',
	'Btrfs BTRFS'

> In any case, instead of changing st_dev and st_ino or changing the world to
> work with file handles, why not add inode generation (and maybe subvol id) to
> statx().
> filesystem that care enough will provide this information and tools that care
> enough will use it.

And how is NFS (especially V2 and V3 - V4.2 at least can add attributes) going to provide these values for statx if applications are going to start depending on them, and especially, will this work for those applications that need to distinguish inodes that are working on an NFS exported btrfs filesystem?

Frank


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 12:39                   ` J. Bruce Fields
@ 2021-08-02 20:32                     ` Patrick Goetz
  2021-08-02 20:41                       ` J. Bruce Fields
  2021-08-02 21:10                     ` NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: Patrick Goetz @ 2021-08-02 20:32 UTC (permalink / raw)
  To: J. Bruce Fields, NeilBrown
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS



On 8/2/21 7:39 AM, J. Bruce Fields wrote:
> On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
>> For btrfs, the "location" is root.objectid ++ file.objectid.  I think
>> the inode should become (file.objectid ^ swab64(root.objectid)).  This
>> will provide numbers that are unique until you get very large subvols,
>> and very many subvols.
> 
> If you snapshot a filesystem, I'd expect, at least by default, that
> inodes in the snapshot to stay the same as in the snapshotted
> filesystem.
> 
> --b.
> 

For copy on right systems like ZFS, how could it be otherwise?


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 20:32                     ` Patrick Goetz
@ 2021-08-02 20:41                       ` J. Bruce Fields
  0 siblings, 0 replies; 123+ messages in thread
From: J. Bruce Fields @ 2021-08-02 20:41 UTC (permalink / raw)
  To: Patrick Goetz
  Cc: NeilBrown, Miklos Szeredi, Al Viro, Christoph Hellwig,
	Josef Bacik, Chuck Lever, Chris Mason, David Sterba,
	linux-fsdevel, Linux NFS list, Btrfs BTRFS

On Mon, Aug 02, 2021 at 03:32:45PM -0500, Patrick Goetz wrote:
> On 8/2/21 7:39 AM, J. Bruce Fields wrote:
> >On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> >>For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> >>the inode should become (file.objectid ^ swab64(root.objectid)).  This
> >>will provide numbers that are unique until you get very large subvols,
> >>and very many subvols.
> >
> >If you snapshot a filesystem, I'd expect, at least by default, that
> >inodes in the snapshot to stay the same as in the snapshotted
> >filesystem.
> 
> For copy on right systems like ZFS, how could it be otherwise?

I'm reacting to Neil's suggesting above, which (as I understand it)
would result in different inode numbers.

--b.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 12:39                   ` J. Bruce Fields
  2021-08-02 20:32                     ` Patrick Goetz
@ 2021-08-02 21:10                     ` NeilBrown
  2021-08-02 21:50                       ` J. Bruce Fields
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-02 21:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Mon, 02 Aug 2021, J. Bruce Fields wrote:
> On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> > the inode should become (file.objectid ^ swab64(root.objectid)).  This
> > will provide numbers that are unique until you get very large subvols,
> > and very many subvols.
> 
> If you snapshot a filesystem, I'd expect, at least by default, that
> inodes in the snapshot to stay the same as in the snapshotted
> filesystem.

As I said: we need to challenge and revise user-space (and meat-space)
expectations. 

In btrfs, you DO NOT snapshot a FILESYSTEM.  Rather, you effectively
create a 'reflink' for a subtree (only works on subtrees that have been
correctly created with the poorly named "btrfs subvolume" command).

As with any reflink, the original has the same inode number that it did
before, the new version has a different inode number (though in current
BTRFS, half of the inode number is hidden from user-space, so it looks
like the inode number hasn't changed).

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  7:54                       ` Amir Goldstein
  2021-08-02 13:53                         ` Josef Bacik
  2021-08-02 14:47                         ` Frank Filz
@ 2021-08-02 21:24                         ` NeilBrown
  2 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-08-02 21:24 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Al Viro, Miklos Szeredi, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	linux-fsdevel, Linux NFS list, Btrfs BTRFS

On Mon, 02 Aug 2021, Amir Goldstein wrote:
> On Mon, Aug 2, 2021 at 8:41 AM NeilBrown <neilb@suse.de> wrote:
> >
> > On Mon, 02 Aug 2021, Al Viro wrote:
> > > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > >
> > > > It think we need to bite-the-bullet and decide that 64bits is not
> > > > enough, and in fact no number of bits will ever be enough.  overlayfs
> > > > makes this clear.
> > >
> > > Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
> > > early...
> > >
> > > > So I think we need to strongly encourage user-space to start using
> > > > name_to_handle_at() whenever there is a need to test if two things are
> > > > the same.
> > >
> > > ... and forgetting the inconvenient facts, such as that two different
> > > fhandles may correspond to the same object.
> >
> > Can they?  They certainly can if the "connectable" flag is passed.
> > name_to_handle_at() cannot set that flag.
> > nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
> > perfect.  However it is the best that can be done over NFS.
> >
> > Or is there some other situation where two different filehandles can be
> > reported for the same inode?
> >
> > Do you have a better suggestion?
> >
> 
> Neil,
> 
> I think the plan of "changing the world" is not very realistic.

I disagree.  It has happened before, it will happen again.  The only
difference about my proposal is that I'm suggesting the change be
proactive rather than reactive.

> Sure, *some* tools can be changed, but all of them?

We only need to change the tools that notice there is a problem.  So it
is important to minimize the effect on existing tools, even when we
cannot reduce it to zero.  We then fix things that are likely to see a
problem, or that actually do.  And we clearly document the behaviour and
how to deal with it, for code that we cannot directly affect.

Remember: there is ALREADY breakage that has been fixed.  btrfs does
*not* behave like a "normal" filesystem.  Nor does NFS.  Multiple tools
have been adjusted to work with them.  Let's not pretend that will never
happen again, but instead use the dynamic to drive evolution in the way
we choose.

> 
> I went back to read your initial cover letter to understand the
> problem and what I mostly found there was that the view of
> /proc/x/mountinfo was hiding information that is important for
> some tools to understand what is going on with btrfs subvols.

That was where I started, but not where I ended.  There are *lots* of
places that currently report inconsistent information for btrfs subvols.

> 
> Well I am not a UNIX history expert, but I suppose that
> /proc/PID/mountinfo was created because /proc/mounts and
> /proc/PID/mounts no longer provided tool with all the information
> about Linux mounts.
> 
> Maybe it's time for a new interface to query the more advanced
> sb/mount topology? fsinfo() maybe? With mount2 compatible API for
> traversing mounts that is not limited to reporting all entries inside
> a single page. I suppose we could go for some hierarchical view
> under /proc/PID/mounttree. I don't know - new API is hard.

Yes, exactly - but not just for mounts.  Yes, we need new APIs (Because
the old ones have been broken in various ways).  That is exactly what
I'm proposing.  But "fixing" mountinfo turns out to be little more than
rearranging deck-chairs on the Titanic.

> 
> In any case, instead of changing st_dev and st_ino or changing the
> world to work with file handles, why not add inode generation (and
> maybe subvol id) to statx().

The enormous benefit of filehandles is that they are supported by
kernels running today.  As others have commented, they also work over
NFS.
But I would be quite happy to see more information made available
through statx - providing the meaning of that information was clearly
specified - both what can be assumed about it and what cannot.

Thanks,
NeilBrown


> filesystem that care enough will provide this information and tools that
> care enough will use it.
> 
> Thanks,
> Amir.
> 
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02  7:15                   ` Martin Steigerwald
@ 2021-08-02 21:40                     ` NeilBrown
  0 siblings, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-08-02 21:40 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	linux-fsdevel, Linux NFS list, Btrfs BTRFS

On Mon, 02 Aug 2021, Martin Steigerwald wrote:
> Hi Neil!
> 
> Wow, this is a bit overwhelming for me. However, I got a very specific 
> question for userspace developers in order to probably provide valuable 
> input to the KDE Baloo desktop search developers:
> 
> NeilBrown - 02.08.21, 06:18:29 CEST:
> > The "obvious" choice for a replacement is the file handle provided by
> > name_to_handle_at() (falling back to st_ino if name_to_handle_at isn't
> > supported by the filesystem).  This returns an extensible opaque
> > byte-array.  It is *already* more reliable than st_ino.  Comparing
> > st_ino is only a reliable way to check if two files are the same if
> > you have both of them open.  If you don't, then one of the files
> > might have been deleted and the inode number reused for the other.  A
> > filehandle contains a generation number which protects against this.
> > 
> > So I think we need to strongly encourage user-space to start using
> > name_to_handle_at() whenever there is a need to test if two things are
> > the same.
> 
> How could that work for Baloo's use case to see whether a file it 
> encounters is already in its database or whether it is a new file.
> 
> Would Baloo compare the whole file handle or just certain fields or make a 
> hash of the filehandle or what ever? Could you, in pseudo code or 
> something, describe the approach you'd suggest. I'd then share it on:

Yes, the whole filehandle.

 struct file_handle {
        unsigned int handle_bytes; /* Size of f_handle [in, out] */
        int           handle_type;    /* Handle type [out] */
        unsigned char f_handle[0]; /* File identifier (sized by
                                     caller) [out] */
 };

i.e.  compare handle_type, handle_bytes, and handle_bytes worth of
f_handle.
This file_handle is local to the filesytem.  Two different filesystems
can use the same filehandle for different files.  So the identity of the
filesystem need to be combined with the file_handle.

> 
> Bug 438434 - Baloo appears to be indexing twice the number of files than 
> are actually in my home directory
> 
> https://bugs.kde.org/438434

This bug wouldn't be address by using the filehandle.  Using a
filehandle allows you to compare two files within a single filesystem.
This bug is about comparing two filesystems either side of a reboot, to
see if they are the same.

As has already been mentioned in that bug, statfs().f_fsid is the best
solution (unless comparing the mount point is satisfactory).

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 21:10                     ` NeilBrown
@ 2021-08-02 21:50                       ` J. Bruce Fields
  2021-08-02 21:59                         ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: J. Bruce Fields @ 2021-08-02 21:50 UTC (permalink / raw)
  To: NeilBrown
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Tue, Aug 03, 2021 at 07:10:44AM +1000, NeilBrown wrote:
> On Mon, 02 Aug 2021, J. Bruce Fields wrote:
> > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > > For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> > > the inode should become (file.objectid ^ swab64(root.objectid)).  This
> > > will provide numbers that are unique until you get very large subvols,
> > > and very many subvols.
> > 
> > If you snapshot a filesystem, I'd expect, at least by default, that
> > inodes in the snapshot to stay the same as in the snapshotted
> > filesystem.
> 
> As I said: we need to challenge and revise user-space (and meat-space)
> expectations. 

The example that came to mind is people that export a snapshot, then
replace it with an updated snapshot, and expect that to be transparent
to clients.

Our client will error out with ESTALE if it notices an inode number
changed out from under it.

I don't know if there are other such cases.  It seems like surprising
behavior to me, though.

--b.

> In btrfs, you DO NOT snapshot a FILESYSTEM.  Rather, you effectively
> create a 'reflink' for a subtree (only works on subtrees that have been
> correctly created with the poorly named "btrfs subvolume" command).
> 
> As with any reflink, the original has the same inode number that it did
> before, the new version has a different inode number (though in current
> BTRFS, half of the inode number is hidden from user-space, so it looks
> like the inode number hasn't changed).

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 21:50                       ` J. Bruce Fields
@ 2021-08-02 21:59                         ` NeilBrown
  2021-08-02 22:14                           ` J. Bruce Fields
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-02 21:59 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Tue, 03 Aug 2021, J. Bruce Fields wrote:
> On Tue, Aug 03, 2021 at 07:10:44AM +1000, NeilBrown wrote:
> > On Mon, 02 Aug 2021, J. Bruce Fields wrote:
> > > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > > > For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> > > > the inode should become (file.objectid ^ swab64(root.objectid)).  This
> > > > will provide numbers that are unique until you get very large subvols,
> > > > and very many subvols.
> > > 
> > > If you snapshot a filesystem, I'd expect, at least by default, that
> > > inodes in the snapshot to stay the same as in the snapshotted
> > > filesystem.
> > 
> > As I said: we need to challenge and revise user-space (and meat-space)
> > expectations. 
> 
> The example that came to mind is people that export a snapshot, then
> replace it with an updated snapshot, and expect that to be transparent
> to clients.
> 
> Our client will error out with ESTALE if it notices an inode number
> changed out from under it.

Will it?  If the inode number changed, then the filehandle would change.
Unless the filesystem were exported with subtreecheck, the old filehandle
would continue to work (unless the old snapshot was deleted).  File-name
lookups from the root would find new files...

"replace with an updated snapshot" is no different from "replace with an
updated directory tree".  If you delete the old tree, then
currently-open files will break.  If you don't you get a reasonably
clean transition.

> 
> I don't know if there are other such cases.  It seems like surprising
> behavior to me, though.

If you refuse to risk breaking anything, then you cannot make progress.
Providing people can choose when things break, and have advanced
warning, they often cope remarkable well.

Thanks,
NeilBrown


> 
> --b.
> 
> > In btrfs, you DO NOT snapshot a FILESYSTEM.  Rather, you effectively
> > create a 'reflink' for a subtree (only works on subtrees that have been
> > correctly created with the poorly named "btrfs subvolume" command).
> > 
> > As with any reflink, the original has the same inode number that it did
> > before, the new version has a different inode number (though in current
> > BTRFS, half of the inode number is hidden from user-space, so it looks
> > like the inode number hasn't changed).
> 
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 21:59                         ` NeilBrown
@ 2021-08-02 22:14                           ` J. Bruce Fields
  2021-08-02 22:36                             ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: J. Bruce Fields @ 2021-08-02 22:14 UTC (permalink / raw)
  To: NeilBrown
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Tue, Aug 03, 2021 at 07:59:30AM +1000, NeilBrown wrote:
> On Tue, 03 Aug 2021, J. Bruce Fields wrote:
> > On Tue, Aug 03, 2021 at 07:10:44AM +1000, NeilBrown wrote:
> > > On Mon, 02 Aug 2021, J. Bruce Fields wrote:
> > > > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > > > > For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> > > > > the inode should become (file.objectid ^ swab64(root.objectid)).  This
> > > > > will provide numbers that are unique until you get very large subvols,
> > > > > and very many subvols.
> > > > 
> > > > If you snapshot a filesystem, I'd expect, at least by default, that
> > > > inodes in the snapshot to stay the same as in the snapshotted
> > > > filesystem.
> > > 
> > > As I said: we need to challenge and revise user-space (and meat-space)
> > > expectations. 
> > 
> > The example that came to mind is people that export a snapshot, then
> > replace it with an updated snapshot, and expect that to be transparent
> > to clients.
> > 
> > Our client will error out with ESTALE if it notices an inode number
> > changed out from under it.
> 
> Will it?

See fs/nfs/inode.c:nfs_check_inode_attributes():

	if (nfsi->fileid != fattr->fileid) {
                /* Is this perhaps the mounted-on fileid? */
                if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) &&
                    nfsi->fileid == fattr->mounted_on_fileid)
                        return 0;
                return -ESTALE;
        }

--b.

> If the inode number changed, then the filehandle would change.
> Unless the filesystem were exported with subtreecheck, the old filehandle
> would continue to work (unless the old snapshot was deleted).  File-name
> lookups from the root would find new files...
> 
> "replace with an updated snapshot" is no different from "replace with an
> updated directory tree".  If you delete the old tree, then
> currently-open files will break.  If you don't you get a reasonably
> clean transition.
> 
> > 
> > I don't know if there are other such cases.  It seems like surprising
> > behavior to me, though.
> 
> If you refuse to risk breaking anything, then you cannot make progress.
> Providing people can choose when things break, and have advanced
> warning, they often cope remarkable well.
> 
> Thanks,
> NeilBrown
> 
> 
> > 
> > --b.
> > 
> > > In btrfs, you DO NOT snapshot a FILESYSTEM.  Rather, you effectively
> > > create a 'reflink' for a subtree (only works on subtrees that have been
> > > correctly created with the poorly named "btrfs subvolume" command).
> > > 
> > > As with any reflink, the original has the same inode number that it did
> > > before, the new version has a different inode number (though in current
> > > BTRFS, half of the inode number is hidden from user-space, so it looks
> > > like the inode number hasn't changed).
> > 
> > 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 22:14                           ` J. Bruce Fields
@ 2021-08-02 22:36                             ` NeilBrown
  2021-08-03  0:15                               ` J. Bruce Fields
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-02 22:36 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Tue, 03 Aug 2021, J. Bruce Fields wrote:
> On Tue, Aug 03, 2021 at 07:59:30AM +1000, NeilBrown wrote:
> > On Tue, 03 Aug 2021, J. Bruce Fields wrote:
> > > On Tue, Aug 03, 2021 at 07:10:44AM +1000, NeilBrown wrote:
> > > > On Mon, 02 Aug 2021, J. Bruce Fields wrote:
> > > > > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > > > > > For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> > > > > > the inode should become (file.objectid ^ swab64(root.objectid)).  This
> > > > > > will provide numbers that are unique until you get very large subvols,
> > > > > > and very many subvols.
> > > > > 
> > > > > If you snapshot a filesystem, I'd expect, at least by default, that
> > > > > inodes in the snapshot to stay the same as in the snapshotted
> > > > > filesystem.
> > > > 
> > > > As I said: we need to challenge and revise user-space (and meat-space)
> > > > expectations. 
> > > 
> > > The example that came to mind is people that export a snapshot, then
> > > replace it with an updated snapshot, and expect that to be transparent
> > > to clients.
> > > 
> > > Our client will error out with ESTALE if it notices an inode number
> > > changed out from under it.
> > 
> > Will it?
> 
> See fs/nfs/inode.c:nfs_check_inode_attributes():
> 
> 	if (nfsi->fileid != fattr->fileid) {
>                 /* Is this perhaps the mounted-on fileid? */
>                 if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) &&
>                     nfsi->fileid == fattr->mounted_on_fileid)
>                         return 0;
>                 return -ESTALE;
>         }

That code fires if the fileid (inode number) reported for a particular
filehandle changes.  I'm saying that won't happen.

If you reflink (aka snaphot) a btrfs subtree (aka "subvol"), then the
new sub-tree will ALREADY have different filehandles than the original
subvol.  Whether it has the same inode numbers or different ones is
irrelevant to NFS.

(on reflection, I didn't say that as clearly as I could have done last time)

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 22:36                             ` NeilBrown
@ 2021-08-03  0:15                               ` J. Bruce Fields
  0 siblings, 0 replies; 123+ messages in thread
From: J. Bruce Fields @ 2021-08-03  0:15 UTC (permalink / raw)
  To: NeilBrown
  Cc: Miklos Szeredi, Al Viro, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS

On Tue, Aug 03, 2021 at 08:36:44AM +1000, NeilBrown wrote:
> On Tue, 03 Aug 2021, J. Bruce Fields wrote:
> > On Tue, Aug 03, 2021 at 07:59:30AM +1000, NeilBrown wrote:
> > > On Tue, 03 Aug 2021, J. Bruce Fields wrote:
> > > > On Tue, Aug 03, 2021 at 07:10:44AM +1000, NeilBrown wrote:
> > > > > On Mon, 02 Aug 2021, J. Bruce Fields wrote:
> > > > > > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > > > > > > For btrfs, the "location" is root.objectid ++ file.objectid.  I think
> > > > > > > the inode should become (file.objectid ^ swab64(root.objectid)).  This
> > > > > > > will provide numbers that are unique until you get very large subvols,
> > > > > > > and very many subvols.
> > > > > > 
> > > > > > If you snapshot a filesystem, I'd expect, at least by default, that
> > > > > > inodes in the snapshot to stay the same as in the snapshotted
> > > > > > filesystem.
> > > > > 
> > > > > As I said: we need to challenge and revise user-space (and meat-space)
> > > > > expectations. 
> > > > 
> > > > The example that came to mind is people that export a snapshot, then
> > > > replace it with an updated snapshot, and expect that to be transparent
> > > > to clients.
> > > > 
> > > > Our client will error out with ESTALE if it notices an inode number
> > > > changed out from under it.
> > > 
> > > Will it?
> > 
> > See fs/nfs/inode.c:nfs_check_inode_attributes():
> > 
> > 	if (nfsi->fileid != fattr->fileid) {
> >                 /* Is this perhaps the mounted-on fileid? */
> >                 if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) &&
> >                     nfsi->fileid == fattr->mounted_on_fileid)
> >                         return 0;
> >                 return -ESTALE;
> >         }
> 
> That code fires if the fileid (inode number) reported for a particular
> filehandle changes.  I'm saying that won't happen.
> 
> If you reflink (aka snaphot) a btrfs subtree (aka "subvol"), then the
> new sub-tree will ALREADY have different filehandles than the original
> subvol.

Whoops, you're right, sorry for the noise....

--b.

> Whether it has the same inode numbers or different ones is
> irrelevant to NFS.


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
  2021-08-02 13:53                         ` Josef Bacik
@ 2021-08-03 22:29                           ` Qu Wenruo
  0 siblings, 0 replies; 123+ messages in thread
From: Qu Wenruo @ 2021-08-03 22:29 UTC (permalink / raw)
  To: Josef Bacik, Amir Goldstein, NeilBrown
  Cc: Al Viro, Miklos Szeredi, Christoph Hellwig, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
	Linux NFS list, Btrfs BTRFS



On 2021/8/2 下午9:53, Josef Bacik wrote:
> On 8/2/21 3:54 AM, Amir Goldstein wrote:
>> On Mon, Aug 2, 2021 at 8:41 AM NeilBrown <neilb@suse.de> wrote:
>>>
>>> On Mon, 02 Aug 2021, Al Viro wrote:
>>>> On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
>>>>
>>>>> It think we need to bite-the-bullet and decide that 64bits is not
>>>>> enough, and in fact no number of bits will ever be enough.  overlayfs
>>>>> makes this clear.
>>>>
>>>> Sure - let's go for broke and use XML.  Oh, wait - it's 8 months too
>>>> early...
>>>>
>>>>> So I think we need to strongly encourage user-space to start using
>>>>> name_to_handle_at() whenever there is a need to test if two things are
>>>>> the same.
>>>>
>>>> ... and forgetting the inconvenient facts, such as that two different
>>>> fhandles may correspond to the same object.
>>>
>>> Can they?  They certainly can if the "connectable" flag is passed.
>>> name_to_handle_at() cannot set that flag.
>>> nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
>>> perfect.  However it is the best that can be done over NFS.
>>>
>>> Or is there some other situation where two different filehandles can be
>>> reported for the same inode?
>>>
>>> Do you have a better suggestion?
>>>
>>
>> Neil,
>>
>> I think the plan of "changing the world" is not very realistic.
>> Sure, *some* tools can be changed, but all of them?
>>
>> I went back to read your initial cover letter to understand the
>> problem and what I mostly found there was that the view of
>> /proc/x/mountinfo was hiding information that is important for
>> some tools to understand what is going on with btrfs subvols.
>>
>> Well I am not a UNIX history expert, but I suppose that
>> /proc/PID/mountinfo was created because /proc/mounts and
>> /proc/PID/mounts no longer provided tool with all the information
>> about Linux mounts.
>>
>> Maybe it's time for a new interface to query the more advanced
>> sb/mount topology? fsinfo() maybe? With mount2 compatible API for
>> traversing mounts that is not limited to reporting all entries inside
>> a single page. I suppose we could go for some hierarchical view
>> under /proc/PID/mounttree. I don't know - new API is hard.
>>
>> In any case, instead of changing st_dev and st_ino or changing the
>> world to work with file handles, why not add inode generation (and
>> maybe subvol id) to statx().
>> filesystem that care enough will provide this information and tools that
>> care enough will use it.
>>
>
> Can y'all wait till I'm back from vacation, goddamn ;)
>
> This is what I'm aiming for, I spent some time looking at how many
> places we string parse /proc/<whatever>/mounts and my head hurts.
>
> Btrfs already has a reasonable solution for this, we have UUID's for
> everything.  UUID's aren't a strictly btrfs thing either, all the file
> systems have some sort of UUID identifier, hell its built into blkid.  I
> would love if we could do a better job about letting applications query
> information about where they are.  And we could expose this with the
> relatively common UUID format.  You ask what fs you're in, you get the
> FS UUID, and then if you're on Btrfs you get the specific subvolume UUID
> you're in.  That way you could do more fancy things like know if you've
> wandered into a new file system completely or just a different subvolume.

I'm completely on the side of using proper UUID.

But suddenly I find a problem for this, at least we still need something
like st_dev for real volume based snapshot.

One of the problem for real volume based snapshot is, the snapshoted
volume is completely the same filesystem, every binary is the same,
including UUID.

That means, the only way to distinguish such volumes is by st_dev.

For such pure UUID base solution, it's in fact unable to distinguish
them using just UUID.
Unless we have some device UUID to replace the old st_dev.

Thanks,
Qu

>
> We have to keep the st_ino/st_dev thing for backwards compatibility, but
> make it easier to get more info out of the file system.
>
> We could in theory expose just the subvolid also, since that's a nice
> simple u64, but it limits our ability to do new fancy shit in the
> future.  It's not a bad solution, but like I said I think we need to
> take a step back and figure out what problem we're specifically trying
> to solve, and work from there.  Starting from automounts and working our
> way back is not going very well.  Thanks,
>
> Josef

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-07-29  5:27       ` Amir Goldstein
@ 2021-08-06  7:52         ` Miklos Szeredi
  2021-08-06  8:08           ` Amir Goldstein
  0 siblings, 1 reply; 123+ messages in thread
From: Miklos Szeredi @ 2021-08-06  7:52 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, Linux NFS Mailing List, Linux Btrfs

On Thu, 29 Jul 2021 at 07:27, Amir Goldstein <amir73il@gmail.com> wrote:

> Given that today, subvolume mounts (or any mounts) on the lower layer
> are not followed by overlayfs, I don't really see the difference
> if mounts are created manually or automatically.
> Miklos?

Never tried overlay on btrfs.  Subvolumes AFAIK do not use submounts
currently, they are a sort of hack where the st_dev changes when
crossing the subvolume boundary, but there's no sign of them in
/proc/mounts (there's no d_automount in btrfs).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-08-06  7:52         ` Miklos Szeredi
@ 2021-08-06  8:08           ` Amir Goldstein
  2021-08-06  8:18             ` Miklos Szeredi
  0 siblings, 1 reply; 123+ messages in thread
From: Amir Goldstein @ 2021-08-06  8:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, Linux NFS Mailing List, Linux Btrfs

On Fri, Aug 6, 2021 at 10:52 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 29 Jul 2021 at 07:27, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > Given that today, subvolume mounts (or any mounts) on the lower layer
> > are not followed by overlayfs, I don't really see the difference
> > if mounts are created manually or automatically.
> > Miklos?
>
> Never tried overlay on btrfs.  Subvolumes AFAIK do not use submounts
> currently, they are a sort of hack where the st_dev changes when
> crossing the subvolume boundary, but there's no sign of them in
> /proc/mounts (there's no d_automount in btrfs).

That's what Niel's patch 11/11 is proposing to add and that's the reason
he was asking if this is going to break overlayfs over btrfs.

My question was, regardless of btrfs, can ovl_lookup() treat automount
dentries gracefully as empty dirs or just read them as is, instead of
returning EREMOTE on lookup?

The rationale is that we use a private mount and we are not following
across mounts from layers anyway, so what do we care about
auto or manual mounts?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points.
  2021-08-06  8:08           ` Amir Goldstein
@ 2021-08-06  8:18             ` Miklos Szeredi
  0 siblings, 0 replies; 123+ messages in thread
From: Miklos Szeredi @ 2021-08-06  8:18 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, Linux NFS Mailing List, Linux Btrfs

On Fri, 6 Aug 2021 at 10:08, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Aug 6, 2021 at 10:52 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Thu, 29 Jul 2021 at 07:27, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > Given that today, subvolume mounts (or any mounts) on the lower layer
> > > are not followed by overlayfs, I don't really see the difference
> > > if mounts are created manually or automatically.
> > > Miklos?
> >
> > Never tried overlay on btrfs.  Subvolumes AFAIK do not use submounts
> > currently, they are a sort of hack where the st_dev changes when
> > crossing the subvolume boundary, but there's no sign of them in
> > /proc/mounts (there's no d_automount in btrfs).
>
> That's what Niel's patch 11/11 is proposing to add and that's the reason
> he was asking if this is going to break overlayfs over btrfs.
>
> My question was, regardless of btrfs, can ovl_lookup() treat automount
> dentries gracefully as empty dirs or just read them as is, instead of
> returning EREMOTE on lookup?
>
> The rationale is that we use a private mount and we are not following
> across mounts from layers anyway, so what do we care about
> auto or manual mounts?

I guess that depends on the use cases.  If no one cares (as is the
case apparently), the simplest is to leave it the way it is.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
                   ` (13 preceding siblings ...)
  2021-07-28 19:35 ` J. Bruce Fields
@ 2021-08-13  1:45 ` NeilBrown
  2021-08-13 14:55   ` Josef Bacik
                     ` (2 more replies)
  14 siblings, 3 replies; 123+ messages in thread
From: NeilBrown @ 2021-08-13  1:45 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs


[[This patch is a minimal patch which addresses the current problems
  with nfsd and btrfs, in a way which I think is most supportable, least
  surprising, and least likely to impact any future attempts to more
  completely fix the btrfs file-identify problem]]

BTRFS does not provide unique inode numbers across a filesystem.
It *does* provide unique inode numbers with a subvolume and
uses synthetic device numbers for different subvolumes to ensure
uniqueness for device+inode.

nfsd cannot use these varying device numbers.  If nfsd were to
synthesise different stable filesystem ids to give to the client, that
would cause subvolumes to appear in the mount table on the client, even
though they don't appear in the mount table on the server.  Also, NFSv3
doesn't support changing the filesystem id without a new explicit
mount on the client (this is partially supported in practice, but
violates the protocol specification).

So currently, the roots of all subvolumes report the same inode number
in the same filesystem to NFS clients and tools like 'find' notice that
a directory has the same identity as an ancestor, and so refuse to
enter that directory.

This patch allows btrfs (or any filesystem) to provide a 64bit number
that can be xored with the inode number to make the number more unique.
Rather than the client being certain to see duplicates, with this patch
it is possible but extremely rare.

The number than btrfs provides is a swab64() version of the subvolume
identifier.  This has most entropy in the high bits (the low bits of the
subvolume identifer), while the inoe has most entropy in the low bits.
The result will always be unique within a subvolume, and will almost
always be unique across the filesystem.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/btrfs/inode.c     |  4 ++++
 fs/nfsd/nfs3xdr.c    | 17 ++++++++++++++++-
 fs/nfsd/nfs4xdr.c    |  9 ++++++++-
 fs/nfsd/xdr3.h       |  2 ++
 include/linux/stat.h | 17 +++++++++++++++++
 5 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0117d867ecf8..989fdf2032d5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9195,6 +9195,10 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
 	generic_fillattr(&init_user_ns, inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
 
+	if (BTRFS_I(inode)->root->root_key.objectid != BTRFS_FS_TREE_OBJECTID)
+		stat->ino_uniquifier =
+			swab64(BTRFS_I(inode)->root->root_key.objectid);
+
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
 	inode_bytes = inode_get_bytes(inode);
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 0a5ebc52e6a9..669e2437362a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -340,6 +340,7 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 {
 	struct user_namespace *userns = nfsd_user_namespace(rqstp);
 	__be32 *p;
+	u64 ino;
 	u64 fsid;
 
 	p = xdr_reserve_space(xdr, XDR_UNIT * 21);
@@ -377,7 +378,10 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 	p = xdr_encode_hyper(p, fsid);
 
 	/* fileid */
-	p = xdr_encode_hyper(p, stat->ino);
+	ino = stat->ino;
+	if (stat->ino_uniquifier && stat->ino_uniquifier != ino)
+		ino ^= stat->ino_uniquifier;
+	p = xdr_encode_hyper(p, ino);
 
 	p = encode_nfstime3(p, &stat->atime);
 	p = encode_nfstime3(p, &stat->mtime);
@@ -1151,6 +1155,17 @@ svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name,
 	if (xdr_stream_encode_item_present(xdr) < 0)
 		return false;
 	/* fileid */
+	if (!resp->dir_have_uniquifier) {
+		struct kstat stat;
+		if (fh_getattr(&resp->fh, &stat) == nfs_ok)
+			resp->dir_ino_uniquifier = stat.ino_uniquifier;
+		else
+			resp->dir_ino_uniquifier = 0;
+		resp->dir_have_uniquifier = 1;
+	}
+	if (resp->dir_ino_uniquifier &&
+	    resp->dir_ino_uniquifier != ino)
+		ino ^= resp->dir_ino_uniquifier;
 	if (xdr_stream_encode_u64(xdr, ino) < 0)
 		return false;
 	/* name */
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 7abeccb975b2..ddccf849c29c 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3114,10 +3114,14 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 					fhp->fh_handle.fh_size);
 	}
 	if (bmval0 & FATTR4_WORD0_FILEID) {
+		u64 ino = stat.ino;
+		if (stat.ino_uniquifier &&
+		    stat.ino_uniquifier != stat.ino)
+			ino ^= stat.ino_uniquifier;
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
 			goto out_resource;
-		p = xdr_encode_hyper(p, stat.ino);
+		p = xdr_encode_hyper(p, ino);
 	}
 	if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
 		p = xdr_reserve_space(xdr, 8);
@@ -3285,6 +3289,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 			if (err)
 				goto out_nfserr;
 			ino = parent_stat.ino;
+			if (parent_stat.ino_uniquifier &&
+			    parent_stat.ino_uniquifier != ino)
+				ino ^= parent_stat.ino_uniquifier;
 		}
 		p = xdr_encode_hyper(p, ino);
 	}
diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
index 933008382bbe..b4f9f3c71f72 100644
--- a/fs/nfsd/xdr3.h
+++ b/fs/nfsd/xdr3.h
@@ -179,6 +179,8 @@ struct nfsd3_readdirres {
 	struct xdr_buf		dirlist;
 	struct svc_fh		scratch;
 	struct readdir_cd	common;
+	u64			dir_ino_uniquifier;
+	int			dir_have_uniquifier;
 	unsigned int		cookie_offset;
 	struct svc_rqst *	rqstp;
 
diff --git a/include/linux/stat.h b/include/linux/stat.h
index fff27e603814..a5188f42ed81 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -46,6 +46,23 @@ struct kstat {
 	struct timespec64 btime;			/* File creation time */
 	u64		blocks;
 	u64		mnt_id;
+	/*
+	 * BTRFS does not provide unique inode numbers within a filesystem,
+	 * depending on a synthetic 'dev' to provide uniqueness.
+	 * NFSd cannot make use of this 'dev' number so clients often see
+	 * duplicate inode numbers.
+	 * For BTRFS, 'ino' is unlikely to use the high bits.  It puts
+	 * another number in ino_uniquifier which:
+	 * - has most entropy in the high bits
+	 * - is different precisely when 'dev' is different
+	 * - is stable across unmount/remount
+	 * NFSd can xor this with 'ino' to get a substantially more unique
+	 * number for reporting to the client.
+	 * The ino_uniquifier for a directory can reasonably be applied
+	 * to inode numbers reported by the readdir filldir callback.
+	 * It is NOT currently exported to user-space.
+	 */
+	u64		ino_uniquifier;
 };
 
 #endif
-- 
2.32.0


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
@ 2021-08-13 14:55   ` Josef Bacik
  2021-08-15  7:39   ` Goffredo Baroncelli
  2021-08-18 14:54   ` [PATCH] VFS/BTRFS/NFSD: " Wang Yugui
  2 siblings, 0 replies; 123+ messages in thread
From: Josef Bacik @ 2021-08-13 14:55 UTC (permalink / raw)
  To: NeilBrown, Christoph Hellwig, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

On 8/12/21 9:45 PM, NeilBrown wrote:
> 
> [[This patch is a minimal patch which addresses the current problems
>    with nfsd and btrfs, in a way which I think is most supportable, least
>    surprising, and least likely to impact any future attempts to more
>    completely fix the btrfs file-identify problem]]
> 
> BTRFS does not provide unique inode numbers across a filesystem.
> It *does* provide unique inode numbers with a subvolume and
> uses synthetic device numbers for different subvolumes to ensure
> uniqueness for device+inode.
> 
> nfsd cannot use these varying device numbers.  If nfsd were to
> synthesise different stable filesystem ids to give to the client, that
> would cause subvolumes to appear in the mount table on the client, even
> though they don't appear in the mount table on the server.  Also, NFSv3
> doesn't support changing the filesystem id without a new explicit
> mount on the client (this is partially supported in practice, but
> violates the protocol specification).
> 
> So currently, the roots of all subvolumes report the same inode number
> in the same filesystem to NFS clients and tools like 'find' notice that
> a directory has the same identity as an ancestor, and so refuse to
> enter that directory.
> 
> This patch allows btrfs (or any filesystem) to provide a 64bit number
> that can be xored with the inode number to make the number more unique.
> Rather than the client being certain to see duplicates, with this patch
> it is possible but extremely rare.
> 
> The number than btrfs provides is a swab64() version of the subvolume
> identifier.  This has most entropy in the high bits (the low bits of the
> subvolume identifer), while the inoe has most entropy in the low bits.
> The result will always be unique within a subvolume, and will almost
> always be unique across the filesystem.
> 

This is a reasonable approach to me, solves the problem without being overly 
complicated and side-steps the thornier issues around how we deal with 
subvolumes.  I'll leave it up to the other maintainers of the other fs'es to 
weigh in, but for me you can add

Acked-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
  2021-08-13 14:55   ` Josef Bacik
@ 2021-08-15  7:39   ` Goffredo Baroncelli
  2021-08-15 19:35     ` Roman Mamedov
  2021-08-18 14:54   ` [PATCH] VFS/BTRFS/NFSD: " Wang Yugui
  2 siblings, 1 reply; 123+ messages in thread
From: Goffredo Baroncelli @ 2021-08-15  7:39 UTC (permalink / raw)
  To: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro
  Cc: linux-fsdevel, linux-nfs, linux-btrfs

Hi All,

On 8/13/21 3:45 AM, NeilBrown wrote:
> 
> [[This patch is a minimal patch which addresses the current problems
>    with nfsd and btrfs, in a way which I think is most supportable, least
>    surprising, and least likely to impact any future attempts to more
>    completely fix the btrfs file-identify problem]]
> 
> BTRFS does not provide unique inode numbers across a filesystem.
> It *does* provide unique inode numbers with a subvolume and
> uses synthetic device numbers for different subvolumes to ensure
> uniqueness for device+inode.
> 
> nfsd cannot use these varying device numbers.  If nfsd were to
> synthesise different stable filesystem ids to give to the client, that
> would cause subvolumes to appear in the mount table on the client, even
> though they don't appear in the mount table on the server.  Also, NFSv3
> doesn't support changing the filesystem id without a new explicit
> mount on the client (this is partially supported in practice, but
> violates the protocol specification).

I am sure that it was discussed already but I was unable to find any track of this discussion.
But if the problem is the collision between the inode number of different subvolume in the nfd export, is it simpler if the export is truncated to the subvolume boundary ? It would be more coherent with the current behavior of vfs+nfsd.

In fact in btrfs a subvolume is a complete filesystem, with an "own synthetic" device. We could like or not this solution, but this solution is the more aligned to the unix standard, where for each filesystem there is a pair (device, inode-set). NFS (by default) avoids to cross the boundary between the filesystems. So why in BTRFS this
should be different ?

May be an opt-in option would solve the backward compatibility (i.e. to avoid problem with setup which relies on this behaviour).

> So currently, the roots of all subvolumes report the same inode number
> in the same filesystem to NFS clients and tools like 'find' notice that
> a directory has the same identity as an ancestor, and so refuse to
> enter that directory.
> 
> This patch allows btrfs (or any filesystem) to provide a 64bit number
> that can be xored with the inode number to make the number more unique.
> Rather than the client being certain to see duplicates, with this patch
> it is possible but extremely rare.
> 
> The number than btrfs provides is a swab64() version of the subvolume
> identifier.  This has most entropy in the high bits (the low bits of the
> subvolume identifer), while the inoe has most entropy in the low bits.
> The result will always be unique within a subvolume, and will almost
> always be unique across the filesystem.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>   fs/btrfs/inode.c     |  4 ++++
>   fs/nfsd/nfs3xdr.c    | 17 ++++++++++++++++-
>   fs/nfsd/nfs4xdr.c    |  9 ++++++++-
>   fs/nfsd/xdr3.h       |  2 ++
>   include/linux/stat.h | 17 +++++++++++++++++
>   5 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0117d867ecf8..989fdf2032d5 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9195,6 +9195,10 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
>   	generic_fillattr(&init_user_ns, inode, stat);
>   	stat->dev = BTRFS_I(inode)->root->anon_dev;
>   
> +	if (BTRFS_I(inode)->root->root_key.objectid != BTRFS_FS_TREE_OBJECTID)
> +		stat->ino_uniquifier =
> +			swab64(BTRFS_I(inode)->root->root_key.objectid);
> +
>   	spin_lock(&BTRFS_I(inode)->lock);
>   	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
>   	inode_bytes = inode_get_bytes(inode);
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 0a5ebc52e6a9..669e2437362a 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -340,6 +340,7 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
>   {
>   	struct user_namespace *userns = nfsd_user_namespace(rqstp);
>   	__be32 *p;
> +	u64 ino;
>   	u64 fsid;
>   
>   	p = xdr_reserve_space(xdr, XDR_UNIT * 21);
> @@ -377,7 +378,10 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
>   	p = xdr_encode_hyper(p, fsid);
>   
>   	/* fileid */
> -	p = xdr_encode_hyper(p, stat->ino);
> +	ino = stat->ino;
> +	if (stat->ino_uniquifier && stat->ino_uniquifier != ino)
> +		ino ^= stat->ino_uniquifier;
> +	p = xdr_encode_hyper(p, ino);
>   
>   	p = encode_nfstime3(p, &stat->atime);
>   	p = encode_nfstime3(p, &stat->mtime);
> @@ -1151,6 +1155,17 @@ svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name,
>   	if (xdr_stream_encode_item_present(xdr) < 0)
>   		return false;
>   	/* fileid */
> +	if (!resp->dir_have_uniquifier) {
> +		struct kstat stat;
> +		if (fh_getattr(&resp->fh, &stat) == nfs_ok)
> +			resp->dir_ino_uniquifier = stat.ino_uniquifier;
> +		else
> +			resp->dir_ino_uniquifier = 0;
> +		resp->dir_have_uniquifier = 1;
> +	}
> +	if (resp->dir_ino_uniquifier &&
> +	    resp->dir_ino_uniquifier != ino)
> +		ino ^= resp->dir_ino_uniquifier;
>   	if (xdr_stream_encode_u64(xdr, ino) < 0)
>   		return false;
>   	/* name */
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 7abeccb975b2..ddccf849c29c 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3114,10 +3114,14 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>   					fhp->fh_handle.fh_size);
>   	}
>   	if (bmval0 & FATTR4_WORD0_FILEID) {
> +		u64 ino = stat.ino;
> +		if (stat.ino_uniquifier &&
> +		    stat.ino_uniquifier != stat.ino)
> +			ino ^= stat.ino_uniquifier;
>   		p = xdr_reserve_space(xdr, 8);
>   		if (!p)
>   			goto out_resource;
> -		p = xdr_encode_hyper(p, stat.ino);
> +		p = xdr_encode_hyper(p, ino);
>   	}
>   	if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
>   		p = xdr_reserve_space(xdr, 8);
> @@ -3285,6 +3289,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>   			if (err)
>   				goto out_nfserr;
>   			ino = parent_stat.ino;
> +			if (parent_stat.ino_uniquifier &&
> +			    parent_stat.ino_uniquifier != ino)
> +				ino ^= parent_stat.ino_uniquifier;
>   		}
>   		p = xdr_encode_hyper(p, ino);
>   	}
> diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
> index 933008382bbe..b4f9f3c71f72 100644
> --- a/fs/nfsd/xdr3.h
> +++ b/fs/nfsd/xdr3.h
> @@ -179,6 +179,8 @@ struct nfsd3_readdirres {
>   	struct xdr_buf		dirlist;
>   	struct svc_fh		scratch;
>   	struct readdir_cd	common;
> +	u64			dir_ino_uniquifier;
> +	int			dir_have_uniquifier;
>   	unsigned int		cookie_offset;
>   	struct svc_rqst *	rqstp;
>   
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index fff27e603814..a5188f42ed81 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -46,6 +46,23 @@ struct kstat {
>   	struct timespec64 btime;			/* File creation time */
>   	u64		blocks;
>   	u64		mnt_id;
> +	/*
> +	 * BTRFS does not provide unique inode numbers within a filesystem,
> +	 * depending on a synthetic 'dev' to provide uniqueness.
> +	 * NFSd cannot make use of this 'dev' number so clients often see
> +	 * duplicate inode numbers.
> +	 * For BTRFS, 'ino' is unlikely to use the high bits.  It puts
> +	 * another number in ino_uniquifier which:
> +	 * - has most entropy in the high bits
> +	 * - is different precisely when 'dev' is different
> +	 * - is stable across unmount/remount
> +	 * NFSd can xor this with 'ino' to get a substantially more unique
> +	 * number for reporting to the client.
> +	 * The ino_uniquifier for a directory can reasonably be applied
> +	 * to inode numbers reported by the readdir filldir callback.
> +	 * It is NOT currently exported to user-space.
> +	 */
> +	u64		ino_uniquifier;
>   };

Why don't rename "ino_uniquifier" as "ino_and_subvolume" and leave to the filesystem the work to combine the inode and the subvolume-id ?

I am worried that the logic is split between the filesystem, which synthesizes the ino_uniquifier, and to NFS which combine to the inode. I am thinking that this combination is filesystem specific; for BTRFS is a simple xor but for other filesystem may be a more complex operation, so leaving an half in the filesystem and another half to the NFS seems to not optimal if other filesystem needs to use ino_uniquifier.


>   #endif
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-15  7:39   ` Goffredo Baroncelli
@ 2021-08-15 19:35     ` Roman Mamedov
  2021-08-15 21:03       ` Goffredo Baroncelli
  2021-08-15 22:17       ` NeilBrown
  0 siblings, 2 replies; 123+ messages in thread
From: Roman Mamedov @ 2021-08-15 19:35 UTC (permalink / raw)
  To: Goffredo Baroncelli
  Cc: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Sun, 15 Aug 2021 09:39:08 +0200
Goffredo Baroncelli <kreijack@libero.it> wrote:

> I am sure that it was discussed already but I was unable to find any track
> of this discussion. But if the problem is the collision between the inode
> number of different subvolume in the nfd export, is it simpler if the export
> is truncated to the subvolume boundary ? It would be more coherent with the
> current behavior of vfs+nfsd.

See this bugreport thread which started it all:
https://www.spinics.net/lists/linux-btrfs/msg111172.html

In there the reporting user replied that it is strongly not feasible for them
to export each individual snapshot.

> In fact in btrfs a subvolume is a complete filesystem, with an "own
> synthetic" device. We could like or not this solution, but this solution is
> the more aligned to the unix standard, where for each filesystem there is a
> pair (device, inode-set). NFS (by default) avoids to cross the boundary
> between the filesystems. So why in BTRFS this should be different ?

From the user point of view subvolumes are basically directories; that they
are "complete filesystems"* is merely a low-level implementation detail.

* well except they are not, as you cannot 'dd' a subvolume to another
blockdevice.

> Why don't rename "ino_uniquifier" as "ino_and_subvolume" and leave to the
> filesystem the work to combine the inode and the subvolume-id ?
>
> I am worried that the logic is split between the filesystem, which
> synthesizes the ino_uniquifier, and to NFS which combine to the inode. I am
> thinking that this combination is filesystem specific; for BTRFS is a simple
> xor but for other filesystem may be a more complex operation, so leaving an
> half in the filesystem and another half to the NFS seems to not optimal if
> other filesystem needs to use ino_uniquifier.

I wondered a bit myself, what are the downsides of just doing the
uniquefication inside Btrfs, not leaving that to NFSD?

I mean not even adding the extra stat field, just return the inode itself with
that already applied. Surely cannot be any worse collision-wise, than
different subvolumes straight up having the same inode numbers as right now?

Or is it a performance concern, always doing more work, for something which
only NFSD has needed so far.

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-15 19:35     ` Roman Mamedov
@ 2021-08-15 21:03       ` Goffredo Baroncelli
  2021-08-15 21:53         ` NeilBrown
  2021-08-15 22:17       ` NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: Goffredo Baroncelli @ 2021-08-15 21:03 UTC (permalink / raw)
  To: Roman Mamedov, NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On 8/15/21 9:35 PM, Roman Mamedov wrote:
> On Sun, 15 Aug 2021 09:39:08 +0200
> Goffredo Baroncelli <kreijack@libero.it> wrote:
> 
>> I am sure that it was discussed already but I was unable to find any track
>> of this discussion. But if the problem is the collision between the inode
>> number of different subvolume in the nfd export, is it simpler if the export
>> is truncated to the subvolume boundary ? It would be more coherent with the
>> current behavior of vfs+nfsd.
> 
> See this bugreport thread which started it all:
> https://www.spinics.net/lists/linux-btrfs/msg111172.html
> 
> In there the reporting user replied that it is strongly not feasible for them
> to export each individual snapshot.

Thanks for pointing that.

However looking at the 'exports' man page, it seems that NFS has already an
option to cover these cases: 'crossmnt'.

If NFSd detects a "child" filesystem (i.e. a filesystem mounted inside an already
exported one) and the "parent" filesystem is marked as 'crossmnt',  the client mount
the parent AND the child filesystem with two separate mounts, so there is not problem of inode collision.

I tested it mounting two nested ext4 filesystem, and there isn't any problem of inode collision
(even if there are two different files with the same inode number).

---------
# mount -o loop disk2 test3/
# echo 123 >test3/one
# mkdir test3/test4
# sudo mount -o loop disk3 test3/test4/
# echo 123 >test3/test4/one
# ls -liR test3/
test3/:
total 24
11 drwx------ 2 root  root  16384 Aug 15 22:27 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:29 one
  2 drwxr-xrwx 3 root  root   4096 Aug 15 22:46 test4

test3/test4:
total 20
11 drwx------ 2 root  root  16384 Aug 15 22:45 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:46 one

# egrep test3 /etc/exports
/tmp/test3 *(rw,no_subtree_check,crossmnt)

# mount 192.168.1.27:/tmp/test3 3
ls -lRi 3
3:
total 24
11 drwx------ 2 root  root  16384 Aug 15 22:27 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:29 one
  2 drwxr-xrwx 3 root  root   4096 Aug 15 22:46 test4

3/test4:
total 20
11 drwx------ 2 root  root  16384 Aug 15 22:45 lost+found
12 -rw-r--r-- 1 ghigo ghigo     4 Aug 15 22:46 one

# mount | egrep 192
192.168.1.27:/tmp/test3 on /tmp/3 type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.27,local_lock=none,addr=192.168.1.27)
192.168.1.27:/tmp/test3/test4 on /tmp/3/test4 type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.27,local_lock=none,addr=192.168.1.27)


---------------

I tried to mount even with "nfsvers=3", and it seems to work.

However the tests above are related to ext4; in fact it doesn't work with btrfs, but I think this is more a implementation problem than a strategy problem.
What I means is that NFS already has a way to mount different parts of the fs-tree with different mounts (form a client POV). I think that this strategy should be used when NFSd exports a BTRFS filesystem:
- if the 'crossmnt' is NOT Passed, the export should ends at the subvolume boundary (or allow inode collision)
- if the 'crossmnt' is passed, the client should automatically mounts each nested subvolume as a separate mount

> 
>> In fact in btrfs a subvolume is a complete filesystem, with an "own
>> synthetic" device. We could like or not this solution, but this solution is
>> the more aligned to the unix standard, where for each filesystem there is a
>> pair (device, inode-set). NFS (by default) avoids to cross the boundary
>> between the filesystems. So why in BTRFS this should be different ?
> 
>  From the user point of view subvolumes are basically directories; that they
> are "complete filesystems"* is merely a low-level implementation detail.
> 
> * well except they are not, as you cannot 'dd' a subvolume to another
> blockdevice.
> 
>> Why don't rename "ino_uniquifier" as "ino_and_subvolume" and leave to the
>> filesystem the work to combine the inode and the subvolume-id ?
>>
>> I am worried that the logic is split between the filesystem, which
>> synthesizes the ino_uniquifier, and to NFS which combine to the inode. I am
>> thinking that this combination is filesystem specific; for BTRFS is a simple
>> xor but for other filesystem may be a more complex operation, so leaving an
>> half in the filesystem and another half to the NFS seems to not optimal if
>> other filesystem needs to use ino_uniquifier.
> 
> I wondered a bit myself, what are the downsides of just doing the
> uniquefication inside Btrfs, not leaving that to NFSD?
> 
> I mean not even adding the extra stat field, just return the inode itself with
> that already applied. Surely cannot be any worse collision-wise, than
> different subvolumes straight up having the same inode numbers as right now?
> 
> Or is it a performance concern, always doing more work, for something which
> only NFSD has needed so far.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-15 21:03       ` Goffredo Baroncelli
@ 2021-08-15 21:53         ` NeilBrown
  2021-08-17 19:34           ` Goffredo Baroncelli
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-15 21:53 UTC (permalink / raw)
  To: kreijack
  Cc: Roman Mamedov, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Mon, 16 Aug 2021, kreijack@inwind.it wrote:
> On 8/15/21 9:35 PM, Roman Mamedov wrote:
> > On Sun, 15 Aug 2021 09:39:08 +0200
> > Goffredo Baroncelli <kreijack@libero.it> wrote:
> > 
> >> I am sure that it was discussed already but I was unable to find any track
> >> of this discussion. But if the problem is the collision between the inode
> >> number of different subvolume in the nfd export, is it simpler if the export
> >> is truncated to the subvolume boundary ? It would be more coherent with the
> >> current behavior of vfs+nfsd.
> > 
> > See this bugreport thread which started it all:
> > https://www.spinics.net/lists/linux-btrfs/msg111172.html
> > 
> > In there the reporting user replied that it is strongly not feasible for them
> > to export each individual snapshot.
> 
> Thanks for pointing that.
> 
> However looking at the 'exports' man page, it seems that NFS has already an
> option to cover these cases: 'crossmnt'.
> 
> If NFSd detects a "child" filesystem (i.e. a filesystem mounted inside an already
> exported one) and the "parent" filesystem is marked as 'crossmnt',  the client mount
> the parent AND the child filesystem with two separate mounts, so there is not problem of inode collision.

As you acknowledged, you haven't read the whole back-story.  Maybe you
should.

https://lore.kernel.org/linux-nfs/20210613115313.BC59.409509F4@e16-tech.com/
https://lore.kernel.org/linux-nfs/162848123483.25823.15844774651164477866.stgit@noble.brown/
https://lore.kernel.org/linux-btrfs/162742539595.32498.13687924366155737575.stgit@noble.brown/

The flow of conversation does sometimes jump between threads.

I'm very happy to respond you questions after you've absorbed all that.

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-15 19:35     ` Roman Mamedov
  2021-08-15 21:03       ` Goffredo Baroncelli
@ 2021-08-15 22:17       ` NeilBrown
  2021-08-19  8:01         ` Amir Goldstein
  2021-08-23  4:05         ` [PATCH v2] BTRFS/NFSD: " NeilBrown
  1 sibling, 2 replies; 123+ messages in thread
From: NeilBrown @ 2021-08-15 22:17 UTC (permalink / raw)
  To: Roman Mamedov
  Cc: Goffredo Baroncelli, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever, Chris Mason, David Sterba,
	Alexander Viro, linux-fsdevel, linux-nfs, linux-btrfs

On Mon, 16 Aug 2021, Roman Mamedov wrote:
> 
> I wondered a bit myself, what are the downsides of just doing the
> uniquefication inside Btrfs, not leaving that to NFSD?
> 
> I mean not even adding the extra stat field, just return the inode itself with
> that already applied. Surely cannot be any worse collision-wise, than
> different subvolumes straight up having the same inode numbers as right now?
> 
> Or is it a performance concern, always doing more work, for something which
> only NFSD has needed so far.

Any change in behaviour will have unexpected consequences.  I think the
btrfs maintainers perspective is they they don't want to change
behaviour if they don't have to (which is reasonable) and that currently
they don't have to (which probably means that users aren't complaining
loudly enough).

NFS export of BTRFS is already demonstrably broken and users are
complaining loudly enough that I can hear them ....  though I think it
has been broken like this for 10 years, do I wonder that I didn't hear
them before.

If something is perceived as broken, then a behaviour change that
appears to fix it is more easily accepted.

However, having said that I now see that my latest patch is not ideal.
It changes the inode numbers associated with filehandles of objects in
the non-root subvolume.  This will cause the Linux NFS client to treat
the object as 'stale' For most objects this is a transient annoyance.
Reopen the file or restart the process and all should be well again.
However if the inode number of the mount point changes, you will need to
unmount and remount.  That is more somewhat more of an annoyance.

There are a few ways to handle this more gracefully.

1/ We could get btrfs to hand out new filehandles as well as new inode
numbers, but still accept the old filehandles.  Then we could make the
inode number reported be based on the filehandle.  This would be nearly
seamless but rather clumsy to code.  I'm not *very* keen on this idea,
but it is worth keeping in mind.

2/ We could add a btrfs mount option to control whether the uniquifier
was set or not.  This would allow the sysadmin to choose when to manage
any breakage.  I think this is my preference, but Josef has declared an
aversion to mount options.

3/ We could add a module parameter to nfsd to control whether the
uniquifier is merged in.  This again gives the sysadmin control, and it
can be done despite any aversion from btrfs maintainers.  But I'd need
to overcome any aversion from the nfsd maintainers, and I don't know how
strong that would be yet. (A new export option isn't really appropriate.
It is much more work to add an export option than the add a mount option).

I don't know.... maybe I should try harder to like option 1, or at least
verify if it works as expected and see how ugly the code really is.

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-15 21:53         ` NeilBrown
@ 2021-08-17 19:34           ` Goffredo Baroncelli
  2021-08-17 21:39             ` NeilBrown
  0 siblings, 1 reply; 123+ messages in thread
From: Goffredo Baroncelli @ 2021-08-17 19:34 UTC (permalink / raw)
  To: NeilBrown
  Cc: Roman Mamedov, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On 8/15/21 11:53 PM, NeilBrown wrote:
> On Mon, 16 Aug 2021, kreijack@inwind.it wrote:
>> On 8/15/21 9:35 PM, Roman Mamedov wrote:
>>> On Sun, 15 Aug 2021 09:39:08 +0200
>>> Goffredo Baroncelli <kreijack@libero.it> wrote:
>>>
>>>> I am sure that it was discussed already but I was unable to find any track
>>>> of this discussion. But if the problem is the collision between the inode
>>>> number of different subvolume in the nfd export, is it simpler if the export
>>>> is truncated to the subvolume boundary ? It would be more coherent with the
>>>> current behavior of vfs+nfsd.
>>>
>>> See this bugreport thread which started it all:
>>> https://www.spinics.net/lists/linux-btrfs/msg111172.html
>>>
>>> In there the reporting user replied that it is strongly not feasible for them
>>> to export each individual snapshot.
>>
>> Thanks for pointing that.
>>
>> However looking at the 'exports' man page, it seems that NFS has already an
>> option to cover these cases: 'crossmnt'.
>>
>> If NFSd detects a "child" filesystem (i.e. a filesystem mounted inside an already
>> exported one) and the "parent" filesystem is marked as 'crossmnt',  the client mount
>> the parent AND the child filesystem with two separate mounts, so there is not problem of inode collision.
> 
> As you acknowledged, you haven't read the whole back-story.  Maybe you
> should.
> 
> https://lore.kernel.org/linux-nfs/20210613115313.BC59.409509F4@e16-tech.com/
> https://lore.kernel.org/linux-nfs/162848123483.25823.15844774651164477866.stgit@noble.brown/
> https://lore.kernel.org/linux-btrfs/162742539595.32498.13687924366155737575.stgit@noble.brown/
> 
> The flow of conversation does sometimes jump between threads.
> 
> I'm very happy to respond you questions after you've absorbed all that.

Hi Neil,

I read the other threads. And I still have the opinion that the nfsd crossmnt behavior should be a good solution for the btrfs subvolumes.
> 
> NeilBrown
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-17 19:34           ` Goffredo Baroncelli
@ 2021-08-17 21:39             ` NeilBrown
  2021-08-18 17:24               ` Goffredo Baroncelli
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-17 21:39 UTC (permalink / raw)
  To: kreijack
  Cc: Roman Mamedov, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Wed, 18 Aug 2021, kreijack@inwind.it wrote:
> On 8/15/21 11:53 PM, NeilBrown wrote:
> > On Mon, 16 Aug 2021, kreijack@inwind.it wrote:
> >> On 8/15/21 9:35 PM, Roman Mamedov wrote:

> >>
> >> However looking at the 'exports' man page, it seems that NFS has already an
> >> option to cover these cases: 'crossmnt'.
> >>
> >> If NFSd detects a "child" filesystem (i.e. a filesystem mounted inside an already
> >> exported one) and the "parent" filesystem is marked as 'crossmnt',  the client mount
> >> the parent AND the child filesystem with two separate mounts, so there is not problem of inode collision.
> > 
> > As you acknowledged, you haven't read the whole back-story.  Maybe you
> > should.
> > 
> > https://lore.kernel.org/linux-nfs/20210613115313.BC59.409509F4@e16-tech.com/
> > https://lore.kernel.org/linux-nfs/162848123483.25823.15844774651164477866.stgit@noble.brown/
> > https://lore.kernel.org/linux-btrfs/162742539595.32498.13687924366155737575.stgit@noble.brown/
> > 
> > The flow of conversation does sometimes jump between threads.
> > 
> > I'm very happy to respond you questions after you've absorbed all that.
> 
> Hi Neil,
> 
> I read the other threads.  And I still have the opinion that the nfsd
> crossmnt behavior should be a good solution for the btrfs subvolumes. 

Thanks for reading it all.  Let me join the dots for you.

"crossmnt" doesn't currently work because "subvolumes" aren't mount
points.

We could change btrfs so that subvolumes *are* mountpoints.  They would
have to be automounts.  I posted patches to do that.  They were broadly
rejected because people have many thousands of submounts that are
concurrently active and so /proc/mounts would be multiple megabytes is
size and working with it would become impractical.  Also, non-privileged
users can create subvols, and may want the path names to remain private.
But these subvols would appear in the mount table and so would no longer
be private.

Alternately we could change the "crossmnt" functionality to treat a
change of st_dev as though it were a mount point.  I posted patches to
do this too.  This hits the same sort of problems in a different way.
If NFSD reports that is has crossed a "mount" by providing a different
filesystem-id to the client, then the client will create a new mount
point which will appear in /proc/mounts.  It might be less likely that
many thousands of subvolumes are accessed over NFS than locally, but it
is still entirely possible.  I don't want the NFS client to suffer a
problem that btrfs doesn't impose locally.  And 'private' subvolumes
could again appear on a public list if they were accessed via NFS.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
  2021-08-13 14:55   ` Josef Bacik
  2021-08-15  7:39   ` Goffredo Baroncelli
@ 2021-08-18 14:54   ` Wang Yugui
  2021-08-18 21:46     ` NeilBrown
  2 siblings, 1 reply; 123+ messages in thread
From: Wang Yugui @ 2021-08-18 14:54 UTC (permalink / raw)
  To: NeilBrown
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

Hi,

We use  'swab64' to combinate 'subvol id' and 'inode' into 64bit in this
patch.

case1:
'subvol id': 16bit => 64K, a little small because the subvol id is
always increase?
'inode':	48bit * 4K per node, this is big enough.

case2:
'subvol id': 24bit => 16M,  this is big enough.
'inode':	40bit * 4K per node => 4 PB.  this is a little small?

Is there a way to 'bit-swap' the subvol id, rather the current byte-swap?

If not, maybe it is a better balance if we combinate 22bit subvol id and
42 bit inode?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/08/18

> 
> [[This patch is a minimal patch which addresses the current problems
>   with nfsd and btrfs, in a way which I think is most supportable, least
>   surprising, and least likely to impact any future attempts to more
>   completely fix the btrfs file-identify problem]]
> 
> BTRFS does not provide unique inode numbers across a filesystem.
> It *does* provide unique inode numbers with a subvolume and
> uses synthetic device numbers for different subvolumes to ensure
> uniqueness for device+inode.
> 
> nfsd cannot use these varying device numbers.  If nfsd were to
> synthesise different stable filesystem ids to give to the client, that
> would cause subvolumes to appear in the mount table on the client, even
> though they don't appear in the mount table on the server.  Also, NFSv3
> doesn't support changing the filesystem id without a new explicit
> mount on the client (this is partially supported in practice, but
> violates the protocol specification).
> 
> So currently, the roots of all subvolumes report the same inode number
> in the same filesystem to NFS clients and tools like 'find' notice that
> a directory has the same identity as an ancestor, and so refuse to
> enter that directory.
> 
> This patch allows btrfs (or any filesystem) to provide a 64bit number
> that can be xored with the inode number to make the number more unique.
> Rather than the client being certain to see duplicates, with this patch
> it is possible but extremely rare.
> 
> The number than btrfs provides is a swab64() version of the subvolume
> identifier.  This has most entropy in the high bits (the low bits of the
> subvolume identifer), while the inoe has most entropy in the low bits.
> The result will always be unique within a subvolume, and will almost
> always be unique across the filesystem.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/btrfs/inode.c     |  4 ++++
>  fs/nfsd/nfs3xdr.c    | 17 ++++++++++++++++-
>  fs/nfsd/nfs4xdr.c    |  9 ++++++++-
>  fs/nfsd/xdr3.h       |  2 ++
>  include/linux/stat.h | 17 +++++++++++++++++
>  5 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0117d867ecf8..989fdf2032d5 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9195,6 +9195,10 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
>  	generic_fillattr(&init_user_ns, inode, stat);
>  	stat->dev = BTRFS_I(inode)->root->anon_dev;
>  
> +	if (BTRFS_I(inode)->root->root_key.objectid != BTRFS_FS_TREE_OBJECTID)
> +		stat->ino_uniquifier =
> +			swab64(BTRFS_I(inode)->root->root_key.objectid);
> +
>  	spin_lock(&BTRFS_I(inode)->lock);
>  	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
>  	inode_bytes = inode_get_bytes(inode);
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 0a5ebc52e6a9..669e2437362a 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -340,6 +340,7 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
>  {
>  	struct user_namespace *userns = nfsd_user_namespace(rqstp);
>  	__be32 *p;
> +	u64 ino;
>  	u64 fsid;
>  
>  	p = xdr_reserve_space(xdr, XDR_UNIT * 21);
> @@ -377,7 +378,10 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
>  	p = xdr_encode_hyper(p, fsid);
>  
>  	/* fileid */
> -	p = xdr_encode_hyper(p, stat->ino);
> +	ino = stat->ino;
> +	if (stat->ino_uniquifier && stat->ino_uniquifier != ino)
> +		ino ^= stat->ino_uniquifier;
> +	p = xdr_encode_hyper(p, ino);
>  
>  	p = encode_nfstime3(p, &stat->atime);
>  	p = encode_nfstime3(p, &stat->mtime);
> @@ -1151,6 +1155,17 @@ svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name,
>  	if (xdr_stream_encode_item_present(xdr) < 0)
>  		return false;
>  	/* fileid */
> +	if (!resp->dir_have_uniquifier) {
> +		struct kstat stat;
> +		if (fh_getattr(&resp->fh, &stat) == nfs_ok)
> +			resp->dir_ino_uniquifier = stat.ino_uniquifier;
> +		else
> +			resp->dir_ino_uniquifier = 0;
> +		resp->dir_have_uniquifier = 1;
> +	}
> +	if (resp->dir_ino_uniquifier &&
> +	    resp->dir_ino_uniquifier != ino)
> +		ino ^= resp->dir_ino_uniquifier;
>  	if (xdr_stream_encode_u64(xdr, ino) < 0)
>  		return false;
>  	/* name */
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 7abeccb975b2..ddccf849c29c 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3114,10 +3114,14 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>  					fhp->fh_handle.fh_size);
>  	}
>  	if (bmval0 & FATTR4_WORD0_FILEID) {
> +		u64 ino = stat.ino;
> +		if (stat.ino_uniquifier &&
> +		    stat.ino_uniquifier != stat.ino)
> +			ino ^= stat.ino_uniquifier;
>  		p = xdr_reserve_space(xdr, 8);
>  		if (!p)
>  			goto out_resource;
> -		p = xdr_encode_hyper(p, stat.ino);
> +		p = xdr_encode_hyper(p, ino);
>  	}
>  	if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
>  		p = xdr_reserve_space(xdr, 8);
> @@ -3285,6 +3289,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
>  			if (err)
>  				goto out_nfserr;
>  			ino = parent_stat.ino;
> +			if (parent_stat.ino_uniquifier &&
> +			    parent_stat.ino_uniquifier != ino)
> +				ino ^= parent_stat.ino_uniquifier;
>  		}
>  		p = xdr_encode_hyper(p, ino);
>  	}
> diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
> index 933008382bbe..b4f9f3c71f72 100644
> --- a/fs/nfsd/xdr3.h
> +++ b/fs/nfsd/xdr3.h
> @@ -179,6 +179,8 @@ struct nfsd3_readdirres {
>  	struct xdr_buf		dirlist;
>  	struct svc_fh		scratch;
>  	struct readdir_cd	common;
> +	u64			dir_ino_uniquifier;
> +	int			dir_have_uniquifier;
>  	unsigned int		cookie_offset;
>  	struct svc_rqst *	rqstp;
>  
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index fff27e603814..a5188f42ed81 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -46,6 +46,23 @@ struct kstat {
>  	struct timespec64 btime;			/* File creation time */
>  	u64		blocks;
>  	u64		mnt_id;
> +	/*
> +	 * BTRFS does not provide unique inode numbers within a filesystem,
> +	 * depending on a synthetic 'dev' to provide uniqueness.
> +	 * NFSd cannot make use of this 'dev' number so clients often see
> +	 * duplicate inode numbers.
> +	 * For BTRFS, 'ino' is unlikely to use the high bits.  It puts
> +	 * another number in ino_uniquifier which:
> +	 * - has most entropy in the high bits
> +	 * - is different precisely when 'dev' is different
> +	 * - is stable across unmount/remount
> +	 * NFSd can xor this with 'ino' to get a substantially more unique
> +	 * number for reporting to the client.
> +	 * The ino_uniquifier for a directory can reasonably be applied
> +	 * to inode numbers reported by the readdir filldir callback.
> +	 * It is NOT currently exported to user-space.
> +	 */
> +	u64		ino_uniquifier;
>  };
>  
>  #endif
> -- 
> 2.32.0



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-17 21:39             ` NeilBrown
@ 2021-08-18 17:24               ` Goffredo Baroncelli
  0 siblings, 0 replies; 123+ messages in thread
From: Goffredo Baroncelli @ 2021-08-18 17:24 UTC (permalink / raw)
  To: NeilBrown
  Cc: Roman Mamedov, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On 8/17/21 11:39 PM, NeilBrown wrote:
> On Wed, 18 Aug 2021, kreijack@inwind.it wrote:
>> On 8/15/21 11:53 PM, NeilBrown wrote:
>>> On Mon, 16 Aug 2021, kreijack@inwind.it wrote:
>>>> On 8/15/21 9:35 PM, Roman Mamedov wrote:
> 
>>>>
>>>> However looking at the 'exports' man page, it seems that NFS has already an
>>>> option to cover these cases: 'crossmnt'.
>>>>
>>>> If NFSd detects a "child" filesystem (i.e. a filesystem mounted inside an already
>>>> exported one) and the "parent" filesystem is marked as 'crossmnt',  the client mount
>>>> the parent AND the child filesystem with two separate mounts, so there is not problem of inode collision.
>>>
>>> As you acknowledged, you haven't read the whole back-story.  Maybe you
>>> should.
>>>
>>> https://lore.kernel.org/linux-nfs/20210613115313.BC59.409509F4@e16-tech.com/
>>> https://lore.kernel.org/linux-nfs/162848123483.25823.15844774651164477866.stgit@noble.brown/
>>> https://lore.kernel.org/linux-btrfs/162742539595.32498.13687924366155737575.stgit@noble.brown/
>>>
>>> The flow of conversation does sometimes jump between threads.
>>>
>>> I'm very happy to respond you questions after you've absorbed all that.
>>
>> Hi Neil,
>>
>> I read the other threads.  And I still have the opinion that the nfsd
>> crossmnt behavior should be a good solution for the btrfs subvolumes.
> 
> Thanks for reading it all.  Let me join the dots for you.
> 
[...]
> 
> Alternately we could change the "crossmnt" functionality to treat a
> change of st_dev as though it were a mount point.  I posted patches to
> do this too.  This hits the same sort of problems in a different way.
> If NFSD reports that is has crossed a "mount" by providing a different
> filesystem-id to the client, then the client will create a new mount
> point which will appear in /proc/mounts.  

Yes, this is my proposal.

> It might be less likely that
> many thousands of subvolumes are accessed over NFS than locally, but it
> is still entirely possible.  

I don't think that it would be so unlikely. Think about a file indexer
and/or a 'find' command runned in the folder that contains the snapshots...

> I don't want the NFS client to suffer a
> problem that btrfs doesn't impose locally.  

The solution is not easy. In fact we are trying to map a u64 x u64 space to a u64 space. The true is that we
cannot guarantee that a collision will not happen. We can only say that for a fresh filesystem is near
impossible, but for an aged filesystem it is unlikely but possible.

We already faced real case where we exhausted the inode space in the 32 bit arch.What is the chances that the subvolumes ever created count is greater  2^24 and the inode number is greater  2^40 ? The likelihood is low but not 0...

Some random toughs:
- the new inode number are created merging the original inode-number (in the lower bit) and the object-id of the subvolume (in higher bit). We could add a warning when these bits overlap:

	if (fls(stat->ino) >= ffs(stat->ino_uniquifer))
		printk("NFSD: Warning possible inode collision...")

More smarter heuristic can be developed, like doing the check against the maximum value if inode and the maximum value of the subvolume once at mount time....

- for the inode number it is an expensive operation (even tough it exists/existed for the 32bit processor), but we could reuse the object-id after it is freed

- I think that we could add an option to nfsd or btrfs (not a default behavior) to avoid to cross the subvolume boundary

> And 'private' subvolumes
> could again appear on a public list if they were accessed via NFS.

(wrongly) I never considered  a similar scenario. However I think that these could be anonymized using a alias (the name of the path to mount is passed by nfsd, so it could create an alias that will be recognized by nfsd when the clienet requires it... complex but doable...)

> 
> Thanks,
> NeilBrown
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-18 14:54   ` [PATCH] VFS/BTRFS/NFSD: " Wang Yugui
@ 2021-08-18 21:46     ` NeilBrown
  2021-08-19  2:19       ` Zygo Blaxell
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-18 21:46 UTC (permalink / raw)
  To: Wang Yugui
  Cc: Christoph Hellwig, Josef Bacik, J. Bruce Fields, Chuck Lever,
	Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
	linux-nfs, linux-btrfs

On Thu, 19 Aug 2021, Wang Yugui wrote:
> Hi,
> 
> We use  'swab64' to combinate 'subvol id' and 'inode' into 64bit in this
> patch.
> 
> case1:
> 'subvol id': 16bit => 64K, a little small because the subvol id is
> always increase?
> 'inode':	48bit * 4K per node, this is big enough.
> 
> case2:
> 'subvol id': 24bit => 16M,  this is big enough.
> 'inode':	40bit * 4K per node => 4 PB.  this is a little small?

I don't know what point you are trying to make with the above.

> 
> Is there a way to 'bit-swap' the subvol id, rather the current byte-swap?

Sure:
   for (i=0; i<64; i++) {
        new = (new << 1) | (old & 1)
        old >>= 1;
   }

but would it gain anything significant?

Remember what the goal is.  Most apps don't care at all about duplicate
inode numbers - only a few do, and they only care about a few inodes.
The only bug I actually have a report of is caused by a directory having
the same inode as an ancestor.  i.e.  in lots of cases, duplicate inode
numbers won't be noticed.

The behaviour of btrfs over NFS RELIABLY causes exactly this behaviour
of a directory having the same inode number as an ancestor.  The root of
a subtree will *always* do this.  If we JUST changed the inode numbers
of the roots of subtrees, then most observed problems would go away.  It
would change from "trivial to reproduce" to "rarely happens".  The patch
I actually propose makes it much more unlikely than that.  Even if
duplicate inode numbers do happen, the chance of them being noticed is
infinitesimal.  Given that, there is no point in minor tweaks unless
they can make duplicate inode numbers IMPOSSIBLE.

> 
> If not, maybe it is a better balance if we combinate 22bit subvol id and
> 42 bit inode?

This would be better except when it is worse.  We cannot know which will
happen more often.

As long as BTRFS allows object-ids and root-ids combined to use more
than 64 bits there can be no perfect solution.  There are many possible
solutions that will be close to perfect in practice.  swab64() is the
simplest that I could think of.  Picking any arbitrary cut-off (22/42,
24/40, ...) is unlikely to be better, and could is some circumstances be
worse.

My preference would be for btrfs to start re-using old object-ids and
root-ids, and to enforce a limit (set at mkfs or tunefs) so that the
total number of bits does not exceed 64.  Unfortunately the maintainers
seem reluctant to even consider this.

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-18 21:46     ` NeilBrown
@ 2021-08-19  2:19       ` Zygo Blaxell
  2021-08-20  2:54         ` NeilBrown
  2021-08-23  0:57         ` Wang Yugui
  0 siblings, 2 replies; 123+ messages in thread
From: Zygo Blaxell @ 2021-08-19  2:19 UTC (permalink / raw)
  To: NeilBrown
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Thu, Aug 19, 2021 at 07:46:22AM +1000, NeilBrown wrote:
> On Thu, 19 Aug 2021, Wang Yugui wrote:
> > Hi,
> > 
> > We use  'swab64' to combinate 'subvol id' and 'inode' into 64bit in this
> > patch.
> > 
> > case1:
> > 'subvol id': 16bit => 64K, a little small because the subvol id is
> > always increase?
> > 'inode':	48bit * 4K per node, this is big enough.
> > 
> > case2:
> > 'subvol id': 24bit => 16M,  this is big enough.
> > 'inode':	40bit * 4K per node => 4 PB.  this is a little small?
> 
> I don't know what point you are trying to make with the above.
> 
> > 
> > Is there a way to 'bit-swap' the subvol id, rather the current byte-swap?
> 
> Sure:
>    for (i=0; i<64; i++) {
>         new = (new << 1) | (old & 1)
>         old >>= 1;
>    }
> 
> but would it gain anything significant?
> 
> Remember what the goal is.  Most apps don't care at all about duplicate
> inode numbers - only a few do, and they only care about a few inodes.
> The only bug I actually have a report of is caused by a directory having
> the same inode as an ancestor.  i.e.  in lots of cases, duplicate inode
> numbers won't be noticed.

rsync -H and cpio's hardlink detection can be badly confused.  They will
think distinct files with the same inode number are hardlinks.  This could
be bad if you were making backups (though if you're making backups over
NFS, you are probably doing something that could be done better in a
different way).

> The behaviour of btrfs over NFS RELIABLY causes exactly this behaviour
> of a directory having the same inode number as an ancestor.  The root of
> a subtree will *always* do this.  If we JUST changed the inode numbers
> of the roots of subtrees, then most observed problems would go away.  It
> would change from "trivial to reproduce" to "rarely happens".  The patch
> I actually propose makes it much more unlikely than that.  Even if
> duplicate inode numbers do happen, the chance of them being noticed is
> infinitesimal.  Given that, there is no point in minor tweaks unless
> they can make duplicate inode numbers IMPOSSIBLE.

That's a good argument.  I have a different one with the same conclusion.

40 bit inodes would take about 20 years to collide with 24-bit subvols--if
you are creating an average of 1742 inodes every second.  Also at the
same time you have to be creating a subvol every 37 seconds to occupy
the colliding 25th bit of the subvol ID.  Only the highest inode number
in any subvol counts--if your inode creation is spread out over several
different subvols, you'll need to make inodes even faster.

For reference, my high scores are 17 inodes per second and a subvol
every 595 seconds (averaged over 1 year).  Burst numbers are much higher,
but one has to spend some time _reading_ the files now and then.

I've encountered other btrfs users with two orders of magnitude higher
inode creation rates than mine.  They are barely squeaking under the
20-year line--or they would be, if they were creating snapshots 50 times
faster than they do today.

Use cases that have the highest inode creation rates (like /tmp) tend
to get more specialized storage solutions (like tmpfs).

Cloud fleets do have higher average inode creation rates, but their
filesystems have much shorter lifespans than 20 years, so the delta on
both sides of the ratio cancels out.

If this hack is only used for NFS, it gives us some time to come up
with a better solution.  (On the other hand, we had 14 years already,
and here we are...)

> > If not, maybe it is a better balance if we combinate 22bit subvol id and
> > 42 bit inode?
> 
> This would be better except when it is worse.  We cannot know which will
> happen more often.
> 
> As long as BTRFS allows object-ids and root-ids combined to use more
> than 64 bits there can be no perfect solution.  There are many possible
> solutions that will be close to perfect in practice.  swab64() is the
> simplest that I could think of.  Picking any arbitrary cut-off (22/42,
> 24/40, ...) is unlikely to be better, and could is some circumstances be
> worse.
> 
> My preference would be for btrfs to start re-using old object-ids and
> root-ids, and to enforce a limit (set at mkfs or tunefs) so that the
> total number of bits does not exceed 64.  Unfortunately the maintainers
> seem reluctant to even consider this.

It was considered, implemented in 2011, and removed in 2020.  Rationale
is in commit b547a88ea5776a8092f7f122ddc20d6720528782 "btrfs: start
deprecation of mount option inode_cache".  It made file creation slower,
and consumed disk space, iops, and memory to run.  Nobody used it.
Newer on-disk data structure versions (free space tree, 2015) didn't
bother implementing inode_cache's storage requirement.

> NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-15 22:17       ` NeilBrown
@ 2021-08-19  8:01         ` Amir Goldstein
  2021-08-20  3:21           ` NeilBrown
  2021-08-23  4:05         ` [PATCH v2] BTRFS/NFSD: " NeilBrown
  1 sibling, 1 reply; 123+ messages in thread
From: Amir Goldstein @ 2021-08-19  8:01 UTC (permalink / raw)
  To: NeilBrown
  Cc: Roman Mamedov, Goffredo Baroncelli, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Linux Btrfs

On Mon, Aug 16, 2021 at 1:21 AM NeilBrown <neilb@suse.de> wrote:
>
> On Mon, 16 Aug 2021, Roman Mamedov wrote:
> >
> > I wondered a bit myself, what are the downsides of just doing the
> > uniquefication inside Btrfs, not leaving that to NFSD?
> >
> > I mean not even adding the extra stat field, just return the inode itself with
> > that already applied. Surely cannot be any worse collision-wise, than
> > different subvolumes straight up having the same inode numbers as right now?
> >
> > Or is it a performance concern, always doing more work, for something which
> > only NFSD has needed so far.
>
> Any change in behaviour will have unexpected consequences.  I think the
> btrfs maintainers perspective is they they don't want to change
> behaviour if they don't have to (which is reasonable) and that currently
> they don't have to (which probably means that users aren't complaining
> loudly enough).
>
> NFS export of BTRFS is already demonstrably broken and users are
> complaining loudly enough that I can hear them ....  though I think it
> has been broken like this for 10 years, do I wonder that I didn't hear
> them before.
>
> If something is perceived as broken, then a behaviour change that
> appears to fix it is more easily accepted.
>
> However, having said that I now see that my latest patch is not ideal.
> It changes the inode numbers associated with filehandles of objects in
> the non-root subvolume.  This will cause the Linux NFS client to treat
> the object as 'stale' For most objects this is a transient annoyance.
> Reopen the file or restart the process and all should be well again.
> However if the inode number of the mount point changes, you will need to
> unmount and remount.  That is more somewhat more of an annoyance.
>
> There are a few ways to handle this more gracefully.
>
> 1/ We could get btrfs to hand out new filehandles as well as new inode
> numbers, but still accept the old filehandles.  Then we could make the
> inode number reported be based on the filehandle.  This would be nearly
> seamless but rather clumsy to code.  I'm not *very* keen on this idea,
> but it is worth keeping in mind.
>

So objects would change their inode number after nfs inode cache is
evicted and while nfs filesystem is mounted. That does not sound ideal.

But I am a bit confused about the problem.
If the export is of the btrfs root, then nfs client cannot access any
subvolumes (right?) - that was the bug report, so the value of inode
numbers in non-root subvolumes is not an issue.
If export is of non-root subvolume, then why bother changing anything
at all? Is there a need to traverse into sub-sub-volumes?

> 2/ We could add a btrfs mount option to control whether the uniquifier
> was set or not.  This would allow the sysadmin to choose when to manage
> any breakage.  I think this is my preference, but Josef has declared an
> aversion to mount options.
>
> 3/ We could add a module parameter to nfsd to control whether the
> uniquifier is merged in.  This again gives the sysadmin control, and it
> can be done despite any aversion from btrfs maintainers.  But I'd need
> to overcome any aversion from the nfsd maintainers, and I don't know how
> strong that would be yet. (A new export option isn't really appropriate.
> It is much more work to add an export option than the add a mount option).
>

That is too bad, because IMO from users POV, "fsid=btrfsroot" or "cross-subvol"
export option would have been a nice way to describe and opt-in to this new
functionality.

But let's consider for a moment the consequences of enabling this functionality
automatically whenever exporting a btrfs root volume without "crossmnt":

1. Objects inside a subvol that are inaccessible(?) with current
nfs/nfsd without
    "crossmnt" will become accessible after enabling the feature -
this will match
    the user experience of accessing btrfs on the host
2. The inode numbers of the newly accessible objects would not match the inode
    numbers on the host fs (no big deal?)
3. The inode numbers of objects in a snapshot would not match the inode
    numbers of the original (pre-snapshot) objects (acceptable tradeoff for
    being able to access the snapshot objects without bloating /proc/mounts?)
4. The inode numbers of objects in a subvol observed via this "cross-subvol"
    export would not match the inode numbers of the same objects observed
    via an individual subvol export
5. st_ino conflicts are possible when multiplexing subvol id and inode number.
    overlayfs resolved those conflicts by allocating an inode number from a
    reserved non-persistent inode range, which may cause objects to change
    their inode number during the lifetime on the filesystem (sensible
tradeoff?)

I think that #4 is a bit hard to swallow and #3 is borderline acceptable...
Both and quite hard to document and to set expectations as a non-opt-in
change of behavior when exporting btrfs root.

IMO, an nfsd module parameter will give some control and therefore is
a must, but it won't make life easier to document and set user expectations
when the semantics are not clearly stated in the exports table.

You claim that "A new export option isn't really appropriate."
but your only argument is that "It is much more work to add
an export option than the add a mount option".

With all due respect, for this particular challenge with all the
constraints involved, this sounds like a pretty weak argument.

Surely, adding an export option is easier than slowly changing all
userspace tools to understand subvolumes? a solution that you had
previously brought up.

Can you elaborate some more about your aversion to a new
export option.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-19  2:19       ` Zygo Blaxell
@ 2021-08-20  2:54         ` NeilBrown
  2021-08-22 19:29           ` Zygo Blaxell
  2021-08-23  0:57         ` Wang Yugui
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-20  2:54 UTC (permalink / raw)
  To: Zygo Blaxell
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Thu, 19 Aug 2021, Zygo Blaxell wrote:
> On Thu, Aug 19, 2021 at 07:46:22AM +1000, NeilBrown wrote:
> > 
> > Remember what the goal is.  Most apps don't care at all about duplicate
> > inode numbers - only a few do, and they only care about a few inodes.
> > The only bug I actually have a report of is caused by a directory having
> > the same inode as an ancestor.  i.e.  in lots of cases, duplicate inode
> > numbers won't be noticed.
> 
> rsync -H and cpio's hardlink detection can be badly confused.  They will
> think distinct files with the same inode number are hardlinks.  This could
> be bad if you were making backups (though if you're making backups over
> NFS, you are probably doing something that could be done better in a
> different way).

Yes, they could get confused.  inode numbers remain unique within a
"subvolume" so you would need to do at backup of multiple subtrees to
hit a problem.  Certainly possible, but probably less common.

> 
> 40 bit inodes would take about 20 years to collide with 24-bit subvols--if
> you are creating an average of 1742 inodes every second.  Also at the
> same time you have to be creating a subvol every 37 seconds to occupy
> the colliding 25th bit of the subvol ID.  Only the highest inode number
> in any subvol counts--if your inode creation is spread out over several
> different subvols, you'll need to make inodes even faster.
> 
> For reference, my high scores are 17 inodes per second and a subvol
> every 595 seconds (averaged over 1 year).  Burst numbers are much higher,
> but one has to spend some time _reading_ the files now and then.
> 
> I've encountered other btrfs users with two orders of magnitude higher
> inode creation rates than mine.  They are barely squeaking under the
> 20-year line--or they would be, if they were creating snapshots 50 times
> faster than they do today.

I do like seeing concrete numbers, thanks.  How many of these inodes and
subvols remain undeleted?  Supposing inode numbers were reused, how many
bits might you need?


> > My preference would be for btrfs to start re-using old object-ids and
> > root-ids, and to enforce a limit (set at mkfs or tunefs) so that the
> > total number of bits does not exceed 64.  Unfortunately the maintainers
> > seem reluctant to even consider this.
> 
> It was considered, implemented in 2011, and removed in 2020.  Rationale
> is in commit b547a88ea5776a8092f7f122ddc20d6720528782 "btrfs: start
> deprecation of mount option inode_cache".  It made file creation slower,
> and consumed disk space, iops, and memory to run.  Nobody used it.
> Newer on-disk data structure versions (free space tree, 2015) didn't
> bother implementing inode_cache's storage requirement.

Yes, I saw that.  Providing reliable functional certainly can impact
performance and consume disk-space.  That isn't an excuse for not doing
it. 
I suspect that carefully tuned code could result in typical creation
times being unchanged, and mean creation times suffering only a tiny
cost.  Using "max+1" when the creation rate is particularly high might
be a reasonable part of managing costs.
Storage cost need not be worse than the cost of tracking free blocks
on the device.

"Nobody used it" is odd.  It implies it would have to be explicitly
enabled, and all it would provide anyone is sane behaviour.  Who would
imagine that to be an optional extra.

NeilBrown


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-19  8:01         ` Amir Goldstein
@ 2021-08-20  3:21           ` NeilBrown
  2021-08-20  6:23             ` Amir Goldstein
  0 siblings, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-20  3:21 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Roman Mamedov, Goffredo Baroncelli, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Linux Btrfs

On Thu, 19 Aug 2021, Amir Goldstein wrote:
> On Mon, Aug 16, 2021 at 1:21 AM NeilBrown <neilb@suse.de> wrote:
> >
> > There are a few ways to handle this more gracefully.
> >
> > 1/ We could get btrfs to hand out new filehandles as well as new inode
> > numbers, but still accept the old filehandles.  Then we could make the
> > inode number reported be based on the filehandle.  This would be nearly
> > seamless but rather clumsy to code.  I'm not *very* keen on this idea,
> > but it is worth keeping in mind.
> >
> 
> So objects would change their inode number after nfs inode cache is
> evicted and while nfs filesystem is mounted. That does not sound ideal.

No.  Almost all filehandle lookups happen in the context of some other
filehandle.  If the provided context is an old-style filehandle, we
provide an old-style filehandle for the lookup.  There is already code
in nfsd to support this (as we have in the past changed how filesystems
are identified).

It would only be if the mountpoint filehandle (which is fetched without
that context) went out of cache that inode numbers would change.  That
would mean that the filesystem (possibly an automount) was unmounted.
When it was remounted it could have a different device number anyway, so
having different inode numbers would be of little consequence.

> 
> But I am a bit confused about the problem.
> If the export is of the btrfs root, then nfs client cannot access any
> subvolumes (right?) - that was the bug report, so the value of inode
> numbers in non-root subvolumes is not an issue.

Not correct.  All objects in the filesystem are fully accessible.  The
only problem is that some pairs of objects have the same inode number.
This causes some programs like 'find' and 'du' to behave differently to
expectations.  They will refuse to even look in a subvolume, because it
looks like doing so could cause an infinite loop.  The values of inode
numbers in non-root subvolumes is EXACTLY the issue.

> If export is of non-root subvolume, then why bother changing anything
> at all? Is there a need to traverse into sub-sub-volumes?
> 
> > 2/ We could add a btrfs mount option to control whether the uniquifier
> > was set or not.  This would allow the sysadmin to choose when to manage
> > any breakage.  I think this is my preference, but Josef has declared an
> > aversion to mount options.
> >
> > 3/ We could add a module parameter to nfsd to control whether the
> > uniquifier is merged in.  This again gives the sysadmin control, and it
> > can be done despite any aversion from btrfs maintainers.  But I'd need
> > to overcome any aversion from the nfsd maintainers, and I don't know how
> > strong that would be yet. (A new export option isn't really appropriate.
> > It is much more work to add an export option than the add a mount option).
> >
> 
> That is too bad, because IMO from users POV, "fsid=btrfsroot" or "cross-subvol"
> export option would have been a nice way to describe and opt-in to this new
> functionality.
> 
> But let's consider for a moment the consequences of enabling this functionality
> automatically whenever exporting a btrfs root volume without "crossmnt":
> 
> 1. Objects inside a subvol that are inaccessible(?) with current
> nfs/nfsd without
>     "crossmnt" will become accessible after enabling the feature -
> this will match
>     the user experience of accessing btrfs on the host

Not correct - as above.

> 2. The inode numbers of the newly accessible objects would not match the inode
>     numbers on the host fs (no big deal?)

Unlikely to be a problem.  Inode numbers have no meaning beyond the facts
that:
  - they are stable for the lifetime of the object
  - they are unique within a filesystem (except btrfs lies about
    filesystems)
  - they are not zero

The facts only need to be equally true on the NFS server and client..

> 3. The inode numbers of objects in a snapshot would not match the inode
>     numbers of the original (pre-snapshot) objects (acceptable tradeoff for
>     being able to access the snapshot objects without bloating /proc/mounts?)

This also should not be a problem.  Files in different snapshots are
different things that happen to share storage (like reflinks).
Comparing inode numbers between places which report different st_dev
does not fit within the meaning of inode numbers.

> 4. The inode numbers of objects in a subvol observed via this "cross-subvol"
>     export would not match the inode numbers of the same objects observed
>     via an individual subvol export

The device number would differ too, so the relative values of the inode
numbers would be irrelevant.

> 5. st_ino conflicts are possible when multiplexing subvol id and inode number.
>     overlayfs resolved those conflicts by allocating an inode number from a
>     reserved non-persistent inode range, which may cause objects to change
>     their inode number during the lifetime on the filesystem (sensible
> tradeoff?)
> 
> I think that #4 is a bit hard to swallow and #3 is borderline acceptable...
> Both and quite hard to document and to set expectations as a non-opt-in
> change of behavior when exporting btrfs root.
> 
> IMO, an nfsd module parameter will give some control and therefore is
> a must, but it won't make life easier to document and set user expectations
> when the semantics are not clearly stated in the exports table.
> 
> You claim that "A new export option isn't really appropriate."
> but your only argument is that "It is much more work to add
> an export option than the add a mount option".
> 
> With all due respect, for this particular challenge with all the
> constraints involved, this sounds like a pretty weak argument.
> 
> Surely, adding an export option is easier than slowly changing all
> userspace tools to understand subvolumes? a solution that you had
> previously brought up.
> 
> Can you elaborate some more about your aversion to a new
> export option.

Export options are bits in a 32bit word - so both user-space and kernel
need to agree on names for them.  We are currently using 18, so there is
room to grow.  It is a perfectly reasonable way to implement sensible
features.  It is, I think, a poor way to implement hacks to work around
misfeatures in filesystems.

This is the core of my dislike for adding an export option.  Using one
effectively admits that what btrfs is doing is a valid thing to do.  I
don't think it is.  I don't think we want any other filesystem developer
to think that they can emulate the behaviour because support is already
provided.

If we add any configuration to support btrfs, I would much prefer it to
be implemented in fs/btrfs, and if not, then with loud warnings that it
works around a deficiency in btrfs.
  /sys/modules/nfsd/parameters/btrfs_export_workaround

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-20  3:21           ` NeilBrown
@ 2021-08-20  6:23             ` Amir Goldstein
  0 siblings, 0 replies; 123+ messages in thread
From: Amir Goldstein @ 2021-08-20  6:23 UTC (permalink / raw)
  To: NeilBrown
  Cc: Roman Mamedov, Goffredo Baroncelli, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever, Chris Mason,
	David Sterba, Alexander Viro, linux-fsdevel,
	Linux NFS Mailing List, Linux Btrfs

On Fri, Aug 20, 2021 at 6:22 AM NeilBrown <neilb@suse.de> wrote:
>
> On Thu, 19 Aug 2021, Amir Goldstein wrote:
> > On Mon, Aug 16, 2021 at 1:21 AM NeilBrown <neilb@suse.de> wrote:
> > >
> > > There are a few ways to handle this more gracefully.
> > >
> > > 1/ We could get btrfs to hand out new filehandles as well as new inode
> > > numbers, but still accept the old filehandles.  Then we could make the
> > > inode number reported be based on the filehandle.  This would be nearly
> > > seamless but rather clumsy to code.  I'm not *very* keen on this idea,
> > > but it is worth keeping in mind.
> > >
> >
> > So objects would change their inode number after nfs inode cache is
> > evicted and while nfs filesystem is mounted. That does not sound ideal.
>
> No.  Almost all filehandle lookups happen in the context of some other
> filehandle.  If the provided context is an old-style filehandle, we
> provide an old-style filehandle for the lookup.  There is already code
> in nfsd to support this (as we have in the past changed how filesystems
> are identified).
>

I see. This is nice!
It almost sounds like "inode mapped mounts" :-)


> It would only be if the mountpoint filehandle (which is fetched without
> that context) went out of cache that inode numbers would change.  That
> would mean that the filesystem (possibly an automount) was unmounted.
> When it was remounted it could have a different device number anyway, so
> having different inode numbers would be of little consequence.
>
> >
> > But I am a bit confused about the problem.
> > If the export is of the btrfs root, then nfs client cannot access any
> > subvolumes (right?) - that was the bug report, so the value of inode
> > numbers in non-root subvolumes is not an issue.
>
> Not correct.  All objects in the filesystem are fully accessible.  The
> only problem is that some pairs of objects have the same inode number.
> This causes some programs like 'find' and 'du' to behave differently to
> expectations.  They will refuse to even look in a subvolume, because it
> looks like doing so could cause an infinite loop.  The values of inode
> numbers in non-root subvolumes is EXACTLY the issue.
>

Thanks for clarifying.

> > If export is of non-root subvolume, then why bother changing anything
> > at all? Is there a need to traverse into sub-sub-volumes?
> >
> > > 2/ We could add a btrfs mount option to control whether the uniquifier
> > > was set or not.  This would allow the sysadmin to choose when to manage
> > > any breakage.  I think this is my preference, but Josef has declared an
> > > aversion to mount options.
> > >
> > > 3/ We could add a module parameter to nfsd to control whether the
> > > uniquifier is merged in.  This again gives the sysadmin control, and it
> > > can be done despite any aversion from btrfs maintainers.  But I'd need
> > > to overcome any aversion from the nfsd maintainers, and I don't know how
> > > strong that would be yet. (A new export option isn't really appropriate.
> > > It is much more work to add an export option than the add a mount option).
> > >
> >
> > That is too bad, because IMO from users POV, "fsid=btrfsroot" or "cross-subvol"
> > export option would have been a nice way to describe and opt-in to this new
> > functionality.
> >
> > But let's consider for a moment the consequences of enabling this functionality
> > automatically whenever exporting a btrfs root volume without "crossmnt":
> >
> > 1. Objects inside a subvol that are inaccessible(?) with current
> > nfs/nfsd without
> >     "crossmnt" will become accessible after enabling the feature -
> > this will match
> >     the user experience of accessing btrfs on the host
>
> Not correct - as above.
>
> > 2. The inode numbers of the newly accessible objects would not match the inode
> >     numbers on the host fs (no big deal?)
>
> Unlikely to be a problem.  Inode numbers have no meaning beyond the facts
> that:
>   - they are stable for the lifetime of the object
>   - they are unique within a filesystem (except btrfs lies about
>     filesystems)
>   - they are not zero
>
> The facts only need to be equally true on the NFS server and client..
>
> > 3. The inode numbers of objects in a snapshot would not match the inode
> >     numbers of the original (pre-snapshot) objects (acceptable tradeoff for
> >     being able to access the snapshot objects without bloating /proc/mounts?)
>
> This also should not be a problem.  Files in different snapshots are
> different things that happen to share storage (like reflinks).
> Comparing inode numbers between places which report different st_dev
> does not fit within the meaning of inode numbers.
>
> > 4. The inode numbers of objects in a subvol observed via this "cross-subvol"
> >     export would not match the inode numbers of the same objects observed
> >     via an individual subvol export
>
> The device number would differ too, so the relative values of the inode
> numbers would be irrelevant.
>
> > 5. st_ino conflicts are possible when multiplexing subvol id and inode number.
> >     overlayfs resolved those conflicts by allocating an inode number from a
> >     reserved non-persistent inode range, which may cause objects to change
> >     their inode number during the lifetime on the filesystem (sensible
> > tradeoff?)
> >
> > I think that #4 is a bit hard to swallow and #3 is borderline acceptable...
> > Both and quite hard to document and to set expectations as a non-opt-in
> > change of behavior when exporting btrfs root.
> >
> > IMO, an nfsd module parameter will give some control and therefore is
> > a must, but it won't make life easier to document and set user expectations
> > when the semantics are not clearly stated in the exports table.
> >
> > You claim that "A new export option isn't really appropriate."
> > but your only argument is that "It is much more work to add
> > an export option than the add a mount option".
> >
> > With all due respect, for this particular challenge with all the
> > constraints involved, this sounds like a pretty weak argument.
> >
> > Surely, adding an export option is easier than slowly changing all
> > userspace tools to understand subvolumes? a solution that you had
> > previously brought up.
> >
> > Can you elaborate some more about your aversion to a new
> > export option.
>
> Export options are bits in a 32bit word - so both user-space and kernel
> need to agree on names for them.  We are currently using 18, so there is
> room to grow.  It is a perfectly reasonable way to implement sensible
> features.  It is, I think, a poor way to implement hacks to work around
> misfeatures in filesystems.
>
> This is the core of my dislike for adding an export option.  Using one
> effectively admits that what btrfs is doing is a valid thing to do.  I
> don't think it is.  I don't think we want any other filesystem developer
> to think that they can emulate the behaviour because support is already
> provided.
>
> If we add any configuration to support btrfs, I would much prefer it to
> be implemented in fs/btrfs, and if not, then with loud warnings that it
> works around a deficiency in btrfs.
>   /sys/modules/nfsd/parameters/btrfs_export_workaround
>

Thanks for clarifying.
I now understand how "hacky" the workaround in nfsd is.

Naive question: could this behavior be implemented in btrfs as you
prefer, but in a way that only serves nfsd and NEW nfs mounts for
that matter (i.e. only NEW filehandles)?

Meaning passing some hint in ->getattr() and ->iterate_shared(),
sort of like idmapped mount does for uid/gid.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-20  2:54         ` NeilBrown
@ 2021-08-22 19:29           ` Zygo Blaxell
  2021-08-23  5:51             ` NeilBrown
  2021-08-23 23:22             ` NeilBrown
  0 siblings, 2 replies; 123+ messages in thread
From: Zygo Blaxell @ 2021-08-22 19:29 UTC (permalink / raw)
  To: NeilBrown
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Fri, Aug 20, 2021 at 12:54:17PM +1000, NeilBrown wrote:
> On Thu, 19 Aug 2021, Zygo Blaxell wrote:
> > 40 bit inodes would take about 20 years to collide with 24-bit subvols--if
> > you are creating an average of 1742 inodes every second.  Also at the
> > same time you have to be creating a subvol every 37 seconds to occupy
> > the colliding 25th bit of the subvol ID.  Only the highest inode number
> > in any subvol counts--if your inode creation is spread out over several
> > different subvols, you'll need to make inodes even faster.
> > 
> > For reference, my high scores are 17 inodes per second and a subvol
> > every 595 seconds (averaged over 1 year).  Burst numbers are much higher,
> > but one has to spend some time _reading_ the files now and then.
> > 
> > I've encountered other btrfs users with two orders of magnitude higher
> > inode creation rates than mine.  They are barely squeaking under the
> > 20-year line--or they would be, if they were creating snapshots 50 times
> > faster than they do today.
> 
> I do like seeing concrete numbers, thanks.  How many of these inodes and
> subvols remain undeleted?  Supposing inode numbers were reused, how many
> bits might you need?

Number of existing inodes is filesystem size divided by average inode
size, about 30 million inodes per terabyte for build servers, give or
take an order of magnitude per project.  That does put 1 << 32 inodes in
the range of current disk sizes, which motivated the inode_cache feature.

Number of existing subvols stays below 1 << 14.  It's usually some
near-constant multiple of the filesystem age (if it is not limited more
by capacity) because it's not trivial to move a subvol structure from
one filesystem to another.

The main constraint on the product of both numbers is filesystem size.
If that limit is reached, we often see that lower subvol numbers correlate
with higher inode numbers and vice versa; otherwise both keep growing until
they hit the size limit or some user-chosen limit (e.g. "we just don't
need more than the last 300 builds online at any time").

For build and backup use cases (which both heavily use snapshots) there is
no incentive to delete snapshots other than to avoid eventually running
out of space.  There is also no incentive to increase filesystem size
to accommodate extra snapshots, as long as there is room for some minimal
useful number of snapshots, the original subvols, and some free space.

So we get snapshots in numbers that are rougly:

	min(age_of_filesystem * snapshot_creation_rate, filesystem_capacity / average_subvol_unique_data_size)

Subvol IDs are not reusable.  They are embedded in shared object ownership
metadata, and persist for some time after subvols are deleted.

> > > My preference would be for btrfs to start re-using old object-ids and
> > > root-ids, and to enforce a limit (set at mkfs or tunefs) so that the
> > > total number of bits does not exceed 64.  Unfortunately the maintainers
> > > seem reluctant to even consider this.
> > 
> > It was considered, implemented in 2011, and removed in 2020.  Rationale
> > is in commit b547a88ea5776a8092f7f122ddc20d6720528782 "btrfs: start
> > deprecation of mount option inode_cache".  It made file creation slower,
> > and consumed disk space, iops, and memory to run.  Nobody used it.
> > Newer on-disk data structure versions (free space tree, 2015) didn't
> > bother implementing inode_cache's storage requirement.
> 
> Yes, I saw that.  Providing reliable functional certainly can impact
> performance and consume disk-space.  That isn't an excuse for not doing
> it. 
> I suspect that carefully tuned code could result in typical creation
> times being unchanged, and mean creation times suffering only a tiny
> cost.  Using "max+1" when the creation rate is particularly high might
> be a reasonable part of managing costs.
> Storage cost need not be worse than the cost of tracking free blocks
> on the device.

The cost of _tracking_ free object IDs is trivial compared to the cost
of _reusing_ an object ID on btrfs.

If btrfs doesn't reuse object numbers, btrfs can append new objects
to the last partially filled leaf.  If there are shared metadata pages
(i.e. snapshots), btrfs unshares a handful of pages once, and then future
writes use densely packed new pages and delayed allocation without having
to read anything.

If btrfs reuses object numbers, the filesystem has to pack new objects
into random previously filled metadata leaf nodes, so there are a lot
of read-modify-writes scattered over old metadata pages, which spreads
the working set around and reduces cache usage efficiency (i.e. uses
more RAM).  If there are snapshots, each shared page that is modified
for the first time after the snapshot comes with two-orders-of-magnitude
worst-case write multipliers.

The two-algorithm scheme (switching from "reuse freed inode" to "max+1"
under load) would be forced into the "max+1" mode half the time by a
daily workload of alternating git checkouts and builds.  It would save
only one bit of inode namespace over the lifetime of the filesystem.

> "Nobody used it" is odd.  It implies it would have to be explicitly
> enabled, and all it would provide anyone is sane behaviour.  Who would
> imagine that to be an optional extra.

It always had to be explicitly enabled.  It was initially a workaround
for 32-bit ino_t that was limiting a few users, but ino_t got better
and the need for inode_cache went away.

NFS (particularly NFSv2) might be the use case inode_cache has been
waiting for.  btrfs has an i_version field for NFSv4, so it's not like
there's no precedent for adding features in btrfs to support NFS.

On the other hand, the cost of ino_cache gets worse with snapshots,
and the benefit in practice takes years to decades to become relevant.
Users who are exporting snapshots over NFS are likely to be especially
averse to using inode_cache.

> NeilBrown
> 
> 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-19  2:19       ` Zygo Blaxell
  2021-08-20  2:54         ` NeilBrown
@ 2021-08-23  0:57         ` Wang Yugui
  1 sibling, 0 replies; 123+ messages in thread
From: Wang Yugui @ 2021-08-23  0:57 UTC (permalink / raw)
  To: Zygo Blaxell
  Cc: NeilBrown, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

Hi,

> rsync -H and cpio's hardlink detection can be badly confused.  They will
> think distinct files with the same inode number are hardlinks.  This could
> be bad if you were making backups (though if you're making backups over
> NFS, you are probably doing something that could be done better in a
> different way).

'rysnc -x ' and 'find -mount/-xdev' will fail to work in
snapper config?
snapper is a very important user case.

Although  yet not some option like '-mount/-xdev' for '/bin/cp',
but maybe come soon.

I though the first patchset( crossmnt in nfs client) is the right way,
because in most case, subvol is a filesystem, not  a directory.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/08/23





^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH v2] BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-15 22:17       ` NeilBrown
  2021-08-19  8:01         ` Amir Goldstein
@ 2021-08-23  4:05         ` NeilBrown
  2021-08-23  8:17           ` kernel test robot
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-23  4:05 UTC (permalink / raw)
  To: Chris Mason, David Sterba, Christoph Hellwig, Josef Bacik,
	J. Bruce Fields, Chuck Lever
  Cc: Roman Mamedov, Goffredo Baroncelli, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs


BTRFS does not provide unique inode numbers across a filesystem.
It only provide unique inode numbers within a subvolume and
uses synthetic device numbers for different subvolumes to ensure
uniqueness for device+inode.

nfsd cannot use these varying synthetic device numbers.  If nfsd were to
synthesise different stable filesystem ids to give to the client, that
would cause subvolumes to appear in the mount table on the client, even
though they don't appear in the mount table on the server.  Also, NFSv3
doesn't support changing the filesystem id without a new explicit mount
on the client (this is partially supported in practice, but violates the
protocol specification and has problems in some edge cases).

So currently, the roots of all subvolumes report the same inode number
in the same filesystem to NFS clients and tools like 'find' notice that
a directory has the same identity as an ancestor, and so refuse to
enter that directory.

This patch allows btrfs (or any filesystem) to provide a 64bit number
that can be xored with the inode number to make the number more unique.
Rather than the client being certain to see duplicates, with this patch
it is possible but extremely rare.

The number that btrfs provides is a swab64() version of the subvolume
identifier.  This has most entropy in the high bits (the low bits of the
subvolume identifer), while the inode has most entropy in the low bits.
The result will always be unique within a subvolume, and will almost
always be unique across the filesystem.

If an upgrade of the NFS server caused all inode numbers in an exportfs
BTRFS filesystem to appear to the client to change, the client may not
handle this well.  The Linux client will cause any open files to become
'stale'.  If the mount point changed inode number, the whole mount would
become inaccessible.

To avoid this, an unused byte in the filehandle (fh_auth) has been
repurposed as "fh_options".  (The use of #defines make fh_flags a
problematic choice).  The new behaviour of uniquifying inode number is
only activated when this bit is set.

NFSD will only set this bit in filehandles it reports if the filehandle
of the parent (provided by the client) contains the bit, or if
 - the filehandle for the parent is not provided or is for a different
   export and
 - the filehandle refers to a BTRFS filesystem.

Thus if you have a BTRFS filesystem originally mounted from a server
without this patch, the flag will never be set and the current behaviour
will continue.  Only once you re-mount the filesystem (or the filesystem
is re-auto-mounted) will the inode numbers change.  When that happens,
it is likely that the filesystem st_dev number seen on the client will
change anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/btrfs/inode.c                |  4 ++++
 fs/nfsd/nfs3xdr.c               | 15 ++++++++++++++-
 fs/nfsd/nfs4xdr.c               |  7 ++++---
 fs/nfsd/nfsfh.c                 | 13 +++++++++++--
 fs/nfsd/nfsfh.h                 | 22 ++++++++++++++++++++++
 fs/nfsd/xdr3.h                  |  2 ++
 include/linux/stat.h            | 18 ++++++++++++++++++
 include/uapi/linux/nfsd/nfsfh.h | 18 ++++++++++++------
 8 files changed, 87 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0117d867ecf8..989fdf2032d5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9195,6 +9195,10 @@ static int btrfs_getattr(struct user_namespace *mnt_userns,
 	generic_fillattr(&init_user_ns, inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
 
+	if (BTRFS_I(inode)->root->root_key.objectid != BTRFS_FS_TREE_OBJECTID)
+		stat->ino_uniquifier =
+			swab64(BTRFS_I(inode)->root->root_key.objectid);
+
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
 	inode_bytes = inode_get_bytes(inode);
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 0a5ebc52e6a9..19d14f11f79a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -340,6 +340,7 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 {
 	struct user_namespace *userns = nfsd_user_namespace(rqstp);
 	__be32 *p;
+	u64 ino;
 	u64 fsid;
 
 	p = xdr_reserve_space(xdr, XDR_UNIT * 21);
@@ -377,7 +378,8 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 	p = xdr_encode_hyper(p, fsid);
 
 	/* fileid */
-	p = xdr_encode_hyper(p, stat->ino);
+	ino = nfsd_uniquify_ino(fhp, stat);
+	p = xdr_encode_hyper(p, ino);
 
 	p = encode_nfstime3(p, &stat->atime);
 	p = encode_nfstime3(p, &stat->mtime);
@@ -1151,6 +1153,17 @@ svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name,
 	if (xdr_stream_encode_item_present(xdr) < 0)
 		return false;
 	/* fileid */
+	if (!resp->dir_have_uniquifier) {
+		struct kstat stat;
+		if (fh_getattr(&resp->fh, &stat) == nfs_ok)
+			resp->dir_ino_uniquifier =
+				nfsd_ino_uniquifier(&resp->fh, &stat);
+		else
+			resp->dir_ino_uniquifier = 0;
+		resp->dir_have_uniquifier = true;
+	}
+	if (resp->dir_ino_uniquifier != ino)
+		ino ^= resp->dir_ino_uniquifier;
 	if (xdr_stream_encode_u64(xdr, ino) < 0)
 		return false;
 	/* name */
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 7abeccb975b2..5ed894ceebb0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3114,10 +3114,11 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 					fhp->fh_handle.fh_size);
 	}
 	if (bmval0 & FATTR4_WORD0_FILEID) {
+		u64 ino = nfsd_uniquify_ino(fhp, &stat);
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
 			goto out_resource;
-		p = xdr_encode_hyper(p, stat.ino);
+		p = xdr_encode_hyper(p, ino);
 	}
 	if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
 		p = xdr_reserve_space(xdr, 8);
@@ -3274,7 +3275,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
-                	goto out_resource;
+			goto out_resource;
 		/*
 		 * Get parent's attributes if not ignoring crossmount
 		 * and this is the root of a cross-mounted filesystem.
@@ -3284,7 +3285,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 			err = get_parent_attributes(exp, &parent_stat);
 			if (err)
 				goto out_nfserr;
-			ino = parent_stat.ino;
+			ino = nfsd_uniquify_ino(fhp, &parent_stat);
 		}
 		p = xdr_encode_hyper(p, ino);
 	}
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c475d2271f9c..e97ed957a379 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -172,7 +172,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 
 		if (--data_left < 0)
 			return error;
-		if (fh->fh_auth_type != 0)
+		if ((fh->fh_options & ~NFSD_FH_OPTION_ALL) != 0)
 			return error;
 		len = key_len(fh->fh_fsid_type) / 4;
 		if (len == 0)
@@ -569,6 +569,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 
 	struct inode * inode = d_inode(dentry);
 	dev_t ex_dev = exp_sb(exp)->s_dev;
+	u8 options = 0;
 
 	dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
 		MAJOR(ex_dev), MINOR(ex_dev),
@@ -585,6 +586,14 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	/* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
 	fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
 
+	if (ref_fh && ref_fh->fh_export == exp) {
+		options = ref_fh->fh_handle.fh_options;
+	} else {
+		/* Set options as needed */
+		if (exp->ex_path.mnt->mnt_sb->s_magic == BTRFS_SUPER_MAGIC)
+			options |= NFSD_FH_OPTION_INO_UNIQUIFY;
+	}
+
 	if (ref_fh == fhp)
 		fh_put(ref_fh);
 
@@ -615,7 +624,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	} else {
 		fhp->fh_handle.fh_size =
 			key_len(fhp->fh_handle.fh_fsid_type) + 4;
-		fhp->fh_handle.fh_auth_type = 0;
+		fhp->fh_handle.fh_options = options;
 
 		mk_fsid(fhp->fh_handle.fh_fsid_type,
 			fhp->fh_handle.fh_fsid,
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 6106697adc04..1144a98c2951 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -84,6 +84,28 @@ enum fsid_source {
 };
 extern enum fsid_source fsid_source(const struct svc_fh *fhp);
 
+enum nfsd_fh_options {
+	NFSD_FH_OPTION_INO_UNIQUIFY = 1,	/* BTRFS only */
+
+	NFSD_FH_OPTION_ALL = 1
+};
+
+static inline u64 nfsd_ino_uniquifier(const struct svc_fh *fhp,
+				      const struct kstat *stat)
+{
+	if (fhp->fh_handle.fh_options & NFSD_FH_OPTION_INO_UNIQUIFY)
+		return stat->ino_uniquifier;
+	return 0;
+}
+
+static inline u64 nfsd_uniquify_ino(const struct svc_fh *fhp,
+				    const struct kstat *stat)
+{
+	u64 u = nfsd_ino_uniquifier(fhp, stat);
+	if (u != stat->ino)
+		return stat->ino ^ u;
+	return stat->ino;
+}
 
 /*
  * This might look a little large to "inline" but in all calls except
diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
index 933008382bbe..d9b6c8314bbb 100644
--- a/fs/nfsd/xdr3.h
+++ b/fs/nfsd/xdr3.h
@@ -179,6 +179,8 @@ struct nfsd3_readdirres {
 	struct xdr_buf		dirlist;
 	struct svc_fh		scratch;
 	struct readdir_cd	common;
+	u64			dir_ino_uniquifier;
+	bool			dir_have_uniquifier;
 	unsigned int		cookie_offset;
 	struct svc_rqst *	rqstp;
 
diff --git a/include/linux/stat.h b/include/linux/stat.h
index fff27e603814..0f3f74d302f8 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -46,6 +46,24 @@ struct kstat {
 	struct timespec64 btime;			/* File creation time */
 	u64		blocks;
 	u64		mnt_id;
+	/*
+	 * BTRFS does not provide unique inode numbers within a filesystem,
+	 * depending on a synthetic 'dev' to provide uniqueness.
+	 * NFSd cannot make use of this 'dev' number so clients often see
+	 * duplicate inode numbers.
+	 * For BTRFS, 'ino' is unlikely to use the high bits until the filesystem
+	 * has created a great many inodes.
+	 * It puts another number in ino_uniquifier which:
+	 * - has most entropy in the high bits
+	 * - is different precisely when 'dev' is different
+	 * - is stable across unmount/remount
+	 * NFSd can xor this with 'ino' to get a substantially more unique
+	 * number for reporting to the client.
+	 * The ino_uniquifier for a directory can reasonably be applied
+	 * to inode numbers reported by the readdir filldir callback.
+	 * It is NOT currently exported to user-space.
+	 */
+	u64		ino_uniquifier;
 };
 
 #endif
diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h
index 427294dd56a1..59311df4b476 100644
--- a/include/uapi/linux/nfsd/nfsfh.h
+++ b/include/uapi/linux/nfsd/nfsfh.h
@@ -38,11 +38,17 @@ struct nfs_fhbase_old {
  * The file handle starts with a sequence of four-byte words.
  * The first word contains a version number (1) and three descriptor bytes
  * that tell how the remaining 3 variable length fields should be handled.
- * These three bytes are auth_type, fsid_type and fileid_type.
+ * These three bytes are options, fsid_type and fileid_type.
  *
  * All four-byte values are in host-byte-order.
  *
- * The auth_type field is deprecated and must be set to 0.
+ * The options field (previously auth_type) can be used when nfsd behaviour
+ * needs to change in a non-compatible way, usually for some specific
+ * filesystem.  Options should only be set in filehandles for filesystems which
+ * need them.
+ * Current values:
+ *   1  -  BTRFS only.  Cause stat->ino_uniquifier to be used to improve inode
+ *         number uniqueness.
  *
  * The fsid_type identifies how the filesystem (or export point) is
  *    encoded.
@@ -67,7 +73,7 @@ struct nfs_fhbase_new {
 	union {
 		struct {
 			__u8		fb_version_aux;	/* == 1, even => nfs_fhbase_old */
-			__u8		fb_auth_type_aux;
+			__u8		fb_options_aux;
 			__u8		fb_fsid_type_aux;
 			__u8		fb_fileid_type_aux;
 			__u32		fb_auth[1];
@@ -76,7 +82,7 @@ struct nfs_fhbase_new {
 		};
 		struct {
 			__u8		fb_version;	/* == 1, even => nfs_fhbase_old */
-			__u8		fb_auth_type;
+			__u8		fb_options;
 			__u8		fb_fsid_type;
 			__u8		fb_fileid_type;
 			__u32		fb_auth_flex[]; /* flexible-array member */
@@ -106,11 +112,11 @@ struct knfsd_fh {
 
 #define	fh_version		fh_base.fh_new.fb_version
 #define	fh_fsid_type		fh_base.fh_new.fb_fsid_type
-#define	fh_auth_type		fh_base.fh_new.fb_auth_type
+#define	fh_options		fh_base.fh_new.fb_options
 #define	fh_fileid_type		fh_base.fh_new.fb_fileid_type
 #define	fh_fsid			fh_base.fh_new.fb_auth_flex
 
 /* Do not use, provided for userspace compatiblity. */
-#define	fh_auth			fh_base.fh_new.fb_auth
+#define	fh_auth			fh_base.fh_new.fb_options
 
 #endif /* _UAPI_LINUX_NFSD_FH_H */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-22 19:29           ` Zygo Blaxell
@ 2021-08-23  5:51             ` NeilBrown
  2021-08-23 23:22             ` NeilBrown
  1 sibling, 0 replies; 123+ messages in thread
From: NeilBrown @ 2021-08-23  5:51 UTC (permalink / raw)
  To: Zygo Blaxell
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Mon, 23 Aug 2021, Zygo Blaxell wrote:
> 
> Subvol IDs are not reusable.  They are embedded in shared object ownership
> metadata, and persist for some time after subvols are deleted.

Hmmm...  that's interesting.  Makes some sense too.  I did wonder how
ownership across multiple snapshots was tracked.

> 
> > > > My preference would be for btrfs to start re-using old object-ids and
> > > > root-ids, and to enforce a limit (set at mkfs or tunefs) so that the
> > > > total number of bits does not exceed 64.  Unfortunately the maintainers
> > > > seem reluctant to even consider this.
> > > 
> > > It was considered, implemented in 2011, and removed in 2020.  Rationale
> > > is in commit b547a88ea5776a8092f7f122ddc20d6720528782 "btrfs: start
> > > deprecation of mount option inode_cache".  It made file creation slower,
> > > and consumed disk space, iops, and memory to run.  Nobody used it.
> > > Newer on-disk data structure versions (free space tree, 2015) didn't
> > > bother implementing inode_cache's storage requirement.
> > 
> > Yes, I saw that.  Providing reliable functional certainly can impact
> > performance and consume disk-space.  That isn't an excuse for not doing
> > it. 
> > I suspect that carefully tuned code could result in typical creation
> > times being unchanged, and mean creation times suffering only a tiny
> > cost.  Using "max+1" when the creation rate is particularly high might
> > be a reasonable part of managing costs.
> > Storage cost need not be worse than the cost of tracking free blocks
> > on the device.
> 
> The cost of _tracking_ free object IDs is trivial compared to the cost
> of _reusing_ an object ID on btrfs.

I hadn't thought of that.

> 
> If btrfs doesn't reuse object numbers, btrfs can append new objects
> to the last partially filled leaf.  If there are shared metadata pages
> (i.e. snapshots), btrfs unshares a handful of pages once, and then future
> writes use densely packed new pages and delayed allocation without having
> to read anything.
> 
> If btrfs reuses object numbers, the filesystem has to pack new objects
> into random previously filled metadata leaf nodes, so there are a lot
> of read-modify-writes scattered over old metadata pages, which spreads
> the working set around and reduces cache usage efficiency (i.e. uses
> more RAM).  If there are snapshots, each shared page that is modified
> for the first time after the snapshot comes with two-orders-of-magnitude
> worst-case write multipliers.

I don't really follow that .... but I'll take your word for it for now.

> 
> The two-algorithm scheme (switching from "reuse freed inode" to "max+1"
> under load) would be forced into the "max+1" mode half the time by a
> daily workload of alternating git checkouts and builds.  It would save
> only one bit of inode namespace over the lifetime of the filesystem.
> 
> > "Nobody used it" is odd.  It implies it would have to be explicitly
> > enabled, and all it would provide anyone is sane behaviour.  Who would
> > imagine that to be an optional extra.
> 
> It always had to be explicitly enabled.  It was initially a workaround
> for 32-bit ino_t that was limiting a few users, but ino_t got better
> and the need for inode_cache went away.
> 
> NFS (particularly NFSv2) might be the use case inode_cache has been
> waiting for.  btrfs has an i_version field for NFSv4, so it's not like
> there's no precedent for adding features in btrfs to support NFS.

NFSv2 is not worth any effort.  NFSv4 is.  NFSv3 ... some, but not a lot.

> 
> On the other hand, the cost of ino_cache gets worse with snapshots,
> and the benefit in practice takes years to decades to become relevant.
> Users who are exporting snapshots over NFS are likely to be especially
> averse to using inode_cache.

That's the real killer.  Everything will work fine for years until it
doesn't.  And once it doesn't ....  what do you do?

Thanks for lot for all this background info.  I've found it to be very
helpful for my general understanding.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH v2] BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-23  4:05         ` [PATCH v2] BTRFS/NFSD: " NeilBrown
@ 2021-08-23  8:17           ` kernel test robot
  0 siblings, 0 replies; 123+ messages in thread
From: kernel test robot @ 2021-08-23  8:17 UTC (permalink / raw)
  To: NeilBrown, Chris Mason, David Sterba, Christoph Hellwig,
	Josef Bacik, J. Bruce Fields, Chuck Lever
  Cc: clang-built-linux, kbuild-all, Roman Mamedov,
	Goffredo Baroncelli, Alexander Viro, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 5472 bytes --]

Hi NeilBrown,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on nfs/linux-next]
[also build test ERROR on hch-configfs/for-next linus/master v5.14-rc7 next-20210820]
[cannot apply to kdave/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/BTRFS-NFSD-provide-more-unique-inode-number-for-btrfs-export/20210823-120718
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: hexagon-randconfig-r045-20210822 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 79b55e5038324e61a3abf4e6a9a949c473edd858)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/e99ff00e4055532e35c592b50809761d82f87595
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/BTRFS-NFSD-provide-more-unique-inode-number-for-btrfs-export/20210823-120718
        git checkout e99ff00e4055532e35c592b50809761d82f87595
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> fs/nfsd/nfsfh.c:593:44: error: use of undeclared identifier 'BTRFS_SUPER_MAGIC'
                   if (exp->ex_path.mnt->mnt_sb->s_magic == BTRFS_SUPER_MAGIC)
                                                            ^
>> fs/nfsd/nfsfh.c:593:44: error: use of undeclared identifier 'BTRFS_SUPER_MAGIC'
>> fs/nfsd/nfsfh.c:593:44: error: use of undeclared identifier 'BTRFS_SUPER_MAGIC'
   3 errors generated.


vim +/BTRFS_SUPER_MAGIC +593 fs/nfsd/nfsfh.c

   557	
   558	__be32
   559	fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
   560		   struct svc_fh *ref_fh)
   561	{
   562		/* ref_fh is a reference file handle.
   563		 * if it is non-null and for the same filesystem, then we should compose
   564		 * a filehandle which is of the same version, where possible.
   565		 * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
   566		 * Then create a 32byte filehandle using nfs_fhbase_old
   567		 *
   568		 */
   569	
   570		struct inode * inode = d_inode(dentry);
   571		dev_t ex_dev = exp_sb(exp)->s_dev;
   572		u8 options = 0;
   573	
   574		dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
   575			MAJOR(ex_dev), MINOR(ex_dev),
   576			(long) d_inode(exp->ex_path.dentry)->i_ino,
   577			dentry,
   578			(inode ? inode->i_ino : 0));
   579	
   580		/* Choose filehandle version and fsid type based on
   581		 * the reference filehandle (if it is in the same export)
   582		 * or the export options.
   583		 */
   584		set_version_and_fsid_type(fhp, exp, ref_fh);
   585	
   586		/* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
   587		fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
   588	
   589		if (ref_fh && ref_fh->fh_export == exp) {
   590			options = ref_fh->fh_handle.fh_options;
   591		} else {
   592			/* Set options as needed */
 > 593			if (exp->ex_path.mnt->mnt_sb->s_magic == BTRFS_SUPER_MAGIC)
   594				options |= NFSD_FH_OPTION_INO_UNIQUIFY;
   595		}
   596	
   597		if (ref_fh == fhp)
   598			fh_put(ref_fh);
   599	
   600		if (fhp->fh_locked || fhp->fh_dentry) {
   601			printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n",
   602			       dentry);
   603		}
   604		if (fhp->fh_maxsize < NFS_FHSIZE)
   605			printk(KERN_ERR "fh_compose: called with maxsize %d! %pd2\n",
   606			       fhp->fh_maxsize,
   607			       dentry);
   608	
   609		fhp->fh_dentry = dget(dentry); /* our internal copy */
   610		fhp->fh_export = exp_get(exp);
   611	
   612		if (fhp->fh_handle.fh_version == 0xca) {
   613			/* old style filehandle please */
   614			memset(&fhp->fh_handle.fh_base, 0, NFS_FHSIZE);
   615			fhp->fh_handle.fh_size = NFS_FHSIZE;
   616			fhp->fh_handle.ofh_dcookie = 0xfeebbaca;
   617			fhp->fh_handle.ofh_dev =  old_encode_dev(ex_dev);
   618			fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev;
   619			fhp->fh_handle.ofh_xino =
   620				ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
   621			fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry));
   622			if (inode)
   623				_fh_update_old(dentry, exp, &fhp->fh_handle);
   624		} else {
   625			fhp->fh_handle.fh_size =
   626				key_len(fhp->fh_handle.fh_fsid_type) + 4;
   627			fhp->fh_handle.fh_options = options;
   628	
   629			mk_fsid(fhp->fh_handle.fh_fsid_type,
   630				fhp->fh_handle.fh_fsid,
   631				ex_dev,
   632				d_inode(exp->ex_path.dentry)->i_ino,
   633				exp->ex_fsid, exp->ex_uuid);
   634	
   635			if (inode)
   636				_fh_update(fhp, exp, dentry);
   637			if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
   638				fh_put(fhp);
   639				return nfserr_opnotsupp;
   640			}
   641		}
   642	
   643		return 0;
   644	}
   645	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29470 bytes --]

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-22 19:29           ` Zygo Blaxell
  2021-08-23  5:51             ` NeilBrown
@ 2021-08-23 23:22             ` NeilBrown
  2021-08-25  2:06               ` Zygo Blaxell
  1 sibling, 1 reply; 123+ messages in thread
From: NeilBrown @ 2021-08-23 23:22 UTC (permalink / raw)
  To: Zygo Blaxell
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Mon, 23 Aug 2021, Zygo Blaxell wrote:
...
> 
> Subvol IDs are not reusable.  They are embedded in shared object ownership
> metadata, and persist for some time after subvols are deleted.
...
> 
> The cost of _tracking_ free object IDs is trivial compared to the cost
> of _reusing_ an object ID on btrfs.

One possible approach to these two objections is to decouple inode
numbers from object ids.
The inode number becomes just another piece of metadata stored in the
inode.
struct btrfs_inode_item has four spare u64s, so we could use one of
those.
struct btrfs_dir_item would need to store the inode number too.  What
is location.offset used for?  Would a diritem ever point to a non-zero
offset?  Could the 'offset' be used to store the inode number?

This could even be added to existing filesystems I think.  It might not
be easy to re-use inode numbers smaller than the largest at the time the
extension was added, but newly added inode numbers could be reused after
they were deleted.

Just a thought...

NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export
  2021-08-23 23:22             ` NeilBrown
@ 2021-08-25  2:06               ` Zygo Blaxell
  0 siblings, 0 replies; 123+ messages in thread
From: Zygo Blaxell @ 2021-08-25  2:06 UTC (permalink / raw)
  To: NeilBrown
  Cc: Wang Yugui, Christoph Hellwig, Josef Bacik, J. Bruce Fields,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs, linux-btrfs

On Tue, Aug 24, 2021 at 09:22:05AM +1000, NeilBrown wrote:
> On Mon, 23 Aug 2021, Zygo Blaxell wrote:
> ...
> > 
> > Subvol IDs are not reusable.  They are embedded in shared object ownership
> > metadata, and persist for some time after subvols are deleted.
> ...
> > 
> > The cost of _tracking_ free object IDs is trivial compared to the cost
> > of _reusing_ an object ID on btrfs.
> 
> One possible approach to these two objections is to decouple inode
> numbers from object ids.

This would be reasonable for subvol IDs (I thought of it earlier in this
thread, but didn't mention it because I wasn't going to be the first to
open that worm can ;).

There aren't very many subvol IDs and they're not used as frequently
as inodes, so a lookup table to remap them to smaller numbers to save
st_ino bit-space wouldn't be unreasonably expensive.  If we stop right
here and use the [some_zeros:reversed_subvol:inode] bit-packing scheme
you proposed for NFS, that seems like a reasonable plan.  It would have
48 bits of usable inode number space, ~440000 file creates per second
for 20 years with up to 65535 snapshots, the same number of bits that
ZFS has in its inodes.

Once that subvol ID mapping tree exists, it could also map subvol inode
numbers to globally unique numbers.  Each tree item would contain a map of
[subvol_inode1..subvol_inode2] that maps the inode numbers in the subvol
into the global inode number space at [global_inode1..global_inode2].
When a snapshot is created, the snapshot gets a copy of all the origin
subvol's inode ranges, but with newly allocated base offsets.  If the
original subvol needs new inodes, it gets a new chunk from the global
inode allocator.  If the snapshot subvol needs new inodes, it gets a
different new chunk from the global allocator.  The minimum chunk might
be a million inodes or so to avoid having to allocate new chunks all the
time, but not so high to make the code completely untested (or testers
just set the minchunk to 1000 inodes).

The question I have (and why I didn't propose this earlier) is whether
this scheme is any real improvement over dividing the subvol:inode space
by bit packing.  If you have one subvol that has 3 billion existing inodes
in it, every snapshot of that subvol is going to burn up roughly 2^-32 of
the available globally unique inode numbers.  If we burn 3 billion inodes
instead of 4 billion per subvol, it only gets 25% more lifespan for the
filesystem, and the allocation of unique inode spaces and tracking inode
space usage will add cost to every single file creation and snapshot
operation.  If your oldest/biggest subvol only has a million inodes in
it, all of the above is pure cost:  you can create billions of snapshots,
never repeat any object IDs, and never worry about running out.

I'd want to see cost/benefit simulations of:

	this plan,

	the simpler but less efficient bit-packing plan,

	'cp -a --reflink' to a new subvol and start over every 20 years
	when inodes run out,

	and online garbage-collection/renumbering schemes that allow
	users to schedule the inode renumbering costs in overnight
	batches instead of on every inode create.

> The inode number becomes just another piece of metadata stored in the
> inode.
> struct btrfs_inode_item has four spare u64s, so we could use one of
> those.
> struct btrfs_dir_item would need to store the inode number too.  What
> is location.offset used for?  Would a diritem ever point to a non-zero
> offset?  Could the 'offset' be used to store the inode number?

Offset is used to identify subvol roots at the moment, but so far that
means only values 0 and UINT64_MAX are used.  It seems possible to treat
all other values as inode numbers.  Don't quote me on that--I'm not an
expert on this structure.

> This could even be added to existing filesystems I think.  It might not
> be easy to re-use inode numbers smaller than the largest at the time the
> extension was added, but newly added inode numbers could be reused after
> they were deleted.

We'd need a structure to track reusable inode numbers and it would have to
be kept up to date to work, so this feature would necessarily come with an
incompat bit.  Whether you borrow bits from existing structures or make
extended new structures doesn't matter at that point, though obviously
for something as common as inodes it would be bad to make them bigger.

Some of the btrfs userspace API uses inode numbers, but unless I missed
something, it could all be converted to use object numbers directly
instead.

> Just a thought...
> 
> NeilBrown

^ permalink raw reply	[flat|nested] 123+ messages in thread

end of thread, other threads:[~2021-08-25  2:07 UTC | newest]

Thread overview: 123+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27 22:37 [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly NeilBrown
2021-07-27 22:37 ` [PATCH 07/11] exportfs: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 10:13   ` Amir Goldstein
2021-07-29  0:28     ` NeilBrown
2021-07-29  5:27       ` Amir Goldstein
2021-08-06  7:52         ` Miklos Szeredi
2021-08-06  8:08           ` Amir Goldstein
2021-08-06  8:18             ` Miklos Szeredi
2021-07-28 19:17   ` J. Bruce Fields
2021-07-28 22:25     ` NeilBrown
2021-07-27 22:37 ` [PATCH 04/11] VFS: export lookup_mnt() NeilBrown
2021-07-30  0:31   ` Al Viro
2021-07-30  5:33     ` NeilBrown
2021-07-27 22:37 ` [PATCH 01/11] VFS: show correct dev num in mountinfo NeilBrown
2021-07-30  0:25   ` Al Viro
2021-07-30  5:28     ` NeilBrown
2021-07-30  5:54       ` Miklos Szeredi
2021-07-30  6:13         ` NeilBrown
2021-07-30  7:18           ` Miklos Szeredi
2021-07-30  7:33             ` NeilBrown
2021-07-30  7:59               ` Miklos Szeredi
2021-08-02  4:18                 ` A Third perspective on BTRFS nfsd subvol dev/inode number issues NeilBrown
2021-08-02  5:25                   ` Al Viro
2021-08-02  5:40                     ` NeilBrown
2021-08-02  7:54                       ` Amir Goldstein
2021-08-02 13:53                         ` Josef Bacik
2021-08-03 22:29                           ` Qu Wenruo
2021-08-02 14:47                         ` Frank Filz
2021-08-02 21:24                         ` NeilBrown
2021-08-02  7:15                   ` Martin Steigerwald
2021-08-02 21:40                     ` NeilBrown
2021-08-02 12:39                   ` J. Bruce Fields
2021-08-02 20:32                     ` Patrick Goetz
2021-08-02 20:41                       ` J. Bruce Fields
2021-08-02 21:10                     ` NeilBrown
2021-08-02 21:50                       ` J. Bruce Fields
2021-08-02 21:59                         ` NeilBrown
2021-08-02 22:14                           ` J. Bruce Fields
2021-08-02 22:36                             ` NeilBrown
2021-08-03  0:15                               ` J. Bruce Fields
2021-07-27 22:37 ` [PATCH 03/11] VFS: pass lookup_flags into follow_down() NeilBrown
2021-07-27 22:37 ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots NeilBrown
2021-07-28  8:37   ` kernel test robot
2021-07-28  8:37   ` [RFC PATCH] btrfs: btrfs_mountpoint_expiry_timeout can be static kernel test robot
2021-07-28 13:12   ` [PATCH 11/11] btrfs: use automount to bind-mount all subvol roots Christian Brauner
2021-07-29  0:43     ` NeilBrown
2021-07-29 14:38       ` Christian Brauner
2021-07-31  6:25   ` [btrfs] 5874902268: xfstests.btrfs.202.fail kernel test robot
2021-07-27 22:37 ` [PATCH 06/11] nfsd: include a vfsmount in struct svc_fh NeilBrown
2021-07-27 22:37 ` [PATCH 10/11] btrfs: introduce mapping function from location to inum NeilBrown
2021-07-27 22:37 ` [PATCH 02/11] VFS: allow d_automount to create in-place bind-mount NeilBrown
2021-07-27 22:37 ` [PATCH 09/11] nfsd: Allow filehandle lookup to cross internal mount points NeilBrown
2021-07-28 19:15   ` J. Bruce Fields
2021-07-28 22:29     ` NeilBrown
2021-07-30  0:42   ` Al Viro
2021-07-30  5:43     ` NeilBrown
2021-07-27 22:37 ` [PATCH 08/11] nfsd: change get_parent_attributes() to nfsd_get_mounted_on() NeilBrown
2021-07-27 22:37 ` [PATCH 05/11] VFS: new function: mount_is_internal() NeilBrown
2021-07-28  2:16   ` Al Viro
2021-07-28  3:32     ` NeilBrown
2021-07-30  0:34       ` Al Viro
2021-07-28  2:19 ` [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly Al Viro
2021-07-28  4:58 ` Wang Yugui
2021-07-28  6:04   ` Wang Yugui
2021-07-28  7:01     ` NeilBrown
2021-07-28 12:26       ` Neal Gompa
2021-07-28 19:14         ` J. Bruce Fields
2021-07-29  1:29           ` Zygo Blaxell
2021-07-29  1:43             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-28 22:50         ` NeilBrown
2021-07-29  2:37           ` Zygo Blaxell
2021-07-29  3:36             ` NeilBrown
2021-07-29 23:20               ` Zygo Blaxell
2021-07-30  2:36                 ` NeilBrown
2021-07-30  5:25                   ` Qu Wenruo
2021-07-30  5:31                     ` Qu Wenruo
2021-07-30  5:53                       ` Amir Goldstein
2021-07-30  6:00                       ` NeilBrown
2021-07-30  6:09                         ` Qu Wenruo
2021-07-30  5:58                     ` NeilBrown
2021-07-30  6:23                       ` Qu Wenruo
2021-07-30  6:53                         ` NeilBrown
2021-07-30  7:09                           ` Qu Wenruo
2021-07-30 18:15                             ` Zygo Blaxell
2021-07-30 15:17                         ` J. Bruce Fields
2021-07-30 15:48                           ` Josef Bacik
2021-07-30 16:25                             ` Forza
2021-07-30 17:43                             ` Zygo Blaxell
2021-07-30  5:28                   ` Amir Goldstein
2021-07-28 13:43       ` g.btrfs
2021-07-29  1:39         ` NeilBrown
2021-07-29  9:28           ` Graham Cobb
2021-07-28  7:06   ` NeilBrown
2021-07-28  9:36     ` Wang Yugui
2021-07-28 19:35 ` J. Bruce Fields
2021-07-28 21:30   ` Josef Bacik
2021-07-30  0:13     ` Al Viro
2021-07-30  6:08       ` NeilBrown
2021-08-13  1:45 ` [PATCH] VFS/BTRFS/NFSD: provide more unique inode number for btrfs export NeilBrown
2021-08-13 14:55   ` Josef Bacik
2021-08-15  7:39   ` Goffredo Baroncelli
2021-08-15 19:35     ` Roman Mamedov
2021-08-15 21:03       ` Goffredo Baroncelli
2021-08-15 21:53         ` NeilBrown
2021-08-17 19:34           ` Goffredo Baroncelli
2021-08-17 21:39             ` NeilBrown
2021-08-18 17:24               ` Goffredo Baroncelli
2021-08-15 22:17       ` NeilBrown
2021-08-19  8:01         ` Amir Goldstein
2021-08-20  3:21           ` NeilBrown
2021-08-20  6:23             ` Amir Goldstein
2021-08-23  4:05         ` [PATCH v2] BTRFS/NFSD: " NeilBrown
2021-08-23  8:17           ` kernel test robot
2021-08-18 14:54   ` [PATCH] VFS/BTRFS/NFSD: " Wang Yugui
2021-08-18 21:46     ` NeilBrown
2021-08-19  2:19       ` Zygo Blaxell
2021-08-20  2:54         ` NeilBrown
2021-08-22 19:29           ` Zygo Blaxell
2021-08-23  5:51             ` NeilBrown
2021-08-23 23:22             ` NeilBrown
2021-08-25  2:06               ` Zygo Blaxell
2021-08-23  0:57         ` Wang Yugui

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.