Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly

From: Wang Yugui <wangyugui@e16-tech.com>
To: NeilBrown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
Date: Wed, 28 Jul 2021 12:58:20 +0800
Message-ID: <20210728125819.6E52.409509F4@e16-tech.com>
In-Reply-To: <162742539595.32498.13687924366155737575.stgit@noble.brown>

[-- Attachment #1: Type: text/plain, Size: 4396 bytes --]

Hi,

Do we no longer need the dummy inode (BTRFS_FIRST_FREE_OBJECTID - 1) in this
patch series?

I tried to backport it to 5.10.x, but it did not work.
The backport needed no big modifications, and all the modified patches
are attached.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/07/28

> There are long-standing problems with btrfs subvols, particularly in
> relation to whether and how they are exposed in the mount table.
> 
>  - /proc/self/mountinfo reports the major:minor device number for each
>     filesystem and when a btrfs subvol is explicitly mounted, the number
>     reported is wrong - it does not match what stat() reports for the
>     mountpoint.
> 
>  - when subvols are not explicitly mounted, they don't appear in
>    mountinfo at all.
> 
> Consequences include that a tool which uses stat() to find the dev of the
> filesystem, then searches mountinfo for that filesystem, will not find
> it.
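
For illustration, a minimal userspace sketch of the lookup such a tool
performs. The helper names mountinfo_dev() and find_mount_by_dev() are
hypothetical, not taken from any real tool. The sketch parses the third
field of each /proc/self/mountinfo line as major:minor and compares it
with st_dev. For an explicitly mounted btrfs subvol, stat() reports the
subvol's anon_dev while mountinfo reports a different number, so this
search finds nothing.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/sysmacros.h> /* makedev() on glibc */

/* Parse the third field ("major:minor") of one mountinfo line. */
static int mountinfo_dev(const char *line, dev_t *dev)
{
	unsigned int maj, min;

	if (sscanf(line, "%*d %*d %u:%u", &maj, &min) != 2)
		return -1;
	*dev = makedev(maj, min);
	return 0;
}

/*
 * Scan an open mountinfo stream for the first entry whose device number
 * equals @dev (typically st_dev from stat()) and copy its mount point,
 * the fifth field, into @mnt.  Returns 0 on a match, -1 otherwise.
 */
static int find_mount_by_dev(FILE *mountinfo, dev_t dev, char *mnt, size_t len)
{
	char line[4096], point[4096];
	dev_t d;

	while (fgets(line, sizeof(line), mountinfo)) {
		if (mountinfo_dev(line, &d) != 0 || d != dev)
			continue;
		if (sscanf(line, "%*d %*d %*s %*s %4095s", point) != 1)
			continue;
		snprintf(mnt, len, "%s", point);
		return 0;
	}
	return -1;
}
```

A caller would pass st.st_dev from stat(path, &st) and a stream from
fopen("/proc/self/mountinfo", "r"). The sketch ignores the octal escapes
(such as \040 for a space) that mountinfo uses in path fields.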
> 
> Some tools (e.g. findmnt) appear to have been enhanced to cope with this
> strangeness, but it would be best to make btrfs behave more normally.
> 
>   - nfsd cannot currently see the transition to a subvol, so it reports the
>     main volume and all subvols to the client as being in the same
>     filesystem.  As inode numbers are not unique across all subvols,
>     this can confuse clients.  In particular, 'find' is likely to report a
>     loop.
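
To make the loop report concrete: a tree walker such as find roughly
remembers the (st_dev, st_ino) pair of each directory between the
starting point and the current one, and treats a match as a cycle. Every
btrfs subvol root has inode number 256 (BTRFS_FIRST_FREE_OBJECTID), so if
nfsd reports one fsid for all subvols, a subvol root below the top-level
root (also 256) looks exactly like a loop. This is a simplified model;
struct ancestor and find_loop() are hypothetical names.

```c
#include <stddef.h>
#include <sys/types.h>

/* One directory between the starting point and the current directory. */
struct ancestor {
	dev_t dev;
	ino_t ino;
};

/*
 * Return the depth of the ancestor whose (st_dev, st_ino) pair equals
 * that of the directory about to be entered, or -1 if there is none.
 * A match is what a tree walker treats as a file system loop.
 */
static int find_loop(const struct ancestor *chain, size_t depth,
		     dev_t dev, ino_t ino)
{
	for (size_t i = 0; i < depth; i++)
		if (chain[i].dev == dev && chain[i].ino == ino)
			return (int)i;
	return -1;
}
```

With a distinct fsid per subvol, which the nfsd patches in this series
arrange, the pair differs and the false loop disappears.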
> 
> subvols can be made to appear in mountinfo using automounts.  However
> nfsd does not cope well with automounts.  It assumes all filesystems to
> be exported are already mounted.  So adding automounts to btrfs would
> break nfsd.
> 
> We can enhance nfsd to understand that some automounts can be managed.
> "Internal mounts", where a filesystem provides an automount point and
> mounts its own directories, can be handled differently by nfsd.
> 
> This series addresses all these issues.  After a few enhancements to the
> VFS to provide needed support, they enhance exportfs and nfsd to cope
> with the concept of internal mounts, and then enhance btrfs to provide
> them.
> 
> The NFSv3 support is incomplete.  I'm not sure we can make it work
> "perfectly".  A normal nfsv3 mount seems to work well enough, but if
> mounted with '-o noac', it loses track of the mounted-on inode number
> and complains about inode numbers changing.
> 
> My basic test for these is to mount a btrfs filesystem which contains
> subvols, nfs-export it and mount it with nfsv3 and nfsv4, then run
> 'find' in each of the filesystems and check the contents of
> /proc/self/mountinfo.
> 
> The first patch simply fixes the dev number in mountinfo and could
> possibly be tagged for -stable.
> 
> NeilBrown
> 
> ---
> 
> NeilBrown (11):
>       VFS: show correct dev num in mountinfo
>       VFS: allow d_automount to create in-place bind-mount.
>       VFS: pass lookup_flags into follow_down()
>       VFS: export lookup_mnt()
>       VFS: new function: mount_is_internal()
>       nfsd: include a vfsmount in struct svc_fh
>       exportfs: Allow filehandle lookup to cross internal mount points.
>       nfsd: change get_parent_attributes() to nfsd_get_mounted_on()
>       nfsd: Allow filehandle lookup to cross internal mount points.
>       btrfs: introduce mapping function from location to inum
>       btrfs: use automount to bind-mount all subvol roots.
> 
> 
>  fs/btrfs/btrfs_inode.h   |  12 +++
>  fs/btrfs/inode.c         | 111 ++++++++++++++++++++++++++-
>  fs/btrfs/super.c         |   1 +
>  fs/exportfs/expfs.c      | 100 ++++++++++++++++++++----
>  fs/fhandle.c             |   2 +-
>  fs/internal.h            |   1 -
>  fs/namei.c               |   6 +-
>  fs/namespace.c           |  32 +++++++-
>  fs/nfsd/export.c         |   4 +-
>  fs/nfsd/nfs3xdr.c        |  40 +++++++---
>  fs/nfsd/nfs4proc.c       |   9 ++-
>  fs/nfsd/nfs4xdr.c        | 106 ++++++++++++-------------
>  fs/nfsd/nfsfh.c          |  44 +++++++----
>  fs/nfsd/nfsfh.h          |   3 +-
>  fs/nfsd/nfsproc.c        |   5 +-
>  fs/nfsd/vfs.c            | 162 +++++++++++++++++++++++----------------
>  fs/nfsd/vfs.h            |  12 +--
>  fs/nfsd/xdr4.h           |   2 +-
>  fs/overlayfs/namei.c     |   5 +-
>  fs/xfs/xfs_ioctl.c       |  12 ++-
>  include/linux/exportfs.h |   4 +-
>  include/linux/mount.h    |   4 +
>  include/linux/namei.h    |   2 +-
>  23 files changed, 490 insertions(+), 189 deletions(-)
> 


[-- Attachment #2: 0009-nfsd-Allow-filehandle-lookup-to-cross-internal-mount.patch --]
[-- Type: application/octet-stream, Size: 6507 bytes --]

From af944383d835860cf9ebd7859aadbfbc4a5c295c Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] nfsd: Allow filehandle lookup to cross internal mount points.

Enhance nfsd to detect internal mounts and to cross them without
requiring a new export.

Also ensure the fsid reported is different for different submounts.  We
do this by xoring in the ino of the mounted-on directory.  This makes
sense for btrfs at least.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3xdr.c | 28 +++++++++++++++++++++-------
 fs/nfsd/nfs4xdr.c | 34 +++++++++++++++++++++++-----------
 fs/nfsd/nfsfh.c   |  8 +++++++-
 fs/nfsd/vfs.c     | 11 +++++++++--
 4 files changed, 60 insertions(+), 21 deletions(-)
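
A userspace sketch of the NFSv3 fsid arithmetic described in the commit
message, for FSIDSOURCE_UUID: the two 64-bit halves of the export UUID
are XORed together, and for a filehandle whose fh_mnt is not the
export's own mount, the mounted-on inode number is XORed in as well.
sketch_fsid() is a hypothetical name; passing 0 stands for "same mount
as the export", since XOR with 0 changes nothing.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fold a 16-byte export UUID into a 64-bit fsid, then XOR in the inode
 * number of the directory the submount is mounted on (0 if none).
 */
static uint64_t sketch_fsid(const unsigned char uuid[16],
			    uint64_t mounted_on_ino)
{
	uint64_t lo, hi;

	memcpy(&lo, uuid, 8);     /* ((u64 *)ex_uuid)[0], host byte order */
	memcpy(&hi, uuid + 8, 8); /* ((u64 *)ex_uuid)[1] */
	return lo ^ hi ^ mounted_on_ino;
}
```

Two subvols mounted on different directories of the same export thus get
fsids that differ exactly by the XOR of their mounted-on inode numbers.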

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 583a228..5e77442 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -156,6 +156,8 @@ static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp)
 	case FSIDSOURCE_UUID:
 		f = ((u64*)fhp->fh_export->ex_uuid)[0];
 		f ^= ((u64*)fhp->fh_export->ex_uuid)[1];
+		if (fhp->fh_mnt != fhp->fh_export->ex_path.mnt)
+			f ^= nfsd_get_mounted_on(fhp->fh_mnt);
 		p = xdr_encode_hyper(p, f);
 		break;
 	}
@@ -859,8 +861,8 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 	__be32 rv = nfserr_noent;
 
 	dparent = cd->fh.fh_dentry;
-	exp  = cd->fh.fh_export;
-	child.mnt = cd->fh.fh_mnt;
+	exp  = exp_get(cd->fh.fh_export);
+	child.mnt = mntget(cd->fh.fh_mnt);
 
 	if (isdotent(name, namlen)) {
 		if (namlen == 2) {
@@ -1112,15 +1114,27 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 			child.dentry = dget(dparent);
 	} else
 		child.dentry = lookup_positive_unlocked(name, dparent, namlen);
-	if (IS_ERR(child.dentry))
+	if (IS_ERR(child.dentry)) {
+		mntput(child.mnt);
+		exp_put(exp);
 		return rv;
-	if (d_mountpoint(child.dentry))
-		goto out;
-	if (child.dentry->d_inode->i_ino != ino)
+	}
+	/* If child is a mountpoint, then we want to expose the fact
+	 * so client can create a mountpoint.  If not, then a different
+	 * ino number probably means a race with rename, so avoid providing
+	 * too much detail.
+	 */
+	if (nfsd_mountpoint(child.dentry, exp)) {
+		int err;
+		err = nfsd_cross_mnt(cd->rqstp, &child, &exp);
+		if (err)
+			goto out;
+	} else if (child.dentry->d_inode->i_ino != ino)
 		goto out;
 	rv = fh_compose(fhp, exp, &child, &cd->fh);
 out:
-	dput(child.dentry);
+	path_put(&child);
+	exp_put(exp);
 	return rv;
 }
 
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d5683b6a74b2..4dbc99ed2c8b 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2817,6 +2817,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 	struct kstat stat;
 	struct svc_fh *tempfh = NULL;
 	struct kstatfs statfs;
+	u64 mounted_on_ino;
+	u64 sub_fsid;
 	__be32 *p;
 	int starting_len = xdr->buf->len;
 	int attrlen_offset;
@@ -2871,6 +2873,24 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 			goto out;
 		fhp = tempfh;
 	}
+	if ((bmval0 & FATTR4_WORD0_FSID) ||
+	    (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID)) {
+		mounted_on_ino = stat.ino;
+		sub_fsid = 0;
+		/*
+		 * The inode number that the current mnt is mounted on is
+		 * used for MOUNTED_ON_FILEID if we are at the root,
+		 * and for sub_fsid if mnt is not the export mnt.
+		 */
+		if (ignore_crossmnt == 0) {
+			u64 moi = nfsd_get_mounted_on(mnt);
+
+			if (dentry == mnt->mnt_root && moi)
+				mounted_on_ino = moi;
+			if (mnt != exp->ex_path.mnt)
+				sub_fsid = moi;
+		}
+	}
 	if (bmval0 & FATTR4_WORD0_ACL) {
 		err = nfsd4_get_nfs4_acl(rqstp, dentry, &acl);
 		if (err == -EOPNOTSUPP)
@@ -3008,6 +3028,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		case FSIDSOURCE_UUID:
 			p = xdr_encode_opaque_fixed(p, exp->ex_uuid,
 								EX_UUID_LEN);
+			if (mnt != exp->ex_path.mnt)
+				*(u64*)(p-2) ^= sub_fsid;
 			break;
 		}
 	}
@@ -3253,20 +3275,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		*p++ = cpu_to_be32(stat.mtime.tv_nsec);
 	}
 	if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
-		u64 ino;
-
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
 			goto out_resource;
-		/*
-		 * Get parent's attributes if not ignoring crossmount
-		 * and this is the root of a cross-mounted filesystem.
-		 */
-		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root)
-			ino = nfsd_get_mounted_on(mnt);
-		if (!ino)
-			ino = stat.ino;
-		p = xdr_encode_hyper(p, ino);
+		p = xdr_encode_hyper(p, mounted_on_ino);
 	}
 #ifdef CONFIG_NFSD_PNFS
 	if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 4023046f63e2..4b53838bca89 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -9,7 +9,7 @@
  */
 
 #include <linux/exportfs.h>
-
+#include <linux/namei.h>
 #include <linux/sunrpc/svcauth_gss.h>
 #include "nfsd.h"
 #include "vfs.h"
@@ -277,6 +277,12 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 		if (IS_ERR_OR_NULL(dentry))
 			trace_nfsd_set_fh_dentry_badhandle(rqstp, fhp,
 					dentry ?  PTR_ERR(dentry) : -ESTALE);
+		else if (nfsd_mountpoint(dentry, exp)) {
+			struct path path = { .mnt = mnt, .dentry = dentry };
+			follow_down(&path, LOOKUP_AUTOMOUNT);
+			mnt = path.mnt;
+			dentry = path.dentry;
+		}
 	}
 	if (dentry == NULL)
 		goto out;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index baa12ac36ece..22523e1cd478 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -64,7 +64,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
 			    .dentry = dget(path_parent->dentry)};
 	int err = 0;
 
-	err = follow_down(&path, 0);
+	err = follow_down(&path, LOOKUP_AUTOMOUNT);
 	if (err < 0)
 		goto out;
 	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
@@ -73,6 +73,13 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
 		path_put(&path);
 		goto out;
 	}
+	if (mount_is_internal(path.mnt)) {
+		/* Use the new path, but don't look for a new export */
+		/* FIXME should I check NOHIDE in this case?? */
+		path_put(path_parent);
+		*path_parent = path;
+		goto out;
+	}
 
 	exp2 = rqst_exp_get_by_name(rqstp, &path);
 	if (IS_ERR(exp2)) {
@@ -157,7 +164,7 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 		return 1;
 	if (nfsd4_is_junction(dentry))
 		return 1;
-	if (d_mountpoint(dentry))
+	if (d_managed(dentry))
 		/*
 		 * Might only be a mountpoint in a different namespace,
 		 * but we need to check.
-- 
2.32.0


[-- Attachment #3: 0011-btrfs-use-automount-to-bind-mount-all-subvol-roots.patch --]
[-- Type: application/octet-stream, Size: 6484 bytes --]

From e818a147155d2d9b66b986e4617455fd6a1454aa Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] btrfs: use automount to bind-mount all subvol roots.

All subvol roots are now marked as automounts.  If the d_automount()
function determines that the dentry is not the root of the vfsmount, it
creates a simple loop-back mount of the dentry onto itself.  If it
determines that it IS the root of the vfsmount, it returns -EISDIR so
that no further automounting is attempted.

btrfs_getattr pays special attention to these automount dentries.
If it is NOT the root of the vfsmount:
 - the ->dev is reported as that for the rest of the vfsmount
 - the ->ino is reported as the subvol objectid, suitably transformed
   to avoid collision.

This way the same inode appears to be different depending on which mount
it is in.

Automounted vfsmounts are kept on a list and time out after 500 to 1000
seconds of last use.  This is configurable via a module parameter.
The tracking and timeout of automounts is copied from NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/btrfs/btrfs_inode.h |   2 +
 fs/btrfs/inode.c       | 108 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/super.c       |   1 +
 3 files changed, 111 insertions(+)
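
The "500 to 1000 seconds" range follows from how mark_mounts_for_expiry()
works. Each run of the delayed work sets the expiry mark on every mount on
the list, and expires a mount only if the mark was already set by the
previous run and the mount is not busy. Any use of the mount (mntput())
clears the mark. So a mount is unmounted between one and two timeout
periods after its last use. This is a hypothetical userspace toy model:
struct toy_mount, toy_use() and toy_expiry_pass() stand in for the
kernel's mount, mntput() and mark_mounts_for_expiry(), ignoring locking
and mount propagation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one automounted vfsmount on btrfs_automount_list. */
struct toy_mount {
	bool expiry_mark; /* models mnt_expiry_mark */
	bool busy;        /* models open files or a cwd inside the mount */
	bool expired;
};

/* Any use of the mount (mntput() in the kernel) clears the mark. */
static void toy_use(struct toy_mount *m)
{
	m->expiry_mark = false;
}

/*
 * One run of the delayed work: every live mount gets marked; one that
 * was already marked by the previous run and is not busy expires.
 */
static void toy_expiry_pass(struct toy_mount *mounts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_mount *m = &mounts[i];
		bool was_marked = m->expiry_mark;

		if (m->expired)
			continue;
		m->expiry_mark = true;
		if (was_marked && !m->busy)
			m->expired = true;
	}
}
```

A mount used just before a run is marked by that run and expired by the
next one (about 500s); a mount used just after a run survives one more
period (about 1000s).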

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index a4b5f38196e6..f03056cacc4a 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -387,4 +387,6 @@ static inline void btrfs_print_data_csum_error(struct btrfs_inode *inode,
 			mirror_num);
 }
 
+void btrfs_release_automount_timer(void);
+
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 25db806ca68a..809b97defafe 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -31,6 +31,8 @@
 #include <linux/migrate.h>
 #include <linux/sched/mm.h>
 #include <linux/iomap.h>
+#include <linux/fs_context.h>
+#include <linux/mount.h>
 #include <asm/unaligned.h>
 #include "misc.h"
 #include "ctree.h"
@@ -5782,6 +5783,8 @@ static int btrfs_init_locked_inode(struct inode *inode, void *p)
 	struct btrfs_iget_args *args = p;
 
 	inode->i_ino = args->ino;
+	if (args->ino == BTRFS_FIRST_FREE_OBJECTID)
+		inode->i_flags |= S_AUTOMOUNT;
 	BTRFS_I(inode)->location.objectid = args->ino;
 	BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY;
 	BTRFS_I(inode)->location.offset = 0;
@@ -5985,6 +5988,101 @@ static int btrfs_dentry_delete(const struct dentry *dentry)
 	return 0;
 }
 
+static void btrfs_expire_automounts(struct work_struct *work);
+static LIST_HEAD(btrfs_automount_list);
+static DECLARE_DELAYED_WORK(btrfs_automount_task, btrfs_expire_automounts);
+int btrfs_mountpoint_expiry_timeout = 500 * HZ;
+static void btrfs_expire_automounts(struct work_struct *work)
+{
+	struct list_head *list = &btrfs_automount_list;
+	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
+
+	mark_mounts_for_expiry(list);
+	if (!list_empty(list) && timeout > 0)
+		schedule_delayed_work(&btrfs_automount_task, timeout);
+}
+
+void btrfs_release_automount_timer(void)
+{
+	if (list_empty(&btrfs_automount_list))
+		cancel_delayed_work(&btrfs_automount_task);
+}
+
+static struct vfsmount *btrfs_automount(struct path *path)
+{
+	struct fs_context fc;
+	struct vfsmount *mnt;
+	int timeout = READ_ONCE(btrfs_mountpoint_expiry_timeout);
+
+	if (path->dentry == path->mnt->mnt_root)
+		/* dentry is root of the vfsmount,
+		 * so skip automount processing
+		 */
+		return ERR_PTR(-EISDIR);
+	/* Create a bind-mount to expose the subvol in the mount table */
+	fc.root = path->dentry;
+	fc.sb_flags = 0;
+	fc.source = "btrfs-automount";
+	mnt = vfs_create_mount(&fc);
+	if (IS_ERR(mnt))
+		return mnt;
+	mntget(mnt);
+	mnt_set_expiry(mnt, &btrfs_automount_list);
+	if (timeout > 0)
+		schedule_delayed_work(&btrfs_automount_task, timeout);
+	return mnt;
+}
+
+static int param_set_btrfs_timeout(const char *val, const struct kernel_param *kp)
+{
+	long num;
+	int ret;
+
+	if (!val)
+		return -EINVAL;
+	ret = kstrtol(val, 0, &num);
+	if (ret)
+		return -EINVAL;
+	if (num > 0) {
+		if (num >= INT_MAX / HZ)
+			num = INT_MAX;
+		else
+			num *= HZ;
+		*((int *)kp->arg) = num;
+		if (!list_empty(&btrfs_automount_list))
+			mod_delayed_work(system_wq, &btrfs_automount_task, num);
+	} else {
+		*((int *)kp->arg) = -1*HZ;
+		cancel_delayed_work(&btrfs_automount_task);
+	}
+	return 0;
+}
+
+static int param_get_btrfs_timeout(char *buffer, const struct kernel_param *kp)
+{
+	long num = *((int *)kp->arg);
+
+	if (num > 0) {
+		if (num >= INT_MAX - (HZ - 1))
+			num = INT_MAX / HZ;
+		else
+			num = (num + (HZ - 1)) / HZ;
+	} else
+		num = -1;
+	return scnprintf(buffer, PAGE_SIZE, "%li\n", num);
+}
+
+static const struct kernel_param_ops param_ops_btrfs_timeout = {
+	.set = param_set_btrfs_timeout,
+	.get = param_get_btrfs_timeout,
+};
+#define param_check_btrfs_timeout(name, p) __param_check(name, p, int)
+
+module_param(btrfs_mountpoint_expiry_timeout, btrfs_timeout, 0644);
+MODULE_PARM_DESC(btrfs_mountpoint_expiry_timeout,
+		"Set the btrfs automounted mountpoint timeout value (seconds). "
+		"Values <= 0 turn expiration off.");
+
 static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
 				   unsigned int flags)
 {
@@ -8874,6 +8972,15 @@ static int btrfs_getattr(const struct path *path, struct kstat *stat,
 
 	generic_fillattr(inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
+	if ((inode->i_flags & S_AUTOMOUNT) &&
+	    path->dentry != path->mnt->mnt_root) {
+		/* This is the mounted-on side of the automount,
+		 * so we show the inode number from the ROOT_ITEM key
+		 * and the dev of the mountpoint.
+		 */
+		stat->ino = btrfs_location_to_ino(&BTRFS_I(inode)->root->root_key);
+		stat->dev = BTRFS_I(d_inode(path->mnt->mnt_root))->root->anon_dev;
+	}
 
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes;
@@ -10844,4 +10951,5 @@ static const struct inode_operations btrfs_symlink_inode_operations = {
 
 const struct dentry_operations btrfs_dentry_operations = {
 	.d_delete	= btrfs_dentry_delete,
+	.d_automount	= btrfs_automount,
 };
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d07b18b2b250..33008e432a15 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -338,6 +338,7 @@ void __btrfs_panic(struct btrfs_fs_info *fs_info, const char *function,
 static void btrfs_put_super(struct super_block *sb)
 {
 	close_ctree(btrfs_sb(sb));
+	btrfs_release_automount_timer();
 }
 
 enum {
-- 
2.32.0


[-- Attachment #4: 0006-nfsd-include-a-vfsmount-in-struct-svc_fh.patch --]
[-- Type: application/octet-stream, Size: 27410 bytes --]

From b7e1488f5f44806c5ccff692adac907a7c57e545 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] nfsd: include a vfsmount in struct svc_fh

A future patch will allow exportfs_decode_fh{,_raw} to return a
different vfsmount than the one passed.  This is specifically for btrfs,
but would be useful for any filesystem that presents as multiple volumes
(i.e. different st_dev, each with their own st_ino number-space).

For nfsd, this means that the mnt in the svc_export may not apply to all
filehandles reached from that export.  So svc_fh needs to store a
distinct vfsmount as well.

For now, fh->fh_mnt == fh->fh_export->ex_path.mnt, but that will change.

Changes include:
  fh_compose()
  nfsd_lookup_dentry()
     now take a *path instead of a *dentry

  nfsd4_encode_fattr()
  nfsd4_encode_fattr_to_buf()
     now take a *vfsmount as well as a *dentry

  nfsd_cross_mnt() now takes a *path instead of a **dentry
     to pass in, and get back, the mnt and dentry.

  nfsd_lookup_parent() used to take a *dentry and a **dentry.
     Now it just takes a *path.  This is the *path that was passed
     to nfsd_lookup_dentry().

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/export.c   |   4 +-
 fs/nfsd/nfs3xdr.c  |  22 ++++----
 fs/nfsd/nfs4proc.c |   9 +--
 fs/nfsd/nfs4xdr.c  |  55 ++++++++++---------
 fs/nfsd/nfsfh.c    |  30 ++++++----
 fs/nfsd/nfsfh.h    |   3 +-
 fs/nfsd/nfsproc.c  |   5 +-
 fs/nfsd/vfs.c      | 133 ++++++++++++++++++++++++---------------------
 fs/nfsd/vfs.h      |  10 ++--
 fs/nfsd/xdr4.h     |   2 +-
 10 files changed, 150 insertions(+), 123 deletions(-)
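
After this patch an svc_fh owns a reference on its vfsmount as well as on
its dentry. fh_compose() takes its own dget()/mntget() on the path it is
given, so the caller keeps (and must still drop) its own references, and
fh_put() releases both. A toy userspace model of that pairing follows;
all toy_* names are hypothetical stand-ins for the kernel structures and
helpers.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vfsmount or a dentry: just a refcount. */
struct toy_obj { int refs; };

struct toy_path { struct toy_obj *mnt, *dentry; };

/* Hypothetical stand-in for struct svc_fh after this patch. */
struct toy_fh { struct toy_obj *fh_mnt, *fh_dentry; };

static struct toy_obj *toy_get(struct toy_obj *o)
{
	if (o)
		o->refs++;
	return o;
}

static void toy_put(struct toy_obj *o)
{
	if (o)
		o->refs--;
}

/* Like fh_compose(): take a reference on both halves of the path. */
static void toy_fh_compose(struct toy_fh *fh, const struct toy_path *path)
{
	fh->fh_dentry = toy_get(path->dentry); /* dget(path->dentry) */
	fh->fh_mnt = toy_get(path->mnt);       /* mntget(path->mnt) */
}

/* Like fh_put(): drop both references and clear the fields. */
static void toy_fh_put(struct toy_fh *fh)
{
	toy_put(fh->fh_dentry);
	fh->fh_dentry = NULL;
	toy_put(fh->fh_mnt);
	fh->fh_mnt = NULL;
}
```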

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 9421dae22737..e506cbe78b4f 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -1003,7 +1003,7 @@ exp_rootfh(struct net *net, struct auth_domain *clp, char *name,
 	 * fh must be initialized before calling fh_compose
 	 */
 	fh_init(&fh, maxsize);
-	if (fh_compose(&fh, exp, path.dentry, NULL))
+	if (fh_compose(&fh, exp, &path, NULL))
 		err = -EINVAL;
 	else
 		err = 0;
@@ -1178,7 +1178,7 @@ exp_pseudoroot(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	exp = rqst_find_fsidzero_export(rqstp);
 	if (IS_ERR(exp))
 		return nfserrno(PTR_ERR(exp));
-	rv = fh_compose(fhp, exp, exp->ex_path.dentry, NULL);
+	rv = fh_compose(fhp, exp, &exp->ex_path, NULL);
 	exp_put(exp);
 	return rv;
 }
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 0a5ebc52e6a9..67af0c5c1543 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -1089,36 +1089,38 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 		 const char *name, int namlen, u64 ino)
 {
 	struct svc_export	*exp;
-	struct dentry		*dparent, *dchild;
+	struct dentry		*dparent;
+	struct path		child;
 	__be32 rv = nfserr_noent;
 
 	dparent = cd->fh.fh_dentry;
 	exp  = cd->fh.fh_export;
+	child.mnt = cd->fh.fh_mnt;
 
 	if (isdotent(name, namlen)) {
 		if (namlen == 2) {
-			dchild = dget_parent(dparent);
+			child.dentry = dget_parent(dparent);
 			/*
 			 * Don't return filehandle for ".." if we're at
 			 * the filesystem or export root:
 			 */
-			if (dchild == dparent)
+			if (child.dentry == dparent)
 				goto out;
 			if (dparent == exp->ex_path.dentry)
 				goto out;
 		} else
-			dchild = dget(dparent);
+			child.dentry = dget(dparent);
 	} else
-		dchild = lookup_positive_unlocked(name, dparent, namlen);
-	if (IS_ERR(dchild))
+		child.dentry = lookup_positive_unlocked(name, dparent, namlen);
+	if (IS_ERR(child.dentry))
 		return rv;
-	if (d_mountpoint(dchild))
+	if (d_mountpoint(child.dentry))
 		goto out;
-	if (dchild->d_inode->i_ino != ino)
+	if (child.dentry->d_inode->i_ino != ino)
 		goto out;
-	rv = fh_compose(fhp, exp, dchild, &cd->fh);
+	rv = fh_compose(fhp, exp, &child, &cd->fh);
 out:
-	dput(dchild);
+	dput(child.dentry);
 	return rv;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 486c5dba4b65..743b9315cd3e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -902,7 +902,7 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	struct nfsd4_secinfo *secinfo = &u->secinfo;
 	struct svc_export *exp;
-	struct dentry *dentry;
+	struct path path;
 	__be32 err;
 
 	err = fh_verify(rqstp, &cstate->current_fh, S_IFDIR, NFSD_MAY_EXEC);
@@ -910,16 +910,16 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		return err;
 	err = nfsd_lookup_dentry(rqstp, &cstate->current_fh,
 				    secinfo->si_name, secinfo->si_namelen,
-				    &exp, &dentry);
+				    &exp, &path);
 	if (err)
 		return err;
 	fh_unlock(&cstate->current_fh);
-	if (d_really_is_negative(dentry)) {
+	if (d_really_is_negative(path.dentry)) {
 		exp_put(exp);
 		err = nfserr_noent;
 	} else
 		secinfo->si_exp = exp;
-	dput(dentry);
+	path_put(&path);
 	if (cstate->minorversion)
 		/* See rfc 5661 section 2.6.3.1.1.8 */
 		fh_put(&cstate->current_fh);
@@ -1930,6 +1930,7 @@ _nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	p = buf;
 	status = nfsd4_encode_fattr_to_buf(&p, count, &cstate->current_fh,
 				    cstate->current_fh.fh_export,
+				    cstate->current_fh.fh_mnt,
 				    cstate->current_fh.fh_dentry,
 				    verify->ve_bmval,
 				    rqstp, 0);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 7abeccb975b2..21c277fa28ae 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2823,9 +2823,9 @@ nfsd4_encode_bitmap(struct xdr_stream *xdr, u32 bmval0, u32 bmval1, u32 bmval2)
  */
 static __be32
 nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
-		struct svc_export *exp,
-		struct dentry *dentry, u32 *bmval,
-		struct svc_rqst *rqstp, int ignore_crossmnt)
+		   struct svc_export *exp,
+		   struct vfsmount *mnt, struct dentry *dentry,
+		   u32 *bmval, struct svc_rqst *rqstp, int ignore_crossmnt)
 {
 	u32 bmval0 = bmval[0];
 	u32 bmval1 = bmval[1];
@@ -2851,7 +2851,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 	struct nfsd4_compoundres *resp = rqstp->rq_resp;
 	u32 minorversion = resp->cstate.minorversion;
 	struct path path = {
-		.mnt	= exp->ex_path.mnt,
+		.mnt	= mnt,
 		.dentry	= dentry,
 	};
 	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
@@ -2882,7 +2882,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 		if (!tempfh)
 			goto out;
 		fh_init(tempfh, NFS4_FHSIZE);
-		status = fh_compose(tempfh, exp, dentry, NULL);
+		status = fh_compose(tempfh, exp, &path, NULL);
 		if (status)
 			goto out;
 		fhp = tempfh;
@@ -3274,13 +3274,12 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 
 		p = xdr_reserve_space(xdr, 8);
 		if (!p)
-                	goto out_resource;
+			goto out_resource;
 		/*
 		 * Get parent's attributes if not ignoring crossmount
 		 * and this is the root of a cross-mounted filesystem.
 		 */
-		if (ignore_crossmnt == 0 &&
-		    dentry == exp->ex_path.mnt->mnt_root) {
+		if (ignore_crossmnt == 0 && dentry == mnt->mnt_root) {
 			err = get_parent_attributes(exp, &parent_stat);
 			if (err)
 				goto out_nfserr;
@@ -3380,17 +3379,18 @@ static void svcxdr_init_encode_from_buffer(struct xdr_stream *xdr,
 }
 
 __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
-			struct svc_fh *fhp, struct svc_export *exp,
-			struct dentry *dentry, u32 *bmval,
-			struct svc_rqst *rqstp, int ignore_crossmnt)
+				 struct svc_fh *fhp, struct svc_export *exp,
+				 struct vfsmount *mnt, struct dentry *dentry,
+				 u32 *bmval, struct svc_rqst *rqstp,
+				 int ignore_crossmnt)
 {
 	struct xdr_buf dummy;
 	struct xdr_stream xdr;
 	__be32 ret;
 
 	svcxdr_init_encode_from_buffer(&xdr, &dummy, *p, words << 2);
-	ret = nfsd4_encode_fattr(&xdr, fhp, exp, dentry, bmval, rqstp,
-							ignore_crossmnt);
+	ret = nfsd4_encode_fattr(&xdr, fhp, exp, mnt, dentry, bmval, rqstp,
+				 ignore_crossmnt);
 	*p = xdr.p;
 	return ret;
 }
@@ -3409,14 +3409,16 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 			const char *name, int namlen)
 {
 	struct svc_export *exp = cd->rd_fhp->fh_export;
-	struct dentry *dentry;
+	struct path path;
 	__be32 nfserr;
 	int ignore_crossmnt = 0;
 
-	dentry = lookup_positive_unlocked(name, cd->rd_fhp->fh_dentry, namlen);
-	if (IS_ERR(dentry))
-		return nfserrno(PTR_ERR(dentry));
+	path.dentry = lookup_positive_unlocked(name, cd->rd_fhp->fh_dentry,
+					      namlen);
+	if (IS_ERR(path.dentry))
+		return nfserrno(PTR_ERR(path.dentry));
 
+	path.mnt = mntget(cd->rd_fhp->fh_mnt);
 	exp_get(exp);
 	/*
 	 * In the case of a mountpoint, the client may be asking for
@@ -3425,7 +3427,7 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 	 * we will not follow the cross mount and will fill the attribtutes
 	 * directly from the mountpoint dentry.
 	 */
-	if (nfsd_mountpoint(dentry, exp)) {
+	if (nfsd_mountpoint(path.dentry, exp)) {
 		int err;
 
 		if (!(exp->ex_flags & NFSEXP_V4ROOT)
@@ -3434,11 +3436,11 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 			goto out_encode;
 		}
 		/*
-		 * Why the heck aren't we just using nfsd_lookup??
+		 * Why the heck aren't we just using nfsd_lookup_dentry??
 		 * Different "."/".." handling?  Something else?
 		 * At least, add a comment here to explain....
 		 */
-		err = nfsd_cross_mnt(cd->rd_rqstp, &dentry, &exp);
+		err = nfsd_cross_mnt(cd->rd_rqstp, &path, &exp);
 		if (err) {
 			nfserr = nfserrno(err);
 			goto out_put;
@@ -3446,13 +3448,13 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
 		nfserr = check_nfsd_access(exp, cd->rd_rqstp);
 		if (nfserr)
 			goto out_put;
-
 	}
 out_encode:
-	nfserr = nfsd4_encode_fattr(xdr, NULL, exp, dentry, cd->rd_bmval,
-					cd->rd_rqstp, ignore_crossmnt);
+	nfserr = nfsd4_encode_fattr(xdr, NULL, exp, path.mnt, path.dentry,
+				    cd->rd_bmval, cd->rd_rqstp,
+				    ignore_crossmnt);
 out_put:
-	dput(dentry);
+	path_put(&path);
 	exp_put(exp);
 	return nfserr;
 }
@@ -3651,8 +3653,9 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
 	struct svc_fh *fhp = getattr->ga_fhp;
 	struct xdr_stream *xdr = &resp->xdr;
 
-	return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry,
-				    getattr->ga_bmval, resp->rqstp, 0);
+	return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export,
+				  fhp->fh_mnt, fhp->fh_dentry,
+				  getattr->ga_bmval, resp->rqstp, 0);
 }
 
 static __be32
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c475d2271f9c..0bf7ac13ae50 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -299,6 +299,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	}
 
 	fhp->fh_dentry = dentry;
+	fhp->fh_mnt = mntget(exp->ex_path.mnt);
 	fhp->fh_export = exp;
 	return 0;
 out:
@@ -556,7 +557,7 @@ static void set_version_and_fsid_type(struct svc_fh *fhp, struct svc_export *exp
 }
 
 __be32
-fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
+fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct path *path,
 	   struct svc_fh *ref_fh)
 {
 	/* ref_fh is a reference file handle.
@@ -567,13 +568,13 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	 *
 	 */
 
-	struct inode * inode = d_inode(dentry);
+	struct inode * inode = d_inode(path->dentry);
 	dev_t ex_dev = exp_sb(exp)->s_dev;
 
 	dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
 		MAJOR(ex_dev), MINOR(ex_dev),
 		(long) d_inode(exp->ex_path.dentry)->i_ino,
-		dentry,
+		path->dentry,
 		(inode ? inode->i_ino : 0));
 
 	/* Choose filehandle version and fsid type based on
@@ -590,14 +591,15 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 
 	if (fhp->fh_locked || fhp->fh_dentry) {
 		printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n",
-		       dentry);
+		       path->dentry);
 	}
 	if (fhp->fh_maxsize < NFS_FHSIZE)
 		printk(KERN_ERR "fh_compose: called with maxsize %d! %pd2\n",
 		       fhp->fh_maxsize,
-		       dentry);
+		       path->dentry);
 
-	fhp->fh_dentry = dget(dentry); /* our internal copy */
+	fhp->fh_dentry = dget(path->dentry); /* our internal copy */
+	fhp->fh_mnt = mntget(path->mnt);
 	fhp->fh_export = exp_get(exp);
 
 	if (fhp->fh_handle.fh_version == 0xca) {
@@ -609,9 +611,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 		fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev;
 		fhp->fh_handle.ofh_xino =
 			ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
-		fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry));
+		fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(path->dentry));
 		if (inode)
-			_fh_update_old(dentry, exp, &fhp->fh_handle);
+			_fh_update_old(path->dentry, exp, &fhp->fh_handle);
 	} else {
 		fhp->fh_handle.fh_size =
 			key_len(fhp->fh_handle.fh_fsid_type) + 4;
@@ -624,7 +626,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 			exp->ex_fsid, exp->ex_uuid);
 
 		if (inode)
-			_fh_update(fhp, exp, dentry);
+			_fh_update(fhp, exp, path->dentry);
 		if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
 			fh_put(fhp);
 			return nfserr_opnotsupp;
@@ -675,8 +677,10 @@ fh_update(struct svc_fh *fhp)
 void
 fh_put(struct svc_fh *fhp)
 {
-	struct dentry * dentry = fhp->fh_dentry;
-	struct svc_export * exp = fhp->fh_export;
+	struct dentry *dentry = fhp->fh_dentry;
+	struct svc_export *exp = fhp->fh_export;
+	struct vfsmount *mnt = fhp->fh_mnt;
+
 	if (dentry) {
 		fh_unlock(fhp);
 		fhp->fh_dentry = NULL;
@@ -684,6 +688,10 @@ fh_put(struct svc_fh *fhp)
 		fh_clear_wcc(fhp);
 	}
 	fh_drop_write(fhp);
+	if (mnt) {
+		mntput(mnt);
+		fhp->fh_mnt = NULL;
+	}
 	if (exp) {
 		exp_put(exp);
 		fhp->fh_export = NULL;
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 6106697adc04..26c02209babd 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -31,6 +31,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
 typedef struct svc_fh {
 	struct knfsd_fh		fh_handle;	/* FH data */
 	int			fh_maxsize;	/* max size for fh_handle */
+	struct vfsmount	*	fh_mnt;		/* mnt, possibly of subvol */
 	struct dentry *		fh_dentry;	/* validated dentry */
 	struct svc_export *	fh_export;	/* export pointer */
 
@@ -171,7 +172,7 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
  * Function prototypes
  */
 __be32	fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
-__be32	fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
+__be32	fh_compose(struct svc_fh *, struct svc_export *, struct path *, struct svc_fh *);
 __be32	fh_update(struct svc_fh *);
 void	fh_put(struct svc_fh *);
 
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 60d7c59e7935..245199b0e630 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -268,6 +268,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 	struct iattr	*attr = &argp->attrs;
 	struct inode	*inode;
 	struct dentry	*dchild;
+	struct path	path;
 	int		type, mode;
 	int		hosterr;
 	dev_t		rdev = 0, wanted = new_decode_dev(attr->ia_size);
@@ -298,7 +299,9 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 		goto out_unlock;
 	}
 	fh_init(newfhp, NFS_FHSIZE);
-	resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
+	path.mnt = dirfhp->fh_mnt;
+	path.dentry = dchild;
+	resp->status = fh_compose(newfhp, dirfhp->fh_export, &path, dirfhp);
 	if (!resp->status && d_really_is_negative(dchild))
 		resp->status = nfserr_noent;
 	dput(dchild);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 7c32edcfd2e9..c0c6920f25a4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -49,27 +49,26 @@
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
-/* 
- * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
+/*
+ * Called from nfsd_lookup and encode_dirent. Check if we have crossed
  * a mount point.
- * Returns -EAGAIN or -ETIMEDOUT leaving *dpp and *expp unchanged,
- *  or nfs_ok having possibly changed *dpp and *expp
+ * Returns -EAGAIN or -ETIMEDOUT leaving *path and *expp unchanged,
+ *  or nfs_ok having possibly changed *path and *expp
  */
 int
-nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, 
-		        struct svc_export **expp)
+nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *path_parent,
+	       struct svc_export **expp)
 {
 	struct svc_export *exp = *expp, *exp2 = NULL;
-	struct dentry *dentry = *dpp;
-	struct path path = {.mnt = mntget(exp->ex_path.mnt),
-			    .dentry = dget(dentry)};
+	struct path path = {.mnt = mntget(path_parent->mnt),
+			    .dentry = dget(path_parent->dentry)};
 	int err = 0;
 
 	err = follow_down(&path, 0);
 	if (err < 0)
 		goto out;
-	if (path.mnt == exp->ex_path.mnt && path.dentry == dentry &&
-	    nfsd_mountpoint(dentry, exp) == 2) {
+	if (path.mnt == path_parent->mnt && path.dentry == path_parent->dentry &&
+	    nfsd_mountpoint(path.dentry, exp) == 2) {
 		/* This is only a mountpoint in some other namespace */
 		path_put(&path);
 		goto out;
@@ -93,19 +92,14 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 	if (nfsd_v4client(rqstp) ||
 		(exp->ex_flags & NFSEXP_CROSSMOUNT) || EX_NOHIDE(exp2)) {
 		/* successfully crossed mount point */
-		/*
-		 * This is subtle: path.dentry is *not* on path.mnt
-		 * at this point.  The only reason we are safe is that
-		 * original mnt is pinned down by exp, so we should
-		 * put path *before* putting exp
-		 */
-		*dpp = path.dentry;
-		path.dentry = dentry;
+		path_put(path_parent);
+		*path_parent = path;
+		exp_put(exp);
 		*expp = exp2;
-		exp2 = exp;
+	} else {
+		path_put(&path);
+		exp_put(exp2);
 	}
-	path_put(&path);
-	exp_put(exp2);
 out:
 	return err;
 }
@@ -121,27 +115,30 @@ static void follow_to_parent(struct path *path)
 	path->dentry = dp;
 }
 
-static int nfsd_lookup_parent(struct svc_rqst *rqstp, struct dentry *dparent, struct svc_export **exp, struct dentry **dentryp)
+static int nfsd_lookup_parent(struct svc_rqst *rqstp, struct svc_export **exp,
+			      struct path *path)
 {
+	struct path path2;
 	struct svc_export *exp2;
-	struct path path = {.mnt = mntget((*exp)->ex_path.mnt),
-			    .dentry = dget(dparent)};
 
-	follow_to_parent(&path);
-
-	exp2 = rqst_exp_parent(rqstp, &path);
+	path2 = *path;
+	path_get(&path2);
+	follow_to_parent(&path2);
+	exp2 = rqst_exp_parent(rqstp, &path2);
 	if (PTR_ERR(exp2) == -ENOENT) {
-		*dentryp = dget(dparent);
+		/* leave path unchanged */
+		path_put(&path2);
+		return 0;
 	} else if (IS_ERR(exp2)) {
-		path_put(&path);
+		path_put(&path2);
 		return PTR_ERR(exp2);
 	} else {
-		*dentryp = dget(path.dentry);
+		path_put(path);
+		*path = path2;
 		exp_put(*exp);
 		*exp = exp2;
+		return 0;
 	}
-	path_put(&path);
-	return 0;
 }
 
 /*
@@ -172,29 +169,32 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 __be32
 nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		   const char *name, unsigned int len,
-		   struct svc_export **exp_ret, struct dentry **dentry_ret)
+		   struct svc_export **exp_ret, struct path *ret)
 {
 	struct svc_export	*exp;
 	struct dentry		*dparent;
-	struct dentry		*dentry;
 	int			host_err;
 
 	dprintk("nfsd: nfsd_lookup(fh %s, %.*s)\n", SVCFH_fmt(fhp), len,name);
 
 	dparent = fhp->fh_dentry;
+	ret->mnt = mntget(fhp->fh_mnt);
 	exp = exp_get(fhp->fh_export);
 
 	/* Lookup the name, but don't follow links */
 	if (isdotent(name, len)) {
 		if (len==1)
-			dentry = dget(dparent);
+			ret->dentry = dget(dparent);
 		else if (dparent != exp->ex_path.dentry)
-			dentry = dget_parent(dparent);
+			ret->dentry = dget_parent(dparent);
 		else if (!EX_NOHIDE(exp) && !nfsd_v4client(rqstp))
-			dentry = dget(dparent); /* .. == . just like at / */
+			ret->dentry = dget(dparent); /* .. == . just like at / */
 		else {
-			/* checking mountpoint crossing is very different when stepping up */
-			host_err = nfsd_lookup_parent(rqstp, dparent, &exp, &dentry);
+			/* checking mountpoint crossing is very different when
+			 * stepping up
+			 */
+			ret->dentry = dget(dparent);
+			host_err = nfsd_lookup_parent(rqstp, &exp, ret);
 			if (host_err)
 				goto out_nfserr;
 		}
@@ -205,11 +205,13 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		 * need to take the child's i_mutex:
 		 */
 		fh_lock_nested(fhp, I_MUTEX_PARENT);
-		dentry = lookup_one_len(name, dparent, len);
-		host_err = PTR_ERR(dentry);
-		if (IS_ERR(dentry))
+		ret->dentry = lookup_one_len(name, dparent, len);
+		host_err = PTR_ERR(ret->dentry);
+		if (IS_ERR(ret->dentry)) {
+			ret->dentry = NULL;
 			goto out_nfserr;
-		if (nfsd_mountpoint(dentry, exp)) {
+		}
+		if (nfsd_mountpoint(ret->dentry, exp)) {
 			/*
 			 * We don't need the i_mutex after all.  It's
 			 * still possible we could open this (regular
@@ -219,18 +221,16 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			 * and a mountpoint won't be renamed:
 			 */
 			fh_unlock(fhp);
-			if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) {
-				dput(dentry);
+			if ((host_err = nfsd_cross_mnt(rqstp, ret, &exp)))
 				goto out_nfserr;
-			}
 		}
 	}
-	*dentry_ret = dentry;
 	*exp_ret = exp;
 	return 0;
 
 out_nfserr:
 	exp_put(exp);
+	path_put(ret);
 	return nfserrno(host_err);
 }
 
@@ -251,13 +251,13 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 				unsigned int len, struct svc_fh *resfh)
 {
 	struct svc_export	*exp;
-	struct dentry		*dentry;
+	struct path		path;
 	__be32 err;
 
 	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
 	if (err)
 		return err;
-	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry);
+	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &path);
 	if (err)
 		return err;
 	err = check_nfsd_access(exp, rqstp);
@@ -267,11 +267,11 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 	 * Note: we compose the file handle now, but as the
 	 * dentry may be negative, it may need to be updated.
 	 */
-	err = fh_compose(resfh, exp, dentry, fhp);
-	if (!err && d_really_is_negative(dentry))
+	err = fh_compose(resfh, exp, &path, fhp);
+	if (!err && d_really_is_negative(path.dentry))
 		err = nfserr_noent;
 out:
-	dput(dentry);
+	path_put(&path);
 	exp_put(exp);
 	return err;
 }
@@ -740,7 +740,7 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	__be32		err;
 	int		host_err = 0;
 
-	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.mnt = fhp->fh_mnt;
 	path.dentry = fhp->fh_dentry;
 	inode = d_inode(path.dentry);
 
@@ -1350,6 +1350,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		int type, dev_t rdev, struct svc_fh *resfhp)
 {
 	struct dentry	*dentry, *dchild = NULL;
+	struct path	path;
 	__be32		err;
 	int		host_err;
 
@@ -1371,7 +1372,9 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	host_err = PTR_ERR(dchild);
 	if (IS_ERR(dchild))
 		return nfserrno(host_err);
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+	path.mnt = fhp->fh_mnt;
+	path.dentry = dchild;
+	err = fh_compose(resfhp, fhp->fh_export, &path, fhp);
 	/*
 	 * We unconditionally drop our ref to dchild as fh_compose will have
 	 * already grabbed its own ref for it.
@@ -1390,11 +1393,12 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 __be32
 do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		char *fname, int flen, struct iattr *iap,
-		struct svc_fh *resfhp, int createmode, u32 *verifier,
-	        bool *truncp, bool *created)
+	       char *fname, int flen, struct iattr *iap,
+	       struct svc_fh *resfhp, int createmode, u32 *verifier,
+	       bool *truncp, bool *created)
 {
 	struct dentry	*dentry, *dchild = NULL;
+	struct path	path;
 	struct inode	*dirp;
 	__be32		err;
 	int		host_err;
@@ -1436,7 +1440,9 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			goto out;
 	}
 
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+	path.mnt = fhp->fh_mnt;
+	path.dentry = dchild;
+	err = fh_compose(resfhp, fhp->fh_export, &path, fhp);
 	if (err)
 		goto out;
 
@@ -1569,7 +1575,7 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
 	if (unlikely(err))
 		return err;
 
-	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.mnt = fhp->fh_mnt;
 	path.dentry = fhp->fh_dentry;
 
 	if (unlikely(!d_is_symlink(path.dentry)))
@@ -1600,6 +1606,7 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct svc_fh *resfhp)
 {
 	struct dentry	*dentry, *dnew;
+	struct path	pathnew;
 	__be32		err, cerr;
 	int		host_err;
 
@@ -1633,7 +1640,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	fh_drop_write(fhp);
 
-	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
+	pathnew.mnt = fhp->fh_mnt;
+	pathnew.dentry = dnew;
+	cerr = fh_compose(resfhp, fhp->fh_export, &pathnew, fhp);
 	dput(dnew);
 	if (err==0) err = cerr;
 out:
@@ -2107,7 +2116,7 @@ nfsd_statfs(struct svc_rqst *rqstp, struct svc_fh *fhp, struct kstatfs *stat, in
 	err = fh_verify(rqstp, fhp, 0, NFSD_MAY_NOP | access);
 	if (!err) {
 		struct path path = {
-			.mnt	= fhp->fh_export->ex_path.mnt,
+			.mnt	= fhp->fh_mnt,
 			.dentry	= fhp->fh_dentry,
 		};
 		if (vfs_statfs(&path, stat))
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index b21b76e6b9a8..52f587716208 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -42,13 +42,13 @@ struct nfsd_file;
 typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
 
 /* nfsd/vfs.c */
-int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
+int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct path *,
 		                struct svc_export **expp);
 __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
 				const char *, unsigned int, struct svc_fh *);
 __be32		 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *,
 				const char *, unsigned int,
-				struct svc_export **, struct dentry **);
+				struct svc_export **, struct path *);
 __be32		nfsd_setattr(struct svc_rqst *, struct svc_fh *,
 				struct iattr *, int, time64_t);
 int nfsd_mountpoint(struct dentry *, struct svc_export *);
@@ -138,7 +138,7 @@ static inline int fh_want_write(struct svc_fh *fh)
 
 	if (fh->fh_want_write)
 		return 0;
-	ret = mnt_want_write(fh->fh_export->ex_path.mnt);
+	ret = mnt_want_write(fh->fh_mnt);
 	if (!ret)
 		fh->fh_want_write = true;
 	return ret;
@@ -148,13 +148,13 @@ static inline void fh_drop_write(struct svc_fh *fh)
 {
 	if (fh->fh_want_write) {
 		fh->fh_want_write = false;
-		mnt_drop_write(fh->fh_export->ex_path.mnt);
+		mnt_drop_write(fh->fh_mnt);
 	}
 }
 
 static inline __be32 fh_getattr(struct svc_fh *fh, struct kstat *stat)
 {
-	struct path p = {.mnt = fh->fh_export->ex_path.mnt,
+	struct path p = {.mnt = fh->fh_mnt,
 			 .dentry = fh->fh_dentry};
 	return nfserrno(vfs_getattr(&p, stat, STATX_BASIC_STATS,
 				    AT_STATX_SYNC_AS_STAT));
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 3e4052e3bd50..8934db5113ac 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -763,7 +763,7 @@ void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *);
 void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op);
 __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
 		struct svc_fh *fhp, struct svc_export *exp,
-		struct dentry *dentry,
+		struct vfsmount *mnt, struct dentry *dentry,
 		u32 *bmval, struct svc_rqst *, int ignore_crossmnt);
 extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *, union nfsd4_op_u *u);
-- 
2.32.0


[-- Attachment #5: 0007-exportfs-Allow-filehandle-lookup-to-cross-internal-m.patch --]
[-- Type: application/octet-stream, Size: 11957 bytes --]

From 637b97c587df703d9348e4075051834e25666441 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 28 Jul 2021 08:37:45 +1000
Subject: [PATCH] exportfs: Allow filehandle lookup to cross internal mount
 points.

When a filesystem has internal mounts, it controls the filehandles
across all those mounts (subvols) in the filesystem.  So it is useful to
be able to look up a filehandle against one mount and get a result which
is in a different mount (part of the same overall filesystem).

This patch makes that possible by changing exportfs_decode_fh() to take
a vfsmount pointer by reference and possibly change the vfsmount pointed
to before returning.
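
For illustration only, the ownership rule this implies can be modelled in
plain C.  This is not kernel code: struct toy_mnt, toy_mntget(),
toy_mntput() and toy_decode() are hypothetical stand-ins for struct
vfsmount, mntget(), mntput() and exportfs_decode_fh().  The assumption is
that the caller owns one reference to *mntp before the call and exactly one
reference to whatever *mntp points to afterwards.  This matches the callers
in the diff below, which take a reference with mntget() before the call and
drop it with mntput() afterwards (or, in nfsd, keep it in fh_mnt).

```c
#include <assert.h>

/* Hypothetical stand-in for struct vfsmount: only a reference count. */
struct toy_mnt {
	int refcount;
};

static struct toy_mnt *toy_mntget(struct toy_mnt *m)
{
	if (m)
		m->refcount++;
	return m;
}

static void toy_mntput(struct toy_mnt *m)
{
	if (m) {
		assert(m->refcount > 0);	/* catch a reference underflow */
		m->refcount--;
	}
}

/*
 * Hypothetical decoder.  The caller owns one reference to *mntp.  If the
 * object turns out to live on a different mount, drop the caller's
 * reference, take one on the new mount and store it back through mntp,
 * so the caller still owns exactly one reference afterwards.
 */
static void toy_decode(struct toy_mnt **mntp, struct toy_mnt *target)
{
	if (*mntp != target) {
		toy_mntput(*mntp);
		*mntp = toy_mntget(target);
	}
}
```

Passing the mount by reference keeps the reference counting balanced
whichever mount the lookup ends up on.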

The core of the change is in reconnect_path(), which now checks not only
that the dentry is fully connected, but also that the vfsmount has the
same 'dev' (as reported by vfs_getattr()) as the dentry.
If it doesn't, we walk up the d_parent chain to find the highest place
where the dev changes without there being a mount point, and trigger an
automount there.
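
A rough model of one pass of that walk follows.  It is a hypothetical
stand-alone C sketch, not kernel code, and struct toy_dentry and
toy_find_automount() are made up for illustration.  Dentries are array
entries, and the root is its own parent.  'dev' stands for the value
vfs_getattr() would report, and has_mount stands for lookup_mnt() finding a
vfsmount on that dentry.  The function returns the dentry at which an
automount would be triggered, or -1 when there is nothing to automount (an
existing mount, if one was found, is used instead).

```c
#include <assert.h>

/* Hypothetical toy dentry: parent index, dev number, mounted-on flag. */
struct toy_dentry {
	int parent;		/* the root is its own parent */
	unsigned int dev;	/* what vfs_getattr() would report */
	int has_mount;		/* would lookup_mnt() find a vfsmount here? */
};

/*
 * Walk up from target, recording the most recent place where the dev
 * number changes between a dentry and its parent.  Stop early at a dev
 * change where a mount already exists.  Returns the index of the highest
 * unmounted change point, or -1 if none was seen.
 */
static int toy_find_automount(const struct toy_dentry *d, int target)
{
	int dentry = target;
	int last_change = -1;
	unsigned int last_dev;

	assert(target >= 0);
	last_dev = d[target].dev;

	while (d[dentry].parent != dentry) {
		int parent = d[dentry].parent;

		if (d[parent].dev != last_dev) {
			if (d[dentry].has_mount)
				break;	/* an existing mount covers this step */
			last_change = dentry;
			last_dev = d[parent].dev;
		}
		dentry = parent;
	}
	return last_change;
}
```

With subvol B nested inside subvol A, and neither mounted, the walk from a
file in B returns the root of A.  Once A has a mount, the same walk stops
there and returns the root of B, which reconnect_path() would then
automount.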

As no filesystem provides such internal mounts yet, this does not change
any behaviour.

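In exportfs_decode_fh() we previously tested for DCACHE_DISCONNECTED
before calling reconnect_path().  That test is dropped.  It was only a
minor optimisation, and it is now inconvenient because reconnect_path()
must also run for connected dentries so that it can select the right
vfsmount.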

The change in overlayfs needs more careful thought than I have yet given
it.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/exportfs/expfs.c      | 96 ++++++++++++++++++++++++++++++++++------
 fs/fhandle.c             |  2 +-
 fs/nfsd/nfsfh.c          |  9 ++--
 fs/overlayfs/namei.c     |  5 ++-
 fs/xfs/xfs_ioctl.c       | 12 +++--
 include/linux/exportfs.h |  2 +-
 6 files changed, 103 insertions(+), 23 deletions(-)

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..2d7c42137b49 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -207,11 +207,18 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
  * that case reconnect_path may still succeed with target_dir fully
  * connected, but further operations using the filehandle will fail when
  * necessary (due to S_DEAD being set on the directory).
+ *
+ * If the filesystem supports multiple subvols, then *mntp may be updated
+ * to a subordinate mount point on the same filesystem.
  */
 static int
-reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
+reconnect_path(struct vfsmount **mntp, struct dentry *target_dir, char *nbuf)
 {
+	struct vfsmount *mnt = *mntp;
+	struct path path;
 	struct dentry *dentry, *parent;
+	struct kstat stat;
+	dev_t target_dev;
 
 	dentry = dget(target_dir);
 
@@ -232,6 +239,68 @@ reconnect_path(struct vfsmount *mnt, struct dentry *target_dir, char *nbuf)
 	}
 	dput(dentry);
 	clear_disconnected(target_dir);
+
+	/* Need to find appropriate vfsmount, which might not exist yet.
+	 * We may need to trigger automount points.
+	 */
+	path.mnt = mnt;
+	path.dentry = target_dir;
+	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+	target_dev = stat.dev;
+
+	path.dentry = mnt->mnt_root;
+	vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+
+	while (stat.dev != target_dev) {
+		/* walk up the dcache tree from target_dir, recording the
+		 * location of the most recent change in dev number,
+		 * until we find a mountpoint.
+		 * If there was no change in dev number before the
+		 * mountpoint, the vfsmount at the mountpoint is what we want.
+		 * If there was, we need to trigger an automount where the
+		 * dev number changed.
+		 */
+		struct dentry *last_change = NULL;
+		dev_t last_dev = target_dev;
+
+		dentry = dget(target_dir);
+		while ((parent = dget_parent(dentry)) != dentry) {
+			path.dentry = parent;
+			vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+			if (stat.dev != last_dev) {
+				path.dentry = dentry;
+				mnt = lookup_mnt(&path);
+				if (mnt) {
+					mntput(path.mnt);
+					path.mnt = mnt;
+					break;
+				}
+				dput(last_change);
+				last_change = dget(dentry);
+				last_dev = stat.dev;
+			}
+			dput(dentry);
+			dentry = parent;
+		}
+		dput(dentry); dput(parent);
+
+		if (!last_change)
+			break;
+
+		mnt = path.mnt;
+		path.dentry = last_change;
+		follow_down(&path, LOOKUP_AUTOMOUNT);
+		dput(path.dentry);
+		if (path.mnt == mnt)
+			/* There should have been a mount-trap there,
+			 * but there wasn't.  Just give up.
+			 */
+			break;
+
+		path.dentry = mnt->mnt_root;
+		vfs_getattr_nosec(&path, &stat, 0, AT_STATX_DONT_SYNC);
+	}
+	*mntp = path.mnt;
 	return 0;
 }
 
@@ -417,11 +486,12 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
 }
 EXPORT_SYMBOL_GPL(exportfs_encode_fh);
 
-struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
+struct dentry *exportfs_decode_fh(struct vfsmount **mntp, struct fid *fid,
 		int fh_len, int fileid_type,
 		int (*acceptable)(void *, struct dentry *), void *context)
 {
-	const struct export_operations *nop = mnt->mnt_sb->s_export_op;
+	struct super_block *sb = (*mntp)->mnt_sb;
+	const struct export_operations *nop = sb->s_export_op;
 	struct dentry *result, *alias;
 	char nbuf[NAME_MAX+1];
 	int err;
@@ -431,7 +501,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 	 */
 	if (!nop || !nop->fh_to_dentry)
 		return ERR_PTR(-ESTALE);
-	result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
+	result = nop->fh_to_dentry(sb, fid, fh_len, fileid_type);
 	if (PTR_ERR(result) == -ENOMEM)
 		return ERR_CAST(result);
 	if (IS_ERR_OR_NULL(result))
@@ -452,14 +522,12 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		 *
 		 * On the positive side there is only one dentry for each
 		 * directory inode.  On the negative side this implies that we
-		 * to ensure our dentry is connected all the way up to the
+		 * need to ensure our dentry is connected all the way up to the
 		 * filesystem root.
 		 */
-		if (result->d_flags & DCACHE_DISCONNECTED) {
-			err = reconnect_path(mnt, result, nbuf);
-			if (err)
-				goto err_result;
-		}
+		err = reconnect_path(mntp, result, nbuf);
+		if (err)
+			goto err_result;
 
 		if (!acceptable(context, result)) {
 			err = -EACCES;
@@ -494,7 +562,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		if (!nop->fh_to_parent)
 			goto err_result;
 
-		target_dir = nop->fh_to_parent(mnt->mnt_sb, fid,
+		target_dir = nop->fh_to_parent(sb, fid,
 				fh_len, fileid_type);
 		if (!target_dir)
 			goto err_result;
@@ -507,7 +575,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		 * connected to the filesystem root.  The VFS really doesn't
 		 * like disconnected directories..
 		 */
-		err = reconnect_path(mnt, target_dir, nbuf);
+		err = reconnect_path(mntp, target_dir, nbuf);
 		if (err) {
 			dput(target_dir);
 			goto err_result;
@@ -518,7 +586,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		 * dentry for the inode we're after, make sure that our
 		 * inode is actually connected to the parent.
 		 */
-		err = exportfs_get_name(mnt, target_dir, nbuf, result);
+		err = exportfs_get_name(*mntp, target_dir, nbuf, result);
 		if (err) {
 			dput(target_dir);
 			goto err_result;
@@ -556,7 +624,7 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 			goto err_result;
 		}
 
 		return alias;
 	}
 
  err_result:
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 6630c69c23a2..b47c7696469f 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -149,7 +149,7 @@ static int do_handle_to_path(int mountdirfd, struct file_handle *handle,
 	}
 	/* change the handle size to multiple of sizeof(u32) */
 	handle_dwords = handle->handle_bytes >> 2;
-	path->dentry = exportfs_decode_fh(path->mnt,
+	path->dentry = exportfs_decode_fh(&path->mnt,
 					  (struct fid *)handle->f_handle,
 					  handle_dwords, handle->handle_type,
 					  vfs_dentry_acceptable, NULL);
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 0bf7ac13ae50..4023046f63e2 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -157,6 +157,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	struct fid *fid = NULL, sfid;
 	struct svc_export *exp;
 	struct dentry *dentry;
+	struct vfsmount *mnt = NULL;
 	int fileid_type;
 	int data_left = fh->fh_size/4;
 	__be32 error;
@@ -253,6 +254,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	if (rqstp->rq_vers > 2)
 		error = nfserr_badhandle;
 
+	mnt = mntget(exp->ex_path.mnt);
+
 	if (fh->fh_version != 1) {
 		sfid.i32.ino = fh->ofh_ino;
 		sfid.i32.gen = fh->ofh_generation;
@@ -269,7 +272,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	if (fileid_type == FILEID_ROOT)
 		dentry = dget(exp->ex_path.dentry);
 	else {
-		dentry = exportfs_decode_fh(exp->ex_path.mnt, fid,
+		dentry = exportfs_decode_fh(&mnt, fid,
 				data_left, fileid_type,
 				nfsd_acceptable, exp);
 		if (IS_ERR_OR_NULL(dentry))
@@ -290,10 +293,11 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	}
 
 	fhp->fh_dentry = dentry;
-	fhp->fh_mnt = mntget(exp->ex_path.mnt);
+	fhp->fh_mnt = mnt;
 	fhp->fh_export = exp;
 	return 0;
 out:
+	mntput(mnt);
 	exp_put(exp);
 	return error;
 }
@@ -428,7 +432,6 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 	return error;
 }
 
-
 /*
  * Compose a file handle for an NFS reply.
  *
diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index 210cd6f66e28..0bca19f6df54 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -155,6 +155,7 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 {
 	struct dentry *real;
 	int bytes;
+	struct vfsmount *mnt2;
 
 	/*
 	 * Make sure that the stored uuid matches the uuid of the lower
@@ -164,9 +165,11 @@ struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
 		return NULL;
 
 	bytes = (fh->fb.len - offsetof(struct ovl_fb, fid));
-	real = exportfs_decode_fh(mnt, (struct fid *)fh->fb.fid,
+	mnt2 = mntget(mnt);
+	real = exportfs_decode_fh(&mnt2, (struct fid *)fh->fb.fid,
 				  bytes >> 2, (int)fh->fb.type,
 				  connected ? ovl_acceptable : NULL, mnt);
+	mntput(mnt2);
 	if (IS_ERR(real)) {
 		/*
 		 * Treat stale file handle to lower file as "origin unknown".
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 16039ea10ac9..76eb7d540811 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -149,6 +149,8 @@ xfs_handle_to_dentry(
 {
 	xfs_handle_t		handle;
 	struct xfs_fid64	fid;
+	struct dentry		*ret;
+	struct vfsmount		*mnt;
 
 	/*
 	 * Only allow handle opens under a directory.
@@ -168,9 +170,13 @@ xfs_handle_to_dentry(
 	fid.ino = handle.ha_fid.fid_ino;
 	fid.gen = handle.ha_fid.fid_gen;
 
-	return exportfs_decode_fh(parfilp->f_path.mnt, (struct fid *)&fid, 3,
-			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
-			xfs_handle_acceptable, NULL);
+	mnt = mntget(parfilp->f_path.mnt);
+	ret = exportfs_decode_fh(&mnt, (struct fid *)&fid, 3,
+				 FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
+				 xfs_handle_acceptable, NULL);
+	WARN_ON(mnt != parfilp->f_path.mnt);
+	mntput(mnt);
+	return ret;
 }
 
 STATIC struct dentry *
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index fe848901fcc3..9a8c5434a5cf 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -219,7 +219,7 @@ extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
 				    int *max_len, struct inode *parent);
 extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
 	int *max_len, int connectable);
-extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
+extern struct dentry *exportfs_decode_fh(struct vfsmount **mnt, struct fid *fid,
 	int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *),
 	void *context);
 
-- 
2.32.0