linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fuse submount_lookup needs to be initialized
@ 2023-11-09 22:36 Krister Johansen
  2023-11-09 22:37 ` [PATCH 1/2] fuse: ensure submount_lookup is initialized on alloc Krister Johansen
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Krister Johansen @ 2023-11-09 22:36 UTC (permalink / raw)
  To: Miklos Szeredi, linux-fsdevel
  Cc: Miklos Szeredi, linux-kernel, German Maglione, Greg Kurz,
	Max Reitz, Bernd Schubert, Borah, Chaitanya Kumar,
	Naresh Kamboju, Dan Carpenter, Kurmi, Suresh Kumar, Saarinen,
	Jani, lkft-triage, linux-kselftest, regressions, intel-gfx

Hi Miklos,
I got a couple of bug reports[1][2] this morning from teams that are
tracking regresssions in linux-next.  My patch 513dfacefd71 ("fuse:
share lookup state between submount and its parent") is causing panics
in the fuse unmount path.  The reports came from users with SLUB_DEBUG
enabled, and the additional debug sanitization catches the fact that the
submount_lookup field isn't getting initialized which could lead to a
subsequently bogus attempt to access the submount_lookup structure and
adjust its refcount.

I've added SLUB_DEBUG to my testing kconfig, and have reproduced the
problem using the memfd self-test that was triggering the problem for
both reporters.  With the fix that follows this e-mail, I see no more
erroneous accesses of poisoned slub memory.

I'm a bit unsure of the desired approach for fixing these kinds of
problems.  I'm also away from the office on Nov 10th and Nov 13th, but
expect to be back on the console on the Nov 14th.  Given the gap, I've
prepared a pair of patches, but we only need one.

The first is simply a followup fix that addresses the problem in a
subsequent one-line commit.

If you'd rather revert the entire bad patch and go again, the second
patch in the series is a v5 of the original with the submount_lookup
initialization added.

Either should do, but I wasn't sure which approach was preferable.

Thanks, and my apologies for the inconvenience.

-K

[1] https://lore.kernel.org/linux-fsdevel/CA+G9fYue-dV7t-NrOhWwGshvyboXjb2B6HpCDVDe3bgG7fbnsg@mail.gmail.com/T/#u
[2] https://lore.kernel.org/intel-gfx/SJ1PR11MB6129508509896AD7D0E03114B9AFA@SJ1PR11MB6129.namprd11.prod.outlook.com/T/#u

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] fuse: ensure submount_lookup is initialized on alloc
  2023-11-09 22:36 [PATCH 0/2] Fuse submount_lookup needs to be initialized Krister Johansen
@ 2023-11-09 22:37 ` Krister Johansen
  2023-11-09 22:37 ` [v5 PATCH 2/2] fuse: share lookup state between submount and its parent Krister Johansen
  2023-11-10  9:44 ` [PATCH 0/2] Fuse submount_lookup needs to be initialized Miklos Szeredi
  2 siblings, 0 replies; 4+ messages in thread
From: Krister Johansen @ 2023-11-09 22:37 UTC (permalink / raw)
  To: Miklos Szeredi, linux-fsdevel
  Cc: Miklos Szeredi, linux-kernel, German Maglione, Greg Kurz,
	Max Reitz, Bernd Schubert, Borah, Chaitanya Kumar,
	Naresh Kamboju, Dan Carpenter, Kurmi, Suresh Kumar, Saarinen,
	Jani, lkft-triage, linux-kselftest, regressions, intel-gfx

When introduced, the submount lookup reference tracking neglected to set
an initial value in the fuse inode as part of fuse_inode_alloc.  Users
running with SLUB_DEBUG enabled caught and reported this error.  Fix by
ensuring that this value is always initialized to NULL.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: stable@vger.kernel.org
Fixes: 513dfacefd71 ("fuse: share lookup state between submount and its parent")
---
 fs/fuse/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 243bda3cfdf6..d7ebc322e55b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -103,6 +103,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	fi->state = 0;
 	mutex_init(&fi->mutex);
 	spin_lock_init(&fi->lock);
+	fi->submount_lookup = NULL;
 	fi->forget = fuse_alloc_forget();
 	if (!fi->forget)
 		goto out_free;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [v5 PATCH 2/2] fuse: share lookup state between submount and its parent
  2023-11-09 22:36 [PATCH 0/2] Fuse submount_lookup needs to be initialized Krister Johansen
  2023-11-09 22:37 ` [PATCH 1/2] fuse: ensure submount_lookup is initialized on alloc Krister Johansen
@ 2023-11-09 22:37 ` Krister Johansen
  2023-11-10  9:44 ` [PATCH 0/2] Fuse submount_lookup needs to be initialized Miklos Szeredi
  2 siblings, 0 replies; 4+ messages in thread
From: Krister Johansen @ 2023-11-09 22:37 UTC (permalink / raw)
  To: Miklos Szeredi, linux-fsdevel
  Cc: Miklos Szeredi, linux-kernel, German Maglione, Greg Kurz,
	Max Reitz, Bernd Schubert, Borah, Chaitanya Kumar,
	Naresh Kamboju, Dan Carpenter, Kurmi, Suresh Kumar, Saarinen,
	Jani, lkft-triage, linux-kselftest, regressions, intel-gfx

Fuse submounts do not perform a lookup for the nodeid that they inherit
from their parent.  Instead, the code decrements the nlookup on the
submount's fuse_inode when it is instantiated, and no forget is
performed when a submount root is evicted.

Trouble arises when the submount's parent is evicted despite the
submount itself being in use.  In this author's case, the submount was
in a container and deatched from the initial mount namespace via a
MNT_DEATCH operation.  When memory pressure triggered the shrinker, the
inode from the parent was evicted, which triggered enough forgets to
render the submount's nodeid invalid.

Since submounts should still function, even if their parent goes away,
solve this problem by sharing refcounted state between the parent and
its submount.  When all of the references on this shared state reach
zero, it's safe to forget the final lookup of the fuse nodeid.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: stable@vger.kernel.org
Fixes: 1866d779d5d2 ("fuse: Allow fuse_fill_super_common() for submounts")
---
Changes since v4:

- Ensure that submount_lookup is NULL initialized in fuse_alloc_inode.
  (Feedback from Naresh Kamboju and Chaitanya Kumar Borah)

Changes since v3:

- Remove rcu head from lookup tracking struct along with unnecessary
  kfree_rcu call. (Feedback from Miklos Szeredi)
- Make nlookup one implicitly.  Remove from struct and simplify places
  where it was being used. (Feedback from Miklos Szeredi)
- Remove unnecessary spinlock acquisition. (Feedback from Miklos
  Szeredi)
- Add a WARN_ON if the lookup tracking cookie cannot be found during
  fuse_fill_super_submount.  (Feedback from Miklos Szeredi)

Changes since v2:

- Move to an approach where the lookup is shared between the submount's
  parent and children.  Use a reference counted lookup cookie to decide
  when it is safe to perform the forget of the final reference.
  (Feedback from Miklos Szeredi)

Changes since v1:

- Cleanups to pacify test robot

Changes since RFC:

- Modified fuse_fill_super_submount to always fail if dentry cannot be
  revalidated.  (Feedback from Bernd Schubert)
- Fixed up an edge case where looked up but subsequently declared
  invalid dentries were not correctly tracking nlookup.  (Error was
  introduced in my RFC).
---
 fs/fuse/fuse_i.h | 15 ++++++++++
 fs/fuse/inode.c  | 75 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 405252bb51f2..9377c46f14c4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -63,6 +63,19 @@ struct fuse_forget_link {
 	struct fuse_forget_link *next;
 };
 
+/* Submount lookup tracking */
+struct fuse_submount_lookup {
+	/** Refcount */
+	refcount_t count;
+
+	/** Unique ID, which identifies the inode between userspace
+	 * and kernel */
+	u64 nodeid;
+
+	/** The request used for sending the FORGET message */
+	struct fuse_forget_link *forget;
+};
+
 /** FUSE inode */
 struct fuse_inode {
 	/** Inode data */
@@ -158,6 +171,8 @@ struct fuse_inode {
 	 */
 	struct fuse_inode_dax *dax;
 #endif
+	/** Submount specific lookup tracking */
+	struct fuse_submount_lookup *submount_lookup;
 };
 
 /** FUSE inode state bits */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 444418e240c8..d7ebc322e55b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -68,6 +68,24 @@ struct fuse_forget_link *fuse_alloc_forget(void)
 	return kzalloc(sizeof(struct fuse_forget_link), GFP_KERNEL_ACCOUNT);
 }
 
+static struct fuse_submount_lookup *fuse_alloc_submount_lookup(void)
+{
+	struct fuse_submount_lookup *sl;
+
+	sl = kzalloc(sizeof(struct fuse_submount_lookup), GFP_KERNEL_ACCOUNT);
+	if (!sl)
+		return NULL;
+	sl->forget = fuse_alloc_forget();
+	if (!sl->forget)
+		goto out_free;
+
+	return sl;
+
+out_free:
+	kfree(sl);
+	return NULL;
+}
+
 static struct inode *fuse_alloc_inode(struct super_block *sb)
 {
 	struct fuse_inode *fi;
@@ -85,6 +103,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	fi->state = 0;
 	mutex_init(&fi->mutex);
 	spin_lock_init(&fi->lock);
+	fi->submount_lookup = NULL;
 	fi->forget = fuse_alloc_forget();
 	if (!fi->forget)
 		goto out_free;
@@ -113,6 +132,17 @@ static void fuse_free_inode(struct inode *inode)
 	kmem_cache_free(fuse_inode_cachep, fi);
 }
 
+static void fuse_cleanup_submount_lookup(struct fuse_conn *fc,
+					 struct fuse_submount_lookup *sl)
+{
+	if (!refcount_dec_and_test(&sl->count))
+		return;
+
+	fuse_queue_forget(fc, sl->forget, sl->nodeid, 1);
+	sl->forget = NULL;
+	kfree(sl);
+}
+
 static void fuse_evict_inode(struct inode *inode)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
@@ -132,6 +162,11 @@ static void fuse_evict_inode(struct inode *inode)
 					  fi->nlookup);
 			fi->forget = NULL;
 		}
+
+		if (fi->submount_lookup) {
+			fuse_cleanup_submount_lookup(fc, fi->submount_lookup);
+			fi->submount_lookup = NULL;
+		}
 	}
 	if (S_ISREG(inode->i_mode) && !fuse_is_bad(inode)) {
 		WARN_ON(!list_empty(&fi->write_files));
@@ -332,6 +367,13 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 		fuse_dax_dontcache(inode, attr->flags);
 }
 
+static void fuse_init_submount_lookup(struct fuse_submount_lookup *sl,
+				      u64 nodeid)
+{
+	sl->nodeid = nodeid;
+	refcount_set(&sl->count, 1);
+}
+
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr,
 			    struct fuse_conn *fc)
 {
@@ -395,12 +437,22 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	 */
 	if (fc->auto_submounts && (attr->flags & FUSE_ATTR_SUBMOUNT) &&
 	    S_ISDIR(attr->mode)) {
+		struct fuse_inode *fi;
+
 		inode = new_inode(sb);
 		if (!inode)
 			return NULL;
 
 		fuse_init_inode(inode, attr, fc);
-		get_fuse_inode(inode)->nodeid = nodeid;
+		fi = get_fuse_inode(inode);
+		fi->nodeid = nodeid;
+		fi->submount_lookup = fuse_alloc_submount_lookup();
+		if (!fi->submount_lookup) {
+			iput(inode);
+			return NULL;
+		}
+		/* Sets nlookup = 1 on fi->submount_lookup->nlookup */
+		fuse_init_submount_lookup(fi->submount_lookup, nodeid);
 		inode->i_flags |= S_AUTOMOUNT;
 		goto done;
 	}
@@ -423,11 +475,11 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 		iput(inode);
 		goto retry;
 	}
-done:
 	fi = get_fuse_inode(inode);
 	spin_lock(&fi->lock);
 	fi->nlookup++;
 	spin_unlock(&fi->lock);
+done:
 	fuse_change_attributes(inode, attr, NULL, attr_valid, attr_version);
 
 	return inode;
@@ -1465,6 +1517,8 @@ static int fuse_fill_super_submount(struct super_block *sb,
 	struct super_block *parent_sb = parent_fi->inode.i_sb;
 	struct fuse_attr root_attr;
 	struct inode *root;
+	struct fuse_submount_lookup *sl;
+	struct fuse_inode *fi;
 
 	fuse_sb_defaults(sb);
 	fm->sb = sb;
@@ -1487,12 +1541,27 @@ static int fuse_fill_super_submount(struct super_block *sb,
 	 * its nlookup should not be incremented.  fuse_iget() does
 	 * that, though, so undo it here.
 	 */
-	get_fuse_inode(root)->nlookup--;
+	fi = get_fuse_inode(root);
+	fi->nlookup--;
+
 	sb->s_d_op = &fuse_dentry_operations;
 	sb->s_root = d_make_root(root);
 	if (!sb->s_root)
 		return -ENOMEM;
 
+	/*
+	 * Grab the parent's submount_lookup pointer and take a
+	 * reference on the shared nlookup from the parent.  This is to
+	 * prevent the last forget for this nodeid from getting
+	 * triggered until all users have finished with it.
+	 */
+	sl = parent_fi->submount_lookup;
+	WARN_ON(!sl);
+	if (sl) {
+		refcount_inc(&sl->count);
+		fi->submount_lookup = sl;
+	}
+
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/2] Fuse submount_lookup needs to be initialized
  2023-11-09 22:36 [PATCH 0/2] Fuse submount_lookup needs to be initialized Krister Johansen
  2023-11-09 22:37 ` [PATCH 1/2] fuse: ensure submount_lookup is initialized on alloc Krister Johansen
  2023-11-09 22:37 ` [v5 PATCH 2/2] fuse: share lookup state between submount and its parent Krister Johansen
@ 2023-11-10  9:44 ` Miklos Szeredi
  2 siblings, 0 replies; 4+ messages in thread
From: Miklos Szeredi @ 2023-11-10  9:44 UTC (permalink / raw)
  To: Krister Johansen
  Cc: linux-fsdevel, Miklos Szeredi, linux-kernel, German Maglione,
	Greg Kurz, Max Reitz, Bernd Schubert, Borah, Chaitanya Kumar,
	Naresh Kamboju, Dan Carpenter, Kurmi, Suresh Kumar, Saarinen,
	Jani, lkft-triage, linux-kselftest, regressions, intel-gfx

On Thu, 9 Nov 2023 at 23:37, Krister Johansen <kjlx@templeofstupid.com> wrote:

> Either should do, but I wasn't sure which approach was preferable.

An incremental is better in this situation.   Applied and pushed.

> Thanks, and my apologies for the inconvenience.

Really no need to apologize, this happens and the best possible
outcome is that it get fixed before being released.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-11-10 19:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-09 22:36 [PATCH 0/2] Fuse submount_lookup needs to be initialized Krister Johansen
2023-11-09 22:37 ` [PATCH 1/2] fuse: ensure submount_lookup is initialized on alloc Krister Johansen
2023-11-09 22:37 ` [v5 PATCH 2/2] fuse: share lookup state between submount and its parent Krister Johansen
2023-11-10  9:44 ` [PATCH 0/2] Fuse submount_lookup needs to be initialized Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).