All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] autofs: fix may_umount_tree()
@ 2022-07-11  3:37 Ian Kent
  2022-07-11  3:37 ` [PATCH 1/3] vfs: track count of child mounts Ian Kent
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ian Kent @ 2022-07-11  3:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

The function used by autofs to check if a tree of mounts may be umounted
doesn't work with mount namespaces.

Some time ago an attempt to fix it, appart from the implementation being
wrong, failed to take advantage of cases that allowed the check to
terminate early and so was too inefficient to be considered for merge.

This series utilizes cases for which the check can be terminated early
as best it can.

The patches in this series are prefixed with vfs becuase they are
changes to the VFS code but the only caller of the may_umount_tree()
function is the autofs file system.

For the interested here is a procedure that can be used to reproduce
the problem on a current kernel:

- Add this line to /etc/auto.master:
/- /etc/auto.test -t 5

- create the map /etc/auto.test as:
/test -fstype=tmpfs :tmpfs

- Enable debug logging in automount:

sed -i '/^#logging =/c logging = debug' /etc/autofs.conf
systemctl restart autofs.service

The autofs debug logging output can be observed in another terminal
using "journalctl -f".

Use the following script to run the two tests below.

$ cat /usr/local/bin/test.sh
#!/bin/sh
set -e
exec > >(logger --id=$$) 2>&1
echo Starting test
# Change to the /test directory to keep the mount active
cd /test
grep /test /proc/self/mountinfo
sleep 10
echo Ending test

1. Run the test script as root from the root mount namespace.
- observe that automount reports "expire_proc_direct: 1 remaining in /-"
  until after the script exits.
- correct behaviour.

2. Run the test script as root from a new mount namespace by using:

# unshare -m --propagation unchanged test.sh

- Observe that automount reports "expiring path /test" before the
  script has exited and tries to unmount /test.
  This fails with ">> umount: /test: target is busy." until the script
  exits.
- incorrect behaviour.

Signed-off-by: Ian Kent <raven@themaw.net>
---

Ian Kent (3):
      vfs: track count of child mounts
      vfs: add propagate_mount_tree_busy() helper
      vfs: make may_umount_tree() mount namespace aware


 fs/autofs/expire.c    | 14 ++++++++--
 fs/mount.h            |  1 +
 fs/namespace.c        | 40 +++++++++++++++++++---------
 fs/pnode.c            | 61 +++++++++++++++++++++++++++++++++++++++++++
 fs/pnode.h            |  1 +
 include/linux/mount.h |  5 +++-
 6 files changed, 107 insertions(+), 15 deletions(-)

--
Ian


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] vfs: track count of child mounts
  2022-07-11  3:37 [PATCH 0/3] autofs: fix may_umount_tree() Ian Kent
@ 2022-07-11  3:37 ` Ian Kent
  2022-07-20  1:50   ` Al Viro
  2022-07-11  3:37 ` [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper Ian Kent
  2022-07-11  3:37 ` [PATCH 3/3] vfs: make may_umount_tree() mount namespace aware Ian Kent
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Kent @ 2022-07-11  3:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

While the total reference count of a mount is mostly all that's needed
the reference count corresponding to the mounts only is occassionally
also needed (for example, autofs checking if a tree of mounts can be
expired).

To make this reference count avaialble with minimal changes add a
counter to track the number of child mounts under a given mount. This
count can then be used to calculate the mounts only reference count.

Signed-off-by: Ian Kent <raven@themaw.net>
---
 fs/mount.h     |    1 +
 fs/namespace.c |    8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/fs/mount.h b/fs/mount.h
index 0b6e08cf8afb..3f0f62912463 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -52,6 +52,7 @@ struct mount {
 	int mnt_writers;
 #endif
 	struct list_head mnt_mounts;	/* list of children, anchored here */
+	unsigned int mnt_mounts_cnt;	/* count of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
 	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
diff --git a/fs/namespace.c b/fs/namespace.c
index e6a7e769d25d..3c1ee5b5bb69 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -882,6 +882,8 @@ static struct mountpoint *unhash_mnt(struct mount *mnt)
 	struct mountpoint *mp;
 	mnt->mnt_parent = mnt;
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
+	if (!list_empty(&mnt->mnt_child))
+		mnt->mnt_parent->mnt_mounts_cnt--;
 	list_del_init(&mnt->mnt_child);
 	hlist_del_init_rcu(&mnt->mnt_hash);
 	hlist_del_init(&mnt->mnt_mp_list);
@@ -918,6 +920,7 @@ static void __attach_mnt(struct mount *mnt, struct mount *parent)
 	hlist_add_head_rcu(&mnt->mnt_hash,
 			   m_hash(&parent->mnt, mnt->mnt_mountpoint));
 	list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	parent->mnt_mounts_cnt++;
 }
 
 /*
@@ -936,6 +939,8 @@ void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct m
 	struct mountpoint *old_mp = mnt->mnt_mp;
 	struct mount *old_parent = mnt->mnt_parent;
 
+	if (!list_empty(&mnt->mnt_child))
+		mnt->mnt_parent->mnt_mounts_cnt--;
 	list_del_init(&mnt->mnt_child);
 	hlist_del_init(&mnt->mnt_mp_list);
 	hlist_del_init_rcu(&mnt->mnt_hash);
@@ -1562,6 +1567,8 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 
 	/* Hide the mounts from mnt_mounts */
 	list_for_each_entry(p, &tmp_list, mnt_list) {
+		if (!list_empty(&p->mnt_child))
+			p->mnt_parent->mnt_mounts_cnt--;
 		list_del_init(&p->mnt_child);
 	}
 
@@ -1590,6 +1597,7 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 			if (!disconnect) {
 				/* Don't forget about p */
 				list_add_tail(&p->mnt_child, &p->mnt_parent->mnt_mounts);
+				p->mnt_parent->mnt_mounts_cnt++;
 			} else {
 				umount_mnt(p);
 			}



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper
  2022-07-11  3:37 [PATCH 0/3] autofs: fix may_umount_tree() Ian Kent
  2022-07-11  3:37 ` [PATCH 1/3] vfs: track count of child mounts Ian Kent
@ 2022-07-11  3:37 ` Ian Kent
  2022-07-20  1:54   ` Al Viro
  2022-07-11  3:37 ` [PATCH 3/3] vfs: make may_umount_tree() mount namespace aware Ian Kent
  2 siblings, 1 reply; 12+ messages in thread
From: Ian Kent @ 2022-07-11  3:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

Now that child mounts are tracked the expire checks need to be able to
use this to check if a mount is in use.

Currently when checking a mount for expiration may_umount_tree() checks
only if the passed in mount is in use. This leads to false positive
callbacks to the automount daemon to umount the mount which fail if any
propagated mounts are in use.

To avoid these unnecessary callbacks may_umount_tree() needs to check
propagated mounts in a similar way to may_umount().

Add a helper that can do this.

Signed-off-by: Ian Kent <raven@themaw.net>
---
 fs/pnode.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/pnode.h |    3 +++
 2 files changed, 64 insertions(+)

diff --git a/fs/pnode.c b/fs/pnode.c
index 1106137c747a..e2a906db4324 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -404,6 +404,67 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
 	return 0;
 }
 
+static int do_mount_in_use_check(struct mount *mnt, int cnt)
+{
+	struct mount *topper;
+
+	/* Is there exactly one mount on the child that covers
+	 * it completely?
+	 */
+	topper = find_topper(mnt);
+	if (topper) {
+		int topper_cnt = topper->mnt_mounts_cnt + 1;
+
+		/* Open file or pwd within singular mount? */
+		if (do_refcount_check(topper, topper_cnt))
+			return 1;
+		/* Account for singular mount on parent */
+		cnt += 1;
+	}
+
+	if (do_refcount_check(mnt, cnt))
+		return 1;
+
+	return 0;
+}
+
+/*
+ * Check if the mount tree at 'mnt' is in use or any of its
+ * propogated mounts are in use.
+ * @mnt: the mount to be checked
+ * @adjust: caller holds an additional reference to mount
+ * Check if mnt or any of its propogated mounts have a reference
+ * count greater than the minimum reference count (ie. are in use).
+ */
+int propagate_mount_tree_busy(struct mount *mnt, unsigned int flags)
+{
+	struct mount *parent = mnt->mnt_parent;
+	struct mount *m, *child;
+	unsigned int referenced = flags & TREE_BUSY_REFERENCED;
+	int cnt;
+
+	/* Check for an elevated refcount on the passed in mount.
+	 * If adjust is true the caller holds a reference to the
+	 * passed in mount.
+	 */
+	cnt = mnt->mnt_mounts_cnt + (referenced ?  2 : 1);
+	if (do_mount_in_use_check(mnt, cnt))
+		return 1;
+
+	for (m = propagation_next(parent, parent); m;
+			m = propagation_next(m, parent)) {
+		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
+		if (!child)
+			continue;
+
+		cnt = child->mnt_mounts_cnt + 1;
+
+		if (do_mount_in_use_check(child, cnt))
+			return 1;
+	}
+	return 0;
+}
+
 /*
  * Clear MNT_LOCKED when it can be shown to be safe.
  *
diff --git a/fs/pnode.h b/fs/pnode.h
index 988f1aa9b02a..d7b9dddb257b 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -30,6 +30,8 @@
 
 #define CL_COPY_ALL		(CL_COPY_UNBINDABLE | CL_COPY_MNT_NS_FILE)
 
+#define TREE_BUSY_REFERENCED	0x01
+
 static inline void set_mnt_shared(struct mount *mnt)
 {
 	mnt->mnt.mnt_flags &= ~MNT_SHARED_MASK;
@@ -41,6 +43,7 @@ int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
 		struct hlist_head *);
 int propagate_umount(struct list_head *);
 int propagate_mount_busy(struct mount *, int);
+int propagate_mount_tree_busy(struct mount *, unsigned int);
 void propagate_mount_unlock(struct mount *);
 void mnt_release_group_id(struct mount *);
 int get_dominating_id(struct mount *mnt, const struct path *root);



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] vfs: make may_umount_tree() mount namespace aware
  2022-07-11  3:37 [PATCH 0/3] autofs: fix may_umount_tree() Ian Kent
  2022-07-11  3:37 ` [PATCH 1/3] vfs: track count of child mounts Ian Kent
  2022-07-11  3:37 ` [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper Ian Kent
@ 2022-07-11  3:37 ` Ian Kent
  2 siblings, 0 replies; 12+ messages in thread
From: Ian Kent @ 2022-07-11  3:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

Change may_umount_tree() (and the associated autofs code) to use the
propagate_mount_tree_busy() helper so it also checks if propagated
mounts are busy.

This avoids unnecessary umount requests being sent to the automount
daemon when a mount in another mount namespace is in use when the
expire check is done.

Signed-off-by: Ian Kent <raven@themaw.net>
---
 fs/autofs/expire.c    |   14 ++++++++++++--
 fs/namespace.c        |   32 ++++++++++++++++++++------------
 fs/pnode.h            |    2 --
 include/linux/mount.h |    5 ++++-
 4 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/fs/autofs/expire.c b/fs/autofs/expire.c
index 038b3d2d9f57..1352c454cf1d 100644
--- a/fs/autofs/expire.c
+++ b/fs/autofs/expire.c
@@ -31,10 +31,13 @@ static int autofs_mount_busy(struct vfsmount *mnt,
 {
 	struct dentry *top = dentry;
 	struct path path = {.mnt = mnt, .dentry = dentry};
+	unsigned int flags;
 	int status = 1;
 
 	pr_debug("dentry %p %pd\n", dentry, dentry);
 
+	/* A reference to the mount is held. */
+	flags = TREE_BUSY_REFERENCED;
 	path_get(&path);
 
 	if (!follow_down_one(&path))
@@ -55,7 +58,7 @@ static int autofs_mount_busy(struct vfsmount *mnt,
 	}
 
 	/* Update the expiry counter if fs is busy */
-	if (!may_umount_tree(path.mnt)) {
+	if (!may_umount_tree(path.mnt, flags)) {
 		struct autofs_info *ino;
 
 		ino = autofs_dentry_ino(top);
@@ -152,14 +155,21 @@ static int autofs_direct_busy(struct vfsmount *mnt,
 			      unsigned long timeout,
 			      unsigned int how)
 {
+	unsigned int flags;
+
 	pr_debug("top %p %pd\n", top, top);
 
 	/* Forced expire, user space handles busy mounts */
 	if (how & AUTOFS_EXP_FORCED)
 		return 0;
 
+	/* A mounted direct mount will have an open file handle
+	 * associated with it so we need TREE_BUSY_REFERENCED.
+	 */
+	flags = TREE_BUSY_REFERENCED;
+
 	/* If it's busy update the expiry counters */
-	if (!may_umount_tree(mnt)) {
+	if (!may_umount_tree(mnt, flags)) {
 		struct autofs_info *ino;
 
 		ino = autofs_dentry_ino(top);
diff --git a/fs/namespace.c b/fs/namespace.c
index 3c1ee5b5bb69..bdcb55e821f4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1431,28 +1431,36 @@ void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor)
  * open files, pwds, chroots or sub mounts that are
  * busy.
  */
-int may_umount_tree(struct vfsmount *m)
+int may_umount_tree(struct vfsmount *m, unsigned int flags)
 {
 	struct mount *mnt = real_mount(m);
-	int actual_refs = 0;
-	int minimum_refs = 0;
 	struct mount *p;
+	int ret = 1;
+
 	BUG_ON(!m);
 
-	/* write lock needed for mnt_get_count */
+	down_read(&namespace_sem);
 	lock_mount_hash();
-	for (p = mnt; p; p = next_mnt(p, mnt)) {
-		actual_refs += mnt_get_count(p);
-		minimum_refs += 2;
+	if (propagate_mount_tree_busy(mnt, flags)) {
+		ret = 0;
+		goto out;
 	}
+	/* Only the passed in mount will have a reference held by
+	 * the caller.
+	 */
+	flags &= ~TREE_BUSY_REFERENCED;
+	for (p = next_mnt(mnt, mnt); p; p = next_mnt(p, mnt)) {
+		if (propagate_mount_tree_busy(p, flags)) {
+			ret = 0;
+			break;
+		}
+	}
+out:
 	unlock_mount_hash();
+	up_read(&namespace_sem);
 
-	if (actual_refs > minimum_refs)
-		return 0;
-
-	return 1;
+	return ret;
 }
-
 EXPORT_SYMBOL(may_umount_tree);
 
 /**
diff --git a/fs/pnode.h b/fs/pnode.h
index d7b9dddb257b..12c3ab5962a0 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -30,8 +30,6 @@
 
 #define CL_COPY_ALL		(CL_COPY_UNBINDABLE | CL_COPY_MNT_NS_FILE)
 
-#define TREE_BUSY_REFERENCED	0x01
-
 static inline void set_mnt_shared(struct mount *mnt)
 {
 	mnt->mnt.mnt_flags &= ~MNT_SHARED_MASK;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 55a4abaf6715..c21d74ea3d85 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -112,7 +112,10 @@ extern bool our_mnt(struct vfsmount *mnt);
 
 extern struct vfsmount *kern_mount(struct file_system_type *);
 extern void kern_unmount(struct vfsmount *mnt);
-extern int may_umount_tree(struct vfsmount *);
+
+#define TREE_BUSY_REFERENCED	0x01
+
+extern int may_umount_tree(struct vfsmount *m, unsigned int flags);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(const char *, const char __user *,
 		     const char *, unsigned long, void *);



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] vfs: track count of child mounts
  2022-07-11  3:37 ` [PATCH 1/3] vfs: track count of child mounts Ian Kent
@ 2022-07-20  1:50   ` Al Viro
  2022-07-20  2:17     ` Ian Kent
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-07-20  1:50 UTC (permalink / raw)
  To: Ian Kent
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

On Mon, Jul 11, 2022 at 11:37:40AM +0800, Ian Kent wrote:
> While the total reference count of a mount is mostly all that's needed
> the reference count corresponding to the mounts only is occassionally
> also needed (for example, autofs checking if a tree of mounts can be
> expired).
> 
> To make this reference count avaialble with minimal changes add a
> counter to track the number of child mounts under a given mount. This
> count can then be used to calculate the mounts only reference count.

No.  This is a wrong approach - instead of keeping track of number of
children, we should just stop having them contribute to refcount of
the parent.  Here's what I've got in my local tree; life gets simpler
that way.

commit e99f1f9cc864103f326a5352e6ce1e377613437f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Jul 9 14:45:39 2022 -0400

    namespace: don't keep ->mnt_parent pinned
    
    makes refcounting more consistent
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/namespace.c b/fs/namespace.c
index 68789f896f08..53c29110a0cd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -906,7 +906,6 @@ void mnt_set_mountpoint(struct mount *mnt,
 			struct mount *child_mnt)
 {
 	mp->m_count++;
-	mnt_add_count(mnt, 1);	/* essentially, that's mntget */
 	child_mnt->mnt_mountpoint = mp->m_dentry;
 	child_mnt->mnt_parent = mnt;
 	child_mnt->mnt_mp = mp;
@@ -1429,22 +1428,18 @@ void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor)
 int may_umount_tree(struct vfsmount *m)
 {
 	struct mount *mnt = real_mount(m);
-	int actual_refs = 0;
-	int minimum_refs = 0;
-	struct mount *p;
 	BUG_ON(!m);
 
 	/* write lock needed for mnt_get_count */
 	lock_mount_hash();
-	for (p = mnt; p; p = next_mnt(p, mnt)) {
-		actual_refs += mnt_get_count(p);
-		minimum_refs += 2;
+	for (struct mount *p = mnt; p; p = next_mnt(p, mnt)) {
+		int allowed = p == mnt ? 2 : 1;
+		if (mnt_get_count(p) > allowed) {
+			unlock_mount_hash();
+			return 0;
+		}
 	}
 	unlock_mount_hash();
-
-	if (actual_refs > minimum_refs)
-		return 0;
-
 	return 1;
 }
 
@@ -1586,7 +1581,6 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 
 		disconnect = disconnect_mount(p, how);
 		if (mnt_has_parent(p)) {
-			mnt_add_count(p->mnt_parent, -1);
 			if (!disconnect) {
 				/* Don't forget about p */
 				list_add_tail(&p->mnt_child, &p->mnt_parent->mnt_mounts);
@@ -2892,12 +2886,8 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 		put_mountpoint(old_mp);
 out:
 	unlock_mount(mp);
-	if (!err) {
-		if (attached)
-			mntput_no_expire(parent);
-		else
-			free_mnt_ns(ns);
-	}
+	if (!err && !attached)
+		free_mnt_ns(ns);
 	return err;
 }
 
@@ -3869,7 +3859,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 		const char __user *, put_old)
 {
 	struct path new, old, root;
-	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
+	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent;
 	struct mountpoint *old_mp, *root_mp;
 	int error;
 
@@ -3900,10 +3890,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	new_mnt = real_mount(new.mnt);
 	root_mnt = real_mount(root.mnt);
 	old_mnt = real_mount(old.mnt);
-	ex_parent = new_mnt->mnt_parent;
 	root_parent = root_mnt->mnt_parent;
 	if (IS_MNT_SHARED(old_mnt) ||
-		IS_MNT_SHARED(ex_parent) ||
+		IS_MNT_SHARED(new_mnt->mnt_parent) ||
 		IS_MNT_SHARED(root_parent))
 		goto out4;
 	if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
@@ -3942,7 +3931,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	attach_mnt(root_mnt, old_mnt, old_mp);
 	/* mount new_root on / */
 	attach_mnt(new_mnt, root_parent, root_mp);
-	mnt_add_count(root_parent, -1);
 	touch_mnt_namespace(current->nsproxy->mnt_ns);
 	/* A moved mount should not expire automatically */
 	list_del_init(&new_mnt->mnt_expire);
@@ -3952,8 +3940,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	error = 0;
 out4:
 	unlock_mount(old_mp);
-	if (!error)
-		mntput_no_expire(ex_parent);
 out3:
 	path_put(&root);
 out2:
diff --git a/fs/pnode.c b/fs/pnode.c
index 1106137c747a..e2c8a4b18857 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -368,7 +368,7 @@ static inline int do_refcount_check(struct mount *mnt, int count)
  */
 int propagate_mount_busy(struct mount *mnt, int refcnt)
 {
-	struct mount *m, *child, *topper;
+	struct mount *m, *child;
 	struct mount *parent = mnt->mnt_parent;
 
 	if (mnt == parent)
@@ -384,7 +384,6 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
 
 	for (m = propagation_next(parent, parent); m;
 	     		m = propagation_next(m, parent)) {
-		int count = 1;
 		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
 		if (!child)
 			continue;
@@ -392,13 +391,10 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
 		/* Is there exactly one mount on the child that covers
 		 * it completely whose reference should be ignored?
 		 */
-		topper = find_topper(child);
-		if (topper)
-			count += 1;
-		else if (!list_empty(&child->mnt_mounts))
+		if (!find_topper(child) && !list_empty(&child->mnt_mounts))
 			continue;
 
-		if (do_refcount_check(child, count))
+		if (do_refcount_check(child, 1))
 			return 1;
 	}
 	return 0;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper
  2022-07-11  3:37 ` [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper Ian Kent
@ 2022-07-20  1:54   ` Al Viro
  2022-07-20  2:31     ` Ian Kent
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-07-20  1:54 UTC (permalink / raw)
  To: Ian Kent
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

On Mon, Jul 11, 2022 at 11:37:46AM +0800, Ian Kent wrote:

> +static int do_mount_in_use_check(struct mount *mnt, int cnt)
> +{
> +	struct mount *topper;
> +
> +	/* Is there exactly one mount on the child that covers
> +	 * it completely?
> +	 */
> +	topper = find_topper(mnt);
> +	if (topper) {
> +		int topper_cnt = topper->mnt_mounts_cnt + 1;
> +
> +		/* Open file or pwd within singular mount? */
> +		if (do_refcount_check(topper, topper_cnt))
> +			return 1;

Whatever the hell for?  umount(2) will be able to slide the
underlying mount from under the topper, whatever the
refcount of topper might have been.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] vfs: track count of child mounts
  2022-07-20  1:50   ` Al Viro
@ 2022-07-20  2:17     ` Ian Kent
  2022-07-20  7:26       ` Ian Kent
  2022-07-26  5:11       ` Ian Kent
  0 siblings, 2 replies; 12+ messages in thread
From: Ian Kent @ 2022-07-20  2:17 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List


On 20/7/22 09:50, Al Viro wrote:
> On Mon, Jul 11, 2022 at 11:37:40AM +0800, Ian Kent wrote:
>> While the total reference count of a mount is mostly all that's needed
>> the reference count corresponding to the mounts only is occassionally
>> also needed (for example, autofs checking if a tree of mounts can be
>> expired).
>>
>> To make this reference count avaialble with minimal changes add a
>> counter to track the number of child mounts under a given mount. This
>> count can then be used to calculate the mounts only reference count.
> No.  This is a wrong approach - instead of keeping track of number of
> children, we should just stop having them contribute to refcount of
> the parent.  Here's what I've got in my local tree; life gets simpler
> that way.

Right, I'll grab this and run some tests.


Ian

>
> commit e99f1f9cc864103f326a5352e6ce1e377613437f
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Sat Jul 9 14:45:39 2022 -0400
>
>      namespace: don't keep ->mnt_parent pinned
>      
>      makes refcounting more consistent
>      
>      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 68789f896f08..53c29110a0cd 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -906,7 +906,6 @@ void mnt_set_mountpoint(struct mount *mnt,
>   			struct mount *child_mnt)
>   {
>   	mp->m_count++;
> -	mnt_add_count(mnt, 1);	/* essentially, that's mntget */
>   	child_mnt->mnt_mountpoint = mp->m_dentry;
>   	child_mnt->mnt_parent = mnt;
>   	child_mnt->mnt_mp = mp;
> @@ -1429,22 +1428,18 @@ void mnt_cursor_del(struct mnt_namespace *ns, struct mount *cursor)
>   int may_umount_tree(struct vfsmount *m)
>   {
>   	struct mount *mnt = real_mount(m);
> -	int actual_refs = 0;
> -	int minimum_refs = 0;
> -	struct mount *p;
>   	BUG_ON(!m);
>   
>   	/* write lock needed for mnt_get_count */
>   	lock_mount_hash();
> -	for (p = mnt; p; p = next_mnt(p, mnt)) {
> -		actual_refs += mnt_get_count(p);
> -		minimum_refs += 2;
> +	for (struct mount *p = mnt; p; p = next_mnt(p, mnt)) {
> +		int allowed = p == mnt ? 2 : 1;
> +		if (mnt_get_count(p) > allowed) {
> +			unlock_mount_hash();
> +			return 0;
> +		}
>   	}
>   	unlock_mount_hash();
> -
> -	if (actual_refs > minimum_refs)
> -		return 0;
> -
>   	return 1;
>   }
>   
> @@ -1586,7 +1581,6 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
>   
>   		disconnect = disconnect_mount(p, how);
>   		if (mnt_has_parent(p)) {
> -			mnt_add_count(p->mnt_parent, -1);
>   			if (!disconnect) {
>   				/* Don't forget about p */
>   				list_add_tail(&p->mnt_child, &p->mnt_parent->mnt_mounts);
> @@ -2892,12 +2886,8 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
>   		put_mountpoint(old_mp);
>   out:
>   	unlock_mount(mp);
> -	if (!err) {
> -		if (attached)
> -			mntput_no_expire(parent);
> -		else
> -			free_mnt_ns(ns);
> -	}
> +	if (!err && !attached)
> +		free_mnt_ns(ns);
>   	return err;
>   }
>   
> @@ -3869,7 +3859,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
>   		const char __user *, put_old)
>   {
>   	struct path new, old, root;
> -	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
> +	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent;
>   	struct mountpoint *old_mp, *root_mp;
>   	int error;
>   
> @@ -3900,10 +3890,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
>   	new_mnt = real_mount(new.mnt);
>   	root_mnt = real_mount(root.mnt);
>   	old_mnt = real_mount(old.mnt);
> -	ex_parent = new_mnt->mnt_parent;
>   	root_parent = root_mnt->mnt_parent;
>   	if (IS_MNT_SHARED(old_mnt) ||
> -		IS_MNT_SHARED(ex_parent) ||
> +		IS_MNT_SHARED(new_mnt->mnt_parent) ||
>   		IS_MNT_SHARED(root_parent))
>   		goto out4;
>   	if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
> @@ -3942,7 +3931,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
>   	attach_mnt(root_mnt, old_mnt, old_mp);
>   	/* mount new_root on / */
>   	attach_mnt(new_mnt, root_parent, root_mp);
> -	mnt_add_count(root_parent, -1);
>   	touch_mnt_namespace(current->nsproxy->mnt_ns);
>   	/* A moved mount should not expire automatically */
>   	list_del_init(&new_mnt->mnt_expire);
> @@ -3952,8 +3940,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
>   	error = 0;
>   out4:
>   	unlock_mount(old_mp);
> -	if (!error)
> -		mntput_no_expire(ex_parent);
>   out3:
>   	path_put(&root);
>   out2:
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 1106137c747a..e2c8a4b18857 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -368,7 +368,7 @@ static inline int do_refcount_check(struct mount *mnt, int count)
>    */
>   int propagate_mount_busy(struct mount *mnt, int refcnt)
>   {
> -	struct mount *m, *child, *topper;
> +	struct mount *m, *child;
>   	struct mount *parent = mnt->mnt_parent;
>   
>   	if (mnt == parent)
> @@ -384,7 +384,6 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
>   
>   	for (m = propagation_next(parent, parent); m;
>   	     		m = propagation_next(m, parent)) {
> -		int count = 1;
>   		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
>   		if (!child)
>   			continue;
> @@ -392,13 +391,10 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
>   		/* Is there exactly one mount on the child that covers
>   		 * it completely whose reference should be ignored?
>   		 */
> -		topper = find_topper(child);
> -		if (topper)
> -			count += 1;
> -		else if (!list_empty(&child->mnt_mounts))
> +		if (!find_topper(child) && !list_empty(&child->mnt_mounts))
>   			continue;
>   
> -		if (do_refcount_check(child, count))
> +		if (do_refcount_check(child, 1))
>   			return 1;
>   	}
>   	return 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper
  2022-07-20  1:54   ` Al Viro
@ 2022-07-20  2:31     ` Ian Kent
  2022-07-20  2:39       ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Kent @ 2022-07-20  2:31 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List


On 20/7/22 09:54, Al Viro wrote:
> On Mon, Jul 11, 2022 at 11:37:46AM +0800, Ian Kent wrote:
>
>> +static int do_mount_in_use_check(struct mount *mnt, int cnt)
>> +{
>> +	struct mount *topper;
>> +
>> +	/* Is there exactly one mount on the child that covers
>> +	 * it completely?
>> +	 */
>> +	topper = find_topper(mnt);
>> +	if (topper) {
>> +		int topper_cnt = topper->mnt_mounts_cnt + 1;
>> +
>> +		/* Open file or pwd within singular mount? */
>> +		if (do_refcount_check(topper, topper_cnt))
>> +			return 1;
> Whatever the hell for?  umount(2) will be able to slide the
> underlying mount from under the topper, whatever the
> refcount of topper might have been.

My thinking was that a process could have set a working

directory (or opened a descriptor) and some later change

to an autofs map resulted in it being mounted on. It's

irrelevant now with your suggested simpler approach, ;)


Ian


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper
  2022-07-20  2:31     ` Ian Kent
@ 2022-07-20  2:39       ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2022-07-20  2:39 UTC (permalink / raw)
  To: Ian Kent
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

On Wed, Jul 20, 2022 at 10:31:26AM +0800, Ian Kent wrote:
> 
> On 20/7/22 09:54, Al Viro wrote:
> > On Mon, Jul 11, 2022 at 11:37:46AM +0800, Ian Kent wrote:
> > 
> > > +static int do_mount_in_use_check(struct mount *mnt, int cnt)
> > > +{
> > > +	struct mount *topper;
> > > +
> > > +	/* Is there exactly one mount on the child that covers
> > > +	 * it completely?
> > > +	 */
> > > +	topper = find_topper(mnt);
> > > +	if (topper) {
> > > +		int topper_cnt = topper->mnt_mounts_cnt + 1;
> > > +
> > > +		/* Open file or pwd within singular mount? */
> > > +		if (do_refcount_check(topper, topper_cnt))
> > > +			return 1;
> > Whatever the hell for?  umount(2) will be able to slide the
> > underlying mount from under the topper, whatever the
> > refcount of topper might have been.
> 
> My thinking was that a process could have set a working
> 
> directory (or opened a descriptor) and some later change
> 
> to an autofs map resulted in it being mounted on. It's
> 
> irrelevant now with your suggested simpler approach, ;)

No, I mean why bother checking refcount of overmount in the first
place?  umount(2) will *not* consider it as -EBUSY.  On propagation
under the full overmount it will quietly remove the thing it's
overmounting.

If you have

overmount
victim
mountpoint

stacked like that, with overmount sitting directly on the root
subtree covered by the victim, the only things checked will be
	* victim itself is not busy
	* victim has nothing mounted deeper in it.
In that case it'll collapse to

overmount
mountpoint

and proceed to take the (now detached) victim out.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] vfs: track count of child mounts
  2022-07-20  2:17     ` Ian Kent
@ 2022-07-20  7:26       ` Ian Kent
  2022-07-26  5:11       ` Ian Kent
  1 sibling, 0 replies; 12+ messages in thread
From: Ian Kent @ 2022-07-20  7:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

On 20/7/22 10:17, Ian Kent wrote:
>
> On 20/7/22 09:50, Al Viro wrote:
>> On Mon, Jul 11, 2022 at 11:37:40AM +0800, Ian Kent wrote:
>>> While the total reference count of a mount is mostly all that's needed
>>> the reference count corresponding to the mounts only is occassionally
>>> also needed (for example, autofs checking if a tree of mounts can be
>>> expired).
>>>
>>> To make this reference count avaialble with minimal changes add a
>>> counter to track the number of child mounts under a given mount. This
>>> count can then be used to calculate the mounts only reference count.
>> No.  This is a wrong approach - instead of keeping track of number of
>> children, we should just stop having them contribute to refcount of
>> the parent.  Here's what I've got in my local tree; life gets simpler
>> that way.
>
> Right, I'll grab this and run some tests.
>
>
> Ian
>
>>
>> commit e99f1f9cc864103f326a5352e6ce1e377613437f
>> Author: Al Viro <viro@zeniv.linux.org.uk>
>> Date:   Sat Jul 9 14:45:39 2022 -0400
>>
>>      namespace: don't keep ->mnt_parent pinned
>>           makes refcounting more consistent
>>           Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 68789f896f08..53c29110a0cd 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -906,7 +906,6 @@ void mnt_set_mountpoint(struct mount *mnt,
>>               struct mount *child_mnt)
>>   {
>>       mp->m_count++;
>> -    mnt_add_count(mnt, 1);    /* essentially, that's mntget */
>>       child_mnt->mnt_mountpoint = mp->m_dentry;
>>       child_mnt->mnt_parent = mnt;
>>       child_mnt->mnt_mp = mp;
>> @@ -1429,22 +1428,18 @@ void mnt_cursor_del(struct mnt_namespace *ns, 
>> struct mount *cursor)
>>   int may_umount_tree(struct vfsmount *m)
>>   {
>>       struct mount *mnt = real_mount(m);
>> -    int actual_refs = 0;
>> -    int minimum_refs = 0;
>> -    struct mount *p;
>>       BUG_ON(!m);
>>         /* write lock needed for mnt_get_count */
>>       lock_mount_hash();
>> -    for (p = mnt; p; p = next_mnt(p, mnt)) {
>> -        actual_refs += mnt_get_count(p);
>> -        minimum_refs += 2;
>> +    for (struct mount *p = mnt; p; p = next_mnt(p, mnt)) {
>> +        int allowed = p == mnt ? 2 : 1;
>> +        if (mnt_get_count(p) > allowed) {
>> +            unlock_mount_hash();
>> +            return 0;
>> +        }
>>       }

One part of the problem I'm trying to fix is when some other

process has created a mount namespace (say with unshare(1))

on an auto-mounted path and holds the mount in use in some way.

This leads to an attempted umount in the automount daemon which

fails because a propagation of the mount is in use.


Given the way may_umount() behaves I thought it sensible that

may_umount_tree() behave that way too, but the above doesn't

check propagated mounts.


I'll see if I can come up with something on that ... unless

I'm missing something and you have different thoughts how

this should be done ...


Ian

>>       unlock_mount_hash();
>> -
>> -    if (actual_refs > minimum_refs)
>> -        return 0;
>> -
>>       return 1;
>>   }
>>   @@ -1586,7 +1581,6 @@ static void umount_tree(struct mount *mnt, 
>> enum umount_tree_flags how)
>>             disconnect = disconnect_mount(p, how);
>>           if (mnt_has_parent(p)) {
>> -            mnt_add_count(p->mnt_parent, -1);
>>               if (!disconnect) {
>>                   /* Don't forget about p */
>>                   list_add_tail(&p->mnt_child, 
>> &p->mnt_parent->mnt_mounts);
>> @@ -2892,12 +2886,8 @@ static int do_move_mount(struct path 
>> *old_path, struct path *new_path)
>>           put_mountpoint(old_mp);
>>   out:
>>       unlock_mount(mp);
>> -    if (!err) {
>> -        if (attached)
>> -            mntput_no_expire(parent);
>> -        else
>> -            free_mnt_ns(ns);
>> -    }
>> +    if (!err && !attached)
>> +        free_mnt_ns(ns);
>>       return err;
>>   }
>>   @@ -3869,7 +3859,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>> *, new_root,
>>           const char __user *, put_old)
>>   {
>>       struct path new, old, root;
>> -    struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, 
>> *ex_parent;
>> +    struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent;
>>       struct mountpoint *old_mp, *root_mp;
>>       int error;
>>   @@ -3900,10 +3890,9 @@ SYSCALL_DEFINE2(pivot_root, const char 
>> __user *, new_root,
>>       new_mnt = real_mount(new.mnt);
>>       root_mnt = real_mount(root.mnt);
>>       old_mnt = real_mount(old.mnt);
>> -    ex_parent = new_mnt->mnt_parent;
>>       root_parent = root_mnt->mnt_parent;
>>       if (IS_MNT_SHARED(old_mnt) ||
>> -        IS_MNT_SHARED(ex_parent) ||
>> +        IS_MNT_SHARED(new_mnt->mnt_parent) ||
>>           IS_MNT_SHARED(root_parent))
>>           goto out4;
>>       if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
>> @@ -3942,7 +3931,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>> *, new_root,
>>       attach_mnt(root_mnt, old_mnt, old_mp);
>>       /* mount new_root on / */
>>       attach_mnt(new_mnt, root_parent, root_mp);
>> -    mnt_add_count(root_parent, -1);
>>       touch_mnt_namespace(current->nsproxy->mnt_ns);
>>       /* A moved mount should not expire automatically */
>>       list_del_init(&new_mnt->mnt_expire);
>> @@ -3952,8 +3940,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>> *, new_root,
>>       error = 0;
>>   out4:
>>       unlock_mount(old_mp);
>> -    if (!error)
>> -        mntput_no_expire(ex_parent);
>>   out3:
>>       path_put(&root);
>>   out2:
>> diff --git a/fs/pnode.c b/fs/pnode.c
>> index 1106137c747a..e2c8a4b18857 100644
>> --- a/fs/pnode.c
>> +++ b/fs/pnode.c
>> @@ -368,7 +368,7 @@ static inline int do_refcount_check(struct mount 
>> *mnt, int count)
>>    */
>>   int propagate_mount_busy(struct mount *mnt, int refcnt)
>>   {
>> -    struct mount *m, *child, *topper;
>> +    struct mount *m, *child;
>>       struct mount *parent = mnt->mnt_parent;
>>         if (mnt == parent)
>> @@ -384,7 +384,6 @@ int propagate_mount_busy(struct mount *mnt, int 
>> refcnt)
>>         for (m = propagation_next(parent, parent); m;
>>                    m = propagation_next(m, parent)) {
>> -        int count = 1;
>>           child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
>>           if (!child)
>>               continue;
>> @@ -392,13 +391,10 @@ int propagate_mount_busy(struct mount *mnt, int 
>> refcnt)
>>           /* Is there exactly one mount on the child that covers
>>            * it completely whose reference should be ignored?
>>            */
>> -        topper = find_topper(child);
>> -        if (topper)
>> -            count += 1;
>> -        else if (!list_empty(&child->mnt_mounts))
>> +        if (!find_topper(child) && !list_empty(&child->mnt_mounts))
>>               continue;
>>   -        if (do_refcount_check(child, count))
>> +        if (do_refcount_check(child, 1))
>>               return 1;
>>       }
>>       return 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] vfs: track count of child mounts
  2022-07-20  2:17     ` Ian Kent
  2022-07-20  7:26       ` Ian Kent
@ 2022-07-26  5:11       ` Ian Kent
  2022-07-26  7:10         ` Ian Kent
  1 sibling, 1 reply; 12+ messages in thread
From: Ian Kent @ 2022-07-26  5:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List

On 20/7/22 10:17, Ian Kent wrote:
>
> On 20/7/22 09:50, Al Viro wrote:
>> On Mon, Jul 11, 2022 at 11:37:40AM +0800, Ian Kent wrote:
>>> While the total reference count of a mount is mostly all that's needed
>>> the reference count corresponding to the mounts only is occassionally
>>> also needed (for example, autofs checking if a tree of mounts can be
>>> expired).
>>>
>>> To make this reference count avaialble with minimal changes add a
>>> counter to track the number of child mounts under a given mount. This
>>> count can then be used to calculate the mounts only reference count.
>> No.  This is a wrong approach - instead of keeping track of number of
>> children, we should just stop having them contribute to refcount of
>> the parent.  Here's what I've got in my local tree; life gets simpler
>> that way.
>
> Right, I'll grab this and run some tests.

Just a heads up, I've been able to reliably hang autofs with the

below patch using my submount test (which is actually pretty good

at exposing problems).


No idea what it is yet but I'll look around and keep trying to work

it out, ;)


Ian

>
>
> Ian
>
>>
>> commit e99f1f9cc864103f326a5352e6ce1e377613437f
>> Author: Al Viro <viro@zeniv.linux.org.uk>
>> Date:   Sat Jul 9 14:45:39 2022 -0400
>>
>>      namespace: don't keep ->mnt_parent pinned
>>           makes refcounting more consistent
>>           Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 68789f896f08..53c29110a0cd 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -906,7 +906,6 @@ void mnt_set_mountpoint(struct mount *mnt,
>>               struct mount *child_mnt)
>>   {
>>       mp->m_count++;
>> -    mnt_add_count(mnt, 1);    /* essentially, that's mntget */
>>       child_mnt->mnt_mountpoint = mp->m_dentry;
>>       child_mnt->mnt_parent = mnt;
>>       child_mnt->mnt_mp = mp;
>> @@ -1429,22 +1428,18 @@ void mnt_cursor_del(struct mnt_namespace *ns, 
>> struct mount *cursor)
>>   int may_umount_tree(struct vfsmount *m)
>>   {
>>       struct mount *mnt = real_mount(m);
>> -    int actual_refs = 0;
>> -    int minimum_refs = 0;
>> -    struct mount *p;
>>       BUG_ON(!m);
>>         /* write lock needed for mnt_get_count */
>>       lock_mount_hash();
>> -    for (p = mnt; p; p = next_mnt(p, mnt)) {
>> -        actual_refs += mnt_get_count(p);
>> -        minimum_refs += 2;
>> +    for (struct mount *p = mnt; p; p = next_mnt(p, mnt)) {
>> +        int allowed = p == mnt ? 2 : 1;
>> +        if (mnt_get_count(p) > allowed) {
>> +            unlock_mount_hash();
>> +            return 0;
>> +        }
>>       }
>>       unlock_mount_hash();
>> -
>> -    if (actual_refs > minimum_refs)
>> -        return 0;
>> -
>>       return 1;
>>   }
>>   @@ -1586,7 +1581,6 @@ static void umount_tree(struct mount *mnt, 
>> enum umount_tree_flags how)
>>             disconnect = disconnect_mount(p, how);
>>           if (mnt_has_parent(p)) {
>> -            mnt_add_count(p->mnt_parent, -1);
>>               if (!disconnect) {
>>                   /* Don't forget about p */
>>                   list_add_tail(&p->mnt_child, 
>> &p->mnt_parent->mnt_mounts);
>> @@ -2892,12 +2886,8 @@ static int do_move_mount(struct path 
>> *old_path, struct path *new_path)
>>           put_mountpoint(old_mp);
>>   out:
>>       unlock_mount(mp);
>> -    if (!err) {
>> -        if (attached)
>> -            mntput_no_expire(parent);
>> -        else
>> -            free_mnt_ns(ns);
>> -    }
>> +    if (!err && !attached)
>> +        free_mnt_ns(ns);
>>       return err;
>>   }
>>   @@ -3869,7 +3859,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>> *, new_root,
>>           const char __user *, put_old)
>>   {
>>       struct path new, old, root;
>> -    struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, 
>> *ex_parent;
>> +    struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent;
>>       struct mountpoint *old_mp, *root_mp;
>>       int error;
>>   @@ -3900,10 +3890,9 @@ SYSCALL_DEFINE2(pivot_root, const char 
>> __user *, new_root,
>>       new_mnt = real_mount(new.mnt);
>>       root_mnt = real_mount(root.mnt);
>>       old_mnt = real_mount(old.mnt);
>> -    ex_parent = new_mnt->mnt_parent;
>>       root_parent = root_mnt->mnt_parent;
>>       if (IS_MNT_SHARED(old_mnt) ||
>> -        IS_MNT_SHARED(ex_parent) ||
>> +        IS_MNT_SHARED(new_mnt->mnt_parent) ||
>>           IS_MNT_SHARED(root_parent))
>>           goto out4;
>>       if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
>> @@ -3942,7 +3931,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>> *, new_root,
>>       attach_mnt(root_mnt, old_mnt, old_mp);
>>       /* mount new_root on / */
>>       attach_mnt(new_mnt, root_parent, root_mp);
>> -    mnt_add_count(root_parent, -1);
>>       touch_mnt_namespace(current->nsproxy->mnt_ns);
>>       /* A moved mount should not expire automatically */
>>       list_del_init(&new_mnt->mnt_expire);
>> @@ -3952,8 +3940,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>> *, new_root,
>>       error = 0;
>>   out4:
>>       unlock_mount(old_mp);
>> -    if (!error)
>> -        mntput_no_expire(ex_parent);
>>   out3:
>>       path_put(&root);
>>   out2:
>> diff --git a/fs/pnode.c b/fs/pnode.c
>> index 1106137c747a..e2c8a4b18857 100644
>> --- a/fs/pnode.c
>> +++ b/fs/pnode.c
>> @@ -368,7 +368,7 @@ static inline int do_refcount_check(struct mount 
>> *mnt, int count)
>>    */
>>   int propagate_mount_busy(struct mount *mnt, int refcnt)
>>   {
>> -    struct mount *m, *child, *topper;
>> +    struct mount *m, *child;
>>       struct mount *parent = mnt->mnt_parent;
>>         if (mnt == parent)
>> @@ -384,7 +384,6 @@ int propagate_mount_busy(struct mount *mnt, int 
>> refcnt)
>>         for (m = propagation_next(parent, parent); m;
>>                    m = propagation_next(m, parent)) {
>> -        int count = 1;
>>           child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
>>           if (!child)
>>               continue;
>> @@ -392,13 +391,10 @@ int propagate_mount_busy(struct mount *mnt, int 
>> refcnt)
>>           /* Is there exactly one mount on the child that covers
>>            * it completely whose reference should be ignored?
>>            */
>> -        topper = find_topper(child);
>> -        if (topper)
>> -            count += 1;
>> -        else if (!list_empty(&child->mnt_mounts))
>> +        if (!find_topper(child) && !list_empty(&child->mnt_mounts))
>>               continue;
>>   -        if (do_refcount_check(child, count))
>> +        if (do_refcount_check(child, 1))
>>               return 1;
>>       }
>>       return 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] vfs: track count of child mounts
  2022-07-26  5:11       ` Ian Kent
@ 2022-07-26  7:10         ` Ian Kent
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Kent @ 2022-07-26  7:10 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, David Howells, Miklos Szeredi, linux-fsdevel,
	Kernel Mailing List


On 26/7/22 13:11, Ian Kent wrote:
> On 20/7/22 10:17, Ian Kent wrote:
>>
>> On 20/7/22 09:50, Al Viro wrote:
>>> On Mon, Jul 11, 2022 at 11:37:40AM +0800, Ian Kent wrote:
>>>> While the total reference count of a mount is mostly all that's needed
>>>> the reference count corresponding to the mounts only is occassionally
>>>> also needed (for example, autofs checking if a tree of mounts can be
>>>> expired).
>>>>
>>>> To make this reference count avaialble with minimal changes add a
>>>> counter to track the number of child mounts under a given mount. This
>>>> count can then be used to calculate the mounts only reference count.
>>> No.  This is a wrong approach - instead of keeping track of number of
>>> children, we should just stop having them contribute to refcount of
>>> the parent.  Here's what I've got in my local tree; life gets simpler
>>> that way.
>>
>> Right, I'll grab this and run some tests.
>
> Just a heads up, I've been able to reliably hang autofs with the
>
> below patch using my submount test (which is actually pretty good
>
> at exposing problems).
>
>
> No idea what it is yet but I'll look around and keep trying to work
>
> it out, ;)

Mmm ... so it's just slower ... that was unexpected ...


>
>
> Ian
>
>>
>>
>> Ian
>>
>>>
>>> commit e99f1f9cc864103f326a5352e6ce1e377613437f
>>> Author: Al Viro <viro@zeniv.linux.org.uk>
>>> Date:   Sat Jul 9 14:45:39 2022 -0400
>>>
>>>      namespace: don't keep ->mnt_parent pinned
>>>           makes refcounting more consistent
>>>           Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>>>
>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>> index 68789f896f08..53c29110a0cd 100644
>>> --- a/fs/namespace.c
>>> +++ b/fs/namespace.c
>>> @@ -906,7 +906,6 @@ void mnt_set_mountpoint(struct mount *mnt,
>>>               struct mount *child_mnt)
>>>   {
>>>       mp->m_count++;
>>> -    mnt_add_count(mnt, 1);    /* essentially, that's mntget */
>>>       child_mnt->mnt_mountpoint = mp->m_dentry;
>>>       child_mnt->mnt_parent = mnt;
>>>       child_mnt->mnt_mp = mp;
>>> @@ -1429,22 +1428,18 @@ void mnt_cursor_del(struct mnt_namespace 
>>> *ns, struct mount *cursor)
>>>   int may_umount_tree(struct vfsmount *m)
>>>   {
>>>       struct mount *mnt = real_mount(m);
>>> -    int actual_refs = 0;
>>> -    int minimum_refs = 0;
>>> -    struct mount *p;
>>>       BUG_ON(!m);
>>>         /* write lock needed for mnt_get_count */
>>>       lock_mount_hash();
>>> -    for (p = mnt; p; p = next_mnt(p, mnt)) {
>>> -        actual_refs += mnt_get_count(p);
>>> -        minimum_refs += 2;
>>> +    for (struct mount *p = mnt; p; p = next_mnt(p, mnt)) {
>>> +        int allowed = p == mnt ? 2 : 1;
>>> +        if (mnt_get_count(p) > allowed) {
>>> +            unlock_mount_hash();
>>> +            return 0;
>>> +        }
>>>       }
>>>       unlock_mount_hash();
>>> -
>>> -    if (actual_refs > minimum_refs)
>>> -        return 0;
>>> -
>>>       return 1;
>>>   }
>>>   @@ -1586,7 +1581,6 @@ static void umount_tree(struct mount *mnt, 
>>> enum umount_tree_flags how)
>>>             disconnect = disconnect_mount(p, how);
>>>           if (mnt_has_parent(p)) {
>>> -            mnt_add_count(p->mnt_parent, -1);
>>>               if (!disconnect) {
>>>                   /* Don't forget about p */
>>>                   list_add_tail(&p->mnt_child, 
>>> &p->mnt_parent->mnt_mounts);
>>> @@ -2892,12 +2886,8 @@ static int do_move_mount(struct path 
>>> *old_path, struct path *new_path)
>>>           put_mountpoint(old_mp);
>>>   out:
>>>       unlock_mount(mp);
>>> -    if (!err) {
>>> -        if (attached)
>>> -            mntput_no_expire(parent);
>>> -        else
>>> -            free_mnt_ns(ns);
>>> -    }
>>> +    if (!err && !attached)
>>> +        free_mnt_ns(ns);
>>>       return err;
>>>   }
>>>   @@ -3869,7 +3859,7 @@ SYSCALL_DEFINE2(pivot_root, const char 
>>> __user *, new_root,
>>>           const char __user *, put_old)
>>>   {
>>>       struct path new, old, root;
>>> -    struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, 
>>> *ex_parent;
>>> +    struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent;
>>>       struct mountpoint *old_mp, *root_mp;
>>>       int error;
>>>   @@ -3900,10 +3890,9 @@ SYSCALL_DEFINE2(pivot_root, const char 
>>> __user *, new_root,
>>>       new_mnt = real_mount(new.mnt);
>>>       root_mnt = real_mount(root.mnt);
>>>       old_mnt = real_mount(old.mnt);
>>> -    ex_parent = new_mnt->mnt_parent;
>>>       root_parent = root_mnt->mnt_parent;
>>>       if (IS_MNT_SHARED(old_mnt) ||
>>> -        IS_MNT_SHARED(ex_parent) ||
>>> +        IS_MNT_SHARED(new_mnt->mnt_parent) ||
>>>           IS_MNT_SHARED(root_parent))
>>>           goto out4;
>>>       if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
>>> @@ -3942,7 +3931,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>>> *, new_root,
>>>       attach_mnt(root_mnt, old_mnt, old_mp);
>>>       /* mount new_root on / */
>>>       attach_mnt(new_mnt, root_parent, root_mp);
>>> -    mnt_add_count(root_parent, -1);
>>>       touch_mnt_namespace(current->nsproxy->mnt_ns);
>>>       /* A moved mount should not expire automatically */
>>>       list_del_init(&new_mnt->mnt_expire);
>>> @@ -3952,8 +3940,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user 
>>> *, new_root,
>>>       error = 0;
>>>   out4:
>>>       unlock_mount(old_mp);
>>> -    if (!error)
>>> -        mntput_no_expire(ex_parent);
>>>   out3:
>>>       path_put(&root);
>>>   out2:
>>> diff --git a/fs/pnode.c b/fs/pnode.c
>>> index 1106137c747a..e2c8a4b18857 100644
>>> --- a/fs/pnode.c
>>> +++ b/fs/pnode.c
>>> @@ -368,7 +368,7 @@ static inline int do_refcount_check(struct mount 
>>> *mnt, int count)
>>>    */
>>>   int propagate_mount_busy(struct mount *mnt, int refcnt)
>>>   {
>>> -    struct mount *m, *child, *topper;
>>> +    struct mount *m, *child;
>>>       struct mount *parent = mnt->mnt_parent;
>>>         if (mnt == parent)
>>> @@ -384,7 +384,6 @@ int propagate_mount_busy(struct mount *mnt, int 
>>> refcnt)
>>>         for (m = propagation_next(parent, parent); m;
>>>                    m = propagation_next(m, parent)) {
>>> -        int count = 1;
>>>           child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
>>>           if (!child)
>>>               continue;
>>> @@ -392,13 +391,10 @@ int propagate_mount_busy(struct mount *mnt, 
>>> int refcnt)
>>>           /* Is there exactly one mount on the child that covers
>>>            * it completely whose reference should be ignored?
>>>            */
>>> -        topper = find_topper(child);
>>> -        if (topper)
>>> -            count += 1;
>>> -        else if (!list_empty(&child->mnt_mounts))
>>> +        if (!find_topper(child) && !list_empty(&child->mnt_mounts))
>>>               continue;
>>>   -        if (do_refcount_check(child, count))
>>> +        if (do_refcount_check(child, 1))
>>>               return 1;
>>>       }
>>>       return 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-07-26  7:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-11  3:37 [PATCH 0/3] autofs: fix may_umount_tree() Ian Kent
2022-07-11  3:37 ` [PATCH 1/3] vfs: track count of child mounts Ian Kent
2022-07-20  1:50   ` Al Viro
2022-07-20  2:17     ` Ian Kent
2022-07-20  7:26       ` Ian Kent
2022-07-26  5:11       ` Ian Kent
2022-07-26  7:10         ` Ian Kent
2022-07-11  3:37 ` [PATCH 2/3] vfs: add propagate_mount_tree_busy() helper Ian Kent
2022-07-20  1:54   ` Al Viro
2022-07-20  2:31     ` Ian Kent
2022-07-20  2:39       ` Al Viro
2022-07-11  3:37 ` [PATCH 3/3] vfs: make may_umount_tree() mount namespace aware Ian Kent

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.