All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list
@ 2021-03-15 12:07 David Howells
  2021-03-15 12:07 ` [PATCH 1/3] vfs: Use an xarray in the mount namespace to handle /proc/mounts list David Howells
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: David Howells @ 2021-03-15 12:07 UTC (permalink / raw)
  To: Alexander Viro, Miklos Szeredi
  Cc: Matthew Wilcox, dhowells, Matthew Wilcox, Ian Kent,
	Linus Torvalds, linux-fsdevel, linux-kernel


Hi Al, Miklós,

Can we consider replacing the "insert cursor" approach we're currently
using for proc files to scan the current namespace's mount list[1] with
something that uses an xarray of mounts indexed by mnt_id?

This has some advantages:

 (1) It's simpler.  We don't need to insert dummy mount objects as
     bookmarks into the mount list and code that's walking the list doesn't
     have to carefully step over them.

 (2) We can use the file position to represent the mnt_id and can jump to
     it directly - ie. using seek() to jump to a mount object by its ID.

 (3) It might make it easier to use RCU in future to dump mount entries
     rather than having to take namespace_sem.  xarray provides for the
     possibility of tagging entries to say that they're viewable to avoid
     dumping incomplete mount objects.

But there are a number of disadvantages:

 (1) We have to allocate memory to maintain the xarray, which becomes more
     of a problem as mnt_id values get scattered.

 (2) We need to preallocate the xarray memory because we're manipulating

 (3) The effect could be magnified because someone mounts an entire
     subtree and propagation has to copy all of it.

David

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f7fa85607bc398f843ff0f [1]

---
David Howells (3):
      vfs: Use an xarray in the mount namespace to handle /proc/mounts list
      vfs: Use the mounts_to_id array to do /proc/mounts and co.
      vfs: Remove mount list trawling cursor stuff


 fs/mount.h            |  2 +-
 fs/namespace.c        | 66 ++++++++++---------------------------------
 fs/proc_namespace.c   |  3 --
 include/linux/mount.h |  4 +--
 4 files changed, 17 insertions(+), 58 deletions(-)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] vfs: Use an xarray in the mount namespace to handle /proc/mounts list
  2021-03-15 12:07 [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list David Howells
@ 2021-03-15 12:07 ` David Howells
  2021-03-15 12:07 ` [PATCH 2/3] vfs: Use the mounts_to_id array to do /proc/mounts and co David Howells
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2021-03-15 12:07 UTC (permalink / raw)
  To: Alexander Viro, Miklos Szeredi
  Cc: Matthew Wilcox, dhowells, Matthew Wilcox, Ian Kent,
	Linus Torvalds, linux-fsdevel, linux-kernel

Add an xarray to the mount namespace and use this to perform a mnt_id to
mount object mapping for the namespace.  Make use of xa_reserve() to
perform preallocation before taking the mount_lock.

This will allow the set of mount objects in a namespace to be iterated
using xarray iteration and without the need to insert and remove fake
mounts as bookmarks - which cause issues for other trawlers of the list.

As a bonus, if we want to allow it, lseek() can be used to start at a
particular mount - though there's no easy way to limit the return to just a
single entry or enforce a failure if that mount doesn't exist, but a later
one does.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Matthew Wilcox <willy@infradead.org>
---

 fs/mount.h     |    2 +
 fs/namespace.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 0b6e08cf8afb..455f4d293a65 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -4,6 +4,7 @@
 #include <linux/poll.h>
 #include <linux/ns_common.h>
 #include <linux/fs_pin.h>
+#include <linux/xarray.h>
 
 struct mnt_namespace {
 	struct ns_common	ns;
@@ -14,6 +15,7 @@ struct mnt_namespace {
 	 * - taking namespace_sem for read AND taking .ns_lock.
 	 */
 	struct list_head	list;
+	struct xarray		mounts_by_id; /* List of mounts by mnt_id */
 	spinlock_t		ns_lock;
 	struct user_namespace	*user_ns;
 	struct ucounts		*ucounts;
diff --git a/fs/namespace.c b/fs/namespace.c
index 56bb5a5fdc0d..5c9bcaeac4de 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -901,6 +901,57 @@ void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct m
 	mnt_add_count(old_parent, -1);
 }
 
+/*
+ * Reserve slots in the mnt_id-to-mount mapping in a namespace.  This gets the
+ * memory allocation done upfront.
+ */
+static int reserve_mnt_id_one(struct mount *mnt, struct mnt_namespace *ns)
+{
+	struct mount *m;
+	int ret;
+
+	ret = xa_reserve(&ns->mounts_by_id, mnt->mnt_id, GFP_KERNEL);
+	if (ret < 0)
+		return ret;
+
+	list_for_each_entry(m, &mnt->mnt_list, mnt_list) {
+		ret = xa_reserve(&ns->mounts_by_id, m->mnt_id, GFP_KERNEL);
+		if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int reserve_mnt_id_list(struct hlist_head *tree_list)
+{
+	struct mount *child;
+	int ret;
+
+	hlist_for_each_entry(child, tree_list, mnt_hash) {
+		ret = reserve_mnt_id_one(child, child->mnt_parent->mnt_ns);
+		if (ret < 0)
+			return ret;
+	}
+	return 0;
+}
+
+static void add_mnt_to_ns(struct mount *m, struct mnt_namespace *ns)
+{
+	void *x;
+
+	m->mnt_ns = ns;
+	x = xa_store(&ns->mounts_by_id, m->mnt_id, m, GFP_ATOMIC);
+	WARN(xa_err(x), "Couldn't store mnt_id %x\n", m->mnt_id);
+}
+
+static void remove_mnt_from_ns(struct mount *mnt)
+{
+	if (mnt->mnt_ns && mnt->mnt_ns != MNT_NS_INTERNAL)
+		xa_erase(&mnt->mnt_ns->mounts_by_id, mnt->mnt_id);
+	mnt->mnt_ns = NULL;
+}
+
 /*
  * vfsmount lock must be held for write
  */
@@ -914,8 +965,9 @@ static void commit_tree(struct mount *mnt)
 	BUG_ON(parent == mnt);
 
 	list_add_tail(&head, &mnt->mnt_list);
-	list_for_each_entry(m, &head, mnt_list)
-		m->mnt_ns = n;
+	list_for_each_entry(m, &head, mnt_list) {
+		add_mnt_to_ns(m, n);
+	}
 
 	list_splice(&head, n->list.prev);
 
@@ -1529,7 +1581,7 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 			ns->mounts--;
 			__touch_mnt_namespace(ns);
 		}
-		p->mnt_ns = NULL;
+		remove_mnt_from_ns(p);
 		if (how & UMOUNT_SYNC)
 			p->mnt.mnt_flags |= MNT_SYNC_UMOUNT;
 
@@ -2144,6 +2196,13 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 		err = count_mounts(ns, source_mnt);
 		if (err)
 			goto out;
+
+		/* Reserve id-to-mount mapping slots in the namespace we're
+		 * going to use.
+		 */
+		err = reserve_mnt_id_one(source_mnt, dest_mnt->mnt_ns);
+		if (err)
+			goto out;
 	}
 
 	if (IS_MNT_SHARED(dest_mnt)) {
@@ -2151,6 +2210,8 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 		if (err)
 			goto out;
 		err = propagate_mnt(dest_mnt, dest_mp, source_mnt, &tree_list);
+		if (!err && !moving)
+			err = reserve_mnt_id_list(&tree_list);
 		lock_mount_hash();
 		if (err)
 			goto out_cleanup_ids;
@@ -3260,6 +3321,7 @@ static void dec_mnt_namespaces(struct ucounts *ucounts)
 
 static void free_mnt_ns(struct mnt_namespace *ns)
 {
+	WARN_ON(!xa_empty(&ns->mounts_by_id));
 	if (!is_anon_ns(ns))
 		ns_free_inum(&ns->ns);
 	dec_mnt_namespaces(ns->ucounts);
@@ -3306,6 +3368,7 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a
 	INIT_LIST_HEAD(&new_ns->list);
 	init_waitqueue_head(&new_ns->poll);
 	spin_lock_init(&new_ns->ns_lock);
+	xa_init(&new_ns->mounts_by_id);
 	new_ns->user_ns = get_user_ns(user_ns);
 	new_ns->ucounts = ucounts;
 	return new_ns;
@@ -3362,7 +3425,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	p = old;
 	q = new;
 	while (p) {
-		q->mnt_ns = new_ns;
+		add_mnt_to_ns(q, new_ns);
 		new_ns->mounts++;
 		if (new_fs) {
 			if (&p->mnt == new_fs->root.mnt) {
@@ -3404,7 +3467,7 @@ struct dentry *mount_subtree(struct vfsmount *m, const char *name)
 		mntput(m);
 		return ERR_CAST(ns);
 	}
-	mnt->mnt_ns = ns;
+	add_mnt_to_ns(mnt, ns);
 	ns->root = mnt;
 	ns->mounts++;
 	list_add(&mnt->mnt_list, &ns->list);
@@ -3583,7 +3646,7 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 		goto err_path;
 	}
 	mnt = real_mount(newmount.mnt);
-	mnt->mnt_ns = ns;
+	add_mnt_to_ns(mnt, ns);
 	ns->root = mnt;
 	ns->mounts = 1;
 	list_add(&mnt->mnt_list, &ns->list);
@@ -4193,7 +4256,7 @@ static void __init init_mount_tree(void)
 	if (IS_ERR(ns))
 		panic("Can't allocate initial namespace");
 	m = real_mount(mnt);
-	m->mnt_ns = ns;
+	add_mnt_to_ns(m, ns);
 	ns->root = m;
 	ns->mounts = 1;
 	list_add(&m->mnt_list, &ns->list);
@@ -4270,7 +4333,7 @@ void kern_unmount(struct vfsmount *mnt)
 {
 	/* release long term mount so mount point can be released */
 	if (!IS_ERR_OR_NULL(mnt)) {
-		real_mount(mnt)->mnt_ns = NULL;
+		remove_mnt_from_ns(real_mount(mnt));
 		synchronize_rcu();	/* yecchhh... */
 		mntput(mnt);
 	}
@@ -4283,7 +4346,7 @@ void kern_unmount_array(struct vfsmount *mnt[], unsigned int num)
 
 	for (i = 0; i < num; i++)
 		if (mnt[i])
-			real_mount(mnt[i])->mnt_ns = NULL;
+			remove_mnt_from_ns(real_mount(mnt[i]));
 	synchronize_rcu_expedited();
 	for (i = 0; i < num; i++)
 		mntput(mnt[i]);



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] vfs: Use the mounts_to_id array to do /proc/mounts and co.
  2021-03-15 12:07 [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list David Howells
  2021-03-15 12:07 ` [PATCH 1/3] vfs: Use an xarray in the mount namespace to handle /proc/mounts list David Howells
@ 2021-03-15 12:07 ` David Howells
  2021-03-15 12:54   ` Matthew Wilcox
  2021-03-15 12:08 ` [PATCH 3/3] vfs: Remove mount list trawling cursor stuff David Howells
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: David Howells @ 2021-03-15 12:07 UTC (permalink / raw)
  To: Alexander Viro, Miklos Szeredi
  Cc: Matthew Wilcox, dhowells, Matthew Wilcox, Ian Kent,
	Linus Torvalds, linux-fsdevel, linux-kernel

Use the mounts_to_id xarray added to the mount namespace to perform
iteration over the mounts in a namespace on behalf of /proc/mounts and
similar.

Since it doesn't trawl a standard list_head, but rather uses xarray, this
could be done under the RCU read lock only.  To do this, we would need to
hide mounts that are in the process of being inserted into the tree by
marking them in the xarray itself or using a mount flag.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Matthew Wilcox <willy@infradead.org>
---

 fs/mount.h          |    2 +-
 fs/namespace.c      |   40 +++++++++++++++++-----------------------
 fs/proc_namespace.c |    3 ---
 3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 455f4d293a65..114e7d603995 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -130,7 +130,7 @@ struct proc_mounts {
 	struct mnt_namespace *ns;
 	struct path root;
 	int (*show)(struct seq_file *, struct vfsmount *);
-	struct mount cursor;
+	struct xa_state xas;
 };
 
 extern const struct seq_operations mounts_op;
diff --git a/fs/namespace.c b/fs/namespace.c
index 5c9bcaeac4de..d19fde0654f7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1334,6 +1334,7 @@ struct vfsmount *mnt_clone_internal(const struct path *path)
 }
 
 #ifdef CONFIG_PROC_FS
+#if 0
 static struct mount *mnt_list_next(struct mnt_namespace *ns,
 				   struct list_head *p)
 {
@@ -1351,47 +1352,40 @@ static struct mount *mnt_list_next(struct mnt_namespace *ns,
 
 	return ret;
 }
+#endif
 
 /* iterator; we want it to have access to namespace_sem, thus here... */
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
-	struct proc_mounts *p = m->private;
-	struct list_head *prev;
+	struct proc_mounts *state = m->private;
+	void *entry;
 
 	down_read(&namespace_sem);
-	if (!*pos) {
-		prev = &p->ns->list;
-	} else {
-		prev = &p->cursor.mnt_list;
+	state->xas = (struct xa_state) __XA_STATE(&state->ns->mounts_by_id, *pos, 0, 0);
 
-		/* Read after we'd reached the end? */
-		if (list_empty(prev))
-			return NULL;
-	}
+	entry = xas_find(&state->xas, ULONG_MAX);
+	while (entry && xas_invalid(entry))
+		entry = xas_next_entry(&state->xas, ULONG_MAX);
 
-	return mnt_list_next(p->ns, prev);
+	return entry;
 }
 
 static void *m_next(struct seq_file *m, void *v, loff_t *pos)
 {
-	struct proc_mounts *p = m->private;
+	struct proc_mounts *state = m->private;
 	struct mount *mnt = v;
+	void *entry;
+
+	*pos = mnt->mnt_id + 1;
+	entry = xas_next_entry(&state->xas, ULONG_MAX);
+	while (entry && xas_invalid(entry))
+		entry = xas_next_entry(&state->xas, ULONG_MAX);
 
-	++*pos;
-	return mnt_list_next(p->ns, &mnt->mnt_list);
+	return entry;
 }
 
 static void m_stop(struct seq_file *m, void *v)
 {
-	struct proc_mounts *p = m->private;
-	struct mount *mnt = v;
-
-	lock_ns_list(p->ns);
-	if (mnt)
-		list_move_tail(&p->cursor.mnt_list, &mnt->mnt_list);
-	else
-		list_del_init(&p->cursor.mnt_list);
-	unlock_ns_list(p->ns);
 	up_read(&namespace_sem);
 }
 
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 392ef5162655..9ae07f1904e6 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -283,8 +283,6 @@ static int mounts_open_common(struct inode *inode, struct file *file,
 	p->ns = ns;
 	p->root = root;
 	p->show = show;
-	INIT_LIST_HEAD(&p->cursor.mnt_list);
-	p->cursor.mnt.mnt_flags = MNT_CURSOR;
 
 	return 0;
 
@@ -301,7 +299,6 @@ static int mounts_release(struct inode *inode, struct file *file)
 	struct seq_file *m = file->private_data;
 	struct proc_mounts *p = m->private;
 	path_put(&p->root);
-	mnt_cursor_del(p->ns, &p->cursor);
 	put_mnt_ns(p->ns);
 	return seq_release_private(inode, file);
 }



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] vfs: Remove mount list trawling cursor stuff
  2021-03-15 12:07 [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list David Howells
  2021-03-15 12:07 ` [PATCH 1/3] vfs: Use an xarray in the mount namespace to handle /proc/mounts list David Howells
  2021-03-15 12:07 ` [PATCH 2/3] vfs: Use the mounts_to_id array to do /proc/mounts and co David Howells
@ 2021-03-15 12:08 ` David Howells
  2021-03-15 12:46 ` [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list Matthew Wilcox
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2021-03-15 12:08 UTC (permalink / raw)
  To: Alexander Viro, Miklos Szeredi
  Cc: Matthew Wilcox, dhowells, Matthew Wilcox, Ian Kent,
	Linus Torvalds, linux-fsdevel, linux-kernel

Remove the stuff for trawling a mount namespace's mount list using inserted
cursors as bookmarks as this has been replaced with an xarray-based
approach.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Matthew Wilcox <willy@infradead.org>
---

 fs/namespace.c        |   30 ------------------------------
 include/linux/mount.h |    4 +---
 2 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d19fde0654f7..105a6d882cb4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -673,11 +673,6 @@ static inline void unlock_ns_list(struct mnt_namespace *ns)
 	spin_unlock(&ns->ns_lock);
 }
 
-static inline bool mnt_is_cursor(struct mount *mnt)
-{
-	return mnt->mnt.mnt_flags & MNT_CURSOR;
-}
-
 /*
  * __is_local_mountpoint - Test to see if dentry is a mountpoint in the
  *                         current mount namespace.
@@ -702,8 +697,6 @@ bool __is_local_mountpoint(struct dentry *dentry)
 	down_read(&namespace_sem);
 	lock_ns_list(ns);
 	list_for_each_entry(mnt, &ns->list, mnt_list) {
-		if (mnt_is_cursor(mnt))
-			continue;
 		is_covered = (mnt->mnt_mountpoint == dentry);
 		if (is_covered)
 			break;
@@ -1334,26 +1327,6 @@ struct vfsmount *mnt_clone_internal(const struct path *path)
 }
 
 #ifdef CONFIG_PROC_FS
-#if 0
-static struct mount *mnt_list_next(struct mnt_namespace *ns,
-				   struct list_head *p)
-{
-	struct mount *mnt, *ret = NULL;
-
-	lock_ns_list(ns);
-	list_for_each_continue(p, &ns->list) {
-		mnt = list_entry(p, typeof(*mnt), mnt_list);
-		if (!mnt_is_cursor(mnt)) {
-			ret = mnt;
-			break;
-		}
-	}
-	unlock_ns_list(ns);
-
-	return ret;
-}
-#endif
-
 /* iterator; we want it to have access to namespace_sem, thus here... */
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
@@ -4390,9 +4363,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 		struct mount *child;
 		int mnt_flags;
 
-		if (mnt_is_cursor(mnt))
-			continue;
-
 		if (mnt->mnt.mnt_sb->s_type != sb->s_type)
 			continue;
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 5d92a7e1a742..88027d38833c 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -51,8 +51,7 @@ struct fs_context;
 #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
 
 #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
-			    MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED | \
-			    MNT_CURSOR)
+			    MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED)
 
 #define MNT_INTERNAL	0x4000
 
@@ -66,7 +65,6 @@ struct fs_context;
 #define MNT_SYNC_UMOUNT		0x2000000
 #define MNT_MARKED		0x4000000
 #define MNT_UMOUNT		0x8000000
-#define MNT_CURSOR		0x10000000
 
 struct vfsmount {
 	struct dentry *mnt_root;	/* root of the mounted tree */



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list
  2021-03-15 12:07 [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list David Howells
                   ` (2 preceding siblings ...)
  2021-03-15 12:08 ` [PATCH 3/3] vfs: Remove mount list trawling cursor stuff David Howells
@ 2021-03-15 12:46 ` Matthew Wilcox
  2021-03-15 13:14 ` Miklos Szeredi
  2021-03-15 13:41 ` David Howells
  5 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2021-03-15 12:46 UTC (permalink / raw)
  To: David Howells
  Cc: Alexander Viro, Miklos Szeredi, Ian Kent, Linus Torvalds,
	linux-fsdevel, linux-kernel

On Mon, Mar 15, 2021 at 12:07:39PM +0000, David Howells wrote:
> 
> Hi Al, Miklós,
> 
> Can we consider replacing the "insert cursor" approach we're currently
> using for proc files to scan the current namespace's mount list[1] with
> something that uses an xarray of mounts indexed by mnt_id?
> 
> This has some advantages:
> 
>  (1) It's simpler.  We don't need to insert dummy mount objects as
>      bookmarks into the mount list and code that's walking the list doesn't
>      have to carefully step over them.
> 
>  (2) We can use the file position to represent the mnt_id and can jump to
>      it directly - ie. using seek() to jump to a mount object by its ID.
> 
>  (3) It might make it easier to use RCU in future to dump mount entries
>      rather than having to take namespace_sem.  xarray provides for the
>      possibility of tagging entries to say that they're viewable to avoid
>      dumping incomplete mount objects.

Usually one fully constructs the object, then inserts it into the XArray.

> But there are a number of disadvantages:
> 
>  (1) We have to allocate memory to maintain the xarray, which becomes more
>      of a problem as mnt_id values get scattered.

mnt_id values don't seem to get particularly scattered.  They're allocated
using an IDA, so they stay small (unlike someone using idr_alloc_cyclic
;-).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] vfs: Use the mounts_to_id array to do /proc/mounts and co.
  2021-03-15 12:07 ` [PATCH 2/3] vfs: Use the mounts_to_id array to do /proc/mounts and co David Howells
@ 2021-03-15 12:54   ` Matthew Wilcox
  0 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2021-03-15 12:54 UTC (permalink / raw)
  To: David Howells
  Cc: Alexander Viro, Miklos Szeredi, Ian Kent, Linus Torvalds,
	linux-fsdevel, linux-kernel

On Mon, Mar 15, 2021 at 12:07:56PM +0000, David Howells wrote:
> Use the mounts_to_id xarray added to the mount namespace to perform

You called it mounts_by_id in the last patch ...

> Since it doesn't trawl a standard list_head, but rather uses xarray, this
> could be done under the RCU read lock only.  To do this, we would need to
> hide mounts that are in the process of being inserted into the tree by
> marking them in the xarray itself or using a mount flag.

>  /* iterator; we want it to have access to namespace_sem, thus here... */
>  static void *m_start(struct seq_file *m, loff_t *pos)
>  {
> -	struct proc_mounts *p = m->private;
> -	struct list_head *prev;
> +	struct proc_mounts *state = m->private;
> +	void *entry;
>  
>  	down_read(&namespace_sem);
> -	if (!*pos) {
> -		prev = &p->ns->list;
> -	} else {
> -		prev = &p->cursor.mnt_list;
> +	state->xas = (struct xa_state) __XA_STATE(&state->ns->mounts_by_id, *pos, 0, 0);
>  
> -		/* Read after we'd reached the end? */
> -		if (list_empty(prev))
> -			return NULL;
> -	}
> +	entry = xas_find(&state->xas, ULONG_MAX);

I know you haven't enabled enough debugging because this will assert
that either the RCU read lock or the xa_lock is held to prevent xa_nodes
from disappearing underneath us.

Why do you want to use an xa_state for this?  This is /proc, so efficiency
isn't the highest priority.  I'd just use xa_find(), and then you don't
need to care about an xa_state or locking -- it handles taking the rcu
read lock for you.

> +	while (entry && xas_invalid(entry))

I've never seen anybody make that mistake before.  Good one.  Not sure
if there's anything I can do to prevent it in future.

> +		entry = xas_next_entry(&state->xas, ULONG_MAX);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list
  2021-03-15 12:07 [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list David Howells
                   ` (3 preceding siblings ...)
  2021-03-15 12:46 ` [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list Matthew Wilcox
@ 2021-03-15 13:14 ` Miklos Szeredi
  2021-03-15 13:17   ` Matthew Wilcox
  2021-03-15 13:41 ` David Howells
  5 siblings, 1 reply; 10+ messages in thread
From: Miklos Szeredi @ 2021-03-15 13:14 UTC (permalink / raw)
  To: David Howells
  Cc: Alexander Viro, Matthew Wilcox, Ian Kent, Linus Torvalds,
	linux-fsdevel, linux-kernel

On Mon, Mar 15, 2021 at 1:07 PM David Howells <dhowells@redhat.com> wrote:
>
>
> Hi Al, Miklós,
>
> Can we consider replacing the "insert cursor" approach we're currently
> using for proc files to scan the current namespace's mount list[1] with
> something that uses an xarray of mounts indexed by mnt_id?
>
> This has some advantages:
>
>  (1) It's simpler.  We don't need to insert dummy mount objects as
>      bookmarks into the mount list and code that's walking the list doesn't
>      have to carefully step over them.
>
>  (2) We can use the file position to represent the mnt_id and can jump to
>      it directly - ie. using seek() to jump to a mount object by its ID.

What happens if the mount at the current position is removed?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list
  2021-03-15 13:14 ` Miklos Szeredi
@ 2021-03-15 13:17   ` Matthew Wilcox
  0 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2021-03-15 13:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, Alexander Viro, Ian Kent, Linus Torvalds,
	linux-fsdevel, linux-kernel

On Mon, Mar 15, 2021 at 02:14:35PM +0100, Miklos Szeredi wrote:
> On Mon, Mar 15, 2021 at 1:07 PM David Howells <dhowells@redhat.com> wrote:
> >
> >
> > Hi Al, Miklós,
> >
> > Can we consider replacing the "insert cursor" approach we're currently
> > using for proc files to scan the current namespace's mount list[1] with
> > something that uses an xarray of mounts indexed by mnt_id?
> >
> > This has some advantages:
> >
> >  (1) It's simpler.  We don't need to insert dummy mount objects as
> >      bookmarks into the mount list and code that's walking the list doesn't
> >      have to carefully step over them.
> >
> >  (2) We can use the file position to represent the mnt_id and can jump to
> >      it directly - ie. using seek() to jump to a mount object by its ID.
> 
> What happens if the mount at the current position is removed?

xa_find() will move to the next one.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list
  2021-03-15 12:07 [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list David Howells
                   ` (4 preceding siblings ...)
  2021-03-15 13:14 ` Miklos Szeredi
@ 2021-03-15 13:41 ` David Howells
  2021-03-15 14:22   ` Miklos Szeredi
  5 siblings, 1 reply; 10+ messages in thread
From: David Howells @ 2021-03-15 13:41 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Alexander Viro, Matthew Wilcox, Ian Kent,
	Linus Torvalds, linux-fsdevel, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> >  (2) We can use the file position to represent the mnt_id and can jump to
> >      it directly - ie. using seek() to jump to a mount object by its ID.
> 
> What happens if the mount at the current position is removed?

umount_tree() requires the namespace_sem to be writelocked, so that should be
fine as the patches currently read-lock that whilst doing /proc/*/mount*

I'm assuming that kern_unmount() won't be a problem as it is there to deal
with mounts made by kern_mount() which don't get added to the mount list
(mnt_ns is MNT_NS_INTERNAL).  kern_unmount_array() seems to be the same
because overlayfs gives it mounts generated by clone_private_mount().  It
might be worth putting a WARN_ON() in kern_unmount() to require this.

When reading through proc, m_start() calls xas_find() which returns the entry
at the starting index or, if not present, the next higher entry.

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list
  2021-03-15 13:41 ` David Howells
@ 2021-03-15 14:22   ` Miklos Szeredi
  0 siblings, 0 replies; 10+ messages in thread
From: Miklos Szeredi @ 2021-03-15 14:22 UTC (permalink / raw)
  To: David Howells
  Cc: Alexander Viro, Matthew Wilcox, Ian Kent, Linus Torvalds,
	linux-fsdevel, linux-kernel

On Mon, Mar 15, 2021 at 2:41 PM David Howells <dhowells@redhat.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > >  (2) We can use the file position to represent the mnt_id and can jump to
> > >      it directly - ie. using seek() to jump to a mount object by its ID.
> >
> > What happens if the mount at the current position is removed?
>
> umount_tree() requires the namespace_sem to be writelocked, so that should be
> fine as the patches currently read-lock that whilst doing /proc/*/mount*
>
> I'm assuming that kern_unmount() won't be a problem as it is there to deal
> with mounts made by kern_mount() which don't get added to the mount list
> (mnt_ns is MNT_NS_INTERNAL).  kern_unmount_array() seems to be the same
> because overlayfs gives it mounts generated by clone_private_mount().  It
> might be worth putting a WARN_ON() in kern_unmount() to require this.
>
> When reading through proc, m_start() calls xas_find() which returns the entry
> at the starting index or, if not present, the next higher entry.

This will break the property of new mounts always being added to the
end of the list.  That's likely a regression for nerural based parsers
(i.e. people), probably less so for machine parsers.

Thanks,
Miklos

>
> David
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-03-15 14:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-15 12:07 [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list David Howells
2021-03-15 12:07 ` [PATCH 1/3] vfs: Use an xarray in the mount namespace to handle /proc/mounts list David Howells
2021-03-15 12:07 ` [PATCH 2/3] vfs: Use the mounts_to_id array to do /proc/mounts and co David Howells
2021-03-15 12:54   ` Matthew Wilcox
2021-03-15 12:08 ` [PATCH 3/3] vfs: Remove mount list trawling cursor stuff David Howells
2021-03-15 12:46 ` [RFC][PATCH 0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list Matthew Wilcox
2021-03-15 13:14 ` Miklos Szeredi
2021-03-15 13:17   ` Matthew Wilcox
2021-03-15 13:41 ` David Howells
2021-03-15 14:22   ` Miklos Szeredi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.