All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fix rate limited ipc_namespace freeing
@ 2022-02-18 18:31 Rik van Riel
  2022-02-18 18:31 ` [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount Rik van Riel
  2022-02-18 18:31 ` [PATCH 2/2] ipc: get rid of free_ipc_work workqueue Rik van Riel
  0 siblings, 2 replies; 12+ messages in thread
From: Rik van Riel @ 2022-02-18 18:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, linux-fsdevel, paulmck, gscrivan, viro, Rik van Riel

The test case below fails on 5.17 (and a bunch of older kernels)
with unshare getting -ENOSPC, because the rate at which ipc_namespace
structures can be freed is limited by each item on the to-free list
waiting in synchronize_rcu.

Making kern_unmount use queue_rcu_work gets rid of that slowdown,
allowing a batch of vfsmount structures to be freed after each
RCU grace period has expired.

That, in turn, allows us to just get rid of the workqueue in
ipc/namespace.c completely.

With these two changes the test case reliably succeeds at
calling unshare a million times, even with max_ipc_namespaces
reduced to 1000 :)

#define _GNU_SOURCE
#include <sched.h>
#include <error.h>
#include <errno.h>
#include <stdlib.h>

int main()
{
	int i;

	for (i = 0; i < 1000000; i++) {
		if (unshare(CLONE_NEWIPC) < 0)
			error(EXIT_FAILURE, errno, "unshare");
	}
}


Rik van Riel (2):
  vfs: free vfsmount through rcu work from kern_unmount
  ipc: get rid of free_ipc_work workqueue

 fs/namespace.c                | 11 +++++++++--
 include/linux/ipc_namespace.h |  2 --
 include/linux/mount.h         |  2 ++
 ipc/namespace.c               | 21 +--------------------
 4 files changed, 12 insertions(+), 24 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 18:31 [PATCH 0/2] fix rate limited ipc_namespace freeing Rik van Riel
@ 2022-02-18 18:31 ` Rik van Riel
  2022-02-18 19:26   ` Al Viro
  2022-02-19  5:53   ` Al Viro
  2022-02-18 18:31 ` [PATCH 2/2] ipc: get rid of free_ipc_work workqueue Rik van Riel
  1 sibling, 2 replies; 12+ messages in thread
From: Rik van Riel @ 2022-02-18 18:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, linux-fsdevel, paulmck, gscrivan, viro,
	Rik van Riel, Eric Biederman, Chris Mason

After kern_unmount returns, callers can no longer access the
vfsmount structure. However, the vfsmount structure does need
to be kept around until the end of the RCU grace period, to
make sure other accesses have all gone away too.

This can be accomplished by either gating each kern_unmount
on synchronize_rcu (the comment in the code says it all), or
by deferring the freeing until the next grace period, where
it needs to be handled in a workqueue due to the locking in
mntput_no_expire().

Suggested-by: Eric Biederman <ebiederm@xmission.com>
Reported-by: Chris Mason <clm@fb.com>
---
 fs/namespace.c        | 11 +++++++++--
 include/linux/mount.h |  2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 40b994a29e90..9f62cf6c69de 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4384,13 +4384,20 @@ struct vfsmount *kern_mount(struct file_system_type *type)
 }
 EXPORT_SYMBOL_GPL(kern_mount);
 
+static void mntput_rcu_work(struct work_struct *work)
+{
+	struct vfsmount *mnt = container_of(to_rcu_work(work),
+			       struct vfsmount, free_rwork);
+	mntput(mnt);
+}
+
 void kern_unmount(struct vfsmount *mnt)
 {
 	/* release long term mount so mount point can be released */
 	if (!IS_ERR_OR_NULL(mnt)) {
 		real_mount(mnt)->mnt_ns = NULL;
-		synchronize_rcu();	/* yecchhh... */
-		mntput(mnt);
+		INIT_RCU_WORK(&mnt->free_rwork, mntput_rcu_work);
+		queue_rcu_work(system_wq, &mnt->free_rwork);
 	}
 }
 EXPORT_SYMBOL(kern_unmount);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 7f18a7555dff..cd007cb70d57 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -16,6 +16,7 @@
 #include <linux/spinlock.h>
 #include <linux/seqlock.h>
 #include <linux/atomic.h>
+#include <linux/workqueue.h>
 
 struct super_block;
 struct vfsmount;
@@ -73,6 +74,7 @@ struct vfsmount {
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	int mnt_flags;
 	struct user_namespace *mnt_userns;
+	struct rcu_work free_rwork;
 } __randomize_layout;
 
 static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] ipc: get rid of free_ipc_work workqueue
  2022-02-18 18:31 [PATCH 0/2] fix rate limited ipc_namespace freeing Rik van Riel
  2022-02-18 18:31 ` [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount Rik van Riel
@ 2022-02-18 18:31 ` Rik van Riel
  1 sibling, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2022-02-18 18:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, linux-fsdevel, paulmck, gscrivan, viro, Rik van Riel

With kern_unmount deferring the freeing of the vfsmount structure
through queue_rcu_work, we no longer need a separate workqueue for
freeing up ipc_namespace structures.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 include/linux/ipc_namespace.h |  2 --
 ipc/namespace.c               | 21 +--------------------
 2 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index b75395ec8d52..5a3debde2f3d 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -67,8 +67,6 @@ struct ipc_namespace {
 	struct user_namespace *user_ns;
 	struct ucounts *ucounts;
 
-	struct llist_node mnt_llist;
-
 	struct ns_common ns;
 } __randomize_layout;
 
diff --git a/ipc/namespace.c b/ipc/namespace.c
index ae83f0f2651b..090a08b17710 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -117,9 +117,6 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 
 static void free_ipc_ns(struct ipc_namespace *ns)
 {
-	/* mq_put_mnt() waits for a grace period as kern_unmount()
-	 * uses synchronize_rcu().
-	 */
 	mq_put_mnt(ns);
 	sem_exit_ns(ns);
 	msg_exit_ns(ns);
@@ -131,21 +128,6 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 	kfree(ns);
 }
 
-static LLIST_HEAD(free_ipc_list);
-static void free_ipc(struct work_struct *unused)
-{
-	struct llist_node *node = llist_del_all(&free_ipc_list);
-	struct ipc_namespace *n, *t;
-
-	llist_for_each_entry_safe(n, t, node, mnt_llist)
-		free_ipc_ns(n);
-}
-
-/*
- * The work queue is used to avoid the cost of synchronize_rcu in kern_unmount.
- */
-static DECLARE_WORK(free_ipc_work, free_ipc);
-
 /*
  * put_ipc_ns - drop a reference to an ipc namespace.
  * @ns: the namespace to put
@@ -168,8 +150,7 @@ void put_ipc_ns(struct ipc_namespace *ns)
 		mq_clear_sbinfo(ns);
 		spin_unlock(&mq_lock);
 
-		if (llist_add(&ns->mnt_llist, &free_ipc_list))
-			schedule_work(&free_ipc_work);
+		free_ipc_ns(ns);
 	}
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 18:31 ` [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount Rik van Riel
@ 2022-02-18 19:26   ` Al Viro
  2022-02-18 19:33     ` Rik van Riel
  2022-02-19  5:53   ` Al Viro
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-02-18 19:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:
> After kern_unmount returns, callers can no longer access the
> vfsmount structure. However, the vfsmount structure does need
> to be kept around until the end of the RCU grace period, to
> make sure other accesses have all gone away too.
> 
> This can be accomplished by either gating each kern_unmount
> on synchronize_rcu (the comment in the code says it all), or
> by deferring the freeing until the next grace period, where
> it needs to be handled in a workqueue due to the locking in
> mntput_no_expire().

NAK.  There's code that relies upon kern_unmount() being
synchronous.  That's precisely the reason why MNT_INTERNAL
is treated that way in mntput_no_expire().

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 19:26   ` Al Viro
@ 2022-02-18 19:33     ` Rik van Riel
  2022-02-18 19:43       ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2022-02-18 19:33 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

On Fri, 2022-02-18 at 19:26 +0000, Al Viro wrote:
> On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:
> > After kern_unmount returns, callers can no longer access the
> > vfsmount structure. However, the vfsmount structure does need
> > to be kept around until the end of the RCU grace period, to
> > make sure other accesses have all gone away too.
> > 
> > This can be accomplished by either gating each kern_unmount
> > on synchronize_rcu (the comment in the code says it all), or
> > by deferring the freeing until the next grace period, where
> > it needs to be handled in a workqueue due to the locking in
> > mntput_no_expire().
> 
> NAK.  There's code that relies upon kern_unmount() being
> synchronous.  That's precisely the reason why MNT_INTERNAL
> is treated that way in mntput_no_expire().

Fair enough. Should I make a kern_unmount_rcu() version
that gets called just from mq_put_mnt()?

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 19:33     ` Rik van Riel
@ 2022-02-18 19:43       ` Al Viro
  2022-02-18 20:24         ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-02-18 19:43 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Fri, Feb 18, 2022 at 02:33:31PM -0500, Rik van Riel wrote:
> On Fri, 2022-02-18 at 19:26 +0000, Al Viro wrote:
> > On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:
> > > After kern_unmount returns, callers can no longer access the
> > > vfsmount structure. However, the vfsmount structure does need
> > > to be kept around until the end of the RCU grace period, to
> > > make sure other accesses have all gone away too.
> > > 
> > > This can be accomplished by either gating each kern_unmount
> > > on synchronize_rcu (the comment in the code says it all), or
> > > by deferring the freeing until the next grace period, where
> > > it needs to be handled in a workqueue due to the locking in
> > > mntput_no_expire().
> > 
> > NAK.  There's code that relies upon kern_unmount() being
> > synchronous.  That's precisely the reason why MNT_INTERNAL
> > is treated that way in mntput_no_expire().
> 
> Fair enough. Should I make a kern_unmount_rcu() version
> that gets called just from mq_put_mnt()?

Umm...  I'm not sure you can afford having struct ipc_namespace
freed and reused before the mqueue superblock gets at least to
deactivate_locked_super().

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 19:43       ` Al Viro
@ 2022-02-18 20:24         ` Al Viro
  2022-02-18 21:06           ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-02-18 20:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Fri, Feb 18, 2022 at 07:43:43PM +0000, Al Viro wrote:
> On Fri, Feb 18, 2022 at 02:33:31PM -0500, Rik van Riel wrote:
> > On Fri, 2022-02-18 at 19:26 +0000, Al Viro wrote:
> > > On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:
> > > > After kern_unmount returns, callers can no longer access the
> > > > vfsmount structure. However, the vfsmount structure does need
> > > > to be kept around until the end of the RCU grace period, to
> > > > make sure other accesses have all gone away too.
> > > > 
> > > > This can be accomplished by either gating each kern_unmount
> > > > on synchronize_rcu (the comment in the code says it all), or
> > > > by deferring the freeing until the next grace period, where
> > > > it needs to be handled in a workqueue due to the locking in
> > > > mntput_no_expire().
> > > 
> > > NAK.  There's code that relies upon kern_unmount() being
> > > synchronous.  That's precisely the reason why MNT_INTERNAL
> > > is treated that way in mntput_no_expire().
> > 
> > Fair enough. Should I make a kern_unmount_rcu() version
> > that gets called just from mq_put_mnt()?
> 
> Umm...  I'm not sure you can afford having struct ipc_namespace
> freed and reused before the mqueue superblock gets at least to
> deactivate_locked_super().

BTW, that's a good demonstration of the problems with making those
beasts async.  struct mount is *not* accessed past kern_unmount(),
but the objects used by the superblock might very well be - in
this case they (struct ipc_namespace, pointed to by s->s_fs_data)
are freed by the caller after kern_unmount() returns.  And possibly
reused.  Now note that they are used as search keys by
mqueue_get_tree() and it becomes very fishy.

If you want to go that way, make it something like

void put_ipc_ns(struct ipc_namespace *ns)
{
        if (refcount_dec_and_lock(&ns->ns.count, &mq_lock)) {
		mq_clear_sbinfo(ns);
		spin_unlock(&mq_lock);
		kern_unmount_rcu(ns->mq_mnt);
	}
}

and give mqueue this for ->kill_sb():

static void mqueue_kill_super(struct super_block *sb)
{
	struct ipc_namespace *ns = sb->s_fs_info;
	kill_litter_super(sb);
	do the rest of free_ipc_ns();
}

One thing: kern_unmount_rcu() needs a very big warning about
the caution needed from its callers.  It's really not safe
for general use, and it will be a temptation for folks with
scalability problems like this one to just use it instead of
kern_unmount() and declare the problem solved.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 20:24         ` Al Viro
@ 2022-02-18 21:06           ` Al Viro
  2022-02-19  5:50             ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-02-18 21:06 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Fri, Feb 18, 2022 at 08:24:09PM +0000, Al Viro wrote:
> On Fri, Feb 18, 2022 at 07:43:43PM +0000, Al Viro wrote:
> > On Fri, Feb 18, 2022 at 02:33:31PM -0500, Rik van Riel wrote:
> > > On Fri, 2022-02-18 at 19:26 +0000, Al Viro wrote:
> > > > On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:
> > > > > After kern_unmount returns, callers can no longer access the
> > > > > vfsmount structure. However, the vfsmount structure does need
> > > > > to be kept around until the end of the RCU grace period, to
> > > > > make sure other accesses have all gone away too.
> > > > > 
> > > > > This can be accomplished by either gating each kern_unmount
> > > > > on synchronize_rcu (the comment in the code says it all), or
> > > > > by deferring the freeing until the next grace period, where
> > > > > it needs to be handled in a workqueue due to the locking in
> > > > > mntput_no_expire().
> > > > 
> > > > NAK.  There's code that relies upon kern_unmount() being
> > > > synchronous.  That's precisely the reason why MNT_INTERNAL
> > > > is treated that way in mntput_no_expire().
> > > 
> > > Fair enough. Should I make a kern_unmount_rcu() version
> > > that gets called just from mq_put_mnt()?
> > 
> > Umm...  I'm not sure you can afford having struct ipc_namespace
> > freed and reused before the mqueue superblock gets at least to
> > deactivate_locked_super().
> 
> BTW, that's a good demonstration of the problems with making those
> beasts async.  struct mount is *not* accessed past kern_unmount(),
> but the objects used by the superblock might very well be - in
> this case they (struct ipc_namespace, pointed to by s->s_fs_data)
> are freed by the caller after kern_unmount() returns.  And possibly
> reused.  Now note that they are used as search keys by
> mqueue_get_tree() and it becomes very fishy.
> 
> If you want to go that way, make it something like
> 
> void put_ipc_ns(struct ipc_namespace *ns)
> {
>         if (refcount_dec_and_lock(&ns->ns.count, &mq_lock)) {
> 		mq_clear_sbinfo(ns);
> 		spin_unlock(&mq_lock);
> 		kern_unmount_rcu(ns->mq_mnt);
> 	}
> }
> 
> and give mqueue this for ->kill_sb():
> 
> static void mqueue_kill_super(struct super_block *sb)
> {
> 	struct ipc_namespace *ns = sb->s_fs_info;
> 	kill_litter_super(sb);
> 	do the rest of free_ipc_ns();
> }
> 
> One thing: kern_unmount_rcu() needs a very big warning about
> the caution needed from its callers.  It's really not safe
> for general use, and it will be a temptation for folks with
> scalability problems like this one to just use it instead of
> kern_unmount() and declare the problem solved.

FWIW, that won't work correctly wrt failure exits.  I'm digging
through the lifetime rules in there right now, will post when
I'm done.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 21:06           ` Al Viro
@ 2022-02-19  5:50             ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2022-02-19  5:50 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Fri, Feb 18, 2022 at 09:06:31PM +0000, Al Viro wrote:

> FWIW, that won't work correctly wrt failure exits.  I'm digging
> through the lifetime rules in there right now, will post when
> I'm done.

OK, now that I'd reconstructed the picture...  The problems with
delayed shutdown are prevented by mq_clear_sbinfo() call in there -
mqueue is capable of more or less gracefully dealing with
having ->s_fs_info ripped from under it, which is what that
thing does.  Before the kern_unmount().  And since that code is
non-modular, we don't need to protect that either.

IOW, having
void put_ipc_ns(struct ipc_namespace *ns)
{
        if (refcount_dec_and_lock(&ns->ns.count, &mq_lock)) {
		mq_clear_sbinfo(ns);
		spin_unlock(&mq_lock);
		free_ipc_ns(ns);
	}
}

and
void mq_put_mnt(struct ipc_namespace *ns)
{
	/*
	 * The only reason it's safe to have the mntput async
	 * is that we'd already ripped the ipc_namespace away
	 * from the mqueue superblock, by having called
	 * mq_clear_sbinfo().
	 *
	 * NOTE: kern_unmount_rcu() IS NOT SAFE TO USE
	 * WITHOUT SERIOUS PRECAUTIONS.
	 *
	 * Anything that is used by filesystem must either
	 * be already taken away (and fs must survive that)
	 * or have its destruction delayed until the superblock
	 * shutdown.
	 *
	 */
        kern_unmount_rcu(ns->mq_mnt);
}

would suffice.  free_ipc_work/free_ipc/mnt_llist can be killed off.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-18 18:31 ` [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount Rik van Riel
  2022-02-18 19:26   ` Al Viro
@ 2022-02-19  5:53   ` Al Viro
  2022-02-19  5:58     ` Al Viro
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-02-19  5:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:

>  struct super_block;
>  struct vfsmount;
> @@ -73,6 +74,7 @@ struct vfsmount {
>  	struct super_block *mnt_sb;	/* pointer to superblock */
>  	int mnt_flags;
>  	struct user_namespace *mnt_userns;
> +	struct rcu_work free_rwork;
>  } __randomize_layout;

Wait, what?  First of all, that has no business being in vfsmount -
everything that deeply internal belongs in struct mount, not in
its public part.  Moreover, there's already mount->mnt_rcu, so what's
the point duplicating that?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-19  5:53   ` Al Viro
@ 2022-02-19  5:58     ` Al Viro
  2022-02-19  6:07       ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2022-02-19  5:58 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Sat, Feb 19, 2022 at 05:53:37AM +0000, Al Viro wrote:
> On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:
> 
> >  struct super_block;
> >  struct vfsmount;
> > @@ -73,6 +74,7 @@ struct vfsmount {
> >  	struct super_block *mnt_sb;	/* pointer to superblock */
> >  	int mnt_flags;
> >  	struct user_namespace *mnt_userns;
> > +	struct rcu_work free_rwork;
> >  } __randomize_layout;
> 
> Wait, what?  First of all, that has no business being in vfsmount -
> everything that deeply internal belongs in struct mount, not in
> its public part.  Moreover, there's already mount->mnt_rcu, so what's
> the point duplicating that?

Argh... You need rcu_work there...

OK, so make that a member of the same union mnt_rcu is.  In struct mount,
please.  And I'm not sure I like the idea of shoving that much into
struct mount, TBH...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount
  2022-02-19  5:58     ` Al Viro
@ 2022-02-19  6:07       ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2022-02-19  6:07 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, kernel-team, linux-fsdevel, paulmck, gscrivan,
	Eric Biederman, Chris Mason

On Sat, Feb 19, 2022 at 05:58:13AM +0000, Al Viro wrote:
> On Sat, Feb 19, 2022 at 05:53:37AM +0000, Al Viro wrote:
> > On Fri, Feb 18, 2022 at 01:31:13PM -0500, Rik van Riel wrote:
> > 
> > >  struct super_block;
> > >  struct vfsmount;
> > > @@ -73,6 +74,7 @@ struct vfsmount {
> > >  	struct super_block *mnt_sb;	/* pointer to superblock */
> > >  	int mnt_flags;
> > >  	struct user_namespace *mnt_userns;
> > > +	struct rcu_work free_rwork;
> > >  } __randomize_layout;
> > 
> > Wait, what?  First of all, that has no business being in vfsmount -
> > everything that deeply internal belongs in struct mount, not in
> > its public part.  Moreover, there's already mount->mnt_rcu, so what's
> > the point duplicating that?
> 
> Argh... You need rcu_work there...
> 
> OK, so make that a member of the same union mnt_rcu is.  In struct mount,
> please.  And I'm not sure I like the idea of shoving that much into
> struct mount, TBH...

We might have a plenty of struct mount instances.  Very few of them will
ever be internal, in the first place.  Fewer yet - using kern_unmount_rcu().
And it's not small.  If anything, I would consider something like
	call_rcu(&m->mnt_rcu, callback)
with callback adding struct mount to llist and doing schedule_delayed_work()
on that.  With work consisting of doing mntput on everything in it.

I'll get some sleep and put together something along those lines tomorrow
morning.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-02-19  6:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-18 18:31 [PATCH 0/2] fix rate limited ipc_namespace freeing Rik van Riel
2022-02-18 18:31 ` [PATCH 1/2] vfs: free vfsmount through rcu work from kern_unmount Rik van Riel
2022-02-18 19:26   ` Al Viro
2022-02-18 19:33     ` Rik van Riel
2022-02-18 19:43       ` Al Viro
2022-02-18 20:24         ` Al Viro
2022-02-18 21:06           ` Al Viro
2022-02-19  5:50             ` Al Viro
2022-02-19  5:53   ` Al Viro
2022-02-19  5:58     ` Al Viro
2022-02-19  6:07       ` Al Viro
2022-02-18 18:31 ` [PATCH 2/2] ipc: get rid of free_ipc_work workqueue Rik van Riel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.