linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrei Vagin <avagin@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	containers@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v3] mount: dont execute propagate_umount() many times for same mounts
Date: Thu, 13 Oct 2016 12:14:55 -0500	[thread overview]
Message-ID: <877f9c6ui8.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <1476141965-21429-1-git-send-email-avagin@openvz.org> (Andrei Vagin's message of "Mon, 10 Oct 2016 16:26:05 -0700")

Andrei Vagin <avagin@openvz.org> writes:

> The reason of this optimization is that umount() can hold namespace_sem
> for a long time, this semaphore is global, so it affects all users.
> Recently Eric W. Biederman added a per mount namespace limit on the
> number of mounts. The default number of mounts allowed per mount
> namespace at 100,000. Currently this value is allowed to construct a tree
> which requires hours to be umounted.
>
> In a worse case the current complexity of umount_tree() is O(n^3).
> * Enumirate all mounts in a target tree (propagate_umount)
> * Enumirate mounts to find where these changes have to
>   be propagated (mark_umount_candidates)
> * Enumirate mounts to find a requered mount by parent and dentry
>   (__lookup_mnt_lat)
>
> The worse case is when all mounts from the tree live in the same shared
> group. In this case we have to enumirate all mounts on each step.
>
> Here we can optimize the second step. We don't need to make it for
> mounts which we already met when we did this step for previous mounts.
> It reduces the complexity of umount_tree() to O(n^2).

To O(n) not O(n^2).

A hash table lookup (aka __lookup_mnt() and friends) is O(1) or the hash
table is malfunctioning.  Please don't call 

Arguably we are getting into sizes where
the mount hash table fills up and is on the edge of malfunctioning, but
that is not particularly relevant to this case.

What your patch is aiming to do is to take a O(n^2) algorithm and
make it O(n).  That is very much worth doing.

However your patch confuses two separate issues.  Marking mounts that
may be unmounted.  Marking pieces of the propagation tree that have
already been traversed.

I do not see anything requiring propagation trees to intersect at the
set of mounts that are unmounted in umount_tree before propagate_umount
is called.  Which means there are topologies where we can and should do
better than your patch.

I am also bothered that your patch changes how we look up the mount
mounted on a mount point  (aka playing with __lookup_mnt_last).  There
is no reason to do that to solve the problem, and I think it obscures
what is actually going on.

I am going to see if I can rework your basic concept with explicit
marking of the propagation tree.  In the meantime for people
who are want to see what your patch is doing the version below
essentially does the same thing, without the extra essentially
meaningless loop.

Eric


diff --git a/fs/namespace.c b/fs/namespace.c
index 8183fba9ab4d..33a76ee1b76b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -650,13 +650,11 @@ struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
 	p = __lookup_mnt(mnt, dentry);
 	if (!p)
 		goto out;
-	if (!(p->mnt.mnt_flags & MNT_UMOUNT))
-		res = p;
+	res = p;
 	hlist_for_each_entry_continue(p, mnt_hash) {
 		if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
 			break;
-		if (!(p->mnt.mnt_flags & MNT_UMOUNT))
-			res = p;
+		res = p;
 	}
 out:
 	return res;
diff --git a/fs/pnode.c b/fs/pnode.c
index 234a9ac49958..ade5e7d8308c 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -399,11 +399,18 @@ static void mark_umount_candidates(struct mount *mnt)
 
 	BUG_ON(parent == mnt);
 
+	if (IS_MNT_MARKED(mnt))
+		return;
+
+	SET_MNT_MARK(mnt);
+
 	for (m = propagation_next(parent, parent); m;
 			m = propagation_next(m, parent)) {
 		struct mount *child = __lookup_mnt_last(&m->mnt,
 						mnt->mnt_mountpoint);
-		if (child && (!IS_MNT_LOCKED(child) || IS_MNT_MARKED(m))) {
+		if (child && ((child->mnt.mnt_flags & MNT_UMOUNT) ||
+			      !IS_MNT_LOCKED(child) ||
+			      IS_MNT_MARKED(m))) {
 			SET_MNT_MARK(child);
 		}
 	}
@@ -420,6 +427,11 @@ static void __propagate_umount(struct mount *mnt)
 
 	BUG_ON(parent == mnt);
 
+	if (!IS_MNT_MARKED(mnt))
+		return;
+
+	CLEAR_MNT_MARK(mnt);
+
 	for (m = propagation_next(parent, parent); m;
 			m = propagation_next(m, parent)) {
 
@@ -432,7 +444,8 @@ static void __propagate_umount(struct mount *mnt)
 		if (!child || !IS_MNT_MARKED(child))
 			continue;
 		CLEAR_MNT_MARK(child);
-		if (list_empty(&child->mnt_mounts)) {
+		if (!(child->mnt.mnt_flags & MNT_UMOUNT) &&
+		    list_empty(&child->mnt_mounts)) {
 			list_del_init(&child->mnt_child);
 			child->mnt.mnt_flags |= MNT_UMOUNT;
 			list_move_tail(&child->mnt_list, &mnt->mnt_list);

  reply	other threads:[~2016-10-13 17:18 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-10 23:26 [PATCH] [v3] mount: dont execute propagate_umount() many times for same mounts Andrei Vagin
2016-10-13 17:14 ` Eric W. Biederman [this message]
2016-10-13 19:53   ` [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once Eric W. Biederman
2016-10-13 21:46     ` Andrei Vagin
2016-10-14  2:31       ` Andrey Vagin
2016-10-14  2:45         ` Eric W. Biederman
2016-10-14 18:29           ` [RFC][PATCH v2] " Eric W. Biederman
2016-10-18  2:40             ` Andrei Vagin
2016-10-18  6:49               ` Eric W. Biederman
2016-10-19  3:46                 ` [REVIEW][PATCH] mount: In propagate_umount handle overlapping mount propagation trees Eric W. Biederman
2016-10-20 21:30                   ` Andrei Vagin
2016-10-21 19:26                     ` Eric W. Biederman
2016-10-22 19:42                       ` [RFC][PATCH v2] " Eric W. Biederman
2016-10-25 20:58                         ` Andrei Vagin
2016-10-25 21:45                           ` Eric W. Biederman
2016-10-25 23:41                             ` Andrei Vagin
2016-10-26  1:42                               ` Eric W. Biederman
2016-11-01  6:14                             ` Andrei Vagin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877f9c6ui8.fsf@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=avagin@openvz.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).