From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andrei Vagin <avagin@openvz.org>,
	Andrei Vagin <avagin@virtuozzo.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: [PATCH 4.12 63/84] mnt: Make propagate_umount less slow for overlapping mount propagation trees
Date: Wed, 19 Jul 2017 11:44:09 +0200
Message-ID: <20170719092324.824523696@linuxfoundation.org>
In-Reply-To: <20170719092322.362625377@linuxfoundation.org>

4.12-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric W. Biederman <ebiederm@xmission.com>

commit 296990deb389c7da21c78030376ba244dc1badf5 upstream.

Andrei Vagin pointed out that the time to execute propagate_umount can go
non-linear (and take a ludicrous amount of time) when the mount
propagation trees of the mounts to be unmounted by a lazy unmount
overlap.

Make the walk of the mount propagation trees nearly linear by
remembering which mounts have already been visited, allowing
subsequent walks to detect when walking a mount propagation tree or a
subtree of a mount propagation tree would be duplicate work and to skip
them entirely.
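
As a rough userspace illustration of the "remember what has been visited"
idea, the sketch below uses list membership as the visited flag, the same
idiom the patch applies to mnt_umounting. The list helpers, struct node,
visit_link, mark_visited and cleanup_visited are hypothetical stand-ins
written for this example, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical node type; visit_link plays the role of mnt_umounting. */
struct node {
	int id;
	struct list_head visit_link;
};

/* Returns 1 if the node was newly marked, 0 if it had already been visited. */
static int mark_visited(struct node *n, struct list_head *visited)
{
	if (!list_is_empty(&n->visit_link))
		return 0;
	list_add_tail_entry(&n->visit_link, visited);
	return 1;
}

/* Unlink every node so the next operation starts with no memory of this one. */
static void cleanup_visited(struct list_head *visited)
{
	while (!list_is_empty(visited))
		list_del_init_entry(visited->next);
}
```

A caller would initialise the visited head and each node's visit_link,
call mark_visited() before walking from a node, skip the walk when it
returns 0, and call cleanup_visited() once the whole operation is done.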

Walk the list of mounts whose propagation trees need to be traversed
from the mount highest in the mount tree to mounts lower in the mount
tree so that odds are higher that the code will walk the largest trees
first, allowing later tree walks to be skipped entirely.

Add cleanup_umount_visitations to remove the code's memory of which
mounts have been visited.

Add the functions last_slave and skip_propagation_subtree to allow
skipping appropriate parts of the mount propagation tree without
needing to change the logic of the rest of the code.
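
The effect of skipping an already-visited subtree can be sketched in a
simplified model. This is not the kernel's mechanism: the patch advances
m to last_slave(m) so that the unchanged propagation_next moves past the
slaves, whereas the sketch below passes an explicit "descend" flag to a
plain preorder walk. struct tnode, preorder_next and walk_and_mark are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; the kernel walks struct mount instead. */
struct tnode {
	struct tnode *parent;
	struct tnode *first_child;
	struct tnode *next_sibling;
	int visited;
};

/* Next node in preorder below root; if descend is 0, skip n's children. */
static struct tnode *preorder_next(struct tnode *n, struct tnode *root,
				   int descend)
{
	if (descend && n->first_child)
		return n->first_child;
	while (n != root) {
		if (n->next_sibling)
			return n->next_sibling;
		n = n->parent;
	}
	return NULL;
}

/*
 * Walk the tree under root and mark every node reached.  A node that
 * was marked by an earlier walk is still examined, but its subtree is
 * skipped.  Returns the number of nodes examined.
 */
static int walk_and_mark(struct tnode *root)
{
	int examined = 0;
	int descend = 1;
	struct tnode *n;

	for (n = root; n; n = preorder_next(n, root, descend)) {
		examined++;
		descend = !n->visited;
		n->visited = 1;
	}
	return examined;
}
```

In this model, walking a subtree first and then the whole tree examines
each overlapping node's children only once, which is the property that
keeps repeated walks close to linear.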

A script to generate overlapping mount propagation trees:

$ cat run.sh
set -e
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/1 /mnt/2
mount -t tmpfs zdtm /mnt/1
mount --make-shared /mnt/1
mkdir /mnt/1/1

iteration=10
if [ -n "$1" ] ; then
	iteration=$1
fi

for i in $(seq $iteration); do
	mount --bind /mnt/1/1 /mnt/1/1
done

mount --rbind /mnt/1 /mnt/2

TIMEFORMAT='%Rs'
nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 ))
echo -n "umount -l /mnt/1 -> $nr        "
time umount -l /mnt/1

nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l )
time umount -l /mnt/2

$ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done

Here are the performance numbers with and without the patch:

     mhash |  8192   |  8192  | 1048576 | 1048576
    mounts | before  | after  |  before | after
    ------------------------------------------------
      1025 |  0.040s | 0.016s |  0.038s | 0.019s
      2049 |  0.094s | 0.017s |  0.080s | 0.018s
      4097 |  0.243s | 0.019s |  0.206s | 0.023s
      8193 |  1.202s | 0.028s |  1.562s | 0.032s
     16385 |  9.635s | 0.036s |  9.952s | 0.041s
     32769 | 60.928s | 0.063s | 44.321s | 0.064s
     65537 |         | 0.097s |         | 0.097s
    131073 |         | 0.233s |         | 0.176s
    262145 |         | 0.653s |         | 0.344s
    524289 |         | 2.305s |         | 0.735s
   1048577 |         | 7.107s |         | 2.603s
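
The mount counts in the first column match the script's own formula,
nr = 2^(iteration + 1) + 1, evaluated for the iteration values 9 through
19 used in the loop above. A small sketch that reproduces them follows;
expected_mounts and print_expected_mounts are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Mount count the test script computes: 2^(iteration + 1) + 1. */
static unsigned long expected_mounts(unsigned int iteration)
{
	return (1UL << (iteration + 1)) + 1;
}

/* Print one "iteration -> mounts" line for each value in [first, last]. */
static void print_expected_mounts(unsigned int first, unsigned int last)
{
	unsigned int i;

	for (i = first; i <= last; i++)
		printf("%2u -> %lu\n", i, expected_mounts(i));
}
```

Calling print_expected_mounts(9, 19) lists the eleven counts from 1025
up to 1048577, one per table row.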

Andrei Vagin reports fixing the performance problem is part of the
work to fix CVE-2016-6213.

Fixes: a05964f3917c ("[PATCH] shared mounts handling: umount")
Reported-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/pnode.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 62 insertions(+), 1 deletion(-)

--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -24,6 +24,11 @@ static inline struct mount *first_slave(
 	return list_entry(p->mnt_slave_list.next, struct mount, mnt_slave);
 }
 
+static inline struct mount *last_slave(struct mount *p)
+{
+	return list_entry(p->mnt_slave_list.prev, struct mount, mnt_slave);
+}
+
 static inline struct mount *next_slave(struct mount *p)
 {
 	return list_entry(p->mnt_slave.next, struct mount, mnt_slave);
@@ -162,6 +167,19 @@ static struct mount *propagation_next(st
 	}
 }
 
+static struct mount *skip_propagation_subtree(struct mount *m,
+						struct mount *origin)
+{
+	/*
+	 * Advance m such that propagation_next will not return
+	 * the slaves of m.
+	 */
+	if (!IS_MNT_NEW(m) && !list_empty(&m->mnt_slave_list))
+		m = last_slave(m);
+
+	return m;
+}
+
 static struct mount *next_group(struct mount *m, struct mount *origin)
 {
 	while (1) {
@@ -505,6 +523,15 @@ static void restore_mounts(struct list_h
 	}
 }
 
+static void cleanup_umount_visitations(struct list_head *visited)
+{
+	while (!list_empty(visited)) {
+		struct mount *mnt =
+			list_first_entry(visited, struct mount, mnt_umounting);
+		list_del_init(&mnt->mnt_umounting);
+	}
+}
+
 /*
  * collect all mounts that receive propagation from the mount in @list,
  * and return these additional mounts in the same list.
@@ -517,11 +544,23 @@ int propagate_umount(struct list_head *l
 	struct mount *mnt;
 	LIST_HEAD(to_restore);
 	LIST_HEAD(to_umount);
+	LIST_HEAD(visited);
 
-	list_for_each_entry(mnt, list, mnt_list) {
+	/* Find candidates for unmounting */
+	list_for_each_entry_reverse(mnt, list, mnt_list) {
 		struct mount *parent = mnt->mnt_parent;
 		struct mount *m;
 
+		/*
+		 * If this mount has already been visited it is known that its
+		 * entire peer group and all of their slaves in the propagation
+		 * tree for the mountpoint have already been visited and there
+		 * is no need to visit them again.
+		 */
+		if (!list_empty(&mnt->mnt_umounting))
+			continue;
+
+		list_add_tail(&mnt->mnt_umounting, &visited);
 		for (m = propagation_next(parent, parent); m;
 		     m = propagation_next(m, parent)) {
 			struct mount *child = __lookup_mnt(&m->mnt,
@@ -529,6 +568,27 @@ int propagate_umount(struct list_head *l
 			if (!child)
 				continue;
 
+			if (!list_empty(&child->mnt_umounting)) {
+				/*
+				 * If the child has already been visited it is
+				 * known that its entire peer group and all of
+				 * their slaves in the propagation tree for the
+				 * mountpoint have already been visited and
+				 * there is no need to visit this subtree again.
+				 */
+				m = skip_propagation_subtree(m, parent);
+				continue;
+			} else if (child->mnt.mnt_flags & MNT_UMOUNT) {
+				/*
+				 * We have come across a partially unmounted
+				 * mount in list that has not been visited yet.
+				 * Remember it has been visited and continue
+				 * about our merry way.
+				 */
+				list_add_tail(&child->mnt_umounting, &visited);
+				continue;
+			}
+
 			/* Check the child and parents while progress is made */
 			while (__propagate_umount(child,
 						  &to_umount, &to_restore)) {
@@ -542,6 +602,7 @@ int propagate_umount(struct list_head *l
 
 	umount_list(&to_umount, &to_restore);
 	restore_mounts(&to_restore);
+	cleanup_umount_visitations(&visited);
 	list_splice_tail(&to_umount, list);
 
 	return 0;
