All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Cc: linux-fsdevel
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	Andrei Vagin <avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>,
	Alexander Viro
	<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
Subject: Re: [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once
Date: Thu, 13 Oct 2016 21:45:00 -0500	[thread overview]
Message-ID: <87wphb4pjn.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <CANaxB-xPkgdyeg0z6TvExMfyy4uOC+Nu4Q99WpCscNKMWz8VPg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> (Andrey Vagin's message of "Thu, 13 Oct 2016 19:31:22 -0700")

Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes:

> On Thu, Oct 13, 2016 at 2:46 PM, Andrei Vagin <avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org> wrote:
>> On Thu, Oct 13, 2016 at 02:53:46PM -0500, Eric W. Biederman wrote:
>>>
>>> Adrei Vagin pointed out that time to executue propagate_umount can go
>>> non-linear (and take a ludicrious amount of time) when the mount
>>> propogation trees of the mounts to be unmunted by a lazy unmount
>>> overlap.
>>>
>>> Solve this in the most straight forward way possible, by adding a new
>>> mount flag to mark parts of the mount propagation tree that have been
>>> visited, and use that mark to skip parts of the mount propagation tree
>>> that have already been visited during an unmount.  This guarantees
>>> that each mountpoint in the possibly overlapping mount propagation
>>> trees will be visited exactly once.
>>>
>>> Add the functions propagation_visit_next and propagation_revisit_next
>>> to coordinate setting and clearling the visited mount mark.
>>>
>>> Here is a script to generate such mount tree:
>>> $ cat run.sh
>>> mount -t tmpfs test-mount /mnt
>>> mount --make-shared /mnt
>>> for i in `seq $1`; do
>>>         mkdir /mnt/test.$i
>>>         mount --bind /mnt /mnt/test.$i
>>> done
>>> cat /proc/mounts | grep test-mount | wc -l
>>> time umount -l /mnt
>>> $ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done
>>>
>>> Here are the performance numbers with and without the patch:
>>>
>>> mounts | before | after (real sec)
>>> -----------------------------
>>>   1024 |  0.071 | 0.024
>>>   2048 |  0.184 | 0.030
>>>   4096 |  0.604 | 0.040
>>>   8912 |  4.471 | 0.043
>>>  16384 | 34.826 | 0.082
>>>  32768 |        | 0.151
>>>  65536 |        | 0.289
>>> 131072 |        | 0.659
>>>
>>> Andrei Vagin fixing this performance problem is part of the
>>> work to fix CVE-2016-6213.
>>>
>>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Reported-by: Andrei Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>> ---
>>>
>>> Andrei can you take a look at this patch and see if you can see any
>>> problems.  My limited testing suggests this approach does a much better
>>> job of solving the problem you were seeing.  With the time looking
>>> almost linear in the number of mounts now.
>>
>> I read this patch and I like the idea.
>>
>> Then I run my tests and one of them doesn't work with this patch.
>> I haven't found a reason yet.
>
>>> +     for (m = propagation_visit_next(parent, parent); m;
>>> +                     m = propagation_visit_next(m, parent)) {
>>>               struct mount *child = __lookup_mnt_last(&m->mnt,
>>>                                               mnt->mnt_mountpoint);
>
> The reason is that this loop is called for different "mnt", but
> it is executed only once with this optimization.
>
> So I think the idea to mark parent will not work, because one parent
> can have a few children which have to be umounted.

Good catch.  So what needs to be marked is the parent mount and
mountpoint combination.  Which is effectively the child mount.

I still think replacing the propagation_next and fixing the propagation
walk is the way to go.   But it sounds like to make things work the
__lookup_mnt_last needs to be moved into the propagation walk function.

That doesn't feel to hard.  I will have to see what the code looks like.

Eric

WARNING: multiple messages have this Message-ID (diff)
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrey Vagin <avagin@openvz.org>
Cc: Andrei Vagin <avagin@virtuozzo.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once
Date: Thu, 13 Oct 2016 21:45:00 -0500	[thread overview]
Message-ID: <87wphb4pjn.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <CANaxB-xPkgdyeg0z6TvExMfyy4uOC+Nu4Q99WpCscNKMWz8VPg@mail.gmail.com> (Andrey Vagin's message of "Thu, 13 Oct 2016 19:31:22 -0700")

Andrey Vagin <avagin@openvz.org> writes:

> On Thu, Oct 13, 2016 at 2:46 PM, Andrei Vagin <avagin@virtuozzo.com> wrote:
>> On Thu, Oct 13, 2016 at 02:53:46PM -0500, Eric W. Biederman wrote:
>>>
>>> Adrei Vagin pointed out that time to executue propagate_umount can go
>>> non-linear (and take a ludicrious amount of time) when the mount
>>> propogation trees of the mounts to be unmunted by a lazy unmount
>>> overlap.
>>>
>>> Solve this in the most straight forward way possible, by adding a new
>>> mount flag to mark parts of the mount propagation tree that have been
>>> visited, and use that mark to skip parts of the mount propagation tree
>>> that have already been visited during an unmount.  This guarantees
>>> that each mountpoint in the possibly overlapping mount propagation
>>> trees will be visited exactly once.
>>>
>>> Add the functions propagation_visit_next and propagation_revisit_next
>>> to coordinate setting and clearling the visited mount mark.
>>>
>>> Here is a script to generate such mount tree:
>>> $ cat run.sh
>>> mount -t tmpfs test-mount /mnt
>>> mount --make-shared /mnt
>>> for i in `seq $1`; do
>>>         mkdir /mnt/test.$i
>>>         mount --bind /mnt /mnt/test.$i
>>> done
>>> cat /proc/mounts | grep test-mount | wc -l
>>> time umount -l /mnt
>>> $ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done
>>>
>>> Here are the performance numbers with and without the patch:
>>>
>>> mounts | before | after (real sec)
>>> -----------------------------
>>>   1024 |  0.071 | 0.024
>>>   2048 |  0.184 | 0.030
>>>   4096 |  0.604 | 0.040
>>>   8912 |  4.471 | 0.043
>>>  16384 | 34.826 | 0.082
>>>  32768 |        | 0.151
>>>  65536 |        | 0.289
>>> 131072 |        | 0.659
>>>
>>> Andrei Vagin fixing this performance problem is part of the
>>> work to fix CVE-2016-6213.
>>>
>>> Cc: stable@vger.kernel.org
>>> Reported-by: Andrei Vagin <avagin@openvz.org>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>
>>> Andrei can you take a look at this patch and see if you can see any
>>> problems.  My limited testing suggests this approach does a much better
>>> job of solving the problem you were seeing.  With the time looking
>>> almost linear in the number of mounts now.
>>
>> I read this patch and I like the idea.
>>
>> Then I run my tests and one of them doesn't work with this patch.
>> I haven't found a reason yet.
>
>>> +     for (m = propagation_visit_next(parent, parent); m;
>>> +                     m = propagation_visit_next(m, parent)) {
>>>               struct mount *child = __lookup_mnt_last(&m->mnt,
>>>                                               mnt->mnt_mountpoint);
>
> The reason is that this loop is called for different "mnt", but
> it is executed only once with this optimization.
>
> So I think the idea to mark parent will not work, because one parent
> can have a few children which have to be umounted.

Good catch.  So what needs to be marked is the parent mount and
mountpoint combination.  Which is effectively the child mount.

I still think replacing the propagation_next and fixing the propagation
walk is the way to go.   But it sounds like to make things work the
__lookup_mnt_last needs to be moved into the propagation walk function.

That doesn't feel to hard.  I will have to see what the code looks like.

Eric

  parent reply	other threads:[~2016-10-14  2:45 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-10 23:26 [PATCH] [v3] mount: dont execute propagate_umount() many times for same mounts Andrei Vagin
2016-10-10 23:26 ` Andrei Vagin
     [not found] ` <1476141965-21429-1-git-send-email-avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2016-10-13 17:14   ` Eric W. Biederman
2016-10-13 17:14 ` Eric W. Biederman
2016-10-13 19:53   ` [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once Eric W. Biederman
2016-10-13 21:46     ` Andrei Vagin
     [not found]       ` <20161013214650.GB19836-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-14  2:31         ` Andrey Vagin
2016-10-14  2:31           ` Andrey Vagin
     [not found]           ` <CANaxB-xPkgdyeg0z6TvExMfyy4uOC+Nu4Q99WpCscNKMWz8VPg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-14  2:45             ` Eric W. Biederman [this message]
2016-10-14  2:45               ` Eric W. Biederman
     [not found]               ` <87wphb4pjn.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-14 18:29                 ` [RFC][PATCH v2] " Eric W. Biederman
2016-10-14 18:29                   ` Eric W. Biederman
     [not found]                   ` <8737jy3htt.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-18  2:40                     ` Andrei Vagin
2016-10-18  2:40                       ` Andrei Vagin
     [not found]                       ` <20161018024000.GA4901-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-18  6:49                         ` Eric W. Biederman
2016-10-18  6:49                           ` Eric W. Biederman
2016-10-19  3:46                           ` [REVIEW][PATCH] mount: In propagate_umount handle overlapping mount propagation trees Eric W. Biederman
     [not found]                             ` <877f95ngpr.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-20 21:30                               ` Andrei Vagin
2016-10-20 21:30                                 ` Andrei Vagin
     [not found]                                 ` <20161020213052.GA25226-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-21 19:26                                   ` Eric W. Biederman
2016-10-21 19:26                                     ` Eric W. Biederman
     [not found]                                     ` <87pomtec6c.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-22 19:42                                       ` [RFC][PATCH v2] " Eric W. Biederman
2016-10-22 19:42                                         ` Eric W. Biederman
     [not found]                                         ` <877f90b27o.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-25 20:58                                           ` Andrei Vagin
2016-10-25 20:58                                             ` Andrei Vagin
     [not found]                                             ` <20161025205846.GA25080-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-25 21:45                                               ` Eric W. Biederman
2016-10-25 21:45                                                 ` Eric W. Biederman
     [not found]                                                 ` <87mvhs14s7.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-25 23:41                                                   ` Andrei Vagin
2016-10-25 23:41                                                     ` Andrei Vagin
2016-10-26  1:42                                                     ` Eric W. Biederman
     [not found]                                                     ` <20161025234125.GA20335-1ViLX0X+lBJGNQ1M2rI3KwRV3xvJKrda@public.gmane.org>
2016-10-26  1:42                                                       ` Eric W. Biederman
2016-11-01  6:14                                                   ` Andrei Vagin
2016-11-01  6:14                                                     ` Andrei Vagin
     [not found]                           ` <87r37e9mnj.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2016-10-19  3:46                             ` [REVIEW][PATCH] " Eric W. Biederman
     [not found]     ` <87pon458l1.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-13 21:46       ` [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once Andrei Vagin
     [not found]   ` <877f9c6ui8.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-10-13 19:53     ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wphb4pjn.fsf@x220.int.ebiederm.org \
    --to=ebiederm-as9lmozglivwk0htik3j/w@public.gmane.org \
    --cc=avagin-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org \
    --cc=avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.