All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konstantin Khlebnikov <koct9i@gmail.com>
To: Michel Lespinasse <walken@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Tim Hartrick <tim@edgecast.com>, Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH] Repeated fork() causes SLAB to grow without bound
Date: Thu, 20 Nov 2014 18:42:03 +0400	[thread overview]
Message-ID: <CALYGNiOC4dEzzVzSQXGC4oxLbgp=8TC=A+duJs67jT97TWQ++g@mail.gmail.com> (raw)
In-Reply-To: <CANN689G+y77m2_paF0vBpHG8EsJ2-pEnJvLJSGs-zHf+SqTEjQ@mail.gmail.com>

On Thu, Nov 20, 2014 at 2:14 AM, Michel Lespinasse <walken@google.com> wrote:
> On Wed, Nov 19, 2014 at 8:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>> On Wed, Nov 19, 2014 at 7:09 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>>> Also from reading http://lwn.net/Articles/383162/ I understand that correctness
>>> also depends on the hierarchy and I wonder if there's a danger of reintroducing
>>> a bug like the one described there.
>>
>> If I remember right that was fixed by linking non-exclusively mapped pages to
>> root anon_vma instead of anon_vma from vma where fault has happened.
>> After my patch this still works. Topology hierarchy actually isn't used.
>> Here just one selected "root' anon_vma which dies last. That's all.
>
> That's not how I remember it.

??? That at the end of lwn article:

[quote]
The fix is straightforward; when linking an existing page to an
anon_vma structure,
the kernel needs to pick the one which is highest in the process hierarchy;
that guarantees that the anon_vma will not go away prematurely.
[/quote]

nowdays this happens in __page_set_anon_rmap():

/*
* If the page isn't exclusively mapped into this vma,
* we must use the _oldest_ possible anon_vma for the
* page mapping!
*/
if (!exclusive)
    anon_vma = anon_vma->root;

The rest treeish of topology affects only performance.

>
> An anon_vma corresponds to a given vma V, and is used to track all
> vmas (V and descendant vmas) that may include a page that was
> originally mapped in V.
>
> Each anon page has a link to the anon_vma corresponding to the vma
> they were originally faulted in, and an offset indicating where the
> page was located relative to that original VMA.
>
> The anon_vma has an interval tree of struct anon_vma_chain, and each
> struct anon_vma_chain includes a link to a descendent-of-V vma. This
> allows rmap to quickly find all the vmas that may map a given page
> (based on the page's anon_vma and offset).
>
> When forking or splitting vmas, the new vma is a descendent of the
> same vmas as the old one so it must be added to all the anon_vma
> interval trees that were referencing the old one (that is, ancestors
> of the new vma). To that end, all the struct anon_vma_chain pointing
> to a given vma are kept on a linked list, and struct anon_vma_chain
> includes a link to the anon_vma holding the interval tree.
>
> Locking the entire structure is done with a single lock hosted in the
> root anon_vma (that is, a vma that was created by mmap() and not by
> cloning or forking existing vmas).
>
> Limit the length of the ancestors linked list is correct, though it
> has performance implications. In the extreme case, forcing all vmas to
> be added on the root vma's interval tree would be correct, though it
> may re-introduce the performance problems that lead to the
> introduction of anon_vma.
>
> The good thing about Konstantin's proposal is that it does not have
> any magic constant like mine did. However, I think he is mistaken in
> saying that hierarchy isn't used - an ancestor vma will always have
> more descendents than its children, and the reason for the hierarchy
> is to limit the number of vmas that rmap must explore.

I mean after breaking hierarchy whole structure stays correct and kernel
wouldn't explode, of course reusing anon_vma from ancestor makes
rmap walk less effective because newly allocated pages will get false
aliased vmas where they will never be mapped.


I'm thinking about limitation for reusing anon_vmas which might increase
performance without breaking asymptotic estimation of count anon_vma in
the worst case. For example this heuristic: allow to reuse only anon_vma
with single direct descendant. It seems there will be arount up to two times
more anon_vmas but false-aliasing must be much lower.



>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.

WARNING: multiple messages have this Message-ID (diff)
From: Konstantin Khlebnikov <koct9i@gmail.com>
To: Michel Lespinasse <walken@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Tim Hartrick <tim@edgecast.com>, Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH] Repeated fork() causes SLAB to grow without bound
Date: Thu, 20 Nov 2014 18:42:03 +0400	[thread overview]
Message-ID: <CALYGNiOC4dEzzVzSQXGC4oxLbgp=8TC=A+duJs67jT97TWQ++g@mail.gmail.com> (raw)
In-Reply-To: <CANN689G+y77m2_paF0vBpHG8EsJ2-pEnJvLJSGs-zHf+SqTEjQ@mail.gmail.com>

On Thu, Nov 20, 2014 at 2:14 AM, Michel Lespinasse <walken@google.com> wrote:
> On Wed, Nov 19, 2014 at 8:58 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>> On Wed, Nov 19, 2014 at 7:09 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>>> Also from reading http://lwn.net/Articles/383162/ I understand that correctness
>>> also depends on the hierarchy and I wonder if there's a danger of reintroducing
>>> a bug like the one described there.
>>
>> If I remember right that was fixed by linking non-exclusively mapped pages to
>> root anon_vma instead of anon_vma from vma where fault has happened.
>> After my patch this still works. Topology hierarchy actually isn't used.
>> Here just one selected "root' anon_vma which dies last. That's all.
>
> That's not how I remember it.

??? That at the end of lwn article:

[quote]
The fix is straightforward; when linking an existing page to an
anon_vma structure,
the kernel needs to pick the one which is highest in the process hierarchy;
that guarantees that the anon_vma will not go away prematurely.
[/quote]

nowdays this happens in __page_set_anon_rmap():

/*
* If the page isn't exclusively mapped into this vma,
* we must use the _oldest_ possible anon_vma for the
* page mapping!
*/
if (!exclusive)
    anon_vma = anon_vma->root;

The rest treeish of topology affects only performance.

>
> An anon_vma corresponds to a given vma V, and is used to track all
> vmas (V and descendant vmas) that may include a page that was
> originally mapped in V.
>
> Each anon page has a link to the anon_vma corresponding to the vma
> they were originally faulted in, and an offset indicating where the
> page was located relative to that original VMA.
>
> The anon_vma has an interval tree of struct anon_vma_chain, and each
> struct anon_vma_chain includes a link to a descendent-of-V vma. This
> allows rmap to quickly find all the vmas that may map a given page
> (based on the page's anon_vma and offset).
>
> When forking or splitting vmas, the new vma is a descendent of the
> same vmas as the old one so it must be added to all the anon_vma
> interval trees that were referencing the old one (that is, ancestors
> of the new vma). To that end, all the struct anon_vma_chain pointing
> to a given vma are kept on a linked list, and struct anon_vma_chain
> includes a link to the anon_vma holding the interval tree.
>
> Locking the entire structure is done with a single lock hosted in the
> root anon_vma (that is, a vma that was created by mmap() and not by
> cloning or forking existing vmas).
>
> Limit the length of the ancestors linked list is correct, though it
> has performance implications. In the extreme case, forcing all vmas to
> be added on the root vma's interval tree would be correct, though it
> may re-introduce the performance problems that lead to the
> introduction of anon_vma.
>
> The good thing about Konstantin's proposal is that it does not have
> any magic constant like mine did. However, I think he is mistaken in
> saying that hierarchy isn't used - an ancestor vma will always have
> more descendents than its children, and the reason for the hierarchy
> is to limit the number of vmas that rmap must explore.

I mean after breaking hierarchy whole structure stays correct and kernel
wouldn't explode, of course reusing anon_vma from ancestor makes
rmap walk less effective because newly allocated pages will get false
aliased vmas where they will never be mapped.


I'm thinking about limitation for reusing anon_vmas which might increase
performance without breaking asymptotic estimation of count anon_vma in
the worst case. For example this heuristic: allow to reuse only anon_vma
with single direct descendant. It seems there will be arount up to two times
more anon_vmas but false-aliasing must be much lower.



>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-11-20 14:42 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-16  2:46 Repeated fork() causes SLAB to grow without bound Daniel Forrest
2012-08-16 18:58 ` Rik van Riel
2012-08-16 18:58   ` Rik van Riel
2012-08-18  0:03   ` Daniel Forrest
2012-08-18  0:03     ` Daniel Forrest
2012-08-18  3:46     ` Rik van Riel
2012-08-18  3:46       ` Rik van Riel
2012-08-18  4:07       ` Daniel Forrest
2012-08-18  4:07         ` Daniel Forrest
2012-08-18  4:10         ` Rik van Riel
2012-08-18  4:10           ` Rik van Riel
2012-08-20  8:00       ` Hugh Dickins
2012-08-20  8:00         ` Hugh Dickins
2012-08-20  9:39         ` Michel Lespinasse
2012-08-20  9:39           ` Michel Lespinasse
2012-08-20 11:11           ` Andi Kleen
2012-08-20 11:11             ` Andi Kleen
2012-08-20 11:17           ` Rik van Riel
2012-08-20 11:17             ` Rik van Riel
2012-08-20 11:53             ` Michel Lespinasse
2012-08-20 11:53               ` Michel Lespinasse
2012-08-20 19:11               ` Michel Lespinasse
2012-08-20 19:11                 ` Michel Lespinasse
2012-08-22  3:20           ` [RFC PATCH] " Michel Lespinasse
2012-08-22  3:20             ` Michel Lespinasse
2012-08-22  3:29             ` Rik van Riel
2012-08-22  3:29               ` Rik van Riel
2013-06-03 19:50               ` Daniel Forrest
2013-06-03 19:50                 ` Daniel Forrest
2013-06-04 10:37                 ` Rik van Riel
2013-06-04 10:37                   ` Rik van Riel
2013-06-05 14:02                   ` Andrea Arcangeli
2013-06-05 14:02                     ` Andrea Arcangeli
2014-11-14 16:30                 ` [PATCH] " Daniel Forrest
2014-11-14 16:30                   ` Daniel Forrest
2014-11-18  0:02                   ` Andrew Morton
2014-11-18  0:02                     ` Andrew Morton
2014-11-18  1:41                     ` Daniel Forrest
2014-11-18  1:41                       ` Daniel Forrest
2014-11-18  2:41                       ` Rik van Riel
2014-11-18  2:41                         ` Rik van Riel
2014-11-18 20:19                         ` Andrew Morton
2014-11-18 20:19                           ` Andrew Morton
2014-11-18 22:15                           ` Konstantin Khlebnikov
2014-11-18 22:15                             ` Konstantin Khlebnikov
2014-11-18 23:02                             ` Konstantin Khlebnikov
2014-11-18 23:50                               ` Vlastimil Babka
2014-11-18 23:50                                 ` Vlastimil Babka
2014-11-19 14:36                                 ` Konstantin Khlebnikov
2014-11-19 14:36                                   ` Konstantin Khlebnikov
2014-11-19 16:09                                   ` Vlastimil Babka
2014-11-19 16:09                                     ` Vlastimil Babka
2014-11-19 16:58                                     ` Konstantin Khlebnikov
2014-11-19 16:58                                       ` Konstantin Khlebnikov
2014-11-19 23:14                                       ` Michel Lespinasse
2014-11-19 23:14                                         ` Michel Lespinasse
2014-11-20 14:42                                         ` Konstantin Khlebnikov [this message]
2014-11-20 14:42                                           ` Konstantin Khlebnikov
2014-11-20 14:50                                           ` Rik van Riel
2014-11-20 14:50                                             ` Rik van Riel
2014-11-20 15:03                                             ` Konstantin Khlebnikov
2014-11-20 15:03                                               ` Konstantin Khlebnikov
2014-11-24  7:09                                               ` Konstantin Khlebnikov
2014-11-25 10:59                                                 ` Michal Hocko
2014-11-25 10:59                                                   ` Michal Hocko
2014-11-25 12:13                                                   ` Konstantin Khlebnikov
2014-11-25 15:00                                                     ` Michal Hocko
2014-11-25 15:00                                                       ` Michal Hocko
2014-11-26 17:35                                                       ` Michal Hocko
2014-11-26 17:35                                                         ` Michal Hocko
2014-12-05 15:44                                                         ` Jerome Marchand
2014-11-20 15:27                                           ` Michel Lespinasse
2014-11-20 15:27                                             ` Michel Lespinasse
2014-11-19  2:48                           ` Rik van Riel
2014-11-19  2:48                             ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALYGNiOC4dEzzVzSQXGC4oxLbgp=8TC=A+duJs67jT97TWQ++g@mail.gmail.com' \
    --to=koct9i@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=tim@edgecast.com \
    --cc=vbabka@suse.cz \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.