All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jerome Marchand <jmarchan@redhat.com>
To: Michal Hocko <mhocko@suse.cz>, Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Tim Hartrick <tim@edgecast.com>,
	Daniel Forrest <dan.forrest@ssec.wisc.edu>
Subject: Re: [PATCH] Repeated fork() causes SLAB to grow without bound
Date: Fri, 05 Dec 2014 16:44:48 +0100	[thread overview]
Message-ID: <5481D2F0.2090908@redhat.com> (raw)
In-Reply-To: <20141126173517.GA8180@dhcp22.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 8020 bytes --]

On 11/26/2014 06:35 PM, Michal Hocko wrote:
> On Tue 25-11-14 16:00:06, Michal Hocko wrote:
>> On Tue 25-11-14 16:13:16, Konstantin Khlebnikov wrote:
>>> On Tue, Nov 25, 2014 at 1:59 PM, Michal Hocko <mhocko@suse.cz> wrote:
>>>> On Mon 24-11-14 11:09:40, Konstantin Khlebnikov wrote:
>>>>> On Thu, Nov 20, 2014 at 6:03 PM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
>>>>>> On Thu, Nov 20, 2014 at 5:50 PM, Rik van Riel <riel@redhat.com> wrote:
>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>> Hash: SHA1
>>>>>>>
>>>>>>> On 11/20/2014 09:42 AM, Konstantin Khlebnikov wrote:
>>>>>>>
>>>>>>>> I'm thinking about limitation for reusing anon_vmas which might
>>>>>>>> increase performance without breaking asymptotic estimation of
>>>>>>>> count anon_vma in the worst case. For example this heuristic: allow
>>>>>>>> to reuse only anon_vma with single direct descendant. It seems
>>>>>>>> there will be arount up to two times more anon_vmas but
>>>>>>>> false-aliasing must be much lower.
>>>>>
>>>>> Done. RFC patch in attachment.
> 
> Ok, finally managed to untagnle myself from vma chains and your patch
> makes sense to me, it is quite clever actually. Here is it including the
> fixup.
> ---
>> From 1d4b0b38198c69ecfeb37670cb1dda767a802c9a Mon Sep 17 00:00:00 2001
>> From: Konstantin Khlebnikov <koct9i@gmail.com>
>> Date: Tue, 25 Nov 2014 10:54:44 +0100
>> Subject: [PATCH] mm: prevent endless growth of anon_vma hierarchy
>>
>> Constantly forking task causes unlimited grow of anon_vma chain.
>> Each next child allocate new level of anon_vmas and links vmas to all
>> previous levels because it inherits pages from them. None of anon_vmas
>> cannot be freed because there might be pages which points to them.
>>
>> This patch adds heuristic which decides to reuse existing anon_vma instead
>> of forking new one. It counts vmas and direct descendants for each anon_vma.
>> Anon_vma with degree lower than two will be reused at next fork.
>> As a result each anon_vma has either alive vma or at least two descendants,
>> endless chains are no longer possible and count of anon_vmas is no more than
>> two times more than count of vmas.
>>
>> Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
>> Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
> 
> Tested-by: Michal Hocko <mhocko@suse.cz>
> Reviewed-by: Michal Hocko <mhocko@suse.cz>
> 
> and I guess
> Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu>

Tested-by: Jerome Marchand <jmarchan@redhat.com>

Minor nitpicks below.

> 
> who somehow vanished from CC list (added back) would be appropriate as
> well.
> 
> plus
> 
> Fixes: 5beb49305251 (mm: change anon_vma linking to fix multi-process server scalability issue)
> and mark it for stable
> 
> Thanks!
> 
>> ---
>>  include/linux/rmap.h | 16 ++++++++++++++++
>>  mm/rmap.c            | 29 ++++++++++++++++++++++++++++-
>>  2 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index c0c2bce6b0b7..b1d140c20b37 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -45,6 +45,22 @@ struct anon_vma {
>>  	 * mm_take_all_locks() (mm_all_locks_mutex).
>>  	 */
>>  	struct rb_root rb_root;	/* Interval tree of private "related" vmas */
>> +
>> +	/*
>> +	 * Count of child anon_vmas and VMAs which points to this anon_vma.
>> +	 *
>> +	 * This counter is used for making decision about reusing old anon_vma
>> +	 * instead of forking new one. It allows to detect anon_vmas which have
>> +	 * just one direct descendant and no vmas. Reusing such anon_vma not
>> +	 * leads to significant preformance regression but prevents degradation

Does it or does it not lead to significant performance issue? I can't tell.

>> +	 * of anon_vma hierarchy to endless linear chain.
>> +	 *
>> +	 * Root anon_vma is never reused because it is its own parent and it has
>> +	 * at leat one vma or child, thus at fork it's degree is at least 2.

s/leat/least/

Thanks,
Jerome

>> +	 */
>> +	unsigned degree;
>> +
>> +	struct anon_vma *parent;	/* Parent of this anon_vma */
>>  };
>>  
>>  /*
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 19886fb2f13a..40ae8184a1e1 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -72,6 +72,8 @@ static inline struct anon_vma *anon_vma_alloc(void)
>>  	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
>>  	if (anon_vma) {
>>  		atomic_set(&anon_vma->refcount, 1);
>> +		anon_vma->degree = 1;	/* Reference for first vma */
>> +		anon_vma->parent = anon_vma;
>>  		/*
>>  		 * Initialise the anon_vma root to point to itself. If called
>>  		 * from fork, the root will be reset to the parents anon_vma.
>> @@ -188,6 +190,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
>>  		if (likely(!vma->anon_vma)) {
>>  			vma->anon_vma = anon_vma;
>>  			anon_vma_chain_link(vma, avc, anon_vma);
>> +			anon_vma->degree++;
>>  			allocated = NULL;
>>  			avc = NULL;
>>  		}
>> @@ -256,7 +259,17 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>  		anon_vma = pavc->anon_vma;
>>  		root = lock_anon_vma_root(root, anon_vma);
>>  		anon_vma_chain_link(dst, avc, anon_vma);
>> +
>> +		/*
>> +		 * Reuse existing anon_vma if its degree lower than two,
>> +		 * that means it has no vma and just one anon_vma child.
>> +		 */
>> +		if (!dst->anon_vma && anon_vma != src->anon_vma &&
>> +				anon_vma->degree < 2)
>> +			dst->anon_vma = anon_vma;
>>  	}
>> +	if (dst->anon_vma)
>> +		dst->anon_vma->degree++;
>>  	unlock_anon_vma_root(root);
>>  	return 0;
>>  
>> @@ -279,6 +292,9 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>  	if (!pvma->anon_vma)
>>  		return 0;
>>  
>> +	/* Drop inherited anon_vma, we'll reuse old one or allocate new. */
>> +	vma->anon_vma = NULL;
>> +
>>  	/*
>>  	 * First, attach the new VMA to the parent VMA's anon_vmas,
>>  	 * so rmap can find non-COWed pages in child processes.
>> @@ -286,6 +302,10 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>  	if (anon_vma_clone(vma, pvma))
>>  		return -ENOMEM;
>>  
>> +	/* An old anon_vma has been reused. */
>> +	if (vma->anon_vma)
>> +		return 0;
>> +
>>  	/* Then add our own anon_vma. */
>>  	anon_vma = anon_vma_alloc();
>>  	if (!anon_vma)
>> @@ -299,6 +319,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>  	 * lock any of the anon_vmas in this anon_vma tree.
>>  	 */
>>  	anon_vma->root = pvma->anon_vma->root;
>> +	anon_vma->parent = pvma->anon_vma;
>>  	/*
>>  	 * With refcounts, an anon_vma can stay around longer than the
>>  	 * process it belongs to. The root anon_vma needs to be pinned until
>> @@ -309,6 +330,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>  	vma->anon_vma = anon_vma;
>>  	anon_vma_lock_write(anon_vma);
>>  	anon_vma_chain_link(vma, avc, anon_vma);
>> +	anon_vma->parent->degree++;
>>  	anon_vma_unlock_write(anon_vma);
>>  
>>  	return 0;
>> @@ -339,12 +361,16 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
>>  		 * Leave empty anon_vmas on the list - we'll need
>>  		 * to free them outside the lock.
>>  		 */
>> -		if (RB_EMPTY_ROOT(&anon_vma->rb_root))
>> +		if (RB_EMPTY_ROOT(&anon_vma->rb_root)) {
>> +			anon_vma->parent->degree--;
>>  			continue;
>> +		}
>>  
>>  		list_del(&avc->same_vma);
>>  		anon_vma_chain_free(avc);
>>  	}
>> +	if (vma->anon_vma)
>> +		vma->anon_vma->degree--;
>>  	unlock_anon_vma_root(root);
>>  
>>  	/*
>> @@ -355,6 +381,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
>>  	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
>>  		struct anon_vma *anon_vma = avc->anon_vma;
>>  
>> +		BUG_ON(anon_vma->degree);
>>  		put_anon_vma(anon_vma);
>>  
>>  		list_del(&avc->same_vma);
>> -- 
>> 2.1.3
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

  reply	other threads:[~2014-12-05 15:45 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-16  2:46 Repeated fork() causes SLAB to grow without bound Daniel Forrest
2012-08-16 18:58 ` Rik van Riel
2012-08-16 18:58   ` Rik van Riel
2012-08-18  0:03   ` Daniel Forrest
2012-08-18  0:03     ` Daniel Forrest
2012-08-18  3:46     ` Rik van Riel
2012-08-18  3:46       ` Rik van Riel
2012-08-18  4:07       ` Daniel Forrest
2012-08-18  4:07         ` Daniel Forrest
2012-08-18  4:10         ` Rik van Riel
2012-08-18  4:10           ` Rik van Riel
2012-08-20  8:00       ` Hugh Dickins
2012-08-20  8:00         ` Hugh Dickins
2012-08-20  9:39         ` Michel Lespinasse
2012-08-20  9:39           ` Michel Lespinasse
2012-08-20 11:11           ` Andi Kleen
2012-08-20 11:11             ` Andi Kleen
2012-08-20 11:17           ` Rik van Riel
2012-08-20 11:17             ` Rik van Riel
2012-08-20 11:53             ` Michel Lespinasse
2012-08-20 11:53               ` Michel Lespinasse
2012-08-20 19:11               ` Michel Lespinasse
2012-08-20 19:11                 ` Michel Lespinasse
2012-08-22  3:20           ` [RFC PATCH] " Michel Lespinasse
2012-08-22  3:20             ` Michel Lespinasse
2012-08-22  3:29             ` Rik van Riel
2012-08-22  3:29               ` Rik van Riel
2013-06-03 19:50               ` Daniel Forrest
2013-06-03 19:50                 ` Daniel Forrest
2013-06-04 10:37                 ` Rik van Riel
2013-06-04 10:37                   ` Rik van Riel
2013-06-05 14:02                   ` Andrea Arcangeli
2013-06-05 14:02                     ` Andrea Arcangeli
2014-11-14 16:30                 ` [PATCH] " Daniel Forrest
2014-11-14 16:30                   ` Daniel Forrest
2014-11-18  0:02                   ` Andrew Morton
2014-11-18  0:02                     ` Andrew Morton
2014-11-18  1:41                     ` Daniel Forrest
2014-11-18  1:41                       ` Daniel Forrest
2014-11-18  2:41                       ` Rik van Riel
2014-11-18  2:41                         ` Rik van Riel
2014-11-18 20:19                         ` Andrew Morton
2014-11-18 20:19                           ` Andrew Morton
2014-11-18 22:15                           ` Konstantin Khlebnikov
2014-11-18 22:15                             ` Konstantin Khlebnikov
2014-11-18 23:02                             ` Konstantin Khlebnikov
2014-11-18 23:50                               ` Vlastimil Babka
2014-11-18 23:50                                 ` Vlastimil Babka
2014-11-19 14:36                                 ` Konstantin Khlebnikov
2014-11-19 14:36                                   ` Konstantin Khlebnikov
2014-11-19 16:09                                   ` Vlastimil Babka
2014-11-19 16:09                                     ` Vlastimil Babka
2014-11-19 16:58                                     ` Konstantin Khlebnikov
2014-11-19 16:58                                       ` Konstantin Khlebnikov
2014-11-19 23:14                                       ` Michel Lespinasse
2014-11-19 23:14                                         ` Michel Lespinasse
2014-11-20 14:42                                         ` Konstantin Khlebnikov
2014-11-20 14:42                                           ` Konstantin Khlebnikov
2014-11-20 14:50                                           ` Rik van Riel
2014-11-20 14:50                                             ` Rik van Riel
2014-11-20 15:03                                             ` Konstantin Khlebnikov
2014-11-20 15:03                                               ` Konstantin Khlebnikov
2014-11-24  7:09                                               ` Konstantin Khlebnikov
2014-11-25 10:59                                                 ` Michal Hocko
2014-11-25 10:59                                                   ` Michal Hocko
2014-11-25 12:13                                                   ` Konstantin Khlebnikov
2014-11-25 15:00                                                     ` Michal Hocko
2014-11-25 15:00                                                       ` Michal Hocko
2014-11-26 17:35                                                       ` Michal Hocko
2014-11-26 17:35                                                         ` Michal Hocko
2014-12-05 15:44                                                         ` Jerome Marchand [this message]
2014-11-20 15:27                                           ` Michel Lespinasse
2014-11-20 15:27                                             ` Michel Lespinasse
2014-11-19  2:48                           ` Rik van Riel
2014-11-19  2:48                             ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5481D2F0.2090908@redhat.com \
    --to=jmarchan@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.forrest@ssec.wisc.edu \
    --cc=hughd@google.com \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=tim@edgecast.com \
    --cc=vbabka@suse.cz \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.