From: Andrew Morton <akpm@linux-foundation.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, Andi Kleen <ak@linux.intel.com>,
	"H. Peter Anvin" <hpa@linux.intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 10/10] thp: implement refcounting for huge zero page
Date: Thu, 25 Oct 2012 14:37:07 -0700	[thread overview]
Message-ID: <20121025143707.b212d958.akpm@linux-foundation.org>
In-Reply-To: <20121025212251.GA31749@shutemov.name>

On Fri, 26 Oct 2012 00:22:51 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Thu, Oct 25, 2012 at 02:05:24PM -0700, Andrew Morton wrote:
> > On Thu, 25 Oct 2012 23:49:59 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > 
> > > On Wed, Oct 24, 2012 at 01:25:52PM -0700, Andrew Morton wrote:
> > > > On Wed, 24 Oct 2012 22:45:52 +0300
> > > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > > > 
> > > > > On Wed, Oct 24, 2012 at 12:22:53PM -0700, Andrew Morton wrote:
> > > > > > 
> > > > > > I'm thinking that such a workload would be the above dd in parallel
> > > > > > with a small app which touches the huge page and then exits, then gets
> > > > > > executed again.  That "small app" sounds realistic to me.  Obviously
> > > > > > one could exercise the zero page's refcount at higher frequency with a
> > > > > > tight map/touch/unmap loop, but that sounds less realistic.  It's worth
> > > > > > trying that exercise as well though.
> > > > > > 
> > > > > > Or do something else.  But we should try to probe this code's
> > > > > > worst-case behaviour, get an understanding of its effects and then
> > > > > > decide whether any such workload is realistic enough to worry about.
> > > > > 
> > > > > Okay, I'll try a few memory pressure scenarios.
> > > 
> > > A test program:
> > > 
> > >         while (1) {
> > >                 /* 2M-aligned, 2M-sized anonymous allocation */
> > >                 posix_memalign((void **)&p, 2 * MB, 2 * MB);
> > >                 /* read-only touch: with THP this can map the shared huge zero page */
> > >                 assert(*p == 0);
> > >                 /* unmap again, dropping the zero page reference */
> > >                 free(p);
> > >         }
> > > 
> > > With this code running in the background we have a pretty good chance of
> > > the huge zero page being freeable (refcount == 1) when the shrinker
> > > callback is called - roughly one time in two.
> > > 
> > > A pagecache hog (dd if=hugefile of=/dev/null bs=1M) creates enough
> > > pressure to get the shrinker callback called, but it was only asked for
> > > the cache size (nr_to_scan == 0).
> > > I was not able to get it called with nr_to_scan > 0 in this scenario, so
> > > the hzp was never freed.
> > 
> > hm.  It's odd that the kernel didn't try to shrink slabs in this case. 
> > Why didn't it??
> 
> nr_to_scan == 0 asks for the fast path.  The shrinker callback can still
> shrink if it thinks that's a good idea.

What nr_objects does your shrinker return in that case?  If it's "1"
then it would be unsurprising that the core code decides not to
shrink.
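
To illustrate what I mean: under the current single ->shrink() callback
API the same function serves both the "how much could you free?" query
and the actual freeing, so I'd expect something of roughly this shape
(just a sketch, not your patch - names like huge_zero_refcount and
huge_zero_page are made up here):

static struct page *huge_zero_page;
static atomic_t huge_zero_refcount = ATOMIC_INIT(0);

static int shrink_huge_zero_page(struct shrinker *shrink,
                struct shrink_control *sc)
{
        if (!sc->nr_to_scan)
                /*
                 * Query pass: the zero page is freeable only when ours is
                 * the last reference.  Reporting HPAGE_PMD_NR rather than 1
                 * tells the core code how much memory really sits behind
                 * this single object.
                 */
                return atomic_read(&huge_zero_refcount) == 1 ?
                                HPAGE_PMD_NR : 0;

        /* Scan pass: drop the final reference and free the page. */
        if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
                struct page *zero_page = xchg(&huge_zero_page, NULL);

                BUG_ON(zero_page == NULL);
                __free_pages(zero_page, HPAGE_PMD_ORDER);
        }
        return 0;
}

If the query pass reports only "1" freeable object, the scan target which
shrink_slab() computes will essentially never reach the batch size, so the
freeing pass never runs - which would explain what you're seeing.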

> > 
> > > I also tried another scenario: usemem -n16 100M -r 1000. It creates real
> > > memory pressure - no easily reclaimable memory. This time the callback was
> > > called with nr_to_scan > 0 and we freed the hzp. Under pressure we fail to
> > > allocate the hzp and the code takes the fallback path as it is supposed to.
> > > 
> > > Do I need to check any other scenario?
> > 
> > I'm thinking that if we do hit problems in this area, we could avoid
> > freeing the hugepage unless the scan_control.priority is high enough. 
> > That would involve adding a magic number or a tunable to set the
> > threshold.
> 
> What about a ratelimit on the alloc path to force the fallback if we
> allocate too often? Is that a good idea?

mmm...  ratelimit via walltime is always a bad idea.  We could
ratelimit by "number of times the shrinker was called", and maybe that
would work OK, unsure.
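
Concretely, something like the sketch below is what I have in mind
(purely illustrative, all names made up, locking and races ignored):
bump a counter in the shrinker and, on the zero-page allocation path,
take the fallback if we'd be reallocating before the shrinker has fired
a few more times.

#define HZP_REALLOC_INTERVAL    2       /* arbitrary threshold for the sketch */

static atomic_t hzp_shrinker_calls = ATOMIC_INIT(0);    /* bumped in ->shrink() */
static int hzp_last_alloc_mark = -HZP_REALLOC_INTERVAL; /* permit the first alloc */

static bool hzp_alloc_allowed(void)
{
        int calls = atomic_read(&hzp_shrinker_calls);

        /* Reallocating before the shrinker ran a few more times?  Fall back. */
        if (calls - hzp_last_alloc_mark < HZP_REALLOC_INTERVAL)
                return false;
        hzp_last_alloc_mark = calls;
        return true;
}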

It *is* appropriate to use sc->priority to be more reluctant to release
expensive-to-reestablish objects.  But there is actually already a
mechanism in the shrinker code to handle this: the shrinker.seeks
field.  That was originally added to provide an estimate of "how
expensive will it be to recreate this object if we were to reclaim it".
So perhaps we could generalise that a bit, and state that the zero
hugepage is an expensive thing.
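
In other words, something along these lines (a sketch, reusing the
made-up names from above):

static struct shrinker huge_zero_page_shrinker = {
        .shrink = shrink_huge_zero_page,
        .seeks  = DEFAULT_SEEKS * 4,    /* "expensive to recreate": scan less */
};

Since shrink_slab() divides its scan target by ->seeks, a larger value makes
the core code correspondingly more reluctant to ask us to free the zero page.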

I don't think the shrinker.seeks facility has ever been used much, so
it's possible that it is presently mistuned or not working very well.



