linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Nitin Gupta <nigupta@nvidia.com>,
	akpm@linux-foundation.org, vbabka@suse.cz,
	mgorman@techsingularity.net, mhocko@suse.com,
	dan.j.williams@intel.com, Yu Zhao <yuzhao@google.com>,
	Matthew Wilcox <willy@infradead.org>, Qian Cai <cai@lca.pw>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Roman Gushchin <guro@fb.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kees Cook <keescook@chromium.org>, Jann Horn <jannh@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Arun KS <arunks@codeaurora.org>,
	Janne Huttunen <janne.huttunen@nokia.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC] mm: Proactive compaction
Date: Tue, 17 Sep 2019 13:26:52 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1909171318070.161860@chino.kir.corp.google.com> (raw)
In-Reply-To: <f4a74669-b86b-741a-1c2b-c117878734c6@nvidia.com>

On Tue, 17 Sep 2019, John Hubbard wrote:

> > We've had good success with periodically compacting memory on a regular 
> > cadence on systems with hugepages enabled.  The cadence itself is defined 
> > by the admin but it causes khugepaged[*] to periodically wakeup and invoke 
> > compaction in an attempt to keep zones as defragmented as possible 
> 
> That's an important data point, thanks for reporting it. 
> 
> And given that we have at least one data point validating it, I think we
> should feel fairly comfortable with this approach. Because the sys admin 
> probably knows  when are the best times to steal cpu cycles and recover 
> some huge pages. Unlike the kernel, the sys admin can actually see the 
> future sometimes, because he/she may know what is going to be run.
> 
> It's still sounding like we can expect excellent results from simply 
> defragmenting from user space, via a chron job and/or before running
> important tests, rather than trying to have the kernel guess whether 
> it's a performance win to defragment at some particular time.
> 
> Are you using existing interfaces, or did you need to add something? How
> exactly are you triggering compaction?
> 

It's possible to do this through a cron job but there are a fre reasons 
that we preferred to do it through khugepaged:

 - we use a lighter variation of compaction, MIGRATE_SYNC_LIGHT, than what 
   the per-node trigger provides since compact_node() forces MIGRATE_SYNC
   and can stall for minutes and become disruptive under some
   circumstances,

 - we do not ignore the pageblock skip hint which compact_node() hardcodes 
   to ignore, and 

 - we didn't want to do this in process context so that the cpu time is
   not taxed to any user cgroup since it's on behalf of the system as a
   whole.

It seems much better to do this on a per-node basis rather than through 
the sysctl to do it for the whole system to partition the work.  Extending 
the per-node interface to do MIGRATE_SYNC_LIGHT and not ignore pageblock 
skip is possible but the work done would still be done in process context 
so if done from userspace this would need to be attached to a cgroup that 
does not tax that cgroup for usage done on behalf of the entire system.

Again, we're using khugepaged and allowing the period to be defined 
through /sys/kernel/mm/transparent_hugepage/khugepaged but that is because 
we only want to do this on systems where we want to dynamically allocate 
hugepages on a regular basis.

  reply	other threads:[~2019-09-17 20:26 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-16 21:43 [RFC] mm: Proactive compaction Nitin Gupta
2019-08-20  8:46 ` Vlastimil Babka
2019-08-20 21:35   ` Nitin Gupta
2019-08-24  7:24   ` Khalid Aziz
2019-09-19 23:37   ` Nitin Gupta
2019-09-24 13:39     ` Vlastimil Babka
2019-09-24 14:11       ` Khalid Aziz
2019-08-20 22:20 ` Matthew Wilcox
2019-08-21 23:23   ` Nitin Gupta
2019-08-22  8:51 ` Mel Gorman
2019-08-22 21:57   ` Nitin Gupta
2019-08-26 11:47     ` Mel Gorman
2019-08-27 20:36       ` Nitin Gupta
2019-09-19 23:22   ` Nitin Gupta
2019-09-16 20:16 ` David Rientjes
2019-09-16 20:50   ` Nitin Gupta
2019-09-17 19:46   ` John Hubbard
2019-09-17 20:26     ` David Rientjes [this message]
2019-11-22 22:31   ` Nitin Gupta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1909171318070.161860@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=arunks@codeaurora.org \
    --cc=aryabinin@virtuozzo.com \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=janne.huttunen@nokia.com \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=nigupta@nvidia.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).