linux-fsdevel.vger.kernel.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: chukaiping <chukaiping@baidu.com>,
	mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
	vbabka@suse.cz, nigupta@nvidia.com, bhe@redhat.com,
	khalid.aziz@oracle.com, iamjoonsoo.kim@lge.com,
	mateusznosek0@gmail.com, sh_def@163.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH v4] mm/compaction: let proactive compaction order configurable
Date: Mon, 10 May 2021 21:20:08 -0700 (PDT)
Message-ID: <bedd6e68-bb9b-2f3b-7aaf-a0877e025a7@google.com>
In-Reply-To: <20210509171748.8dbc70ceccc5cc1ae61fe41c@linux-foundation.org>

On Sun, 9 May 2021, Andrew Morton wrote:

> > Currently the proactive compaction order is fixed to
> > COMPACTION_HPAGE_ORDER(9), it's OK in most machines with lots of
> > normal 4KB memory, but it's too high for the machines with small
> > normal memory, for example the machines with most memory configured
> > as 1GB hugetlbfs huge pages. In these machines the max order of
> > free pages is often below 9, and it's always below 9 even with hard
> > compaction. This leads to proactive compaction being triggered very
> > frequently. On these machines we only care about order 3 or 4.
> > This patch exports the order to proc and makes it configurable
> > by the user; the default value is still COMPACTION_HPAGE_ORDER.
> 
> It would be great to do this automatically?  It's quite simple to see
> when memory is being handed out to hugetlbfs - so can we tune
> proactive_compaction_order in response to this?  That would be far
> better than adding a manual tunable.
> 
> But from having read Khalid's comments, that does sound quite involved.
> Is there some partial solution that we can come up with that will get
> most people out of trouble?
> 
> That being said, this patch is super-super-simple so perhaps we should
> just merge it just to get one person (and hopefully a few more) out of
> trouble.  But on the other hand, once we add a /proc tunable we must
> maintain that tunable for ever (or at least a very long time) even if
> the internal implementations change a lot.
> 

As mentioned in v3 of the patch, I'm not sure why this belongs in the 
kernel at all.

I understand that the system is largely consumed by 1GB gigantic pages and 
that a small percentage of memory is left for native pages.  Thus, 
fragmentation readily occurs and can affect large order allocations even 
at the levels of order-3 or order-4.

So it seems like the ideal solution would be to monitor the fragmentation 
index at the order you care about (the same order you would use for this 
new tunable) and root userspace would manually trigger compaction when 
necessary.  When this was brought up, it was commented that explicitly
triggered compaction is too expensive to do all in one iteration.  That's
fair enough, but shouldn't the fix then be to improve explicitly triggered
compaction through sysfs so that it offers a shorter-term (or weaker) form
of compaction, rather than to build additional policy decisions into the
kernel?
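
A minimal sketch of the userspace side of this idea, assuming the
fragmentation index is read from /sys/kernel/debug/extfrag/extfrag_index
(debugfs mounted, root access) and that compaction is triggered by writing
to /proc/sys/vm/compact_memory.  The order (3), the threshold (0.5) and all
function names are illustrative assumptions, not something proposed in this
thread:

```python
import re

# Assumed locations: extfrag_index needs debugfs mounted and root access.
EXTFRAG_INDEX = "/sys/kernel/debug/extfrag/extfrag_index"
COMPACT_MEMORY = "/proc/sys/vm/compact_memory"

# Expected line shape: "Node 0, zone   Normal -1.000 -1.000 0.250 ..."
LINE_RE = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)")


def parse_extfrag_index(text):
    """Return {(node, zone): [index at order 0, order 1, ...]}."""
    result = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            node, zone, values = m.groups()
            result[(int(node), zone)] = [float(v) for v in values.split()]
    return result


def zones_needing_compaction(indexes, order, threshold):
    """List (node, zone) pairs whose index at `order` exceeds `threshold`.

    An index of -1.000 means an allocation at that order would succeed,
    so such zones never exceed a non-negative threshold.
    """
    return sorted(
        key
        for key, values in indexes.items()
        if order < len(values) and values[order] > threshold
    )


def check_and_compact(order=3, threshold=0.5):
    """Hypothetical policy step: compact only if some zone is fragmented."""
    with open(EXTFRAG_INDEX) as f:
        indexes = parse_extfrag_index(f.read())
    fragmented = zones_needing_compaction(indexes, order, threshold)
    if fragmented:
        # Writing 1 compacts all zones in one pass: the existing, full-cost
        # trigger, not the weaker form discussed above.
        with open(COMPACT_MEMORY, "w") as f:
            f.write("1\n")
    return fragmented
```

A daemon or cron job could call check_and_compact() periodically.  The
sketch can only use the existing all-at-once trigger, which is exactly the
cost objection raised above; a cheaper trigger would have to come from the
kernel as a mechanism, with the policy staying in code like this.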

If done this way, there would be a clear separation between mechanism and 
policy and the kernel would not need to carry these sysctls to tune very 
niche areas.


Thread overview: 10+ messages
2021-04-28  2:28 chukaiping
2021-05-10  0:17 ` Andrew Morton
2021-05-10  2:10   ` Re: " Chu,Kaiping
2021-05-11  4:20   ` David Rientjes [this message]
2021-05-28 17:42 ` Vlastimil Babka
2021-06-01  1:15   ` Re: " Chu,Kaiping
2021-06-16 13:49     ` Vlastimil Babka
2021-06-09 10:44 ` David Hildenbrand
2021-06-15  1:11   ` Re: " Chu,Kaiping
2021-06-15  8:04     ` David Hildenbrand
