linux-fsdevel.vger.kernel.org archive mirror
From: Khalid Aziz <khalid.aziz@oracle.com>
To: David Rientjes <rientjes@google.com>, chukaiping <chukaiping@baidu.com>
Cc: mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
	akpm@linux-foundation.org, vbabka@suse.cz, nigupta@nvidia.com,
	bhe@redhat.com, iamjoonsoo.kim@lge.com, mateusznosek0@gmail.com,
	sh_def@163.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3] mm/compaction:let proactive compaction order configurable
Date: Thu, 6 May 2021 17:27:21 -0400
Message-ID: <2f21dec9-065f-e234-f531-c6643965c0cb@oracle.com>
In-Reply-To: <f941268c-b91-594b-5de3-05fc418fbd0@google.com>

On 4/25/21 9:15 PM, David Rientjes wrote:
> On Sun, 25 Apr 2021, chukaiping wrote:
> 
>> Currently the proactive compaction order is fixed to
>> COMPACTION_HPAGE_ORDER(9), it's OK in most machines with lots of
>> normal 4KB memory, but it's too high for the machines with small
>> normal memory, for example the machines with most memory configured
>> as 1GB hugetlbfs huge pages. In these machines the max order of
>> free pages is often below 9, and it's always below 9 even with hard
>> compaction. This will lead to proactive compaction being triggered
>> very frequently. On these machines we only care about order 3 or 4.
>> This patch exports the order to proc and lets the user configure it;
>> the default value is still COMPACTION_HPAGE_ORDER.
>>
> 
> As asked in the review of the v1 of the patch, why is this not a userspace
> policy decision?  If you are interested in order-3 or order-4
> fragmentation, for whatever reason, you could periodically check
> /proc/buddyinfo and manually invoke compaction on the system.
> 
> In other words, why does this need to live in the kernel?
> 
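
For illustration, the userspace check suggested above (periodically reading /proc/buddyinfo and deciding when to compact) could look roughly like the sketch below. It assumes the usual /proc/buddyinfo layout of one line per zone with one free-block count per order; the function names, the order-4 target and the 16-block threshold are hypothetical choices, not values taken from the patch or from this thread.

```python
def parse_buddyinfo(text):
    """Map (node, zone) to the list of free block counts, index = order."""
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 ... cN"
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zones[(node, fields[3])] = [int(count) for count in fields[4:]]
    return zones


def nodes_needing_compaction(text, order=4, min_blocks=16, zone="Normal"):
    """Return nodes whose `zone` has fewer than `min_blocks` free blocks
    of at least `order`. The threshold is an illustrative policy only."""
    needy = []
    for (node, name), counts in parse_buddyinfo(text).items():
        if name == zone and sum(counts[order:]) < min_blocks:
            needy.append(node)
    return sorted(needy)


# Made-up sample in /proc/buddyinfo format; node 0 is fragmented.
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone   Normal    512    200     40      9      2      1      0      0      0      0      0
Node 1, zone   Normal    100     80     60     50     40     30     20     10      5      2      1
"""

if __name__ == "__main__":
    print(nodes_needing_compaction(SAMPLE))
```

On a live system the text would come from reading /proc/buddyinfo instead of SAMPLE, and the returned nodes would be the candidates for manual compaction.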

I have struggled with this question. Fragmentation and allocation stalls are significant issues on large database
systems, which use memory in a similar way: 90+% of memory is allocated as hugepages, leaving just enough memory to
run the rest of the userspace processes. I had originally proposed a kernel patch to monitor memory usage, do a trend
analysis of it and take proactive action -
<https://lore.kernel.org/lkml/20190813014012.30232-1-khalid.aziz@oracle.com/>. Based upon feedback, I moved the
implementation to userspace - <https://github.com/oracle/memoptimizer>. Test results across multiple workloads have been
very good. Results from one of the workloads are in this blog -
<https://blogs.oracle.com/linux/anticipating-your-memory-needs>.

It works well from userspace, but it has limited ways to influence reclamation and compaction. It uses
watermark_scale_factor to boost watermarks, which causes reclamation to kick in earlier and run longer. It uses
/sys/devices/system/node/node%d/compact to force compaction on the node expected to reach a high level of
fragmentation soon. Neither of these is very efficient from userspace, even though they get the job done. Scaling
watermarks has a longer-lasting impact than temporarily raising the scanning priority in balance_pgdat(). I plan to
experiment with watermark_boost_factor to see if I can use it in place of /sys/devices/system/node/node%d/compact and
get the same results.

Doing all of this in the kernel can be more efficient and lessen the potential negative impact on the system. On the
other hand, it is easier to fix and update such policies in userspace, although at the cost of having a
performance-critical component live outside the kernel and thus not be active on the system by default.
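
As a rough illustration of the two userspace knobs mentioned above, the sketch below builds the per-node compaction path and steps watermark_scale_factor upward. It is not memoptimizer's actual logic: the helper names, the step size and the ceiling of 1000 are assumptions made for the example, and the valid range of watermark_scale_factor should be checked against the sysctl documentation for the running kernel. Writing to these files requires root, so the helper defaults to a dry run.

```python
from pathlib import Path

# Paths named in the discussion above; "{node}" stands in for the %d.
SYSFS_NODE_COMPACT = "/sys/devices/system/node/node{node}/compact"
WATERMARK_SCALE_FACTOR = "/proc/sys/vm/watermark_scale_factor"


def compact_path(node):
    """Path of the file that triggers compaction on one NUMA node."""
    return SYSFS_NODE_COMPACT.format(node=node)


def next_scale_factor(current, step=10, ceiling=1000):
    """Raise the scale factor by `step`, clamped to `ceiling`.
    Both defaults are hypothetical policy values."""
    return min(current + step, ceiling)


def apply(path, value, dry_run=True):
    """Write `value` to a sysctl/sysfs file unless `dry_run` is set.
    Returns the (path, value) pair that was or would be written."""
    if not dry_run:
        Path(path).write_text(f"{value}\n")
    return (path, str(value))


if __name__ == "__main__":
    # Dry run: show what would be written for node 0 and the watermark knob.
    print(apply(compact_path(0), 1))
    print(apply(WATERMARK_SCALE_FACTOR, next_scale_factor(10)))
```

Each call from userspace is a separate file write with system-wide or node-wide effect, which is part of why these controls are coarser than what an in-kernel policy could do.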

--
Khalid

