From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Mel Gorman <mgorman@techsingularity.net>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, thp: always direct reclaim for MADV_HUGEPAGE even when deferred
Date: Fri, 23 Dec 2016 14:46:43 -0800 (PST)
Message-ID: <alpine.DEB.2.10.1612231428030.88276@chino.kir.corp.google.com>
In-Reply-To: <20161223111817.GC23109@dhcp22.suse.cz>

On Fri, 23 Dec 2016, Michal Hocko wrote:
> > We have no way to compact memory for users who are not using
> > MADV_HUGEPAGE,
>
> yes we have. it is defrag=always. If you do not want direct compaction
> and the resulting allocation stalls then you have to rely on kcompactd
> which is something we should work longterm.
>
No, the point of madvise(MADV_HUGEPAGE) is for applications to tell the
kernel that they really want hugepages. Really. Everybody else either
never did direct compaction or did a substantially watered down version of
it. Now, we have a situation where you can either do direct compaction
for MADV_HUGEPAGE and nothing for anybody else, or direct compaction for
everybody. In our usecase, we want everybody to kick off background
compaction, because an order=9 allocation with __GFP_KSWAPD_RECLAIM set in
its gfp_mask is the only thing that is going to trigger background
compaction, but we are unable to do so without still incurring lengthy
pagefaults for non-MADV_HUGEPAGE users.
> > which is some customers, others require MADV_HUGEPAGE for
> > .text segment remap while loading their binary, without defrag=always or
> > defrag=defer. The problem is that we want to demand direct compact for
> > MADV_HUGEPAGE: they _really_ want hugepages, it's the point of the
> > madvise.
>
> and that is the point of defrag=madvise to give them this direct
> compaction.
>
Do you see the problem with first suggesting defrag=always at the top of
your reply and then defrag=madvise now? We cannot set both at once; that
is the entire problem with the tristate, and now quadstate, setting. We
want a combination: EVERYBODY kicks off background compaction, and
applications that really want hugepages and are fine with incurring
lengthy page faults, such as those (for the third time) remapping their
.text segment and doing madvise(MADV_HUGEPAGE) before fault, can use the
madvise.
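
For readers following along: the defrag mode is one selection among
mutually exclusive values, which is why two modes cannot be active at
once. Below is a minimal sketch, assuming the usual sysfs file
/sys/kernel/mm/transparent_hugepage/defrag and its convention of marking
the active mode with square brackets. The helper names parse_defrag and
read_defrag are hypothetical, chosen only for this illustration.

```python
# Assumed location of the THP defrag knob on a Linux system with THP enabled.
DEFRAG_PATH = "/sys/kernel/mm/transparent_hugepage/defrag"


def parse_defrag(text):
    """Parse a string such as 'always defer [madvise] never'.

    Returns (selected, options): the single bracketed mode and the list
    of all modes the kernel offers, in the order they were listed.
    """
    options = []
    selected = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        options.append(token)
    if selected is None:
        raise ValueError("no bracketed selection found")
    return selected, options


def read_defrag(path=DEFRAG_PATH):
    """Read and parse the live setting (hypothetical helper; needs THP)."""
    with open(path) as f:
        return parse_defrag(f.read())
```

Exactly one token is bracketed, so the setting can express "defer" or
"madvise" but never a combination of the two.
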
> > We have no setting, without this patch, to ask for background
> > compaction for everybody so that their fault does not have long latency
> > and for some customers to demand compaction.
>
> that is true and what I am trying to say is that we should aim to give
> this background compaction for everybody via kcompactd because there are
> more users than THP who might benefit from low latency high order pages
> availability.

My patch does that: we _defer_ for everybody unless you're using
madvise(MADV_HUGEPAGE) and really want hugepages. Forget defrag=never
exists, it's not important in the discussion. Forget defrag=always exists,
because apps such as batch jobs don't want lengthy pagefaults. We have
two options remaining:

 - defrag=defer: everybody kicks off background compaction, _nobody_ does
   direct compaction

 - defrag=madvise: madvise(MADV_HUGEPAGE) does direct compaction,
   everybody else does nothing

The point you're missing is that we _want_ defrag=defer. We really do.
We don't want to stall in the page allocator to get thp, but we want to
try to make it available in the short term. However, apps that do
madvise(MADV_HUGEPAGE), like remapping your .text segment and wanting your
text backed by hugepages and incurring the expense up front, or a
database, or a vm, _want_ hugepages now and don't care about lengthy page
faults.

The point is that I HAVE NO SETTING to get that behavior, and
defrag=madvise is _not_ a solution because it requires the presence of an
app that is doing madvise(MADV_HUGEPAGE) AND faulting memory to get any
order=9 compaction.
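
The modes described above can be summarized in a small model. This is
only a sketch of the behavior as stated in this thread, not kernel code:
the names Compaction and fault_compaction and the `patched` flag are made
up for illustration, and the result is the strongest action a THP page
fault takes (DIRECT for the patched defer case stands for direct
compaction on behalf of an madvised mapping).

```python
from enum import Enum


class Compaction(Enum):
    NONE = "none"              # fault does nothing to defragment
    BACKGROUND = "background"  # fault only kicks off background compaction
    DIRECT = "direct"          # fault stalls and compacts memory itself


def fault_compaction(defrag, madvised, patched=False):
    """Strongest compaction action for a THP fault (illustrative model).

    defrag   -- one of 'always', 'defer', 'madvise', 'never'
    madvised -- True if the mapping used madvise(MADV_HUGEPAGE)
    patched  -- True to model the behavior this patch proposes for 'defer'
    """
    if defrag == "never":
        return Compaction.NONE
    if defrag == "always":
        return Compaction.DIRECT
    if defrag == "madvise":
        return Compaction.DIRECT if madvised else Compaction.NONE
    if defrag == "defer":
        if patched and madvised:
            return Compaction.DIRECT
        return Compaction.BACKGROUND
    raise ValueError("unknown defrag mode: %r" % (defrag,))
```

In this model, only fault_compaction("defer", ..., patched=True) gives
BACKGROUND to ordinary mappings and DIRECT to madvised ones; no
unpatched mode produces that pair.
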
> > ?????? Why does the admin care if a user's page fault wants to reclaim to
> > get high order memory?
>
> Because the whole point of the defrag knob is to allow _administrator_
> control how much we try to fault in THP. And the primary motivation were
> latencies. The whole point of introducing defer option was to _never_
> stall in the page fault while it still allows to kick the background
> compaction. If you really want to tweak any option then madvise would be
> more appropriate IMHO because the semantic would be still clear. Use
> direct compaction for MADV_HUGEPAGE vmas and kick in kswapd/kcompactd
> for others.
>
You want defrag=madvise to start doing background compaction for
everybody, which was never done before for existing users of
defrag=madvise? That might be possible, I don't really care, I just think
it's riskier because there are existing users of defrag=madvise who would
be opted into new behavior by the kernel change. This patch changes
defrag=defer because it's the new option and people setting the mode know
what they are getting.

I disagree with your description of what the defrag setting is intended
for. The setting of thp defrag is to optimize for apps that truly want
transparent behavior, i.e. they aren't doing madvise(MADV_HUGEPAGE). Are
they willing to incur lengthy pagefaults for thp when not doing any
madvise(2)? defrag=defer should not mean that users of
madvise(MADV_HUGEPAGE), who have clearly specified their intent, are
barred from trying to compact memory themselves: they have indicated they
are fine with such an expense by doing the madvise(2).
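
As an illustration of that opt-in, here is a minimal sketch of an
application hinting a mapping with MADV_HUGEPAGE before first touching
it. The function name map_with_hugepage_hint is hypothetical, the 2 MiB
size assumes order=9 with 4 KiB base pages, and the hint is treated as
best effort because the constant or the kernel feature may be missing.

```python
import mmap

# Assumption: 4 KiB base pages, so an order=9 hugepage is 2 MiB.
HPAGE_SIZE = 2 * 1024 * 1024


def map_with_hugepage_hint(length=4 * HPAGE_SIZE):
    """Map anonymous private memory and request hugepages before faulting.

    Returns (region, hinted). `hinted` is False when MADV_HUGEPAGE is not
    available in this Python build or the kernel rejects the request.
    """
    region = mmap.mmap(-1, length,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    hint = getattr(mmap, "MADV_HUGEPAGE", None)
    hinted = False
    if hint is not None and hasattr(region, "madvise"):
        try:
            region.madvise(hint)  # state intent before the first fault
            hinted = True
        except OSError:
            hinted = False        # e.g. THP not built into the kernel
    region[0:1] = b"\x01"         # first touch: this is the page fault
    return region, hinted
```

Whether the fault actually returns a hugepage, and at what latency,
depends on the defrag mode in effect; the hint only states intent.
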

This is obviously fine for Kirill, and I have users who remap their .text
segment and do madvise(MADV_HUGEPAGE) because they really want hugepages
when they are exec'd, so I'd kindly ask you to consider the real-world use
cases that require background compaction to make hugepages available for
everybody but allow apps to opt in to taking the expense of compaction on
themselves, rather than your own theory of what users want.