From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim
Date: Thu, 3 Oct 2019 07:27:00 +0200 [thread overview]
Message-ID: <20191003052700.GB24174@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.DEB.2.21.1910021556270.187014@chino.kir.corp.google.com>
On Wed 02-10-19 16:03:03, David Rientjes wrote:
> Hugetlb allocations use __GFP_RETRY_MAYFAIL to aggressively attempt to get
> hugepages that the user needs. Commit b39d0ee2632d ("mm, page_alloc:
> avoid expensive reclaim when compaction may not succeed") intends to
> improve allocator behind for thp allocations to prevent excessive amounts
> of reclaim especially when constrained to a single node.
>
> Since hugetlb allocations have explicitly preferred to loop and do reclaim
> and compaction, exempt them from this new behavior at least for the time
> being. It is not shown that hugetlb allocation success rate has been
> impacted by commit b39d0ee2632d but hugetlb allocations are admittedly
> beyond the scope of what the patch is intended to address (thp
> allocations).
It has become pretty clear that b39d0ee2632d has regressed hugetlb
allocation success rate for any non-trivial case (complately free
memory) http://lkml.kernel.org/r/20191001054343.GA15624@dhcp22.suse.cz.
And this really is not just about hugetlb requests, really. They are
likely the most obvious example but __GFP_RETRY_MAYFAIL in general is
supposed to try as hard as feasible to success the allocation. The
decision to bail out is done at a different spot and b39d0ee2632d is
effectively bypassing that logic.
Now to the patch itself. I didn't get to test it on my testing
workload but hey steps are clearly documented and easily to set up and
reproduce. I am at a training for today and unlikely to get to test by
the end of the week infortunatelly. Anyway the patch should be fixing
the problem because it explicitly opts out for __GFP_RETRY_MAYFAIL.
I am pretty sure we will need more follow ups because the bail out logic
is simply behaving quite randomly as my measurements show (I would really
appreciate a feedback there). We need a more systematic solution because
the current logic has been rushed through without a proper analysis and
without any actual workloads to verify the effect.
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
Fixes: b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
> Signed-off-by: David Rientjes <rientjes@google.com>
I am willing to give my ack by considering that this is a clear
regression and this is probably the simplest fix but the changelog
should be explicit about the effect (feel free to borrow my numbers and
explanation in this thread).
> ---
> Mike, you eluded that you may want to opt hugetlbfs out of this for the
> time being in https://marc.info/?l=linux-kernel&m=156771690024533 --
> not sure if you want to allow this excessive amount of reclaim for
> hugetlb allocations or not given the swap storms Andrea has shown is
> possible (and nr_hugepages_mempolicy does exist), but hugetlbfs was not
> part of the problem we are trying to address here so no objection to
> opting it out.
>
> You might want to consider how expensive hugetlb allocations can become
> and disruptive to the system if it does not yield additional hugepages,
> but that can be done at any time later as a general improvement rather
> than part of a series aimed at thp.
>
> mm/page_alloc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4467,12 +4467,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> if (page)
> goto got_pg;
>
> - if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
> + if (order >= pageblock_order && (gfp_mask & __GFP_IO) &&
> + !(gfp_mask & __GFP_RETRY_MAYFAIL)) {
> /*
> * If allocating entire pageblock(s) and compaction
> * failed because all zones are below low watermarks
> * or is prohibited because it recently failed at this
> - * order, fail immediately.
> + * order, fail immediately unless the allocator has
> + * requested compaction and reclaim retry.
> *
> * Reclaim is
> * - potentially very expensive because zones are far
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2019-10-03 5:27 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-02 23:03 [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim David Rientjes
2019-10-02 23:37 ` Linus Torvalds
2019-10-03 5:00 ` Michal Hocko
2019-10-03 5:27 ` Michal Hocko [this message]
2019-10-03 8:14 ` Vlastimil Babka
2019-10-03 19:52 ` David Rientjes
2019-10-04 9:28 ` Michal Hocko
2019-10-04 18:02 ` David Rientjes
2019-10-07 22:15 ` Mike Kravetz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191003052700.GB24174@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mike.kravetz@oracle.com \
--cc=rientjes@google.com \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).