From: David Rientjes <rientjes@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages
Date: Thu, 24 Oct 2019 11:59:43 -0700 (PDT)
Message-ID: <alpine.DEB.2.21.1910241156370.130350@chino.kir.corp.google.com>
In-Reply-To: <53c4a6ca-a4d0-0862-8744-f999b17d82d8@suse.cz>

On Wed, 23 Oct 2019, Vlastimil Babka wrote:
> From 8bd960e4e8e7e99fe13baf0d00b61910b3ae8d23 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Tue, 1 Oct 2019 14:20:58 +0200
> Subject: [PATCH] mm, thp: tweak reclaim/compaction effort of local-only and
> all-node allocations
>
> THP page faults now attempt a __GFP_THISNODE allocation first, which should
> only compact existing free memory, followed by another attempt that can
> allocate from any node using reclaim/compaction effort specified by global
> defrag setting and madvise.
>
> This patch makes the following changes to the scheme:
>
> - before the patch, the first allocation relies on a check for pageblock order
>   and __GFP_IO to prevent excessive reclaim. This however affects also the
>   second attempt, which is not limited to single node. Instead of that, reuse
>   the existing check for costly order __GFP_NORETRY allocations, and make sure
>   the first THP attempt uses __GFP_NORETRY. As a side-effect, all costly order
>   __GFP_NORETRY allocations will bail out if compaction needs reclaim, while
>   previously they only bailed out when compaction was deferred due to previous
>   failures. This should be still acceptable within the __GFP_NORETRY semantics.
>
> - before the patch, the second allocation attempt (on all nodes) was passing
>   __GFP_NORETRY. This is redundant as the check for pageblock order (discussed
>   above) was stronger. It's also contrary to madvise(MADV_HUGEPAGE) which means
>   some effort to allocate THP is requested. After this patch, the second
>   attempt doesn't pass __GFP_THISNODE nor __GFP_NORETRY.
>
> To sum up, THP page faults now try the following attempt:
>
> 1. local node only THP allocation with no reclaim, just compaction.
> 2. THP allocation from any node with effort determined by global defrag setting
>    and VMA madvise
> 3. fallback to base pages on any node
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/mempolicy.c | 16 +++++++++-------
> mm/page_alloc.c | 24 +++++-------------------
> 2 files changed, 14 insertions(+), 26 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4ae967bcf954..2c48146f3ee2 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2129,18 +2129,20 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
>  		nmask = policy_nodemask(gfp, pol);
>  		if (!nmask || node_isset(hpage_node, *nmask)) {
>  			mpol_cond_put(pol);
> +			/*
> +			 * First, try to allocate THP only on local node, but
> +			 * don't reclaim unnecessarily, just compact.
> +			 */
>  			page = __alloc_pages_node(hpage_node,
> -						gfp | __GFP_THISNODE, order);
> +				gfp | __GFP_THISNODE | __GFP_NORETRY, order);
>
>  			/*
> -			 * If hugepage allocations are configured to always
> -			 * synchronous compact or the vma has been madvised
> -			 * to prefer hugepage backing, retry allowing remote
> -			 * memory as well.
> +			 * If that fails, allow both compaction and reclaim,
> +			 * but on all nodes.
>  			 */
> -			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
> +			if (!page)
>  				page = __alloc_pages_node(hpage_node,
> -						gfp | __GFP_NORETRY, order);
> +							gfp, order);
>
>  			goto out;
>  		}

Hi Vlastimil,

For the default case, where thp enabled is not set to "always" and the VMA
is not madvised with MADV_HUGEPAGE, how does this prefer returning
node-local pages over remote hugepages?  The idea is to optimize for
access latency when the VMA has not been explicitly madvised.
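
To make the question concrete, here is a minimal sketch of the two
attempts in the mempolicy.c hunk above, as I read it.  It is a toy model,
not kernel code: the sk_* names, the flag bit values and the sk_state
snapshot are all made up for illustration, and the "patched" argument
just selects between the old and the new form of the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; these are NOT the kernel's gfp values. */
#define SK_DIRECT_RECLAIM (1u << 0)
#define SK_THISNODE       (1u << 1)
#define SK_NORETRY        (1u << 2)

enum sk_result { SK_LOCAL_THP, SK_REMOTE_THP, SK_BASE_PAGES };

/* Hypothetical snapshot of where a hugepage could be obtained. */
struct sk_state {
	bool local_thp_free;	/* the local node can supply a hugepage */
	bool remote_thp_free;	/* some remote node can supply a hugepage */
};

/* One allocation attempt; SK_BASE_PAGES stands for "attempt failed". */
static enum sk_result sk_try(const struct sk_state *s, unsigned int gfp)
{
	if (s->local_thp_free)
		return SK_LOCAL_THP;
	if (!(gfp & SK_THISNODE) && s->remote_thp_free)
		return SK_REMOTE_THP;
	return SK_BASE_PAGES;
}

/* Model of the THP fault path before (patched == false) and after the hunk. */
enum sk_result sk_thp_fault(const struct sk_state *s, unsigned int gfp,
			    bool patched)
{
	enum sk_result res;

	/* The caller's mask is expected to carry neither of these. */
	assert(!(gfp & (SK_THISNODE | SK_NORETRY)));

	/* First attempt: local node only. */
	res = sk_try(s, patched ? gfp | SK_THISNODE | SK_NORETRY
				: gfp | SK_THISNODE);
	if (res != SK_BASE_PAGES)
		return res;

	/*
	 * Second attempt on all nodes.  The old code only got here when the
	 * mask allowed direct reclaim; the patched code always retries.
	 */
	if (patched || (gfp & SK_DIRECT_RECLAIM))
		res = sk_try(s, patched ? gfp : gfp | SK_NORETRY);

	return res;
}
```

In this model, with a mask that lacks direct reclaim, no hugepage
available locally and one available remotely, the old path ends in
SK_BASE_PAGES while the patched path returns SK_REMOTE_THP, which is the
case I am asking about.
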
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ecc3dbad606b..36d7d852f7b1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4473,8 +4473,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		if (page)
>  			goto got_pg;
>
> -		if (order >= pageblock_order && (gfp_mask & __GFP_IO) &&
> -		    !(gfp_mask & __GFP_RETRY_MAYFAIL)) {
> +		/*
> +		 * Checks for costly allocations with __GFP_NORETRY, which
> +		 * includes some THP page fault allocations
> +		 */
> +		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
>  			/*
>  			 * If allocating entire pageblock(s) and compaction
>  			 * failed because all zones are below low watermarks
> @@ -4495,23 +4498,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  			if (compact_result == COMPACT_SKIPPED ||
>  			    compact_result == COMPACT_DEFERRED)
>  				goto nopage;
> -		}
> -
> -		/*
> -		 * Checks for costly allocations with __GFP_NORETRY, which
> -		 * includes THP page fault allocations
> -		 */
> -		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
> -			/*
> -			 * If compaction is deferred for high-order allocations,
> -			 * it is because sync compaction recently failed. If
> -			 * this is the case and the caller requested a THP
> -			 * allocation, we do not want to heavily disrupt the
> -			 * system, so we fail the allocation instead of entering
> -			 * direct reclaim.
> -			 */
> -			if (compact_result == COMPACT_DEFERRED)
> -				goto nopage;
>
>  			/*
>  			 * Looks like reclaim/compaction is worth trying, but
> --
> 2.23.0
>
>
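
For reference, the page_alloc.c change can be read as a change to one
predicate on the compaction result.  Below is a rough sketch of the
"goto nopage" condition before and after the hunk above.  It covers only
the conditions visible in the diff and ignores the checks that surround
them in __alloc_pages_slowpath().  The sk_* names are hypothetical, and
the pageblock order of 9 is an assumption since it depends on the
configuration.

```c
#include <stdbool.h>

/* Simplified stand-in for the compaction results used in the hunk. */
enum sk_compact { SK_COMPACT_SKIPPED, SK_COMPACT_DEFERRED, SK_COMPACT_OTHER };

/* Hypothetical flattened view of the allocation request. */
struct sk_req {
	unsigned int order;
	bool io;		/* __GFP_IO set */
	bool noretry;		/* __GFP_NORETRY set */
	bool retry_mayfail;	/* __GFP_RETRY_MAYFAIL set */
};

#define SK_PAGEBLOCK_ORDER 9u	/* assumed; configuration dependent */
#define SK_COSTLY_ORDER    3u	/* mirrors PAGE_ALLOC_COSTLY_ORDER */

static bool sk_skipped_or_deferred(enum sk_compact res)
{
	return res == SK_COMPACT_SKIPPED || res == SK_COMPACT_DEFERRED;
}

/* Bail-out condition as in the lines removed by the hunk. */
bool sk_bail_before(const struct sk_req *r, enum sk_compact res)
{
	if (r->order >= SK_PAGEBLOCK_ORDER && r->io && !r->retry_mayfail &&
	    sk_skipped_or_deferred(res))
		return true;

	return r->order > SK_COSTLY_ORDER && r->noretry &&
	       res == SK_COMPACT_DEFERRED;
}

/* Bail-out condition as in the lines added by the hunk. */
bool sk_bail_after(const struct sk_req *r, enum sk_compact res)
{
	return r->order > SK_COSTLY_ORDER && r->noretry &&
	       sk_skipped_or_deferred(res);
}
```

Two cases differ in this model.  An order-4 __GFP_NORETRY request whose
compaction was skipped now bails out where it previously went on to
reclaim, and a pageblock-order request without __GFP_NORETRY no longer
bails out on a skipped or deferred compaction.
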