linux-kernel.vger.kernel.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Rik van Riel <riel@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: [RFC 05/13] mm, page_alloc: make THP-specific decisions more generic
Date: Tue, 10 May 2016 09:35:55 +0200
Message-ID: <1462865763-22084-6-git-send-email-vbabka@suse.cz>
In-Reply-To: <1462865763-22084-1-git-send-email-vbabka@suse.cz>

Since THP allocations during page faults can be costly, extra decisions are
employed for them to avoid excessive reclaim and compaction, if the initial
compaction doesn't look promising. The detection has never been perfect as
there is no gfp flag specific to THP allocations. At this moment it checks the
whole combination of flags that makes up GFP_TRANSHUGE, and hopes that no other
users of such combination exist, or would mind being treated the same way.
Extra care is also taken to separate allocations from khugepaged, where latency
doesn't matter that much.

It is however possible to distinguish these allocations in a simpler and more
reliable way. The key observation is that after the initial compaction followed
by the first iteration of "standard" reclaim/compaction, both __GFP_NORETRY
allocations and costly allocations without __GFP_REPEAT are declared as
failures:

        /* Do not loop if specifically requested */
        if (gfp_mask & __GFP_NORETRY)
                goto nopage;

        /*
         * Do not retry costly high order allocations unless they are
         * __GFP_REPEAT
         */
        if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
                goto nopage;

This means we can further distinguish allocations that are costly order *and*
additionally include the __GFP_NORETRY flag. As it happens, GFP_TRANSHUGE
allocations do already fall into this category. This will also allow other
costly allocations with similar high-order benefit vs latency considerations to
use this semantic. Furthermore, we can distinguish THP allocations that should
try a bit harder (such as from khugepaged) by removing __GFP_NORETRY, as will
be done in the next patch.
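
For illustration only, the classification described above can be modelled in a
small userspace sketch. It is not part of the patch and not kernel code: the
sk_* names and the flag bit values are hypothetical stand-ins, and only the
costly-order threshold of 3 mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER. It
assumes the new __GFP_NORETRY check is reached only for costly orders, as the
updated comment in the diff suggests.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Stand-in bit values for illustration only; NOT the kernel's definitions. */
#define SK_GFP_NORETRY	0x1u
#define SK_GFP_REPEAT	0x2u
/* Orders above this are "costly"; mirrors PAGE_ALLOC_COSTLY_ORDER (3). */
#define SK_COSTLY_ORDER	3u

enum sk_alloc_class {
	SK_MAY_RETRY,			/* may loop in reclaim/compaction */
	SK_FAIL_AFTER_FIRST_ROUND,	/* declared a failure after one round */
	SK_COSTLY_NORETRY,		/* gets the extra early-bailout checks */
};

/* Classify a request the way the changelog describes (assumed model). */
static enum sk_alloc_class sk_classify(sk_gfp_t gfp_mask, unsigned int order)
{
	bool costly = order > SK_COSTLY_ORDER;
	bool noretry = gfp_mask & SK_GFP_NORETRY;
	bool repeat = gfp_mask & SK_GFP_REPEAT;

	/* Costly order *and* NORETRY: the category THP page faults fall into. */
	if (costly && noretry)
		return SK_COSTLY_NORETRY;

	/* Either condition alone still fails after the first iteration. */
	if (noretry || (costly && !repeat))
		return SK_FAIL_AFTER_FIRST_ROUND;

	return SK_MAY_RETRY;
}

/* Usage examples; order 9 is a 2MB THP on x86-64 with 4kB base pages. */
static void sk_examples(void)
{
	assert(sk_classify(SK_GFP_NORETRY, 9) == SK_COSTLY_NORETRY);
	assert(sk_classify(0, 9) == SK_FAIL_AFTER_FIRST_ROUND);
	assert(sk_classify(SK_GFP_NORETRY, 0) == SK_FAIL_AFTER_FIRST_ROUND);
	assert(sk_classify(SK_GFP_REPEAT, 9) == SK_MAY_RETRY);
	assert(sk_classify(0, 0) == SK_MAY_RETRY);
}
```

In this model, dropping the NORETRY bit from a costly request (the second
example) moves it out of the early-bailout class, which is the knob the next
patch is expected to use for khugepaged.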

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 88d680b3e7b6..f5d931e0854a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3182,7 +3182,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
-
 /*
  * Maximum number of compaction retries wit a progress before OOM
  * killer is consider as the only way to move forward.
@@ -3447,11 +3446,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
 }
 
-static inline bool is_thp_gfp_mask(gfp_t gfp_mask)
-{
-	return (gfp_mask & (GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM)) == GFP_TRANSHUGE;
-}
-
 /*
  * Maximum number of reclaim retries without any progress before OOM killer
  * is consider as the only way to move forward.
@@ -3610,8 +3604,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		if (page)
 			goto got_pg;
 
-		/* Checks for THP-specific high-order allocations */
-		if (is_thp_gfp_mask(gfp_mask)) {
+		/*
+		 * Checks for costly allocations with __GFP_NORETRY, which
+		 * includes THP page fault allocations
+		 */
+		if (gfp_mask & __GFP_NORETRY) {
 			/*
 			 * If compaction is deferred for high-order allocations,
 			 * it is because sync compaction recently failed. If
@@ -3631,11 +3628,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 				goto nopage;
 
 			/*
-			 * It can become very expensive to allocate transparent
-			 * hugepages at fault, so use asynchronous memory
-			 * compaction for THP unless it is khugepaged trying to
-			 * collapse. All other requests should tolerate at
-			 * least light sync migration.
+			 * Looks like reclaim/compaction is worth trying, but
+			 * sync compaction could be very expensive, so keep
+			 * using async compaction, unless it's khugepaged
+			 * trying to collapse.
 			 */
 			if (!(current->flags & PF_KTHREAD))
 				migration_mode = MIGRATE_ASYNC;
-- 
2.8.2
