Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Michal Hocko <mhocko@kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH v5 6/8] mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
Date: Thu, 21 Jul 2016 09:36:12 +0200
Message-ID: <20160721073614.24395-7-vbabka@suse.cz> (raw)
In-Reply-To: <20160721073614.24395-1-vbabka@suse.cz>

After the previous patch, we can distinguish costly allocations that should be
really lightweight, such as THP page faults, with __GFP_NORETRY. This means we
don't need to recognize khugepaged allocations via PF_KTHREAD anymore. We can
also change THP page faults in areas where madvise(MADV_HUGEPAGE) was used to
try as hard as khugepaged, as the process has indicated that it benefits from
THP's and is willing to pay some initial latency costs.

We can also make the flags handling less cryptic by distinguishing
GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding __GFP_NORETRY
or __GFP_KSWAPD_RECLAIM is done where needed.

The patch effectively changes the current GFP_TRANSHUGE users as follows:

* get_huge_zero_page() - the zero page lifetime should be relatively long and
  it's shared by multiple users, so it's worth spending some effort on it.
  We use GFP_TRANSHUGE, and __GFP_NORETRY is not added. This also restores
  direct reclaim to this allocation, which was unintentionally removed by
  commit e4a49efe4e7e ("mm: thp: set THP defrag by default to madvise and add
  a stall-free defrag option")

* alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency is not
  an issue. So if khugepaged "defrag" is enabled (the default), do reclaim
  via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the PF_KTHREAD check
  from page alloc.
  As a side-effect, khugepaged will now no longer check if the initial
  compaction was deferred or contended. This is OK, as khugepaged sleep times
  between collapsion attempts are long enough to prevent noticeable disruption,
  so we should allow it to spend some effort.

* migrate_misplaced_transhuge_page() - already was masking out __GFP_RECLAIM,
  so just convert to GFP_TRANSHUGE_LIGHT which is equivalent.

* alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise) are
  now allocating without __GFP_NORETRY. Other vma's keep using __GFP_NORETRY
  if direct reclaim/compaction is at all allowed (by default it's allowed only
  for madvised vma's). The rest is conversion to GFP_TRANSHUGE(_LIGHT).

[mhocko@suse.com: suggested GFP_TRANSHUGE_LIGHT]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/gfp.h            | 14 ++++++++------
 include/trace/events/mmflags.h |  1 +
 mm/huge_memory.c               | 29 ++++++++++++++++-------------
 mm/khugepaged.c                |  2 +-
 mm/migrate.c                   |  2 +-
 mm/page_alloc.c                |  6 ++----
 tools/perf/builtin-kmem.c      |  1 +
 7 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c29e9d347bc6..f8041f9de31e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -237,9 +237,11 @@ struct vm_area_struct;
  *   are expected to be movable via page reclaim or page migration. Typically,
  *   pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.
  *
- * GFP_TRANSHUGE is used for THP allocations. They are compound allocations
- *   that will fail quickly if memory is not available and will not wake
- *   kswapd on failure.
+ * GFP_TRANSHUGE and GFP_TRANSHUGE_LIGHT are used for THP allocations. They are
+ *   compound allocations that will generally fail quickly if memory is not
+ *   available and will not wake kswapd/kcompactd on failure. The _LIGHT
+ *   version does not attempt reclaim/compaction at all and is by default used
+ *   in page fault path, while the non-light is used by khugepaged.
  */
 #define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -254,9 +256,9 @@ struct vm_area_struct;
 #define GFP_DMA32	__GFP_DMA32
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
-#define GFP_TRANSHUGE	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
-			 ~__GFP_RECLAIM)
+#define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
+			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
+#define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 43cedbf0c759..5a81ab48a2fb 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -11,6 +11,7 @@
 
 #define __def_gfpflag_names						\
 	{(unsigned long)GFP_TRANSHUGE,		"GFP_TRANSHUGE"},	\
+	{(unsigned long)GFP_TRANSHUGE_LIGHT,	"GFP_TRANSHUGE_LIGHT"}, \
 	{(unsigned long)GFP_HIGHUSER_MOVABLE,	"GFP_HIGHUSER_MOVABLE"},\
 	{(unsigned long)GFP_HIGHUSER,		"GFP_HIGHUSER"},	\
 	{(unsigned long)GFP_USER,		"GFP_USER"},		\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4b8d4e588930..83b88f97cd54 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -539,23 +539,26 @@ static int __do_huge_pmd_anonymous_page(struct fault_env *fe, struct page *page,
 }
 
 /*
- * If THP is set to always then directly reclaim/compact as necessary
- * If set to defer then do no reclaim and defer to khugepaged
+ * If THP defrag is set to always then directly reclaim/compact as necessary
+ * If set to defer then do only background reclaim/compact and defer to khugepaged
  * If set to madvise and the VMA is flagged then directly reclaim/compact
+ * When direct reclaim/compact is allowed, don't retry except for flagged VMA's
  */
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
-	gfp_t reclaim_flags = 0;
-
-	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags) &&
-	    (vma->vm_flags & VM_HUGEPAGE))
-		reclaim_flags = __GFP_DIRECT_RECLAIM;
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
-		reclaim_flags = __GFP_KSWAPD_RECLAIM;
-	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
-		reclaim_flags = __GFP_DIRECT_RECLAIM;
-
-	return GFP_TRANSHUGE | reclaim_flags;
+	bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+
+	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
+				&transparent_hugepage_flags) && vma_madvised)
+		return GFP_TRANSHUGE;
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
+						&transparent_hugepage_flags))
+		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
+	else if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
+						&transparent_hugepage_flags))
+		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
+
+	return GFP_TRANSHUGE_LIGHT;
 }
 
 /* Caller must hold page table lock. */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index bb49bd1d2d9f..54ec5f8032a3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -694,7 +694,7 @@ static bool khugepaged_scan_abort(int nid)
 /* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
 static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 {
-	return GFP_TRANSHUGE | (khugepaged_defrag() ? __GFP_DIRECT_RECLAIM : 0);
+	return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT;
 }
 
 #ifdef CONFIG_NUMA
diff --git a/mm/migrate.c b/mm/migrate.c
index 365153c14cd0..622c7e473464 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1930,7 +1930,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		goto out_dropref;
 
 	new_page = alloc_pages_node(node,
-		(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
 	if (!new_page)
 		goto out_fail;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 989af3164126..b8035c7b50b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3587,11 +3587,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			/*
 			 * Looks like reclaim/compaction is worth trying, but
 			 * sync compaction could be very expensive, so keep
-			 * using async compaction, unless it's khugepaged
-			 * trying to collapse.
+			 * using async compaction.
 			 */
-			if (!(current->flags & PF_KTHREAD))
-				migration_mode = MIGRATE_ASYNC;
+			migration_mode = MIGRATE_ASYNC;
 		}
 	}
 
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index c9cb3be47cff..0d98182dc159 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -608,6 +608,7 @@ static const struct {
 	const char *compact;
 } gfp_compact_table[] = {
 	{ "GFP_TRANSHUGE",		"THP" },
+	{ "GFP_TRANSHUGE_LIGHT",	"THL" },
 	{ "GFP_HIGHUSER_MOVABLE",	"HUM" },
 	{ "GFP_HIGHUSER",		"HU" },
 	{ "GFP_USER",			"U" },
-- 
2.9.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-21  7:36 [PATCH 0/8] compaction-related cleanups v5 Vlastimil Babka
2016-07-21  7:36 ` [PATCH v5 1/8] mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode Vlastimil Babka
2016-07-21  7:36 ` [PATCH v5 2/8] mm, page_alloc: set alloc_flags only once in slowpath Vlastimil Babka
2016-07-21  7:36 ` [PATCH v5 3/8] mm, page_alloc: don't retry initial attempt " Vlastimil Babka
2016-07-21  7:36 ` [PATCH v5 4/8] mm, page_alloc: restructure direct compaction handling " Vlastimil Babka
2016-07-21  7:36 ` [PATCH v5 5/8] mm, page_alloc: make THP-specific decisions more generic Vlastimil Babka
2016-07-21  7:36 ` Vlastimil Babka [this message]
2016-07-21  7:36 ` [PATCH v5 7/8] mm, compaction: introduce direct compaction priority Vlastimil Babka
2016-07-21  7:36 ` [PATCH v5 8/8] mm, compaction: simplify contended compaction handling Vlastimil Babka

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160721073614.24395-7-vbabka@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git