From: David Rientjes <rientjes@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>
Cc: ying.huang@intel.com, Michal Hocko <mhocko@suse.com>,
	s.priebe@profihost.ag, mgorman@techsingularity.net,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
	Andrew Morton <akpm@linux-foundation.org>,
	zi.yan@cs.rutgers.edu, Vlastimil Babka <vbabka@suse.cz>
Subject: [patch 2/2 for-4.20] mm, thp: always fault memory with __GFP_NORETRY
Date: Mon, 3 Dec 2018 15:50:21 -0800 (PST)
Message-ID: <alpine.DEB.2.21.1812031546230.161134@chino.kir.corp.google.com>
In-Reply-To: <alpine.DEB.2.21.1812031545080.161134@chino.kir.corp.google.com>

If memory compaction initially fails to free a hugepage, reclaiming and
retrying compaction is more likely to be harmful than beneficial.

For reclaim, it is unlikely that the pages reclaimed will form contiguous
memory the size of a hugepage without reclaiming a lot of memory
unnecessarily.  It is also not guaranteed to be beneficial to
compaction if the reclaimed memory is not accessible to the per-zone
freeing scanner.  For both of these reasons independently, all reclaim
activity may be entirely fruitless.

With these two issues, retrying compaction is not likely to have a
different result.  It is better to fall back to pages of the native page
size and allow khugepaged to collapse the memory into a hugepage later
when the fragmentation or availability of local memory is better.

Setting __GFP_NORETRY, which the comments in the page allocator
implementation already expect, can prevent large amounts of unnecessary
reclaim and swapping activity that can cause the performance of other
applications to quickly degrade.

Furthermore, since reclaim is likely to be more harmful than beneficial
for such large order allocations, it is better to fail early than to
try reclaiming SWAP_CLUSTER_MAX pages, which is unlikely to help memory
compaction succeed.

Signed-off-by: David Rientjes <rientjes@google.com>
---
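For readers following the flag changes, here is a minimal userspace sketch
of how folding __GFP_NORETRY into GFP_TRANSHUGE_LIGHT leaves the TTM masks
unchanged while dropping the explicit flag at each call site.  The SK_*
bit values and the sk_* helper names are hypothetical stand-ins invented
for this sketch; they are not the kernel's real definitions, which live in
include/linux/gfp.h.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Hypothetical stand-in bits, chosen only for this sketch. */
#define SK_DIRECT_RECLAIM 0x001u
#define SK_KSWAPD_RECLAIM 0x002u
#define SK_RECLAIM        (SK_DIRECT_RECLAIM | SK_KSWAPD_RECLAIM)
#define SK_NORETRY        0x004u
#define SK_NOWARN         0x008u
#define SK_NOMEMALLOC     0x010u
#define SK_COMP           0x020u
#define SK_MOVABLE        0x040u
/* Stand-in for GFP_HIGHUSER_MOVABLE: reclaim bits, movable, one "user" bit. */
#define SK_HIGHUSER_MOVABLE (SK_RECLAIM | SK_MOVABLE | 0x080u)

/* GFP_TRANSHUGE_LIGHT as composed before the patch (no NORETRY). */
#define SK_LIGHT_OLD ((SK_HIGHUSER_MOVABLE | SK_COMP | \
		       SK_NOMEMALLOC | SK_NOWARN) & ~SK_RECLAIM)
/* GFP_TRANSHUGE_LIGHT as composed after the patch (NORETRY folded in). */
#define SK_LIGHT_NEW ((SK_HIGHUSER_MOVABLE | SK_COMP | \
		       SK_NOMEMALLOC | SK_NORETRY | SK_NOWARN) & ~SK_RECLAIM)

/* TTM huge-page mask before the patch: NORETRY is OR'ed in by the caller. */
static inline gfp_t sk_ttm_huge_flags_old(gfp_t base)
{
	gfp_t flags = base;

	flags |= SK_LIGHT_OLD | SK_NORETRY | SK_KSWAPD_RECLAIM;
	flags &= ~SK_MOVABLE;
	flags &= ~SK_COMP;
	return flags;
}

/* TTM huge-page mask after the patch: NORETRY comes from the LIGHT mask. */
static inline gfp_t sk_ttm_huge_flags_new(gfp_t base)
{
	gfp_t flags = base;

	flags |= SK_LIGHT_NEW | SK_KSWAPD_RECLAIM;
	flags &= ~SK_MOVABLE;
	flags &= ~SK_COMP;
	return flags;
}

/* Sanity check: both compositions yield the same mask for TTM. */
static inline void sk_check_ttm_unchanged(void)
{
	assert(sk_ttm_huge_flags_old(0) == sk_ttm_huge_flags_new(0));
}
```

Under these assumptions the TTM call sites are behaviorally neutral, while
any user of GFP_TRANSHUGE (LIGHT plus direct reclaim) now carries
__GFP_NORETRY unconditionally, which is the change the subject line
describes.
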
 drivers/gpu/drm/ttm/ttm_page_alloc.c     |  8 ++++----
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |  3 +--
 include/linux/gfp.h                      |  3 ++-
 mm/huge_memory.c                         |  3 +--
 mm/page_alloc.c                          | 16 ++++++++++++++++
 5 files changed, 24 insertions(+), 9 deletions(-)
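
The decision added to __alloc_pages_slowpath() can be modeled in userspace
roughly as follows.  This is a sketch only: sk_fail_after_first_compaction()
and its boolean parameters are hypothetical names standing in for the
compact_result, order and current->flags checks in the hunk, and
SK_HPAGE_PMD_ORDER is assumed to be 9, the usual value on x86-64 with
4 KiB base pages.

```c
#include <stdbool.h>

/* Assumption: order 9 == 2 MiB hugepage on x86-64 with 4 KiB pages. */
#define SK_HPAGE_PMD_ORDER 9u

/*
 * Models the checks made right after the first compaction attempt.
 * Returns true when the allocation should give up ("goto nopage")
 * instead of going on to reclaim and retry compaction.
 */
static inline bool sk_fail_after_first_compaction(unsigned int order,
						  bool compaction_deferred,
						  bool is_kthread)
{
	/* Existing check: compaction was deferred, so fail right away. */
	if (compaction_deferred)
		return true;

	/*
	 * New check: a hugepage-order request from a non-kernel thread
	 * (for example a page fault) fails early.  Kernel threads are
	 * exempt and continue on to the reclaim/compaction path.
	 */
	if (order == SK_HPAGE_PMD_ORDER && !is_kthread)
		return true;

	return false;
}
```

For example, sk_fail_after_first_compaction(9, false, false) models a THP
page fault and returns true, while the same request with is_kthread set
returns false.  The kernel-thread exemption presumably keeps khugepaged,
which the changelog relies on to collapse memory later, on the slower
path.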

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -860,8 +860,8 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,
 			while (npages >= HPAGE_PMD_NR) {
 				gfp_t huge_flags = gfp_flags;
 
-				huge_flags |= GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
-					__GFP_KSWAPD_RECLAIM;
+				huge_flags |= GFP_TRANSHUGE_LIGHT |
+					      __GFP_KSWAPD_RECLAIM;
 				huge_flags &= ~__GFP_MOVABLE;
 				huge_flags &= ~__GFP_COMP;
 				p = alloc_pages(huge_flags, HPAGE_PMD_ORDER);
@@ -978,13 +978,13 @@ int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages)
 				  GFP_USER | GFP_DMA32, "uc dma", 0);
 
 	ttm_page_pool_init_locked(&_manager->wc_pool_huge,
-				  (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
+				  (GFP_TRANSHUGE_LIGHT |
 				   __GFP_KSWAPD_RECLAIM) &
 				  ~(__GFP_MOVABLE | __GFP_COMP),
 				  "wc huge", order);
 
 	ttm_page_pool_init_locked(&_manager->uc_pool_huge,
-				  (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
+				  (GFP_TRANSHUGE_LIGHT |
 				   __GFP_KSWAPD_RECLAIM) &
 				  ~(__GFP_MOVABLE | __GFP_COMP)
 				  , "uc huge", order);
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -863,8 +863,7 @@ static gfp_t ttm_dma_pool_gfp_flags(struct ttm_dma_tt *ttm_dma, bool huge)
 		gfp_flags |= __GFP_ZERO;
 
 	if (huge) {
-		gfp_flags |= GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
-			__GFP_KSWAPD_RECLAIM;
+		gfp_flags |= GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
 		gfp_flags &= ~__GFP_MOVABLE;
 		gfp_flags &= ~__GFP_COMP;
 	}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -298,7 +298,8 @@ struct vm_area_struct;
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
 #define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
+			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
+			 ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
 
 /* Convert GFP flags to their corresponding migrate type */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -636,8 +636,7 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, un
 
 	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE | __GFP_THISNODE |
-		       (vma_madvised ? 0 : __GFP_NORETRY);
+		return GFP_TRANSHUGE | __GFP_THISNODE;
 
 	/* Kick kcompactd and fail quickly */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4139,6 +4139,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			if (compact_result == COMPACT_DEFERRED)
 				goto nopage;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			/*
+			 * When faulting a hugepage, it is very unlikely that
+			 * thrashing the zonelist is going to help compaction in
+			 * freeing such a high-order page.  Reclaim would need
+			 * to free contiguous memory itself or guarantee the
+			 * reclaimed memory is accessible by the compaction
+			 * freeing scanner.  Since there is no such guarantee,
+			 * thrashing is more harmful than beneficial.  It is
+			 * better to simply fail and fallback to native pages.
+			 */
+			if (order == HPAGE_PMD_ORDER &&
+					!(current->flags & PF_KTHREAD))
+				goto nopage;
+#endif
+
 			/*
 			 * Looks like reclaim/compaction is worth trying, but
 			 * sync compaction could be very expensive, so keep
