* [PATCH 0/6 v3] Discard __GFP_ATOMIC
@ 2023-01-13 11:12 Mel Gorman
  2023-01-13 11:12 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

This replaces the "Discard __GFP_ATOMIC v2" series in mm-unstable. There
are changelog and patch replacements that make -fix patches impractical.

Changelog since v2
o Non-blocking (GFP_NOWAIT) allocations get no reserve access	(mhocko)
o __GFP_NOFAIL before OOM reserve access reduced		(mhocko)
o Changelog clarifications					(mhocko)
o Note in changelog that rt_task treatment will be removed	(mhocko)
o One ack dropped as the patch changed enough to invalidate it

Changelog since v1
o Split one patch						(vbabka)
o Improve OOM reserve handling					(vbabka)
o Fix __GFP_RECLAIM vs __GFP_DIRECT_RECLAIM			(vbabka)

Neil's patch has been residing in mm-unstable as commit 2fafb4fe8f7a
("mm: discard __GFP_ATOMIC") for a long time and was recently brought up
again. Most recently, I was worried that __GFP_HIGH allocations could
use high-order atomic reserves, which is unintentional, but there was no
response, so let's revisit -- this series reworks how the min reserves
are used, protects the high-order reserves and then finishes with Neil's
patch with very minor modifications so it fits on top.

There was a review discussion on renaming __GFP_DIRECT_RECLAIM to
__GFP_ALLOW_BLOCKING but I didn't think it was that big an issue and it
is orthogonal to the removal of __GFP_ATOMIC.

There were some concerns about how the gfp flags affect the min reserves
but the discussion never reached a solid conclusion, so I made my own
attempt.

The series tries to iron out some of the details of how the reserves
are used. ALLOC_HIGH becomes ALLOC_MIN_RESERVE and ALLOC_HARDER becomes
ALLOC_NON_BLOCK, and the patches document how the reserves are affected.
For example, ALLOC_NON_BLOCK (no direct reclaim) on its own allows 25% of
the min reserve. ALLOC_MIN_RESERVE (__GFP_HIGH) allows 50% and both
combined allow deeper access again. ALLOC_OOM allows access to 75%.
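
As a rough illustration (an editorial sketch in userspace C, not kernel
code), the following mirrors how the reworked __zone_watermark_ok() in
patches 4 and 5 scales the min watermark. The NON_BLOCK and MIN_RESERVE
bit values match mm/internal.h; the ALLOC_OOM value and the zone numbers
are made up. Note that in the final patch 5 code the ALLOC_NON_BLOCK
discount only applies on top of ALLOC_MIN_RESERVE, which is what the
later review nit about the stale 25% comment refers to.

#include <stdio.h>

#define ALLOC_NON_BLOCK		0x10	/* matches mm/internal.h */
#define ALLOC_MIN_RESERVE	0x20	/* matches mm/internal.h */
#define ALLOC_OOM		0x08	/* illustrative value only */

/* Mirrors the min watermark scaling in __zone_watermark_ok() */
static long scaled_min(long min, unsigned int alloc_flags)
{
	if (alloc_flags & ALLOC_MIN_RESERVE) {
		min -= min / 2;		/* __GFP_HIGH: 50% of the reserve */
		if (alloc_flags & ALLOC_NON_BLOCK)
			min -= min / 4;	/* plus non-blocking: 62.5% */
	}
	if (alloc_flags & ALLOC_OOM)
		min -= min / 2;		/* OOM victims dig deeper still */
	return min;
}

int main(void)
{
	long min = 1024;	/* hypothetical zone min watermark in pages */

	printf("__GFP_HIGH blocking:     %ld\n",	/* 512: 50% access */
	       scaled_min(min, ALLOC_MIN_RESERVE));
	printf("GFP_ATOMIC-like:         %ld\n",	/* 384: 62.5% access */
	       scaled_min(min, ALLOC_MIN_RESERVE | ALLOC_NON_BLOCK));
	printf("OOM victim + __GFP_HIGH: %ld\n",	/* 256: 75% access */
	       scaled_min(min, ALLOC_MIN_RESERVE | ALLOC_OOM));
	return 0;
}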

High-order atomic allocations are explicitly handled with the caveat
that, with no __GFP_ATOMIC flag, any high-order allocation that specifies
__GFP_HIGH and cannot enter direct reclaim will be treated as if it were
GFP_ATOMIC.
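
In gfp_to_alloc_flags() terms, the classification reduces to the
predicate sketched here (editorial simplification of patches 3 and 5;
the direct reclaim bit matches gfp_types.h, the NOMEMALLOC value is
illustrative):

#include <stdbool.h>

#define __GFP_DIRECT_RECLAIM	0x400u		/* matches gfp_types.h */
#define __GFP_NOMEMALLOC	0x80000u	/* illustrative value only */

/*
 * A request is treated as a high-order atomic allocation when it is
 * high-order and cannot enter direct reclaim, unless it explicitly
 * forbids reserve access with __GFP_NOMEMALLOC.
 */
static bool is_highorder_atomic(unsigned int gfp_mask, unsigned int order)
{
	return order > 0 &&
	       !(gfp_mask & __GFP_DIRECT_RECLAIM) &&
	       !(gfp_mask & __GFP_NOMEMALLOC);
}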

 Documentation/mm/balance.rst   |   2 +-
 drivers/iommu/tegra-smmu.c     |   4 +-
 include/linux/gfp_types.h      |  12 ++--
 include/trace/events/mmflags.h |   1 -
 lib/test_printf.c              |   8 +--
 mm/internal.h                  |  15 ++++-
 mm/page_alloc.c                | 106 ++++++++++++++++++++-------------
 tools/perf/builtin-kmem.c      |   1 -
 8 files changed, 88 insertions(+), 61 deletions(-)

-- 
2.35.3



* [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE
  2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
@ 2023-01-13 11:12 ` Mel Gorman
  2023-01-13 11:12 ` [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to __GFP_HIGH Mel Gorman
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

__GFP_HIGH aliases to ALLOC_HIGH but the name does not really hint
at what it means. As ALLOC_HIGH is internal to the allocator, rename
it to ALLOC_MIN_RESERVE to document that the min reserves can
be depleted.
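
For the record, the rename preserves the property that the BUILD_BUG_ON()
pair in the hunk below depends on: __GFP_HIGH and ALLOC_MIN_RESERVE share
the bit value 0x20, and __GFP_KSWAPD_RECLAIM and ALLOC_KSWAPD share 0x800,
so the gfp bits can be copied straight into alloc_flags with a single
mask instead of two conditional branches. A minimal standalone
illustration (editorial sketch, values as in the tree at this point):

#include <assert.h>

#define __GFP_HIGH		0x20u
#define __GFP_KSWAPD_RECLAIM	0x800u
#define ALLOC_MIN_RESERVE	0x20	/* __GFP_HIGH set */
#define ALLOC_KSWAPD		0x800	/* __GFP_KSWAPD_RECLAIM set */

/* Branchless conversion; only valid because the bit values alias */
static unsigned int gfp_to_alloc_flags_fast(unsigned int gfp_mask)
{
	return gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM);
}

int main(void)
{
	/* The userspace equivalent of the BUILD_BUG_ON() pair below */
	assert(__GFP_HIGH == ALLOC_MIN_RESERVE);
	assert(__GFP_KSWAPD_RECLAIM == ALLOC_KSWAPD);
	assert(gfp_to_alloc_flags_fast(__GFP_HIGH) == ALLOC_MIN_RESERVE);
	return 0;
}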

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/internal.h   | 4 +++-
 mm/page_alloc.c | 8 ++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..403e4386626d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -736,7 +736,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #endif
 
 #define ALLOC_HARDER		 0x10 /* try to alloc harder */
-#define ALLOC_HIGH		 0x20 /* __GFP_HIGH set */
+#define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
+				       * of the min watermark.
+				       */
 #define ALLOC_CPUSET		 0x40 /* check for correct cpuset */
 #define ALLOC_CMA		 0x80 /* allow allocations from CMA areas */
 #ifdef CONFIG_ZONE_DMA32
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..244c1e675dc8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3976,7 +3976,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-	if (alloc_flags & ALLOC_HIGH)
+	if (alloc_flags & ALLOC_MIN_RESERVE)
 		min -= min / 2;
 
 	if (unlikely(alloc_harder)) {
@@ -4818,18 +4818,18 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
 	/*
-	 * __GFP_HIGH is assumed to be the same as ALLOC_HIGH
+	 * __GFP_HIGH is assumed to be the same as ALLOC_MIN_RESERVE
 	 * and __GFP_KSWAPD_RECLAIM is assumed to be the same as ALLOC_KSWAPD
 	 * to save two branches.
 	 */
-	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
+	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_MIN_RESERVE);
 	BUILD_BUG_ON(__GFP_KSWAPD_RECLAIM != (__force gfp_t) ALLOC_KSWAPD);
 
 	/*
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
+	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
-- 
2.35.3



* [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to __GFP_HIGH
  2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
  2023-01-13 11:12 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
@ 2023-01-13 11:12 ` Mel Gorman
  2023-01-13 11:12 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

RT tasks are allowed to dip below the min reserve but ALLOC_HARDER is
typically combined with ALLOC_MIN_RESERVE so RT tasks are a little
unusual. While there is some justification for allowing RT tasks
access to memory reserves, there is a strong chance that an RT task
that is also under memory pressure is at risk of missing deadlines
anyway. Relax how much of the reserves an RT task can access by
treating it the same as __GFP_HIGH allocations.

Note that in a future kernel release the RT special casing will be
removed. Hard realtime tasks should be locking down resources in advance
and ensuring enough memory is available. Even a soft-realtime task like
audio or video live decoding, which cannot jitter, should be allocating
both memory and any disk space required up-front before the recording
starts instead of relying on reserves. At best, reserve access will only
delay the problem by a very short interval.
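
As an aside on the in_task() part of the check in the hunk below: the RT
special case only applies in process context because rt_task() looks at
the scheduling policy of "current", and in interrupt context "current"
is simply whichever task happened to be interrupted, so its policy says
nothing about the allocation. The helper is roughly the following
(sketched from include/linux/preempt.h from memory, so treat as
illustrative):

#define in_task()	(!(in_nmi() | in_hardirq() | in_serving_softirq()))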

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 244c1e675dc8..0040b4e00913 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4847,7 +4847,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 */
 		alloc_flags &= ~ALLOC_CPUSET;
 	} else if (unlikely(rt_task(current)) && in_task())
-		alloc_flags |= ALLOC_HARDER;
+		alloc_flags |= ALLOC_MIN_RESERVE;
 
 	alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
 
-- 
2.35.3



* [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
  2023-01-13 11:12 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
  2023-01-13 11:12 ` [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to __GFP_HIGH Mel Gorman
@ 2023-01-13 11:12 ` Mel Gorman
  2023-01-13 13:02   ` Michal Hocko
  2023-01-13 11:12 ` [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves Mel Gorman
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
is accurate, it changes later in the series. In preparation, explicitly
record high-order atomic allocations in gfp_to_alloc_flags().
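
The ordering in the rmqueue_buddy() hunk below matters: only
ALLOC_HIGHATOMIC callers try the highatomic pool first, while everyone
else starts with the normal freelists. A condensed editorial sketch of
the fallback chain, with the kernel helpers reduced to hypothetical
prototypes:

struct page;
struct page *rmqueue_highatomic(unsigned int order);
struct page *rmqueue_normal(unsigned int order, int migratetype,
			    unsigned int alloc_flags);

#define ALLOC_HIGHATOMIC	0x200	/* matches mm/internal.h */
#define ALLOC_OOM		0x08	/* illustrative value only */

static struct page *rmqueue_sketch(unsigned int order, int migratetype,
				   unsigned int alloc_flags)
{
	struct page *page = NULL;

	/* High-order atomic callers may use the dedicated pool first */
	if (alloc_flags & ALLOC_HIGHATOMIC)
		page = rmqueue_highatomic(order);

	/* Everyone falls back to the normal freelists */
	if (!page)
		page = rmqueue_normal(order, migratetype, alloc_flags);

	/*
	 * Only when that fails may an OOM victim raid the highatomic
	 * pool: failing now is worse than failing a future high-order
	 * atomic allocation.
	 */
	if (!page && (alloc_flags & ALLOC_OOM))
		page = rmqueue_highatomic(order);

	return page;
}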

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 29 +++++++++++++++++++++++------
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 403e4386626d..178484d9fd94 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -746,6 +746,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #else
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
+#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 enum ttu_flags;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0040b4e00913..0ef4f3236a5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3706,10 +3706,20 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 		 * reserved for high-order atomic allocation, so order-0
 		 * request should skip it.
 		 */
-		if (order > 0 && alloc_flags & ALLOC_HARDER)
+		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
+
+			/*
+			 * If the allocation fails, allow OOM handling access
+			 * to HIGHATOMIC reserves as failing now is worse than
+			 * failing a high-order atomic allocation in the
+			 * future.
+			 */
+			if (!page && (alloc_flags & ALLOC_OOM))
+				page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+
 			if (!page) {
 				spin_unlock_irqrestore(&zone->lock, flags);
 				return NULL;
@@ -4023,8 +4033,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			return true;
 		}
 #endif
-		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
+		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
+		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
 			return true;
+		}
 	}
 	return false;
 }
@@ -4286,7 +4298,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			 * If this is a high-order atomic allocation then check
 			 * if the pageblock should be reserved for the future
 			 */
-			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
 				reserve_highatomic_pageblock(page, zone, order);
 
 			return page;
@@ -4813,7 +4825,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
 }
 
 static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask)
+gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 {
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
@@ -4839,8 +4851,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
-		if (!(gfp_mask & __GFP_NOMEMALLOC))
+		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
 			alloc_flags |= ALLOC_HARDER;
+
+			if (order > 0)
+				alloc_flags |= ALLOC_HIGHATOMIC;
+		}
+
 		/*
 		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
 		 * comment for __cpuset_node_allowed().
@@ -5048,7 +5065,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
 	 */
-	alloc_flags = gfp_to_alloc_flags(gfp_mask);
+	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
 
 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
-- 
2.35.3



* [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves
  2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
                   ` (2 preceding siblings ...)
  2023-01-13 11:12 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
@ 2023-01-13 11:12 ` Mel Gorman
  2023-01-13 11:12 ` [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves Mel Gorman
  2023-01-13 11:12 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
  5 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

As there are more ALLOC_ flags that affect reserves, define what flags
affect reserves and clarify the effect of each flag.
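
As a worked example of the __zone_watermark_unusable_free() hunk below:
for an order-2 request on a hypothetical zone holding 512 highatomic
pages, a caller without any of the ALLOC_RESERVES rights has
(1 << 2) - 1 + 512 = 515 pages deducted from free_pages before the
watermark comparison, while a privileged caller only loses the
(1 << order) - 1 alignment slack. A standalone editorial sketch
(illustrative flag value):

#define ALLOC_RESERVES	0x238	/* illustrative OR of the privileged flags */

static long unusable_free(unsigned int order, unsigned int alloc_flags,
			  long nr_reserved_highatomic)
{
	long unusable = (1L << order) - 1;	/* buddy alignment slack */

	/* Callers without reserve rights must leave the highatomic pool alone */
	if (!(alloc_flags & ALLOC_RESERVES))
		unusable += nr_reserved_highatomic;

	return unusable;
}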

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/internal.h   |  3 +++
 mm/page_alloc.c | 34 ++++++++++++++++++++++------------
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 178484d9fd94..8706d46863df 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -749,6 +749,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
+/* Flags that allow allocations below the min watermark. */
+#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+
 enum ttu_flags;
 struct tlbflush_unmap_batch;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0ef4f3236a5a..6f41b84a97ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3949,15 +3949,14 @@ ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);
 static inline long __zone_watermark_unusable_free(struct zone *z,
 				unsigned int order, unsigned int alloc_flags)
 {
-	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 	long unusable_free = (1 << order) - 1;
 
 	/*
-	 * If the caller does not have rights to ALLOC_HARDER then subtract
-	 * the high-atomic reserves. This will over-estimate the size of the
-	 * atomic reserve but it avoids a search.
+	 * If the caller does not have rights to reserves below the min
+	 * watermark then subtract the high-atomic reserves. This will
+	 * over-estimate the size of the atomic reserve but it avoids a search.
 	 */
-	if (likely(!alloc_harder))
+	if (likely(!(alloc_flags & ALLOC_RESERVES)))
 		unusable_free += z->nr_reserved_highatomic;
 
 #ifdef CONFIG_CMA
@@ -3981,25 +3980,36 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 {
 	long min = mark;
 	int o;
-	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-	if (alloc_flags & ALLOC_MIN_RESERVE)
-		min -= min / 2;
+	if (unlikely(alloc_flags & ALLOC_RESERVES)) {
+		/*
+		 * __GFP_HIGH allows access to 50% of the min reserve as well
+		 * as OOM.
+		 */
+		if (alloc_flags & ALLOC_MIN_RESERVE)
+			min -= min / 2;
 
-	if (unlikely(alloc_harder)) {
 		/*
-		 * OOM victims can try even harder than normal ALLOC_HARDER
+		 * Non-blocking allocations can access some of the reserve
+		 * with more access if also __GFP_HIGH. The reasoning is that
+		 * a non-blocking caller may incur a more severe penalty
+		 * if it cannot get memory quickly, particularly if it's
+		 * also __GFP_HIGH.
+		 */
+		if (alloc_flags & ALLOC_HARDER)
+			min -= min / 4;
+
+		/*
+		 * OOM victims can try even harder than the normal reserve
 		 * users on the grounds that it's definitely going to be in
 		 * the exit path shortly and free memory. Any allocation it
 		 * makes during the free path will be small and short-lived.
 		 */
 		if (alloc_flags & ALLOC_OOM)
 			min -= min / 2;
-		else
-			min -= min / 4;
 	}
 
 	/*
-- 
2.35.3



* [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves
  2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
                   ` (3 preceding siblings ...)
  2023-01-13 11:12 ` [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves Mel Gorman
@ 2023-01-13 11:12 ` Mel Gorman
  2023-01-13 13:06   ` Michal Hocko
  2023-02-07 13:32   ` Vlastimil Babka
  2023-01-13 11:12 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
  5 siblings, 2 replies; 15+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

GFP_ATOMIC allocations get flagged ALLOC_HARDER, which is a vague
description. In preparation for the removal of GFP_ATOMIC, redefine
__GFP_ATOMIC to simply mean non-blocking and rename ALLOC_HARDER to
ALLOC_NON_BLOCK accordingly. __GFP_HIGH is required for access to reserves
but non-blocking is granted more access. For example, GFP_NOWAIT is
non-blocking but has no special access to reserves. A __GFP_NOFAIL
blocking allocation is granted access similar to __GFP_HIGH if the
only alternative is an OOM kill.
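
To make the policy concrete, this editorial sketch maps a few common gfp
masks to the alloc_flags they receive after this patch (userspace C; the
GFP composites match gfp_types.h as modified by patch 6, the
__GFP_NOMEMALLOC value is illustrative). GFP_NOWAIT is the notable case:
non-blocking, yet no reserve access.

#include <stdio.h>

#define __GFP_HIGH		0x20u
#define __GFP_DIRECT_RECLAIM	0x400u
#define __GFP_KSWAPD_RECLAIM	0x800u
#define __GFP_NOMEMALLOC	0x80000u	/* illustrative value only */

#define GFP_ATOMIC	(__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

#define ALLOC_NON_BLOCK		0x10
#define ALLOC_MIN_RESERVE	0x20
#define ALLOC_HIGHATOMIC	0x200

/* Simplified from gfp_to_alloc_flags() as reworked by this patch */
static unsigned int alloc_flags_for(unsigned int gfp, unsigned int order)
{
	unsigned int flags = 0;

	if (gfp & __GFP_HIGH)
		flags |= ALLOC_MIN_RESERVE;

	if (!(gfp & __GFP_DIRECT_RECLAIM) && !(gfp & __GFP_NOMEMALLOC)) {
		flags |= ALLOC_NON_BLOCK;
		if (order > 0)
			flags |= ALLOC_HIGHATOMIC;
	}
	return flags;
}

int main(void)
{
	printf("GFP_ATOMIC order-0: %#x\n", alloc_flags_for(GFP_ATOMIC, 0));
	printf("GFP_ATOMIC order-2: %#x\n", alloc_flags_for(GFP_ATOMIC, 2));
	printf("GFP_NOWAIT order-0: %#x\n", alloc_flags_for(GFP_NOWAIT, 0));
	return 0;
}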

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h   |  7 +++++--
 mm/page_alloc.c | 44 ++++++++++++++++++++++++--------------------
 2 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 8706d46863df..23a37588073a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -735,7 +735,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_OOM		ALLOC_NO_WATERMARKS
 #endif
 
-#define ALLOC_HARDER		 0x10 /* try to alloc harder */
+#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
+				       * to 25% of the min watermark or
+				       * 62.5% if __GFP_HIGH is set.
+				       */
 #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
 				       * of the min watermark.
 				       */
@@ -750,7 +753,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f41b84a97ac..b9ae0ba0a2ab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3989,18 +3989,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * __GFP_HIGH allows access to 50% of the min reserve as well
 		 * as OOM.
 		 */
-		if (alloc_flags & ALLOC_MIN_RESERVE)
+		if (alloc_flags & ALLOC_MIN_RESERVE) {
 			min -= min / 2;
 
-		/*
-		 * Non-blocking allocations can access some of the reserve
-		 * with more access if also __GFP_HIGH. The reasoning is that
-		 * a non-blocking caller may incur a more severe penalty
-		 * if it cannot get memory quickly, particularly if it's
-		 * also __GFP_HIGH.
-		 */
-		if (alloc_flags & ALLOC_HARDER)
-			min -= min / 4;
+			/*
+			 * Non-blocking allocations (e.g. GFP_ATOMIC) can
+			 * access more reserves than just __GFP_HIGH. Other
+			 * non-blocking allocations requests such as GFP_NOWAIT
+			 * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
+			 * access to the min reserve.
+			 */
+			if (alloc_flags & ALLOC_NON_BLOCK)
+				min -= min / 4;
+		}
 
 		/*
 		 * OOM victims can try even harder than the normal reserve
@@ -4851,28 +4852,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
+	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
 
-	if (gfp_mask & __GFP_ATOMIC) {
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
-			alloc_flags |= ALLOC_HARDER;
+			alloc_flags |= ALLOC_NON_BLOCK;
 
 			if (order > 0)
 				alloc_flags |= ALLOC_HIGHATOMIC;
 		}
 
 		/*
-		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
-		 * comment for __cpuset_node_allowed().
+		 * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
+		 * GFP_ATOMIC) rather than fail, see the comment for
+		 * __cpuset_node_allowed().
 		 */
-		alloc_flags &= ~ALLOC_CPUSET;
+		if (alloc_flags & ALLOC_MIN_RESERVE)
+			alloc_flags &= ~ALLOC_CPUSET;
 	} else if (unlikely(rt_task(current)) && in_task())
 		alloc_flags |= ALLOC_MIN_RESERVE;
 
@@ -5303,12 +5306,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		WARN_ON_ONCE_GFP(costly_order, gfp_mask);
 
 		/*
-		 * Help non-failing allocations by giving them access to memory
-		 * reserves but do not use ALLOC_NO_WATERMARKS because this
+		 * Help non-failing allocations by giving some access to memory
+		 * reserves normally used for high priority non-blocking
+		 * allocations but do not use ALLOC_NO_WATERMARKS because this
 		 * could deplete whole memory reserves which would just make
-		 * the situation worse
+		 * the situation worse.
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
 		if (page)
 			goto got_pg;
 
-- 
2.35.3



* [PATCH 6/6] mm: discard __GFP_ATOMIC
  2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
                   ` (4 preceding siblings ...)
  2023-01-13 11:12 ` [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves Mel Gorman
@ 2023-01-13 11:12 ` Mel Gorman
  5 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2023-01-13 11:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML, Mel Gorman

From: NeilBrown <neilb@suse.de>

__GFP_ATOMIC serves little purpose.  Its main effect is to set
ALLOC_HARDER which adds a few little boosts to increase the chance of an
allocation succeeding, one of which is to lower the water-mark at which it
will succeed.

It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
adjusts this watermark.  It is probable that other users of __GFP_HIGH
should benefit from the other little bonuses that __GFP_ATOMIC gets.

__GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
There is little point to this.  We already get a might_sleep() warning if
__GFP_DIRECT_RECLAIM is set.

__GFP_ATOMIC allows the "watermark_boost" to be side-stepped.  It is
probable that testing ALLOC_HARDER is a better fit here.

__GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
sleep.  This should test __GFP_DIRECT_RECLAIM instead.

This patch:
 - removes __GFP_ATOMIC
 - allows __GFP_HIGH allocations to ignore watermark boosting as well
   as GFP_ATOMIC requests.
 - makes other adjustments as suggested by the above.

The net result is no change to GFP_ATOMIC allocations.  Other
allocations that use __GFP_HIGH will benefit from a few different extra
privileges.  This affects:
  xen, dm, md, ntfs3
  the vermilion frame buffer
  hibernation
  ksm
  swap
all of which likely produce more benefit than cost if these selected
allocations are more likely to succeed quickly.
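
For reference, the helper used in the tegra-smmu conversion below is not
new; it already exists in include/linux/gfp.h and simply keys blocking
on __GFP_DIRECT_RECLAIM (quoted from memory, so treat as illustrative
rather than authoritative):

static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
{
	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
}

Out-of-tree code that tested __GFP_ATOMIC to decide whether it may sleep
can follow the same pattern as the tegra-smmu hunk.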

[mgorman: Minor adjustments to rework on top of a series]
Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/mm/balance.rst   |  2 +-
 drivers/iommu/tegra-smmu.c     |  4 ++--
 include/linux/gfp_types.h      | 12 ++++--------
 include/trace/events/mmflags.h |  1 -
 lib/test_printf.c              |  8 ++++----
 mm/internal.h                  |  2 +-
 mm/page_alloc.c                | 13 +++----------
 tools/perf/builtin-kmem.c      |  1 -
 8 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
index 6a1fadf3e173..e38e9d83c1c7 100644
--- a/Documentation/mm/balance.rst
+++ b/Documentation/mm/balance.rst
@@ -6,7 +6,7 @@ Memory Balancing
 
 Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
 
-Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
 well as for non __GFP_IO allocations.
 
 The first reason why a caller may avoid reclaim is that the caller can not
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 5b1af40221ec..af8d0e685260 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -671,12 +671,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	 * allocate page in a sleeping context if GFP flags permit. Hence
 	 * spinlock needs to be unlocked and re-locked after allocation.
 	 */
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_unlock_irqrestore(&as->lock, *flags);
 
 	page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
 
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_lock_irqsave(&as->lock, *flags);
 
 	/*
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index d88c46ca82e1..5088637fe5c2 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -31,7 +31,7 @@ typedef unsigned int __bitwise gfp_t;
 #define ___GFP_IO		0x40u
 #define ___GFP_FS		0x80u
 #define ___GFP_ZERO		0x100u
-#define ___GFP_ATOMIC		0x200u
+/* 0x200u unused */
 #define ___GFP_DIRECT_RECLAIM	0x400u
 #define ___GFP_KSWAPD_RECLAIM	0x800u
 #define ___GFP_WRITE		0x1000u
@@ -116,11 +116,8 @@ typedef unsigned int __bitwise gfp_t;
  *
  * %__GFP_HIGH indicates that the caller is high-priority and that granting
  * the request is necessary before the system can make forward progress.
- * For example, creating an IO context to clean pages.
- *
- * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is
- * high priority. Users are typically interrupt handlers. This may be
- * used in conjunction with %__GFP_HIGH
+ * For example creating an IO context to clean pages and requests
+ * from atomic context.
  *
  * %__GFP_MEMALLOC allows access to all memory. This should only be used when
  * the caller guarantees the allocation will allow more memory to be freed
@@ -135,7 +132,6 @@ typedef unsigned int __bitwise gfp_t;
  * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
  * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
  */
-#define __GFP_ATOMIC	((__force gfp_t)___GFP_ATOMIC)
 #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)
 #define __GFP_MEMALLOC	((__force gfp_t)___GFP_MEMALLOC)
 #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC)
@@ -329,7 +325,7 @@ typedef unsigned int __bitwise gfp_t;
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
  */
-#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
+#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
 #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
 #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 412b5a46374c..9db52bc4ce19 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -31,7 +31,6 @@
 	gfpflag_string(__GFP_HIGHMEM),		\
 	gfpflag_string(GFP_DMA32),		\
 	gfpflag_string(__GFP_HIGH),		\
-	gfpflag_string(__GFP_ATOMIC),		\
 	gfpflag_string(__GFP_IO),		\
 	gfpflag_string(__GFP_FS),		\
 	gfpflag_string(__GFP_NOWARN),		\
diff --git a/lib/test_printf.c b/lib/test_printf.c
index d34dc636b81c..46b4e6c414a3 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -674,17 +674,17 @@ flags(void)
 	gfp = GFP_ATOMIC|__GFP_DMA;
 	test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp);
 
-	gfp = __GFP_ATOMIC;
-	test("__GFP_ATOMIC", "%pGg", &gfp);
+	gfp = __GFP_HIGH;
+	test("__GFP_HIGH", "%pGg", &gfp);
 
 	/* Any flags not translated by the table should remain numeric */
 	gfp = ~__GFP_BITS_MASK;
 	snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp);
 	test(cmp_buffer, "%pGg", &gfp);
 
-	snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx",
+	snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx",
 							(unsigned long) gfp);
-	gfp |= __GFP_ATOMIC;
+	gfp |= __GFP_HIGH;
 	test(cmp_buffer, "%pGg", &gfp);
 
 	kfree(cmp_buffer);
diff --git a/mm/internal.h b/mm/internal.h
index 23a37588073a..71b1111427f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -24,7 +24,7 @@ struct folio_batch;
 #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
 			__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
 			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
-			__GFP_ATOMIC|__GFP_NOLOCKDEP)
+			__GFP_NOLOCKDEP)
 
 /* The GFP flags allowed during early boot */
 #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b9ae0ba0a2ab..78ffebc4798b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4087,13 +4087,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
 					free_pages))
 		return true;
+
 	/*
-	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
+	 * Ignore watermark boosting for __GFP_HIGH order-0 allocations
 	 * when checking the min watermark. The min watermark is the
 	 * point where boosting is ignored so that kswapd is woken up
 	 * when below the low watermark.
 	 */
-	if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
+	if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
 		&& ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
 		mark = z->_watermark[WMARK_MIN];
 		return __zone_watermark_ok(z, order, mark, highest_zoneidx,
@@ -5058,14 +5059,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int zonelist_iter_cookie;
 	int reserve_flags;
 
-	/*
-	 * We also sanity check to catch abuse of atomic reserves being used by
-	 * callers that are not in atomic context.
-	 */
-	if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
-				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
-		gfp_mask &= ~__GFP_ATOMIC;
-
 restart:
 	compaction_retries = 0;
 	no_progress_loops = 0;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index e20656c431a4..173d407dce92 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -641,7 +641,6 @@ static const struct {
 	{ "__GFP_HIGHMEM",		"HM" },
 	{ "GFP_DMA32",			"D32" },
 	{ "__GFP_HIGH",			"H" },
-	{ "__GFP_ATOMIC",		"_A" },
 	{ "__GFP_IO",			"I" },
 	{ "__GFP_FS",			"F" },
 	{ "__GFP_NOWARN",		"NWR" },
-- 
2.35.3



* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2023-01-13 11:12 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
@ 2023-01-13 13:02   ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2023-01-13 13:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML

On Fri 13-01-23 11:12:14, Mel Gorman wrote:
> A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
> is accurate, it changes later in the series. In preparation, explicitly
> record high-order atomic allocations in gfp_to_alloc_flags().
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/internal.h   |  1 +
>  mm/page_alloc.c | 29 +++++++++++++++++++++++------
>  2 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 403e4386626d..178484d9fd94 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -746,6 +746,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #else
>  #define ALLOC_NOFRAGMENT	  0x0
>  #endif
> +#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  enum ttu_flags;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0040b4e00913..0ef4f3236a5a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3706,10 +3706,20 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		 * reserved for high-order atomic allocation, so order-0
>  		 * request should skip it.
>  		 */
> -		if (order > 0 && alloc_flags & ALLOC_HARDER)
> +		if (alloc_flags & ALLOC_HIGHATOMIC)
>  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>  		if (!page) {
>  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> +
> +			/*
> +			 * If the allocation fails, allow OOM handling access
> +			 * to HIGHATOMIC reserves as failing now is worse than
> +			 * failing a high-order atomic allocation in the
> +			 * future.
> +			 */
> +			if (!page && (alloc_flags & ALLOC_OOM))
> +				page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> +
>  			if (!page) {
>  				spin_unlock_irqrestore(&zone->lock, flags);
>  				return NULL;
> @@ -4023,8 +4033,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  			return true;
>  		}
>  #endif
> -		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
> +		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
> +		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
>  			return true;
> +		}
>  	}
>  	return false;
>  }
> @@ -4286,7 +4298,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  			 * If this is a high-order atomic allocation then check
>  			 * if the pageblock should be reserved for the future
>  			 */
> -			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
> +			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
>  				reserve_highatomic_pageblock(page, zone, order);
>  
>  			return page;
> @@ -4813,7 +4825,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
>  }
>  
>  static inline unsigned int
> -gfp_to_alloc_flags(gfp_t gfp_mask)
> +gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  {
>  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
>  
> @@ -4839,8 +4851,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> +		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
>  			alloc_flags |= ALLOC_HARDER;
> +
> +			if (order > 0)
> +				alloc_flags |= ALLOC_HIGHATOMIC;
> +		}
> +
>  		/*
>  		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
>  		 * comment for __cpuset_node_allowed().
> @@ -5048,7 +5065,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 * kswapd needs to be woken up, and to avoid the cost of setting up
>  	 * alloc_flags precisely. So we do that now.
>  	 */
> -	alloc_flags = gfp_to_alloc_flags(gfp_mask);
> +	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
>  
>  	/*
>  	 * We need to recalculate the starting point for the zonelist iterator
> -- 
> 2.35.3

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves
  2023-01-13 11:12 ` [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves Mel Gorman
@ 2023-01-13 13:06   ` Michal Hocko
  2023-02-07 13:32   ` Vlastimil Babka
  1 sibling, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2023-01-13 13:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, NeilBrown, Thierry Reding, Matthew Wilcox,
	Vlastimil Babka, Linux-MM, LKML

On Fri 13-01-23 11:12:16, Mel Gorman wrote:
> GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a vague
> description. In preparation for the removal of GFP_ATOMIC redefine
> __GFP_ATOMIC to simply mean non-blocking and renaming ALLOC_HARDER to
> ALLOC_NON_BLOCK accordingly. __GFP_HIGH is required for access to reserves
> but non-blocking is granted more access. For example, GFP_NOWAIT is
> non-blocking but has no special access to reserves. A __GFP_NOFAIL
> blocking allocation is granted access similar to __GFP_HIGH if the
> only alternative is an OOM kill.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/internal.h   |  7 +++++--
>  mm/page_alloc.c | 44 ++++++++++++++++++++++++--------------------
>  2 files changed, 29 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 8706d46863df..23a37588073a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -735,7 +735,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_OOM		ALLOC_NO_WATERMARKS
>  #endif
>  
> -#define ALLOC_HARDER		 0x10 /* try to alloc harder */
> +#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
> +				       * to 25% of the min watermark or
> +				       * 62.5% if __GFP_HIGH is set.
> +				       */
>  #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
>  				       * of the min watermark.
>  				       */
> @@ -750,7 +753,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  /* Flags that allow allocations below the min watermark. */
> -#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
> +#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
>  
>  enum ttu_flags;
>  struct tlbflush_unmap_batch;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6f41b84a97ac..b9ae0ba0a2ab 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3989,18 +3989,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  		 * __GFP_HIGH allows access to 50% of the min reserve as well
>  		 * as OOM.
>  		 */
> -		if (alloc_flags & ALLOC_MIN_RESERVE)
> +		if (alloc_flags & ALLOC_MIN_RESERVE) {
>  			min -= min / 2;
>  
> -		/*
> -		 * Non-blocking allocations can access some of the reserve
> -		 * with more access if also __GFP_HIGH. The reasoning is that
> -		 * a non-blocking caller may incur a more severe penalty
> -		 * if it cannot get memory quickly, particularly if it's
> -		 * also __GFP_HIGH.
> -		 */
> -		if (alloc_flags & ALLOC_HARDER)
> -			min -= min / 4;
> +			/*
> +			 * Non-blocking allocations (e.g. GFP_ATOMIC) can
> +			 * access more reserves than just __GFP_HIGH. Other
> +			 * non-blocking allocations requests such as GFP_NOWAIT
> +			 * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
> +			 * access to the min reserve.
> +			 */
> +			if (alloc_flags & ALLOC_NON_BLOCK)
> +				min -= min / 4;
> +		}
>  
>  		/*
>  		 * OOM victims can try even harder than the normal reserve
> @@ -4851,28 +4852,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  	 * The caller may dip into page reserves a bit more if the caller
>  	 * cannot run direct reclaim, or if the caller has realtime scheduling
>  	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
> -	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
> +	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
>  	 */
>  	alloc_flags |= (__force int)
>  		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
>  
> -	if (gfp_mask & __GFP_ATOMIC) {
> +	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
>  		/*
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
>  		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> -			alloc_flags |= ALLOC_HARDER;
> +			alloc_flags |= ALLOC_NON_BLOCK;
>  
>  			if (order > 0)
>  				alloc_flags |= ALLOC_HIGHATOMIC;
>  		}
>  
>  		/*
> -		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
> -		 * comment for __cpuset_node_allowed().
> +		 * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
> +		 * GFP_ATOMIC) rather than fail, see the comment for
> +		 * __cpuset_node_allowed().
>  		 */
> -		alloc_flags &= ~ALLOC_CPUSET;
> +		if (alloc_flags & ALLOC_MIN_RESERVE)
> +			alloc_flags &= ~ALLOC_CPUSET;
>  	} else if (unlikely(rt_task(current)) && in_task())
>  		alloc_flags |= ALLOC_MIN_RESERVE;
>  
> @@ -5303,12 +5306,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		WARN_ON_ONCE_GFP(costly_order, gfp_mask);
>  
>  		/*
> -		 * Help non-failing allocations by giving them access to memory
> -		 * reserves but do not use ALLOC_NO_WATERMARKS because this
> +		 * Help non-failing allocations by giving some access to memory
> +		 * reserves normally used for high priority non-blocking
> +		 * allocations but do not use ALLOC_NO_WATERMARKS because this
>  		 * could deplete whole memory reserves which would just make
> -		 * the situation worse
> +		 * the situation worse.
>  		 */
> -		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
>  		if (page)
>  			goto got_pg;
>  
> -- 
> 2.35.3

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves
  2023-01-13 11:12 ` [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves Mel Gorman
  2023-01-13 13:06   ` Michal Hocko
@ 2023-02-07 13:32   ` Vlastimil Babka
  1 sibling, 0 replies; 15+ messages in thread
From: Vlastimil Babka @ 2023-02-07 13:32 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox, Linux-MM, LKML

On 1/13/23 12:12, Mel Gorman wrote:
> GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a vague
> description. In preparation for the removal of GFP_ATOMIC redefine

						^ __GFP_ATOMIC

> __GFP_ATOMIC to simply mean non-blocking and renaming ALLOC_HARDER to
> ALLOC_NON_BLOCK accordingly. __GFP_HIGH is required for access to reserves
> but non-blocking is granted more access. For example, GFP_NOWAIT is
> non-blocking but has no special access to reserves. A __GFP_NOFAIL
> blocking allocation is granted access similar to __GFP_HIGH if the
> only alternative is an OOM kill.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Well just for the lore record (too late for git)

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Nit below:

> ---
>  mm/internal.h   |  7 +++++--
>  mm/page_alloc.c | 44 ++++++++++++++++++++++++--------------------
>  2 files changed, 29 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 8706d46863df..23a37588073a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -735,7 +735,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_OOM		ALLOC_NO_WATERMARKS
>  #endif
>  
> -#define ALLOC_HARDER		 0x10 /* try to alloc harder */
> +#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
> +				       * to 25% of the min watermark or
> +				       * 62.5% if __GFP_HIGH is set.

This is now (as of v3) inaccurate (the 25% part), right?

> +				       */
>  #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
>  				       * of the min watermark.
>  				       */
> @@ -750,7 +753,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  /* Flags that allow allocations below the min watermark. */
> -#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
> +#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
>  
>  enum ttu_flags;
>  struct tlbflush_unmap_batch;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6f41b84a97ac..b9ae0ba0a2ab 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3989,18 +3989,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  		 * __GFP_HIGH allows access to 50% of the min reserve as well
>  		 * as OOM.
>  		 */
> -		if (alloc_flags & ALLOC_MIN_RESERVE)
> +		if (alloc_flags & ALLOC_MIN_RESERVE) {
>  			min -= min / 2;
>  
> -		/*
> -		 * Non-blocking allocations can access some of the reserve
> -		 * with more access if also __GFP_HIGH. The reasoning is that
> -		 * a non-blocking caller may incur a more severe penalty
> -		 * if it cannot get memory quickly, particularly if it's
> -		 * also __GFP_HIGH.
> -		 */
> -		if (alloc_flags & ALLOC_HARDER)
> -			min -= min / 4;
> +			/*
> +			 * Non-blocking allocations (e.g. GFP_ATOMIC) can
> +			 * access more reserves than just __GFP_HIGH. Other
> +			 * non-blocking allocations requests such as GFP_NOWAIT
> +			 * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
> +			 * access to the min reserve.
> +			 */
> +			if (alloc_flags & ALLOC_NON_BLOCK)
> +				min -= min / 4;
> +		}
>  
>  		/*
>  		 * OOM victims can try even harder than the normal reserve
> @@ -4851,28 +4852,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  	 * The caller may dip into page reserves a bit more if the caller
>  	 * cannot run direct reclaim, or if the caller has realtime scheduling
>  	 * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
> -	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
> +	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
>  	 */
>  	alloc_flags |= (__force int)
>  		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
>  
> -	if (gfp_mask & __GFP_ATOMIC) {
> +	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
>  		/*
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
>  		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> -			alloc_flags |= ALLOC_HARDER;
> +			alloc_flags |= ALLOC_NON_BLOCK;
>  
>  			if (order > 0)
>  				alloc_flags |= ALLOC_HIGHATOMIC;
>  		}
>  
>  		/*
> -		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
> -		 * comment for __cpuset_node_allowed().
> +		 * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
> +		 * GFP_ATOMIC) rather than fail, see the comment for
> +		 * __cpuset_node_allowed().
>  		 */
> -		alloc_flags &= ~ALLOC_CPUSET;
> +		if (alloc_flags & ALLOC_MIN_RESERVE)
> +			alloc_flags &= ~ALLOC_CPUSET;
>  	} else if (unlikely(rt_task(current)) && in_task())
>  		alloc_flags |= ALLOC_MIN_RESERVE;
>  
> @@ -5303,12 +5306,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		WARN_ON_ONCE_GFP(costly_order, gfp_mask);
>  
>  		/*
> -		 * Help non-failing allocations by giving them access to memory
> -		 * reserves but do not use ALLOC_NO_WATERMARKS because this
> +		 * Help non-failing allocations by giving some access to memory
> +		 * reserves normally used for high priority non-blocking
> +		 * allocations but do not use ALLOC_NO_WATERMARKS because this
>  		 * could deplete whole memory reserves which would just make
> -		 * the situation worse
> +		 * the situation worse.
>  		 */
> -		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
>  		if (page)
>  			goto got_pg;
>  



* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-12-08 16:51   ` Vlastimil Babka
@ 2023-01-04 11:45     ` Mel Gorman
  0 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2023-01-04 11:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linux-MM, Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

First off, sorry for the long delay getting back to you. I was sick for
a few weeks and still catching up. I'm still not 100%.

On Thu, Dec 08, 2022 at 05:51:11PM +0100, Vlastimil Babka wrote:
> On 11/29/22 16:16, Mel Gorman wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index da746e9eb2cf..e2b65767dda0 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> >  		 * reserved for high-order atomic allocation, so order-0
> >  		 * request should skip it.
> >  		 */
> > -		if (order > 0 && alloc_flags & ALLOC_HARDER)
> > +		if (alloc_flags & ALLOC_HIGHATOMIC)
> >  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> >  		if (!page) {
> >  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> > @@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
> >  			return true;
> >  		}
> >  #endif
> > -		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
> > +		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
> > +		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
> >  			return true;
> 
> alloc_harder is defined as
> 	(alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
> AFAICS this means we no longer allow ALLOC_OOM to use the highatomic
> reserve. Isn't that a risk?
> 

Yes, it is. I intend to apply the patch below on top. I didn't alter the
first check for ALLOC_HIGHATOMIC as I wanted OOM handling to only use the
high-order reserves if there was no other option. While this is a change
in behaviour, it should be a harmless one. I'll add a note in the changelog.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 50fc1e7cb154..0ef4f3236a5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3710,6 +3710,16 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
+
+			/*
+			 * If the allocation fails, allow OOM handling access
+			 * to HIGHATOMIC reserves as failing now is worse than
+			 * failing a high-order atomic allocation in the
+			 * future.
+			 */
+			if (!page && (alloc_flags & ALLOC_OOM))
+				page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+
 			if (!page) {
 				spin_unlock_irqrestore(&zone->lock, flags);
 				return NULL;
@@ -4023,7 +4033,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			return true;
 		}
 #endif
-		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
+		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
 		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
 			return true;
 		}
-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
  2022-12-05  5:17   ` NeilBrown
@ 2022-12-08 16:51   ` Vlastimil Babka
  2023-01-04 11:45     ` Mel Gorman
  1 sibling, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2022-12-08 16:51 UTC (permalink / raw)
  To: Mel Gorman, Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, LKML

On 11/29/22 16:16, Mel Gorman wrote:
> A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
> is accurate, it changes later in the series. In preparation, explicitly
> record high-order atomic allocations in gfp_to_alloc_flags().
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/internal.h   |  1 +
>  mm/page_alloc.c | 19 +++++++++++++------
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index d503e57a57a1..9a9d9b5ee87f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -754,6 +754,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #else
>  #define ALLOC_NOFRAGMENT	  0x0
>  #endif
> +#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  enum ttu_flags;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index da746e9eb2cf..e2b65767dda0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		 * reserved for high-order atomic allocation, so order-0
>  		 * request should skip it.
>  		 */
> -		if (order > 0 && alloc_flags & ALLOC_HARDER)
> +		if (alloc_flags & ALLOC_HIGHATOMIC)
>  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>  		if (!page) {
>  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> @@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  			return true;
>  		}
>  #endif
> -		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
> +		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
> +		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
>  			return true;

alloc_harder is defined as
	(alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
AFAICS this means we no longer allow ALLOC_OOM to use the highatomic
reserve. Isn't that a risk?

> +		}
>  	}
>  	return false;
>  }
> @@ -4291,7 +4293,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  			 * If this is a high-order atomic allocation then check
>  			 * if the pageblock should be reserved for the future
>  			 */
> -			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
> +			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
>  				reserve_highatomic_pageblock(page, zone, order);
>  
>  			return page;
> @@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
>  }
>  
>  static inline unsigned int
> -gfp_to_alloc_flags(gfp_t gfp_mask)
> +gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  {
>  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
>  
> @@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> +		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
>  			alloc_flags |= ALLOC_HARDER;
> +
> +			if (order > 0)
> +				alloc_flags |= ALLOC_HIGHATOMIC;
> +		}
> +
>  		/*
>  		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
>  		 * comment for __cpuset_node_allowed().
> @@ -5053,7 +5060,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 * kswapd needs to be woken up, and to avoid the cost of setting up
>  	 * alloc_flags precisely. So we do that now.
>  	 */
> -	alloc_flags = gfp_to_alloc_flags(gfp_mask);
> +	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
>  
>  	/*
>  	 * We need to recalculate the starting point for the zonelist iterator


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-12-05  5:17   ` NeilBrown
@ 2022-12-05 10:27     ` Mel Gorman
  0 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2022-12-05 10:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Linux-MM, Andrew Morton, Michal Hocko, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML

On Mon, Dec 05, 2022 at 04:17:12PM +1100, NeilBrown wrote:
> 
> Hi Mel,
>  thanks a lot for doing this!

My pleasure.

>  I tried reviewing it but "HIGHATOMIC" is new to me and I quickly got
>  lost :-(

That's ok, HIGHATOMIC reserves are obscure and internal to the
allocator. They are almost as obscure as the reserve access granted to
RT tasks, the only difference being a lack of data on what sort of RT
tasks needed the extra privilege back in the early 2000's, but I happen
to remember why highatomic reserves were introduced. They were added
when fragmentation avoidance started triggering high-order atomic
allocation failures that had "happened to work" by accident beforehand
(details in 0aaa29a56e4f). IIRC, there was a storm of bugs, mostly on
small embedded platforms where devices without an IOMMU relied on
high-order atomic allocations, so a small reserve was created. I was
concerned your patch would trigger this class of bug again, even though
it might take a few years to show up as embedded platforms tend to stay
on old kernels for ages.

The main point of this patch is to identify these high-order atomic
allocations without relying on __GFP_ATOMIC, but it should not be
necessary to understand how high-order atomic reserves work. They still
work the same way; access is just granted differently.
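
As a made-up example of the class of caller involved (not from any
specific driver), think of a device without an IOMMU needing a
physically contiguous buffer in interrupt context:

	/* order-2 => 16KB physically contiguous on 4K pages, may not sleep */
	struct page *page = alloc_pages(GFP_ATOMIC, 2);

	if (!page)
		return NULL;	/* the failure the highatomic reserve guards against */

Without the reserve, fragmentation makes such a request unreliable no
matter how much memory is free overall.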

>  Maybe one day I'll work it out - now that several names are more
>  meaningful, it will likely be easier.
> 
> > @@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
> >  }
> >  
> >  static inline unsigned int
> > -gfp_to_alloc_flags(gfp_t gfp_mask)
> > +gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> >  {
> >  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
> >  
> > @@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
> >  		 * if it can't schedule.
> >  		 */
> > -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> > +		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> >  			alloc_flags |= ALLOC_HARDER;
> > +
> > +			if (order > 0)
> > +				alloc_flags |= ALLOC_HIGHATOMIC;
> > +		}
> > +
> >  		/*
> >  		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
> >  		 * comment for __cpuset_node_allowed().

These are the two most crucial hunks of the patch. If the series is
merged and we start seeing high-order atomic allocation failures, the
key will be to look at the gfp_flags and determine whether this hunk is
enough to accurately detect high-order atomic allocations. If the GFP
flags look ok, then a tracing debug patch can be written to dump the
gfp_flags every time access is granted to the highatomic reserves, to
determine whether access is granted incorrectly and under what
circumstances.
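
To be clear, no such debug patch exists yet. As a rough sketch of the
idea (assuming gfp_mask were plumbed down to rmqueue_buddy(), which it
currently is not), it could be as simple as:

	if (alloc_flags & ALLOC_HIGHATOMIC) {
		/* dump the gfp mask every time the reserve is tapped */
		trace_printk("highatomic: order=%u gfp=%pGg\n",
			     order, &gfp_mask);
		page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
	}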

The main concern I had with your original patch was that it made it too
easy to grant access to the highatomic reserves for requests that were
not high-order atomic requests, which might exhaust the reserves. The
rest of the series tries to improve the naming of the ALLOC flags and
what they mean. The actual changes to your patch are minimal. I could
have started with your patch and fixed it up, but I preferred this
ordering so that the use of __GFP_ATOMIC is reduced first and the flag
deleted last. It should be bisection safe.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
@ 2022-12-05  5:17   ` NeilBrown
  2022-12-05 10:27     ` Mel Gorman
  2022-12-08 16:51   ` Vlastimil Babka
  1 sibling, 1 reply; 15+ messages in thread
From: NeilBrown @ 2022-12-05  5:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Andrew Morton, Michal Hocko, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman


Hi Mel,
 thanks a lot for doing this!
 I tried reviewing it but "HIGHATOMIC" is new to me and I quickly got
 lost :-(
 Maybe one day I'll work it out - now that several names are more
 meaningful, it will likely be easier.

Thanks,
NeilBrown

On Wed, 30 Nov 2022, Mel Gorman wrote:
> A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
> is accurate, it changes later in the series. In preparation, explicitly
> record high-order atomic allocations in gfp_to_alloc_flags().
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/internal.h   |  1 +
>  mm/page_alloc.c | 19 +++++++++++++------
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index d503e57a57a1..9a9d9b5ee87f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -754,6 +754,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  #else
>  #define ALLOC_NOFRAGMENT	  0x0
>  #endif
> +#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
>  #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
>  
>  enum ttu_flags;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index da746e9eb2cf..e2b65767dda0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		 * reserved for high-order atomic allocation, so order-0
>  		 * request should skip it.
>  		 */
> -		if (order > 0 && alloc_flags & ALLOC_HARDER)
> +		if (alloc_flags & ALLOC_HIGHATOMIC)
>  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>  		if (!page) {
>  			page = __rmqueue(zone, order, migratetype, alloc_flags);
> @@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  			return true;
>  		}
>  #endif
> -		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
> +		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
> +		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
>  			return true;
> +		}
>  	}
>  	return false;
>  }
> @@ -4291,7 +4293,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  			 * If this is a high-order atomic allocation then check
>  			 * if the pageblock should be reserved for the future
>  			 */
> -			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
> +			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
>  				reserve_highatomic_pageblock(page, zone, order);
>  
>  			return page;
> @@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
>  }
>  
>  static inline unsigned int
> -gfp_to_alloc_flags(gfp_t gfp_mask)
> +gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  {
>  	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
>  
> @@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
>  		 * if it can't schedule.
>  		 */
> -		if (!(gfp_mask & __GFP_NOMEMALLOC))
> +		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
>  			alloc_flags |= ALLOC_HARDER;
> +
> +			if (order > 0)
> +				alloc_flags |= ALLOC_HIGHATOMIC;
> +		}
> +
>  		/*
>  		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
>  		 * comment for __cpuset_node_allowed().
> @@ -5053,7 +5060,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 * kswapd needs to be woken up, and to avoid the cost of setting up
>  	 * alloc_flags precisely. So we do that now.
>  	 */
> -	alloc_flags = gfp_to_alloc_flags(gfp_mask);
> +	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
>  
>  	/*
>  	 * We need to recalculate the starting point for the zonelist iterator
> -- 
> 2.35.3
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
  2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
@ 2022-11-29 15:16 ` Mel Gorman
  2022-12-05  5:17   ` NeilBrown
  2022-12-08 16:51   ` Vlastimil Babka
  0 siblings, 2 replies; 15+ messages in thread
From: Mel Gorman @ 2022-11-29 15:16 UTC (permalink / raw)
  To: Linux-MM
  Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding,
	Matthew Wilcox, Vlastimil Babka, LKML, Mel Gorman

A high-order ALLOC_HARDER allocation is assumed to be atomic. While that
is accurate, it changes later in the series. In preparation, explicitly
record high-order atomic allocations in gfp_to_alloc_flags().

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index d503e57a57a1..9a9d9b5ee87f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -754,6 +754,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #else
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
+#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 enum ttu_flags;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index da746e9eb2cf..e2b65767dda0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3710,7 +3710,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 		 * reserved for high-order atomic allocation, so order-0
 		 * request should skip it.
 		 */
-		if (order > 0 && alloc_flags & ALLOC_HARDER)
+		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
@@ -4028,8 +4028,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			return true;
 		}
 #endif
-		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
+		if ((alloc_flags & ALLOC_HIGHATOMIC) &&
+		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
 			return true;
+		}
 	}
 	return false;
 }
@@ -4291,7 +4293,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			 * If this is a high-order atomic allocation then check
 			 * if the pageblock should be reserved for the future
 			 */
-			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+			if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
 				reserve_highatomic_pageblock(page, zone, order);
 
 			return page;
@@ -4818,7 +4820,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
 }
 
 static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask)
+gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 {
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
@@ -4844,8 +4846,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
-		if (!(gfp_mask & __GFP_NOMEMALLOC))
+		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
 			alloc_flags |= ALLOC_HARDER;
+
+			if (order > 0)
+				alloc_flags |= ALLOC_HIGHATOMIC;
+		}
+
 		/*
 		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
 		 * comment for __cpuset_node_allowed().
@@ -5053,7 +5060,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
 	 */
-	alloc_flags = gfp_to_alloc_flags(gfp_mask);
+	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
 
 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-02-07 13:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-13 11:12 [PATCH 0/6 v3] Discard __GFP_ATOMIC Mel Gorman
2023-01-13 11:12 ` [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE Mel Gorman
2023-01-13 11:12 ` [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to __GFP_HIGH Mel Gorman
2023-01-13 11:12 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
2023-01-13 13:02   ` Michal Hocko
2023-01-13 11:12 ` [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves Mel Gorman
2023-01-13 11:12 ` [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations accesses reserves Mel Gorman
2023-01-13 13:06   ` Michal Hocko
2023-02-07 13:32   ` Vlastimil Babka
2023-01-13 11:12 ` [PATCH 6/6] mm: discard __GFP_ATOMIC Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2022-11-29 15:16 [RFC PATCH 0/6] Discard __GFP_ATOMIC Mel Gorman
2022-11-29 15:16 ` [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Mel Gorman
2022-12-05  5:17   ` NeilBrown
2022-12-05 10:27     ` Mel Gorman
2022-12-08 16:51   ` Vlastimil Babka
2023-01-04 11:45     ` Mel Gorman
